NYC Buildings

datashader
Published: January 27, 2021 · Modified: November 2, 2023


Many plotting libraries, such as Bokeh or HoloViews+Bokeh, can handle collections of polygons. However, because browser-based libraries like Bokeh and Plotly send all the polygon data to the browser, they can struggle when either the collections or the individual polygons get large. Even natively in Python, typical polygon representations such as Shapely scale poorly to large collections, because each polygon is wrapped in a separate Python object, which duplicates storage overhead when many polygons of the same type are defined.
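To make the storage argument concrete, here is a minimal sketch in plain Python of the alternative to one object per polygon: all coordinates are packed into a single flat array, with an offsets array marking where each polygon starts. The helpers `pack_polygons` and `get_polygon` are hypothetical names used only for illustration; this shows the general ragged-array idea and is not a description of SpatialPandas internals.

```python
from array import array

def pack_polygons(polygons):
    """Pack polygons (lists of (x, y) tuples) into one flat coordinate
    array plus an offsets array."""
    coords = array('d')        # x0, y0, x1, y1, ... for every polygon
    offsets = array('q', [0])  # offsets[i] is where polygon i starts in coords
    for poly in polygons:
        for x, y in poly:
            coords.append(x)
            coords.append(y)
        offsets.append(len(coords))
    return coords, offsets

def get_polygon(coords, offsets, i):
    """Recover polygon i as a list of (x, y) tuples."""
    flat = coords[offsets[i]:offsets[i + 1]]
    return list(zip(flat[0::2], flat[1::2]))

polygons = [
    [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0)],
    [(2.0, 2.0), (4.0, 2.0), (4.0, 3.0), (2.0, 3.0)],
]
coords, offsets = pack_polygons(polygons)
```

With this layout, the whole collection costs two array objects regardless of how many polygons it holds, and any single polygon can still be recovered with `get_polygon(coords, offsets, i)`.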

If you want to work with lots of polygons, this notebook shows how to use SpatialPandas and Dask to represent polygons efficiently in memory, fastparquet to represent them efficiently on disk, and Datashader to render them quickly in a web browser. It also demonstrates how to support hovering for datashaded polygons: Bokeh overlays a single vector-based representation of the polygon under the mouse cursor, while all the others are sent to the browser only as rendered pixels. That way, hover and other interactive features are fully supported without ever transferring large amounts of data to the browser or storing them in the limited memory of a browser tab.

This example plots the outlines of all the buildings in New York City. See nyc.gov for the original data and its description.

import warnings
warnings.simplefilter('ignore')
import holoviews as hv
import colorcet as cc
import datashader as ds
import spatialpandas as spd
import spatialpandas.io

from dask.diagnostics import ProgressBar
from holoviews.operation.datashader import (
    rasterize, datashade, inspect_polygons
)

hv.extension('bokeh')
ddf = spd.io.read_parquet_dask('./data/nyc_buildings.parq').persist()

Now we compute the ten most common building types, relabel missing types as 'unknown', and drop everything else:

cats = list(ddf.type.value_counts().compute().iloc[:10].index.values) + ['unknown']
ddf['type'] = ddf.type.replace({None: 'unknown'})
ddf = ddf[ddf.type.isin(cats)]
ddf['type'] = ddf['type'].astype('category').cat.as_known()
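The same filtering logic can be sketched on a plain Python list, which may help when checking what the Dask expressions above are expected to do. The helper `keep_top_categories` and the sample values are made up for illustration; the sketch assumes that missing types are `None` and that they are not counted when ranking the categories.

```python
from collections import Counter

def keep_top_categories(values, n, fill='unknown'):
    """Keep only the n most common values, plus `fill` for missing ones."""
    # Rank categories by frequency, ignoring missing entries
    counts = Counter(v for v in values if v is not None)
    keep = [cat for cat, _ in counts.most_common(n)] + [fill]
    # Relabel missing entries, then drop anything outside the kept set
    labelled = [fill if v is None else v for v in values]
    return [v for v in labelled if v in keep], keep

# Hypothetical building types, with one missing entry
sample = ['house', 'house', 'garage', None, 'shed', 'garage', 'house']
filtered, kept = keep_top_categories(sample, n=2)
```

Here `'shed'` is dropped because it is not among the two most common types, while the missing entry survives under the `'unknown'` label.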

with ProgressBar():
    ddf = ddf.build_sindex().persist()
[########################################] | 100% Completed | 10.06 s
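As a rough intuition for why a spatial index is worth building, the sketch below shows the bounding-box test that such an index relies on to skip polygons far from a region of interest. It is a simplified linear scan in plain Python with hypothetical helper names (`bounds`, `overlaps`, `query`); a real spatial index arranges the boxes in a tree so that most of them are never examined, and this is not the structure that `build_sindex` creates.

```python
def bounds(poly):
    """Axis-aligned bounding box (xmin, ymin, xmax, ymax) of a polygon."""
    xs = [x for x, _ in poly]
    ys = [y for _, y in poly]
    return min(xs), min(ys), max(xs), max(ys)

def overlaps(a, b):
    """True if two bounding boxes intersect (touching counts)."""
    ax0, ay0, ax1, ay1 = a
    bx0, by0, bx1, by1 = b
    return ax0 <= bx1 and bx0 <= ax1 and ay0 <= by1 and by0 <= ay1

def query(boxes, viewport):
    """Indices of the boxes that intersect the viewport."""
    return [i for i, box in enumerate(boxes) if overlaps(box, viewport)]

# Three made-up footprints; only two fall inside the viewport
footprints = [
    [(0, 0), (1, 0), (1, 1), (0, 1)],
    [(5, 5), (6, 5), (6, 6), (5, 6)],
    [(1.5, 1.5), (3, 1.5), (3, 3), (1.5, 3)],
]
boxes = [bounds(p) for p in footprints]
visible = query(boxes, viewport=(0, 0, 2, 2))
```

Only the polygons whose boxes pass this cheap test need the more expensive exact geometry checks or rendering.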

Next we build a legend for the categories and declare a tile source as a backdrop:

colors    = cc.glasbey_bw_minc_20_maxl_70
color_key = {cat: tuple(int(e*255.) for e in colors[i]) for i, cat in enumerate(cats)}
legend    = hv.NdOverlay({k: hv.Points([0,0], label=str(k)).opts(
                                         color=cc.rgb_to_hex(*v), size=0, apply_ranges=False)
                          for k, v in color_key.items()}, 'Type')

tiles = hv.element.tiles.CartoLight().opts(
    min_height=500, responsive=True, xaxis=None, yaxis=None)
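The colour handling above involves two conversions: float RGB triples in the range 0 to 1 become integer triples in the range 0 to 255 for the colour key, and those become hex strings for the legend glyphs. A minimal standalone sketch follows; the palette values and category names are made up, and `to_hex` is a hypothetical stand-in for a hex-formatting helper rather than the colorcet function itself.

```python
def to_rgb255(rgb_float):
    """Convert an (r, g, b) triple of floats in [0, 1] to ints in [0, 255]."""
    return tuple(int(c * 255.) for c in rgb_float)

def to_hex(r, g, b):
    """Format integer RGB components as a '#rrggbb' string."""
    return '#{:02x}{:02x}{:02x}'.format(r, g, b)

demo_palette = [(1.0, 0.0, 0.5), (0.0, 0.5, 1.0)]  # made-up colours
demo_cats = ['a', 'b']                              # made-up categories

# One RGB triple per category, then the matching hex strings
demo_key = {cat: to_rgb255(demo_palette[i]) for i, cat in enumerate(demo_cats)}
demo_hex = {cat: to_hex(*rgb) for cat, rgb in demo_key.items()}
```

Note that `int()` truncates rather than rounds, so a component of 0.5 maps to 127, not 128.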

Now we put it all together: we declare a Polygons element from our data, datashade it, and use the inspect_polygons operation to enable hovering over the data:

polys = hv.Polygons(ddf, vdims='type')

shaded = datashade(polys, color_key=color_key, aggregator=ds.by('type', ds.any()))

hover = inspect_polygons(shaded).opts(fill_color='red', tools=['hover'])

tiles * shaded * legend * hover
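Conceptually, hovering requires finding which polygon lies under the cursor so that only that one polygon is drawn as a vector shape. The sketch below shows one common way to do such a lookup, the ray-casting point-in-polygon test, in plain Python. It is an illustration of the idea under that assumption, with hypothetical helper names, and is not a description of how `inspect_polygons` is implemented.

```python
def point_in_polygon(px, py, poly):
    """Ray-casting test: count how many edges a ray going right from
    (px, py) crosses; an odd count means the point is inside."""
    inside = False
    n = len(poly)
    for i in range(n):
        x0, y0 = poly[i]
        x1, y1 = poly[(i + 1) % n]
        # Only edges that straddle the horizontal line y = py can be crossed
        if (y0 > py) != (y1 > py):
            # x coordinate where the edge meets that horizontal line
            x_cross = x0 + (py - y0) * (x1 - x0) / (y1 - y0)
            if px < x_cross:
                inside = not inside
    return inside

def polygon_under_cursor(px, py, polys):
    """Index of the first polygon containing the point, or None."""
    for i, poly in enumerate(polys):
        if point_in_polygon(px, py, poly):
            return i
    return None

# Two made-up shapes: a square and a triangle
shapes = [
    [(0, 0), (2, 0), (2, 2), (0, 2)],
    [(3, 0), (5, 0), (4, 2)],
]
```

For example, `polygon_under_cursor(1, 1, shapes)` picks the square, and a cursor position outside every shape yields `None`, in which case no hover overlay would be drawn.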

Finally, we plot each category of buildings separately:

hv.NdLayout({
    cat: hv.element.tiles.CartoLight() * rasterize(polys.select(type=cat), aggregator='any') for cat in cats
}, 'Type').opts('Image', width=250, height=400, xaxis=None, yaxis=None).cols(4)
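The 'any' aggregator used above reduces each pixel to a yes/no answer: is the pixel covered by at least one shape? A toy version of that reduction is sketched below in plain Python. To stay short it assumes axis-aligned rectangles given as `(xmin, ymin, xmax, ymax)` and tests pixel centres, which is a simplification; Datashader rasterizes arbitrary polygons with its own coverage rules, and `rasterize_any` is a hypothetical name.

```python
def rasterize_any(rects, width, height, x_range, y_range):
    """Boolean grid: True where the pixel centre lies in any rectangle."""
    x0, x1 = x_range
    y0, y1 = y_range
    grid = []
    for row in range(height):
        cy = y0 + (row + 0.5) * (y1 - y0) / height   # pixel-centre y
        line = []
        for col in range(width):
            cx = x0 + (col + 0.5) * (x1 - x0) / width  # pixel-centre x
            line.append(any(rx0 <= cx <= rx1 and ry0 <= cy <= ry1
                            for rx0, ry0, rx1, ry1 in rects))
        grid.append(line)
    return grid

# Two overlapping made-up rectangles on a 4x4 grid covering [0, 4] x [0, 4]
rects = [(0, 0, 2, 2), (1, 1, 3, 3)]
grid = rasterize_any(rects, width=4, height=4, x_range=(0, 4), y_range=(0, 4))
```

Pixels where the two rectangles overlap are simply `True`, the same as pixels covered once, which is what distinguishes 'any' from a counting aggregator.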