NYC Buildings

Many plotting libraries, including Bokeh and HoloViews, can handle collections of polygons. However, browser-based libraries like Bokeh and Plotly send all the polygon data to JavaScript running in the browser, so they can struggle when either the collections or the individual polygons get large. Even natively in Python, typical polygon representations such as Shapely scale poorly to large collections, because each polygon is wrapped up as a full, separate Python object, which duplicates a lot of storage overhead when many polygons of the same type are defined.
If you want to work with lots of polygons, this example shows how to use SpatialPandas and Dask to represent them efficiently in memory, and hvPlot and Datashader to render them quickly in a web browser.
This example plots the outlines of all the buildings in New York City, more than one million in total. See nyc.gov for the original data and its description.
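To make the storage argument above concrete, here is a minimal, standard-library-only sketch contrasting an object-per-polygon layout with a flat, ragged layout (one coordinate buffer plus one offsets buffer). It only illustrates the general idea behind columnar geometry storage; it is not SpatialPandas' actual implementation, and the helper names `square`, `nested_size`, and `ring_at` are hypothetical. The size figures are rough estimates from `sys.getsizeof` on CPython.

```python
import sys
from array import array


def square(x0, y0, size=1.0):
    """Return a closed square ring as a list of (x, y) tuples (hypothetical helper)."""
    return [(x0, y0), (x0 + size, y0), (x0 + size, y0 + size), (x0, y0 + size), (x0, y0)]


# Object-per-polygon layout: a list of rings, each a list of tuples of floats.
polygons = [square(float(i), 0.0) for i in range(1000)]

# Ragged layout: interleaved x, y values in one buffer, plus offsets marking
# where each polygon's coordinates start and end.
coords = array("d")
offsets = array("q", [0])
for ring in polygons:
    for x, y in ring:
        coords.append(x)
        coords.append(y)
    offsets.append(len(coords))


def nested_size(polys):
    """Rough size estimate; objects shared between tuples are counted each time they appear."""
    total = sys.getsizeof(polys)
    for ring in polys:
        total += sys.getsizeof(ring)
        for point in ring:
            total += sys.getsizeof(point) + sum(sys.getsizeof(v) for v in point)
    return total


def ring_at(i):
    """Recover polygon i from the flat buffers as a list of (x, y) tuples."""
    flat = coords[offsets[i]:offsets[i + 1]]
    return list(zip(flat[0::2], flat[1::2]))


nested_bytes = nested_size(polygons)
flat_bytes = sys.getsizeof(coords) + sys.getsizeof(offsets)
print(f"object-per-polygon: ~{nested_bytes} bytes, flat buffers: ~{flat_bytes} bytes")
```

The flat layout pays the per-object overhead once per buffer rather than once per vertex, and any single polygon can still be recovered with `ring_at(i)`.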
import hvplot.dask # noqa
import hvplot.pandas # noqa
import datashader as ds
import colorcet as cc
import spatialpandas as spd
import spatialpandas.io
from holoviews import opts
from holoviews.streams import PlotSize
from dask.distributed import Client
from IPython.display import display
client = Client()
client
Client: Client-73573ed9-f5f0-11ef-8a87-7c1e52098285
Connection method: Cluster object | Cluster type: distributed.LocalCluster | Dashboard: http://127.0.0.1:8787/status

LocalCluster: f393e82b
Workers: 4 | Total threads: 4 | Total memory: 15.62 GiB | Status: running | Using processes: True

Scheduler: Scheduler-ad88b300-73a6-45b7-be31-e8da80b3fa99
Comm: tcp://127.0.0.1:40369 | Dashboard: http://127.0.0.1:8787/status | Started: Just now

Worker | Comm | Dashboard | Nanny | Threads | Memory | Local directory |
---|---|---|---|---|---|---|
0 | tcp://127.0.0.1:46333 | http://127.0.0.1:34221/status | tcp://127.0.0.1:33547 | 1 | 3.90 GiB | /tmp/dask-scratch-space/worker-u7y1sv9x |
1 | tcp://127.0.0.1:40979 | http://127.0.0.1:35427/status | tcp://127.0.0.1:40965 | 1 | 3.90 GiB | /tmp/dask-scratch-space/worker-fa3zysxa |
2 | tcp://127.0.0.1:32893 | http://127.0.0.1:46347/status | tcp://127.0.0.1:43423 | 1 | 3.90 GiB | /tmp/dask-scratch-space/worker-gk8dr4be |
3 | tcp://127.0.0.1:33353 | http://127.0.0.1:40299/status | tcp://127.0.0.1:34583 | 1 | 3.90 GiB | /tmp/dask-scratch-space/worker-hf5x4fkd |
# Add more resolution to dynamic plots, particularly important for Retina displays when building the website.
# This cell is hidden on the website.
PlotSize.scale = 2.0
opts.defaults(opts.Polygons(height=500, xaxis=None, yaxis=None))
ddf = spd.io.read_parquet_dask('./data/nyc_buildings.parq').persist()
print(len(ddf))
ddf.head(3)
1157859
hilbert_distance | geometry | type | name |
---|---|---|---|
1078494 | MultiPolygon([[[-8277494.696406237, 4938583.71... | <NA> | <NA> |
1087198 | MultiPolygon([[[-8277324.377585323, 4938712.53... | <NA> | <NA> |
1189938 | MultiPolygon([[[-8276894.684350859, 4938973.11... | industrial | <NA> |
Here you can see that we have nearly 1.16 million "MultiPolygons", some of which have a type and name declared.
To get a look at this data, let’s plot all the polygons, overlaid on a tiled map of the region:
%%time
display(ddf.hvplot.polygons(tiles='CartoLight', rasterize=True, aggregator='any'))
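With `rasterize=True`, the polygons are aggregated into a fixed-size grid before anything is sent to the browser, and the `any` aggregator records for each pixel only whether some geometry falls on it. The pure-Python sketch below illustrates that reduction for points rather than polygons; it is a simplified illustration, not Datashader's implementation (which also handles polygon edges and interiors and runs in compiled code), and `rasterize_any` is a hypothetical name.

```python
def rasterize_any(points, x_range, y_range, width, height):
    """Return a height x width grid of booleans, True where any point lands in that pixel."""
    (x0, x1), (y0, y1) = x_range, y_range
    grid = [[False] * width for _ in range(height)]
    for x, y in points:
        if not (x0 <= x < x1 and y0 <= y < y1):
            continue  # skip points outside the viewport
        col = int((x - x0) / (x1 - x0) * width)
        row = int((y - y0) / (y1 - y0) * height)
        grid[row][col] = True  # "any": hitting a pixel twice changes nothing
    return grid


# Four made-up points; the last one lies outside the viewport.
points = [(0.5, 0.5), (0.6, 0.4), (3.2, 1.1), (9.0, 9.0)]
grid = rasterize_any(points, x_range=(0, 4), y_range=(0, 2), width=4, height=2)

# Print with y increasing upward, "#" for occupied pixels.
for row in reversed(grid):
    print("".join("#" if cell else "." for cell in row))
```

The size of the result depends only on `width` and `height`, not on how many shapes were aggregated, which is why the browser receives an image-sized array no matter how large the polygon collection is.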