Datashader Dashboard#
This notebook contains the code for an interactive dashboard for making Datashader plots from any dataset that has latitude and longitude (geographic) values. Apart from Datashader itself, the code relies on other Python packages from the HoloViz project that are each designed to make it simple to:
lay out plots and widgets into an app or dashboard, in a notebook or for serving separately (Panel)
build interactive web-based plots without writing JavaScript (Bokeh)
build interactive Bokeh-based plots backed by Datashader, from concise declarations (HoloViews and hvPlot)
express dependencies between parameters and code to build reactive interfaces declaratively (Param)
describe the information needed to load and plot a dataset, in a text file (Intake)
import os, colorcet, param as pm, holoviews as hv, panel as pn, datashader as ds
import intake
import xyzservices.providers as xyz
from holoviews.element import tiles as hvts
from holoviews.operation.datashader import rasterize, shade, spread
from collections import OrderedDict as odict
hv.extension('bokeh', logo=False)
/home/runner/work/examples/examples/datashader_dashboard/envs/default/lib/python3.8/site-packages/dask/dataframe/utils.py:367: FutureWarning: pandas.Int64Index is deprecated and will be removed from pandas in a future version. Use pandas.Index with the appropriate dtype instead.
_numeric_index_types = (pd.Int64Index, pd.Float64Index, pd.UInt64Index)
/home/runner/work/examples/examples/datashader_dashboard/envs/default/lib/python3.8/site-packages/dask/dataframe/utils.py:367: FutureWarning: pandas.Float64Index is deprecated and will be removed from pandas in a future version. Use pandas.Index with the appropriate dtype instead.
_numeric_index_types = (pd.Int64Index, pd.Float64Index, pd.UInt64Index)
/home/runner/work/examples/examples/datashader_dashboard/envs/default/lib/python3.8/site-packages/dask/dataframe/utils.py:367: FutureWarning: pandas.UInt64Index is deprecated and will be removed from pandas in a future version. Use pandas.Index with the appropriate dtype instead.
_numeric_index_types = (pd.Int64Index, pd.Float64Index, pd.UInt64Index)
You can run the dashboard here in the notebook with various datasets by editing the dataset
below to specify some dataset defined in dashboard.yml
. You can also launch a separate, standalone server process in a new browser tab with a command like:
DS_DATASET=nyc_taxi panel serve --show dashboard.ipynb
(Where nyc_taxi
can be replaced with any of the available datasets (nyc_taxi
, nyc_taxi_50k
(tiny version), census
, osm-1b
, or any dataset whose description you add to catalog.yml
). To launch multiple dashboards at once, you’ll need to add -p 5001
(etc.) to select a unique port number for the web page to use for communicating with the Bokeh server. Otherwise, be sure to kill the server process before launching another instance.
dataset = os.getenv("DS_DATASET", "nyc_taxi")
catalog = intake.open_catalog('catalog.yml')
source = getattr(catalog, dataset)
The Intake source
object lets us treat data in many different formats the same in the rest of the code here. We can now build a class that captures some parameters that the user can vary along with how those parameters relate to the code needed to update the displayed plot of that data source:
plots = odict([(source.metadata['plots'][p].get('label',p),p) for p in source.plots])
fields = odict([(v.get('label',k),k) for k,v in source.metadata['fields'].items()])
aggfns = odict([(f.capitalize(),getattr(ds,f)) for f in ['count','sum','min','max','mean','var','std']])
norms = odict(Histogram_Equalization='eq_hist', Linear='linear', Log='log', Cube_root='cbrt')
cmaps = odict([(n,colorcet.palette[n]) for n in ['fire', 'bgy', 'bgyw', 'bmy', 'gray', 'kbc']])
maps = ['EsriImagery', 'EsriUSATopo', 'EsriTerrain', 'EsriStreet', 'OSM']
bases = odict([(name, getattr(hvts, name)().relabel(name)) for name in maps])
gopts = hv.opts.Tiles(responsive=True, xaxis=None, yaxis=None, bgcolor='black', show_grid=False)
class Explorer(pm.Parameterized):
plot = pm.Selector(plots)
field = pm.Selector(fields)
agg_fn = pm.Selector(aggfns)
normalization = pm.Selector(norms)
cmap = pm.Selector(cmaps)
spreading = pm.Integer(0, bounds=(0, 5))
basemap = pm.Selector(bases)
data_opacity = pm.Magnitude(1.00)
map_opacity = pm.Magnitude(0.75)
show_labels = pm.Boolean(True)
@pm.depends('plot')
def elem(self):
return getattr(source.plot, self.plot)()
@pm.depends('field', 'agg_fn')
def aggregator(self):
field = None if self.field == "counts" else self.field
return self.agg_fn(field)
@pm.depends('map_opacity', 'basemap')
def tiles(self):
return self.basemap.opts(gopts).opts(alpha=self.map_opacity)
@pm.depends('show_labels')
def labels(self):
return hv.Tiles(xyz.CartoDB.PositronOnlyLabels()).opts(level='annotation', alpha=1 if self.show_labels else 0)
def viewable(self,**kwargs):
rasterized = rasterize(hv.DynamicMap(self.elem), aggregator=self.aggregator, width=800, height=400)
shaded = shade(rasterized, cmap=self.param.cmap, normalization=self.param.normalization)
spreaded = spread(shaded, px=self.param.spreading, how="add")
dataplot = spreaded.apply.opts(alpha=self.param.data_opacity, show_legend=False)
return hv.DynamicMap(self.tiles) * dataplot * hv.DynamicMap(self.labels)
explorer = Explorer(name="")
If we call the .viewable
method on the explorer
object we just created, we’ll get a plot that displays itself in a notebook cell. Moreover, because of how we declared the dependencies between each bit of code and each parameters, the corresponding part of that plot will update whenever one of the parameters is changed on it. (Try putting explorer.viewable()
in one cell, then set some parameter like explorer.spreading=4
in another cell.) But since what we want is the user to be able to manipulate the values using widgets, let’s go ahead and create a dashboard out of this object by laying out a logo, widgets for all the parameters, and the viewable object:
logo = "https://raw.githubusercontent.com/pyviz/datashader/main/doc/_static/logo_horizontal_s.png"
panel = pn.Row(pn.Column(logo, pn.Param(explorer.param, expand_button=False)), explorer.viewable())
panel.servable("Datashader Dashboard")
If you are viewing this notebook with a live Python server process running, adjusting one of the widgets above should now automatically update the plot, re-running only the code needed to update that particular item without re-running Datashader if that’s not needed. It should work the same when launched as a separate server process, but without the extra text and code visible as in this notebook. Here the .servable()
method call indicates what should be served when run as a separate dashboard with a command like panel serve --show dashboard.ipynb
, or you can just copy the code above out of this notebook into a dashboard.py
file then do panel serve --show dashboard.py
.
How it works#
You can use the code above as is, but if you want to adapt it to your own purposes, you can read on to see how it works.
Overview#
The code has three main components:
source
: A dataset with associated metadata managed by Intake, which allows this notebook to ignore details like:File formats
File locations
Column and field names in the data
Basically, once thesource
has been defined in the cell starting withdataset
, this code can treat all datasets the same, as long as their properties have been declared appropriately in thedashboard.yml
file. Intake objects support.plot
, which uses hvPlot to return a HoloViews and Bokeh-based plot object that is used in the later steps below.
explorer
: A Parameterized object that declares:What parameters we want the user to be able to manipulate
How to generate the overall plot specified by those parameters, starting from the basic hvPlot-based object and modifying it using HoloViews, GeoViews, and Datashader.
Which bits of the code need to be run when one of the parameters changes
All of these things are declared in a general way that’s not tied to any particular GUI toolkit, as long as whatever is returned byviewable()
is something that can be displayed.
panel
: A Panel-based app/dashboard consisting of:a logo (just for pretty!)
The user-adjustable parameters of the
explorer
object.The viewable HoloViews object defined by
explorer.viewable
.
You can find out more about how to work with these objects at the websites linked for each one.
If you want to start working with this code for your own purposes, parts 1 and 3 should be simple to get started with. You should be able to add new datasets easily to dashboard.yml
by copying the description of the simplest dataset (e.g. osm-1b
). If you wish, you can then compare that dataset’s description to the other datasets, to see how other fields and metadata can be added if you want there to be more options for users to explore a particular dataset. Similarly, you can easily add additional items to lay out in rows and columns in the panel
app; it should be trivial to add anything Panel supports (text boxes, images, other separate plots, etc.) to this layout as described at panel.holoviz.org.
Expressing parameters and dependencies#
Part 2 (the explorer
object) is the hard part to specify, because that’s where the complex relationships between the user-visible parameters and the underlying behavior are expressed.
Before we try to understand the full explorer
code above, let’s consider a much simpler case. What if our dataset is so small (e.g. nyc_taxi_50k
with only 50,000 points) that it would be ok to update the plot every time any widget changed? In that case we could get away with a much simpler object we’ll call explorer2
:
class Explorer2(pm.Parameterized):
plot = pm.Selector(plots)
field = pm.Selector(fields)
agg_fn = pm.Selector(aggfns)
normalization = pm.Selector(norms)
cmap = pm.Selector(cmaps)
spreading = pm.Integer(0, bounds=(0, 5))
basemap = pm.Selector(bases)
data_opacity = pm.Magnitude(1.00)
map_opacity = pm.Magnitude(0.75)
show_labels = pm.Boolean(True)
def view(self,**kwargs):
field = None if self.field == "counts" else self.field
rasterized = rasterize(hv.DynamicMap(getattr(source.plot, self.plot)),
aggregator=self.agg_fn(field), width=800, height=400)
shaded = shade(rasterized, cmap=self.cmap, normalization=self.normalization)
spreaded = spread(shaded, px=self.spreading, how="add")
dataplot = spreaded.opts(alpha=self.data_opacity, show_legend=False)
tiles = self.basemap.opts(gopts).opts(alpha=self.map_opacity)
labels = hv.Tiles(xyz.CartoDB.PositronOnlyLabels()).opts(level='annotation', alpha=1 if self.show_labels else 0)
return tiles * dataplot * labels
explorer2 = Explorer2(name="")
This Explorer2
class declares that it respects each of the listed Parameters (plot
, normalization
, spreading
, and so on), specifying the type and range for each of them (e.g. normalization
can be eq_hist
, linear
, etc. while spreading
can be any integer in the range 0 to 5.). The view
function accesses these values and constructs a plot appropriately, which in this case is a HoloViews Overlay
of three components: (1) the underlying map tiles
(like Google Maps), (2) the datashaded dataplot
(using the aggregation (rasterization), colormapping (shading) with normalization, and spreading functionality from Datashader), and (3) overlaid geographic labels
(which also happen to be a tile-based map, but with only text).
If you were to type explorer2.view()
in a cell on its own, you would see that the resulting object is viewable outside of Panel and is already controlled by all those parameters, though without any widgets for a user to manipulate graphically.
# explorer2.show_labels = False
# explorer2.view()
But since we do want widgets, we can pass in this object to Panel and we’ll get all the same widgets as the original explorer
has, each updating the plot appropriately when any of the widgets changes.
# pn.Row(pn.Column(logo, explorer2.param), explorer2.view)
So if it’s this easy to get a usable dashboard, why is the real Explorer
class so much more complex above, with all those methods and explicit dependency declarations? Well, if you do try running explorer2
, you should be able to see that it works but it is not very responsive, because it re-runs view()
every single time any widget changes value. That’s a very general approach, but even for a 50k dataset, the plot flickers any time a widget is used. For a larger dataset there can be a very annoying lag, as the entire plot is rebuilt from scratch. Slider widgets in particular can be very confusing with a lag like that, making it difficult to explore the data. So this simple version is not the most usable, but it’s still a good first pass – it makes all the right widgets and connects them all up to control the plot that you see.
Expressing fine-grained dependencies#
So what if we do anticipate working with larger files and still want the interface to be responsive wherever possible? The full Explorer
class shows how to ensure that only the specific bits of code that depend directly on each parameter are re-run when that widget is changed, making it highly reponsive even with large datasets. For instance, the map_opacity
slider affects only the underlying map tiles, and so in Explorer
that slider can be dragged with instantaneous updating regardless of the dataset size; the data plot stays the same while the tiles update. The spreading
and cmap
widgets do need to access the data, but even they can still be very fast, because they affect only the very last step in the data processing, after aggregation but before the final display.
So, how is this fine-grained control over bits of computation achieved? First, you’ll notice that Explorer
has a method named as an adjective “viewable
” rather than the imperative “view
” method of Explorer2
. For Explorer2
, we provided the view
method directly to Panel, and Panel then finds its dependencies and calls view
every time a parameter changes, generating a completely new plot. But we called viewable()
only once, with the result of the call provided to Panel. This result is a viewable (and dynamically updatable) object from HoloViews whose parts are precisely tied internally to each of the relevant parameters. (Hence the perhaps too-subtle difference in the names of those two methods; one is a command to view immediately, and the other returns a viewable object that can be kept around and viewed at will.)
To understand the explorer.viewable()
method, first consider a simpler version that doesn’t display the data at all:
def viewable(self,**kwargs):
return hv.DynamicMap(self.tiles) * hv.DynamicMap(self.labels)
Here, hv.DynamicMap(callback)
returns a dynamic HoloViews object that calls the provided callback
whenever the object needs updating. When given a Parameterized object’s method, hv.DynamicMap
understands any dependencies that have been declared for that method. In this case, the map tiles will thus be dynamically updated whenever the map_opacity
or basemap
parameters change, and the overlaid labels will be updated whenever the show_labels
parameter changes (because those are the relationships expressed on def tiles(self)
and def labels(self)
with the pm.depends
decorator in the declaration of Explorer
above). The viewable()
method here then returns an overlay (constructed by the *
syntax for HoloViews objects), retaining the underlying dynamic behavior of the two overlaid items.
Still following along? If not, try changing viewable
to the simpler version shown above and play around with the source code to see how those parts fit together. Once that all makes sense, then we could add in a plot of the actual data:
def viewable(self,**kwargs):
return hv.DynamicMap(self.tiles) * hv.DynamicMap(self.elem) * hv.DynamicMap(self.labels)
Just as before, we use a DynamicMap
to call the .elem()
method whenever one of its parameter dependencies changes (plot
in this case). Don’t actually run this version, though, unless you have a very small dataset (even the tiny nyc_taxi_50k
may be too large for some browsers). As written, this code will pass all the data on to your browser, with disastrous results for large datasets! This is where Datashader comes in; to make it safe for large data, we can instead wrap this object in some HoloViews operations that turn it into something safe to display:
def viewable(self,**kwargs):
return hv.DynamicMap(self.tiles) * spread(shade(rasterize(hv.DynamicMap(self.elem)))) * hv.DynamicMap(self.labels)
This version is now runnable, with rasterize()
dynamically aggregating the data using Datashader whenever a new plot is needed, shade()
then dynamically colormapping the data into an RGB image, and spread()
dynamically spreading isolated pixels so that they become visible data points. But if you try it, you’ll notice that the plot is ignoring all of the rasterization, shading, and spreading parameters we declared above, because those parameters are not declared as dependencies of the elem
method that was given to this DynamicMap.
We could of course add those parameters as dependencies to .elem
, but if we do that, then the whole set of chained operations will need to be re-run every time any one of those parameters changes. For a large dataset, re-running all those steps can take seconds or even minutes, yet some of the changes only affect the very last (and very cheap) stages of the computation, such as spread
or shade
.
So, we come to the final version of viewable()
that’s used in the actual Explorer
class definition above, which creates a whole slew of chained hv.DynamicMap
objects that each dynamically respond to some of the possible user actions:
hv.DynamicMap(self.elem)
returns an appropriate HoloViews element whenever theplot
parameter changesrasterized
applies the Datashader aggregation operation to the result ofhv.DynamicMap(self.elem)
whenever that result changes or when the dependencies of theself.aggregator
method change (thefield
andagg_fn
parameters)shaded
applies the Datashader shading operation to the result ofrasterized
whenever that result changes or thecmap
andnormalization
parameters change.dataplot
sets options on the result ofshaded
whenever that result changes or thedata_opacity
andshow_legend
parameters change.
So far we have only discussed how pm.depends()
allows a Parameterized method to declare its dependencies, but there are actually currently four different ways to set up dynamic, responsive behavior, of which Explorer.viewable()
uses methods 2, 3, and 4:
Method dependency for Panel: Decorating a Parameterized object method with
pm.depends('paramname')
, and passing that method to Panel so that Panel will call the method when any of the dependencies changes (used forExplorer2.view
, but not forExplorer.viewable
). Completely general, but very coarse-grained; useful for a first pass and for simple cases.Method dependency for DynamicMap: Decorating a Parameterized object method with
pm.depends('paramname')
, and passing that method to a HoloViews DynamicMap so that HoloViews will call the method when any of the dependencies change (used for most of the methods ofExplorer
).Parameter instance-object argument: Supplying a Param
Parameter
object as an argument to a HoloViews operation or DynamicMap instead of a concrete value like an integer or float, which will cause that operation to re-run its computation when the parameter value changes (used for thedataplot
object inExplorer
, which responds dynamically tocmap
,normalization
,spreading
, anddata_opacity
because those parameters are supplied likepx=self.param.spreading
(theparam.Integer
Parameter object) rather thanpx=self.spreading
(which is simply equivalent topx=0
if the current value ofself.spreading
is 0)). Dependencies are inferred only when the whole Parameter object is supplied, not just the current value.Other HoloViews streams: Approaches 2 and 3 are based on a feature of HoloViews called streams, which support many types of dynamic behavior other than responding to widgets. For instance, the
rasterize
operation attaches aRangeXY
stream that re-aggregates the data whenever the viewport (x or y range) changes, as a user zooms or pans a Bokeh plot. Other streams can be created manually to perform custom behavior, such as consuming streaming data sources, reacting to arbitrary plot events, and so on.
These sources of dynamic behavior make the dataplot
chain of DynamicMaps be richly interactive. The interactive dataplot
is then overlaid with hv.DynamicMap(self.tiles)
(which itself is updated when the map_opacity
and basemap
parameters change), and with hv.DynamicMap(self.labels)
(which itself is updated when the show_labels
parameter changes). Now, each part of the plot updates only if the relevant parameters have changed, and otherwise is left as it was, avoiding flicker and providing snappy performance.
As you can see, if we want to be very careful to tie each bit of computation to the values that affect it, then we can precisely determine which code to re-run interactively, providing maximal responsiveness where possible (try dragging the opacity sliders or selecting colormaps), while re-running everything when needed (when aggregation-related parameters change). In your own Panel dashboards and apps, you can often use the simplest approach (which can be as simple as a one-line call to interact), but it is important to know that fine-grained control is available when you need it, and is still typically vastly more concise and explicitly expressed than with other dashboarding approaches.