UK Researchers

UK research networks with HoloViews+Bokeh+Datashader

Datashader makes it possible to plot very large datasets in a web browser, Bokeh makes those plots interactive, and HoloViews provides a convenient interface for building them. Here we use all three libraries to visualize an example dataset of roughly 600,000 collaborations between about 15,000 UK research institutions, previously laid out using a force-directed algorithm by Ian Calvert.

First, we’ll import the packages we are using and set up some defaults.

import pandas as pd
import holoviews as hv
from holoviews import opts

from colorcet import fire
from datashader.bundling import directly_connect_edges, hammer_bundle

from holoviews.operation.datashader import datashade, dynspread
from holoviews.operation import decimate

from dask.distributed import Client
client = Client()

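# Load the Bokeh and Matplotlib plotting backends
hv.notebook_extension('bokeh','matplotlib')

# Defaults for the HoloViews operations used below
decimate.max_samples=20000   # maximum number of points kept by decimate
dynspread.threshold=0.01
datashade.cmap=fire[40:]     # skip the first 40 (darkest) colors of the fire colormap
sz = dict(width=150,height=150)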

opts.defaults(
    opts.RGB(width=400, height=400, xaxis=None, yaxis=None, show_grid=False, bgcolor="black"))

The files are stored in the efficient Parquet format:

r_nodes_df = pd.read_parquet("./data/graph/calvert_uk_research2017_nodes.snappy.parq")
r_edges_df = pd.read_parquet("./data/graph/calvert_uk_research2017_edges.snappy.parq")
r_nodes_df = r_nodes_df.set_index("id")
r_edges_df = r_edges_df.set_index("id")
r_nodes = hv.Points(r_nodes_df, label="Nodes")
r_edges = hv.Curve(r_edges_df, label="Edges")
len(r_nodes), len(r_edges)
(15001, 593915)

We can render each collaboration as a single-line direct connection, but the result is a dense tangle:

%%time
r_direct = hv.Curve(directly_connect_edges(r_nodes.data, r_edges.data),label="Direct")
CPU times: user 3.5 s, sys: 328 ms, total: 3.83 s
Wall time: 3.84 s
dynspread(datashade(r_nodes,cmap=["cyan"])) + datashade(r_direct)
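
To make the direct-connection step more concrete, here is a minimal sketch of the idea in plain Python. It assumes the result is one long path in which every edge contributes its source position, its target position, and a NaN row that breaks the line. The function name `connect_edges_directly` and the dict/list inputs are hypothetical and chosen only for illustration; Datashader's `directly_connect_edges` operates on DataFrames, and its output may differ in detail.

```python
import math

def connect_edges_directly(nodes, edges):
    """Return a flat list of (x, y) rows: source, target, NaN separator per edge.

    `nodes` maps a node id to its (x, y) position and `edges` is a list of
    (source_id, target_id) pairs. Both layouts are assumptions of this sketch.
    """
    nan_row = (math.nan, math.nan)
    path = []
    for source, target in edges:
        path.append(nodes[source])   # start of the segment
        path.append(nodes[target])   # end of the segment
        path.append(nan_row)         # gap, so consecutive edges are not joined
    return path

# Three nodes and two edges, purely illustrative
nodes = {0: (0.0, 0.0), 1: (1.0, 0.0), 2: (0.5, 1.0)}
edges = [(0, 1), (1, 2)]
segments = connect_edges_directly(nodes, edges)
```

Under this layout, n edges become 3n rows, so a single line-drawing element can render every edge in one pass while the NaN rows keep unrelated edges from being connected to each other.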