OpenSky flight trajectories

hvplot holoviews bokeh matplotlib datashader
Published: November 3, 2017 · Modified: November 21, 2024


Flight path information for commercial flights is available for some regions of the USA and Europe from the crowd-sourced OpenSky Network. OpenSky collects data from a large number of users monitoring public air-traffic control information. Here we will use a subset of the data that was polled from their REST API at 1-minute intervals over 4 days (September 5-13, 2016), using the collect_data.py and prepare_data.py scripts. In general, the terms of use for OpenSky data do not allow redistribution, but we have obtained specific permission to distribute the subset used in this project, which is a 200 MB Parquet file (1.1 GB as the original database). If you want more or different data, you can run the scripts yourself or contact OpenSky to ask for a copy of the dataset.

We’ll only use some of the fields provided by OpenSky, which are: icao24, callsign, origin, time_position, time_velocity, longitude, latitude, altitude, on_ground, velocity, heading, vertical_rate, sensors, and timestamp.

Here, we’ll load the data and declare that some fields are categorical (which isn’t information fully expressed in the Parquet file):

%%time
import pandas as pd

flightpaths = pd.read_parquet('./data/opensky.parq')
flightpaths['origin']    = flightpaths.origin.astype('category')
flightpaths['ascending'] = flightpaths.ascending.astype('category')
flightpaths.tail()
CPU times: user 355 ms, sys: 139 ms, total: 494 ms
Wall time: 494 ms
              longitude      latitude  origin  ascending  velocity
10227905  -8.845280e+06  4.553381e+06               True    262.14
10227906  -8.862735e+06  4.540946e+06              False    183.28
10227907  -8.876594e+06  4.530873e+06              False    258.15
10227908  -8.894316e+06  4.521176e+06               True    234.24
10227909            NaN           NaN              False      0.00
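
To make the categorical declaration above more concrete, here is a minimal pure-Python sketch of the idea behind a categorical column: each distinct value is stored once, and every row holds only a small integer code. The helper name `encode_categorical` is hypothetical, and this is not pandas' implementation (pandas, for example, normally sorts its categories). The sample values are the `ascending` flags shown in the output above.

```python
def encode_categorical(values):
    """Return (categories, codes): distinct values stored once, rows as integer codes."""
    categories = []   # each distinct value, in order of first appearance
    index = {}        # value -> position in `categories`
    codes = []        # one small integer per row
    for value in values:
        if value not in index:
            index[value] = len(categories)
            categories.append(value)
        codes.append(index[value])
    return categories, codes


# The `ascending` flags from the rows printed above.
ascending = [True, False, False, True, False]
categories, codes = encode_categorical(ascending)

# Decoding the codes recovers the original column.
decoded = [categories[code] for code in codes]
```

With only a few distinct values across roughly 10 million rows, storing one code per row plus a short lookup table is much more compact than repeating the values themselves.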

The default database has about 10 million points, with some metadata for each.
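
The longitude and latitude values shown above are in the millions rather than in degrees, which suggests they have already been projected to Web Mercator metres (an assumption; the text does not say so explicitly). Under that assumption, the following sketch converts between degrees and Web Mercator using the standard spherical formulas. The function names are hypothetical and not part of any library used here.

```python
import math

EARTH_RADIUS_M = 6378137.0  # sphere radius conventionally used by Web Mercator


def lonlat_to_mercator(lon_deg, lat_deg):
    """Project longitude/latitude in degrees to Web Mercator metres."""
    x = EARTH_RADIUS_M * math.radians(lon_deg)
    y = EARTH_RADIUS_M * math.log(math.tan(math.pi / 4 + math.radians(lat_deg) / 2))
    return x, y


def mercator_to_lonlat(x, y):
    """Invert the projection: Web Mercator metres back to degrees."""
    lon = math.degrees(x / EARTH_RADIUS_M)
    lat = math.degrees(2 * math.atan(math.exp(y / EARTH_RADIUS_M)) - math.pi / 2)
    return lon, lat


# First row of the output above, assumed to be Web Mercator metres.
lon, lat = mercator_to_lonlat(-8.845280e+06, 4.553381e+06)

# Projecting back should reproduce the original values.
x, y = lonlat_to_mercator(lon, lat)
```

Applied to the first row printed above, this gives a position near 79°W, 38°N, which lies in the eastern USA and is consistent with the regions the dataset is said to cover.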

Now let’s define a datashader-based processing pipeline to render images:

import datashader as ds
from colorcet import fire

plot_width  = 850
plot_height = 600
x_range = (-2.0e6, 2.5e6)
y_range = (4.1e6, 7.8e6)

We can use a datashader Canvas with these settings to get a dump of all of the trajectory information:

cvs = ds.Canvas(plot_width, plot_height, x_range, y_range)
%%time
agg = cvs.line(flightpaths, 'longitude', 'latitude', ds.count())
CPU times: user 2.15 s, sys: 55.2 ms, total: 2.21 s
Wall time: 2.21 s
ds.tf.set_background(ds.tf.shade(agg, cmap=fire), 'black')

This plot shows all of the trajectories in this database, overlaid in a way that avoids overplotting. With this “fire” color map, a single trajectory shows up as black, while increasing levels of overlap show up as brighter colors.
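
The counting idea behind this can be shown with a toy, pure-Python sketch: each trajectory is rasterized onto an integer pixel grid, and every pixel records how many trajectories pass through it. This is only an illustration under simplifying assumptions (integer pixel coordinates, Bresenham line drawing, each trajectory counted at most once per pixel). It is not how datashader is implemented, and the names `bresenham` and `count_paths` are hypothetical.

```python
from collections import Counter


def bresenham(x0, y0, x1, y1):
    """Yield the integer pixels on the segment from (x0, y0) to (x1, y1)."""
    dx, dy = abs(x1 - x0), -abs(y1 - y0)
    sx = 1 if x0 < x1 else -1
    sy = 1 if y0 < y1 else -1
    err = dx + dy
    while True:
        yield x0, y0
        if x0 == x1 and y0 == y1:
            return
        e2 = 2 * err
        if e2 >= dy:
            err += dy
            x0 += sx
        if e2 <= dx:
            err += dx
            y0 += sy


def count_paths(paths):
    """Count, for every pixel, how many paths pass through it."""
    counts = Counter()
    for path in paths:
        touched = set()  # pixels hit by this path, so it is counted once per pixel
        for (x0, y0), (x1, y1) in zip(path, path[1:]):
            touched.update(bresenham(x0, y0, x1, y1))
        counts.update(touched)
    return counts


# Three made-up paths on a 5x5 grid that all cross at pixel (2, 2).
paths = [
    [(0, 0), (4, 4)],
    [(0, 4), (4, 0)],
    [(0, 2), (4, 2)],
]
counts = count_paths(paths)
```

Pixels crossed by a single path end up with a count of 1, while the pixel where all three paths cross gets a count of 3. A color map is then applied to these counts, so overlap shows up as a different color instead of being hidden by overplotting.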

A static image like this is difficult to interpret on its own, but if we overlay it on a map we can see where these flights originate, and can zoom in to see detail in specific regions:

import hvplot.pandas # noqa
from holoviews import opts

opts.defaults(
    opts.Path(width=plot_width, height=plot_height, xaxis=None, yaxis=None,
              xlim=x_range, ylim=y_range))
flightpaths.hvplot.paths(
    'longitude', 'latitude', tiles='EsriStreet',
    aggregator=ds.count(), datashade=True,
)