IEX Trading
IEX, the Investors Exchange, is a transparent stock exchange that discourages high-frequency trading and makes historical trading data publicly available. The data is offered in the form of daily pcap files where each single packet corresponds to a stock trade.
Even in this specialized pcap format, the records for a single day can exceed a gigabyte in size. In this notebook, we will develop a dashboard that lets us explore every trade that happened in a day, including the associated metadata. To visualize all this data at once, both rapidly and interactively, we will use datashader via the HoloViews API.
Loading the data
The IEX stock data is saved in two pcap-based formats called TOPS and DEEP. These formats are complex enough that parsing the trades with standard packet loading tools is non-trivial. For this reason, the trades for Monday, 21 October 2019 are supplied as a CSV file that has been generated from the original pcap file using the IEXTools library.
import warnings
warnings.simplefilter('ignore')
import datetime
import pandas as pd
df = pd.read_csv('./data/IEX_2019-10-21.csv')
print('Dataframe loaded containing %d events' % len(df))
Dataframe loaded containing 1222412 events
We can now look at the head of this DataFrame to see its structure:
df.head()
| | symbol | size | price | timestamp |
|---|---|---|---|---|
| 0 | ZVZZT | 50 | 10.015 | 1571659221573444414 |
| 1 | RSX | 300 | 23.210 | 1571659313752906463 |
| 2 | BABA | 100 | 171.400 | 1571659356868902969 |
| 3 | BABA | 3 | 171.400 | 1571659357585239782 |
| 4 | KMT | 83 | 25.000 | 1571659403813905391 |
Each row above corresponds to a stock trade, where price indicates the stock price, size indicates the size of the trade, and symbol specifies which stock was traded. Every trade also has a timestamp specified in nanoseconds.
Note that multiple trades can occur on the same timestamp.
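As a rough illustration of that last point, the sketch below counts how many trades share each timestamp. It uses a hypothetical three-row table rather than the real file: the symbols, sizes and prices follow the table above, but the repeated timestamp is made up for the example.

```python
import pandas as pd

# Hypothetical miniature of the trades table. The last two rows are given
# the same timestamp on purpose; this is illustrative data, not real trades.
trades = pd.DataFrame({
    'symbol': ['ZVZZT', 'BABA', 'BABA'],
    'size': [50, 100, 3],
    'price': [10.015, 171.4, 171.4],
    'timestamp': [1571659221573444414,
                  1571659356868902969,
                  1571659356868902969],
})

# Number of trades recorded at each distinct nanosecond timestamp
trades_per_timestamp = trades.groupby('timestamp').size()

# Keep only the timestamps shared by more than one trade
shared = trades_per_timestamp[trades_per_timestamp > 1]
n_shared = len(shared)
```

Running the same `groupby` on the full `df` should show how common shared timestamps are in the actual data.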
Visualizing trades with Spikes
We can now load HoloViews with the Bokeh plotting extension to start visualizing some of this data:
import holoviews as hv
from bokeh.models import HoverTool
from holoviews.operation.datashader import spikes_aggregate
hv.config.image_rtol = 10e-3 # Fixes datetime issue at high zoom level
hv.extension('bokeh')
One way to visualize events that occur over time is to use the Spikes element. Here we look at the first hundred spikes in this dataframe:
hv.Spikes(df.head(100), ['timestamp'], ['symbol', 'size', 'price']).opts(
    xrotation=90, tools=['hover'], spike_length=1, position=0)
As in the DataFrame table shown above, the timestamps are expressed as integers counting the nanoseconds since the Unix epoch (UTC). While many domains may use integers as their time axis (e.g. CPU cycles for processor events), in this case we would like to recover the timestamp as a date.
We will do this in two steps: (1) we map the integers to datetime64[ns] to get datetime objects, and (2) we subtract 4 hours to go from UTC to the local time at the exchange (located in New Jersey):
df.timestamp = df.timestamp.astype('datetime64[ns]')
df.timestamp -= datetime.timedelta(hours=4)
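The fixed four-hour offset above is correct for this date because New Jersey was on daylight saving time in October 2019. For other dates, a timezone-aware conversion may be safer. The following is a minimal sketch of that alternative, not what this notebook does; it assumes the IANA zone `America/New_York` for the exchange's local time and uses the first timestamp from the table as sample input.

```python
import pandas as pd

# First timestamp from the table above, in nanoseconds since the Unix epoch
raw = pd.Series([1571659221573444414])

# Fixed-offset approach, equivalent to the two steps used in this notebook
fixed = pd.to_datetime(raw, unit='ns') - pd.Timedelta(hours=4)

# Timezone-aware alternative: interpret the integers as UTC, convert to the
# exchange's zone (assumed to be America/New_York), then drop the tz info
# so the result is a plain datetime64[ns] column again
utc = pd.to_datetime(raw, unit='ns', utc=True)
local = utc.dt.tz_convert('America/New_York').dt.tz_localize(None)
```

For this trading day both approaches give the same local time; they would differ by an hour on dates outside daylight saving time.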
Here every line corresponds to a trade where the position along the
x-axis indicates the time at which that trade occurred (the timestamp
in nanoseconds). If you hover over the spikes above, you can view all
the timestamp values for the trades underneath the cursor as well as
their corresponding stock symbols.
Using Bokeh we can only visualize a small number of trades effectively, but using datashader we can visualize all 1.2 million trades available:
spikes = hv.Spikes(df, ['timestamp'], ['symbol', 'size', 'price'])
rasterized = spikes_aggregate(
    spikes, aggregator='count', spike_length=1).opts(
    width=600, colorbar=True, cmap='blues',
    yaxis=None, xrotation=90,
    default_tools=['xwheel_zoom', 'xpan', 'xbox_zoom'])
rasterized
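To build some intuition for what a count aggregation produces, the sketch below bins trade times into one bucket per horizontal pixel using NumPy. This is only a conceptual analogy under stated assumptions, not datashader's actual implementation: the trade times are randomly generated, and the 600 bins simply echo the plot width used above.

```python
import numpy as np

# Hypothetical trade times, in seconds since the start of a 6.5-hour session
session_seconds = 23400
rng = np.random.default_rng(0)
trade_times = rng.uniform(0, session_seconds, size=10_000)

# One bin per horizontal pixel (600 is an assumption echoing width=600)
n_pixels = 600
counts, edges = np.histogram(trade_times, bins=n_pixels,
                             range=(0, session_seconds))

# counts[i] is the number of trades falling in pixel column i; a colormap
# such as 'blues' would then map larger counts to darker colors
```

Because each pixel column stores a single count, the size of the rendered image depends on the number of pixels rather than on the number of trades.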