Network Packets#
Graphing network packets#
This notebook currently relies on HoloViews 1.9 or above. Run conda install holoviews to install it.
Preparing data#
The data comes from a publicly available network forensics repository: http://www.netresec.com/?page=PcapFiles. The selected file is https://download.netresec.com/pcap/maccdc-2012/maccdc2012_00000.pcap.gz. After decompressing the download, dump its TCP packets to a text file with tcpdump:
tcpdump -qns 0 -r maccdc2012_00000.pcap | grep tcp > maccdc2012_00000.txt
Here is a snapshot of the resulting output:
09:30:07.780000 IP 192.168.202.68.8080 > 192.168.24.100.1038: tcp 1380
09:30:07.780000 IP 192.168.24.100.1038 > 192.168.202.68.8080: tcp 0
09:30:07.780000 IP 192.168.202.68.8080 > 192.168.24.100.1038: tcp 1380
09:30:07.780000 IP 192.168.202.68.8080 > 192.168.24.100.1038: tcp 1380
09:30:07.780000 IP 192.168.27.100.37877 > 192.168.204.45.41936: tcp 0
09:30:07.780000 IP 192.168.24.100.1038 > 192.168.202.68.8080: tcp 0
09:30:07.780000 IP 192.168.202.68.8080 > 192.168.24.100.1038: tcp 1380
09:30:07.780000 IP 192.168.202.68.8080 > 192.168.24.100.1038: tcp 1380
09:30:07.780000 IP 192.168.202.68.8080 > 192.168.24.100.1038: tcp 1380
09:30:07.780000 IP 192.168.202.68.8080 > 192.168.24.100.1038: tcp 1380
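Each line of this output has the same shape: a timestamp, the source address and port, the destination address and port, and a byte count. As a minimal sketch of how one such line could be parsed in Python, consider the following. The names LINE_RE and parse_line are hypothetical and are not taken from pcap_to_parquet.py, and the pattern assumes IPv4 addresses only.

```python
import re

# Matches lines such as:
#   09:30:07.780000 IP 192.168.202.68.8080 > 192.168.24.100.1038: tcp 1380
# Assumption: IPv4 addresses, each followed by ".<port>".
LINE_RE = re.compile(
    r"^(?P<time>\S+) IP "
    r"(?P<src>\d+\.\d+\.\d+\.\d+)\.(?P<sport>\d+) > "
    r"(?P<dst>\d+\.\d+\.\d+\.\d+)\.(?P<dport>\d+): "
    r"tcp (?P<nbytes>\d+)$"
)


def parse_line(line):
    """Return (source IP, destination IP, byte count), or None if no match."""
    match = LINE_RE.match(line.strip())
    if match is None:
        return None
    # Ports are matched but dropped here, since the graph ignores them.
    return match["src"], match["dst"], int(match["nbytes"])


sample = "09:30:07.780000 IP 192.168.202.68.8080 > 192.168.24.100.1038: tcp 1380"
parsed = parse_line(sample)
print(parsed)
```

Lines that do not fit the pattern return None, so a caller can skip them rather than fail.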
Because network traffic is directional and each node uses many ports, we simplify the graph by treating traffic between nodes as undirected and ignoring the distinction between ports. Each edge is weighted by the total number of bytes exchanged between its two nodes in either direction. The pcap_to_parquet.py script performs this conversion:
python pcap_to_parquet.py maccdc2012_00000.txt
The resulting output will be two Parquet dataframes, maccdc2012_nodes.parq and maccdc2012_edges.parq.
NOTE: For your convenience this last step is captured in the anaconda-project run prepare_data command.
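To illustrate the simplification described earlier (undirected edges, ports ignored, weights summed in both directions), here is a rough sketch of the aggregation step. It is not the code in pcap_to_parquet.py. The function name aggregate_edges and the (source, destination, bytes) tuple format are assumptions made for this example.

```python
from collections import Counter


def aggregate_edges(packets):
    """Sum bytes per unordered node pair.

    `packets` is an iterable of (source_ip, destination_ip, nbytes) tuples
    (a hypothetical format; ports are assumed to be stripped already).
    """
    weights = Counter()
    for src, dst, nbytes in packets:
        # Sorting the pair makes A->B and B->A share one undirected edge key.
        edge = tuple(sorted((src, dst)))
        weights[edge] += nbytes
    return weights


# A few packets modeled on the tcpdump snapshot shown earlier.
packets = [
    ("192.168.202.68", "192.168.24.100", 1380),
    ("192.168.24.100", "192.168.202.68", 0),
    ("192.168.202.68", "192.168.24.100", 1380),
    ("192.168.27.100", "192.168.204.45", 0),
]
edges = aggregate_edges(packets)
for (node_a, node_b), total in edges.items():
    print(node_a, node_b, total)
```

Sorting the address strings is only a way to get one canonical key per pair; any consistent ordering would work equally well.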
Loading data#
import holoviews as hv
from holoviews import opts, dim
import networkx as nx
import dask.dataframe as dd
from holoviews.operation.datashader import (
    datashade, dynspread, directly_connect_edges, bundle_graph, stack
)
from holoviews.element.graphs import layout_nodes
from datashader.layout import random_layout
from colorcet import fire
hv.extension('bokeh')
keywords = dict(bgcolor='black', width=800, height=800, xaxis=None, yaxis=None)
opts.defaults(opts.Graph(**keywords), opts.Nodes(**keywords), opts.RGB(**keywords))