Seattle Lidar

Visualize Lidar Scattered Point Elevation Data

This notebook uses Datashader to visualize Lidar elevation data from the Puget Sound Lidar Consortium, a source of Lidar data for the Puget Sound region of the state of Washington, U.S.A.

Lidar Elevation Data

Example X,Y,Z scattered-point elevation data from the 7zip archives (which unpack to .gnd files):

! head ../data/q47122d2101.gnd
X,Y,Z
1291149.60,181033.64,467.95
1291113.29,181032.53,460.24
1291065.38,181035.74,451.41
1291113.16,181037.32,455.51
1291116.68,181037.42,456.20
1291162.42,181038.90,467.81
1291111.90,181038.15,454.89
1291066.62,181036.73,451.41
1291019.10,181035.20,451.64
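
Each .gnd file is ordinary comma-separated text with an X,Y,Z header, so a single tile can be inspected directly with pandas (a minimal sketch for quick exploration; the example itself reads all the files through intake below):

import pandas as pd

# Load one unpacked tile as a plain CSV file (same file as shown above)
tile = pd.read_csv('../data/q47122d2101.gnd')
tile.describe()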

The Seattle-area example below loads 25 .gnd elevation files like the one above. We'll download, cache, and read the data using intake.

In [1]:
import intake

cat = intake.open_catalog('./catalog.yml')
list(cat)
Out[1]:
['seattle_lidar']
In [2]:
lidar = cat.seattle_lidar()
df = lidar.to_dask()
df.head()
Out[2]:
     X           Y          Z
0    1293274.13  181399.80  435.79
1    1293278.38  181393.05  434.58
2    1293272.96  181401.08  435.86
3    1293271.02  181401.13  435.73
4    1293270.85  181407.80  436.15

Geographic Metadata

Since the data are geographic, we need to know the coordinate reference system (CRS) of the X and Y values. All of the geographic metadata is stored in the data source entry in the intake catalog.

In [3]:
from pyproj import Proj, transform
In [4]:
lidar.metadata['crs']
Out[4]:
'State Plane Coordinate System Washington North FIPS 4601'
In [5]:
washington_state_plane = Proj(init='epsg:2855')   # NAD83(HARN) / Washington North in meters (FIPS 4601)
web_mercator = Proj(init='epsg:3857')             # Web Mercator, used by web map tile services
In [6]:
FT_2_M = 0.3048  # conversion factor from feet to meters

def convert_coords(df):
    # The State Plane coordinates are in feet, so convert to meters
    # before reprojecting to Web Mercator
    x, y = transform(washington_state_plane, web_mercator, df.X.values * FT_2_M, df.Y.values * FT_2_M)
    df['meterswest'], df['metersnorth'] = x, y
    return df[['meterswest', 'metersnorth', 'Z']]
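
Note that the Proj(init=...) syntax and the module-level transform function used above are deprecated in recent pyproj releases. On pyproj 2.2 or later, the same conversion can be written with a Transformer (a sketch under that assumption; convert_coords_modern is a hypothetical name):

from pyproj import Transformer

# Build the transformer once; always_xy=True enforces (x, y) axis order
state_plane_to_mercator = Transformer.from_crs('epsg:2855', 'epsg:3857', always_xy=True)

def convert_coords_modern(df):
    x, y = state_plane_to_mercator.transform(df.X.values * FT_2_M, df.Y.values * FT_2_M)
    df['meterswest'], df['metersnorth'] = x, y
    return df[['meterswest', 'metersnorth', 'Z']]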

Try out the convert_coords function on a subset of the data

In [7]:
convert_coords(df.head())
Out[7]:
   meterswest     metersnorth   Z
0  -1.360741e+07  6.022196e+06  435.79
1  -1.360741e+07  6.022193e+06  434.58
2  -1.360741e+07  6.022197e+06  435.86
3  -1.360741e+07  6.022197e+06  435.73
4  -1.360741e+07  6.022200e+06  436.15

Convert the coordinates

Since our real dataset is large and partitioned using dask, we need to think about how to apply the convert_coords function to our data.

In [8]:
import dask.distributed
import dask.delayed
In [9]:
dask.distributed.Client()
Out[9]:

Client  (cluster: 4 workers, 8 cores, 17.18 GB memory)
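
In practice it's worth keeping a reference to the client, which also exposes the address of the diagnostic dashboard for monitoring the computations below (a minor variant of the cell above, not part of the run shown here):

client = dask.distributed.Client()  # keep a handle on the cluster
print(client.dashboard_link)        # URL of the Dask diagnostics dashboard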

Explore the task graph to figure out a performant way to split up the coordinate conversion. First we'll try using dask.delayed.

In [10]:
dask.delayed(convert_coords)(df).visualize()
Out[10]:
[task graph visualization]

We can see that dask.delayed doesn't help here: all of the loading tasks would have to complete first, and then convert_coords would run over the whole dataset in one go. A better approach is to use map_partitions, which applies the conversion to each partition of the dask dataframe independently.

In [11]:
df_merc = df.map_partitions(convert_coords)
df_merc.visualize()
Out[11]:
[task graph visualization: one convert_coords task per partition]
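
df_merc is still lazy at this point: map_partitions only builds the task graph shown above, and nothing is loaded or converted until we ask for results, e.g. with persist (a hypothetical next step, not part of the run shown here):

# Execute the load + conversion on the cluster and keep the result
# in distributed memory for the later visualization steps
df_merc = df_merc.persist()
df_merc.head()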