Walker Lake#
The Disappearing Walker Lake#
While the losses of the Aral Sea in Kazakhstan and Lake Urmia in Iran have received a lot of attention over the last few decades, this trend is a global phenomenon. Recently a number of papers have been published, including one focusing on the Decline of the world's saline lakes. Many of these lakes have lost the majority of their volume over the last century, including Walker Lake (Nevada, USA), which has lost 90 percent of its volume over the last 100 years.
The following example is intended to replicate the typical processing required in change detection studies similar to the Decline of the world’s saline lakes.
from pathlib import Path
import geoviews as gv
import holoviews as hv
import numpy as np
import rioxarray
import xarray as xr
import cartopy.crs as ccrs
from colorcet import coolwarm
from holoviews import opts
from holoviews.operation.datashader import rasterize
from IPython.display import display
hv.extension('bokeh')
In this example, we would like to use Dask to demonstrate how image processing can be distributed across workers, either running locally or across a cluster. In the next cell, we instantiate a Dask distributed Client where we request eight single-threaded workers and declare a memory limit of 8GB per worker. You can experiment with different memory limits (e.g. 4GB) and different numbers of workers, but note that each worker should only use one thread, as Datashader manages its own parallelization using Numba:
# arbitrarily choose a memory limit (8GB) to demonstrate the out of core
# processing infrastructure
from dask.distributed import Client
client = Client(memory_limit=8*1e9, n_workers=8, threads_per_worker=1)
# As Datashader uses parallel Numba for raster rendering, we need to use
# single threaded Dask workers on each CPU to avoid contention.
client
(Client repr: LocalCluster with 8 single-threaded workers, 7.45 GiB memory each (59.60 GiB total), dashboard at http://127.0.0.1:8787/status)
Landsat Image Data#
To replicate this study, we first have to obtain the data from primary sources. The conventional way to obtain Landsat image data is to download it through USGS's EarthExplorer or NASA's Giovanni, but to facilitate this example two images have been downloaded from EarthExplorer and cached.
The two images used by the original study are LT05_L1TP_042033_19881022_20161001_01_T1 and LC08_L1TP_042033_20171022_20171107_01_T1 from 1988/10/22 and 2017/10/22 respectively. These images contain Landsat Surface Reflectance Level-2 Science Product images.
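The product identifiers above encode the acquisition metadata directly. As a sketch (assuming the standard Landsat Collection 1 product ID convention, `LXSS_LLLL_PPPRRR_YYYYMMDD_yyyymmdd_CC_TX`), the fields can be pulled apart with plain string handling; the helper name `parse_product_id` is ours, not part of any library:

```python
# Sketch: split a Landsat Collection 1 product ID into its fields, assuming
# the LXSS_LLLL_PPPRRR_YYYYMMDD_yyyymmdd_CC_TX naming convention
# (sensor/satellite, correction level, WRS path/row, acquisition date,
# processing date, collection number, collection tier).
def parse_product_id(product_id):
    sensor_sat, level, pathrow, acquired, processed, collection, tier = product_id.split('_')
    return {
        'sensor': sensor_sat[1],           # e.g. 'T' = TM, 'C' = OLI/TIRS
        'satellite': int(sensor_sat[2:]),  # e.g. 5 or 8
        'level': level,
        'wrs_path': int(pathrow[:3]),
        'wrs_row': int(pathrow[3:]),
        'acquired': acquired,
        'processed': processed,
        'collection': collection,
        'tier': tier,
    }

parse_product_id('LT05_L1TP_042033_19881022_20161001_01_T1')
```

Both scenes share WRS-2 path 42, row 33, which is why they cover the same region around Walker Lake.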
Loading into xarray#
In the next cells, we load the Landsat-5 and Landsat-8 files into xarray DataArray objects, reading them locally using rioxarray.
def read_landsat_files(pattern):
data_dir = Path('data')
data = {
int(file.stem[-1]): rioxarray.open_rasterio(file, chunks={"x": 1200, "y": 1200}, masked=True)
for file in sorted(data_dir.glob(pattern))
}
dataset = xr.concat(data.values(), dim="band")
dataset = dataset.assign_coords({"band": list(data)})
return dataset
landsat_5_img = read_landsat_files('LT05*')
landsat_5_img
<xarray.DataArray (band: 2, y: 7241, x: 7961)> Size: 461MB
dask.array<concatenate, shape=(2, 7241, 7961), dtype=float32, chunksize=(1, 1200, 1200), chunktype=numpy.ndarray>
Coordinates:
  * x            (x) float64 64kB 2.424e+05 2.424e+05 ... 4.812e+05 4.812e+05
  * y            (y) float64 58kB 4.414e+06 4.414e+06 ... 4.197e+06 4.197e+06
    spatial_ref  int64 8B 0
  * band         (band) int64 16B 4 5
Attributes:
    Band_1:         band 4 surface reflectance
    AREA_OR_POINT:  Area
    scale_factor:   1.0
    add_offset:     0.0
    long_name:      band 4 surface reflectance
landsat_8_img = read_landsat_files('LC08*')
landsat_8_img
<xarray.DataArray (band: 2, y: 7941, x: 7821)> Size: 497MB
dask.array<concatenate, shape=(2, 7941, 7821), dtype=float32, chunksize=(1, 1200, 1200), chunktype=numpy.ndarray>
Coordinates:
  * x            (x) float64 63kB 2.433e+05 2.433e+05 ... 4.779e+05 4.779e+05
  * y            (y) float64 64kB 4.426e+06 4.426e+06 ... 4.188e+06 4.188e+06
    spatial_ref  int64 8B 0
  * band         (band) int64 16B 4 5
Attributes:
    Band_1:         band 4 surface reflectance
    AREA_OR_POINT:  Area
    scale_factor:   1.0
    add_offset:     0.0
    long_name:      band 4 surface reflectance
We create a cartopy coordinate reference system (EPSG:32611, i.e. WGS 84 / UTM zone 11N) that we will use later in this notebook:
assert landsat_5_img.rio.crs == landsat_8_img.rio.crs
print(landsat_5_img.rio.crs)
crs = ccrs.epsg(landsat_5_img.rio.crs.to_epsg())
EPSG:32611
Computing the NDVI (1988)#
Now let us compute the NDVI for the 1988 image.
ndvi5 = (landsat_5_img.sel(band=5) - landsat_5_img.sel(band=4))/(landsat_5_img.sel(band=5) + landsat_5_img.sel(band=4))
ndvi5 = client.persist(ndvi5)
ndvi5
<xarray.DataArray (y: 7241, x: 7961)> Size: 231MB
dask.array<truediv, shape=(7241, 7961), dtype=float32, chunksize=(1200, 1200), chunktype=numpy.ndarray>
Coordinates:
  * x            (x) float64 64kB 2.424e+05 2.424e+05 ... 4.812e+05 4.812e+05
  * y            (y) float64 58kB 4.414e+06 4.414e+06 ... 4.197e+06 4.197e+06
    spatial_ref  int64 8B 0
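The NDVI expression above is the standard `(NIR - Red) / (NIR + Red)` ratio, which is bounded in [-1, 1]: vegetation reflects strongly in the near-infrared and so scores high, while water absorbs NIR and scores negative. A minimal sketch, using made-up reflectance values purely for illustration:

```python
# Minimal sketch of the NDVI formula used above: (NIR - Red) / (NIR + Red).
# The reflectance values below are made up for illustration only.
def ndvi(nir, red):
    return (nir - red) / (nir + red)

vegetation = ndvi(nir=0.5, red=0.1)    # healthy vegetation: strong NIR reflectance
water      = ndvi(nir=0.02, red=0.05)  # water: absorbs most NIR
```

This is why a shrinking lake shows up clearly in NDVI differences: pixels that turn from water to bare ground jump from negative to near-zero or positive values.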
Computing the NDVI (2017)#
Now we can do this for the Landsat 8 files for the 2017 image:
ndvi8 = (landsat_8_img.sel(band=5) - landsat_8_img.sel(band=4))/(landsat_8_img.sel(band=5) + landsat_8_img.sel(band=4))
Resampling to same size#
The two images share the same coordinate system but do not have exactly the same dimensions or coordinates. Previous versions of this notebook resampled the images to the same size, and optionally allowed regridding them, all using Datashader. In this version, we instead interpolate the Landsat-8 image onto the coordinates of the Landsat-5 image using xarray, an approach that provides a similar result.
ndvi8 = ndvi8.interp_like(ndvi5, method="nearest")
ndvi8 = client.persist(ndvi8)
ndvi8
<xarray.DataArray (y: 7241, x: 7961)> Size: 231MB
dask.array<transpose, shape=(7241, 7961), dtype=float32, chunksize=(7241, 7961), chunktype=numpy.ndarray>
Coordinates:
    spatial_ref  int64 8B 0
  * x            (x) float64 64kB 2.424e+05 2.424e+05 ... 4.812e+05 4.812e+05
  * y            (y) float64 58kB 4.414e+06 4.414e+06 ... 4.197e+06 4.197e+06
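Conceptually, `interp_like` with `method="nearest"` assigns each target coordinate the value at the closest source coordinate. A self-contained 1-D sketch of that idea (toy coordinates and values, not the real rasters):

```python
# Conceptual sketch of nearest-neighbour interpolation as performed by
# xarray's interp_like(..., method="nearest"): each target coordinate takes
# the value at the closest source coordinate. Toy 1-D data for illustration.
def interp_nearest(src_coords, src_values, target_coords):
    out = []
    for t in target_coords:
        # index of the source coordinate closest to this target coordinate
        i = min(range(len(src_coords)), key=lambda j: abs(src_coords[j] - t))
        out.append(src_values[i])
    return out

interp_nearest([0.0, 1.0, 2.0], [10, 20, 30], [0.1, 0.9, 1.6])
```

xarray does the same per dimension on the labeled `x` and `y` coordinates, which is why the interpolated Landsat-8 array ends up with exactly the Landsat-5 shape of (7241, 7961).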
Viewing change via dropdown#
Using Datashader together with GeoViews, we can now easily build an interactive visualization where we select between the 1988 and 2017 images. The use of Datashader allows these images to be dynamically updated according to zoom level (note: it can take Datashader a minute to 'warm up' before it becomes fully interactive). For more information on how the dropdown widget was created using HoloMap, please refer to the HoloMap reference.
opts.defaults(
opts.Curve(width=600, tools=['hover']),
opts.Image(cmap='viridis', width=450, height=450, tools=['hover'], colorbar=True))
hmap = hv.HoloMap({'1988':gv.Image(ndvi5, crs=crs, vdims=['ndvi'], rtol=10),
'2017':gv.Image(ndvi8, crs=crs, vdims=['ndvi'], rtol=10)},
kdims=['Year']).redim(x='lon', y='lat') # Mapping 'x' and 'y' from rasterio to 'lon' and 'lat'
%%time
display(rasterize(hmap))
CPU times: user 4.31 s, sys: 599 ms, total: 4.91 s
Wall time: 17.6 s
Computing statistics and projecting display#
The rest of the notebook shows how statistical operations can reduce the dimensionality of the data, computing new features that may be used as part of an ML pipeline.
The mean and sum over the two time points#
The next plot (may take a minute to compute) shows the mean of the two NDVI images next to the sum of them:
mean_avg = hmap.collapse(dimensions=['Year'], function=np.mean)
mean_img = gv.Image(mean_avg.data, crs=crs, kdims=['lon', 'lat'],
vdims=['ndvi']).relabel('Mean over Year')
summed = hmap.collapse(dimensions=['Year'], function=np.sum)
summed_image = gv.Image(summed.data, crs=crs, kdims=['lon', 'lat'],
vdims=['ndvi']).relabel('Sum over Year')
%%time
display(rasterize(mean_img) + rasterize(summed_image))
CPU times: user 15.5 s, sys: 1.48 s, total: 17 s
Wall time: 44.5 s
Difference in NDVI between 1988 and 2017#
The change in Walker Lake as viewed using the NDVI can be shown by subtracting the NDVI recorded in 1988 from the NDVI recorded in 2017:
diff = hmap['1988'].data - hmap['2017'].data
difference = gv.Image(diff, crs=crs, kdims=['lon', 'lat'], vdims=['ndvi'])
difference = difference.relabel('Difference in NDVI').redim(ndvi='delta_ndvi')
%%time
display(rasterize(difference).redim.range(delta_ndvi=(-1.0,1.0)).opts(cmap=coolwarm))
CPU times: user 1.87 s, sys: 105 ms, total: 1.98 s
Wall time: 3.63 s
You can see a large change (positive delta) in the areas where there is water, indicating a reduction in the size of the lake over this time period.
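One simple way to turn this visual impression into a number is to count the fraction of pixels whose NDVI difference exceeds a threshold. This is a sketch only: the 0.2 threshold and the tiny toy grid are illustrative, and on the real data this would run over the `delta_ndvi` array rather than a nested list:

```python
# Sketch: quantify change by counting pixels whose NDVI difference exceeds a
# threshold. The 0.2 threshold and the tiny toy grid are illustrative only.
def fraction_changed(delta, threshold=0.2):
    flat = [v for row in delta for v in row]
    return sum(1 for v in flat if v > threshold) / len(flat)

toy_delta = [[0.5, 0.1],
             [0.3, -0.2]]
fraction_changed(toy_delta)
```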
Slicing across lon and lat#
As a final example, we can use the sample method to slice across the difference in NDVI along (roughly) the midpoint of the latitude and the midpoint of the longitude. To do this, we define the following helper function to convert latitude/longitude into the appropriate coordinate values used by the dataset:
def from_lon_lat(x, y):
    return crs.transform_point(x, y, ccrs.PlateCarree())
lon_x, lat_y = from_lon_lat(-118, 39)  # longitude of -118 and latitude of 39
%%time
display((difference.sample(lat=lat_y) + difference.sample(lon=lon_x)).cols(1))
CPU times: user 374 ms, sys: 12.5 ms, total: 386 ms
Wall time: 479 ms