Walker Lake#
The Disappearing Walker Lake#
While the losses of the Aral Sea in Kazakhstan and Lake Urmia in Iran have received a lot of attention over the last few decades, this trend is a global phenomenon. Recently a number of papers have been published, including one focusing on the Decline of the world's saline lakes. Many of these lakes have lost the majority of their volume over the last century, including Walker Lake (Nevada, USA), which has lost 90 percent of its volume over the last 100 years.
The following example is intended to replicate the typical processing required in change detection studies similar to the Decline of the world’s saline lakes.
from pathlib import Path
import geoviews as gv
import holoviews as hv
import numpy as np
import rioxarray
import xarray as xr
import cartopy.crs as ccrs
from colorcet import coolwarm
from holoviews import opts
from holoviews.operation.datashader import rasterize
from IPython.display import display
hv.extension('bokeh')
In this example, we would like to use Dask to demonstrate how image processing can be distributed across workers, either running locally or across a cluster. In the next cell, we instantiate a Dask distributed Client where we request eight single-threaded workers and declare a memory limit of 8GB per worker. You can experiment with different memory limits (e.g. 4GB) and different numbers of workers, but note that each worker should only use one thread, as Datashader manages its own parallelization using Numba:
# arbitrarily choose a memory limit (8GB) to demonstrate the out of core
# processing infrastructure
from dask.distributed import Client
client = Client(memory_limit=8*1e9, n_workers=8, threads_per_worker=1)
# As Datashader uses parallel Numba for raster rendering, we need to use
# single threaded Dask workers on each CPU to avoid contention.
client
(Client repr: LocalCluster with 8 single-threaded workers, 7.45 GiB memory each (59.60 GiB total), dashboard at http://127.0.0.1:8787/status)
Landsat Image Data#
To replicate this study, we first have to obtain the data from primary sources. The conventional way to obtain Landsat image data is to download it through USGS's EarthExplorer or NASA's Giovanni, but to facilitate this example two images have been downloaded from EarthExplorer and cached.
The two images used by the original study are LT05_L1TP_042033_19881022_20161001_01_T1 and LC08_L1TP_042033_20171022_20171107_01_T1 from 1988/10/22 and 2017/10/22 respectively. These images contain Landsat Surface Reflectance Level-2 Science Product images.
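The product identifiers above encode the acquisition metadata directly. As a sketch (assuming the standard Landsat Collection 1 product ID convention, `LXSS_LLLL_PPPRRR_YYYYMMDD_yyyymmdd_CC_TX`), the fields can be pulled apart with plain string handling; the helper name `parse_product_id` is ours, not part of any library:

```python
# Sketch: split a Landsat Collection 1 product ID into its fields, assuming
# the LXSS_LLLL_PPPRRR_YYYYMMDD_yyyymmdd_CC_TX naming convention
# (sensor/satellite, correction level, WRS path/row, acquisition date,
# processing date, collection number, collection tier).
def parse_product_id(product_id):
    sensor_sat, level, pathrow, acquired, processed, collection, tier = product_id.split('_')
    return {
        'sensor': sensor_sat[1],           # e.g. 'T' = TM, 'C' = OLI/TIRS
        'satellite': int(sensor_sat[2:]),  # e.g. 5 or 8
        'level': level,
        'wrs_path': int(pathrow[:3]),
        'wrs_row': int(pathrow[3:]),
        'acquired': acquired,
        'processed': processed,
        'collection': collection,
        'tier': tier,
    }

parse_product_id('LT05_L1TP_042033_19881022_20161001_01_T1')
```

Both scenes share WRS-2 path 42, row 33, which is why they cover the same region around Walker Lake.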
Loading into xarray#
In the next cells, we load the Landsat-5 and Landsat-8 files into xarray DataArray objects, reading them locally using rioxarray.
def read_landsat_files(pattern):
data_dir = Path('data')
data = {
int(file.stem[-1]): rioxarray.open_rasterio(file, chunks={"x": 1200, "y": 1200}, masked=True)
for file in sorted(data_dir.glob(pattern))
}
dataset = xr.concat(data.values(), dim="band")
dataset = dataset.assign_coords({"band": list(data)})
return dataset
landsat_5_img = read_landsat_files('LT05*')
landsat_5_img
<xarray.DataArray (band: 2, y: 7241, x: 7961)> Size: 461MB
dask.array<concatenate, shape=(2, 7241, 7961), dtype=float32, chunksize=(1, 1200, 1200), chunktype=numpy.ndarray>
Coordinates:
  * x            (x) float64 64kB 2.424e+05 2.424e+05 ... 4.812e+05 4.812e+05
  * y            (y) float64 58kB 4.414e+06 4.414e+06 ... 4.197e+06 4.197e+06
    spatial_ref  int64 8B 0
  * band         (band) int64 16B 4 5
Attributes:
    Band_1:         band 4 surface reflectance
    AREA_OR_POINT:  Area
    scale_factor:   1.0
    add_offset:     0.0
    long_name:      band 4 surface reflectance
landsat_8_img = read_landsat_files('LC08*')
landsat_8_img
<xarray.DataArray (band: 2, y: 7941, x: 7821)> Size: 497MB
dask.array<concatenate, shape=(2, 7941, 7821), dtype=float32, chunksize=(1, 1200, 1200), chunktype=numpy.ndarray>
Coordinates:
  * x            (x) float64 63kB 2.433e+05 2.433e+05 ... 4.779e+05 4.779e+05
  * y            (y) float64 64kB 4.426e+06 4.426e+06 ... 4.188e+06 4.188e+06
    spatial_ref  int64 8B 0
  * band         (band) int64 16B 4 5
Attributes:
    Band_1:         band 4 surface reflectance
    AREA_OR_POINT:  Area
    scale_factor:   1.0
    add_offset:     0.0
    long_name:      band 4 surface reflectance
We create a cartopy coordinate reference system (EPSG:32611, i.e. WGS 84 / UTM zone 11N) that we will use later in this notebook:
assert landsat_5_img.rio.crs == landsat_8_img.rio.crs
print(landsat_5_img.rio.crs)
crs = ccrs.epsg(landsat_5_img.rio.crs.to_epsg())
EPSG:32611
Computing the NDVI (1988)#
Now let us compute the NDVI for the 1988 image.
ndvi5 = (landsat_5_img.sel(band=5) - landsat_5_img.sel(band=4))/(landsat_5_img.sel(band=5) + landsat_5_img.sel(band=4))
ndvi5 = client.persist(ndvi5)
ndvi5
<xarray.DataArray (y: 7241, x: 7961)> Size: 231MB
dask.array<truediv, shape=(7241, 7961), dtype=float32, chunksize=(1200, 1200), chunktype=numpy.ndarray>
Coordinates:
  * x            (x) float64 64kB 2.424e+05 2.424e+05 ... 4.812e+05 4.812e+05
  * y            (y) float64 58kB 4.414e+06 4.414e+06 ... 4.197e+06 4.197e+06
    spatial_ref  int64 8B 0
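The NDVI expression above is the standard `(NIR - Red) / (NIR + Red)` ratio, which is bounded in [-1, 1]: vegetation reflects strongly in the near-infrared and so scores high, while water absorbs NIR and scores negative. A minimal sketch, using made-up reflectance values purely for illustration:

```python
# Minimal sketch of the NDVI formula used above: (NIR - Red) / (NIR + Red).
# The reflectance values below are made up for illustration only.
def ndvi(nir, red):
    return (nir - red) / (nir + red)

vegetation = ndvi(nir=0.5, red=0.1)    # healthy vegetation: strong NIR reflectance
water      = ndvi(nir=0.02, red=0.05)  # water: absorbs most NIR
```

This is why a shrinking lake shows up clearly in NDVI differences: pixels that turn from water to bare ground jump from negative to near-zero or positive values.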
Computing the NDVI (2017)#
Now we can do this for the Landsat 8 files for the 2017 image:
ndvi8 = (landsat_8_img.sel(band=5) - landsat_8_img.sel(band=4))/(landsat_8_img.sel(band=5) + landsat_8_img.sel(band=4))
Resampling to same size#
The two images share the same coordinate system but do not have exactly the same dimensions or coordinates. Previous versions of this notebook resampled the images to the same size, and optionally allowed regridding them, all using Datashader. In this version, we instead interpolate the Landsat-8 image onto the coordinates of the Landsat-5 image using xarray, an approach that provides a similar result.
ndvi8 = ndvi8.interp_like(ndvi5, method="nearest")
ndvi8 = client.persist(ndvi8)
ndvi8
<xarray.DataArray (y: 7241, x: 7961)> Size: 231MB
dask.array<transpose, shape=(7241, 7961), dtype=float32, chunksize=(7241, 7961), chunktype=numpy.ndarray>
Coordinates:
    spatial_ref  int64 8B 0
  * x            (x) float64 64kB 2.424e+05 2.424e+05 ... 4.812e+05 4.812e+05
  * y            (y) float64 58kB 4.414e+06 4.414e+06 ... 4.197e+06 4.197e+06
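Conceptually, `interp_like` with `method="nearest"` assigns each target coordinate the value at the closest source coordinate. A self-contained 1-D sketch of that idea (toy coordinates and values, not the real rasters):

```python
# Conceptual sketch of nearest-neighbour interpolation as performed by
# xarray's interp_like(..., method="nearest"): each target coordinate takes
# the value at the closest source coordinate. Toy 1-D data for illustration.
def interp_nearest(src_coords, src_values, target_coords):
    out = []
    for t in target_coords:
        # index of the source coordinate closest to this target coordinate
        i = min(range(len(src_coords)), key=lambda j: abs(src_coords[j] - t))
        out.append(src_values[i])
    return out

interp_nearest([0.0, 1.0, 2.0], [10, 20, 30], [0.1, 0.9, 1.6])
```

xarray does the same per dimension on the labeled `x` and `y` coordinates, which is why the interpolated Landsat-8 array ends up with exactly the Landsat-5 shape of (7241, 7961).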
Viewing change via dropdown#
Using Datashader together with GeoViews, we can now easily build an interactive visualization where we select between the 1988 and 2017 images. The use of Datashader allows these images to be dynamically updated according to zoom level (note: it can take Datashader a minute to 'warm up' before it becomes fully interactive). For more information on how the dropdown widget was created using HoloMap, please refer to the HoloMap reference.
opts.defaults(
opts.Curve(width=600, tools=['hover']),
opts.Image(cmap='viridis', width=450, height=450, tools=['hover'], colorbar=True))
hmap = hv.HoloMap({'1988':gv.Image(ndvi5, crs=crs, vdims=['ndvi'], rtol=10),
'2017':gv.Image(ndvi8, crs=crs, vdims=['ndvi'], rtol=10)},
kdims=['Year']).redim(x='lon', y='lat') # Mapping 'x' and 'y' from rasterio to 'lon' and 'lat'
%%time
display(rasterize(hmap))
CPU times: user 4.31 s, sys: 599 ms, total: 4.91 s
Wall time: 17.6 s
Computing statistics and projecting display#
The rest of the notebook shows how statistical operations can reduce the dimensionality of the data, computing new features that may be used as part of an ML pipeline.
The mean and sum over the two time points#
The next plot (may take a minute to compute) shows the mean of the two NDVI images next to the sum of them:
mean_avg = hmap.collapse(dimensions=['Year'], function=np.mean)
mean_img = gv.Image(mean_avg.data, crs=crs, kdims=['lon', 'lat'],
vdims=['ndvi']).relabel('Mean over Year')
summed = hmap.collapse(dimensions=['Year'], function=np.sum)
summed_image = gv.Image(summed.data, crs=crs, kdims=['lon', 'lat'],
vdims=['ndvi']).relabel('Sum over Year')
%%time
display(rasterize(mean_img) + rasterize(summed_image))
CPU times: user 15.5 s, sys: 1.48 s, total: 17 s
Wall time: 44.5 s
Difference in NDVI between 1988 and 2017#
The change in Walker Lake as viewed using the NDVI can be shown by subtracting the NDVI recorded in 1988 from the NDVI recorded in 2017:
diff = hmap['1988'].data - hmap['2017'].data
difference = gv.Image(diff, crs=crs, kdims=['lon', 'lat'], vdims=['ndvi'])
difference = difference.relabel('Difference in NDVI').redim(ndvi='delta_ndvi')
%%time
display(rasterize(difference).redim.range(delta_ndvi=(-1.0,1.0)).opts(cmap=coolwarm))
CPU times: user 1.87 s, sys: 105 ms, total: 1.98 s
Wall time: 3.63 s
You can see a large change (positive delta) in the areas where there is water, indicating a reduction in the size of the lake over this time period.
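One simple way to turn this visual impression into a number is to count the fraction of pixels whose NDVI difference exceeds a threshold. This is a sketch only: the 0.2 threshold and the tiny toy grid are illustrative, and on the real data this would run over the `delta_ndvi` array rather than a nested list:

```python
# Sketch: quantify change by counting pixels whose NDVI difference exceeds a
# threshold. The 0.2 threshold and the tiny toy grid are illustrative only.
def fraction_changed(delta, threshold=0.2):
    flat = [v for row in delta for v in row]
    return sum(1 for v in flat if v > threshold) / len(flat)

toy_delta = [[0.5, 0.1],
             [0.3, -0.2]]
fraction_changed(toy_delta)
```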
Slicing across lon and lat#
As a final example, we can use the sample method to slice across the difference in NDVI along (roughly) the midpoint of the latitude and the midpoint of the longitude. To do this, we define the following helper function to convert latitude/longitude into the appropriate coordinate values used by the dataset:
def from_lon_lat(x, y):
    return crs.transform_point(x, y, ccrs.PlateCarree())
lon_x, lat_y = from_lon_lat(-118, 39)  # longitude of -118 and latitude of 39
%%time
display((difference.sample(lat=lat_y) + difference.sample(lon=lon_x)).cols(1))
CPU times: user 374 ms, sys: 12.5 ms, total: 386 ms
Wall time: 479 ms