Heat and Trees#

hvplotgeoviewsdatashaderbokehdaskgeopandasxarray

Published: October 1, 2018 · Modified: October 28, 2024

Urban Heat Islands and Street Trees#

In this notebook we’ll be exploring the urban heat island effect by looking at the impact on surface temperature of roof color and street trees. We’ll be replicating the process described here but using Python tools rather than ESRI.

Data sources: This notebook uses Landsat data from Google Cloud Storage as well as some geographic data from OpenDataPhilly.

import intake
import xarray as xr
import pandas as pd
import numpy as np
import geopandas as gpd
import cartopy.crs as ccrs
import hvplot.xarray  # noqa
import hvplot.pandas  # noqa

from geoviews.tile_sources import EsriImagery
from pyproj import CRS

Just some extra info about Landsat data:

band_info = pd.DataFrame([
        (1,  "Aerosol", " 0.43 - 0.45",    0.440,  "30",         "Coastal aerosol"),
        (2,  "Blue",    " 0.45 - 0.51",    0.480,  "30",         "Blue"),
        (3,  "Green",   " 0.53 - 0.59",    0.560,  "30",         "Green"),
        (4,  "Red",     " 0.64 - 0.67",    0.655,  "30",         "Red"),
        (5,  "NIR",     " 0.85 - 0.88",    0.865,  "30",         "Near Infrared (NIR)"),
        (6,  "SWIR1",   " 1.57 - 1.65",    1.610,  "30",         "Shortwave Infrared (SWIR) 1"),
        (7,  "SWIR2",   " 2.11 - 2.29",    2.200,  "30",         "Shortwave Infrared (SWIR) 2"),
        (8,  "Panc",    " 0.50 - 0.68",    0.590,  "15",         "Panchromatic"),
        (9,  "Cirrus",  " 1.36 - 1.38",    1.370,  "30",         "Cirrus"),
        (10, "TIRS1",   "10.60 - 11.19",   10.895, "100 * (30)", "Thermal Infrared (TIRS) 1"),
        (11, "TIRS2",   "11.50 - 12.51",   12.005, "100 * (30)", "Thermal Infrared (TIRS) 2")],
    columns=['Band', 'Name', 'Wavelength Range (µm)', 'Nominal Wavelength (µm)', 'Resolution (m)', 'Description']).set_index(["Band"])
band_info

	Name	Wavelength Range (µm)	Nominal Wavelength (µm)	Resolution (m)	Description
Band
1	Aerosol	0.43 - 0.45	0.440	30	Coastal aerosol
2	Blue	0.45 - 0.51	0.480	30	Blue
3	Green	0.53 - 0.59	0.560	30	Green
4	Red	0.64 - 0.67	0.655	30	Red
5	NIR	0.85 - 0.88	0.865	30	Near Infrared (NIR)
6	SWIR1	1.57 - 1.65	1.610	30	Shortwave Infrared (SWIR) 1
7	SWIR2	2.11 - 2.29	2.200	30	Shortwave Infrared (SWIR) 2
8	Panc	0.50 - 0.68	0.590	15	Panchromatic
9	Cirrus	1.36 - 1.38	1.370	30	Cirrus
10	TIRS1	10.60 - 11.19	10.895	100 * (30)	Thermal Infrared (TIRS) 1
11	TIRS2	11.50 - 12.51	12.005	100 * (30)	Thermal Infrared (TIRS) 2

Loading data#

For this example, we will be using landsat data stored on Google Cloud Storage. Since these data are accessed via https, there is no guaranteed directory structure, so we will need to specify the url pointing to each file and then iterate over the files to create a concatenated dataset. We use jinja template notation in intake to pass parameters to the urlpath.

cat = intake.open_catalog('catalog.yml')
list(cat)

['google_landsat_band',
 'neighborhoods_philadelphia',
 'ppr_tree_inventory_2022']

Let’s take a look at what the google_landsat_band looks like:

google_landsat_band:
    description: Landsat bands from Google Cloud Storage
    driver: rasterio
    parameters:
      path:
        description: landsat path
        type: int
      row:
        description: landsat row
        type: int
      product_id:
        description: landsat file id
        type: str
      band:
        description: band
        type: int
    args:
      urlpath: https://storage.googleapis.com/gcp-public-data-landsat/LC08/01/{{ '%03d' % path }}/{{ '%03d' % row }}/{{ product_id }}/{{ product_id }}_B{{ band }}.TIF
      chunks:
        band: 1
        x: 256
        y: 256

The following might feel a bit arbitrary, but we have chosen the path and row corresponding to the area over Philadelphia using the earth explorer. We have also found the id of the particular date of interest using the same tool. With these values in hand, we can access parts of each file on Google Cloud Storage.

path = 14
row = 32
product_id = 'LC08_L1TP_014032_20160727_20170222_01_T1'

The first step to using intake is to initialize the catalog entry with user parameters to create a data source.

data_source = cat.google_landsat_band(path=path, row=row, product_id=product_id)

From this data source we can get a lazily loaded xarray object using dask. To make sure that we can inspect what dask is up to, it can be helpful to create a dask client.

ds = data_source.to_dask()
ds.name = 'value'

/home/runner/work/examples/examples/heat_and_trees/envs/default/lib/python3.11/site-packages/intake_xarray/raster.py:107: FutureWarning: The return type of `Dataset.dims` will be changed to return a set of dimension names in future, in order to be more consistent with `DataArray.dims`. To access a mapping from dimension names to lengths, please use `Dataset.sizes`.
  'dims': dict(ds2.dims),

Loading in metadata regarding these particular Landsat images from the associated matlab.txt file.

def load_google_landsat_metadata(path, row, product_id):
    """Load Landsat metadata for path, row, product_id from Google Cloud Storage
    """
    def parse_type(x):
        try: 
            return eval(x)
        except:
            return x
    
    baseurl = 'https://storage.googleapis.com/gcp-public-data-landsat/LC08/01'
    suffix = f'{path:03d}/{row:03d}/{product_id}/{product_id}_MTL.txt'
    
    df = intake.open_csv(
        urlpath=f'{baseurl}/{suffix}',
        csv_kwargs={'sep': '=',
                    'header': None,
                    'names': ['index', 'value'],
                    'skiprows': 2,
                    'converters': {'index': (lambda x: x.strip()),
                                   'value': parse_type}}).read()
    metadata = df.set_index('index')['value']
    return metadata

metadata = load_google_landsat_metadata(path, row, product_id)
metadata.head()

index
ORIGIN                Image courtesy of the U.S. Geological Survey
REQUEST_ID                                     0501702206266_00020
LANDSAT_SCENE_ID                             LC80140322016209LGN01
LANDSAT_PRODUCT_ID        LC08_L1TP_014032_20160727_20170222_01_T1
COLLECTION_NUMBER                                               01
Name: value, dtype: object

Sub-setting to area of interest#

So far we haven’t downloaded any band data. Since we know that we are interested in Philadelphia, we can just take a smaller square of data that covers the extents of the city. First we need to know the projection of the dataset:

ds.rio.crs

CRS.from_epsg(32618)

We’ll convert that into something directly usable for later:

proj_crs= CRS.from_epsg(32618)
proj_crs

<Projected CRS: EPSG:32618>
Name: WGS 84 / UTM zone 18N
Axis Info [cartesian]:
- E[east]: Easting (metre)
- N[north]: Northing (metre)
Area of Use:
- name: Between 78°W and 72°W, northern hemisphere between equator and 84°N, onshore and offshore. Bahamas. Canada - Nunavut; Ontario; Quebec. Colombia. Cuba. Ecuador. Greenland. Haiti. Jamaica. Panama. Turks and Caicos Islands. United States (USA). Venezuela.
- bounds: (-78.0, 0.0, -72.0, 84.0)
Coordinate Operation:
- name: UTM zone 18N
- method: Transverse Mercator
Datum: World Geodetic System 1984 ensemble
- Ellipsoid: WGS 84
- Prime Meridian: Greenwich

crs = ccrs.UTM(zone=18)

Now if we were just looking for one particular point we could use that point, converted to the coordinate system of the data, and then select the data nearest to it:

x_center, y_center = crs.transform_point(-75.1652, 39.9526, ccrs.PlateCarree())
nearest_to_center = ds.sel(x=x_center, y=y_center, method='nearest')
print(nearest_to_center.compute())

nearest_to_center.hvplot.line(x='band')

<xarray.DataArray 'value' (band: 4)> Size: 8B
array([11465, 12774, 30347, 26335], dtype=uint16)
Coordinates:
    x            float64 8B 4.859e+05
    y            float64 8B 4.423e+06
    spatial_ref  int64 8B 0
  * band         (band) int64 32B 4 5 10 11
Attributes:
    AREA_OR_POINT:  Point
    scale_factor:   1.0
    add_offset:     0.0

In this case, though, we are interested in a subset of data that covers that city of Philadelphia. So we need some geometry to specify the bounds of the city. We can get a GeoJSON of neighborhood data from OpenDataPhilly.

geoms = gpd.GeoDataFrame.from_features(cat.neighborhoods_philadelphia.read(), crs=4326)

bounds = geoms.geometry.bounds
lower_left_corner_lat_lon = bounds.minx.min(), bounds.miny.min()
upper_right_corner_lat_lon = bounds.maxx.max(), bounds.maxy.max()

print(lower_left_corner_lat_lon, upper_right_corner_lat_lon)

(np.float64(-75.280266), np.float64(39.867004)) (np.float64(-74.955763), np.float64(40.137992))

Using the crs defined above, we can transform these lat/lons into map coordinates.

ll_corner = crs.transform_point(*lower_left_corner_lat_lon, ccrs.PlateCarree())
ur_corner = crs.transform_point(*upper_right_corner_lat_lon, ccrs.PlateCarree())

print(ll_corner, ur_corner)

(np.float64(476030.21311048063), np.float64(4413033.712613887)) (np.float64(503768.4451229614), np.float64(4443074.1017383))

Then we can use those corners to slice the data. If the subset is empty along x or y, the ordering of the coordinates might not be what you anticipated. Try swapping the order of arguments in the slice.

subset = ds.sel(x=slice(ll_corner[0], ur_corner[0]), y=slice(ur_corner[1], ll_corner[1]))

We can persist this slice of the dataset in memory for easy use later.

subset = subset.persist()
subset

<xarray.DataArray 'value' (band: 4, y: 1001, x: 925)> Size: 7MB
dask.array<getitem, shape=(4, 1001, 925), dtype=uint16, chunksize=(1, 256, 256), chunktype=numpy.ndarray>
Coordinates:
  * x            (x) float64 7kB 4.76e+05 4.761e+05 ... 5.037e+05 5.038e+05
  * y            (y) float64 8kB 4.443e+06 4.443e+06 ... 4.413e+06 4.413e+06
    spatial_ref  int64 8B 0
  * band         (band) int64 32B 4 5 10 11
Attributes:
    AREA_OR_POINT:  Point
    scale_factor:   1.0
    add_offset:     0.0

To check that we got the right area, we can do a simple plot of one of the bands and overlay the neighborhoods on top of it. We’ll use hvplot to quickly create a holoviews object rendered in bokeh.

geoms.head()

	geometry	name	listname	mapname	shape_leng	shape_area	cartodb_id	created_at	updated_at
0	MULTIPOLYGON (((-75.05646 40.08743, -75.05667 ...	PENNYPACK_PARK	Pennypack Park	Pennypack Park	87084.285589	6.014076e+07	9	2013-03-19T17:41:50.508Z	2013-03-19T17:41:50.743Z
1	MULTIPOLYGON (((-75.2272 39.9774, -75.22984 39...	OVERBROOK	Overbrook	Overbrook	57004.924607	7.692499e+07	138	2013-03-19T17:41:50.508Z	2013-03-19T17:41:50.743Z
2	MULTIPOLYGON (((-75.16208 40.02829, -75.16145 ...	GERMANTOWN_SOUTHWEST	Germantown, Southwest	Southwest Germantown	14880.743608	1.441867e+07	59	2013-03-19T17:41:50.508Z	2013-03-19T17:41:50.743Z
3	MULTIPOLYGON (((-75.19931 39.97462, -75.19869 ...	EAST_PARKSIDE	East Parkside	East Parkside	10885.781535	4.231000e+06	129	2013-03-19T17:41:50.508Z	2013-03-19T17:41:50.743Z
4	MULTIPOLYGON (((-75.22722 40.03523, -75.22865 ...	GERMANY_HILL	Germany Hill	Germany Hill	13041.939087	6.949968e+06	49	2013-03-19T17:41:50.508Z	2013-03-19T17:41:50.743Z

band_plot = subset.mean('band').hvplot(x='x', y='y', rasterize=True, crs=crs, cnorm='eq_hist', cmap='gray')
hood_plot = geoms.hvplot(geo=True, alpha=.5, c='mapname', legend=False, frame_height=450)

band_plot * hood_plot

Calculate NDVI#

We’ll calculate NDVI but we won’t yet do any computations – our bands are actually dask arrays, which allow for lazy computation.

In order to visualize NDVI, the data will need to be loaded and the NDVI computed.

NDVI.hvplot(x='x', y='y', crs=crs, rasterize=True, cmap='viridis', frame_height=450)

Adding in the Street Tree data#

OpenDataPhilly released an inventory of all the street trees in the city. Street trees are trees that are planted along streets, not those in parks and private property. The original analysis considered these 100,000 points too many to plot, but that’s nothing to datashader, which is happy with billions of points even on a laptop.

It is hypothesized that where there are more trees the land surface temperature will be less extreme. To explore this, we will overlay street trees with the thresholded land surface temperature:

trees_gdf = gpd.GeoDataFrame.from_features(cat.ppr_tree_inventory_2022.read(), crs=4326)
trees_gdf.head()

	geometry	OBJECTID	TREE_NAME	TREE_DBH	YEAR	LOC_Y	LOC_X
0	POINT (-75.2105 39.98383)	1	GINKGO BILOBA - GINKGO	36.0	2022	39.983831	-75.210503
1	POINT (-75.21053 39.98374)	2	ACER PALMATUM - JAPANESE MAPLE	NaN	2022	39.983742	-75.210533
2	POINT (-75.21041 39.98376)	3	ACER PALMATUM - JAPANESE MAPLE	8.0	2022	39.983757	-75.210407
3	POINT (-75.2106 39.98395)	4	ACER PALMATUM - JAPANESE MAPLE	12.0	2022	39.983951	-75.210604
4	POINT (-75.21028 39.98376)	5	ACER PSEUDOPLATANUS - SYCAMORE MAPLE	17.0	2022	39.983762	-75.210275

tree_p = trees_gdf.hvplot.points('LOC_X', 'LOC_Y', title='Street Tree Density',
                                geo=True, rasterize=True, dynspread=True, max_px=5,
                                 threshold=0.9, frame_height=450, cmap='Greens', cnorm='eq_hist')

thresholded_temp_p.opts(alpha=.5) * tree_p.opts(alpha=.5)

Here, we use the dynspread parameter that helps us to dynamically calculate the points to display based on the local density of the points. This is more evident when you zoom into a sparse region of the plot. We also set the threshold parameter to control the fraction of the maximum density at which spreading starts, as well as max_px which sets the maximum number of pixels by which points can be spread.

See datashader.spreading and dynspread for more details.

This web page was generated from a Jupyter notebook and not all interactivity will work on this website.