EO-LINCS project: Cube generation for Scientific Case Study (SCS) 2¶
Forest recovery post disturbance¶
Objective: SCS2 aims to estimate canopy height and above-ground biomass maps using deep-learning methods based on remote sensing data. These maps are subsequently used to calculate recovery curves, which in turn can be used to estimate the carbon budget.
Outcomes: New high-resolution height/biomass maps that are expected to enable the monitoring of biomass at finer scales, in particular the impact of fine-scale forest disturbances due to management practices such as thinning, and the impact of natural disturbances (insect attacks, droughts, fires and windthrow) in regions of interest. Analysis of forest recovery as a function of environmental factors (such as climate, soil composition and pH) and of the nature and intensity of disturbance shall aid the optimization of forest management in view of potentially increased future disturbances.
Required Datasets:
- Global Age Mapping Integration (GAMI) dataset
- European Forest Disturbance Atlas
- Canopy height and biomass map for Europe
- Copernicus Tree cover density 2015
- Copernicus Dominant Leaf Type 2015
- Copernicus Forest Type 2015
Note: the last three datasets are excluded until the CLMS dataset migration is finalized.
The following notebook shows how users can load data from the various sources defined in scs2_config.yml using the MultiSourceDataStore tool.
What You Can Do with This Notebook¶
- Load datasets from various sources as defined in the scs2_config.yml
- View the progress of each data request to the MultiSourceDataStore
- Quickly preview the datasets by downsampling and plotting them
Requirements¶
Before you begin, ensure that all necessary dependencies are installed and that you have generated API token credentials for CLMS:
- Install xcube-multistore via conda-forge by running: conda install --channel conda-forge xcube-multistore
- Generate CLMS API token credentials by following the official instructions. Save the credentials in a file named clms-credentials.json, placing it in the same directory as this notebook. Note that the filename can be customized in the scs2_config.yml configuration file.
Once everything is installed, you are ready to proceed.
The MultiSourceDataStore is driven by a configuration file called scs2_config.yml, located in the same directory as this notebook.
To understand what goes into the schema, you can read more here.
Let's import the MultiSourceDataStore
from xcube_multistore import MultiSourceDataStore
You can also explore the expected structure of the config file with the function get_config_schema(). Run it and expand the fields to learn about the properties the configuration file accepts, alongside the Configuration Guide.
MultiSourceDataStore.get_config_schema()
<xcube.util.jsonschema.JsonObjectSchema at 0x7e5aa4f5ac10>
To set up the config file, please refer to the notebook setup_config.ipynb or the Configuration Guide in the documentation.
Now we can initialize the MultiSourceDataStore by passing the path to scs2_config.yml, which currently sits at the same level as this notebook. The configuration shows that for the user-defined dataset IDs senf and liu, data from multiple sources will be fused.
msds = MultiSourceDataStore("scs2_config.yml")
And we can display the overview of the configuration file for each dataset.
msds.display_config()
| User-defined ID | Data Store ID | Data Store Params | Data ID | Open Data Params | Grid-Mapping | Format |
|---|---|---|---|---|---|---|
| senf | zenodo | root: 13333034 | france/annual_disturbances_1985_2023_france.tif | band_as_variable: False | bbox: [-1.25, 43.5, 0.5, 44.8]; spatial_res: 0.00025; crs: EPSG:4326; tile_size: 4000 | Zarr |
| senf | zenodo | root: 13333034 | france/disturbance_agent_1985_2023_france.tif | band_as_variable: False | bbox: [-1.25, 43.5, 0.5, 44.8]; spatial_res: 0.00025; crs: EPSG:4326; tile_size: 4000 | Zarr |
| senf | zenodo | root: 13333034 | france/greatest_disturbance_france.tif | band_as_variable: False | bbox: [-1.25, 43.5, 0.5, 44.8]; spatial_res: 0.00025; crs: EPSG:4326; tile_size: 4000 | Zarr |
| senf | zenodo | root: 13333034 | france/forest_mask_france.tif | band_as_variable: False | bbox: [-1.25, 43.5, 0.5, 44.8]; spatial_res: 0.00025; crs: EPSG:4326; tile_size: 4000 | Zarr |
| senf | zenodo | root: 13333034 | france/disturbance_agent_aggregated_france.tif | band_as_variable: False | bbox: [-1.25, 43.5, 0.5, 44.8]; spatial_res: 0.00025; crs: EPSG:4326; tile_size: 4000 | Zarr |
| senf | zenodo | root: 13333034 | france/latest_disturbance_france.tif | band_as_variable: False | bbox: [-1.25, 43.5, 0.5, 44.8]; spatial_res: 0.00025; crs: EPSG:4326; tile_size: 4000 | Zarr |
| senf | zenodo | root: 13333034 | france/number_disturbances_france.tif | band_as_variable: False | bbox: [-1.25, 43.5, 0.5, 44.8]; spatial_res: 0.00025; crs: EPSG:4326; tile_size: 4000 | Zarr |
| senf | zenodo | root: 13333034 | france/disturbance_probability_1985_2023_france.tif | band_as_variable: False | bbox: [-1.25, 43.5, 0.5, 44.8]; spatial_res: 0.00025; crs: EPSG:4326; tile_size: 4000 | Zarr |
| senf | zenodo | root: 13333034 | france/disturbance_severity_1985_2023_france.tif | band_as_variable: False | bbox: [-1.25, 43.5, 0.5, 44.8]; spatial_res: 0.00025; crs: EPSG:4326; tile_size: 4000 | Zarr |
| gami | s3 | root: dog.atlaseo-glm.eo-gridded-data; storage_options: {'anon': True, 'endpoint_url': 'https://s3.gfz-potsdam.de'} | collections/GAMI/GAMI_v2.1.zarr | - | bbox: [-1.25, 43.5, 0.5, 44.8]; spatial_res: 0.00025; crs: EPSG:4326; tile_size: 4000 | Zarr |
| liu | zenodo | root: 8154445 | planet_agb_30m_v0.1.tif | band_as_variable: False | bbox: [-1.25, 43.5, 0.5, 44.8]; spatial_res: 0.00025; crs: EPSG:4326; tile_size: 4000 | Zarr |
| liu | zenodo | root: 8154445 | planet_canopy_cover_30m_v0.1.tif | band_as_variable: False | bbox: [-1.25, 43.5, 0.5, 44.8]; spatial_res: 0.00025; crs: EPSG:4326; tile_size: 4000 | Zarr |
| liu | zenodo | root: 8154445 | planet_canopy_height_30m_v0.1.tif | band_as_variable: False | bbox: [-1.25, 43.5, 0.5, 44.8]; spatial_res: 0.00025; crs: EPSG:4326; tile_size: 4000 | Zarr |
We can display the selected bounding box as shown in the following cell.
msds.display_geolocations()
We can now generate the datacubes:
NOTE: In the scs2_config.yml we also use the custom_processing feature of this tool, which allows us to run a function that processes each dataset separately. In this example, we have defined a module called modify_dataset that performs custom processing: it takes an xarray.Dataset as input and returns a new xarray.Dataset. To read more about custom_processing, see here.
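The actual modify_dataset module used in this project is not shown here; the sketch below only illustrates the required contract, assuming a hypothetical processing step (masking a -9999 fill value) and that a custom-processing function must accept and return an xarray.Dataset:

```python
import xarray as xr


def modify_dataset(ds: xr.Dataset) -> xr.Dataset:
    """Hypothetical custom-processing step: mask fill values.

    Takes an xarray.Dataset and returns a new xarray.Dataset,
    as required by the custom_processing feature.
    """
    # Replace the assumed fill value -9999 with NaN in all data variables;
    # where() returns a new dataset, so the input is not mutated.
    ds = ds.where(ds != -9999)
    # Record the step for provenance.
    return ds.assign_attrs(history="masked fill values with modify_dataset")
```

The function performs a pure transformation (no in-place mutation), which keeps custom processing composable with the rest of the cube-generation pipeline.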
msds.generate()
| Dataset identifier | Status | Message | Exception |
|---|---|---|---|
| france.zip | COMPLETED | Already preloaded. | - |
| Dataset identifier | Status | Message | Exception |
|---|---|---|---|
| senf | COMPLETED | Dataset 'senf' finished: 0:24:48 | - |
| gami | COMPLETED | Dataset 'gami' finished: 0:00:35 | - |
| liu | COMPLETED | Dataset 'liu' finished: 0:17:30 | - |
We can now open the data using the xcube datastore framework API as usual. Note that the multi-source data store requires a data store called storage, which is configured in our scs2_config.yml under the data_stores section.
ds = msds.stores.storage.open_data("gami.zarr")
ds
<xarray.Dataset> Size: 6GB
Dimensions: (members: 20, time: 2, latitude: 5199, longitude: 7000,
lat: 5199, lon: 7000)
Coordinates:
* members (members) int64 160B 0 1 2 3 4 5 6 7 ... 13 14 15 16 17 18 19
* time (time) datetime64[ns] 16B 2010-01-01 2020-01-01
* lat (lat) float64 42kB 44.8 44.8 44.8 44.8 ... 43.5 43.5 43.5 43.5
* lon (lon) float64 56kB -1.25 -1.25 -1.249 ... 0.4994 0.4996 0.4999
spatial_ref int64 8B ...
Dimensions without coordinates: latitude, longitude
Data variables:
forest_age (members, time, latitude, longitude) float32 6GB dask.array<chunksize=(5, 2, 4000, 4000), meta=np.ndarray>
Attributes:
_FillValue: -9999
contact: Simon Besnard (GFZ Potsdam)
created_by: Simon Besnard
creation_date: 2025-12-16 12:33
crs: EPSG:4326
institution: Helmholtz Centre Potsdam – GFZ German Research Centr...
product_name: Global Age Mapping Integration (GAMI)
spatial_resolution: 100 m
title: Global Age Mapping Integration (GAMI) v2.1
version: 2.1
We can now select a variable for one timestep, downsample it, and plot it for a quick preview of the data.
ds["forest_age"].isel(members=0, time=0)[::5, ::5].plot(vmin=0, vmax=120)
<matplotlib.collections.QuadMesh at 0x7e5aa1cb8830>
ds["forest_age"].isel(members=0, time=1)[::5, ::5].plot(vmin=0, vmax=120)
<matplotlib.collections.QuadMesh at 0x7e5aa1e8da90>
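The `[::5, ::5]` slice in the cells above is simple stride-based downsampling: it keeps every fifth pixel along each spatial axis, which is enough for a quick visual preview. A minimal numpy illustration:

```python
import numpy as np

# A mock 10 x 10 "raster"; striding keeps every 5th row and column.
data = np.arange(100).reshape(10, 10)
preview = data[::5, ::5]

print(preview.shape)  # (2, 2) — 25x fewer pixels to plot
```

Striding is fast because it creates a view rather than a copy, but note that it subsamples instead of averaging, so isolated extreme pixels may be dropped from the preview.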
ds = msds.stores.storage.open_data("liu.zarr")
ds
<xarray.Dataset> Size: 874MB
Dimensions: (lat: 5199, lon: 7000)
Coordinates:
* lat (lat) float64 42kB 44.8 44.8 44.8 44.8 ... 43.5 43.5 43.5
* lon (lon) float64 56kB -1.25 -1.25 -1.249 ... 0.4996 0.4999
spatial_ref int64 8B ...
Data variables:
agb (lat, lon) float64 291MB dask.array<chunksize=(4000, 4000), meta=np.ndarray>
canopy_cover (lat, lon) float64 291MB dask.array<chunksize=(4000, 4000), meta=np.ndarray>
canopy_height  (lat, lon) float64 291MB dask.array<chunksize=(4000, 4000), meta=np.ndarray>
ds["agb"][::5, ::5].plot(vmax=20000)
<matplotlib.collections.QuadMesh at 0x7e5aa0636d50>
ds["canopy_cover"][::5, ::5].plot()
<matplotlib.collections.QuadMesh at 0x7e5aa0af5e50>
ds["canopy_height"][::5, ::5].plot()
<matplotlib.collections.QuadMesh at 0x7e5aa1cf3610>
ds = msds.stores.storage.open_data("senf.zarr")
ds
<xarray.Dataset> Size: 47GB
Dimensions: (time: 39, lat: 5199, lon: 7000)
Coordinates:
* time (time) datetime64[ns] 312B 1985-01-01 ... 2...
* lat (lat) float64 42kB 44.8 44.8 ... 43.5 43.5
* lon (lon) float64 56kB -1.25 -1.25 ... 0.4999
spatial_ref int64 8B ...
Data variables:
annual_disturbances (time, lat, lon) float64 11GB dask.array<chunksize=(1, 4000, 4000), meta=np.ndarray>
disturbance_agent (time, lat, lon) float64 11GB dask.array<chunksize=(1, 4000, 4000), meta=np.ndarray>
disturbance_agent_aggregated (lat, lon) float64 291MB dask.array<chunksize=(4000, 4000), meta=np.ndarray>
disturbance_probability (time, lat, lon) float64 11GB dask.array<chunksize=(1, 4000, 4000), meta=np.ndarray>
disturbance_severity (time, lat, lon) float64 11GB dask.array<chunksize=(1, 4000, 4000), meta=np.ndarray>
forest_mask (lat, lon) float64 291MB dask.array<chunksize=(4000, 4000), meta=np.ndarray>
greatest_disturbance (lat, lon) float64 291MB dask.array<chunksize=(4000, 4000), meta=np.ndarray>
latest_disturbance (lat, lon) float64 291MB dask.array<chunksize=(4000, 4000), meta=np.ndarray>
number_disturbances           (lat, lon) float64 291MB dask.array<chunksize=(4000, 4000), meta=np.ndarray>
ds.annual_disturbances.encoding
{'chunks': (1, 4000, 4000),
'preferred_chunks': {'time': 1, 'lat': 4000, 'lon': 4000},
'compressor': Blosc(cname='lz4', clevel=5, shuffle=SHUFFLE, blocksize=0),
'filters': None,
'_FillValue': np.uint8(255),
'scale_factor': 1.0,
'add_offset': 0.0,
'dtype': dtype('uint8'),
'coordinates': 'spatial_ref'}
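The encoding above describes the on-disk representation: unsigned 8-bit integers with 255 as the fill value, plus an identity scale/offset. On opening, xarray's CF decoding masks the fill value to NaN and maps the remaining values via raw * scale_factor + add_offset. A small numpy sketch of that decoding step (the example values are illustrative):

```python
import numpy as np

# Raw on-disk values as uint8; 255 marks missing pixels (_FillValue).
raw = np.array([[0, 3, 255]], dtype=np.uint8)
scale_factor, add_offset = 1.0, 0.0

# CF-style decoding: mask the fill value, then apply scale and offset.
decoded = np.where(raw == 255, np.nan, raw * scale_factor + add_offset)

print(decoded)  # [[ 0.  3. nan]]
```

This is why the in-memory array is float (NaN needs a floating-point type) even though the stored data is uint8.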
ds["annual_disturbances"].isel(time=10)[::5, ::5].plot()
<matplotlib.collections.QuadMesh at 0x7e5aa0881d10>
ds["disturbance_probability"].isel(time=10)[::5, ::5].plot()
<matplotlib.collections.QuadMesh at 0x7e5aa08ff4d0>