EO-LINCS project: Cube generation for Scientific Case Study (SCS) 4¶
EO enhanced benchmarking of GCB DGVMs¶
Objective: SCS4 aims to deepen the understanding of the processes that drive the European land carbon sink, with a focus on productivity, turnover, and the impacts of disturbances and land management. Leveraging new EO data and the International Land Model Benchmarking (ILAMB) system, it will assess Dynamic Global Vegetation Models (DGVMs) that contribute to the Global Carbon Budget (GCB) reports. The project will result in an enhanced ILAMB tool, offering insights into carbon dynamics and DGVM performance, and providing a roadmap for future model improvements.
Outcomes: An enhanced ILAMB evaluation tool with a focus on internal carbon dynamics and temporal change, able to provide novel insights into DGVM capabilities to simulate the European land carbon sink and identify its main drivers. The spatiotemporal analysis will enable us to produce a roadmap for model improvements, in particular regarding forest management.
Required datasets:
The following notebook shows how users can load data from various sources defined in `scs4_config.yml` using the `MultiSourceDataStore` tool.
What You Can Do with This Notebook¶
- Load datasets from various sources as defined in `scs4_config.yml`.
- View the progress of each data request to the `MultiSourceDataStore`.
- Quickly preview the datasets by plotting them.
Requirements¶
Before proceeding, ensure you have the necessary dependencies installed:
- Install `xcube-multistore` by executing: `conda install --channel conda-forge xcube-multistore`
Once you have it installed, you are ready to proceed.
The `MultiSourceDataStore` is driven by a configuration file called `scs4_config.yml`, which resides in the same directory as this notebook. To understand what goes into the schema, you can read more here.
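To give a feel for the shape of such a configuration, here is a purely illustrative sketch. The `data_stores` section and the store name `storage` appear later in this notebook; every other key and value below is an assumption for illustration only, so consult `get_config_schema()` and the Configuration Guide for the actual schema:

```yaml
# Hypothetical sketch only -- the real keys are defined by the
# xcube-multistore configuration schema, not by this example.
data_stores:                # where generated datasets are written
  - identifier: storage     # the store opened later in this notebook
    store_id: file          # assumed: a local filesystem store
    store_params:
      root: ./data          # assumed output directory
datasets:                   # assumed: the datasets to request
  - identifier: biomass_xu
  - identifier: esa_cci_biomass
```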
Let's import the `MultiSourceDataStore`:
from xcube_multistore.multistore import MultiSourceDataStore
You can learn how to fill out the config file using the `get_config_schema()` function. Run it and expand the fields to explore the properties that the configuration file accepts, alongside the Configuration Guide.
MultiSourceDataStore.get_config_schema()
<xcube.util.jsonschema.JsonObjectSchema at 0x78d0d85def00>
Now, we can initialize the `MultiSourceDataStore` by passing the path to `scs4_config.yml`, which currently sits at the same level as this notebook. Running the cell below displays a progress table for each dataset requested in `scs4_config.yml`.
NOTE: In `scs4_config.yml` we also use the `custom_processing` feature of this tool, which allows us to run a function to process each dataset separately. In this example, we have defined a module with a function called `modify_dataset` that performs custom processing: it takes an `xarray.Dataset` as input and returns a new `xarray.Dataset` object. To read more about the `custom_processing` feature, you can see more here.
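To make the contract described above concrete, here is a minimal sketch of what such a `modify_dataset` function could look like. The variable name `agb` and the unit conversion are illustrative assumptions, not the actual processing used in this project; the only part taken from the text is the signature: an `xarray.Dataset` in, a new `xarray.Dataset` out.

```python
import numpy as np
import xarray as xr


def modify_dataset(ds: xr.Dataset) -> xr.Dataset:
    """Illustrative custom-processing step: Dataset in, new Dataset out.

    Hypothetical example: rescale an above-ground biomass variable
    from Mg/ha to kg/m2 (1 Mg/ha = 0.1 kg/m2) and update its units.
    """
    ds = ds.copy()  # work on a copy so the input dataset is untouched
    if "agb" in ds:
        ds["agb"] = ds["agb"] * 0.1
        ds["agb"].attrs["units"] = "kg m-2"
    return ds
```

Keeping the function pure (returning a new object instead of mutating its input) matches the in/out contract stated above and avoids surprising side effects when the multistore applies it per dataset.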
msds = MultiSourceDataStore("scs4_config.yml")
<frozen abc>:106: FutureWarning: xarray subclass VectorDataCube should explicitly define __slots__
| Dataset identifier | Status | Message | Exception |
| --- | --- | --- | --- |
| biomass_xu | STOPPED | Dataset 'biomass_xu' finished. | - |
| esa_cci_biomass | STOPPED | Dataset 'esa_cci_biomass' finished. | - |
/home/konstantin/micromamba/envs/xcube-multistore/lib/python3.12/site-packages/xcube_cci/cciodp.py:2043: CciOdpWarning: Variable "crs" has no fill value, cannot set one. For parts where no data is available you will see random values. This is usually the case when data is missing for a time step. warnings.warn(f'Variable "{fixed_key}" has no fill value, '
We can now open the data using the xcube data store framework API as usual. Note that the multi-source data store requires a data store named `storage`, which is configured in our `scs4_config.yml` under the `data_stores` section.
ds = msds.stores.storage.open_data("biomass_xu.nc", chunks=dict(time=1))
ds
<xarray.Dataset> Size: 518MB Dimensions: (time: 20, lat: 1800, lon: 3600) Coordinates: * time (time) datetime64[ns] 160B 2000-01-01 ... 2019-01-01 * lon (lon) float64 29kB -179.9 -179.8 -179.8 ... 179.8 179.9 * lat (lat) float64 14kB 89.95 89.85 89.75 ... -89.85 -89.95 spatial_ref int64 8B ... Data variables: carbon_density (time, lat, lon) float32 518MB dask.array<chunksize=(1, 1800, 3600), meta=np.ndarray> Attributes: source: https://zenodo.org/records/4161694/files/test... geospatial_lon_units: degrees_east geospatial_lon_min: -180 geospatial_lon_max: 179.90000000000003 geospatial_lon_resolution: 0.1 geospatial_lat_units: degrees_north geospatial_lat_min: -89.90000000000003 geospatial_lat_max: 90 geospatial_lat_resolution: 0.1 geospatial_bounds_crs: CRS84 geospatial_bounds: POLYGON((-180 -89.90000000000003, -180 90, 17... date_modified: 2025-03-25T07:13:08.888986
We can now select a variable for one timestep and plot it for a quick preview of the data:
ds.carbon_density.isel(time=1).plot()
<matplotlib.collections.QuadMesh at 0x78d0cce4d160>
ds = msds.stores.storage.open_data("esa_cci_biomass.nc", chunks=dict(time=1))
ds
<xarray.Dataset> Size: 415MB Dimensions: (time: 8, lat: 1800, lon: 3600) Coordinates: * time (time) datetime64[ns] 64B 2010-01-01 2015-01-01 ... 2021-01-01 * lon (lon) float64 29kB -179.9 -179.8 -179.8 ... 179.8 179.8 179.9 * lat (lat) float64 14kB 89.95 89.85 89.75 ... -89.75 -89.85 -89.95 spatial_ref int64 8B ... Data variables: agb (time, lat, lon) float32 207MB dask.array<chunksize=(1, 1800, 3600), meta=np.ndarray> agb_sd (time, lat, lon) float32 207MB dask.array<chunksize=(1, 1800, 3600), meta=np.ndarray> Attributes: (12/19) Conventions: CF-1.7 title: esacci.BIOMASS.yr.L4.AGB.multi-sensor.multi-p... date_created: 2025-03-25T07:13:49.750925 processing_level: L4 time_coverage_start: 2010-01-01T00:00:00 time_coverage_end: 2021-01-01T00:00:00 ... ... geospatial_lat_min: -89.9 geospatial_lat_max: 90 geospatial_lat_resolution: 0.1 geospatial_bounds_crs: CRS84 geospatial_bounds: POLYGON((-180 -89.9, -180 90, 179.90000000000... date_modified: 2025-03-25T07:13:49.851800
ds.agb.isel(time=0).plot()
<matplotlib.collections.QuadMesh at 0x78d0ccd1fd10>
ds.agb_sd.isel(time=0).plot()
<matplotlib.collections.QuadMesh at 0x78d0ccc673e0>