EO-LINCS project: Cube generation for Scientific Case Study (SCS) 4¶
EO enhanced benchmarking of GCB DGVMs¶
Objective: SCS4 aims to deepen the understanding of the processes that drive the European land carbon sink, with a focus on productivity, turnover, and the impacts of disturbances and land management. Leveraging new EO data and the International Land Model Benchmarking (ILAMB) system, it will assess Dynamic Global Vegetation Models (DGVMs) that contribute to the Global Carbon Budget (GCB) reports. The project will result in an enhanced ILAMB tool, offering insights into carbon dynamics and DGVM performance, and providing a roadmap for future model improvements.
Outcomes: An enhanced ILAMB evaluation tool with a focus on internal carbon dynamics and temporal change, able to provide novel insights into DGVM capabilities to simulate the European land carbon sink and identify its main drivers. The spatiotemporal analysis will enable us to produce a roadmap for model improvements, in particular regarding forest management.
Required datasets:
The following notebook shows how users can load data from various sources defined in `scs4_config.yml` using the `MultiSourceDataStore` tool.
What You Can Do with This Notebook¶
- Load datasets from various sources as defined in `scs4_config.yml`.
- View the progress of each data request to the `MultiSourceDataStore`.
- Quickly preview the datasets by plotting them.
Requirements¶
Before proceeding, ensure you have the necessary dependencies installed:
- Install `xcube-multistore` by executing: `conda install --channel conda-forge xcube-multistore`
Once you have it installed, you are ready to proceed.
The `MultiSourceDataStore` is driven by a configuration file called `scs4_config.yml`, which resides in the same directory as this notebook. To understand what goes into the schema, you can read more here.
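To give a feel for the shape of such a configuration, here is a purely illustrative sketch. The `data_stores` section and the store name `storage` appear later in this notebook; every other key and value below is an assumption for illustration only, so consult `get_config_schema()` and the Configuration Guide for the actual schema:

```yaml
# Hypothetical sketch only -- the real keys are defined by the
# xcube-multistore configuration schema, not by this example.
data_stores:                # where generated datasets are written
  - identifier: storage     # the store opened later in this notebook
    store_id: file          # assumed: a local filesystem store
    store_params:
      root: ./data          # assumed output directory
datasets:                   # assumed: the datasets to request
  - identifier: biomass_xu
  - identifier: esa_cci_biomass
```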
Let's import the `MultiSourceDataStore`:
from xcube_multistore.multistore import MultiSourceDataStore
You can learn how to fill out the config file using the `get_config_schema()` function. Run it and expand the fields to explore the properties that the configuration file accepts, alongside the Configuration Guide.
MultiSourceDataStore.get_config_schema()
<xcube.util.jsonschema.JsonObjectSchema at 0x78d0d85def00>
Now, we can initialize the `MultiSourceDataStore` by passing the path to `scs4_config.yml`, which currently sits at the same level as this notebook. Running the cell below displays a progress table for each dataset requested in `scs4_config.yml`.
NOTE: In `scs4_config.yml` we also use the `custom_processing` feature of this tool, which allows us to run a function to process each dataset separately. In this example, we have defined a module with a function called `modify_dataset` that performs custom processing: it takes an `xarray.Dataset` as input and returns a new `xarray.Dataset` object. To read more about the `custom_processing` feature, you can see more here.
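To make the contract described above concrete, here is a minimal sketch of what such a `modify_dataset` function could look like. The variable name `agb` and the unit conversion are illustrative assumptions, not the actual processing used in this project; the only part taken from the text is the signature: an `xarray.Dataset` in, a new `xarray.Dataset` out.

```python
import numpy as np
import xarray as xr


def modify_dataset(ds: xr.Dataset) -> xr.Dataset:
    """Illustrative custom-processing step: Dataset in, new Dataset out.

    Hypothetical example: rescale an above-ground biomass variable
    from Mg/ha to kg/m2 (1 Mg/ha = 0.1 kg/m2) and update its units.
    """
    ds = ds.copy()  # work on a copy so the input dataset is untouched
    if "agb" in ds:
        ds["agb"] = ds["agb"] * 0.1
        ds["agb"].attrs["units"] = "kg m-2"
    return ds
```

Keeping the function pure (returning a new object instead of mutating its input) matches the in/out contract stated above and avoids surprising side effects when the multistore applies it per dataset.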
msds = MultiSourceDataStore("scs4_config.yml")
<frozen abc>:106: FutureWarning: xarray subclass VectorDataCube should explicitly define __slots__
| Dataset identifier | Status | Message | Exception |
| --- | --- | --- | --- |
| biomass_xu | STOPPED | Dataset 'biomass_xu' finished. | - |
| esa_cci_biomass | STOPPED | Dataset 'esa_cci_biomass' finished. | - |
/home/konstantin/micromamba/envs/xcube-multistore/lib/python3.12/site-packages/xcube_cci/cciodp.py:2043: CciOdpWarning: Variable "crs" has no fill value, cannot set one. For parts where no data is available you will see random values. This is usually the case when data is missing for a time step. warnings.warn(f'Variable "{fixed_key}" has no fill value, '
We can now open the data using the xcube data store framework API as usual. Note that the multi-source data store requires a data store named `storage`, which is configured in our `scs4_config.yml` under the `data_stores` section.
ds = msds.stores.storage.open_data("biomass_xu.nc", chunks=dict(time=1))
ds
<xarray.Dataset> Size: 518MB Dimensions: (time: 20, lat: 1800, lon: 3600) Coordinates: * time (time) datetime64[ns] 160B 2000-01-01 ... 2019-01-01 * lon (lon) float64 29kB -179.9 -179.8 -179.8 ... 179.8 179.9 * lat (lat) float64 14kB 89.95 89.85 89.75 ... -89.85 -89.95 spatial_ref int64 8B ... Data variables: carbon_density (time, lat, lon) float32 518MB dask.array<chunksize=(1, 1800, 3600), meta=np.ndarray> Attributes: source: https://zenodo.org/records/4161694/files/test... geospatial_lon_units: degrees_east geospatial_lon_min: -180 geospatial_lon_max: 179.90000000000003 geospatial_lon_resolution: 0.1 geospatial_lat_units: degrees_north geospatial_lat_min: -89.90000000000003 geospatial_lat_max: 90 geospatial_lat_resolution: 0.1 geospatial_bounds_crs: CRS84 geospatial_bounds: POLYGON((-180 -89.90000000000003, -180 90, 17... date_modified: 2025-03-25T07:13:08.888986
We can now select a variable for one timestep and plot it for a quick preview of the data:
ds.carbon_density.isel(time=1).plot()
<matplotlib.collections.QuadMesh at 0x78d0cce4d160>
ds = msds.stores.storage.open_data("esa_cci_biomass.nc", chunks=dict(time=1))
ds
<xarray.Dataset> Size: 415MB Dimensions: (time: 8, lat: 1800, lon: 3600) Coordinates: * time (time) datetime64[ns] 64B 2010-01-01 2015-01-01 ... 2021-01-01 * lon (lon) float64 29kB -179.9 -179.8 -179.8 ... 179.8 179.8 179.9 * lat (lat) float64 14kB 89.95 89.85 89.75 ... -89.75 -89.85 -89.95 spatial_ref int64 8B ... Data variables: agb (time, lat, lon) float32 207MB dask.array<chunksize=(1, 1800, 3600), meta=np.ndarray> agb_sd (time, lat, lon) float32 207MB dask.array<chunksize=(1, 1800, 3600), meta=np.ndarray> Attributes: (12/19) Conventions: CF-1.7 title: esacci.BIOMASS.yr.L4.AGB.multi-sensor.multi-p... date_created: 2025-03-25T07:13:49.750925 processing_level: L4 time_coverage_start: 2010-01-01T00:00:00 time_coverage_end: 2021-01-01T00:00:00 ... ... geospatial_lat_min: -89.9 geospatial_lat_max: 90 geospatial_lat_resolution: 0.1 geospatial_bounds_crs: CRS84 geospatial_bounds: POLYGON((-180 -89.9, -180 90, 179.90000000000... date_modified: 2025-03-25T07:13:49.851800
ds.agb.isel(time=0).plot()
<matplotlib.collections.QuadMesh at 0x78d0ccd1fd10>
ds.agb_sd.isel(time=0).plot()
<matplotlib.collections.QuadMesh at 0x78d0ccc673e0>