EO-LINCS project: Cube generation for Scientific Case Study (SCS) 1¶
Explanatory power of novel EO data streams for predicting net carbon fluxes¶
Objective: SCS1 aims to link EO data streams to in situ data to predict carbon, water, and energy fluxes. The approach follows the FLUXCOM-X methodology, in which meteorological and reflectance data from satellites are used as input to train a machine learning model on the target flux. Although the data extraction uses the FLUXCOM-X methodology as a test case, the pipeline applies to any use case that matches EO data to eddy covariance measurements. The provided test case is based on eddy covariance data from FLUXNET and associated networks such as the Integrated Carbon Observation System (ICOS) and AmeriFlux.
Outcomes: A working data processing chain that incorporates new EO data, including Sentinel-2 and Sentinel-3 data, into the FLUXCOM-X framework and that can be updated and extended to all sites and other data products. The example case demonstrates the utility of these new data streams for predicting net ecosystem exchange (NEE) and analyses their added value with regard to interannual variability, drought responses, and disturbance.
Required datasets:
For this science case, we use small spatial cutouts around flux towers from the following datasets:
- Sentinel-2 L2A
- Sentinel-3 SYNERGY
- Sentinel-3 SLSTR Land Surface Temperature
- ERA5 hourly timeseries
- ESA CCI Biomass
- ESA CCI Biomass by EOForestSTAC
- Robinson et al. – Chapman-Richards growth-curve parameters for secondary-forest aboveground carbon dynamics
- Saatchi & Yu - 2020 Global aboveground biomass
- Hansen Global Forest Change (GFC) – Tree cover and annual loss/gain
- Global Age Mapping Integration (GAMI) – Global forest age ensemble
- Global Canopy Height (Potapov et al.)
- Global Ecosystem Dynamics Investigation (GEDI)
Overview of the Cube Generation Pipeline¶
This notebook demonstrates how users can access the required datasets via the xcube Multi-Source Data Store. The full example is available in the GitHub repository. All parameters controlling the cube generation workflow are defined in the configuration file config.yml.
Comprehensive documentation of the configuration schema is available on the documentation webpage. A step-by-step example showing how to set up a configuration file that draws on multiple data stores is provided in the notebook Setup Config YAML File.
In this notebook, we provide:
- A concise preview of the configuration file (summarized in a table)
- The requested geospatial domain
- A walkthrough of the cube generation process
Requirements¶
Before proceeding, ensure that all required dependencies are installed.
Create the conda environment using the following command:
conda env create -f environment.yml
Then activate the environment so that this notebook can run with its kernel:
conda activate eo-lincs-scs1
Next, accessing ERA5 reanalysis data via the Copernicus Data Store (CDS) requires a valid CDS API key. This can be obtained by following the instructions in the xcube-cds documentation.
Once obtained, the credentials must be added to the configuration file. Because the configuration file is generated programmatically while iterating over the flux tower sites, the credentials should be inserted into the Python code shown below, at the end of Cell 7 in the section labeled # add CDS data store.
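To avoid typing the key directly into the notebook, the CDS store entry could be built from an environment variable. The following is a minimal sketch: the helper `build_cds_store` and the variable name `CDS_API_KEY` are assumptions made for illustration, while the store parameters mirror the `# add CDS data store` section of this notebook.

```python
import os
from typing import Optional


def build_cds_store(api_key: Optional[str] = None) -> dict:
    """Return the CDS data store entry for the configuration dictionary."""
    # CDS_API_KEY is a hypothetical variable name chosen for this sketch.
    key = api_key or os.environ.get("CDS_API_KEY")
    if not key:
        raise ValueError("No CDS API key found: set CDS_API_KEY or pass api_key.")
    return dict(
        identifier="cds",
        store_id="cds",
        store_params=dict(
            endpoint_url="https://cds.climate.copernicus.eu/api",
            cds_api_key=key,
            normalize_names=True,
        ),
    )


cds_store = build_cds_store(api_key="my-secret-key")
print(cds_store["store_params"]["endpoint_url"])
```

Note that the key is still written in plain text to config.yml when the configuration is dumped, so that file should be kept out of version control.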
Running the Pipeline¶
Once all dependencies are installed, the final data cubes can be generated by executing this notebook.
The results will be stored locally in the data directory, as defined in the configuration YAML file. Note that this is a user-defined choice and can be changed to any file-system-based data store (e.g. "file", "s3"). The only requirement is that a writable data store named storage is provided (see the documentation).
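As a sketch of this choice, the entry for the writable store can be produced by a small helper so that only the store ID and root change. The helper name `make_storage_store` and the bucket path are hypothetical. Any credentials or further parameters an S3 store needs are not shown here and should be taken from the xcube documentation.

```python
def make_storage_store(store_id: str = "file", root: str = "../data") -> dict:
    """Build the entry for the writable store, which must be named 'storage'."""
    return dict(
        identifier="storage",
        store_id=store_id,
        store_params=dict(root=root),
    )


local_store = make_storage_store()
# "my-bucket/eo-lincs" is a hypothetical bucket path used for illustration
s3_store = make_storage_store(store_id="s3", root="my-bucket/eo-lincs")
print(local_store)
print(s3_store)
```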
Initial visualizations of the generated cubes are displayed at the end of the notebook.
Imports¶
Let's import everything we need:
import yaml
import pandas as pd
import utm
import pyproj
import matplotlib.pyplot as plt
from xcube_multistore import MultiSourceDataStore
Configuration of the Cube Generation Process¶
The full data cube generation is controlled via a configuration YAML file, which must be prepared in advance. This configuration is highly flexible, allowing users to customize all aspects of the processing workflow within the Multi-Source Data Store, while also serving as a reproducible recipe of the cube generation process.
You can explore the available options for the configuration file using the function get_config_schema(). Run it and expand the fields to see all the properties that can be set, in combination with the Configuration Guide.
For guidance on creating your configuration file, you can also refer to the notebook Setup Config YAML File or consult the Configuration Guide in the documentation.
MultiSourceDataStore.get_config_schema()
<xcube.util.jsonschema.JsonObjectSchema at 0x73b3f6874050>
Since this science case focuses on small spatial cutouts and long time series around flux towers, the configuration for the MultiSourceDataStore is generated programmatically. This enables efficient iteration over individual flux tower sites.
For Sentinel-2 access, a time-series mode has been implemented in xcube-stac. It identifies the tiles corresponding to the closest WGS Sentinel-2 grid and performs spatial sub-setting while preserving the native 10 m resolution. Bands at lower resolutions (20 m and 60 m) are resampled to the 10 m grid. This approach preserves the data in its most native form and minimises processing during cube generation.
This mode is available for both Planetary Computer and CDSE. On Planetary Computer, data is stored as GeoTIFF, allowing efficient chunked access, while on CDSE it is stored as JPEG2000, which requires full tile downloads. As a result, Planetary Computer is roughly 10× faster than CDSE when retrieving small cutouts. Switching between the two data stores only requires updating the store configuration. Replace the respective data store in the configuration YAML:
- identifier: stac-pc
store_id: stac-pc-ardc
with:
- identifier: stac-cdse
store_id: stac-cdse-ardc
store_params:
key: <CDSE S3 key>
secret: <CDSE S3 secret>
The required CDSE S3 credentials can be generated by following the CDSE S3 documentation.
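When the configuration is built as a Python dictionary, the same switch can be sketched as follows. Because the YAML snippet also changes the store identifier, every dataset that points to the old identifier has to be re-pointed. The helper `switch_store` is an assumption for illustration; data IDs and asset names are left untouched and may need adapting for the target catalog.

```python
import copy


def switch_store(config: dict, old_id: str, new_store: dict) -> dict:
    """Return a copy of config with one data store replaced and datasets re-pointed."""
    new_config = copy.deepcopy(config)
    new_config["data_stores"] = [
        new_store if store["identifier"] == old_id else store
        for store in new_config["data_stores"]
    ]
    for dataset in new_config["datasets"]:
        if dataset["store"] == old_id:
            dataset["store"] = new_store["identifier"]
    return new_config


cdse_store = dict(
    identifier="stac-cdse",
    store_id="stac-cdse-ardc",
    store_params=dict(key="<CDSE S3 key>", secret="<CDSE S3 secret>"),
)
example = dict(
    datasets=[dict(identifier="AU-How/sen2", store="stac-pc", data_id="sentinel-2-l2a")],
    data_stores=[dict(identifier="stac-pc", store_id="stac-pc-ardc")],
)
switched = switch_store(example, "stac-pc", cdse_store)
print(switched["datasets"][0]["store"])
```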
For Sentinel-3 SYNERGY and LST access, the main bottleneck is that the data is stored on an irregular satellite observation grid, so rectification is required to obtain an analysis-ready data cube. This is performed using xcube-resampling. Before rectification, subsetting is necessary, but currently the full latitude-longitude grid must be loaded to determine the region of interest. Efforts are underway to estimate the region without loading the full grid and to subset the dataset before any processing.
However, for LST, significant improvements are unlikely because tiles (1200×1500) are loaded as a single chunk, meaning that any subsetting requires downloading the full observation. Additionally, LST retrieval is slow because terrain correction is not applied in the product; viewing angles and elevation data must be loaded and upsampled, and terrain-induced orthorectification is applied on the fly. All of this makes retrieving Sentinel-3 for long time series and small cutouts infeasibly slow, though improvements are being worked on. For larger regions of interest, performance is reasonable due to the large swath coverage of Sentinel-3, as shown in the xcube-stac examples. In this example we include Sentinel-3 SYNERGY to demonstrate its integration into the xcube Multi-Source Data Store. Once Sentinel-3 access has improved, we plan to include LST data as well.
For ERA5, the newly published ERA5 hourly time series dataset (reanalysis-era5-single-levels-timeseries) is used. It provides point-level access, although only a limited set of variables is available, which is sufficient for this science case. This allows very fast access to ERA5 time series data.
GEDI data is retrieved using xcube-gedidb, which connects to gediDB. The result is a vector dataset where each LiDAR pulse is stored as a latitude-longitude point, yielding a scattered point dataset.
The ESA CCI dataset is accessed using xcube-cci. Data is loaded lazily and resampled on the fly to match the Sentinel-2 grid. This alignment can be specified in the configuration by setting the grid_mapping field to the Sentinel-2 data ID.
The remaining datasets are accessed through the EO Forest STAC Catalog. All datasets in this catalog are openly available without authentication and are stored in the Zarr format, enabling efficient lazy loading and chunked access. They can also be resampled on the fly to match the spatial resolution and grid of the Sentinel-2 data.
We will now create a configuration file for a flux tower site as an example. The coordinates of the flux tower sites are given in sites.csv. A short time range is selected because Sentinel-3 retrieval is currently slow, although improvements are in progress. Sentinel-2 L2A, in contrast, works well for long time series, as shown in SCS3.
sites = pd.read_csv("sites.csv")
sites = sites.iloc[1:2]
sites
| | Site ID | latitude | longitude | IGBP |
|---|---|---|---|---|
| 1 | AU-How | -12.4943 | 131.1523 | WSA |
time_range = ["2020-04-15", "2020-04-30"]
Here we define a helper function that retrieves the UTM CRS and calculates a bounding box from a latitude-longitude point. This is used to generate inputs for Sentinel-3 based on the Sentinel-2 time series mode.
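def get_utm_bbox_crs(
    lon: float | int, lat: float | int, bbox_width: int
) -> tuple[str, list[float | int]]:
    # utm.from_latlon returns the latitude band letter (C to X), not a
    # hemisphere flag, so the hemisphere is derived from the sign of the latitude
    x, y, zone_number, _zone_letter = utm.from_latlon(lat, lon)
    epsg_code = 32600 + zone_number if lat >= 0 else 32700 + zone_number
    crs = f"EPSG:{epsg_code}"
    # snap the centre to whole metres before building the square bounding box
    x = int(x)
    y = int(y)
    half_width = bbox_width / 2
    bbox = [x - half_width, y - half_width, x + half_width, y + half_width]
    return crs, bbox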
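As a cross-check that does not depend on the `utm` package, the EPSG code of the UTM zone can be derived from the longitude and the sign of the latitude alone. The sketch below uses the standard 6° zone rule and deliberately ignores the special zones around Norway and Svalbard; the function name `utm_epsg_from_lonlat` is an assumption for illustration.

```python
def utm_epsg_from_lonlat(lon: float, lat: float) -> int:
    """Return the WGS 84 / UTM EPSG code for a point (standard 6-degree zones only)."""
    # zones are numbered 1..60 eastwards starting at 180 degrees west
    zone_number = min(int((lon + 180) // 6) + 1, 60)
    # 326xx codes are northern-hemisphere zones, 327xx southern-hemisphere zones
    base = 32600 if lat >= 0 else 32700
    return base + zone_number


# AU-How flux tower (longitude, latitude) from sites.csv
print(utm_epsg_from_lonlat(131.1523, -12.4943))
```

For the AU-How site this gives 32752, which matches the `crs: EPSG:32752` entry shown in the configuration overview for the Sentinel-3 dataset.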
Here we define data IDs for the datasets accessed via the EO Forest STAC Catalog. Note that some datasets require custom processing, for which we specify the name of the modifying function stored in utils.py.
foresteo_data_ids = [
('biomass-carbon/CCI_BIOMASS/CCI_BIOMASS_v6.0/CCI_BIOMASS_v6.0.json', "modify_cci_biomass"),
('biomass-carbon/SAATCHI_BIOMASS/SAATCHI_BIOMASS_v2.0/SAATCHI_BIOMASS_v2.0.json', None),
('biomass-carbon/ROBINSON_CR/ROBINSON_CR_v1.0/ROBINSON_CR_v1.0.json', None),
('disturbance-change/HANSEN_GFC/HANSEN_GFC_v1.12/HANSEN_GFC_v1.12.json', None),
('structure-demography/GAMI/GAMI_v3.1/GAMI_v3.1.json', "modify_gami"),
('structure-demography/POTAPOV_HEIGHT/POTAPOV_HEIGHT_v1.0/POTAPOV_HEIGHT_v1.0.json', None),
]
In the next cell, we create the full configuration and then write it to a YAML file for persistent storage.
config = dict(datasets=[])
for index, site in sites.iterrows():
    # append config for Sentinel-2
    config_ds = dict(
        identifier=f"{site['Site ID']}/sen2",
        store="stac-pc",
        data_id="sentinel-2-l2a",
        open_params=dict(
            time_range=time_range,
            point=[site["longitude"], site["latitude"]],
            bbox_width=4000,
            spatial_res=10,
            asset_names=[
                "B01",
                "B02",
                "B03",
                "B04",
                "B05",
                "B06",
                "B07",
                "B08",
                "B8A",
                "B09",
                "B11",
                "B12",
                "SCL",
            ],
        ),
    )
    config["datasets"].append(config_ds)

    # for Sentinel-3 get UTM CRS and bbox in UTM
    # we use a bbox_width of 5000 to leave some buffer for resampling
    utm_crs, utm_bbox = get_utm_bbox_crs(site["longitude"], site["latitude"], 5000)

    # append config for Sentinel-3 SYNERGY
    config_ds = dict(
        identifier=f"{site['Site ID']}/sen3_syn",
        store="stac-pc",
        data_id="sentinel-3-synergy-syn-l2-netcdf",
        open_params=dict(
            asset_names=[
                "syn-oa04-reflectance",
                "syn-oa08-reflectance",
                "syn-oa10-reflectance",
                "syn-oa11-reflectance",
                "syn-oa12-reflectance",
                "syn-oa17-reflectance",
            ],
            time_range=time_range,
            bbox=utm_bbox,
            spatial_res=200,
            crs=utm_crs,
            add_error_bands=False,
        ),
        grid_mapping=f"{site['Site ID']}/sen2",
    )
    config["datasets"].append(config_ds)

    # append config for ERA5
    config_ds = dict(
        identifier=f"{site['Site ID']}/era5",
        store="cds",
        data_id="reanalysis-era5-single-levels-timeseries",
        open_params=dict(
            variable_names=[
                '10m_u_component_of_wind',
                '10m_v_component_of_wind',
                '2m_dewpoint_temperature',
                '2m_temperature',
                'skin_temperature',
                'surface_pressure',
                'surface_solar_radiation_downwards',
                'surface_thermal_radiation_downwards'
            ],
            time_range=["1990-01-01", "2025-12-31"],
            location=[site["longitude"], site["latitude"]],
        ),
    )
    config["datasets"].append(config_ds)

    # append config for ESA CCI
    config_ds = dict(
        identifier=f"{site['Site ID']}/esa_cci_biomass",
        store="esa_cci",
        grid_mapping=f"{site['Site ID']}/sen2",
        data_id="esacci.BIOMASS.yr.L4.AGB.multi-sensor.multi-platform.MERGED.5-0.100m",
        open_params=dict(
            time_range=time_range,
        ),
    )
    config["datasets"].append(config_ds)

    # append config for forestEO products
    for (data_id, func_name) in foresteo_data_ids:
        config_ds = dict(
            identifier=f"{site['Site ID']}/{data_id.split('/')[1]}",
            store="store_foresteo",
            data_id=data_id,
            grid_mapping=f"{site['Site ID']}/sen2",
        )
        if func_name is not None:
            config_ds["custom_processing"] = {
                "module_path": "utils",
                "function_name": func_name,
            }
        config["datasets"].append(config_ds)

    # for GEDI get Sentinel-2 bbox in WGS84
    utm_crs, utm_bbox = get_utm_bbox_crs(site["longitude"], site["latitude"], 4000)
    t = pyproj.Transformer.from_crs(utm_crs, "EPSG:4326", always_xy=True)
    wgs84_bbox = t.transform_bounds(*utm_bbox, densify_pts=101)

    # append config for GEDI
    config_ds = dict(
        identifier=f"{site['Site ID']}/gedi",
        store="store_gedi",
        data_id="all",
        open_params=dict(
            bbox=list(wgs84_bbox),
            time_range=["2019-04-01", "2025-12-31"],
            variables=[
                'agbd',
                'agbd_pi_lower',
                'agbd_pi_upper',
                'agbd_se',
                'agbd_t',
                'agbd_t_se',
                'algorithmrun_flag',
                'beam_type',
                'degrade_flag',
                'fhd_normal',
                'l2a_quality_flag',
                'l2b_quality_flag',
                'landsat_treecover',
                'leaf_off_flag',
                'num_detectedmodes',
                'omega',
                'quality_flag',
                'rh',
                'sensitivity',
                'solar_elevation',
                'surface_flag',
                'wsci',
                'wsci_pi_lower',
                'wsci_pi_upper',
                'wsci_quality_flag',
                'wsci_xy',
                'wsci_xy_pi_lower',
                'wsci_xy_pi_upper',
                'wsci_z',
                'wsci_z_pi_lower',
                'wsci_z_pi_upper'
            ]
        ),
    )
    config["datasets"].append(config_ds)

# define stores
config["data_stores"] = []

# add storage data store
config_store = dict(
    identifier="storage",
    store_id="file",
    store_params=dict(root="../data"),
)
config["data_stores"].append(config_store)

# add ESA CCI data store
config_store = dict(
    identifier="esa_cci",
    store_id="cciodp",
)
config["data_stores"].append(config_store)

# add STAC data store
config_store = dict(
    identifier="stac-pc",
    store_id="stac-pc-ardc",
)
config["data_stores"].append(config_store)

# add CDS data store
config_store = dict(
    identifier="cds",
    store_id="cds",
    store_params=dict(
        endpoint_url="https://cds.climate.copernicus.eu/api",
        cds_api_key="cds-api-key",  # replace with your own CDS API key
        normalize_names=True,
    ),
)
config["data_stores"].append(config_store)

# add ForestEO STAC data store
config_store = dict(
    identifier="store_foresteo",
    store_id="stac",
    store_params={
        "url": "https://s3.gfz-potsdam.de/dog.atlaseo-glm.eo-gridded-data/collections/public/catalog.json",
    }
)
config["data_stores"].append(config_store)

# add GEDI data store
config_store = dict(
    identifier="store_gedi",
    store_id="gedidb",
)
config["data_stores"].append(config_store)

with open("config.yml", "w") as file:
    yaml.dump(config, file, sort_keys=False)
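Because the configuration is assembled by hand, a quick consistency check before starting the generation can catch dangling references. The sketch below is not part of xcube-multistore; the helper `check_config_references` is hypothetical and assumes, as in this notebook, that every `grid_mapping` value names another dataset identifier.

```python
def check_config_references(config: dict) -> list:
    """Return human-readable problems found in a configuration dictionary."""
    problems = []
    store_ids = {store["identifier"] for store in config.get("data_stores", [])}
    dataset_ids = {ds["identifier"] for ds in config.get("datasets", [])}
    # the multi-source data store needs a writable store named "storage"
    if "storage" not in store_ids:
        problems.append("no data store named 'storage'")
    for ds in config.get("datasets", []):
        if ds["store"] not in store_ids:
            problems.append(f"{ds['identifier']}: unknown store '{ds['store']}'")
        grid_mapping = ds.get("grid_mapping")
        if grid_mapping is not None and grid_mapping not in dataset_ids:
            problems.append(
                f"{ds['identifier']}: unknown grid_mapping '{grid_mapping}'"
            )
    return problems


example_config = dict(
    datasets=[
        dict(identifier="AU-How/sen2", store="stac-pc"),
        dict(identifier="AU-How/era5", store="cds"),
        dict(identifier="AU-How/GAMI", store="store_foresteo", grid_mapping="AU-How/sen2"),
    ],
    data_stores=[
        dict(identifier="storage", store_id="file"),
        dict(identifier="stac-pc", store_id="stac-pc-ardc"),
        dict(identifier="store_foresteo", store_id="stac"),
    ],
)
for problem in check_config_references(example_config):
    print(problem)
```

In this reduced example the ERA5 dataset refers to a `cds` store that is missing from `data_stores`, so one problem is reported.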
Now we can initialize the MultiSourceDataStore by passing the path to config.yml, which is located in the same directory as this notebook.
Note: The following cell will produce a NoConformsTo warning. This occurs because the EO Forest STAC is a static STAC catalog rather than a full STAC API. It does not expose search capabilities or advertise STAC API conformance classes. This warning is harmless, as xcube-stac can also work with static STAC catalogs.
msds = MultiSourceDataStore("config.yml")
/home/konstantin/micromamba/envs/eo-lincs-scs1/lib/python3.13/site-packages/pystac_client/client.py:191: NoConformsTo: Server does not advertise any conformance classes. warnings.warn(NoConformsTo())
And we can display the overview of the configuration file for each dataset.
msds.display_config()
| User-defined ID | Data Store ID | Data Store Params | Data ID | Open Data Params | Grid-Mapping | Format |
|---|---|---|---|---|---|---|
| AU-How/sen2 | stac-pc-ardc | - | sentinel-2-l2a | time_range: ['2020-04-15', '2020-04-30']; point: [131.1523, -12.4943]; bbox_width: 4000; spatial_res: 10; asset_names: ['B01', 'B02', 'B03', 'B04', 'B05', 'B06', 'B07', 'B08', 'B8A', 'B09', 'B11', 'B12', 'SCL'] | - | Zarr |
| AU-How/sen3_syn | stac-pc-ardc | - | sentinel-3-synergy-syn-l2-netcdf | asset_names: ['syn-oa04-reflectance', 'syn-oa08-reflectance', 'syn-oa10-reflectance', 'syn-oa11-reflectance', 'syn-oa12-reflectance', 'syn-oa17-reflectance']; time_range: ['2020-04-15', '2020-04-30']; bbox: [731412.0, 8615335.0, 736412.0, 8620335.0]; spatial_res: 200; crs: EPSG:32752; add_error_bands: False | Like 'AU-How/sen2' | Zarr |
| AU-How/era5 | cds | endpoint_url: https://cds.climate.copernicus.eu/api; cds_api_key: cds-api-key; normalize_names: True | reanalysis-era5-single-levels-timeseries | variable_names: ['10m_u_component_of_wind', '10m_v_component_of_wind', '2m_dewpoint_temperature', '2m_temperature', 'skin_temperature', 'surface_pressure', 'surface_solar_radiation_downwards', 'surface_thermal_radiation_downwards']; time_range: ['1990-01-01', '2025-12-31']; location: [131.1523, -12.4943] | - | Zarr |
| AU-How/esa_cci_biomass | cciodp | - | esacci.BIOMASS.yr.L4.AGB.multi-sensor.multi-platform.MERGED.5-0.100m | time_range: ['2020-04-15', '2020-04-30'] | Like 'AU-How/sen2' | Zarr |
| AU-How/CCI_BIOMASS | stac | url: https://s3.gfz-potsdam.de/dog.atlaseo-glm.eo-gridded-data/collections/public/catalog.json | biomass-carbon/CCI_BIOMASS/CCI_BIOMASS_v6.0/CCI_BIOMASS_v6.0.json | - | Like 'AU-How/sen2' | Zarr |
| AU-How/SAATCHI_BIOMASS | stac | url: https://s3.gfz-potsdam.de/dog.atlaseo-glm.eo-gridded-data/collections/public/catalog.json | biomass-carbon/SAATCHI_BIOMASS/SAATCHI_BIOMASS_v2.0/SAATCHI_BIOMASS_v2.0.json | - | Like 'AU-How/sen2' | Zarr |
| AU-How/ROBINSON_CR | stac | url: https://s3.gfz-potsdam.de/dog.atlaseo-glm.eo-gridded-data/collections/public/catalog.json | biomass-carbon/ROBINSON_CR/ROBINSON_CR_v1.0/ROBINSON_CR_v1.0.json | - | Like 'AU-How/sen2' | Zarr |
| AU-How/HANSEN_GFC | stac | url: https://s3.gfz-potsdam.de/dog.atlaseo-glm.eo-gridded-data/collections/public/catalog.json | disturbance-change/HANSEN_GFC/HANSEN_GFC_v1.12/HANSEN_GFC_v1.12.json | - | Like 'AU-How/sen2' | Zarr |
| AU-How/GAMI | stac | url: https://s3.gfz-potsdam.de/dog.atlaseo-glm.eo-gridded-data/collections/public/catalog.json | structure-demography/GAMI/GAMI_v3.1/GAMI_v3.1.json | - | Like 'AU-How/sen2' | Zarr |
| AU-How/POTAPOV_HEIGHT | stac | url: https://s3.gfz-potsdam.de/dog.atlaseo-glm.eo-gridded-data/collections/public/catalog.json | structure-demography/POTAPOV_HEIGHT/POTAPOV_HEIGHT_v1.0/POTAPOV_HEIGHT_v1.0.json | - | Like 'AU-How/sen2' | Zarr |
| AU-How/gedi | gedidb | - | all | bbox: [131.13375765995252, -12.512527494209674, 131.17084382391076, -12.476087024702665]; time_range: ['2019-04-01', '2025-12-31']; variables: ['agbd', 'agbd_pi_lower', 'agbd_pi_upper', 'agbd_se', 'agbd_t', 'agbd_t_se', 'algorithmrun_flag', 'beam_type', 'degrade_flag', 'fhd_normal', 'l2a_quality_flag', 'l2b_quality_flag', 'landsat_treecover', 'leaf_off_flag', 'num_detectedmodes', 'omega', 'quality_flag', 'rh', 'sensitivity', 'solar_elevation', 'surface_flag', 'wsci', 'wsci_pi_lower', 'wsci_pi_upper', 'wsci_quality_flag', 'wsci_xy', 'wsci_xy_pi_lower', 'wsci_xy_pi_upper', 'wsci_z', 'wsci_z_pi_lower', 'wsci_z_pi_upper'] | - | Zarr |
In the following cell, we visualize the areas of interest for all datasets defined in the configuration YAML. Bounding boxes are displayed as GeoJSON polygons, while point locations are represented as GeoJSON points.
Below, the bounding box and point location for a selected site are shown as an example.
msds.display_geolocations()
Cube Generation¶
Cube generation is triggered with a single command, and each dataset defined in the configuration YAML is processed sequentially. The workflow applies the following steps as required:
- Request the dataset
- Perform dataset-specific preprocessing (e.g., spatial and/or temporal resampling)
- Write the resulting data cube using the storage datastore, whose root is defined as data in the configuration file. Each dataset is written to its corresponding identifier, such as <site_id>/sen2. For each flux tower site, all related datasets are therefore grouped within a common <site_id>/ directory, forming a single site-specific collection within the output structure.
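The resulting layout can be sketched from the dataset identifiers alone. The helper below is hypothetical; it only assumes that each identifier has the form `<site_id>/<name>` and that cubes are written with a `.zarr` extension, as in the `open_data` calls later in this notebook.

```python
from collections import defaultdict


def group_outputs_by_site(identifiers):
    """Map each site ID to the Zarr paths written below the storage root."""
    grouped = defaultdict(list)
    for identifier in identifiers:
        site_id, _name = identifier.split("/", 1)
        grouped[site_id].append(f"{identifier}.zarr")
    return dict(grouped)


layout = group_outputs_by_site(["AU-How/sen2", "AU-How/era5", "AU-How/gedi"])
for site_id, paths in layout.items():
    print(site_id, paths)
```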
Dataset Notes¶
- sen2: Generates an analysis-ready datacube by stacking multiple Sentinel-2 Level-2A observations along the time dimension and subsetting them to the defined small spatial cutout. The data remains in the native UTM grid to minimize reprojection errors.
- sen3_syn: Generates an analysis-ready datacube by rectifying Sentinel-3 SYNERGY products to a user-defined target grid. Quality flag values are included in the output. The target grid is aligned with the Sentinel-2 grid.
- sen3_lst (not yet included in this example): Generates an analysis-ready datacube by rectifying Sentinel-3 LST products to a user-defined target grid. Quality flag values are included in the output. The grid is aligned with the Sentinel-2 grid.
- era5: Provides time series data at the flux tower location for the selected variables.
- esa_cci_biomass: Accessed lazily via the xcube ESA CCI datastore, with on-the-fly resampling to match the Sentinel-2 grid.
- CCI_BIOMASS, SAATCHI_BIOMASS, ROBINSON_CR, HANSEN_GFC, GAMI, POTAPOV_HEIGHT: Each dataset is accessed lazily via the xcube STAC datastore, with on-the-fly resampling to match the Sentinel-2 grid.
- gedi: Accessed via the xcube gediDB datastore. The bounding box is derived from the Sentinel-2 configuration, retrieving all scattered GEDI data points within that spatial extent.
Note: Most of the logging output below comes from the Copernicus Data Store (CDS) during retrieval of the ERA5 reanalysis data.
msds.generate()
| Dataset identifier | Status | Message | Exception |
|---|---|---|---|
| AU-How/sen2 | COMPLETED | Dataset 'AU-How/sen2' finished: 0:02:26 | - |
| AU-How/sen3_syn | COMPLETED | Dataset 'AU-How/sen3_syn' finished: 0:15:49 | - |
| AU-How/era5 | COMPLETED | Dataset 'AU-How/era5' finished: 0:02:21 | - |
| AU-How/esa_cci_biomass | COMPLETED | Dataset 'AU-How/esa_cci_biomass' finished: 0:00:17 | - |
| AU-How/CCI_BIOMASS | COMPLETED | Dataset 'AU-How/CCI_BIOMASS' finished: 0:00:17 | - |
| AU-How/SAATCHI_BIOMASS | COMPLETED | Dataset 'AU-How/SAATCHI_BIOMASS' finished: 0:00:03 | - |
| AU-How/ROBINSON_CR | COMPLETED | Dataset 'AU-How/ROBINSON_CR' finished: 0:00:04 | - |
| AU-How/HANSEN_GFC | COMPLETED | Dataset 'AU-How/HANSEN_GFC' finished: 0:00:24 | - |
| AU-How/GAMI | COMPLETED | Dataset 'AU-How/GAMI' finished: 0:01:11 | - |
| AU-How/POTAPOV_HEIGHT | COMPLETED | Dataset 'AU-How/POTAPOV_HEIGHT' finished: 0:00:16 | - |
| AU-How/gedi | COMPLETED | Dataset 'AU-How/gedi' finished: 0:03:56 | - |
/home/konstantin/micromamba/envs/eo-lincs-scs1/lib/python3.13/site-packages/xcube/core/store/fs/impl/dataset.py:200: SerializationWarning: saving variable None with floating point data as an integer dtype without any _FillValue to use for NaNs
  data.to_zarr(
/home/konstantin/micromamba/envs/eo-lincs-scs1/lib/python3.13/site-packages/dask/_task_spec.py:768: RuntimeWarning: invalid value encountered in divide
  return self.func(*new_argspec)
xcube-cds version 1.4.1
2026-06-16 14:32:01,321 - WARNING - Recovering from HTTP error [502 Bad Gateway], attempt 1 of 500
2026-06-16 14:32:01,322 - WARNING - Retrying in 120 seconds
2026-06-16 14:34:01,322 - INFO - Retrying now...
2026-06-16 14:34:01,564 INFO [2026-02-16T00:00:00] - To generate this ERA5 hourly time series dataset, **homogenisation conventions have been applied to the ERA5 source GRIB data** to ensure consistency, usability, and alignment across chosen variables and time steps. The processed data were then written to an **ARCO Zarr archive**, enabling efficient cloud-optimised access and scalable data retrieval. Please refer to the [user guide](https://confluence.ecmwf.int/x/R6cfHg) for details. - The dataset presented here is a subset of selected parameters from the full [CDS ERA5 hourly data on single levels (1940–present)](https://cds.climate.copernicus.eu/datasets/reanalysis-era5-single-levels?tab=overview). **Requirements for additional parameters may be considered**. Please raise your request with ECMWF Support [here](https://jira.ecmwf.int/plugins/servlet/desk/portal/1/create/202).
2026-06-16 14:34:01,564 - INFO - [2026-02-16T00:00:00] - To generate this ERA5 hourly time series dataset, **homogenisation conventions have been applied to the ERA5 source GRIB data** to ensure consistency, usability, and alignment across chosen variables and time steps. The processed data were then written to an **ARCO Zarr archive**, enabling efficient cloud-optimised access and scalable data retrieval. Please refer to the [user guide](https://confluence.ecmwf.int/x/R6cfHg) for details. - The dataset presented here is a subset of selected parameters from the full [CDS ERA5 hourly data on single levels (1940–present)](https://cds.climate.copernicus.eu/datasets/reanalysis-era5-single-levels?tab=overview). **Requirements for additional parameters may be considered**. Please raise your request with ECMWF Support [here](https://jira.ecmwf.int/plugins/servlet/desk/portal/1/create/202).
2026-06-16 14:34:01,565 INFO Request ID is d22f75cc-0189-4cf5-ac88-a88c15c7baba
2026-06-16 14:34:01,565 - INFO - Request ID is d22f75cc-0189-4cf5-ac88-a88c15c7baba
2026-06-16 14:34:05,361 INFO status has been updated to accepted
2026-06-16 14:34:05,361 - INFO - status has been updated to accepted
2026-06-16 14:34:18,910 INFO status has been updated to successful
2026-06-16 14:34:18,910 - INFO - status has been updated to successful
2026-06-16 14:34:19,152 - INFO - Downloading https://object-store.os-api.cci2.ecmwf.int:443/cci2-prod-cache-2/2026-06-11/1c47ae9478da59b32b43e9229200e10c.zip
1c47ae9478da59b32b43e9229200e10c.zip: 0%| | 0.00/12.1M [00:00<?, ?B/s]
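The durations in the status table returned by msds.generate() can be summed to see where the time goes. The sketch below copies the reported values by hand into a dictionary; it does not read them from any xcube-multistore API.

```python
from datetime import timedelta

# durations copied from the "Message" column of the status table
durations = {
    "AU-How/sen2": "0:02:26",
    "AU-How/sen3_syn": "0:15:49",
    "AU-How/era5": "0:02:21",
    "AU-How/esa_cci_biomass": "0:00:17",
    "AU-How/CCI_BIOMASS": "0:00:17",
    "AU-How/SAATCHI_BIOMASS": "0:00:03",
    "AU-How/ROBINSON_CR": "0:00:04",
    "AU-How/HANSEN_GFC": "0:00:24",
    "AU-How/GAMI": "0:01:11",
    "AU-How/POTAPOV_HEIGHT": "0:00:16",
    "AU-How/gedi": "0:03:56",
}


def to_timedelta(text: str) -> timedelta:
    """Convert an 'H:MM:SS' string into a timedelta."""
    hours, minutes, seconds = (int(part) for part in text.split(":"))
    return timedelta(hours=hours, minutes=minutes, seconds=seconds)


total = sum((to_timedelta(t) for t in durations.values()), timedelta())
slowest = max(durations, key=lambda name: to_timedelta(durations[name]))
share = to_timedelta(durations[slowest]) / total
print(total, slowest, round(share, 2))
```

For this run the total is about 27 minutes, of which Sentinel-3 SYNERGY accounts for more than half, in line with the retrieval bottleneck described earlier.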
Initial Inspection of the Generated Datasets¶
After cube generation has completed, the datasets can be opened using the xcube datastore framework API as usual.
Note that the multi-source datastore expects a datastore named storage. This datastore is defined in the data_stores section of the config.yml configuration file.
ds = msds.stores.storage.open_data("AU-How/sen2.zarr")
ds
<xarray.Dataset> Size: 27MB
Dimensions: (time: 3, y: 400, x: 400)
Coordinates:
* time (time) datetime64[ns] 24B 2020-04-19T01:37:11.024000 ... 202...
* y (y) float64 3kB 8.62e+06 8.62e+06 ... 8.616e+06 8.616e+06
* x (x) float64 3kB 7.319e+05 7.319e+05 ... 7.359e+05 7.359e+05
spatial_ref int64 8B ...
Data variables: (12/13)
B01 (time, y, x) float32 2MB dask.array<chunksize=(1, 400, 400), meta=np.ndarray>
B02 (time, y, x) float32 2MB dask.array<chunksize=(1, 400, 400), meta=np.ndarray>
B03 (time, y, x) float32 2MB dask.array<chunksize=(1, 400, 400), meta=np.ndarray>
B04 (time, y, x) float32 2MB dask.array<chunksize=(1, 400, 400), meta=np.ndarray>
B05 (time, y, x) float32 2MB dask.array<chunksize=(1, 400, 400), meta=np.ndarray>
B06 (time, y, x) float32 2MB dask.array<chunksize=(1, 400, 400), meta=np.ndarray>
... ...
B08 (time, y, x) float32 2MB dask.array<chunksize=(1, 400, 400), meta=np.ndarray>
B09 (time, y, x) float32 2MB dask.array<chunksize=(1, 400, 400), meta=np.ndarray>
B11 (time, y, x) float32 2MB dask.array<chunksize=(1, 400, 400), meta=np.ndarray>
B12 (time, y, x) float32 2MB dask.array<chunksize=(1, 400, 400), meta=np.ndarray>
B8A (time, y, x) float32 2MB dask.array<chunksize=(1, 400, 400), meta=np.ndarray>
SCL (time, y, x) float64 4MB dask.array<chunksize=(1, 400, 400), meta=np.ndarray>
Attributes:
stac_catalog_url: https://planetarycomputer.microsoft.com/api/stac/v1
stac_item_id: S2A_MSIL2A_20200419T013711_R031_T52LGM_20200924T025946
stac_item_ids: {'2020-04-19T01:37:11.024000': ['S2A_MSIL2A_20200419...
xcube_stac_version: 1.3.1
We can now select a variable for one timestep and plot it for a quick preview of the data.
ds.B04.isel(time=0).plot(vmin=0., vmax=0.1)
plt.tight_layout()
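The shape and size reported for the Sentinel-2 cube above follow directly from the configuration. As a worked check, a 4000 m cutout at 10 m resolution gives 400 pixels per side, and three time steps of twelve float32 bands plus one float64 SCL band add up to roughly the reported 27MB. Coordinate arrays add only a few kilobytes and are ignored in this sketch.

```python
bbox_width = 4000   # metres, from the Sentinel-2 open_params
spatial_res = 10    # metres per pixel
n_time = 3          # time steps in the opened dataset

n_pixels = bbox_width // spatial_res
float32_band = n_time * n_pixels * n_pixels * 4   # bytes per float32 variable
float64_band = n_time * n_pixels * n_pixels * 8   # bytes per float64 variable

# twelve reflectance bands are float32, SCL is stored as float64
total_bytes = 12 * float32_band + 1 * float64_band
print(n_pixels, total_bytes / 1e6)
```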
The same quicklook can be created for any generated dataset.
ds = msds.stores.storage.open_data("AU-How/sen3_syn.zarr")
ds
<xarray.Dataset> Size: 194MB
Dimensions: (time: 15, y: 400, x: 400)
Coordinates:
* time (time) datetime64[ns] 120B 2020-04-15T01:01:56 ... 2020-04-3...
* y (y) float64 3kB 8.62e+06 8.62e+06 ... 8.616e+06 8.616e+06
* x (x) float64 3kB 7.319e+05 7.319e+05 ... 7.359e+05 7.359e+05
spatial_ref int64 8B ...
Data variables:
CLOUD_flags (time, y, x) uint8 2MB dask.array<chunksize=(1, 400, 400), meta=np.ndarray>
OLC_flags (time, y, x) float64 19MB dask.array<chunksize=(1, 400, 400), meta=np.ndarray>
SDR_Oa04 (time, y, x) float64 19MB dask.array<chunksize=(1, 400, 400), meta=np.ndarray>
SDR_Oa08 (time, y, x) float64 19MB dask.array<chunksize=(1, 400, 400), meta=np.ndarray>
SDR_Oa10 (time, y, x) float64 19MB dask.array<chunksize=(1, 400, 400), meta=np.ndarray>
SDR_Oa11 (time, y, x) float64 19MB dask.array<chunksize=(1, 400, 400), meta=np.ndarray>
SDR_Oa12 (time, y, x) float64 19MB dask.array<chunksize=(1, 400, 400), meta=np.ndarray>
SDR_Oa17 (time, y, x) float64 19MB dask.array<chunksize=(1, 400, 400), meta=np.ndarray>
SLN_flags (time, y, x) float64 19MB dask.array<chunksize=(1, 400, 400), meta=np.ndarray>
SLO_flags (time, y, x) float64 19MB dask.array<chunksize=(1, 400, 400), meta=np.ndarray>
SYN_flags (time, y, x) float64 19MB dask.array<chunksize=(1, 400, 400), meta=np.ndarray>
Attributes:
comment:
contact: eosupport@copernicus.esa.int
institution: LN2
netCDF_version: 4.2 of Mar 13 2018 10:14:33 $
references: S3IPF PDS 006 - i1r11 - Product Data Format Specific...
resolution: [ 300 300 ]
source: IPF-SY-2 06.19
stac_catalog_url: https://planetarycomputer.microsoft.com/api/stac/v1
title: SYN L2, surface directional reflectance associated w...
xcube_stac_version: 1.3.1
fig, ax = plt.subplots(1, 2, figsize=(14, 6))
ds.SDR_Oa04.isel(time=4).plot(ax=ax[0], vmax=0.1, vmin=0.)
ds.SYN_flags.isel(time=4).plot(ax=ax[1])
plt.tight_layout()
ds = msds.stores.storage.open_data("AU-How/era5.zarr")
ds
<xarray.Dataset> Size: 13MB
Dimensions: (time: 315576)
Coordinates:
* time (time) datetime64[ns] 3MB 1990-01-01 ... 2025-12-31T23:00:00
latitude float64 8B ...
longitude float64 8B ...
Data variables:
d2m (time) float32 1MB dask.array<chunksize=(315576,), meta=np.ndarray>
skt (time) float32 1MB dask.array<chunksize=(315576,), meta=np.ndarray>
sp (time) float32 1MB dask.array<chunksize=(315576,), meta=np.ndarray>
ssrd (time) float32 1MB dask.array<chunksize=(315576,), meta=np.ndarray>
strd (time) float32 1MB dask.array<chunksize=(315576,), meta=np.ndarray>
t2m (time) float32 1MB dask.array<chunksize=(315576,), meta=np.ndarray>
u10 (time) float32 1MB dask.array<chunksize=(315576,), meta=np.ndarray>
v10 (time) float32 1MB dask.array<chunksize=(315576,), meta=np.ndarray>
Attributes:
Conventions: CF-1.7
GRIB_centre: ecmf
GRIB_centreDescription: European Centre for Medium-Range Weather Forecasts
GRIB_edition: 1
GRIB_subCentre: 0
history: 2024-09-02T04:48 GRIB to CDM+CF via cfgrib-0.9.1...
institution: European Centre for Medium-Range Weather Forecasts
ds.t2m.plot()
[<matplotlib.lines.Line2D at 0x72ddd43b5f90>]
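The length of the ERA5 time axis can be verified with a short calculation: an hourly series from 1990-01-01 00:00 to 2025-12-31 23:00 should contain exactly the 315576 steps reported in the dataset above.

```python
from datetime import datetime, timedelta

start = datetime(1990, 1, 1, 0)
end = datetime(2025, 12, 31, 23)

# number of hourly time stamps, counting both end points
n_steps = int((end - start) / timedelta(hours=1)) + 1
print(n_steps)
```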
ds = msds.stores.storage.open_data("AU-How/esa_cci_biomass.zarr")
ds
<xarray.Dataset> Size: 1MB
Dimensions: (time: 1, y: 400, x: 400)
Coordinates:
* time (time) datetime64[ns] 8B 2020-07-01T23:59:59
* y (y) float64 3kB 8.62e+06 8.62e+06 ... 8.616e+06 8.616e+06
* x (x) float64 3kB 7.319e+05 7.319e+05 ... 7.359e+05 7.359e+05
spatial_ref int64 8B ...
Data variables:
agb (time, y, x) float32 640kB dask.array<chunksize=(1, 400, 400), meta=np.ndarray>
agb_sd (time, y, x) float32 640kB dask.array<chunksize=(1, 400, 400), meta=np.ndarray>
Attributes:
Conventions: CF-1.7
date_created: 2026-06-16T14:34:24.654447
history: [{'cube_params': {'time_range': ['2020-04-15T00:...
processing_level: L4
time_coverage_duration: P365DT23H59M59S
time_coverage_end: 2020-12-31T23:59:59
time_coverage_start: 2020-01-01T00:00:00
title: esacci.BIOMASS.yr.L4.AGB.multi-sensor.multi-plat...
ds["agb"].plot()
<matplotlib.collections.QuadMesh at 0x72df54a49090>
ds = msds.stores.storage.open_data("AU-How/CCI_BIOMASS.zarr")
ds
<xarray.Dataset> Size: 26MB
Dimensions: (time: 10, y: 400, x: 400)
Coordinates:
* time (time) datetime64[ns] 80B 2007-01-01 ... 2022-01-01
* y (y) float64 3kB 8.62e+06 8.62e+06 ... 8.616e+06
* x (x) float64 3kB 7.319e+05 7.319e+05 ... 7.359e+05
spatial_ref int64 8B ...
Data variables:
aboveground_biomass (time, y, x) float64 13MB dask.array<chunksize=(5, 400, 400), meta=np.ndarray>
aboveground_biomass_std (time, y, x) float64 13MB dask.array<chunksize=(5, 400, 400), meta=np.ndarray>
Attributes:
stac_catalog_url: https://s3.gfz-potsdam.de/dog.atlaseo-glm.eo-gridded...
stac_item_id: CCI_BIOMASS_v6.0
xcube_stac_version: 1.3.1
ds["aboveground_biomass"].isel(time=-1).plot()
<matplotlib.collections.QuadMesh at 0x72df548a3750>
ds = msds.stores.storage.open_data("AU-How/SAATCHI_BIOMASS.zarr")
ds
<xarray.Dataset> Size: 1MB
Dimensions: (y: 400, x: 400)
Coordinates:
* y (y) float64 3kB 8.62e+06 8.62e+06 ... 8.616e+06 8.616e+06
* x (x) float64 3kB 7.319e+05 7.319e+05 ... 7.359e+05 7.359e+05
spatial_ref int64 8B ...
time datetime64[ns] 8B ...
Data variables:
agb (y, x) float64 1MB dask.array<chunksize=(400, 400), meta=np.ndarray>
Attributes:
stac_catalog_url: https://s3.gfz-potsdam.de/dog.atlaseo-glm.eo-gridded...
stac_item_id: SAATCHI_BIOMASS_v2.0
xcube_stac_version: 1.3.1
ds["agb"].plot()
<matplotlib.collections.QuadMesh at 0x72df546465d0>
ds = msds.stores.storage.open_data("AU-How/ROBINSON_CR.zarr")
ds
<xarray.Dataset> Size: 12MB
Dimensions: (y: 400, x: 400)
Coordinates:
* y (y) float64 3kB 8.62e+06 ... 8.616e+06
* x (x) float64 3kB 7.319e+05 ... 7.359e+05
spatial_ref int64 8B ...
Data variables:
age_at_max_rate (y, x) float64 1MB dask.array<chunksize=(400, 400), meta=np.ndarray>
cr_a (y, x) float64 1MB dask.array<chunksize=(400, 400), meta=np.ndarray>
cr_a_error (y, x) float64 1MB dask.array<chunksize=(400, 400), meta=np.ndarray>
cr_b (y, x) float64 1MB dask.array<chunksize=(400, 400), meta=np.ndarray>
cr_b_error (y, x) float64 1MB dask.array<chunksize=(400, 400), meta=np.ndarray>
cr_k (y, x) float64 1MB dask.array<chunksize=(400, 400), meta=np.ndarray>
cr_k_error (y, x) float64 1MB dask.array<chunksize=(400, 400), meta=np.ndarray>
max_rate (y, x) float64 1MB dask.array<chunksize=(400, 400), meta=np.ndarray>
max_removal_potential_benefit_25 (y, x) float64 1MB dask.array<chunksize=(400, 400), meta=np.ndarray>
Attributes:
stac_catalog_url: https://s3.gfz-potsdam.de/dog.atlaseo-glm.eo-gridded...
stac_item_id: ROBINSON_CR_v1.0
xcube_stac_version: 1.3.1
ds["cr_a"].plot()
<matplotlib.collections.QuadMesh at 0x72dee4551bd0>
ds = msds.stores.storage.open_data("AU-How/HANSEN_GFC.zarr")
ds
<xarray.Dataset> Size: 5MB
Dimensions: (y: 400, x: 400)
Coordinates:
* y (y) float64 3kB 8.62e+06 8.62e+06 ... 8.616e+06 8.616e+06
* x (x) float64 3kB 7.319e+05 7.319e+05 ... 7.359e+05 7.359e+05
spatial_ref int64 8B ...
Data variables:
data_mask (y, x) float64 1MB dask.array<chunksize=(400, 400), meta=np.ndarray>
gain (y, x) float64 1MB dask.array<chunksize=(400, 400), meta=np.ndarray>
loss_year (y, x) float64 1MB dask.array<chunksize=(400, 400), meta=np.ndarray>
tree_cover (y, x) float64 1MB dask.array<chunksize=(400, 400), meta=np.ndarray>
Attributes:
stac_catalog_url: https://s3.gfz-potsdam.de/dog.atlaseo-glm.eo-gridded...
stac_item_id: HANSEN_GFC_v1.12
xcube_stac_version: 1.3.1
ds["tree_cover"].plot()
<matplotlib.collections.QuadMesh at 0x72df54d70a50>
ds = msds.stores.storage.open_data("AU-How/GAMI.zarr")
ds
<xarray.Dataset> Size: 1MB
Dimensions: (time: 2, y: 400, x: 400)
Coordinates:
* time (time) datetime64[ns] 16B 2010-01-01 2020-01-01
* y (y) float64 3kB 8.62e+06 8.62e+06 ... 8.616e+06 8.616e+06
* x (x) float64 3kB 7.319e+05 7.319e+05 ... 7.359e+05 7.359e+05
spatial_ref int64 8B ...
Data variables:
forest_age (time, y, x) float32 1MB dask.array<chunksize=(2, 400, 400), meta=np.ndarray>
Attributes:
stac_catalog_url: https://s3.gfz-potsdam.de/dog.atlaseo-glm.eo-gridded...
stac_item_id: GAMI_v3.1
xcube_stac_version: 1.3.1
ds["forest_age"].isel(time=0).plot()
<matplotlib.collections.QuadMesh at 0x72df046e0050>
ds = msds.stores.storage.open_data("AU-How/POTAPOV_HEIGHT.zarr")
ds
<xarray.Dataset> Size: 6MB
Dimensions: (time: 5, y: 400, x: 400)
Coordinates:
* time (time) datetime64[ns] 40B 2000-01-01 ... 2020-01-01
* y (y) float64 3kB 8.62e+06 8.62e+06 ... 8.616e+06 8.616e+06
* x (x) float64 3kB 7.319e+05 7.319e+05 ... 7.359e+05 7.359e+05
spatial_ref int64 8B ...
Data variables:
canopy_height (time, y, x) float64 6MB dask.array<chunksize=(1, 400, 400), meta=np.ndarray>
Attributes:
stac_catalog_url: https://s3.gfz-potsdam.de/dog.atlaseo-glm.eo-gridded...
stac_item_id: POTAPOV_HEIGHT_v1.0
xcube_stac_version: 1.3.1
ds["canopy_height"].isel(time=0).plot()
<matplotlib.collections.QuadMesh at 0x72ddd421f110>
ds = msds.stores.storage.open_data("AU-How/gedi.zarr")
ds
<xarray.Dataset> Size: 434kB
Dimensions: (shot_number: 814, profile_points: 101)
Coordinates:
* shot_number (shot_number) uint64 7kB 221080800100138599 ... 553708...
latitude (shot_number) float64 7kB dask.array<chunksize=(814,), meta=np.ndarray>
longitude (shot_number) float64 7kB dask.array<chunksize=(814,), meta=np.ndarray>
time (shot_number) datetime64[ns] 7kB dask.array<chunksize=(814,), meta=np.ndarray>
* profile_points (profile_points) int16 202B 0 1 2 3 4 ... 96 97 98 99 100
Data variables: (12/31)
agbd (shot_number) float32 3kB dask.array<chunksize=(814,), meta=np.ndarray>
agbd_pi_lower (shot_number) float32 3kB dask.array<chunksize=(814,), meta=np.ndarray>
agbd_pi_upper (shot_number) float32 3kB dask.array<chunksize=(814,), meta=np.ndarray>
agbd_se (shot_number) float32 3kB dask.array<chunksize=(814,), meta=np.ndarray>
agbd_t (shot_number) float32 3kB dask.array<chunksize=(814,), meta=np.ndarray>
agbd_t_se (shot_number) float32 3kB dask.array<chunksize=(814,), meta=np.ndarray>
... ...
wsci_xy (shot_number) float32 3kB dask.array<chunksize=(814,), meta=np.ndarray>
wsci_xy_pi_lower (shot_number) float32 3kB dask.array<chunksize=(814,), meta=np.ndarray>
wsci_xy_pi_upper (shot_number) float32 3kB dask.array<chunksize=(814,), meta=np.ndarray>
wsci_z (shot_number) float32 3kB dask.array<chunksize=(814,), meta=np.ndarray>
wsci_z_pi_lower (shot_number) float32 3kB dask.array<chunksize=(814,), meta=np.ndarray>
wsci_z_pi_upper (shot_number) float32 3kB dask.array<chunksize=(814,), meta=np.ndarray>
plt.figure(figsize=(8, 5))
sc = plt.scatter(
    ds["longitude"],
    ds["latitude"],
    c=ds["agbd"],
    cmap="viridis",
    s=10,
)
plt.colorbar(sc, label=ds["agbd"].attrs.get("long_name", "agbd"))
plt.xlabel(ds["longitude"].attrs.get("long_name", "Longitude"))
plt.ylabel(ds["latitude"].attrs.get("long_name", "Latitude"))
plt.title("Spatial Plot of agbd")
plt.grid(True)
plt.show()
The sparse GEDI dataset is used in the scientific analysis by aggregating all data points (shots) located within a specified radius around the flux tower. The methodology for translating this sparse satellite data into eddy covariance–comparable estimates is still under development and will be further advanced within the NextGenCarbon project. As a preliminary step, an example analysis is provided in a scientific notebook, illustrating the current approach. An overview of the method’s development status is also presented in Deliverable 5.4.
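The radius-based aggregation described above could look roughly like the following. This is a minimal sketch and not the project's method, which is still under development. The function names `haversine_m` and `aggregate_shots`, the 500 m radius, the tower coordinates, and the synthetic shot values are all illustrative assumptions. It computes the great-circle distance from the tower to each shot, keeps the shots inside the radius that have a finite value, and returns simple summary statistics.

```python
import numpy as np

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius in metres


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between points given in degrees."""
    lat1, lon1, lat2, lon2 = (np.radians(v) for v in (lat1, lon1, lat2, lon2))
    dlat = lat2 - lat1
    dlon = lon2 - lon1
    a = np.sin(dlat / 2) ** 2 + np.cos(lat1) * np.cos(lat2) * np.sin(dlon / 2) ** 2
    return 2 * EARTH_RADIUS_M * np.arcsin(np.sqrt(a))


def aggregate_shots(lat, lon, values, tower_lat, tower_lon, radius_m):
    """Summarise shot values located within radius_m of the tower.

    Hypothetical helper: shots with non-finite values are ignored.
    """
    lat = np.asarray(lat, dtype=float)
    lon = np.asarray(lon, dtype=float)
    values = np.asarray(values, dtype=float)

    dist = haversine_m(tower_lat, tower_lon, lat, lon)
    mask = (dist <= radius_m) & np.isfinite(values)
    selected = values[mask]

    if selected.size == 0:
        return {"n_shots": 0, "mean": np.nan, "median": np.nan, "std": np.nan}
    return {
        "n_shots": int(selected.size),
        "mean": float(selected.mean()),
        "median": float(np.median(selected)),
        "std": float(selected.std()),
    }


# Synthetic example: four shots around an illustrative tower location.
tower_lat, tower_lon = -12.5, 131.15          # assumed coordinates
shot_lat = [-12.5, -12.501, -12.51, -12.5]    # offsets of ~0 m, ~111 m, ~1.1 km, ~0 m
shot_lon = [131.15, 131.15, 131.15, 131.1501]
shot_agbd = [100.0, 200.0, 300.0, float("nan")]

summary = aggregate_shots(shot_lat, shot_lon, shot_agbd, tower_lat, tower_lon, 500.0)
print(summary)
```

With the dataset opened above, the same helper could be called with `ds["latitude"].values`, `ds["longitude"].values`, and `ds["agbd"].values`. The appropriate radius and any quality filtering of shots are left open here because the source does not specify them.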