EO-LINCS project: Cube generation for Scientific Case Study (SCS) 1
Explanatory power of novel EO data streams for predicting net carbon fluxes
Objective: SCS1 trains an artificial neural network (ANN) to predict carbon fluxes, using meteorological and reflectance data from satellites as input. Training is based on in-situ observations from eddy covariance flux towers provided by the FLUXNET2015 dataset.
Outcomes: A working data processing chain for incorporating Sentinel-2 data into the FLUXCOM-X framework that is updatable and expandable to all sites and to other Sentinel data products. An analysis of the contribution of Sentinel-2 data to predicting net ecosystem exchange (NEE), including its added value with regard to interannual variability, drought responses, and disturbance.
Required datasets:
- Sentinel-2 L2A surface reflectance (from CDSE)
- ERA5-Land meteorological reanalysis (from the Copernicus Climate Data Store)
- ESA CCI above-ground biomass (from the ESA CCI Open Data Portal)
- In-situ eddy covariance fluxes from FLUXNET2015
The following notebook shows how users can load data from the various sources defined in scs1_config.yml using the MultiSourceDataStore tool.
What You Can Do with This Notebook
- Generate a configuration file based on flux tower locations
- Load datasets from the various sources defined in the generated scs1_config.yml
- View the progress of each data request to the MultiSourceDataStore
- Quickly preview the datasets by plotting them
Requirements
Before you begin, follow these steps:
- Install xcube-multistore via conda-forge by running: conda install --channel conda-forge xcube-multistore
- To access EO data via S3 from CDSE, generate your S3 credentials and add them to the data_stores section in the scs1_config.yml file (see the sketch below).
- To access ERA5-Land data from the Copernicus Climate Data Store, obtain a CDS Personal Access Token by creating an account on the CDS website. After logging in, navigate to your user page to find your Personal Access Token. Add this token to the data_stores section in the scs1_config.yml file (see the sketch below).
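For orientation, the credential entries in the data_stores section of scs1_config.yml look roughly like the following sketch. This is a minimal, illustrative excerpt: the placeholder values in angle brackets must be replaced with your own credentials, and the store identifiers match those used later in this notebook.

data_stores:
  - identifier: stac-cdse
    store_id: stac-cdse-ardc
    store_params:
      key: <CDSE_S3_key>        # your CDSE S3 access key
      secret: <CDSE_S3_secret>  # your CDSE S3 secret
  - identifier: cds
    store_id: cds
    store_params:
      endpoint_url: https://cds.climate.copernicus.eu/api
      cds_api_key: <CDS_Personal_Access_Token>  # your CDS token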
Once everything is installed and configured, you are ready to proceed.
The multi-source data store is driven by a configuration file called scs1_config.yml, which sits at the same level as this notebook.
To understand what goes into the schema, you can read more here.
Let's import the MultiSourceDataStore
import yaml
import pandas as pd
from xcube_multistore import MultiSourceDataStore
You can also learn how to fill out the config file using the helpful function get_config_schema(). Run it and try expanding the fields to learn more about the properties the configuration file accepts, alongside the Configuration Guide.
MultiSourceDataStore.get_config_schema()
<xcube.util.jsonschema.JsonObjectSchema at 0x7314671ee270>
This science case requires data from several flux sites, so we define them in a scs1_sites.csv file for easier management and access. For the purpose of this example, we will focus on the first two sites.
sites = pd.read_csv("scs1_sites.csv")
sites = sites.iloc[:2]
sites
| | Site ID | latitude | longitude | IGBP |
|---|---|---|---|---|
| 0 | AU-Dry | -15.2588 | 132.3706 | SAV |
| 1 | AU-How | -12.4943 | 131.1523 | WSA |
In the following cell, we create the config object, which is then saved as scs1_config.yml for persistence, ready to be read by the MultiSourceDataStore.
To read more about how this config file is structured, you can find the Configuration Guide here, or refer to the notebook setup_config.ipynb.
Specifically, we are using the single dataset object and data stores schemas here.
time_range = ["2020-01-01", "2020-03-30"]

config = dict(datasets=[])
for index, site in sites.iterrows():
    # append config for Sentinel-2
    config_ds = dict(
        identifier=f"{site['Site ID']}_sen2",
        store="stac-cdse",
        data_id="sentinel-2-l2a",
        open_params=dict(
            time_range=time_range,
            point=[site["longitude"], site["latitude"]],
            bbox_width=4000,
            spatial_res=10,
            asset_names=[
                "B01",
                "B02",
                "B03",
                "B04",
                "B05",
                "B06",
                "B07",
                "B08",
                "B8A",
                "B09",
                "B11",
                "B12",
                "SCL",
            ],
        ),
    )
    config["datasets"].append(config_ds)

    # append config for ERA5
    config_ds = dict(
        identifier=f"{site['Site ID']}_era5land",
        store="cds",
        data_id="reanalysis-era5-land",
        open_params=dict(
            variable_names=["2m_temperature", "total_precipitation"],
            time_range=time_range,
            point=[site["longitude"], site["latitude"]],
            spatial_res=0.1,
        ),
    )
    config["datasets"].append(config_ds)

    # append config for ESA CCI
    config_ds = dict(
        identifier=f"{site['Site ID']}_ccibiomass",
        store="esa_cci",
        grid_mapping=f"{site['Site ID']}_sen2",
        data_id="esacci.BIOMASS.yr.L4.AGB.multi-sensor.multi-platform.MERGED.5-0.100m",
        open_params=dict(
            time_range=time_range,
        ),
    )
    config["datasets"].append(config_ds)

# define stores
config["data_stores"] = []

# add storage data store
config_store = dict(
    identifier="storage",
    store_id="file",
    store_params=dict(root="data"),
)
config["data_stores"].append(config_store)

# add ESA CCI data store
config_store = dict(
    identifier="esa_cci",
    store_id="cciodp",
)
config["data_stores"].append(config_store)

# add STAC data store
config_store = dict(
    identifier="stac-cdse",
    store_id="stac-cdse-ardc",
    store_params=dict(
        key="<CDSE_S3_key>",
        secret="<CDSE_S3_secret>",
    ),
)
config["data_stores"].append(config_store)

# add CDS data store
config_store = dict(
    identifier="cds",
    store_id="cds",
    store_params=dict(
        endpoint_url="https://cds.climate.copernicus.eu/api",
        cds_api_key="CDS_API_key",
        normalize_names=True,
    ),
)
config["data_stores"].append(config_store)

with open("scs1_config.yml", "w") as file:
    yaml.dump(config, file, sort_keys=False)
Now we can initialize the MultiSourceDataStore by passing the path to scs1_config.yml, which sits at the same level as this notebook.
msds = MultiSourceDataStore("scs1_config.yml")
We can then display an overview of the configuration for each dataset.
msds.display_config()
| User-defined ID | Data Store ID | Data Store Params | Data ID | Open Data Params | Grid-Mapping | Format |
|---|---|---|---|---|---|---|
| AU-Dry_sen2 | stac-cdse-ardc | key: <CDSE_S3_key>; secret: <CDSE_S3_secret> | sentinel-2-l2a | time_range: ['2020-01-01', '2020-03-30']; point: [132.3706, -15.2588]; bbox_width: 4000; spatial_res: 10; asset_names: ['B01', 'B02', 'B03', 'B04', 'B05', 'B06', 'B07', 'B08', 'B8A', 'B09', 'B11', 'B12', 'SCL'] | - | Zarr |
| AU-Dry_era5land | cds | endpoint_url: https://cds.climate.copernicus.eu/api; cds_api_key: CDS_API_key; normalize_names: True | reanalysis-era5-land | variable_names: ['2m_temperature', 'total_precipitation']; time_range: ['2020-01-01', '2020-03-30']; point: [132.3706, -15.2588]; spatial_res: 0.1 | - | Zarr |
| AU-Dry_ccibiomass | cciodp | - | esacci.BIOMASS.yr.L4.AGB.multi-sensor.multi-platform.MERGED.5-0.100m | time_range: ['2020-01-01', '2020-03-30'] | Like 'AU-Dry_sen2' | Zarr |
| AU-How_sen2 | stac-cdse-ardc | key: <CDSE_S3_key>; secret: <CDSE_S3_secret> | sentinel-2-l2a | time_range: ['2020-01-01', '2020-03-30']; point: [131.1523, -12.4943]; bbox_width: 4000; spatial_res: 10; asset_names: ['B01', 'B02', 'B03', 'B04', 'B05', 'B06', 'B07', 'B08', 'B8A', 'B09', 'B11', 'B12', 'SCL'] | - | Zarr |
| AU-How_era5land | cds | endpoint_url: https://cds.climate.copernicus.eu/api; cds_api_key: CDS_API_key; normalize_names: True | reanalysis-era5-land | variable_names: ['2m_temperature', 'total_precipitation']; time_range: ['2020-01-01', '2020-03-30']; point: [131.1523, -12.4943]; spatial_res: 0.1 | - | Zarr |
| AU-How_ccibiomass | cciodp | - | esacci.BIOMASS.yr.L4.AGB.multi-sensor.multi-platform.MERGED.5-0.100m | time_range: ['2020-01-01', '2020-03-30'] | Like 'AU-How_sen2' | Zarr |
We can display the selected bounding boxes as shown in the following cell.
msds.display_geolocations()
We can now generate the datacubes:
msds.generate()
| Dataset identifier | Status | Message | Exception |
|---|---|---|---|
| AU-Dry_sen2 | COMPLETED | Dataset 'AU-Dry_sen2' finished: 0:31:22 | - |
| AU-Dry_era5land | COMPLETED | Dataset 'AU-Dry_era5land' finished: 0:22:45 | - |
| AU-Dry_ccibiomass | COMPLETED | Dataset 'AU-Dry_ccibiomass' finished: 0:00:35 | - |
| AU-How_sen2 | COMPLETED | Dataset 'AU-How_sen2' finished: 0:29:30 | - |
| AU-How_era5land | COMPLETED | Dataset 'AU-How_era5land' finished: 0:16:44 | - |
| AU-How_ccibiomass | COMPLETED | Dataset 'AU-How_ccibiomass' finished: 0:00:36 | - |
While the cubes are generated, xcube-cds logs the lifecycle of each CDS request (accepted, running, successful) and automatically retries on connection errors before downloading the results.
We can now open the data using the xcube data store framework API as usual. Note that the multi-source data store requires a data store named storage, which is configured in our scs1_config.yml under the data_stores section.
ds = msds.stores.storage.open_data("AU-Dry_sen2.zarr")
ds
<xarray.Dataset> Size: 150MB
Dimensions: (time: 18, y: 400, x: 400)
Coordinates:
* time (time) datetime64[ns] 144B 2020-01-02T01:27:09.024000 ... 20...
* y (y) float64 3kB 8.312e+06 8.312e+06 ... 8.308e+06 8.308e+06
* x (x) float64 3kB 8.601e+05 8.601e+05 ... 8.641e+05 8.641e+05
spatial_ref int64 8B ...
Data variables: (12/13)
B01 (time, y, x) float32 12MB dask.array<chunksize=(1, 400, 400), meta=np.ndarray>
B02 (time, y, x) float32 12MB dask.array<chunksize=(1, 400, 400), meta=np.ndarray>
B03 (time, y, x) float32 12MB dask.array<chunksize=(1, 400, 400), meta=np.ndarray>
B04 (time, y, x) float32 12MB dask.array<chunksize=(1, 400, 400), meta=np.ndarray>
B05 (time, y, x) float32 12MB dask.array<chunksize=(1, 400, 400), meta=np.ndarray>
B06 (time, y, x) float32 12MB dask.array<chunksize=(1, 400, 400), meta=np.ndarray>
... ...
B08 (time, y, x) float32 12MB dask.array<chunksize=(1, 400, 400), meta=np.ndarray>
B09 (time, y, x) float32 12MB dask.array<chunksize=(1, 400, 400), meta=np.ndarray>
B11 (time, y, x) float32 12MB dask.array<chunksize=(1, 400, 400), meta=np.ndarray>
B12 (time, y, x) float32 12MB dask.array<chunksize=(1, 400, 400), meta=np.ndarray>
B8A (time, y, x) float32 12MB dask.array<chunksize=(1, 400, 400), meta=np.ndarray>
SCL (time, y, x) float32 12MB dask.array<chunksize=(1, 400, 400), meta=np.ndarray>
Attributes:
stac_catalog_url: https://stac.dataspace.copernicus.eu/v1
stac_item_id: S2B_MSIL2A_20200102T012709_N0500_R131_T52LHJ_2023042...
stac_item_ids: {'2020-01-02T01:27:09.024000': ['S2B_MSIL2A_20200102...
xcube_stac_version: 1.1.2
We can now select a variable for one time step and plot it for a quick preview of the data:
ds.B04.isel(time=0).plot(vmin=0., vmax=0.2)
<matplotlib.collections.QuadMesh at 0x7066963fd2b0>
ds = msds.stores.storage.open_data("AU-Dry_era5land.zarr")
ds
<xarray.Dataset> Size: 69kB
Dimensions: (time: 2160)
Coordinates:
* time (time) datetime64[ns] 17kB 2020-01-01 ... 2020-03-30T23:00:00
expver (time) <U4 35kB dask.array<chunksize=(1080,), meta=np.ndarray>
lat float64 8B ...
lon float64 8B ...
number int64 8B ...
Data variables:
t2m (time) float32 9kB dask.array<chunksize=(1080,), meta=np.ndarray>
tp (time) float32 9kB dask.array<chunksize=(1080,), meta=np.ndarray>
Attributes:
Conventions: CF-1.7
GRIB_centre: ecmf
GRIB_centreDescription: European Centre for Medium-Range Weather Forecasts
GRIB_subCentre: 0
history: 2025-12-16T13:33 GRIB to CDM+CF via cfgrib-0.9.1...
institution: European Centre for Medium-Range Weather Forecasts
We can plot the 2 m temperature time series for a quick preview:
ds.t2m.plot()
[<matplotlib.lines.Line2D at 0x70669d62a5d0>]
ds = msds.stores.storage.open_data("AU-Dry_ccibiomass.zarr")
ds
<xarray.Dataset> Size: 1MB
Dimensions: (time: 1, y: 400, x: 400)
Coordinates:
* time (time) datetime64[ns] 8B 2020-07-01T23:59:59
* y (y) float64 3kB 8.312e+06 8.312e+06 ... 8.308e+06 8.308e+06
* x (x) float64 3kB 8.601e+05 8.601e+05 ... 8.641e+05 8.641e+05
spatial_ref int64 8B ...
Data variables:
agb (time, y, x) float32 640kB dask.array<chunksize=(1, 400, 400), meta=np.ndarray>
agb_sd (time, y, x) float32 640kB dask.array<chunksize=(1, 400, 400), meta=np.ndarray>
Attributes:
Conventions: CF-1.7
date_created: 2025-12-16T14:45:28.713382
history: [{'cube_params': {'time_range': ['2020-01-01T00:...
processing_level: L4
time_coverage_duration: P365DT23H59M59S
time_coverage_end: 2020-12-31T23:59:59
time_coverage_start: 2020-01-01T00:00:00
title: esacci.BIOMASS.yr.L4.AGB.multi-sensor.multi-plat...
Finally, we plot the above-ground biomass (agb) variable:
ds["agb"].plot()
<matplotlib.collections.QuadMesh at 0x70669d59d450>
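Since every generated cube ends up in the storage data store, we can also iterate over all cubes programmatically, for example as a quick sanity check across sites. The following is a minimal sketch, assuming the storage store exposes the standard xcube data store API (get_data_ids() alongside the open_data() call used above):

# Print a one-line summary (dimensions and variables) of every cube
# written to the local "storage" data store.
for data_id in msds.stores.storage.get_data_ids():
    ds = msds.stores.storage.open_data(data_id)
    print(data_id, dict(ds.sizes), list(ds.data_vars))

Because the configuration is driven by scs1_sites.csv, extending the processing chain to further FLUXNET sites only requires adding rows to that file and re-running the notebook.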