EO-LINCS project: Cube generation for Scientific Case Study (SCS) 2¶
Forest recovery post-disturbance¶
Objective: SCS2 aims to estimate canopy height and above-ground biomass maps using deep-learning methods applied to remote sensing data. The canopy height and above-ground biomass maps are subsequently used to calculate recovery curves, which in turn can be used to estimate carbon budgets.
Outcomes: New high-resolution height/biomass maps that are expected to enable the monitoring of biomass at finer scales, in particular the impact of fine-scale forest disturbances due to management practices such as thinning, and the impact of natural disturbances (insect attacks, droughts, fires and windthrow) in regions of interest. Analysis of forest recovery as a function of environmental factors (such as climate, soil composition and pH) and of the nature and intensity of the disturbance shall aid in optimizing forest management in view of potentially increased future disturbances.
Required Datasets:
- Global Age Mapping Integration (GAMI) dataset
- European Forest Disturbance Atlas
- Canopy height and biomass map for Europe
- Copernicus Tree cover density 2018
- Copernicus Dominant Leaf Type 2018
- Copernicus Forest Type 2018
The following notebook shows how users can load data from various sources defined in scs2_config.yml using the MultiSourceDataStore tool.
What You Can Do with This Notebook¶
- Load datasets from various sources as defined in scs2_config.yml
- View the progress of each data request to the MultiSourceDataStore
- Quickly preview the datasets by downsampling and plotting them.
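The quick-preview idea in the last bullet boils down to plain positional striding, which works the same way on NumPy arrays and on the xarray DataArrays used later in this notebook. A minimal sketch on synthetic data (the real variables are loaded through the MultiSourceDataStore further down):

```python
import numpy as np

# Synthetic stand-in for a large raster variable such as "forest_age".
raster = np.arange(100 * 100, dtype="float64").reshape(100, 100)

# Keep every 5th pixel along each axis -- the same [::5, ::5] idiom
# applied to xarray DataArrays in the plotting cells below.
preview = raster[::5, ::5]
print(preview.shape)  # (20, 20)
```

Striding like this is a cheap way to shrink a plot by a factor of 25 without loading the full-resolution array into the figure.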
Requirements¶
Before you begin, ensure that all necessary dependencies are installed and that you have generated API token credentials for CLMS:
- Install xcube-multistore via conda-forge by running: conda install --channel conda-forge xcube-multistore
- Generate CLMS API token credentials by following the official instructions. Save the credentials in a file named clms-credentials.json, placing it in the same directory as this notebook. Note that the filename can be customized in the scs2_config.yml configuration file.
Once everything is installed, you are ready to proceed.
The MultiSourceDataStore is driven by a file called scs2_config.yml, which resides in the same directory as this notebook. To understand what goes into the schema, you can read more here.
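For orientation, a configuration file of this kind might look roughly as follows. This is a purely illustrative sketch: every key name shown is an assumption except data_stores (used later in this notebook) and custom_processing (a feature this notebook relies on); always derive the actual keys from get_config_schema() and the Configuration Guide.

```yaml
# Illustrative only -- key names are assumptions, not the verified schema.
data_stores:
  - identifier: storage        # the store this notebook later opens data from
    store_id: file
    store_params:
      root: data
datasets:
  - identifier: gami
    custom_processing:         # feature discussed further down
      module_path: modify_dataset
```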
Let's import the MultiSourceDataStore:
from xcube_multistore.multistore import MultiSourceDataStore
You can also find out how to fill out the config file using the get_config_schema() function. Run it and expand the fields to learn more about the properties that the configuration file accepts, alongside the Configuration Guide.
MultiSourceDataStore.get_config_schema()
<xcube.util.jsonschema.JsonObjectSchema at 0x7234bc3bdb50>
Now we can initialize the MultiSourceDataStore by passing the path to scs2_config.yml, which resides in the same directory as this notebook. Running the cell below will display a progress table for each dataset requested in scs2_config.yml.
NOTE: In scs2_config.yml we also use the custom_processing feature of this tool, which allows us to run a function that processes each dataset separately. In this example, we have defined a module called modify_dataset that performs some custom processing; it takes an xarray.Dataset as input and returns a new xarray.Dataset object. To read more about the custom_processing feature, see here.
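As a rough sketch, a custom-processing module of this shape could look like the following. This is illustrative only: the actual modify_dataset used by scs2_config.yml performs project-specific processing, and only the signature (xarray.Dataset in, new xarray.Dataset out) is taken from the text above.

```python
import xarray as xr


def modify_dataset(ds: xr.Dataset) -> xr.Dataset:
    """Illustrative custom-processing hook: accepts an xarray.Dataset
    and returns a new xarray.Dataset, matching the contract described
    in this notebook. The transformation below is a placeholder."""
    out = ds.copy()
    # Example transformation: record provenance in the attributes.
    out.attrs["history"] = "processed by modify_dataset (illustrative)"
    return out


# Tiny usage example on a synthetic dataset.
demo = xr.Dataset({"forest_age": (("lat", "lon"), [[10, 20], [30, 40]])})
result = modify_dataset(demo)
print(result.attrs["history"])
```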
msds = MultiSourceDataStore("scs2_config.yml")
Dataset identifier | Status | Message | Exception |
---|---|---|---|
france.zip | STOPPED | Already preloaded. | - |
Dataset identifier | Status | Message | Exception |
---|---|---|---|
tree-cover-density-2018|TCD_2018_010m_fr_03035_v020 | STOPPED | Already preloaded. | - |
dominant-leaf-type-2018|DLT_2018_010m_fr_03035_v020 | STOPPED | Already preloaded. | - |
forest-type-2018|FTY_2018_010m_fr_03035_v010 | STOPPED | Already preloaded. | - |
Dataset identifier | Status | Message | Exception |
---|---|---|---|
clms | STOPPED | Dataset 'clms' finished. | - |
senf | STOPPED | Dataset 'senf' already generated. | - |
gami | STOPPED | Dataset 'gami' finished. | - |
liu | STOPPED | Dataset 'liu' already generated. | - |
We can now open the data using the xcube data store framework API as usual. Note that the multi-source data store requires a data store named storage, which is configured in our scs2_config.yml under the data_stores section.
ds = msds.stores.storage.open_data("gami.zarr")
ds
<xarray.Dataset> Size: 3GB Dimensions: (members: 20, time: 2, latitude: 5199, longitude: 7000) Coordinates: * latitude (latitude) float64 42kB 44.8 44.8 44.8 44.8 ... 43.5 43.5 43.5 * longitude (longitude) float64 56kB -1.25 -1.25 -1.249 ... 0.4996 0.4999 * members (members) int64 160B 0 1 2 3 4 5 6 7 ... 13 14 15 16 17 18 19 spatial_ref int64 8B ... * time (time) datetime64[ns] 16B 2010-01-01 2020-01-01 Data variables: forest_age (members, time, latitude, longitude) int16 3GB dask.array<chunksize=(1, 1, 1000, 1000), meta=np.ndarray> Attributes: (12/21) _FillValue: -9999 contact: Simon Besnard (besnard@gfz.de) and Nuno Carva... created_by: Simon Besnard creation_date: 2025-01-31 16:52 date_modified: 2025-03-25T07:07:14.530681 frequency: 2010 and 2020 ... ... geospatial_lon_units: degrees_east institute_id: GFZ-Potsdam and MPI-BGC institution: Helmholtz Center Potsdam GFZ German Research ... product_name: Global Age Mapping Integration (GAMI) v2.1 references: https://doi.org/10.5880/GFZ.1.4.2023.006 and ... version: v2.1
We can now select a variable for one timestep, downsample it, and plot it for a quick preview of the data.
ds["forest_age"].isel(members=0, time=0)[::5, ::5].plot(vmin=0, vmax=120)
<matplotlib.collections.QuadMesh at 0x7234b1271fd0>
ds["forest_age"].isel(members=0, time=1)[::5, ::5].plot(vmin=0, vmax=120)
<matplotlib.collections.QuadMesh at 0x7234b06dfa10>
ds = msds.stores.storage.open_data("clms.zarr")
ds
<xarray.Dataset> Size: 874MB Dimensions: (lat: 5199, lon: 7000) Coordinates: * lat (lat) float64 42kB 44.8 44.8 44.8 ... 43.5 43.5 * lon (lon) float64 56kB -1.25 -1.25 ... 0.4996 0.4999 Data variables: dominant_leaf_type_2018 (lat, lon) float64 291MB dask.array<chunksize=(1000, 1000), meta=np.ndarray> forest_type_2018 (lat, lon) float64 291MB dask.array<chunksize=(1000, 1000), meta=np.ndarray> tree_cover_density_2018 (lat, lon) float64 291MB dask.array<chunksize=(1000, 1000), meta=np.ndarray> Attributes: (12/13) AREA_OR_POINT: Area DataType: Thematic date_modified: 2025-03-25T07:00:35.818524 geospatial_bounds: POLYGON((-1.2164245320441032 43.3253275548571... geospatial_bounds_crs: CRS84 geospatial_lat_max: 44.97163405364227 ... ... geospatial_lat_resolution: 0.00010168745379246502 geospatial_lat_units: degrees_north geospatial_lon_max: 0.4708656240470786 geospatial_lon_min: -1.2164245320441032 geospatial_lon_resolution: 0.00010748994246473353 geospatial_lon_units: degrees_east
ds["dominant_leaf_type_2018"][::5, ::5].plot(vmax=3)
<matplotlib.collections.QuadMesh at 0x7234b06a88c0>
ds["forest_type_2018"][::5, ::5].plot(vmax=3)
<matplotlib.collections.QuadMesh at 0x7234b1d4b320>
ds["tree_cover_density_2018"][::5, ::5].plot(vmax=100)
<matplotlib.collections.QuadMesh at 0x7234b04ec710>
ds = msds.stores.storage.open_data("liu.zarr")
ds
<xarray.Dataset> Size: 874MB Dimensions: (lat: 5199, lon: 7000) Coordinates: * lat (lat) float64 42kB 44.8 44.8 44.8 44.8 ... 43.5 43.5 43.5 * lon (lon) float64 56kB -1.25 -1.25 -1.249 ... 0.4996 0.4999 Data variables: agb (lat, lon) float64 291MB dask.array<chunksize=(1000, 1000), meta=np.ndarray> canopy_cover (lat, lon) float64 291MB dask.array<chunksize=(1000, 1000), meta=np.ndarray> canopy_height (lat, lon) float64 291MB dask.array<chunksize=(1000, 1000), meta=np.ndarray> Attributes: date_modified: 2025-03-24T14:02:25.086565 geospatial_bounds: POLYGON((-18.9594916134861 50.18279655877187,... geospatial_bounds_crs: CRS84 geospatial_lat_max: 52.03426577988996 geospatial_lat_min: 50.18279655877187 geospatial_lat_resolution: 0.00034892335219183224 geospatial_lat_units: degrees_north geospatial_lon_max: -17.63688150306111 geospatial_lon_min: -18.9594916134861 geospatial_lon_resolution: 0.0002391080323462802 geospatial_lon_units: degrees_east
ds["agb"][::5, ::5].plot(vmax=20000)
<matplotlib.collections.QuadMesh at 0x7234b0426690>
ds["canopy_cover"][::5, ::5].plot()
<matplotlib.collections.QuadMesh at 0x7234b02d3320>
ds["canopy_height"][::5, ::5].plot()
<matplotlib.collections.QuadMesh at 0x7234b0fde240>
ds = msds.stores.storage.open_data("senf.zarr")
ds
<xarray.Dataset> Size: 47GB Dimensions: (time: 39, lat: 5199, lon: 7000) Coordinates: * lat (lat) float64 42kB 44.8 44.8 ... 43.5 43.5 * lon (lon) float64 56kB -1.25 -1.25 ... 0.4999 * time (time) datetime64[ns] 312B 1985-01-01 ... 2... Data variables: annual_disturbances (time, lat, lon) float64 11GB dask.array<chunksize=(1, 1000, 1000), meta=np.ndarray> disturbance_agent (time, lat, lon) float64 11GB dask.array<chunksize=(1, 1000, 1000), meta=np.ndarray> disturbance_agent_aggregated (lat, lon) float64 291MB dask.array<chunksize=(1000, 1000), meta=np.ndarray> disturbance_probability (time, lat, lon) float64 11GB dask.array<chunksize=(1, 1000, 1000), meta=np.ndarray> disturbance_severity (time, lat, lon) float64 11GB dask.array<chunksize=(1, 1000, 1000), meta=np.ndarray> forest_mask (lat, lon) float64 291MB dask.array<chunksize=(1000, 1000), meta=np.ndarray> greatest_disturbance (lat, lon) float64 291MB dask.array<chunksize=(1000, 1000), meta=np.ndarray> latest_disturbance (lat, lon) float64 291MB dask.array<chunksize=(1000, 1000), meta=np.ndarray> number_disturbances (lat, lon) float64 291MB dask.array<chunksize=(1000, 1000), meta=np.ndarray> Attributes: date_modified: 2025-03-25T06:53:43.878410 geospatial_bounds: POLYGON((-18.95915690343474 50.18273187952879... geospatial_bounds_crs: CRS84 geospatial_lat_max: 52.03419759001479 geospatial_lat_min: 50.182731879528795 geospatial_lat_resolution: 0.0003489227219688473 geospatial_lat_units: degrees_north geospatial_lon_max: -17.6365344971688 geospatial_lon_min: -18.95915690343474 geospatial_lon_resolution: 0.00023911036652535245 geospatial_lon_units: degrees_east
ds["annual_disturbances"].isel(time=10)[::5, ::5].plot()
<matplotlib.collections.QuadMesh at 0x7234b0559790>
ds["disturbance_probability"].isel(time=10)[::5, ::5].plot()
<matplotlib.collections.QuadMesh at 0x7234b0cbb320>