xcube Multi-Source Data Store
xcube-multistore is a Python package designed to create a Multi-Source Data Store
that enables the seamless integration of data from multiple sources into a unified
data model. This approach simplifies the data fusion process while ensuring
transparency and reproducibility through well-defined configurations.
The package uses xcube’s data access, implemented via data store plugins, together with further xcube functionality to manipulate and harmonize datasets according to user-defined specifications.
The workflow includes the following steps:
- Data access through xcube data stores
- Data harmonization (e.g. subset, resample, reproject a dataset)
- Optional data fusion (e.g. combining multiple data sources into one data cube)
This process results in either a single, unified data cube with all datasets aligned to a consistent grid or a catalog of separate datasets.
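The three workflow steps (access, harmonization, optional fusion) can be illustrated with plain xarray on synthetic data. This is a minimal sketch under stated assumptions, not the xcube-multistore implementation: the two in-memory datasets stand in for datasets opened through xcube data stores, the target grid and the harmonize helper are hypothetical, and nearest-neighbour reindexing stands in for proper resampling and reprojection (no CRS handling).

```python
import numpy as np
import pandas as pd
import xarray as xr

rng = np.random.default_rng(0)
times = pd.date_range("2024-01-01", periods=3, freq="D")

# 1. Data access: two hypothetical sources on different grids. In
#    xcube-multistore these would be opened via xcube data store plugins.
src_a = xr.Dataset(
    {"temperature": (("time", "lat", "lon"), rng.random((3, 20, 20)))},
    coords={"time": times, "lat": np.arange(40.0, 50.0, 0.5), "lon": np.arange(0.0, 10.0, 0.5)},
)
src_b = xr.Dataset(
    {"ndvi": (("time", "lat", "lon"), rng.random((3, 10, 10)))},
    coords={"time": times, "lat": np.arange(40.0, 50.0), "lon": np.arange(0.0, 10.0)},
)

# Hypothetical user-defined target grid (regular, 1.5 degree spacing).
target_lat = np.linspace(42.0, 48.0, 5)
target_lon = np.linspace(2.0, 8.0, 5)


def harmonize(ds: xr.Dataset, lat: np.ndarray, lon: np.ndarray, tolerance: float) -> xr.Dataset:
    """Subset ds to the target bounding box, then map it onto the target grid
    by nearest-neighbour lookup (a simple stand-in for real resampling)."""
    subset = ds.sel(lat=slice(lat.min(), lat.max()), lon=slice(lon.min(), lon.max()))
    return subset.reindex(lat=lat, lon=lon, method="nearest", tolerance=tolerance)


# 2. Harmonization: every source ends up on exactly the same grid.
harmonized = {
    name: harmonize(ds, target_lat, target_lon, tolerance=1.0)
    for name, ds in {"source_a": src_a, "source_b": src_b}.items()
}

# 3. Optional fusion: merge into one cube. join="exact" raises an error if the
#    grids differ instead of silently realigning them.
cube = xr.merge(list(harmonized.values()), join="exact")
print(cube)
```

In this sketch the harmonized dictionary plays the role of the catalog of separate datasets, while cube corresponds to the single fused data cube on a common grid.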
Overview
The Multi-Source Data Store is configured via a YAML file. You can find an example configuration in examples/config.yml.
For more detailed guidance on creating a configuration file, please refer to the Configuration Guide.
Once the configuration file is ready, the Multi-Source Data Store can be started with a single line of code, as shown below:
from xcube_multistore.multistore import MultiSourceDataStore
msds = MultiSourceDataStore("config.yml")
For further examples, see the examples folder.
Features
IMPORTANT:
The xcube-multistore package is currently in the early stages of development.
The following features are available so far:
- subsetting of a dataset (defined by a grid mapping)
- resampling and reprojection of a dataset (defined by a grid mapping)
- the grid mapping may be defined by the user or derived from a dataset
- time series at a single spatial point, interpolated from the neighbouring grid points
- data fusion, where the data variables in one xr.Dataset refer to different data sources
- spatial cutout of an area around a defined spatial point
- support for the preload API of xcube-clms and xcube-zenodo
- writing to NetCDF and Zarr
- auxiliary functions that help set up a configuration YAML file
- interpolation along the time axis
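Three of the features above (a time series at a single point, a spatial cutout around a point, and interpolation along the time axis) can be sketched with xarray and NumPy alone. The helper names point_time_series, cutout and interp_time are hypothetical and only illustrate the techniques; they are not functions of xcube-multistore. The sketch assumes ascending 1-D lat/lon coordinates in degrees and a datetime time axis, and it uses bilinear interpolation of the four neighbouring grid points for the point series.

```python
import numpy as np
import pandas as pd
import xarray as xr


def point_time_series(ds: xr.Dataset, lat: float, lon: float) -> xr.Dataset:
    """Bilinearly interpolate every variable at (lat, lon) from its four neighbours.

    Assumes ascending 1-D "lat"/"lon" coordinates, a point inside the grid,
    and that every data variable has both "lat" and "lon" dimensions.
    """
    lats, lons = ds["lat"].values, ds["lon"].values
    # Index of the lower neighbour, clipped so that i + 1 and j + 1 stay valid.
    i = int(np.clip(np.searchsorted(lats, lat) - 1, 0, lats.size - 2))
    j = int(np.clip(np.searchsorted(lons, lon) - 1, 0, lons.size - 2))
    # Fractional position of the point between the two neighbours on each axis.
    wy = (lat - lats[i]) / (lats[i + 1] - lats[i])
    wx = (lon - lons[j]) / (lons[j + 1] - lons[j])
    weights = xr.DataArray(
        [[(1 - wy) * (1 - wx), (1 - wy) * wx], [wy * (1 - wx), wy * wx]],
        dims=("lat", "lon"),
    )
    neighbours = ds.isel(lat=[i, i + 1], lon=[j, j + 1])
    return (neighbours * weights).sum(dim=("lat", "lon"))


def cutout(ds: xr.Dataset, lat: float, lon: float, half_size: float) -> xr.Dataset:
    """Square cutout of +/- half_size (in coordinate units) around (lat, lon)."""
    return ds.sel(
        lat=slice(lat - half_size, lat + half_size),
        lon=slice(lon - half_size, lon + half_size),
    )


def interp_time(ds: xr.Dataset, new_times) -> xr.Dataset:
    """Linearly interpolate all variables along "time" onto new_times."""
    new_times = pd.DatetimeIndex(new_times)
    # Insert the requested timestamps as NaN slices, fill them linearly in time,
    # then keep only the requested timestamps. Pre-existing NaNs are filled too.
    union = ds.indexes["time"].union(new_times)
    filled = ds.reindex(time=union).interpolate_na(dim="time", method="linear")
    return filled.sel(time=new_times)


# Synthetic cube whose values are linear in lat, lon and time (hypothetical data).
times = pd.date_range("2024-01-01", periods=3, freq="D")
lat = np.arange(40.0, 50.0)  # 40 ... 49, ascending
lon = np.arange(0.0, 10.0)  # 0 ... 9, ascending
t_idx = np.arange(times.size, dtype=float)
values = 2 * lat[None, :, None] + 3 * lon[None, None, :] + t_idx[:, None, None]
ds = xr.Dataset(
    {"value": (("time", "lat", "lon"), values)},
    coords={"time": times, "lat": lat, "lon": lon},
)

series = point_time_series(ds, lat=45.3, lon=7.6)  # 1-D over time
patch = cutout(ds, lat=45.3, lon=7.6, half_size=1.5)  # 3 x 3 pixels per time step
half_day = interp_time(ds, ["2024-01-01T12:00"])
print(series["value"].values)
```

Because the synthetic values are linear in lat, lon and time, bilinear and linear-in-time interpolation reproduce them exactly, which makes the results easy to check by hand.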
Configuration Generator GUI
The Configuration Generator GUI provides an interactive interface for creating and editing the configuration YAML, making the setup process more intuitive and less error-prone.
Key features (in development):
- Display of all available fields for each configuration section
- Dynamic fetching and updating of valid parameters and inputs
- Dropdown menus that show only supported options
- Autofill assistance for large option sets (e.g., thousands of data IDs)
- Built-in configuration validator/checker
- Geolocation visualization to help define bounding boxes
Note: This feature is under active development, and only a minimal working example is currently available.
To launch the GUI, run the following command from the package root:
panel serve xcube_multistore/gui/app.py --dev
License
The package is open source and released under the
MIT license.