
xcube Multi-Source Data Store
xcube-multistore is a Python package designed to create a Multi-Source Data Store
that enables the seamless integration of data from multiple sources into a unified
data model. This approach simplifies the data fusion process while ensuring
transparency and reproducibility through well-defined configurations.
The package utilizes xcube’s data access, implemented via data store plugins, along with additional functionalities from xcube, to manipulate and harmonize datasets according to user-defined specifications.
The workflow includes the following steps:
- Data access through xcube data stores
- Data harmonization (e.g. subsetting, resampling, or reprojecting a dataset)
- Optional data fusion (e.g. combining multiple data sources into one data cube)
This process results in either a single, unified data cube with all datasets aligned to a consistent grid or a catalog of separate datasets.
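To make the harmonization and fusion steps concrete, here is a minimal, dependency-free sketch. It is not the package's implementation: it uses plain Python lists in place of xarray datasets, a one-dimensional grid in place of a spatial grid, and nearest-neighbour resampling as one possible resampling method. The names `resample_nearest` and `fuse` are hypothetical and exist only for this illustration.

```python
from bisect import bisect_left


def resample_nearest(src_coords, src_values, target_coords):
    """Resample values from an ascending 1D source grid onto target coordinates
    by picking the nearest source point (hypothetical helper)."""
    resampled = []
    for x in target_coords:
        i = bisect_left(src_coords, x)
        # Candidates are the source points on either side of x.
        candidates = [j for j in (i - 1, i) if 0 <= j < len(src_coords)]
        nearest = min(candidates, key=lambda j: abs(src_coords[j] - x))
        resampled.append(src_values[nearest])
    return resampled


def fuse(target_coords, sources):
    """Combine several sources into one 'cube' with one variable per source,
    all aligned to the same target grid (hypothetical helper)."""
    cube = {"x": list(target_coords)}
    for name, (coords, values) in sources.items():
        cube[name] = resample_nearest(coords, values, target_coords)
    return cube


# Two toy sources defined on different grids.
target_grid = [0.0, 1.0, 2.0, 3.0]
sources = {
    "temperature": ([0.0, 1.5, 3.0], [10.0, 12.0, 14.0]),
    "precipitation": ([0.0, 0.5, 1.0, 1.5, 2.0, 2.5, 3.0], [1, 2, 3, 4, 5, 6, 7]),
}

cube = fuse(target_grid, sources)
print(cube)
```

After fusion, every variable in `cube` has one value per point of `target_grid`, which is the property the unified data cube described above is meant to guarantee.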
Overview
The Multi-Source Data Store is configured via a YAML file. You can find an example configuration in examples/config.yml.
For more detailed guidance on creating a configuration file, please refer to the Configuration Guide.
Once the configuration file is ready, the Multi-Source Data Store can be started with a single line of code, as shown below:
from xcube_multistore.multistore import MultiSourceDataStore
msds = MultiSourceDataStore("config.yml")
For further examples, see the examples folder.
Features
IMPORTANT: The xcube-multistore package is currently in the early stages of development.
The following features are available so far:
- subset a dataset (defined by grid mapping)
- resample and reproject a dataset (defined by grid mapping)
- grid mapping may be defined by the user or by a dataset
- extract time series at a single spatial point by interpolating the neighbouring points
- allow data fusion, where the data variables in one xr.Dataset refer to different data sources
- support spatial cutout of an area around a defined spatial point
- support preload API for xcube-clms and xcube-zenodo
- write output to NetCDF and Zarr
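The point time-series feature in the list above can be pictured with a small sketch. It assumes bilinear interpolation from the four surrounding grid points, which is one common choice; the method the package actually uses may differ. The function `bilinear_at_point` and the toy data are hypothetical, and plain nested lists stand in for an xr.Dataset.

```python
from bisect import bisect_right


def bilinear_at_point(lons, lats, grid, lon, lat):
    """Interpolate grid[lat_index][lon_index] at (lon, lat) from its four
    neighbours. Assumes ascending lons/lats with at least two entries each."""
    if not (lons[0] <= lon <= lons[-1] and lats[0] <= lat <= lats[-1]):
        raise ValueError("point lies outside the grid")
    # Index of the lower-left neighbour, clamped so i + 1 and j + 1 stay valid.
    i = min(bisect_right(lons, lon) - 1, len(lons) - 2)
    j = min(bisect_right(lats, lat) - 1, len(lats) - 2)
    # Fractional position of the point inside the grid cell.
    tx = (lon - lons[i]) / (lons[i + 1] - lons[i])
    ty = (lat - lats[j]) / (lats[j + 1] - lats[j])
    # Interpolate along longitude first, then along latitude.
    south = (1 - tx) * grid[j][i] + tx * grid[j][i + 1]
    north = (1 - tx) * grid[j + 1][i] + tx * grid[j + 1][i + 1]
    return (1 - ty) * south + ty * north


lons = [10.0, 11.0]
lats = [50.0, 51.0]
# Two time steps, each indexed as [lat][lon].
cube = [
    [[0.0, 2.0], [4.0, 6.0]],
    [[1.0, 1.0], [3.0, 3.0]],
]

# One interpolated value per time step gives the time series at the point.
series = [bilinear_at_point(lons, lats, step, 10.5, 50.5) for step in cube]
print(series)
```

Applying the same spatial interpolation to each time step independently is what turns a gridded cube into a time series at the requested location.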
The following features will be implemented in the future:
- auxiliary functionality to help set up a config YAML file
- interpolate along the time axis
License
The package is open source and released under the MIT license.