Exploring Data¶

This notebook demonstrates how to store sample data in the xcube geoDB, and shows basic exploration possibilities.

In [1]:

Copied!

from xcube_geodb.core.geodb import GeoDBClient
from xcube_geodb.core.geodb import GeoDBClient

Install xcube geoDB with command:

conda install xcube_geodb -c conda-forge

In [20]:

Copied!





### uncomment if not in managed environment
#client_id=YourID
#client_secret=YourSecret
#geodb = GeoDBClient(client_id=client_id, client_secret=client_secret, auth_mode="client-credentials")
### uncomment if not in managed environment
#client_id=YourID
#client_secret=YourSecret
#geodb = GeoDBClient(client_id=client_id, client_secret=client_secret, auth_mode="client-credentials")

The environment is prepared with your user credentials, so you simply can start the client.

In [ ]:

Copied!

geodb = GeoDBClient()
geodb = GeoDBClient()

Get your user name¶

In [3]:

Copied!

geodb.whoami
geodb.whoami

Out[3]:

'geodb_ci_test_user'

In [4]:

Copied!

geodb.get_my_collections()
geodb.get_my_collections()

Out[4]:

	owner	database	table_name
0	geodb_9bfgsdfg-453f-445b-a459	geodb_9bfgsdfg-453f-445b-a459	land_use
1	tt	tt	tt300

In [5]:

Copied!





import geopandas

collections = {
        "land_use":
        {
            "crs": 3794,
            "properties":
            {
                "RABA_PID": "float",
                "RABA_ID": "float",
                "D_OD": "date"
            }
        }
    }


geodb.create_collections(collections, clear=True)
import geopandas

collections = {
        "land_use":
        {
            "crs": 3794,
            "properties":
            {
                "RABA_PID": "float",
                "RABA_ID": "float",
                "D_OD": "date"
            }
        }
    }


geodb.create_collections(collections, clear=True)

Out[5]:

{'collections': {'geodb_ci_test_user_land_use': {'crs': 3794,
                                                 'properties': {'D_OD': 'date',
                                                                'RABA_ID': 'float',
                                                                'RABA_PID': 'float'}}}}

In [6]:

Copied!

gdf = geopandas.read_file('data/sample/land_use.shp')
geodb.insert_into_collection('land_use', gdf.iloc[:100,:]) # minimizing rows to 100, if you are in EDC, you dont need to make the subset.
gdf = geopandas.read_file('data/sample/land_use.shp')
geodb.insert_into_collection('land_use', gdf.iloc[:100,:]) # minimizing rows to 100, if you are in EDC, you dont need to make the subset.

Processing rows from 0 to 100

Out[6]:

100 rows inserted into land_use

List Datasets¶

Step 1: List all datasets a user has access to.

In [7]:

Copied!

geodb.get_my_usage() # to be updated so that all available collections are displayed includign sensible information ont heir availability, e.g. public, purchased, etc..
geodb.get_my_usage() # to be updated so that all available collections are displayed includign sensible information ont heir availability, e.g. public, purchased, etc..

Out[7]:

{'usage': '96 kB'}

In [8]:

Copied!

geodb.get_my_collections()
geodb.get_my_collections()

Out[8]:

	owner	database	table_name
0	geodb_9bfgsdfg-453f-445b-a459	geodb_9bfgsdfg-453f-445b-a459	land_use
1	geodb_ci_test_user	geodb_ci_test_user	land_use
2	tt	tt	tt300

Step 2: Let's get the whole content of a particular data set.

In [9]:

Copied!

gdf = geodb.get_collection('land_use') # to be updated, so that namespace is not needed or something more suitable, e.g. 'public'
gdf
gdf = geodb.get_collection('land_use') # to be updated, so that namespace is not needed or something more suitable, e.g. 'public'
gdf

Out[9]:

	id	created_at	modified_at	geometry	raba_pid	raba_id	d_od
0	1	2021-01-22T10:02:34.390867+00:00	None	POLYGON ((453952.629 91124.177, 453952.696 911...	4770326	1410	2019-03-26
1	2	2021-01-22T10:02:34.390867+00:00	None	POLYGON ((453810.376 91150.199, 453812.552 911...	4770325	1300	2019-03-26
2	3	2021-01-22T10:02:34.390867+00:00	None	POLYGON ((456099.635 97696.070, 456112.810 976...	2305689	7000	2019-02-25
3	4	2021-01-22T10:02:34.390867+00:00	None	POLYGON ((455929.405 97963.785, 455933.284 979...	2305596	1100	2019-02-25
4	5	2021-01-22T10:02:34.390867+00:00	None	POLYGON ((461561.512 96119.256, 461632.114 960...	2310160	1100	2019-03-11
...	...	...	...	...	...	...	...
95	96	2021-01-22T10:02:34.390867+00:00	None	POLYGON ((458514.067 93026.352, 458513.306 930...	5960564	1600	2019-01-11
96	97	2021-01-22T10:02:34.390867+00:00	None	POLYGON ((458259.239 93110.981, 458259.022 931...	5960569	3000	2019-01-11
97	98	2021-01-22T10:02:34.390867+00:00	None	POLYGON ((458199.608 93099.296, 458199.825 930...	5960630	3000	2019-01-11
98	99	2021-01-22T10:02:34.390867+00:00	None	POLYGON ((458189.403 93071.618, 458179.669 930...	5960648	1100	2019-01-11
99	100	2021-01-22T10:02:34.390867+00:00	None	POLYGON ((454901.777 95801.099, 454889.964 958...	2353305	1321	2019-01-05

100 rows × 7 columns

Step 3: Plot the GeoDataframe, select a reasonable column to diplay

In [10]:

Copied!

gdf.plot(column="raba_id", figsize=(15,15), cmap = 'jet')
gdf.plot(column="raba_id", figsize=(15,15), cmap = 'jet')

Out[10]:

<matplotlib.axes._subplots.AxesSubplot at 0x7f323b4fa190>

No description has been provided for this image

Step 5: Subselect the data. Here: Select a specific use by defining an ID value to choose

In [11]:

Copied!

gdfsub = geodb.get_collection('land_use', query='raba_id=eq.1410')
gdfsub.head()
gdfsub = geodb.get_collection('land_use', query='raba_id=eq.1410')
gdfsub.head()

Out[11]:

	id	created_at	modified_at	geometry	raba_pid	raba_id	d_od
0	1	2021-01-22T10:02:34.390867+00:00	None	POLYGON ((453952.629 91124.177, 453952.696 911...	4770326	1410	2019-03-26
1	62	2021-01-22T10:02:34.390867+00:00	None	POLYGON ((457261.001 96349.254, 457256.831 963...	3596498	1410	2019-01-05
2	22	2021-01-22T10:02:34.390867+00:00	None	POLYGON ((455384.809 97907.054, 455380.659 979...	3616776	1410	2019-02-25
3	28	2021-01-22T10:02:34.390867+00:00	None	POLYGON ((462585.734 93088.987, 462567.020 930...	3826126	1410	2019-01-23
4	32	2021-01-22T10:02:34.390867+00:00	None	POLYGON ((457748.827 96167.354, 457748.394 961...	2309744	1410	2019-01-05

In [12]:

Copied!

gdfsub.plot(column="raba_id", figsize=(15,15), cmap = 'jet')
gdfsub.plot(column="raba_id", figsize=(15,15), cmap = 'jet')

Out[12]:

<matplotlib.axes._subplots.AxesSubplot at 0x7f323b199190>

Step 6: Filter by bbox, limit it to 200 entries

In [13]:

Copied!

gdf = geodb.get_collection_by_bbox(collection="land_use",  bbox = (452750.0, 88909.549, 464000.0, 102486.299), comparison_mode="contains", bbox_crs=3794, limit=200, offset=10)
gdf
gdf = geodb.get_collection_by_bbox(collection="land_use",  bbox = (452750.0, 88909.549, 464000.0, 102486.299), comparison_mode="contains", bbox_crs=3794, limit=200, offset=10)
gdf

Out[13]:

	id	created_at	modified_at	geometry	raba_pid	raba_id	d_od
0	11	2021-01-22T10:02:34.390867+00:00	None	POLYGON ((460137.998 95628.898, 460111.001 956...	5983161	1100	2019-03-11
1	12	2021-01-22T10:02:34.390867+00:00	None	POLYGON ((453673.609 91328.224, 453678.929 913...	5983074	1600	2019-03-26
2	13	2021-01-22T10:02:34.390867+00:00	None	POLYGON ((460312.295 96127.114, 460300.319 961...	5983199	1600	2019-03-11
3	14	2021-01-22T10:02:34.390867+00:00	None	POLYGON ((460459.445 96117.356, 460470.516 961...	5983217	1100	2019-03-11
4	15	2021-01-22T10:02:34.390867+00:00	None	POLYGON ((457798.753 99628.982, 457783.076 996...	6299143	1600	2019-03-04
...	...	...	...	...	...	...	...
85	96	2021-01-22T10:02:34.390867+00:00	None	POLYGON ((458514.067 93026.352, 458513.306 930...	5960564	1600	2019-01-11
86	97	2021-01-22T10:02:34.390867+00:00	None	POLYGON ((458259.239 93110.981, 458259.022 931...	5960569	3000	2019-01-11
87	98	2021-01-22T10:02:34.390867+00:00	None	POLYGON ((458199.608 93099.296, 458199.825 930...	5960630	3000	2019-01-11
88	99	2021-01-22T10:02:34.390867+00:00	None	POLYGON ((458189.403 93071.618, 458179.669 930...	5960648	1100	2019-01-11
89	100	2021-01-22T10:02:34.390867+00:00	None	POLYGON ((454901.777 95801.099, 454889.964 958...	2353305	1321	2019-01-05

90 rows × 7 columns

In [14]:

Copied!

gdf.plot(column="raba_pid", figsize=(15,15), cmap = 'jet')
gdf.plot(column="raba_pid", figsize=(15,15), cmap = 'jet')

Out[14]:

<matplotlib.axes._subplots.AxesSubplot at 0x7f323b11b090>

Step 6: Fltering using PostGres Syntax; see https://www.postgresql.org/docs/9.1/index.html for details

In [15]:

Copied!

gdf = geodb.get_collection_pg(collection='land_use', where='raba_id=1410')
gdf.head()
gdf = geodb.get_collection_pg(collection='land_use', where='raba_id=1410')
gdf.head()

Out[15]:

	id	created_at	modified_at	geometry	raba_pid	raba_id	d_od
0	1	2021-01-22T10:02:34.390867+00:00	None	POLYGON ((453952.629 91124.177, 453952.696 911...	4770326	1410	2019-03-26
1	62	2021-01-22T10:02:34.390867+00:00	None	POLYGON ((457261.001 96349.254, 457256.831 963...	3596498	1410	2019-01-05
2	22	2021-01-22T10:02:34.390867+00:00	None	POLYGON ((455384.809 97907.054, 455380.659 979...	3616776	1410	2019-02-25
3	28	2021-01-22T10:02:34.390867+00:00	None	POLYGON ((462585.734 93088.987, 462567.020 930...	3826126	1410	2019-01-23
4	32	2021-01-22T10:02:34.390867+00:00	None	POLYGON ((457748.827 96167.354, 457748.394 961...	2309744	1410	2019-01-05

In [16]:

Copied!

gdf.plot(column="raba_pid", figsize=(15,15), cmap = 'jet')
gdf.plot(column="raba_pid", figsize=(15,15), cmap = 'jet')

Out[16]:

<matplotlib.axes._subplots.AxesSubplot at 0x7f323b13a9d0>

Step 7: Fltering using PostGres Syntax Allowing Aggregation Here according to data, note that the data set has been reduced to 200 entries above

In [17]:

Copied!

df = geodb.get_collection_pg('land_use', where='raba_id=1410', group='d_od', select='COUNT(d_od) as ct, d_od')
df.head()
df = geodb.get_collection_pg('land_use', where='raba_id=1410', group='d_od', select='COUNT(d_od) as ct, d_od')
df.head()

Out[17]:

	ct	d_od
0	1	2019-03-04
1	1	2019-03-20
2	1	2019-03-26
3	1	2019-04-01
4	1	2019-02-25

In [18]:

Copied!

geodb.drop_collection('land_use')
geodb.drop_collection('land_use')

Out[18]:

Collection ['geodb_ci_test_user_land_use'] deleted

In [ ]:

Exploring Data¶

Login from any machine¶

Login in managed environment¶

Get your user name¶

List Datasets¶