Creation of a STAC HATEOAS Provider. (#857)

This commit is contained in:
ychoquet
2022-02-23 17:27:43 -05:00
committed by GitHub
parent 30f0399282
commit bd2177674c
6 changed files with 513 additions and 11 deletions
+297 -5
View File
@@ -1,13 +1,305 @@
.. _stac:
Publishing files to a SpatioTemporal Asset Catalog
==================================================
**************************************************
The `SpatioTemporal Asset Catalog (STAC)`_ specification provides an easy approach
for describing geospatial assets. STAC is typically implemented for imagery and
other raster data.
The `SpatioTemporal Asset Catalog (STAC)`_ family of specifications aim to standardize
the way geospatial asset metadata is structured and queried. A "spatiotemporal asset"
is any file that represents information about the Earth at a certain place and time.
The original focus was on scenes of satellite imagery, but the specifications now cover
a broad variety of uses, including sources such as aircraft and drone and data such as
hyperspectral optical, synthetic aperture radar (SAR), video, point clouds, lidar, digital
elevation models (DEM), vector, machine learning labels, and composites like NDVI and
mosaics. STAC is intentionally designed with a minimal core and flexible extension mechanism
to support a broad set of use cases. This specification has matured over the past several
years, and is used in numerous production deployments.
pygeoapi implements STAC as an geospatial file browser through the FileSystem provider,
pygeoapi has two built-in providers to browse STAC catalogs: `FileSystem Provider`_ and
`Hateoas Provider`_.
Hateoas Provider
================
HATEOAS (Hypermedia as the Engine of Application State) is a way of implementing a REST
application that allows the client to dynamically navigate to the appropriate resources
by browsing hypermedia links. This type of navigation is similar to WEB navigation
and requires a very precise data structure that must be respected to allow the HATEOAS
Provider to behave correctly.
There are three component specifications (Catalog, Collection, Item) that together make
up the core SpatioTemporal Asset Catalog specification. An Item represents a single
spatiotemporal asset as GeoJSON. The Catalog specification provides structural elements,
to group Items and Collections. Collections are catalogs, that add more required metadata
and describe a group of related Items.
The full catalog structure of links down to sub-catalogs and Items, and their links back to
their parents and roots, must be done with **relative** URL's for the HATEOAS Provider work
correctly. The structural *rel* types include *root*, *parent*, *child*, *item*, and
*collection*. Assets links must be **absolute** URL's. Other links can be absolute, especially
if they describe a resource that makes less sense in the catalog, like derived_from or even
license (it can be nice to include the license in the catalog, but some licenses live at a
canonical online location which makes more sense to refer to directly). This enables the
full catalog (excluding the assets) to be downloaded or copied to another location and to
still be valid. This also implies no self link, as that link must be absolute.
So, the following rules must be respected:
1. Root documents (Catalogs / Collections) must be at the root of a directory tree containing the static catalog.
2. Catalogs must be named catalog.json and Collections must be named collection.json.
3. Sub-Catalogs or sub-Collections must be stored in subdirectories of their parent (and only 1 subdirectory deeper than a document's parent, e.g. .../sample/sub1/catalog.json).
4. Limit the number of Items in a Catalog or Collection, grouping / partitioning as relevant to the dataset.
5. Use structural elements (Catalog and Collection) consistently across each 'level' of your hierarchy. For example, if levels 2 and 4 of the hierarchy only contain Collections, don't add a Catalog at levels 2 and 4.
6. Items must be named <*id*>.json.
7. Items must be stored in subdirectories (1 level deeper) of their parent Catalog or Collection. The subdirectory must have the same name (<*id*>) as the Item without the *.json* extension. This means that each Item are contained in a unique subdirectory.
8. The links to the actual assets must be an absolute URL.
-------------
File examples
-------------
**Structure of the catalog.json file**
.. code-block:: json
{
"id": "STAC-Catalog",
"stac_version": "1.0.0",
"description": "A description of the STAC Catalog",
"links": [
{
"rel": "root",
"href": "./catalog.json",
"type": "application/json"
},
{
"rel": "child",
"href": "./eo4ce/catalog.json",
"type": "application/json"
},
...
{
"rel": "child",
"href": "./dem/catalog.json",
"type": "application/json"
}
],
"stac_extensions": [],
"title": "STAC Catalog"
}
The code above shows the root catalog. The sub-catalogs have an additional ``rel`` entry pointing to the parent.
.. code-block:: json
{
"id": "dem",
"stac_version": "1.0.0",
"description": "Digital Elevation Data",
"links": [
{
"rel": "root",
"href": "../catalog.json",
"type": "application/json"
},
{
"rel": "child",
"href": "./hrdsm/collection.json",
"type": "application/json"
},
{
"rel": "parent",
"href": "../catalog.json",
"type": "application/json"
}
],
"stac_extensions": [],
"title": "DEM"
}
-------------------------------------
**Structure of the collection.json file**
Collections are similar to Catalogs with extra fields.
.. code-block:: json
{
"id": "hrdsm",
"stac_version": "1.0.0",
"description": "High Resolution Digital Surface Model",
"links": [
{
"rel": "root",
"href": "../../catalog.json",
"type": "application/json"
},
{
"rel": "item",
"href": "./arcticdem-frontiere-0/arcticdem-frontiere-0.json",
"type": "application/json"
},
...
{
"rel": "item",
"href": "./arcticdem-frontiere-9/arcticdem-frontiere-9.json",
"type": "application/json"
},
{
"rel": "parent",
"href": "../catalog.json",
"type": "application/json"
}
],
"stac_extensions": [],
"extent": {
"spatial": {
"bbox": [
[
-142.76516601842533,
59.65274347822059,
-138.41658819177135,
69.81052152420365
]
]
},
"temporal": {
"interval": [
[
"2014-09-03T14:00:00Z",
"2020-09-28T15:49:00.559166Z"
]
]
}
},
"license": "proprietary"
}
-------------------------------------
**Structure of the Item <id>.json file**
The example below shows the content of a file named *arcticdem-frontiere-0.json*.
.. code-block:: json
{
"type": "Feature",
"stac_version": "1.0.0",
"id": "arcticdem-frontiere-0",
"properties": {
"layer:ids": [
"dem-hrdsm"
],
"collection": "hrdsm",
"datetime": "2020-09-28T15:48:56.483794Z"
},
"geometry": {
"type": "Polygon",
"coordinates": [
[
[
-140.27389595735178,
59.65274347822059
],
[
-138.41658819177135,
59.65274347822059
],
[
-138.41658819177135,
60.579416456816496
],
[
-140.27389595735178,
60.579416456816496
],
[
-140.27389595735178,
59.65274347822059
]
]
]
},
"links": [
{
"rel": "root",
"href": "../../../catalog.json",
"type": "application/json"
},
{
"rel": "collection",
"href": "../collection.json",
"type": "application/json"
},
{
"rel": "parent",
"href": "../collection.json",
"type": "application/json"
}
],
"assets": {
"image": {
"href": "http://absolute/path/to/the/ressource/arcticdem-frontiere-0.tif",
"type": "image/tiff; application=geotiff; profile=cloud-optimized",
"roles": []
}
},
"bbox": [
-140.27389595735178,
59.65274347822059,
-138.41658819177135,
60.579416456816496
],
"stac_extensions": [],
"collection": "hrdsm"
}
---------------------
HATEOAS Configuration
---------------------
Configuring HATEOAS STAC Provider in pygeoapi is done by simply pointing the ``data`` provider property
to the local directory or remote URL and specifying the root file name (catalog.json or collection.json) in the file_types property:
Connection examples
-------------------
.. code-block:: yaml
my-remote-stac-resource:
type: stac-collection
...
providers:
- type: stac
name: Hateoas
data: https://datacube-dev-data-public.s3.ca-central-1.amazonaws.com/catalog/water
file_types: catalog.json
my-local-stac-resource:
type: stac-collection
...
providers:
- type: stac
name: Hateoas
data: tests/stac
file_types: catalog.json
-------------------
FileSystem Provider
===================
The FileSystem Provider implements STAC as a geospatial file browser through the server's file system,
supporting any level of file/directory nesting/hierarchy.
Configuring STAC in pygeoapi is done by simply pointing the ``data`` provider property
+1
View File
@@ -46,6 +46,7 @@ PLUGINS = {
'SQLiteGPKG': 'pygeoapi.provider.sqlite.SQLiteGPKGProvider',
'MongoDB': 'pygeoapi.provider.mongo.MongoProvider',
'FileSystem': 'pygeoapi.provider.filesystem.FileSystemProvider',
'Hateoas': 'pygeoapi.provider.hateoas.HateoasProvider',
'rasterio': 'pygeoapi.provider.rasterio_.RasterioProvider',
'xarray': 'pygeoapi.provider.xarray_.XarrayProvider',
'MVT': 'pygeoapi.provider.mvt.MVTProvider',
+3 -1
View File
@@ -168,6 +168,7 @@ class FileSystemProvider(BaseProvider):
'href': newpath,
'type': 'text/html',
'created': filectime,
'entry:type': 'Catalog'
})
elif os.path.isfile(fullpath):
basename, extension = os.path.splitext(dc)
@@ -180,7 +181,8 @@ class FileSystemProvider(BaseProvider):
'href': newpath,
'title': get_path_basename(newpath2),
'created': filectime,
'file:size': filesize
'file:size': filesize,
'entry:type': 'Item'
})
# child_links.append({
# 'rel': 'item',
+205
View File
@@ -0,0 +1,205 @@
# =================================================================
#
# Authors: yves.choquette <yves.choquette@NRCan-RNCan.gc.ca>
#
# Copyright (c) 2022 Yves Choquette
#
# Permission is hereby granted, free of charge, to any person
# obtaining a copy of this software and associated documentation
# files (the "Software"), to deal in the Software without
# restriction, including without limitation the rights to use,
# copy, modify, merge, publish, distribute, sublicense, and/or sell
# copies of the Software, and to permit persons to whom the
# Software is furnished to do so, subject to the following
# conditions:
#
# The above copyright notice and this permission notice shall be
# included in all copies or substantial portions of the Software.
#
# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
# EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES
# OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
# NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT
# HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY,
# WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
# FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
# OTHER DEALINGS IN THE SOFTWARE.
#
# =================================================================
import requests
import logging
import os
from urllib.parse import urljoin
import json
from pygeoapi.provider.base import (BaseProvider, ProviderNotFoundError)
LOGGER = logging.getLogger(__name__)
class HateoasProvider(BaseProvider):
"""HateoasProvider Provider"""
def __init__(self, provider_def):
"""
Initialize object
:param provider_def: provider definition
:returns: pygeoapi.provider.hateoas.HateoasProvider
"""
super().__init__(provider_def)
def get_data_path(self, baseurl, urlpath, entrypath):
"""
Gets directory listing or file description or raw file dump
:param baseurl: base URL of endpoint
:param urlpath: base path of URL
:param entrypath: basepath of the entry selected (equivalent of URL)
:returns: `dict` of catalogs/collections or `dict` of GeoJSON item
"""
thispath = os.path.join(baseurl, urlpath)
resource_type = None
root_link = None
child_links = []
data_path = self.data + entrypath
if '/' not in entrypath: # root
root_link = baseurl
else:
parentpath = urljoin(thispath, '.')
child_links.append({
'rel': 'parent',
'href': '{}?f=json'.format(parentpath),
'type': 'application/json'
})
child_links.append({
'rel': 'parent',
'href': parentpath,
'type': 'text/html'
})
depth = entrypath.count('/')
root_path = '/'.replace('/', '../' * depth, 1)
root_link = urljoin(thispath, root_path)
content = {
'links': [{
'rel': 'root',
'href': '{}?f=json'.format(root_link),
'type': 'application/json'
}, {
'rel': 'root',
'href': root_link,
'type': 'text/html'
}, {
'rel': 'self',
'href': '{}?f=json'.format(thispath),
'type': 'application/json',
}, {
'rel': 'self',
'href': thispath,
'type': 'text/html'
}
]
}
LOGGER.debug('Checking if path exists as Catalog, Collection or Asset')
try:
jsondata = _get_json_data('{}/catalog.json'.format(data_path))
resource_type = 'Catalog'
except Exception:
try:
jsondata = _get_json_data('{}/collection.json'.format(data_path)) # noqa
resource_type = 'Collection'
except Exception:
try:
filename = os.path.basename(data_path)
jsondata = _get_json_data('{}/{}.json'.format(data_path, filename)) # noqa
resource_type = 'Assets'
except Exception:
msg = 'Resource does not exist: {}'.format(data_path)
LOGGER.error(msg)
raise ProviderNotFoundError(msg)
if resource_type == 'Catalog' or resource_type == 'Collection':
content['type'] = resource_type
link_href_list = []
for link in jsondata["links"]:
if resource_type in ['Catalog', 'Collection'] \
and link["rel"] in ["child", "item"]:
link_href_list.append(link["href"].replace('\\', '/'))
link_href_list.sort()
for link in link_href_list:
unused, path_ending, entry_type = link.split('/')
newpath = os.path.join(baseurl, urlpath, path_ending).replace('\\', '/') # noqa
if entry_type == 'catalog.json':
child_links.append({
'rel': 'child',
'href': newpath,
'type': 'text/html',
'created': "-",
'entry:type': 'Catalog'
})
elif entry_type == 'collection.json':
child_links.append({
'rel': 'child',
'href': newpath,
'type': 'text/html',
'created': "-",
'entry:type': 'Collection'
})
else:
child_links.append({
'rel': 'item',
'href': newpath,
'title': path_ending,
'created': "-",
'entry:type': 'Item'
})
elif resource_type == 'Assets':
content = jsondata
content['assets']['default'] = {
'href': os.path.join(baseurl, urlpath).replace('\\', '/'),
}
for key in content['assets']:
content['assets'][key]['file:size'] = 0
content['assets'][key]['created'] = jsondata["properties"]["datetime"] # noqa
content['links'].extend(child_links)
return content
def __repr__(self):
return '<HateoasProvider> {}'.format(self.data)
def _get_json_data(jsonpath):
"""
Helper function used to load a json file that is located on the WEB
(HTTP request) or on the server file system
:param jsonpath: path to the json file
:returns: `dict` of JSON item
"""
if jsonpath[0:4].upper() == 'HTTP':
jsondata = requests.get(jsonpath).json()
else:
with open(jsonpath) as fh:
jsondata = json.load(fh)
return jsondata
+6 -4
View File
@@ -10,10 +10,11 @@
{% block body %}
<section id="links">
<h2>Links</h2>
<h2>{{ data['type'] }}: {{ data['path'] }}</h2>
<table class="striped">
<thead>
<tr>
<th>{% trans %}Type{% endtrans %}</th>
<th>{% trans %}Name{% endtrans %}</th>
<th>{% trans %}Last modified{% endtrans %}</th>
<th>{% trans %}Size{% endtrans %}</th>
@@ -23,16 +24,17 @@
{% for link in data['links'] %}
{% if link['rel'] in ['child', 'item'] %}
<tr>
<td data-label="name">
<td data-label="Type">{{ link['entry:type'] }}</td>
<td data-label="Name">
{% if link['title'] %}
<a title="{{ link['href'] }}" href="{{ link['href'] }}"><span>{{ link['title'] | get_path_basename }}</span></a>
{% else %}
<a title="{{ link['href'] }}" href="{{ link['href'] }}"><span>{{ link['href'] | get_path_basename }}</span></a>
{% endif %}
</td>
<td data-label="created">{{ link['created'] }}</td>
<td data-label="Created">{{ link['created'] }}</td>
{% if link['file:size'] %}
<td data-label="size">{{ link['file:size'] | human_size }}</td>
<td data-label="Size">{{ link['file:size'] | human_size }}</td>
{% else %}
<td data-label="size">-</td>
{% endif %}
+1 -1
View File
@@ -16,7 +16,7 @@
<section id="item">
<div class="row">
<div class="col-sm">
<h2>{% trans %}Item{% endtrans %} {{ data['id'] }}</h2>
<h2>{% trans %}Item{% endtrans %}: {{ data['id'] }}</h2>
</div>
</div>
<div class="row">