03/10/2025

DestinE Platform Data Model of Data and Services

Interfaces

Reference: DEST-SRCO-TN-2400459

Version: 3.2

Author: DestinE Platform team


1. Introduction

1.1 Purpose and Scope

This document describes the Data Model of DestinE Platform services, and its guidelines related to the "Destination Earth – DestinE Core Service Platform (DestinE Platform) Framework – Platform & Data Management Services".

This document proposes a well-defined data-entity schema for describing the DestinE Platform data assets exposed by services, specifying the metadata, properties, and attributes related to their main characteristics and functionalities. The schema serves not only to shape how information is presented to end users but also to establish guidelines for organizing inputs from relevant stakeholders.

The data model depicted in this document outlines the context of the DestinE Service Registry, providing a comprehensive schema for describing services registered to the platform.

The audience of this document is DestinE Platform Administrators and Service Providers.

This document will undergo periodic revisions to incorporate new metadata and requirements, enhancing our ability to identify services and data in accordance with the evolution roadmap.

1.2 Objectives and Drivers

The main objectives of this document are:

•Facilitate user access to DestinE Platform services and data assets while ensuring efficient and structured information retrieval.

•Support stakeholders in the service onboarding process by providing guidelines for registering their services and related data portfolios.

To achieve these objectives, the following drivers have been identified.

Driver 1) Identify services and data portfolio properties and attributes and efficiently organize them in a well-defined data-entity model oriented to operations.

Driver 2) Streamline the input collection process by Service Providers willing to integrate a service and related data portfolio. The gathered information is registered and organized in the Service Registry. The metadata schema required by the Service Registry and identified in this document shall be made explicit to Service Providers and documented for general service/data onboarding guidance.

Driver 3) Design a comprehensive "system of keywords" to identify services and data features based on their intrinsic functionality or application. These keywords are designed to be queryable by end users, enabling them to search and filter information effectively.

Driver 4) Blueprint the data-entity schema and its keywords to enable the following user scenarios.

oProvide service information and data portfolio information (if applicable) in a web form available to any user willing to become a Service Provider

oCategorize and filter services based on their attributes, properties, and relevant keywords.

oSearch services by keyword, attribute, or property and be returned the list of services matching those search criteria.

oSearch data by keyword, attribute, or property and be returned the list of DestinE Platform services exposing data that matches those search criteria.

1.3 Approach

The design of the DestinE Platform Service Data Model proceeds through the following phases.

•Conceptual Modelling: this phase comprises gathering the Service requirements and translating them into data entity definitions, relationships, and attributes.

•Logical Modelling: this phase applies data model patterns and requirements, ensuring that the Data Model adapts appropriately to the DestinE Platform operational framework. This step considers possible evolutions and re-adaptations of the data model.

•Physical Modelling: This step ensures the technical requirements are met and the design solution is implemented.

1.4 Reference Documents

Ref.	Title	Reference and Version
[RD-1]	SentiWiki (Sentinel Applications)	https://sentiwiki.copernicus.eu/web/copernicus-programme
[RD-2]	Eumetsat User Portal (Using data)	https://user.eumetsat.int/data/themes
[RD-3]	Eurostat Data Browser	https://ec.europa.eu/eurostat/databrowser/explore/all/all_themes?lang=en&display=list&sort=category
[RD-4]	Copernicus Climate Change (Data section)	https://climate.copernicus.eu/
[RD-5]	Copernicus Marine (Data Store)	https://data.marine.copernicus.eu/products
[RD-6]	EEA Datahub (EEA topics)	https://www.eea.europa.eu/en/datahub
[RD-7]	Destination Earth (Use cases Topic)	https://destination-earth.eu/use-cases/

1.5 Acronyms and Abbreviations

Acronym	Definition
API	Application Programming Interface
ECSS	European Cooperation for Space Standardization
ESA	European Space Agency
GDPR	General Data Protection Regulation
IAM	Identity and Access Management
LT Monitoring	Long-Term Monitoring
RT Monitoring	Real-Time Monitoring

1.6 Glossary

Term	Definition
Attribute	Information that describes a property or characteristic of an entity, helping to distinguish the entity within a system. Attributes can be labelled with further information indicating a more specific characteristic within the system.
Data Portfolio	Collection of data resources owned, managed and utilized by a service. It encompasses various types of structured and unstructured data from multiple sources.
Entity	Distinct object, concept, or thing about which data is stored.
Entry	A data portfolio entry refers to a specific data resource as a distinct consumable item or value.
Marketplace: Service Catalogue or Data Catalogue	The service and data catalogues are both exposed to end users and centralized on the DestinE platform, to ease exploration of and engagement with a wide array of service offerings tailored to users' specific needs and interests. In the service catalogue, services are showcased grouped by categories for easy navigation and selection. Users browse and access various services, from freelancing to professional expertise, facilitating transactions and collaborations. In the data catalogue, the focus shifts to datasets and insights presented in a data-centric format. Alongside these, the services offering functionalities over the datasets are listed, providing users with tools and solutions to leverage the data effectively.
Key	Field or combination of fields within a data entity that uniquely identifies each instance of that entity. Every key has a proper name and indicates a definite concept within the data model schema.
Queryable	It refers to any object or entity within the Service Registry database schema that can be queried or used as a filter by an external system or service.
Searchable	Any data element or field within the Service Registry database that can be searched by users in a front-end, using the full-text search bar.
Value	Actual piece of data stored within an attribute of an entity or record. It represents the information associated with a particular attribute for a specific entity instance.

1.7 Attribute provisioning conventions

The Data Model presented in this document shows attributes which differ in the way they are provided.

•Mandatory attributes: they are functional for this Data Model and are always required as input data. They can be:

oStandardized attributes: information that is predefined in the system and "recognized" during the registration process on the Service Registry.

Note: If the entry is not present in the standard options, the entry shall be registered by the service provider and undergo an approval process in agreement with the CoC and guidelines.

oAutomatized attributes: if the information is already present in the system, this field is filled in automatically.

•Free text attributes: information that is not known in advance and is provisioned as free text.

•Optional or Suggested attributes: information that is not required but recommended. Providing optional details enhances the visibility of a service.
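
As an illustration, the conventions above can be sketched as a small validation helper. This is a minimal sketch, not the Service Registry implementation: the names `Provisioning`, `AttributeSpec`, and `validate` are hypothetical, and treating only standardized and automatized attributes as mandatory is an assumption drawn from the list structure above.

```python
from __future__ import annotations

from dataclasses import dataclass
from enum import Enum


class Provisioning(Enum):
    """How an attribute value is supplied (labels follow Section 1.7)."""
    STANDARDIZED = "standardized"  # mandatory, chosen from predefined options
    AUTOMATIZED = "automatized"    # mandatory, filled in by the system
    FREE_TEXT = "free"             # not known in advance, typed by the provider
    SUGGESTED = "suggested"        # optional but recommended


@dataclass(frozen=True)
class AttributeSpec:
    """Hypothetical description of one attribute of the Data Model."""
    name: str
    provisioning: Provisioning
    options: tuple[str, ...] = ()  # only meaningful for STANDARDIZED

    @property
    def mandatory(self) -> bool:
        # Assumption: only these two kinds are mandatory.
        return self.provisioning in (
            Provisioning.STANDARDIZED,
            Provisioning.AUTOMATIZED,
        )


def validate(spec: AttributeSpec, value: str | None) -> list[str]:
    """Return a list of problems; an empty list means the value is acceptable."""
    if value is None or value == "":
        if spec.mandatory:
            return [f"{spec.name}: mandatory attribute is missing"]
        return []
    if spec.provisioning is Provisioning.STANDARDIZED and value not in spec.options:
        # Per the note above, a non-standard entry needs an approval step.
        return [f"{spec.name}: '{value}' is not a standard option, approval required"]
    return []
```

For example, `validate(AttributeSpec("Interface Type", Provisioning.STANDARDIZED, ("GUI", "API", "CLI")), "API")` returns an empty list, while a missing value for the same attribute is reported as a problem.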

2. Operational Context and Stakeholders

The data model is within the scope of the DestinE Platform Service Registry and DestinE Platform Web Portal Services, which define a data-entity model within their operational framework.

Figure 1: Operational context of the Service Data Model.

The Operational context is split into the following logical areas.

•Guidelines/Data Model – the Data Model defines the Service Registry data-entity schema.

•Service Registry/System – ingests and implements the Data Model, enabling user scenarios by means of its backend and frontend components.

oBackend: The backend hosts a database and middleware that support both registration (write) and discovery (read) scenarios. It also includes a console that functions as a back-office tool for managing service onboarding.

oFrontend: The frontend presents the Data Model to relevant stakeholders, drawing from the information stored in the database. It exposes key attributes and properties through a search engine, enabling efficient access and exploration of the data.

•Stakeholders – actors participating in the operational context with the following scenarios.

oDestinE Platform Service Providers: they are responsible for providing the required information about their service within the service onboarding process.

oDestinE Platform Standard registered or federated users: they can search, discover and filter key attributes and properties of services and data as exposed by the Service Registry.

3. User Scenarios

Scenarios from a user’s perspective can be grouped into the following categories related to the Service Registry.

•Enhancement of Platform navigability: the information contained in the Data Model populates the DestinE platform Web Portal service, enhancing the user experience in visualizing the data and service information per pages.

•Search/filter contents of interest: the information contained in the Data Model can be searched or queried by filters and keywords and linked to multiple data and service information items at a time.

•Personalized/Pushed contents: using analytics that combine the information contained in the Data Model with information about users, the platform can push personalized contents or suggestions to registered users about similar services, data, and applications aimed at the same purpose.

And, in terms of back office:

•Onboarding: all services that are part of the DestinE platform ecosystem are registered in the Service Registry.

3.1 Platform navigability

The user’s experience on the web portal is greatly influenced by the platform’s navigability. Therefore, it is crucial to organize the contents intelligently.

Contents carry the information of the Data Model and are placed on the front-end/Web Portal in at least the following sections:

•Service Catalogue: On this page, services are organized according to the categories selected during the registration activity. These categories can be Service categories or Data categories. Using metrics from the dashboard activity, other categories are foreseen such as popular services, most viewed, or similar. Different categories will be added based on the User Registry, such as “Suggested for you,” “Similar contents,” etc.

•Data Catalogue: Pages dedicated to grouping services that share the same data. These pages are generated for all levels of the data portfolio hierarchy. The data hierarchy is described in the sections dedicated to the Service Data Portfolio.

•Service’s own page: This page is dedicated to each service, where the service presents its functionalities, strengths and portfolio. The more information the provider fills in the form, the richer the personal service page on the web portal will be with content and information that appeals to users.

3.2 Search contents of interest

Users can search all fields reported in this Data Model in the Web Portal, by keywords or sentence fragments.

The following user scenarios have been outlined.

3.2.1 Free-text search

Scope	Item	Description of the scenario	Examples
Service	Service Name	The user enters the complete name of a service or a portion of it	“insula”, “cache”
Service	Service Description	The user enters a word or a sentence that is part of service description	“data access”, “cloud-based computing”, “processing”
Data Portfolio	Program name	The user enters the name of a registered program	“Copernicus”, “destination earth”
	Program subset name	The user enters the name of a registered program subset	“sentinel”, “climate”, “weather extremes”
	Collection name	The user enters the name of a registered collection or a portion of it or a known acronym	“sentinel-1”, “IFS”, “scenariocmip”, “ERA5”
	Dataset name	The user enters the name of a registered dataset or a portion of it	“ERA5 hourly data on single-levels”, “SLC”, “mosaic”
	Dataset short name	The user enters the short name of the dataset as registered in the Registry	“reanalysis-era5-single-levels”, “0001-high-sfc”
	Variable name	The user enters the name of a variable or a portion of it	“snow depth”, “temperature”, “wind”
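
The free-text scenarios in this table all amount to matching a full name or a portion of it. A minimal sketch of such a match is given below, assuming a case-insensitive substring comparison; the function `free_text_search` and the record layout are hypothetical and do not describe the actual Service Registry search engine.

```python
def free_text_search(records, query, fields):
    """Return the records in which any listed field contains the query.

    Assumption: matching is a case-insensitive substring test, so that
    "cache" finds "Data Cache Management Service".
    """
    needle = query.strip().casefold()
    if not needle:
        return []  # an empty query matches nothing in this sketch
    return [
        record
        for record in records
        if any(needle in str(record.get(f, "")).casefold() for f in fields)
    ]


# Illustrative records only; the field names are assumptions.
services = [
    {"name": "Data Cache Management Service", "description": "data access"},
    {"name": "Example Viewer", "description": "cloud-based computing"},
]
hits = free_text_search(services, "cache", ["name", "description"])
```

A production search bar would typically add tokenization and ranking; the sketch only shows the "complete name or a portion of it" behaviour described in the table.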

3.2.2 Field-specific filtering and multi-level filtering

Scope	Item	Description of the scenario	Examples
Service	Service Category	The user selects one or more service categories	“Data Access”, “Data Visualization”, “User Workflow”
	Tag (Primary)	The user selects one or more tags describing the service	“Cache” and “STAC” and “S3” will define the Data Cache Management Service
	Tag (Secondary)	The user selects one or more tags describing the service	“Notebook” “Data Lake” …
	Organization Name	The user selects one of the listed organizations providing services (Service Provider)	“CGI” “MEEO”
	User Availability	The user selects a toggle on the availability of a service with respect to the user profile	“Free”, “Restricted”
	Interface Name	The user selects the full name of a given interface	“Finder”, “STAC”
	Interface Type	The user selects the archetype of a given interface	“GUI”, “API”, “CLI”, “IDE”
Data	Program Subset	The user selects a Program Subset as indexed in the Registry	“Sentinel”, “DestinE Climate Adaptation Digital Twins”, “Climate Change”
	Collection	The user selects a Collection as indexed in the Registry	“Sentinel-1”, “ERA5”
	Dataset Category	The user selects a Dataset category as indexed in the Registry	“Climate Dynamics and Forecasting”, “Atmospheric Composition and Health”
	Variable Tag	The user selects a tag associated to the variables as indexed in the Registry	“Aerosol”, “Plume”, “Climate”, “Global Warming”
	Dataset Type	The user selects a dataset type as indexed in the Registry	“Satellite Observations”, “Computational Model”, “IoT”
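
Multi-level filtering can be pictured as combining the selections made on several fields. The sketch below assumes that selections on different fields are combined with AND, that values within one field are combined with OR by default, and that some fields (such as tags, following the "Cache" and "STAC" and "S3" example) can require all selected values. These semantics, the function `matches`, and its `require_all` parameter are assumptions made for illustration.

```python
from __future__ import annotations


def matches(
    service: dict[str, set[str]],
    selected: dict[str, set[str]],
    require_all: frozenset[str] = frozenset(),
) -> bool:
    """Tell whether a service satisfies every field-specific filter."""
    for field_name, wanted in selected.items():
        if not wanted:
            continue  # nothing selected for this field
        have = service.get(field_name, set())
        if field_name in require_all:
            if not wanted <= have:   # every selected value must be present
                return False
        elif not (wanted & have):    # at least one selected value must be present
            return False
    return True


# Illustrative service record; keys mirror the "Item" column of the table.
service = {
    "Service Category": {"Data Access"},
    "Tag (Primary)": {"Cache", "STAC", "S3"},
    "Interface Type": {"API"},
}
ok = matches(
    service,
    {"Service Category": {"Data Access", "Data Visualization"},
     "Tag (Primary)": {"Cache", "STAC"}},
    require_all=frozenset({"Tag (Primary)"}),
)
```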

3.3 Personalized contents

Note: This is tracked as an evolution.

The user story consists of:

•Follow a tag: on the front-end the user can select a tag, related to a service or a dataset

•Monitoring/User Registry: the user’s preferences are saved in the scope of a User Registry and analyzed

The personalized content results from the data gathered through these preferences, which are used to align content delivery with user objectives.

Initially, the content is “recommended” (that is, the same personalization applies to all users). The information is provided by the Platform Monitoring Service and is based on rating survey data.

3.4 Service Onboarding

This scenario consists of registering all the service-related information according to the Data Model presented in this document, using the guidelines in “Annex 1”.

Service information is registered in the Service Registry as an operational procedure.

4. Data Model

The Data Model schema presented in this section is a comprehensive blueprint for structuring the diverse services offered within the Destination Earth platform.

The overarching architecture of the data model presented in this document is divided into the following elements:

•User Registry, which manages the profiles of users who want to utilize the platform's features. It is further expanded in Section 4.1.

•Service Registry, which maintains essential information for service stakeholders interested in integrating their offerings into the DestinE platform. It is detailed in Section 4.2.

•Data Portfolio, which is an entry of the Service Registry but, due to its complexity, is expanded in the dedicated Section 4.3.

4.1 User Registry

The User Registry serves as a repository of essential user-related data pivotal for the operational functionality of the platform. It can be delineated into the following attributes, which are provided by the IAM Service:

1.General User Information (free): This encompasses fundamental details such as the user's name and email address, serving as foundational identifiers within the system.

2.User Category (automatized): It delineates users' classification and associated privileges within the platform. This includes permissions to access data and resources.

3.User Profile (suggested): Set of attributes that users can optionally insert during the registration process or edit within their own profile area. This metadata characterizes the user’s profile by adding information on:

•Country of provenance (Italy, France, Ireland, …)

•Declared usage profile (Public Sector, Private Sector, Research, Education, ESA / ECMWF / EUMETSAT staff, ESA / ECMWF / EUMETSAT contractors, Media and Public Relations, NGOs and other non-profit entities, Citizens, Other (free text))

4.Follow (suggested): While using the platform, users can choose to follow specific tags related to data and services as areas of interest. This action enables personalized content delivery, pushing content that aligns with their preferences and objectives.

•Service Categories: The user can choose from the existing service categories.

•Data Categories: the user can choose among the existing data categories.

•Data Tags: the user can choose among the existing data tags to describe data applications.

•Collections or datasets: the user can directly choose a preference on single collections or datasets.

The elements outlined above are within the scope of the Service Registry and Data Portfolio.

4.2 Service Description

The service registry covers all the information related to DestinE Platform services to facilitate their organization, discovery, and utilization. It serves as a central directory where services and related information are catalogued.

The service registry is divided into three types of attributes as follows.

1.General Information: set of attributes that uniquely identify and describe a service. Please refer to Section 4.2.1 for details.

2.Service Interface: set of attributes relative to the endpoints through which DestinE Platform users can access and interact with the service. Interfaces can adhere to different paradigms or standards. Please refer to Section 4.2.2 for details.

3.Data Portfolio: set of attributes describing the data offer of a Service. Due to the complexity of the Data Model related to the Data Portfolio, its details are provided separately in Section 4.3.

4.2.1 General attributes

The following metadata characterize the general service information.

•Name: the name of the service must be defined by the service owner. The name attribute is used on the web portal (in both the service dedicated page and the data marketplace) and can be searched by the user.

•Description: A description of the service's primary purposes. The description of the Service is used on the service's dedicated page on the web portal, and the service owner must insert it. Users can also search for the text using keywords.

•Cover Image: the cover image used for the service's presentation on the web portal. It is utilized in more places across the web portal with variable dimensions, e.g., the service page, main service menu, similar content list, etc.

•Logo: The service logo is used to identify the service within the web portal. It is requested on a colour scale and white monochrome.

•Short Description: a less detailed description, used for the general service presentation in the marketplace and searchable by keywords.

•Service Page Link: the primary link to the service page, used on the service's dedicated page.

•Gallery: a set of images relevant to the service used in the service's dedicated page on the web portal in a carousel. Every image in the set is a Gallery Item.

•Video Tutorial: a set of video tutorials relevant to the service, used in the service's dedicated page on the web portal as additional information and user support. Every Video in this set is a Video Tutorial.

•Organization Information: for each organization that owns the service, the following information needs to be defined (it is foreseen that most of the services belong to only one organization):

oOrganization Name: the name of the organization or company that provides the service. It must be defined by the service owner. The name attribute is used on the web portal in the service dedicated page and can be searched by the user.

oOrganization Type: the organization type that can be chosen from the available list (industry, research, public, scientific, commercial, education, and other). This information is not exposed in the web portal but is used for monitoring.

oService Contact address: the email of the organization contact. This information is not exposed in the web portal but is used for back-office maintenance.

oOrganization Logo: the organization logo identifies the organization within the web portal. If provided, it is requested in colour scale and white monochrome.

oOrganization reference: URL pointing to the organization or company website.

•Category: the service owner is responsible for defining the category (one or more) to which the service belongs. The organization offers a set of predefined categories for this purpose. If the service doesn't fall under any existing category, the owner can request the creation of a new one. This process is regulated by version control. Categories are used to organize services on the web portal based on their primary purpose and functionalities. The list of predefined categories is provided in Annex 1. Categories are also keywords, i.e. queryable items in the web portal.

•Tag: Tags describe the service's relevant features or attributes that help further group services in the web portal. The service owner can choose from the initial set of tags and can request the creation of a new tag if it is not present in the initial list. Version control is therefore also foreseen for the tags. The list of predefined tags is detailed in Annex 1. Tags are also keywords, i.e. queryable items in the web portal. The data model has two types of tags:

oPrimary tags: the service provider shall choose a maximum of 3 tags to show up in the web portal services marketplace (next to the service element).

oSecondary tags: a general list of service tags appearing on the web portal's dedicated service page. There is no limit to the number of tags that can be chosen. Note that in this list, there are also tags that help identify service interfaces.

•Documentation: documentation of the service. The web portal is foreseen to include a section in which the service documentation is presented in Readthedocs standard.

•User Availability: identifies the availability of the service to users according to free or restricted access. It is used on the web portal to filter the service usage depending on the user's role.

•Service Version: Version of the service as delivered in production.
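
A few of the constraints stated above (name and description required, one or more categories, at most three primary tags, free or restricted availability) can be sketched as a record with validation. The class `ServiceGeneralInfo` and its field names are hypothetical and cover only a subset of the attributes; this is an illustration under those assumptions, not the registry schema.

```python
from dataclasses import dataclass, field

MAX_PRIMARY_TAGS = 3  # "a maximum of 3 tags" for the marketplace
AVAILABILITY = ("Free", "Restricted")


@dataclass
class ServiceGeneralInfo:
    """Hypothetical subset of the general service attributes."""
    name: str
    description: str
    categories: list[str]
    primary_tags: list[str] = field(default_factory=list)
    secondary_tags: list[str] = field(default_factory=list)  # no limit
    user_availability: str = "Free"
    service_version: str = ""

    def __post_init__(self) -> None:
        if not self.name or not self.description:
            raise ValueError("name and description must be provided")
        if not self.categories:
            raise ValueError("at least one category is required")
        if len(self.primary_tags) > MAX_PRIMARY_TAGS:
            raise ValueError(f"at most {MAX_PRIMARY_TAGS} primary tags are allowed")
        if self.user_availability not in AVAILABILITY:
            raise ValueError(f"user_availability must be one of {AVAILABILITY}")


# Illustrative values taken from the examples in Section 3.2.2.
example = ServiceGeneralInfo(
    name="Data Cache Management Service",
    description="Example description",
    categories=["Data Access"],
    primary_tags=["Cache", "STAC", "S3"],
)
```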

The following fields are intended for internal use only within the present Data Model.

•Service quota: this metadata defines the limitations and policies set by the Service.

oLimit on requests: Quota restricting the number of requests a user can make per month.

oConcurrency limits: Quota limiting the number of concurrent connections a user can initiate per month.

oTime-based access control: Quota that restricts access to resources based on time constraints, i.e. access rights are granted or revoked after a time frame.

oComputational or Processing Limits: Limitation on the provisioning of computational resources (e.g., CPU, memory, or GPU) that the user can instantiate in the service per month.

oAPI Request or Response size limits: Quota limiting the size of data a user can request or send in a single API call.

oData Transfer limits: Quota limiting the amount of data a user can transfer (upload/download) on the service per month.

oFeature-based/Custom limits: Restricts access to specific features based on the user’s plan.

•Service Cloud Type: refers to the Cloud solution, which is one of 1) runtime, 2) OVH account, or 3) External infrastructure.

•Service Issue Date: datetime of service “publication” in the registry, after a successful onboarding.

•Operational Readiness: this field indicates the service's readiness level. This information is shown as a tag-like label in the service window, both in the marketplace and on the service's dedicated page. The service provider must choose one of the following levels:

oalpha version (new features are added),

obeta version (only bug fixing),

oops version (standard version),

olong-term supported version (whenever bug fixing is ensured).

4.2.2 Interfaces attributes

Each service can expose one or more interfaces. Service interfaces are visualized as tags on the dedicated service web portal page. Service tags will reflect the type of the interface. For each of the interfaces offered by the service, the service provider must fill in the following information:

•Name: this is the interface's name and can be an acronym. If present, the name is used as the title in the documentation section.

•Version: the version of the interface. Version is used in the documentation section for traceability.

•Documentation: the API documentation must be provided in Swagger standard format, and the service provider must provide the link to the deployed Swagger. Any other interface (not API) documentation shall be provided as a link within the Readthedocs documentation.

•Type: interface endpoint archetype (CLI, GUI, API, etc). To keep this attribute uniform among services, a version control is foreseen. The interface type is used as a keyword, i.e. it is a queryable item in the web portal.

•User Availability: identifies the availability of the interface to users according to free or restricted access. It is used on the web portal to filter the service usage depending on the user's role.

•Support Contact: it can be inserted whenever the interface contact differs from the service contact. This field is not visualized in the web portal.

•Interface URL: this field provides the URI of the interface root path as exposed to the end user, depending on the service integration.

4.2.3 Data Portfolio attributes

The Service Data Portfolio is a curated collection of data and related information, included in the Service offering.

The data portfolio metadata is addressed in a dedicated section (see Section 4.3) that details each component needed to fully describe the envisioned breadth of data.

The service portfolio is tracked through the service registry, which encapsulates not only the data itself but also the following essential metadata:

•Data Portfolio: detailed information about the service data offer, addressed separately in Section 4.3. If a service does not expose any data, this attribute is null.

•Revision version: This metadata represents the version number of the service portfolio, which is incremented with each update or change.

•Publication Date: Marks the date when the portfolio was initially published.

•Last Update Date: Denotes the date on which the most recent portfolio revision was made.

Together, these attributes provide a comprehensive tracking mechanism for the evolution and accessibility of the service portfolio, ensuring that users are constantly interacting with the most current datasets provided by the platform's services.
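
The tracking mechanism can be sketched as follows. The class `PortfolioTracking` and its method `revise` are hypothetical; representing the revision version as an integer counter is an assumption, since the text only states that the number is incremented with each update.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class PortfolioTracking:
    """Hypothetical tracking metadata for a service data portfolio."""
    revision: int             # assumed to be a simple integer counter
    publication_date: date    # set once, at initial publication
    last_update_date: date    # moves forward with every revision

    def revise(self, on: date) -> None:
        """Record a new portfolio revision made on the given date."""
        if on < self.last_update_date:
            raise ValueError("a revision cannot predate the last update")
        self.revision += 1
        self.last_update_date = on


tracking = PortfolioTracking(
    revision=1,
    publication_date=date(2025, 1, 15),
    last_update_date=date(2025, 1, 15),
)
tracking.revise(date(2025, 3, 1))
```

After the call to `revise`, the revision is 2 and the last update date is 1 March 2025, while the publication date is unchanged.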

4.2.4 Tenancy Resources attributes

Services can expose various tenancy resources. These are resources allocated to or associated with a specific tenant (i.e., a user, project, or Service Provider). Tenancy resources are essential for defining what a tenant is entitled to use or access within the service environment, and they enable fine-grained management of consumption, prices, and entitlements.

Tenancy resources typically represent units of value or service functionality that are provisioned and monitored on a per-tenant basis. They are fundamental in supporting billing, quota enforcement, access control, and service-level differentiation.

Each tenancy resource is described using the following attributes:

•Resource Name: The name of the resource provided by the Service Provider within the scope of the service.

•Resource Description: A brief description outlining the purpose and details of the resource.

•Terms & Conditions: A URI linking to the applicable terms and conditions for the resource.

•Monthly Allocation Price: The monthly price of the resource, expressed in euros (EUR), with a value greater than or equal to 0.

•Resource Type: The category of the resource, selected from the following options: Quota, Datasets, or Service Feature.

•Free DestinE: A boolean flag indicating whether the resource is included in the Free DestinE tier. If set to true, the associated price is considered zero.

Also, the resource is characterized by the following properties describing the time availability within the Service Offering.

•Quantity max: integer number indicating the max time for which the resource is available

•Quantity min: integer number indicating the min time for which the resource is available

•Step Value: integer number indicating the configurable timestep of the resource availability

•Default Value: integer number giving the time applied when none is otherwise specified, i.e. the average of the min and max time (or the min time)

•Unit: enum list specifying if the availability time is hours, days, or months

•Updatable: boolean value indicating if the resource time range span can be updated or not (default True)
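
The tenancy resource attributes and their constraints can be sketched as a record. The class and field names below (`TenancyResource`, `ResourceType`, `TimeUnit`, `effective_price_eur`) are hypothetical; the checks encode only what the text states: a price greater than or equal to 0, a resource type from the three listed options, a zero price when the Free DestinE flag is set, and a time unit of hours, days, or months.

```python
from dataclasses import dataclass
from enum import Enum


class ResourceType(Enum):
    QUOTA = "Quota"
    DATASETS = "Datasets"
    SERVICE_FEATURE = "Service Feature"


class TimeUnit(Enum):
    HOURS = "hours"
    DAYS = "days"
    MONTHS = "months"


@dataclass
class TenancyResource:
    """Hypothetical record for one tenancy resource."""
    name: str
    description: str
    terms_uri: str
    monthly_price_eur: float
    resource_type: ResourceType
    free_destine: bool = False
    quantity_min: int = 1
    quantity_max: int = 1
    step_value: int = 1
    unit: TimeUnit = TimeUnit.MONTHS
    updatable: bool = True  # default True, as stated above

    def __post_init__(self) -> None:
        if self.monthly_price_eur < 0:
            raise ValueError("monthly price must be >= 0 EUR")
        if self.quantity_min > self.quantity_max:
            raise ValueError("quantity_min cannot exceed quantity_max")
        if self.step_value <= 0:
            raise ValueError("step_value must be positive")

    @property
    def effective_price_eur(self) -> float:
        # A resource in the Free DestinE tier is considered to cost zero.
        return 0.0 if self.free_destine else self.monthly_price_eur


# Illustrative values only.
storage = TenancyResource(
    name="Example storage quota",
    description="Illustrative resource",
    terms_uri="https://example.org/terms",
    monthly_price_eur=12.5,
    resource_type=ResourceType.QUOTA,
    quantity_min=1,
    quantity_max=12,
)
```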

Further specifications apply when the Tenancy Resource is dynamic.

4.2.4.1 Tenancy Dynamic Resource attributes

Each tenancy resource can be defined dynamically according to the properties listed below. The data model is designed to be resilient, without enforcing strict specifications on the resource type, which is instead determined by its parent attributes. Tenancy dynamic resources are called metrics.

•Metric Name: the name of the metric

•Quantity Max: integer number for the max quantity associated with the metric

•Quantity Min: integer number for the min quantity associated with the metric

•Step Value: integer number used as step to configure the metric dynamically in a range

•Default Value: integer number giving the quantity applied when none is otherwise specified, i.e. the average of the min and max quantity (or the min quantity)

•Unit: free text specification of the metric unit used (e.g. GB)

•Updatable: boolean value indicating if the metric range span can be updated or not (default True)
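
The interplay of Quantity Min, Quantity Max, and Step Value can be illustrated with a short sketch. The helper names are hypothetical, and the default rule shown (the min/max average rounded down to a configurable step) is only one possible reading of the Default Value attribute, not a confirmed specification.

```python
def allowed_values(quantity_min: int, quantity_max: int, step: int) -> list[int]:
    """List the quantities a metric can be configured to."""
    if step <= 0:
        raise ValueError("step must be positive")
    if quantity_min > quantity_max:
        raise ValueError("quantity_min cannot exceed quantity_max")
    return list(range(quantity_min, quantity_max + 1, step))


def is_valid_quantity(value: int, quantity_min: int, quantity_max: int, step: int) -> bool:
    """Check that a requested quantity is in range and falls on a step."""
    return value in allowed_values(quantity_min, quantity_max, step)


def fallback_default(quantity_min: int, quantity_max: int, step: int) -> int:
    """Assumed default: the min/max average, rounded down to a valid step."""
    midpoint = (quantity_min + quantity_max) // 2
    return quantity_min + ((midpoint - quantity_min) // step) * step


# Illustrative metric: 10 to 100 units (e.g. GB) in steps of 10.
choices = allowed_values(10, 100, 10)
default = fallback_default(10, 100, 10)
```

With these illustrative numbers, `choices` runs from 10 to 100 in steps of 10 and `default` is 50.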

4.3 Data Portfolio

The Data Portfolio is a curated collection of data and related information composing a Service offering.

As with the Service Registry, the Data Portfolio is divided into three types of attributes, as follows.

1.General attributes represent the hierarchy of data in a top-down approach across the following levels. General attributes are defined in Section 4.3.1.

Program > Program Subset > Collection > Dataset > Variable

2.Spatial attributes define the spatial characteristics related to a Dataset. They are discussed in Section 4.3.2.

3.Time attributes define the temporal characteristics related to a Dataset. These attributes are discussed in Section 4.3.3.
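
The five-level hierarchy named under the general attributes (Program > Program Subset > Collection > Dataset > Variable) can be sketched as a simple tree in which each node may only contain nodes of the next level. The `Node` class and the `paths` helper are hypothetical, and the sample names are illustrative entries reused from the search examples in Section 3.2.

```python
from __future__ import annotations

from dataclasses import dataclass, field

LEVELS = ("Program", "Program Subset", "Collection", "Dataset", "Variable")


@dataclass
class Node:
    """One entry of the data hierarchy at a given level."""
    level: str
    name: str
    children: list[Node] = field(default_factory=list)

    def add(self, name: str) -> Node:
        """Attach a child at the next level down and return it."""
        index = LEVELS.index(self.level)
        if index == len(LEVELS) - 1:
            raise ValueError("a Variable is the lowest level and has no children")
        child = Node(LEVELS[index + 1], name)
        self.children.append(child)
        return child


def paths(node: Node, prefix: tuple[str, ...] = ()):
    """Yield the top-down path of names for the node and its descendants."""
    here = prefix + (node.name,)
    yield here
    for child in node.children:
        yield from paths(child, here)


program = Node("Program", "Copernicus")
dataset = program.add("Climate Change").add("ERA5").add("ERA5 hourly data on single-levels")
variable = dataset.add("snow depth")
```

Walking `paths(program)` yields one tuple per level, which is the kind of structure a Data Catalogue page per hierarchy level could be generated from.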

4.3.1 General attributes

General attributes are presented here in a top-down perspective following the levels in Figure 2.

Figure 2: Top-down perspective regarding the Data hierarchy.

4.3.1.1 Program

The Program defines the highest level of the Data Portfolio. It represents also the main entry grouping data within a Service offering. The program is referred to a scientific or politic initiative founded by a public Entity or Organization such as the European Union. Examples of programs are Copernicus Information or Destination Earth.

A program presents the following attributes.

•Name: the name of the Program. It must be defined by the service provider following the naming convention. The name attribute is used on the web portal (on the service dedicated page, the service marketplace, and the data marketplace) and can be searched by the user.

•Description: the description of the Program is automatically mapped to a standard one if predefined (indicated by an automatized label; see 1.7). The description attribute is used on the web portal (in the data marketplace) and can be searched by the user.

•References: it is a URL automatically mapped to a standard one. It appears on the web portal within the data marketplace.

•Image: cover image representing the Program. It appears on the web portal within the data marketplace.

4.3.1.2 Program Subset

The Program Subset specifies the missions or services within a Program.

•Name: the name of the Subset must be defined by the service provider following the naming convention. The name attribute is used on the web portal (in both the service dedicated page and the data marketplace) and can be searched by the user.

•Description: the description of the Subset is automatically mapped to a standard one if predefined. The description attribute is used on the web portal (in the data marketplace) and can be searched by the user.

•References: it is a URL automatically mapped to a standard one. It appears on the web portal within the data marketplace.

•Image: cover image representing the Subset. It appears on the web portal within the data marketplace.

4.3.1.3 Collection

The Collection represents a high-level criterion for grouping data. Collections can correspond to applications or missions or models/simulations or services.

•Name: the name of the Collection. It must be defined by the service provider following the naming convention. The name attribute is used on the web portal (in both the service dedicated page and the data marketplace) and can be searched by the user.

•Description: the description of the Collection is automatically mapped to a standard one if predefined. The description attribute is used on the web portal (in the data marketplace) and can be searched by the user.

•References: it is a URL automatically mapped to a standard one. It appears on the web portal within the data marketplace.

•Image: cover image representing the Collection. It appears on the web portal within the data marketplace.

4.3.1.4 Dataset

A Dataset characterizes a given Collection and expands it, while grouping data into scenarios, stories, or modalities of observation. A dataset presents several attributes that mainly characterize the service’s own data portfolio.

•Dataset Name: the name of the Dataset. It must be defined by the service provider following the naming convention. The name attribute is used on the web portal (in both the service dedicated page and the data marketplace) and can be searched by the user. Note that if the processing over the dataset variables is such that the original dataset name is no longer applicable, the new name must be created using the naming convention mentioned above.

•Dataset ID: the dataset's short name, reflecting how the service identifies it in its interfaces. The Dataset ID attribute could be used on the web portal within the service dedicated page. It is strongly recommended to use a known ID, e.g. one used by primary data sources or within the literature.

•Dataset Description of usage: description regarding the usage of the dataset, describing the modifications (i.e. processing steps) that the service applied to the dataset. A list of possible processing steps must be declared within this description. This metadata is exposed on the web portal in the service dedicated page, specifically in the data portfolio section.

•Reference: the reference to the original model data. It can be a DOI, a URL or a reference. It is used only if necessary for the dissemination. It is not exposed on the web portal but is used for traceability.

•Update Frequency: High-level indication of how often the service updates the dataset. It is not visible on the web portal but is used for traceability.

•Processing Label: a Boolean field used to indicate whether the dataset has been modified with respect to the original source data.

•Format: The data format in which datasets are available, like “GeoTIFF,” “JPEG2000,” “NetCDF,” or “HDF5.” This field should be standardized to help users understand the data compatibility with different software and tools. This metadata is indexed for keyword searches on the web portal. It can be omitted using the NA convention only if the service doesn’t allow data downloading.

•User Availability: Identifies the availability of the dataset to users according to free or restricted access. It is used on the web portal to filter the service usage depending on the user’s role.

•Image: cover image representing the Collection (inherited from the Collection). It appears on the web portal within the data marketplace.

•Dataset Source: The following metadata are helpful in identifying the origin of the dataset and better understanding how the service has modified it. Hereafter is the list of attributes that are necessary to fully determine the data source:

oDataset Owner: entity or organization or service owning the dataset. This metadata is indexed for keyword searches on the web portal.

oDataset Provider: the name of the service that provides the dataset. It can be another DestinE Platform service. It is not visible on the web portal but is used for traceability.

oDataset Source Description: the description of the Dataset is automatically mapped to a standard one if predefined and applicable (the list of predefined Dataset coincides with the Dataset). The description attribute is used on the web portal (in the data marketplace) and can be searched by the user.

oDataset Source Access Point: URL to the data access endpoint, used only for traceability purposes.

oDataset Source License: license for dataset dissemination at the data source. It is automatically mapped to a standard one if predefined. This is used only for traceability purposes.

oSource Dataset Name: original dataset name as disseminated by the data source. This is used for traceability purposes. This metadata is used on the web portal (in the data marketplace) and can be searched by the user.

•Dataset Type: Defines the dataset type in terms of its origin, context, observation, or type of measurement. The predefined list of dataset types is given below. If not predefined, the service provider can register the type based on version control. It is worth noting that each dataset type is associated with a set of tailored metadata relevant to the current type.

oSatellite observations: Satellite observation data encompasses a wide range of information collected by satellites orbiting the Earth, including images, measurements, and other data regarding the Earth’s atmosphere, land, oceans, and space environment. These data are obtained through various instruments and sensors mounted on satellites, capable of capturing visible light imagery, thermal images, microwave, and radar data, among others. Hereafter is the required metadata for this kind of dataset:

▪Platform name: the name of the satellite platform. This field is set to “suggested” as it may be helpful when the platform name differs from the name used in the data description hierarchy. This metadata is indexed for keyword searches on the web portal.

▪Processing level: the processing level with respect to the raw data. This metadata is indexed for keyword searches on the web portal. It can be omitted using the NA convention. Try to follow the convention Level-xx.

▪Sensor name: The name of the satellite sensor of the original data. This metadata is indexed for keyword searches on the web portal. It can be omitted using the NA convention.

▪Sensor type: the sensor type of the satellite. This metadata is indexed for keyword searches on the web portal. It can be omitted using the NA convention.

▪Projection type: It refers to the method used to represent the Earth’s three-dimensional surface on a two-dimensional map or image by the service. This metadata is indexed for keyword searches on the web portal. It can be omitted using the NA convention.

▪CRS: The Coordinate Reference System used by the service to offer the dataset. A widely used example is the Geographic Coordinate System (GCS), which uses latitude and longitude to define locations on the Earth. This metadata is indexed for keyword searches on the web portal. It can be omitted using the NA convention.

oComputational model: Computational model data refers to data generated from simulations run on computer models that represent physical, biological, chemical, or other types of systems using mathematical formulations. These models are used to study and predict complex phenomena that are difficult or impossible to observe directly. Hereafter is the required metadata for this kind of dataset:

▪Model name: The name of the model from which the data originates as indicated by the source data owner/provider. This metadata is indexed for keyword searches on the web portal.

▪Model type: This field identifies which model has been used to obtain the dataset. Examples are Reanalysis, Simulation, Hybrid, Machine Learning, Artificial Intelligence, Digital Twin, Numerical, and Physics. This metadata is indexed for keyword searches on the web portal.

▪Model version: The model version the service uses to retrieve the dataset data. It is not used in the web portal but is used for traceability.

▪Last Update: The date of the last update of the model, as per source information. It is not used in the web portal but is used for traceability.

oIn Situ measurements: In-situ measurement refers to the direct observation and collection of data from the actual location of the phenomena or variables of interest. This method involves the physical presence of instrumentation or observers at the investigation site to measure environmental, atmospheric, or other parameters in real time. Hereafter is the required metadata for this kind of dataset:

▪Measurement Instrument: Specific tools or instruments used for the data collection, such as “Thermometer,” “Barometer,” and “Soil Moisture Probe.” This metadata is indexed for keyword searches on the web portal.

▪Collection Method: Techniques employed in the data gathering process, such as “Manual Sampling,” “Automated Weather Station,” and “Water Quality Sonde.” This metadata is indexed for keyword searches on the web portal.

oNon-satellite remote sensing observations: Non-satellite remote sensing observation data is collected through sensors or instruments not mounted on satellites. These sensors may be positioned on aircraft, drones, balloons, or ground-based platforms, enabling the acquisition of detailed data from the Earth’s surface and atmosphere. Hereafter is the required metadata for this kind of dataset:

▪Data Sensor: Name or type of the sensor used for data collection, such as “LIDAR,” “Hyperspectral Imager,” or “Multispectral Camera,” as well as specific sensor model where applicable. The field is searchable by the user on the web portal, so it should be standardized to guarantee an efficient user experience in discovery.

▪Data source: The platform or vehicle used for carrying the sensor, for example, “Unmanned Aerial Vehicle (Drone),” “Fixed-wing Aircraft,” “Helicopter,” or “Ground Station.” The field is searchable by the user on the web portal, so it should be standardized to guarantee an efficient user experience in discovery.

▪Processing level: The stage of data processing provided, for instance, “Raw Data,” “Georeferenced Data,” “Atmospherically Corrected Data,” or “Derived Products.” This metadata is indexed for keyword searches on the web portal.

▪Derived product: Information about any data products derived from the original observations, like “Vegetation Indices,” “Digital Elevation Models (DEMs),” or “Thermal Maps.” This metadata is indexed for keyword searches on the web portal.

oStatistical Analysis: It encompasses diverse information collected from surveys, censuses, administrative records, experiments, or other data collection methods. Typically, it refers to numerical information collected, processed, and analysed from various sources to draw meaningful insights and make informed decisions. Hereafter is the required metadata for this kind of dataset:

▪Revision Date: the last update date. It is not exposed on the web portal but is used for traceability.

▪Resource Type: The origin of the raw data utilized in the analysis, which could be “Census Data,” “Survey Results,” or “Experimental Data.” This metadata is indexed for keyword searches on the web portal.

▪Language: The language of the dataset. It is not exposed on the web portal but is used for traceability.

▪Classification system: Refers to the level of granularity in the territorial or categorical breakdown of the data. Examples are “Regional,” “National,” “Municipal,” or “Demographic Groups.” This metadata is indexed for keyword searches on the web portal.

oIoT: The Internet of Things refers to a network of interconnected devices embedded with sensors, software, and other technologies that enable them to collect and exchange data over the internet. Hereafter is the required metadata for this kind of dataset:

▪Device Identification: Unique identifiers for each IoT device to differentiate and track them within the network. It is not exposed on the web portal but is used for traceability.

▪Sensor: It describes the type of sensor. This metadata is indexed for keyword searches on the web portal.

oGeospatial Models: This dataset type includes models representing physical objects, structures, terrain, or environments, as well as geospatial data such as maps, satellite imagery, LiDAR point clouds, geographic information system (GIS) datasets, shapefiles, or 3D models. These data sources are essential for applications like augmented reality (AR), virtual reality (VR), geospatial analysis, urban planning, and facility management. Hereafter is the required metadata for this kind of dataset:

▪Model Type: Description of the model format, structure, and representation (e.g., mesh, point cloud, raster, vector). This metadata is indexed for keyword searches on the web portal.

▪Model Complexity: Level of detail (LOD), complexity, and scale of the model representation, including hierarchical representations for multi-resolution models. This metadata is indexed for keyword searches on the web portal.

▪Dependencies or Linked Data: Information about dependencies, associated datasets, or linked data sources relevant to the model. This metadata is indexed for keyword searches on the web portal.

oCustom Data Sources: This category encompasses any proprietary or custom data sources specific to the IoT application or industry. It could include data from legacy systems, proprietary protocols, unconventional sensors, specialized databases, or social media feeds.

•Spatial Properties: Geographic data has different characteristics called spatial properties. Spatial coverage shows how the Earth system is represented in the data, encompassing the extent and distribution of geographical features. It is worth noting that the data can be defined on a single level or multiple levels, influencing the attributes needed to fully characterize the dataset. Spatial properties are detailed in Section 4.3.2.

•Temporal Properties: Time-related attributes. Temporal coverage delineates the period during which observations or measurements were taken. This aspect provides crucial context for understanding the dataset’s temporal scope. Temporal properties are detailed in Section 4.3.3.

Figure 3: Database schema for the section of Dataset types.
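The association between each dataset type and its tailored metadata can be sketched as a lookup table with a completeness check. The dictionary and function names are assumptions for illustration. The field names are copied from the lists above; Platform name is left out of the check because it is marked as "suggested", and a value of "NA" counts as filled, following the NA convention.

```python
from typing import Dict, List

# Metadata listed per dataset type in this section (Platform name omitted: "suggested").
REQUIRED_METADATA: Dict[str, List[str]] = {
    "Satellite observations": [
        "Processing level", "Sensor name", "Sensor type", "Projection type", "CRS",
    ],
    "Computational model": ["Model name", "Model type", "Model version", "Last Update"],
    "In Situ measurements": ["Measurement Instrument", "Collection Method"],
    "Non-satellite remote sensing observations": [
        "Data Sensor", "Data source", "Processing level", "Derived product",
    ],
    "Statistical Analysis": [
        "Revision Date", "Resource Type", "Language", "Classification system",
    ],
    "IoT": ["Device Identification", "Sensor"],
    "Geospatial Models": ["Model Type", "Model Complexity", "Dependencies or Linked Data"],
    "Custom Data Sources": [],  # no tailored metadata is listed for this type
}


def missing_metadata(dataset_type: str, metadata: Dict[str, str]) -> List[str]:
    """Return the listed fields that are absent or empty for the given dataset type."""
    if dataset_type not in REQUIRED_METADATA:
        raise KeyError(f"Unknown dataset type: {dataset_type}")
    return [
        key
        for key in REQUIRED_METADATA[dataset_type]
        if not str(metadata.get(key, "")).strip()
    ]


# Illustrative record with one field missing.
gaps = missing_metadata("In Situ measurements", {"Measurement Instrument": "Thermometer"})
```

A dataset type registered by a service provider outside the predefined list would need its own entry in such a table; how that registration is versioned is not covered by this sketch.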

4.3.1.5 Variable

A Variable constitutes the fundamental unit of data within the model, representing the most granular level of information. It encompasses individual measurements or observations and can include various data types, such as numerical values, categorical attributes, or timestamps. Each variable holds distinct properties, including metadata like units of measurement and tags, which are essential to understanding the applicability of the data itself. Thanks to tags, variables may exhibit relationships with one another, forming complex data structures that underpin the model’s functionality. The attributes required for the variable entity are:

•Name: the full name of the variable. This metadata is indexed for keyword searches on the web portal.

•Unit: the unit of measure of the variable. This parameter can be a list if the variable is provided in more than one unit. This metadata is indexed for keyword searches on the web portal. Use the NA convention if a unit is not available.

•Tag: Each variable can be associated with one or more tags. This metadata is indexed for keyword searches on the web portal and, through the data categories to which each tag is related, is also used to group services in the service marketplace.

4.3.2 Spatial attributes

Spatial properties serve as critical descriptors, enabling users to effectively discover, interpret, and utilize the datasets available on the DestinE platform. To ensure comprehensive and standardized data representation, metadata about essential spatial properties must be captured during the registration process. This paragraph outlines the spatial properties metadata required from service providers, providing guidance on how to accurately document and convey their dataset’s spatial characteristics. By adhering to these guidelines, service providers can enhance the discoverability and utility of their datasets, ultimately fostering greater collaboration and innovation within the community.

The picture below represents the spatial attributes as described in the data model.

Figure 4: Dataset Spatial Attributes

The main spatial attributes are detailed in the following:

•Spatial extent: This describes the geographical extent of the dataset, indicating the area covered on the Earth’s surface. It provides information on the geographic boundaries and limits of the data. Only one of the following options can be applied to the dataset. Some options include all underlying levels of territorial coverage, i.e. they implicitly encompass all subordinate or smaller geographic, administrative, or conceptual areas. Other options are increasingly nested with the level of detail regarding the geographic coverage. All keys are pre-inserted in the database, so the service provider must choose from a list of defined spatial extents. This metadata is indexed for keyword searches on the web portal. The spatial extent has the following granularity specified in the data model.

oGlobal: Selecting this option is equivalent to selecting all of the following. Otherwise, choose the one below that applies to the dataset.

oContinent: selecting All automatically selects all of the entries below. Otherwise, the service provider has to mark the ones that apply:

▪All (otherwise one or more from the list below)

▪Africa, Europe, Asia, Oceania (including Australia and the Pacific Islands), North America, South America

oOcean: selecting All automatically selects all of the entries below. Otherwise, the service provider has to mark the ones that apply:

▪All (otherwise one or more from the list below)

▪Pacific Ocean, Atlantic Ocean, Indian Ocean, Southern Ocean, Arctic Ocean

oSea: selecting All automatically selects all of the entries below. Otherwise, the service provider has to mark the ones that apply:

▪All (otherwise one or more from the list below)

▪Mediterranean Sea, Caribbean Sea, South China Sea, Bering Sea, Gulf of Mexico, Arabian Sea, Sea of Okhotsk, Sea of Japan (East Sea), North Sea, Red Sea, Baltic Sea, Black Sea, Andaman Sea, East China Sea, Hudson Bay

oNUTS: selecting All automatically selects all of the entries below. Note that if Europe is selected as a continent, or if the Global option is selected, all NUTS are included. Otherwise, one of the following options applies:

▪All (otherwise specify the level)

▪NUTS Level (All_NUTS_LEVL are already mapped in the database). Based on the NUTS Level, the list can be filtered according to

♦NUTS_ID: (All_NUTS_ID per level must be mapped in the database)

♦NUTS_NAME (All_NUTS_NAME per level must be mapped in the database)

♦All, if all NUTS at that level are included.

oInland Water: Inland waters reference.

oCustom: if none of the above apply. This is a free-text input.

•Resolution: If applicable, the data resolution is expressed in metres or degrees.

•Resolution tag: The resolution tag is automatically detected when the provider inserts the resolution. It can be only one value per dataset. This metadata is indexed for keyword searches on the web portal. The limits defined for the resolution labels are given below in metres.

oLow: defined for resolution ≥40,000 metres (greater than 0.4 degrees)

oMedium: defined for 10,000 metres ≤ resolution ≤ 39,000 metres (between 0.1 and 0.39 degrees)

oHigh: defined for 1,000 metres ≤ resolution ≤ 9,000 metres (between 0.01 and 0.09 degrees)

oSuper-high: defined for resolution < 1,000 metres (less than 0.01 degrees)

oNA (not applicable)

•Spatial type: it represents the spatial representation of a dataset. The service provider must select Single-level, Multi-level, or not applicable (NA) if the dataset does not have a spatial type. Depending on this type, one of the following metadata must be filled:

oSingle-level metadata: it describes the specific coverage type of the data. The service provider can choose from a predefined list, which includes surface, 2m, 10m, 100m, total column, tropospheric total column, tropospheric columns, top of canopy, top of atmosphere, bottom of atmosphere, single level. This metadata is indexed for keyword searches on the web portal.

oMulti-level metadata: it describes the type of profile the dataset offers. For multi-level data, the attributes are the following:

▪Unit: the unit of measure of the vertical level. This metadata is indexed for keyword searches on the web portal.

▪Vertical dimension: the name of the vertical dimension of the data profile. The predefined list includes pressure level, model level, depth, height level, and vertical column.

▪Resolution: the resolution of the vertical level. For instance, for pressure levels: 1000, 850, 700, 500, 300 hPa

▪NA (not applicable)

•Spatial Representation: This describes the level of readiness for representation on a map (e.g., mesh, point cloud, raster, vector). This metadata is indexed for keyword searches on the web portal. The initial list of predefined values is reported in the following:

oVector: Vector data represents geographic features as discrete geometric shapes such as points, lines, and polygons.

oMesh: A mesh is a collection of vertices, edges, and faces that define the shape of a three-dimensional object. In the context of spatial data, a mesh may represent terrain surfaces, 3D models of buildings or infrastructure, or other complex geometric structures.

oRaster: Raster data represents spatial information as a grid of cells or pixels, where each cell holds a single value or a set of values representing a particular attribute.

oPoint Cloud: A point cloud collects data points in a three-dimensional coordinate system. Point clouds are often generated using remote sensing techniques like LiDAR for creating digital surface models, 3D reconstructions, or analyzing terrain.

oDelimited Text: Delimited text refers to tabular data where each row represents a record and columns represent attributes. Standard delimiters include commas (CSV), tabs (TSV), or other characters that separate fields.

oVector Tile: Vector tiles are a method of encoding and delivering vector data in small, pre-rendered tiles for efficient rendering and display on web maps. They contain geometric and attribute data for map features and are optimized for fast loading and rendering in web mapping applications.
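The automatic detection of the resolution tag described earlier in this section can be sketched as a simple threshold function. The function name is an assumption, and only resolutions expressed in metres are handled. Note that the listed limits leave small gaps (for example between 39,000 and 40,000 metres); this sketch assumes contiguous ranges, so that any value below a limit falls into the next finer label. That interpretation is an assumption, not a rule stated by the data model.

```python
from typing import Optional


def resolution_tag(resolution_m: Optional[float]) -> str:
    """Map a resolution in metres to a resolution tag (None means not applicable)."""
    if resolution_m is None:
        return "NA"
    if resolution_m <= 0:
        raise ValueError("Resolution must be a positive number of metres")
    if resolution_m >= 40_000:
        return "Low"
    # Assumption: values between the listed limits fall into the finer label.
    if resolution_m >= 10_000:
        return "Medium"
    if resolution_m >= 1_000:
        return "High"
    return "Super-high"


# Illustrative values in metres, plus None for a dataset without a resolution.
tags = [resolution_tag(value) for value in (50_000, 25_000, 5_000, 300, None)]
```

Because the function returns exactly one label, it is consistent with the rule that the resolution tag can have only one value per dataset.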

4.3.3 Temporal attributes

Temporal properties are equally crucial in facilitating comprehensive and standardized data representation on the DestinE platform. Just as spatial properties enable users to discover, interpret, and utilize datasets effectively, capturing essential temporal properties metadata during registration is necessary. This paragraph delineates the temporal properties metadata required from service providers, offering guidance on accurately documenting and conveying the temporal characteristics of their datasets. By adhering to these guidelines, service providers can augment the discoverability and utility of their datasets, thereby promoting enhanced collaboration and innovation within the community.

The picture below represents the temporal attributes as described in the data model.

Figure 5: Database schema for the set of temporal properties.

The main temporal attributes are detailed in the following:

•Start date: This is the start date of the dataset as provided by the service, formatted as “datetime +/- N days”, where datetime is selected from the calendar widget as YYYY-MM-DD:HH:MM:SS and defined in UTC. N can be 0 if the start date is fixed. It can be set to NA if not applicable. This metadata is not exposed on the web portal but is used for traceability.

•End date: This is the end date of the dataset as provided by the service, formatted as “datetime +/- N days”, where datetime is selected from the calendar widget as YYYY-MM-DD:HH:MM:SS and defined in UTC. N can be 0 if the end date is fixed. It can be set to NA if not applicable. This metadata is not exposed on the web portal but is used for traceability.

•Time resolution: the time resolution of the data. It can be set to NA if not applicable. This metadata is indexed for keyword searches on the web portal.

•Persistency on service: the time availability of the dataset according to rules of dataset rolling defined by the service provider. It is currently not visible on the web portal but is used for traceability; it will probably be included as information in the web portal.

•Update frequency: temporal information about the frequency of update of the dataset according to the service ingestion process. Probably, it will be included as information in the web portal.

•Last update: temporal information about the latest update of the dataset according to the service ingestion process.

•Time label: temporal information about the temporal extent expressed as past, present, and forecast.
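The “datetime +/- N days” format used for the start and end dates can be illustrated with a small parser. This is a sketch under stated assumptions: the exact textual layout (a single space around "+/-", the word "days"), the function name, and the choice to return the earliest and latest admissible instants are all hypothetical, and the timestamps are assumed to be in UTC.

```python
import re
from datetime import datetime, timedelta, timezone
from typing import Optional, Tuple

# Assumed layout: "YYYY-MM-DD:HH:MM:SS +/- N days"
_DATE_PATTERN = re.compile(
    r"^(?P<dt>\d{4}-\d{2}-\d{2}:\d{2}:\d{2}:\d{2})\s*\+/-\s*(?P<n>\d+)\s*days?$"
)


def parse_date_with_tolerance(value: str) -> Optional[Tuple[datetime, datetime]]:
    """Return (earliest, latest) for a 'datetime +/- N days' string, or None for NA."""
    text = value.strip()
    if text == "NA":
        return None
    match = _DATE_PATTERN.match(text)
    if match is None:
        raise ValueError(f"Unexpected date format: {value!r}")
    centre = datetime.strptime(match.group("dt"), "%Y-%m-%d:%H:%M:%S")
    centre = centre.replace(tzinfo=timezone.utc)  # assumption: values are in UTC
    tolerance = timedelta(days=int(match.group("n")))
    return centre - tolerance, centre + tolerance


# Illustrative start date with a tolerance of three days.
window = parse_date_with_tolerance("2024-01-10:00:00:00 +/- 3 days")
```

With N set to 0 the two returned instants coincide, which corresponds to a fixed date.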

5. Keywords: Categories & Tags

Incorporating keywords such as tags and categories is imperative to enhance user experience and facilitate efficient search functionalities within the platform hosting services. These elements organize the vast array of services and data, enabling users to locate relevant information and make informed decisions. Consequently, there are distinct sets of tags for service categories and data categories.

Figure 6: Keywords schema.

The comprehensive list of Tags is provided as a Guideline in Annex 1.

Annex 1. DestinE Platform – Guidelines

Please refer to “Annex1 DEST-SRCO-TN-2400459 DestinE Platform - Guidelines”.
