Data Product Descriptor Specification⚓︎
Version 1.0.0 (DRAFT)⚓︎
The keywords "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in BCP 14 RFC2119 RFC8174 when, and only when, they appear in all capitals, as shown here.
This document is licensed under The Apache License, Version 2.0.
Disclaimer⚓︎
Part of this content has been taken from the great work done by the folks at the OpenAPI Initiative and the AsyncAPI Initiative. We decided not to reinvent the wheel and to base our work on these two specifications, mainly for the following reasons:
- We think that the work done by the OpenAPI Initiative and the AsyncAPI Initiative is great :)
- We want to make the learning curve for the Data Product Descriptor Specification as smooth as possible by aligning its definition with that of two other specifications that are popular in the software and data engineering communities.
- We think that OpenAPI and AsyncAPI are natural specifications for defining the interface of a data product's ports that expose an API endpoint. This specification does not impose the use of any specific standard for the port's interface definition, but these two are highly recommended.
Introduction⚓︎
The Data Product Descriptor Specification (DPDS) defines a declarative and technology-independent standard to describe a data product in all its components. It allows human agents (e.g. analysts, data scientists, etc.) and digital agents (e.g. other data products, BI tools, planes of the underlying data mesh ops platform, etc.) to operate, discover and access a data product. When a data product is properly defined, an external agent can understand and interact with it with a minimal amount of cognitive load and implementation logic.
The formalization of a standard data product descriptor document through an open specification is useful to enable the implementation of an ecosystem of interoperable data mesh tools. The following is a non-exhaustive list of tools that can benefit from the specification:
- catalogs (search, document and collaborate)
- design tools (create new products by the composition of reusable templates)
- lifecycle management tools (deploy and operate)
- access management tools (assign/track access grants and generate client code in different languages)
- policies checking tools (enforce standard compliance and audit security)
- observability tools (monitor and detect)
- data lineage tools (trace data flows and perform forward/backward analysis)
- mesh topology analysis tools (calculate value/trust scores and detect structural problems)
- semantic tools (apply ontologies over mesh topology)
- domain specific language tools (create a collection of interconnected data products that implement together a specific value stream)
Table of Contents⚓︎
- Definitions
- Specification
- Versions
- Format
- Document Structure
- Data Types
- Rich Text Formatting
- Relative References In URLs
- Schema
- Data Product Descriptor Entity
- Info Object
- Owner Object
- Contact Point Object
- Interface Components Object
- Input Port Component
- Output Port Component
- Discovery Port Component
- Observability Port Component
- Control Port Component
- Promises Object
- Expectations Object
- Contracts Object
- Internal Components Object
- Lifecycle Task Info Object
- Application Component
- Infrastructural Component
- Components Object
- Reference Object
- External Resource Object
- Standard Definition Component
- Specification Extension Point
- Specification Extensions
- Appendix A: Revision History
Definitions⚓︎
Standard⚓︎
The set of shared rules used by different agents to describe an entity or process of common interest. The agents that follow the standard limit their autonomy by conforming to the set of shared rules to facilitate cooperation between them through interoperability.
Standard Specification⚓︎
The formal description of the rules that form a standard. A standard can have multiple specification versions associated with it. Sometimes the words standard and specification are used as synonyms.
Standard Definition⚓︎
The description of one specific entity or process created using and conforming to the set of rules formally described in the standard specification.
Data Product⚓︎
The smallest unit that can be independently deployed and managed in a data architecture (i.e. architectural quantum). It is composed of all the structural components that it requires to perform its function: metadata, data, code, the policies that govern the data, and its dependencies on infrastructure. Each data product has a clear identifier, a version number and an owner.
Data Product Ports⚓︎
The interfaces exposed to external agents by a data product. Each port exposes a service or set of correlated services. These are the five types of ports supported by a data product:
- Input port(s): an input port describes a set of services exposed by a data product to collect its source data and make it available for further internal transformation. An input port can receive data from one or more upstream sources in push mode (i.e. asynchronous subscription) or pull mode (i.e. synchronous query). Each data product may have one or more input ports.
- Output port(s): an output port describes a set of services exposed by a data product to share the generated data in a way that can be understood and trusted. Each data product may have one or more output ports.
- Discovery port(s): a discovery port describes a set of services exposed by a data product to provide information about its static role in the overall architecture like purpose, structure, location, etc. Each data product may have one or more discovery ports.
- Observability port(s): an observability port describes a set of services exposed by a data product to provide information about its dynamic behavior in the overall architecture like logs, traces, audit trails, metrics, etc. Each data product may have one or more observability ports.
- Control port(s): a control port describes a set of services exposed by a data product to configure local policies or perform highly privileged governance operations. Each data product may have one or more control ports.
The Data Product Descriptor Specification uses the following concepts of promise theory to formally describe the set of services exposed by each port, regardless of its specific type:
- Promises: Through promises, the data product declares the intent of the port. Promises are not a guarantee of the outcome, but the data product will behave according to them to realize its intent. The more a data product keeps its promises over time, the more trustworthy it is, and the more trustworthy it is, the more likely potential consumers are to use it. Trust is based on verifying how well a data product has kept its promises in the past. This verification should be automated by the underlying platform and synthesized in a trust score shared with all potential consumers. Examples of promises are descriptions of services, API, SLO, deprecation policy, etc.
- Expectations: Through expectations, the data product declares how it wants the port to be used by its consumers. Expectations are the inverse of promises. They are a way to explicitly state what promises the data product would like consumers to make regarding how they will use the port. Examples of expectations are intended usage, intended audience, etc.
- Contracts: Through contracts, the data product declares promises and expectations that must be respected by the data product and its consumers. A contract is an explicit agreement between the data product and its consumers. It groups all the promises and expectations that, if not respected, can result in penalties such as monetary sanctions or interruption of service. Examples of contracts are terms and conditions, SLA, billing policy, etc.
The governance can use these concepts to standardize the definition of these interfaces across all data products, while the platform can use them to provide the mechanisms to implement the described services in a compliant way.
Data Product Application Components⚓︎
The components of a data product that implement the services exposed through its ports (e.g. pipelines, microservices, etc.).
Data Product Infrastructural Components⚓︎
The components of a data product related to the infrastructural resources (e.g. storage, computing, etc.) used to run its application components.
Data Product Descriptor Document⚓︎
The document (or set of documents) that contains the standard definition of a data product created using and conforming to the Data Product Descriptor Specification.
Data Product Descriptor Specification⚓︎
The formal description of the rules to follow to create a standard-compliant Data Product Descriptor Document.
Specification⚓︎
Versions⚓︎
The Data Product Descriptor Specification is versioned using Semantic Versioning 2.0.0 (semver) and follows the semver specification.
The major.minor portion of the semver (for example 1.0) SHALL designate the DPDS feature set. Typically, .patch versions address errors in this document, not the feature set. Tooling which supports DPDS 1.0 SHOULD be compatible with all DPDS 1.0.* versions. The patch version SHOULD NOT be considered by tooling, making no distinction between 1.0.0 and 1.0.1 for example.
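As a minimal sketch of the rule above, a tool could compare only the major.minor pair of two specification versions. The helper names `feature_set` and `same_feature_set` are hypothetical and not part of the specification.

```python
import re

# Matches only the major.minor.patch core of a semantic version.
SEMVER_CORE = re.compile(r"^(0|[1-9]\d*)\.(0|[1-9]\d*)\.(0|[1-9]\d*)")


def feature_set(version: str) -> tuple[int, int]:
    """Return the (major, minor) pair that designates the DPDS feature set."""
    match = SEMVER_CORE.match(version)
    if match is None:
        raise ValueError(f"not a semantic version: {version!r}")
    return int(match.group(1)), int(match.group(2))


def same_feature_set(a: str, b: str) -> bool:
    """True when two specification versions differ at most in their patch number."""
    return feature_set(a) == feature_set(b)


print(same_feature_set("1.0.0", "1.0.1"))  # True
print(same_feature_set("1.0.2", "1.1.0"))  # False
```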
Each new minor version of the Data Product Descriptor Specification SHALL allow any Data Product Descriptor Document that is valid against any previous minor version of the Specification, within the same major version, to be updated to the new Specification version with equivalent semantics. Such an update MUST only require changing the dataProductDescriptor property to the new minor version.
For example, a valid Data Product Descriptor 1.0.2 document, upon changing its dataProductDescriptor property to 1.1.0, SHALL be a valid Data Product Descriptor 1.1.0 document, semantically equivalent to the original Data Product Descriptor 1.0.2 document. New minor versions of the Data Product Descriptor Specification MUST be written to ensure this form of backward compatibility.
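The upgrade described above can be sketched as follows. The function `upgrade_minor` and the abbreviated sample document are illustrative assumptions; the only thing the sketch changes is the dataProductDescriptor property.

```python
import copy


def upgrade_minor(descriptor: dict, new_version: str) -> dict:
    """Return a copy of the descriptor that declares a newer minor spec version."""
    old_major = descriptor["dataProductDescriptor"].split(".")[0]
    new_major = new_version.split(".")[0]
    if old_major != new_major:
        # The compatibility guarantee only covers the same major version.
        raise ValueError("upgrade must stay within the same major version")
    upgraded = copy.deepcopy(descriptor)
    upgraded["dataProductDescriptor"] = new_version
    return upgraded


# Abbreviated, hypothetical document used only for illustration.
doc = {"dataProductDescriptor": "1.0.2", "info": {"name": "tripExecution"}}
new_doc = upgrade_minor(doc, "1.1.0")
print(new_doc["dataProductDescriptor"])  # 1.1.0
```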
Format⚓︎
A Data Product Descriptor Document that conforms to the Data Product Descriptor Specification is itself a JSON object, which may be represented either in JSON or YAML format.
For example, if a field has an array value, the JSON array representation will be used:
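A minimal sketch of that case, using Python's standard json module and a placeholder field name `field` (not a DPDS field): a list value is serialized as a JSON array.

```python
import json

# "field" is a placeholder name chosen for illustration only.
document = {"field": [1, 2, 3]}
text = json.dumps(document)
print(text)  # {"field": [1, 2, 3]}
```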
All field names in the specification are case-sensitive. This includes all fields that are used as keys in a map, except where explicitly noted that keys are case-insensitive.
The schema exposes two types of fields: fixed fields, which have a declared name, and patterned fields, which declare a regex pattern for the field name. Patterned fields MUST have unique names within the containing object.
To preserve the ability to round-trip between YAML and JSON formats, YAML version 1.2 is RECOMMENDED along with some additional constraints:
- Tags MUST be limited to those allowed by the JSON Schema ruleset.
- Keys used in YAML maps MUST be limited to a scalar string, as defined by the YAML Failsafe schema ruleset.
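The second constraint can be checked on an already parsed document. The sketch below assumes the YAML has been loaded into plain Python dicts and lists by some parser (not shown), and the function name `non_string_keys` is hypothetical. It reports every map key that is not a string, since such keys cannot survive a round trip through JSON unchanged.

```python
def non_string_keys(node, path="$"):
    """Yield the paths of map keys that are not strings."""
    if isinstance(node, dict):
        for key, value in node.items():
            if not isinstance(key, str):
                yield f"{path}.{key!r}"
            yield from non_string_keys(value, f"{path}.{key}")
    elif isinstance(node, list):
        for index, item in enumerate(node):
            yield from non_string_keys(item, f"{path}[{index}]")


# Hypothetical parsed structure; the integer key 1 violates the constraint.
parsed = {"info": {"name": "tripExecution", 1: "bad key"}, "tags": ["a", {"ok": True}]}
problems = list(non_string_keys(parsed))
print(problems)  # ['$.info.1']
```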
Document Structure⚓︎
A Data Product Descriptor Document MAY be made up of a single document or be divided into multiple, connected parts at the discretion of the user. In the latter case, a Reference Object is used.
It is RECOMMENDED that the root Data Product Descriptor Document be named data-product-descriptor.json or data-product-descriptor.yaml.
Object Types⚓︎
A Data Product Descriptor Document has one and only one root object. The properties of an object are described by its fields. A field type can be another object or a primitive type. An addressable and versioned object is called an entity. The root object of the Data Product Descriptor Document is an entity object. Other entities that exist only in the scope of the root entity are called components.
Data Types⚓︎
Primitive data types in the DPDS are based on the types supported by the JSON Schema Specification.
Primitives have an optional modifier property: format. DPDS uses several known formats to define in fine detail the data type being used. However, to support documentation needs, the format property is an open string-valued property and can have any value. Formats such as "email", "uuid", and so on, MAY be used even though undefined by this specification. Types that are not accompanied by a format property follow the type definition in the JSON Schema. Tools that do not recognize a specific format MAY default back to the type alone, as if the format is not specified.
The formats defined by the DPDS are:
type | format | Comments |
---|---|---|
integer | int32 | signed 32 bits |
integer | int64 | signed 64 bits (a.k.a. long) |
number | float | |
number | double | |
string | | |
string | alphanumeric | a string that matches the following regex ^[a-zA-Z0-9]+$ |
string | name | a string that matches the following regex ^[a-zA-Z][a-zA-Z0-9]+$ |
string | fqn | a string that matches the following regex ^[a-zA-Z][a-zA-Z0-9.:]+$ |
string | version | a string that matches the following regex ^(0|[1-9]\d*)\.(0|[1-9]\d*)\.(0|[1-9]\d*)(?:-((?:0|[1-9]\d*|\d*[a-zA-Z-][0-9a-zA-Z-]*)(?:\.(?:0|[1-9]\d*|\d*[a-zA-Z-][0-9a-zA-Z-]*))*))?(?:\+([0-9a-zA-Z-]+(?:\.[0-9a-zA-Z-]+)*))?$ |
string | byte | base64 encoded characters |
string | binary | any sequence of octets |
string | uuid | a sequence of 16 octets as defined by RFC4122 |
boolean | | |
string | date | As defined by full-date - RFC3339 |
string | date-time | As defined by date-time - RFC3339 |
string | password | A hint to UIs to obscure input. |
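The regex-based string formats in the table can be checked with a small lookup, sketched below. The names `FORMAT_PATTERNS` and `matches_format` are hypothetical; the patterns are copied from the table, and unknown formats fall back to the bare type, as the text above permits.

```python
import re

# Patterns copied from the formats table; only the regex-based formats are covered.
FORMAT_PATTERNS = {
    "alphanumeric": re.compile(r"^[a-zA-Z0-9]+$"),
    "name": re.compile(r"^[a-zA-Z][a-zA-Z0-9]+$"),
    "fqn": re.compile(r"^[a-zA-Z][a-zA-Z0-9.:]+$"),
    "version": re.compile(
        r"^(0|[1-9]\d*)\.(0|[1-9]\d*)\.(0|[1-9]\d*)"
        r"(?:-((?:0|[1-9]\d*|\d*[a-zA-Z-][0-9a-zA-Z-]*)"
        r"(?:\.(?:0|[1-9]\d*|\d*[a-zA-Z-][0-9a-zA-Z-]*))*))?"
        r"(?:\+([0-9a-zA-Z-]+(?:\.[0-9a-zA-Z-]+)*))?$"
    ),
}


def matches_format(value: str, fmt: str) -> bool:
    """Check a string against a DPDS format; unrecognized formats are accepted."""
    pattern = FORMAT_PATTERNS.get(fmt)
    if pattern is None:
        return True  # fall back to the type alone
    return pattern.fullmatch(value) is not None


print(matches_format("tripExecution", "name"))   # True
print(matches_format("trip-execution", "name"))  # False
print(matches_format("1.0.0-rc.1", "version"))   # True
```

Note that the name pattern as written requires at least two characters, so a single-letter name does not match.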
Rich Text Formatting⚓︎
Throughout the specification, description fields are noted as supporting CommonMark markdown formatting.
Where Data Product Descriptor tooling renders rich text, it MUST support, at a minimum, markdown syntax as described by CommonMark 0.27. Tooling MAY choose to ignore some CommonMark features to address security concerns.
Relative References in URLs⚓︎
Unless specified otherwise, all properties that are URLs SHOULD be absolute references. If a property's description explicitly states that it allows a relative reference, its value MUST be compliant with RFC3986. Relative references MUST be resolved using the URLs defined in the property description as a Base URI.
Relative references used in $ref are processed as per JSON Reference, using the URL of the current document as the base URI. See also the Reference Object.
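As an illustration of RFC3986 resolution, Python's standard `urllib.parse.urljoin` resolves a relative reference against a base URI. The base URL and the relative paths below are made-up examples, not locations defined by the specification.

```python
from urllib.parse import urljoin

# Hypothetical location of the current document, used as the base URI.
base = "https://example.com/products/trip-execution/data-product-descriptor.json"

sibling = urljoin(base, "ports/input.json")
shared = urljoin(base, "../shared/components.json#/outputPorts/0")

print(sibling)  # https://example.com/products/trip-execution/ports/input.json
print(shared)   # https://example.com/products/shared/components.json#/outputPorts/0
```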
Schema⚓︎
In the following description, if a field is not explicitly REQUIRED or described with a MUST or SHALL, it can be considered OPTIONAL.
Data Product Descriptor Entity⚓︎
This is the root object of the Data Product Descriptor Document.
Fixed Fields⚓︎
Field Name | Type | Description |
---|---|---|
dataProductDescriptor | string:version | (REQUIRED) The semantic version number of the Data Product Descriptor Specification Version that the Data Product Descriptor Document uses. The dataProductDescriptor field SHOULD be used by tooling specifications and clients to interpret the Data Product Descriptor Document. This is not related to the data product's info.version field. |
info | Info Object | (REQUIRED) The general information about the data product. The information can be used by the platform or by consumers if needed. |
interfaceComponents | Interface Components Object | (REQUIRED) The list of all interface entities exposed by the data product. |
internalComponents | Internal Components Object | The list of all internal entities that compose the data product. |
components | Components Object | An element to hold a set of reusable objects that can be referenced in other parts of the document. |
tags | [string] | A list of tags associated with the data product. Tags can be used for logical grouping of data products. |
externalDocs | External Resource Object | Additional external documentation. |
This object MAY be extended with Specification Extensions.
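A minimal sketch of a check for the REQUIRED root fields listed in the table. The helper `missing_root_fields` is hypothetical and the sample document is heavily abbreviated; its values are placeholders.

```python
REQUIRED_ROOT_FIELDS = ("dataProductDescriptor", "info", "interfaceComponents")


def missing_root_fields(descriptor: dict) -> list[str]:
    """Return the REQUIRED root fields that are absent from the descriptor."""
    return [name for name in REQUIRED_ROOT_FIELDS if name not in descriptor]


# Abbreviated, illustrative root entity; nested objects are not complete.
descriptor = {
    "dataProductDescriptor": "1.0.0",
    "info": {"name": "tripExecution"},
    "interfaceComponents": {"outputPorts": []},
    "tags": ["logistics"],
}

print(missing_root_fields(descriptor))    # []
print(missing_root_fields({"info": {}}))  # ['dataProductDescriptor', 'interfaceComponents']
```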
Info Object⚓︎
The Info Object contains general information about the data product. The information can be used by the platform or by consumers if needed.
Fixed Fields⚓︎
Field Name | Type | Description |
---|---|---|
id | string:uuid | (READONLY) A UUID version 5 (RFC 4122) generated server side during data product creation as the SHA-1 hash of the fullyQualifiedName. It MAY be used to reference the data product when calling the API exposed by the data product experience plane. Because the fullyQualifiedName is globally unique, the id is globally unique as well. However, when calling an API other than those exposed by the data product experience plane, the fullyQualifiedName MUST always be used to reference the data product. Example: "id": "2b172838-73b1-5d6c-be45-cc75aee180a0" |
fullyQualifiedName | string:fqn | (READONLY) The universally unique identifier of the data product. It MUST be a URN of the form urn:dpds:{mesh-namespace}:dataproducts:{product-name}:{product-major-version}. It's RECOMMENDED to use your company's domain name in reverse dot notation (e.g. it.quantyca) as the mesh-namespace, to ensure that the fullyQualifiedName is a universally unique identifier as REQUIRED. The product's domain (e.g. planning, operations, ...) MAY be appended to the mesh-namespace as a postfix, although doing so is NOT RECOMMENDED. Example: "fullyQualifiedName": "urn:dpds:it.quantyca:dataproducts:tripExecution:1" |
entityType | string:alphanumeric | (READONLY) The type of the entity. It MUST be the constant value dataproduct. |
name | string:name | (REQUIRED) The name of the data product. It MUST be unique within the mesh-namespace. It's RECOMMENDED to use a camel case formatted string. |
version | string:version | (REQUIRED) The semantic version number of the data product definition contained in the given Data Product Descriptor Document. Every time the major version of one of the data product's ports changes, the major version of the product MUST also be incremented. It is RECOMMENDED to use 0 as the major version for data products that are not yet generally available. These data products can introduce breaking changes without incrementing their major version. It is nevertheless RECOMMENDED that for every breaking change introduced by a data product that is not yet generally available (i.e. major version equal to 0) at least the minor version be incremented. This field is not related to the dataProductDescriptor field. |
displayName | string | The human readable name of the data product. It SHOULD be used by frontend tools to display the data product's name in place of the name property. It's RECOMMENDED not to use the same displayName for different data products belonging to the same mesh-namespace. |
description | string | The data product description. CommonMark syntax MAY be used for rich text representation. |
domain | string:name | (REQUIRED) The domain to which the data product belongs. |
owner | Owner Object | (REQUIRED) A collection of information related to the data product's owner. |
contactPoints | [Contact Point Object] | A collection of contact information for the given data product. |
This object MAY be extended with Specification Extensions.
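The sketch below shows how the fullyQualifiedName and a name-based id could be derived. It rests on two assumptions that the text does not settle. First, a name-based UUID hashed with SHA-1 is what RFC 4122 calls version 5, and the example id above carries the version digit 5, so `uuid.uuid5` is used. Second, the namespace UUID fed into the hash is not stated, so `uuid.NAMESPACE_URL` is only a placeholder and the result is not expected to reproduce the example id. The function names are hypothetical.

```python
import uuid


def product_fqn(mesh_namespace: str, name: str, version: str) -> str:
    """Build the product URN from its namespace, name and major version."""
    major = version.split(".")[0]
    return f"urn:dpds:{mesh_namespace}:dataproducts:{name}:{major}"


def entity_id(fqn: str, namespace: uuid.UUID = uuid.NAMESPACE_URL) -> str:
    """Derive a deterministic SHA-1 based UUID from the fully qualified name.

    ASSUMPTION: the namespace UUID is a placeholder, not taken from the spec.
    """
    return str(uuid.uuid5(namespace, fqn))


fqn = product_fqn("it.quantyca", "tripExecution", "1.2.3")
print(fqn)  # urn:dpds:it.quantyca:dataproducts:tripExecution:1
print(uuid.UUID(entity_id(fqn)).version)  # 5
```

Because the id is derived only from the fullyQualifiedName, recomputing it for the same name always yields the same value.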
Owner Object⚓︎
The Owner Object describes the data product's owner.
Fixed Fields⚓︎
This object MAY be extended with Specification Extensions.
Owner Object Example:⚓︎
Contact Point Object⚓︎
The Contact Point Object describes a data product's contact point.
Fixed Fields⚓︎
Field Name | Type | Description |
---|---|---|
name | string:name | The name of the contact point. |
description | string | The contact point description. CommonMark syntax MAY be used for rich text representation. |
channel | string | The channel used to address the contact point. It can be, for example, equal to web, mail or phone. |
address | string | The address of the contact point. Depending on the channel it can be, for example, a URL, an email address or a phone number. |
This object MAY be extended with Specification Extensions.
Contact Point Object Example:⚓︎
{
"name": "Support Team Mail",
"description": "The mail address of the team that gives support on this product",
"channel": "email",
"address": "trip-execution-support@company-xyz.com"
}
{
"name": "Issue Tracker",
"description": "The address of the issue tracker associated to this product",
"channel": "web",
"address": "https://readmine.company-xyz.com/trip-execution"
}
Interface Components Object⚓︎
The Interface Components Object is a collection of all interface entities exposed by a data product.
Fixed Fields⚓︎
Field Name | Type | Description |
---|---|---|
inputPorts | [Input Port Component| Reference Object] | The input ports exposed by the data product. |
outputPorts | [Output Port Component| Reference Object] | (REQUIRED) The output ports exposed by the data product. |
discoveryPorts | [Discovery Port Component| Reference Object] | The discovery ports exposed by the data product. |
observabilityPorts | [Observability Port Component | Reference Object] | The observability ports exposed by the data product. |
controlPorts | [Control Port Component| Reference Object] | The control ports exposed by the data product. |
This object cannot be extended with additional properties and any properties added SHALL be ignored.
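A minimal sketch of how a tool might apply these two rules: outputPorts is REQUIRED, and properties outside the fixed fields are ignored. The function name `normalize_interface_components` and the sample values are hypothetical.

```python
PORT_FIELDS = (
    "inputPorts",
    "outputPorts",
    "discoveryPorts",
    "observabilityPorts",
    "controlPorts",
)


def normalize_interface_components(obj: dict) -> dict:
    """Keep only the fixed fields and require outputPorts to be present."""
    if "outputPorts" not in obj:
        raise ValueError("outputPorts is REQUIRED")
    # Properties that are not fixed fields are dropped rather than rejected.
    return {name: obj[name] for name in PORT_FIELDS if name in obj}


raw = {"outputPorts": [{"name": "tripStatus"}], "x-custom": "ignored"}
clean = normalize_interface_components(raw)
print(clean)  # {'outputPorts': [{'name': 'tripStatus'}]}
```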
Input Port Component⚓︎
The Input Port Component describes an input port of a data product.
Fixed Fields⚓︎
Field Name | Type | Description |
---|---|---|
id | string:uuid | (READONLY) A UUID version 5 (see RFC 4122) generated server side during data product creation as the SHA-1 hash of the port's fullyQualifiedName. It MAY be used to reference the port when calling the API exposed by the data product experience plane. Because the fullyQualifiedName is globally unique, the id is globally unique as well. However, when calling an API other than those exposed by the data product experience plane, the port's fullyQualifiedName MUST always be used to reference the port. Example: "id": "3235744b-8d2e-57b5-afba-f66862cc6a21" |
fullyQualifiedName | string:fqn | (READONLY) The universally unique identifier of the port. It MUST be a URN of the form urn:dpds:{mesh-namespace}:dataproducts:{product-name}:{product-major-version}:inputports:{port-name}. Example: "fullyQualifiedName": "urn:dpds:it.quantyca:dataproducts:tripExecution:1:inputports:tmsTripCDC" |
entityType | string:alphanumeric | (READONLY) The type of the entity. It MUST be the constant value inputport. |
name | string:name | (REQUIRED) The name of the port. It MUST be unique among the input ports of the same data product. It's RECOMMENDED to use a camel case formatted string. Example: "name": "tmsTripCDC" |
version | string:version | (REQUIRED) The semantic version number of the data product's port. Every time the major version of the port changes, the major version of the product MUST also be incremented. |
displayName | string | The human readable name of the port. It SHOULD be used by frontend tools to display the port's name in place of the name property. It's RECOMMENDED not to use the same displayName for different input ports belonging to the same data product. |
description | string | The port description. CommonMark syntax MAY be used for rich text representation. |
componentGroup | string:name | The name of the group this component belongs to. Grouping different components together is useful to define sub-modules within a data product. A sub-module can be used as a base for creating reusable templates. |
promises | Promises Object | Reference Object | The data product's promises declared over the port. |
expectations | Expectations Object | Reference Object | The data product's expectations declared over the port. |
contracts | Contracts Object | Reference Object | The data product's contracts declared over the port. |
tags | [string] | A list of tags associated with the component. Tags can be used for logical grouping of a data product's components. |
externalDocs | External Resource Object | Additional external documentation. |
This object MAY be extended with Specification Extensions.
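Three of the rules in the table lend themselves to simple checks: the shape of the port URN, the uniqueness of port names within a data product, and the requirement that a change in a port's major version is accompanied by a higher product major version. The sketch below is one possible reading of those rules; all function names are hypothetical.

```python
def port_fqn(product_fqn: str, port_type: str, port_name: str) -> str:
    """Append the port segment to the product URN, e.g. port_type='inputports'."""
    return f"{product_fqn}:{port_type}:{port_name}"


def major(version: str) -> int:
    """Extract the major number from a semantic version string."""
    return int(version.split(".")[0])


def product_bump_is_valid(old_port: str, new_port: str,
                          old_product: str, new_product: str) -> bool:
    """A change of port major version requires a higher product major version."""
    if major(new_port) == major(old_port):
        return True
    return major(new_product) > major(old_product)


def duplicate_names(ports: list[dict]) -> set[str]:
    """Return the port names that occur more than once in the list."""
    seen, dupes = set(), set()
    for port in ports:
        name = port["name"]
        (dupes if name in seen else seen).add(name)
    return dupes


fqn = port_fqn("urn:dpds:it.quantyca:dataproducts:tripExecution:1",
               "inputports", "tmsTripCDC")
print(fqn)
print(product_bump_is_valid("1.4.0", "2.0.0", "1.9.0", "1.10.0"))  # False
print(product_bump_is_valid("1.4.0", "2.0.0", "1.9.0", "2.0.0"))   # True
```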
Output Port Component⚓︎
The Output Port Component describes an output port of a data product.
Fixed Fields⚓︎
Field Name | Type | Description |
---|---|---|
id | string:uuid | (READONLY) A UUID version 5 (see RFC 4122) generated server side during data product creation as the SHA-1 hash of the port's fullyQualifiedName. It MAY be used to reference the port when calling the API exposed by the data product experience plane. Because the fullyQualifiedName is globally unique, the id is globally unique as well. However, when calling an API other than those exposed by the data product experience plane, the port's fullyQualifiedName MUST always be used to reference the port. Example: "id": "3235744b-8d2e-57b5-afba-f66862cc6a21" |
fullyQualifiedName | string:fqn | (READONLY) The universally unique identifier of the port. It MUST be a URN of the form urn:dpds:{mesh-namespace}:dataproducts:{product-name}:{product-major-version}:outputports:{port-name}. Example: "fullyQualifiedName": "urn:dpds:it.quantyca:dataproducts:tripExecution:1:outputports:tmsTripCDC" |
entityType | string:alphanumeric | (READONLY) The type of the entity. It MUST be the constant value outputport. |
name | string:name | (REQUIRED) The name of the port. It MUST be unique among the output ports of the same data product. It's RECOMMENDED to use a camel case formatted string. Example: "name": "tmsTripCDC" |
version | string:version | (REQUIRED) The semantic version number of the data product's port. Every time the major version of the port changes, the major version of the product MUST also be incremented. |
displayName | string | The human readable name of the port. It SHOULD be used by frontend tools to display the port's name in place of the name property. It's RECOMMENDED not to use the same displayName for different output ports belonging to the same data product. |
description | string | The port description. CommonMark syntax MAY be used for rich text representation. |
componentGroup | string:name | The name of the group this component belongs to. Grouping different components together is useful to define sub-modules within a data product. A sub-module can be used as a base for creating reusable templates. |
promises | Promises Object | Reference Object | The data product's promises declared over the port. |
expectations | Expectations Object | Reference Object | The data product's expectations declared over the port. |
contracts | Contracts Object | Reference Object | The data product's contracts declared over the port. |
tags | [string] | A list of tags associated with the component. Tags can be used for logical grouping of a data product's components. |
externalDocs | External Resource Object | Additional external documentation. |
This object MAY be extended with Specification Extensions.
Discovery Port Component⚓︎
The Discovery Port Component describes a discovery port of a data product.
Fixed Fields⚓︎
Field Name | Type | Description |
---|---|---|
id | string:uuid | (READONLY) A UUID version 5 (see RFC 4122) generated server side during data product creation as the SHA-1 hash of the port's fullyQualifiedName. It MAY be used to reference the port when calling the API exposed by the data product experience plane. Because the fullyQualifiedName is globally unique, the id is globally unique as well. However, when calling an API other than those exposed by the data product experience plane, the port's fullyQualifiedName MUST always be used to reference the port. Example: "id": "3235744b-8d2e-57b5-afba-f66862cc6a21" |
fullyQualifiedName | string:fqn | (READONLY) The universally unique identifier of the port. It MUST be a URN of the form urn:dpds:{mesh-namespace}:dataproducts:{product-name}:{product-major-version}:discoveryports:{port-name}. Example: "fullyQualifiedName": "urn:dpds:it.quantyca:dataproducts:tripExecution:1:discoveryports:tmsTripCDC" |
entityType | string:alphanumeric | (READONLY) The type of the entity. It MUST be the constant value discoveryport. |
name | string:name | (REQUIRED) The name of the port. It MUST be unique among the discovery ports of the same data product. It's RECOMMENDED to use a camel case formatted string. Example: "name": "tmsTripCDC" |
version | string:version | (REQUIRED) The semantic version number of the data product's port. Every time the major version of the port changes, the major version of the product MUST also be incremented. |
displayName | string | The human readable name of the port. It SHOULD be used by frontend tools to display the port's name in place of the name property. It's RECOMMENDED not to use the same displayName for different discovery ports belonging to the same data product. |
description | string | The port description. CommonMark syntax MAY be used for rich text representation. |
componentGroup | string:name | The name of the group this component belongs to. Grouping different components together is useful to define sub-modules within a data product. A sub-module can be used as a base for creating reusable templates. |
promises | Promises Object | Reference Object | The data product's promises declared over the port. |
expectations | Expectations Object | Reference Object | The data product's expectations declared over the port. |
contracts | Contracts Object | Reference Object | The data product's contracts declared over the port. |
tags | [string] | A list of tags associated with the component. Tags can be used for logical grouping of a data product's components. |
externalDocs | External Resource Object | Additional external documentation. |
This object MAY be extended with Specification Extensions.
Observability Port Component⚓︎
The Observability Port Component describes an observability port of a data product.
Fixed Fields⚓︎
Field Name | Type | Description |
---|---|---|
id | string:uuid |
(READONLY) It's an UUID version 3 (see RFC-4122) generated server side during data product creation as SHA-1 hash of the port's fullyQualifiedName . It MAY be used when calling the API exposed by the data product experience plane to referentiate the port. Because the fullyQualifiedName is globally unique also the id is globally unique, anyway to referentiate the data product when calling API different from the ones exposed by the data product experience plane the port's fullyQualified name MUST be always used. "id: "3235744b-8d2e-57b5-afba-f66862cc6a21" |
fullyQualifiedName | string:fqn |
(READONLY) The unique universal idetifier of the port. It MUST be a URN of the form urn:dpds:{mesh-namespace}:dataproducts:{product-name}:{product-major-version}:inputports:{port-name} . Example "fullyQualifiedName: "urn:dpds:it.quantyca:dataproducts:tripExecution:1:inputports:tmsTripCDC" . |
entityType | string:alphanumeric |
(READONLY) The type of the entity. It MUST be a constant value equals to observabilityport . |
name | string:name |
(REQUIRED) The name of the port. MUST be unique within the other input ports of the same data product. It's RECOMMENDED to use a cammel case formatted string. Example "name: "tmsTripCDC" . |
version | string:version |
(REQUIRED) The semantic version number of the data product's port. Everytime the major version of port changes also the major version of the product MUST be incremented. |
displayName | string |
The human readable name of the port. It SHOULD be used by frontend tool to visualize port's name in place of the name property. It's RECOMMENDED to not use the same displayName for different input ports belonging to the same data product. |
description | string |
The port descripion. CommonMark syntax MAY be used for rich text representation. |
componentGroup | string:name |
The name of the group this component belongs to. Grouping different components together is useful to define sub modules withing a data product. A sub module can be used as base for creating reusable templates. |
promises | Promises Object | Reference Object | The data product's promises declared over the port. |
expectations | Expectations Object | Reference Object | The data product's expectations declared over the port. |
contracts | Contracts Object | Reference Object | The data product's contracts declared over the port. |
tags | [string ] |
A list of tags associated with the component. Tags can be used for logical grouping of the data product's components. |
externalDocs | External Resource Object | Additional external documentation. |
This object MAY be extended with Specification Extensions.
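As a rough illustration of how an id could be derived from a fullyQualifiedName, the sketch below builds a port URN and hashes it into a name-based UUID. Two things are assumptions rather than part of the specification: the UUID namespace (`uuid.NAMESPACE_URL` is only a placeholder) and the use of `uuid.uuid5`. RFC 4122 pairs name-based version 3 with MD5 and version 5 with SHA-1; `uuid5` is chosen here because the text mentions SHA-1, and `uuid.uuid3` can be swapped in if version 3 is what the platform intends. The helper names `port_fqn` and `port_id` are hypothetical.

```python
import uuid

# Assumption: the specification does not name a UUID namespace, so
# NAMESPACE_URL is used here purely as a placeholder.
ASSUMED_NAMESPACE = uuid.NAMESPACE_URL


def port_fqn(mesh_namespace: str, product_name: str, major_version: int,
             port_type: str, port_name: str) -> str:
    """Build a port URN following the pattern used in the field tables."""
    return (f"urn:dpds:{mesh_namespace}:dataproducts:{product_name}:"
            f"{major_version}:{port_type}:{port_name}")


def port_id(fqn: str) -> str:
    """Derive a deterministic name-based (SHA-1) UUID from the FQN."""
    return str(uuid.uuid5(ASSUMED_NAMESPACE, fqn))


# Values taken from the input port example used in the specification.
fqn = port_fqn("it.quantyca", "tripExecution", 1, "inputports", "tmsTripCDC")
print(fqn)
print(port_id(fqn))  # same FQN and namespace always give the same id
```

Because the id is a pure function of the fullyQualifiedName, a client can recompute it locally instead of storing it, provided it knows the namespace the platform actually uses.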
Control Port Component⚓︎
The Control Port Component describes a control port of a data product.
Fixed Fields⚓︎
Field Name | Type | Description |
---|---|---|
id | string:uuid |
(READONLY) A UUID version 3 (see RFC 4122) generated server side during data product creation as the SHA-1 hash of the port's fullyQualifiedName. It MAY be used to reference the port when calling the API exposed by the data product experience plane. Because the fullyQualifiedName is globally unique, the id is globally unique as well. However, when calling APIs other than the ones exposed by the data product experience plane, the port's fullyQualifiedName MUST always be used to reference the port. Example: "id": "3235744b-8d2e-57b5-afba-f66862cc6a21" |
fullyQualifiedName | string:fqn |
(READONLY) The unique universal identifier of the port. It MUST be a URN of the form urn:dpds:{mesh-namespace}:dataproducts:{product-name}:{product-major-version}:controlports:{port-name}. Example: "fullyQualifiedName": "urn:dpds:it.quantyca:dataproducts:tripExecution:1:controlports:tmsTripCDC" |
entityType | string:alphanumeric |
(READONLY) The type of the entity. It MUST be the constant value controlport. |
name | string:name |
(REQUIRED) The name of the port. It MUST be unique among the control ports of the same data product. It's RECOMMENDED to use a camel case formatted string. Example: "name": "tmsTripCDC" |
version | string:version |
(REQUIRED) The semantic version number of the data product's port. Every time the major version of the port changes, the major version of the product MUST be incremented as well. |
displayName | string |
The human readable name of the port. It SHOULD be used by frontend tools to display the port's name in place of the name property. It's RECOMMENDED not to use the same displayName for different control ports belonging to the same data product. |
description | string |
The port description. CommonMark syntax MAY be used for rich text representation. |
componentGroup | string:name |
The name of the group this component belongs to. Grouping different components together is useful to define submodules within a data product. A submodule can be used as a base for creating reusable templates. |
promises | Promises Object | Reference Object | The data product's promises declared over the port. |
expectations | Expectations Object | Reference Object | The data product's expectations declared over the port. |
contracts | Contracts Object | Reference Object | The data product's contracts declared over the port. |
tags | [string ] |
A list of tags associated with the component. Tags can be used for logical grouping of the data product's components. |
externalDocs | External Resource Object | Additional external documentation. |
This object MAY be extended with Specification Extensions.
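The versioning rule stated for ports (a change in a port's major version requires a new product major version) can be checked mechanically. The sketch below is one possible reading of that rule. The input layout, a product "version" plus a "ports" mapping from port name to port version, is a simplified stand-in and not the real descriptor structure, and the function names are hypothetical.

```python
def major(version: str) -> int:
    """Return the major component of a semantic version string."""
    return int(version.split(".")[0])


def ports_violating_major_rule(old: dict, new: dict) -> list[str]:
    """List ports whose major version changed without a product major bump."""
    product_bumped = major(new["version"]) > major(old["version"])
    offenders = []
    for name, new_version in new["ports"].items():
        old_version = old["ports"].get(name)
        if old_version is None:
            continue  # newly added port: nothing to compare against
        if major(new_version) != major(old_version) and not product_bumped:
            offenders.append(name)
    return offenders


old = {"version": "1.2.0", "ports": {"tmsTripCDC": "1.0.0"}}
new_bad = {"version": "1.3.0", "ports": {"tmsTripCDC": "2.0.0"}}
new_ok = {"version": "2.0.0", "ports": {"tmsTripCDC": "2.0.0"}}

print(ports_violating_major_rule(old, new_bad))  # ['tmsTripCDC']
print(ports_violating_major_rule(old, new_ok))   # []
```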
Promises Object⚓︎
The Promises Object describes the data product's promises declared over a given port.
Fixed Fields⚓︎
Field Name | Type | Description |
---|---|---|
platform | string |
The target technological platform in which the services associated with the given port operate. It usually contains the infrastructure provider and the data center location. Optionally, it can also contain the specific runtime technology used. Examples: onprem:milan-1, aws:eu-south-1, aws:eu-south-1:redshift. |
servicesType | string |
The type of services associated with the given port. Examples: soap-services, rest-services, odata-services, streaming-services, datastore-services. |
api | Standard Definition Object | The formal description of the port's services API. It is RECOMMENDED to use the OpenAPI Specification for RESTful services, the AsyncAPI Specification for streaming services and the DataStore API Specification for data store connection based services. Other specifications MAY be used as required. |
deprecationPolicy | Standard Definition Object | The deprecation policy adopted by the port. It is RECOMMENDED to specify at least how long the deprecation period will be after the release of a new major version. |
slo | Standard Definition Object | The service level objectives supported by the port. It is RECOMMENDED to group SLOs by category (e.g. operational SLOs, quality SLOs, etc.) and to specify them in a way that is easy to compute. |
This object MAY be extended with Specification Extensions.
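The platform examples above all follow a colon-separated layout of provider, location and an optional runtime. The specification only says the field "usually" contains these parts, so the parser below is a sketch under the assumption that a value has exactly two or three non-empty segments. `Platform` and `parse_platform` are hypothetical names.

```python
from typing import NamedTuple, Optional


class Platform(NamedTuple):
    provider: str
    location: str
    runtime: Optional[str] = None


def parse_platform(value: str) -> Platform:
    """Split a platform string such as 'aws:eu-south-1:redshift'."""
    parts = value.split(":")
    # Assumption: two or three non-empty segments, as in the spec's examples.
    if len(parts) not in (2, 3) or not all(parts):
        raise ValueError(f"unexpected platform string: {value!r}")
    return Platform(*parts)


print(parse_platform("aws:eu-south-1:redshift"))
print(parse_platform("onprem:milan-1").runtime)  # None
```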
Expectations Object⚓︎
The Expectations Object describes the data product's expectations declared over a given port.
Fixed Fields⚓︎
Field Name | Type | Description |
---|---|---|
audience | Standard Definition Object | The audience of consumers for whom the port has been designed. It is RECOMMENDED to specify inclusion and exclusion criteria in an unambiguous way. |
usage | Standard Definition Object | The usage patterns for which the port has been designed. |
This object MAY be extended with Specification Extensions.
Obligations Object⚓︎
The Obligations Object describes the data product's obligations declared over a given port.
Fixed Fields⚓︎
Field Name | Type | Description |
---|---|---|
termsAndConditions | Standard Definition Object | The terms and conditions defined on the port, which consumers must agree to and respect in order to use it. |
billingPolicy | Standard Definition Object | The billing policy defined on the port, which consumers must agree to and respect in order to use it. |
sla | Standard Definition Object | The service level agreements supported by the port. It is RECOMMENDED to group SLAs by category (e.g. operational SLAs, quality SLAs, etc.) and to specify them in a way that is easy to compute. |
This object MAY be extended with Specification Extensions.
Internal Components Object⚓︎
The Internal Components Object is a collection of all the internal entities that compose a data product.
Fixed Fields⚓︎
Field Name | Type | Description |
---|---|---|
lifecycleInfo | Map[string , [Lifecycle Task Info Object]] |
A list of lifecycle stages and associated tasks. Each stage can contain multiple tasks. To move a product to a specific stage, the platform MUST execute all the tasks associated with the target stage in order of definition, unless a different order is specified using the order property of the Lifecycle Task Info Object. |
applicationComponents | [Application Component] | The list of application components that compose the data product. |
infrastructuralComponents | [Infrastructural Component] | The list of infrastructural components that compose the data product. |
This object cannot be extended with additional properties and any properties added SHALL be ignored.
Lifecycle Task Info Object⚓︎
The Lifecycle Task Info Object describes a task to execute in order to move the product to a specific stage in its lifecycle. Each stage can contain multiple tasks.
Fixed Fields⚓︎
Field Name | Type | Description |
---|---|---|
name | string:name |
The name of the task. |
order | integer:int32 |
The execution order of the task. The platform MUST first execute the tasks that have this field set, in ascending order of its value, and only then all the other tasks in order of definition. |
service | Reference Object | The endpoint of the service to call in order to execute the task. |
template | Standard Definition Object | Reference Object | Can be an inline JSON object or a reference to an external resource. It contains the definition of the task to execute. It is passed as is to the executor service specified using the service field. |
configurations | object | Reference Object |
Can be an inline JSON object or a reference to an external resource. It contains the configuration properties that can be used by the executor service at execution time. It is passed as is to the executor service specified using the service field. |
This object MAY be extended with Specification Extensions.
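The ordering rule for the order field can be sketched as follows: tasks that set order run first, sorted ascending by its value, and the remaining tasks follow in definition order. The task names in the example are hypothetical, and the behaviour for two tasks sharing the same order value (here they keep their definition order) is an assumption, since the text does not cover that case.

```python
def execution_plan(tasks: list[dict]) -> list[dict]:
    """Return the tasks of one lifecycle stage in execution order."""
    ordered = [t for t in tasks if t.get("order") is not None]
    unordered = [t for t in tasks if t.get("order") is None]
    # sorted() is stable, so tasks with equal order keep definition order.
    return sorted(ordered, key=lambda t: t["order"]) + unordered


tasks = [
    {"name": "lint"},                 # no order: runs after the ordered ones
    {"name": "deploy", "order": 2},
    {"name": "build", "order": 1},
    {"name": "notify"},               # no order: keeps definition order
]

print([t["name"] for t in execution_plan(tasks)])
# ['build', 'deploy', 'lint', 'notify']
```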
Lifecycle Task Info Object Example:⚓︎
"lifecycleInfo": {
  "test": [{
    "service": {
      "$href": "{azure-devops}"
    },
    "template": {
      "name": "testPipeline",
      "version": "1.0.0",
      "specification": "azure-devops",
      "specificationVersion": "1.0.0",
      "definition": {
        "organization": "andreagioia",
        "project": "opendatamesh",
        "pipelineId": "3",
        "branch": "main"
      }
    },
    "configurations": {
      "stagesToSkip": ["Deploy"]
    }
  }],
  "prod": [{
    "service": {
      "$href": "{azure-devops}"
    },
    "template": {
      "name": "testPipeline",
      "version": "1.0.0",
      "specification": "azure-devops",
      "specificationVersion": "1.0.0",
      "definition": {
        "organization": "andreagioia",
        "project": "opendatamesh",
        "pipelineId": "3",
        "branch": "main"
      }
    },
    "configurations": {
      "stagesToSkip": []
    }
  }],
  "deprecated": [{ }]
}
Application Component⚓︎
The Application Component describes an internal application component used by the data product to provide services through its ports.
Fixed Fields⚓︎
Field Name | Type | Description |
---|---|---|
id | string:uuid |
(READONLY) A UUID version 3 (see RFC 4122) generated server side during data product creation as the SHA-1 hash of the component's fullyQualifiedName. It MAY be used to address the component when calling the API exposed by the data product experience plane. Because the fullyQualifiedName is globally unique, the id is globally unique as well. However, when calling APIs other than the ones exposed by the data product experience plane, the component's fullyQualifiedName MUST always be used to address the component. Example: "id": "3235744b-8d2e-57b5-afba-f66862cc6a21" |
fullyQualifiedName | string:fqn |
(READONLY) The unique universal identifier of the component. It MUST be a URN of the form urn:dpds:{mesh-namespace}:dataproducts:{product-name}:{product-major-version}:applications:{app-name}. Example: "fullyQualifiedName": "urn:dpds:it.quantyca:dataproducts:tripExecution:1:applications:modelNormalizationJob" |
entityType | string:alphanumeric |
(READONLY) The type of the entity. It is the constant value application. |
name | string:name |
(REQUIRED) The name of the application component. It MUST be unique among the application components of the same data product. It's RECOMMENDED to use a camel case formatted string. Example: "name": "modelNormalizationJob" |
version | string:version |
(REQUIRED) The semantic version number of the data product's application component. |
displayName | string |
The human readable name of the component. It SHOULD be used by frontend tools to display the application component's name in place of the name property. It's RECOMMENDED not to use the same displayName for different application components belonging to the same data product. |
description | string |
The application component description. CommonMark syntax MAY be used for rich text representation. |
platform | string |
The target technological platform on which the application will be deployed. It usually contains the infrastructure provider and the data center location. Optionally, it can also contain the specific runtime technology used. Examples: onprem:milan-1, aws:eu-south-1, aws:eu-south-1:redshift. |
applicationType | string |
The type of the application. Examples: stream-sourcing, batch-sourcing, streaming-transformation, batch-transformation, housekeeping, etc. |
consumesFrom | [string:fqn ] |
The list of ports or infrastructural components from which the application directly consumes data or services. |
providesTo | [string:fqn ] |
The list of ports or infrastructural components to which the application directly provides data or services. |
dependsOn | [string:fqn ] |
A list of other internal components on which this application directly depends. It is used during data product deployment to define a consistent deployment plan. Cyclic dependencies between components MUST be avoided. |
componentGroup | string:name |
The name of the group this component belongs to. Grouping different components together is useful to define submodules within a data product. A submodule can be used as a base for creating reusable templates. |
tags | [string ] |
A list of tags associated with the component. Tags can be used for logical grouping of the data product's components. |
externalDocs | External Resource Object | Additional external documentation. |
This object MAY be extended with Specification Extensions.
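Since dependsOn is meant to drive a consistent deployment plan and cycles are forbidden, one way a platform might turn the declared dependencies into an order is a topological sort. The sketch below uses Python's standard `graphlib` module (Python 3.9+). The input mapping from a component's fullyQualifiedName to the list it depends on is a simplified stand-in for the descriptor, and `deployment_order` is a hypothetical helper name.

```python
from graphlib import CycleError, TopologicalSorter


def deployment_order(depends_on: dict[str, list[str]]) -> list[str]:
    """Order components so that each one comes after its dependencies."""
    sorter = TopologicalSorter(depends_on)
    try:
        return list(sorter.static_order())
    except CycleError as err:
        # err.args[1] holds the nodes that form the detected cycle.
        raise ValueError(f"cyclic dependency: {err.args[1]}") from err


app = "urn:dpds:it.quantyca:dataproducts:tripExecution:1:applications:modelNormalizationJob"
infra = "urn:dpds:it.quantyca:dataproducts:tripExecution:1:infrastructure:stagingArea"

# The application depends on the staging area, so the staging area goes first.
print(deployment_order({app: [infra], infra: []}))
```

Rejecting the descriptor when a cycle is found, rather than picking an arbitrary order, matches the requirement that cyclic dependencies be avoided.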
Infrastructural Component⚓︎
The Infrastructural Component describes an internal infrastructural component used by the data product to run its applications and store its data.
Fixed Fields⚓︎
Field Name | Type | Description |
---|---|---|
id | string:uuid |
(READONLY) A UUID version 3 (see RFC 4122) generated server side during data product creation as the SHA-1 hash of the component's fullyQualifiedName. It MAY be used to address the component when calling the API exposed by the data product experience plane. Because the fullyQualifiedName is globally unique, the id is globally unique as well. However, when calling APIs other than the ones exposed by the data product experience plane, the component's fullyQualifiedName MUST always be used to address the component. Example: "id": "3235744b-8d2e-57b5-afba-f66862cc6a21" |
fullyQualifiedName | string:fqn |
(READONLY) The unique universal identifier of the component. It MUST be a URN of the form urn:dpds:{mesh-namespace}:dataproducts:{product-name}:{product-major-version}:infrastructure:{infra-name}. Example: "fullyQualifiedName": "urn:dpds:it.quantyca:dataproducts:tripExecution:1:infrastructure:stagingArea" |
entityType | string:alphanumeric |
(READONLY) The type of the entity. It is the constant value infrastructure. |
name | string:name |
(REQUIRED) The name of the infrastructural component. It MUST be unique among the infrastructural components of the same data product. It's RECOMMENDED to use a camel case formatted string. Example: "name": "stagingArea" |
version | string:version |
(REQUIRED) The semantic version number of the data product's infrastructural component. |
displayName | string |
The human readable name of the component. It SHOULD be used by frontend tools to display the infrastructural component's name in place of the name property. It's RECOMMENDED not to use the same displayName for different infrastructural components belonging to the same data product. |
description | string |
The infrastructural component description. CommonMark syntax MAY be used for rich text representation. |
platform | string |
The target technological platform on which the infrastructural component will be provisioned. It usually contains the infrastructure provider and the data center location. Optionally, it can also contain the specific resource object that will be provisioned. Examples: onprem:milan-1, aws:eu-south-1, aws:eu-south-1:s3-bucket. |
infrastructureType | string |
The type of the infrastructural component. Examples: computation-resource, storage-resource, networking-resource, etc. |
dependsOn | [string:fqn ] |
A list of other infrastructural components on which this component directly depends. It is used during infrastructure provisioning to define a consistent provisioning plan. Cyclic dependencies between infrastructural components MUST be avoided. |
componentGroup | string:name |
The name of the group this component belongs to. Grouping different components together is useful to define submodules within a data product. A submodule can be used as a base for creating reusable templates. |
tags | [string ] |
A list of tags associated with the component. Tags can be used for logical grouping of the data product's components. |
externalDocs | External Resource Object | Additional external documentation. |
This object MAY be extended with Specification Extensions.
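The two naming rules for components (names MUST be unique within their kind, and camel case is RECOMMENDED) lend themselves to a simple lint step. The sketch below is illustrative only: the regular expression is an assumption that reads "camel case" as lowerCamelCase, in line with example names such as stagingArea, and `check_names` and the sample name raw_bucket are hypothetical.

```python
import re
from collections import Counter

# Assumption: "camel case" means lowerCamelCase, as in the spec's examples.
CAMEL_CASE = re.compile(r"^[a-z][a-zA-Z0-9]*$")


def check_names(components: list[dict]) -> dict[str, list[str]]:
    """Report duplicate names (a MUST violation) and style warnings."""
    counts = Counter(c["name"] for c in components)
    return {
        "duplicates": sorted(n for n, k in counts.items() if k > 1),
        "not_camel_case": sorted(n for n in counts if not CAMEL_CASE.match(n)),
    }


components = [
    {"name": "stagingArea"},
    {"name": "stagingArea"},  # duplicate: violates the uniqueness rule
    {"name": "raw_bucket"},   # allowed, but not the recommended style
]

print(check_names(components))
# {'duplicates': ['stagingArea'], 'not_camel_case': ['raw_bucket']}
```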
Standard Definition Component⚓︎
The Standard Definition Component formally describes a component of interest (e.g. an API component, a template component, etc.) following a given standard specification.
Fixed Fields⚓︎
Field Name | Type | Description |
---|---|---|
id | string:uuid |
(READONLY) A UUID version 3 (see RFC 4122) generated server side during data product creation as the SHA-1 hash of the component's fullyQualifiedName. It MAY be used to reference the component when calling the API exposed by the data product experience plane. Because the fullyQualifiedName is globally unique, the id is globally unique as well. However, when calling APIs other than the ones exposed by the data product experience plane, the component's fullyQualifiedName MUST always be used to reference the component. Example: "id": "3235744b-8d2e-57b5-afba-f66862cc6a21" |
fullyQualifiedName | string:fqn |
(READONLY) The unique universal identifier of the component. It MUST be a URN of the form urn:dpds:{mesh-namespace}:{entity-type}s:{name}:{version}. Example: "fullyQualifiedName": "urn:dpds:it.quantyca:apis:customApi:1" |
entityType | string:alphanumeric |
(READONLY) The type of the entity (e.g. api, template, etc.). |
name | string:name |
(REQUIRED) The name of the component. It MUST be unique among the components of the same type. It's RECOMMENDED to use a camel case formatted string. Example: "name": "tmsTripCDC" |
version | string:version |
(REQUIRED) The semantic version number of the component. |
displayName | string |
The human readable name of the component. It SHOULD be used by frontend tools to display the component's name in place of the name property. It's RECOMMENDED not to use the same displayName for different components of the same type. |
description | string |
The object description. CommonMark syntax MAY be used for rich text representation. |
specification | string |
(REQUIRED) The external specification used in the definition. |
specificationVersion | string |
The version of the external specification used in the definition. If not defined, the version MUST be included in the definition itself. |
definition | object | string | Reference Object |
(REQUIRED) The formal definition built using the specification declared in the specification field. |
componentGroup | string:name |
The name of the group this component belongs to. Grouping different components together is useful to define submodules within a data product. A submodule can be used as a base for creating reusable templates. |
tags | [string ] |
A list of tags associated with the component. Tags can be used for logical grouping of the data product's components. |
externalDocs | External Resource Object | Additional external documentation for the component. |
This object MAY be extended with Specification Extensions.
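A validator for the fields marked (REQUIRED) in the table above might look like the sketch below. It treats a missing specificationVersion as a note rather than an error, because the rule that the version must then live inside the definition cannot be checked generically: every external specification records its version differently. The function name and the shape of the returned report are hypothetical.

```python
REQUIRED_FIELDS = ("name", "version", "specification", "definition")


def check_standard_definition(obj: dict) -> dict:
    """Return hard errors and softer notes for a standard definition."""
    errors = [f"missing required field: {field}"
              for field in REQUIRED_FIELDS if field not in obj]
    notes = []
    if "specificationVersion" not in obj:
        # Not verifiable here: the definition itself must carry the version.
        notes.append("specificationVersion is absent: the definition "
                     "itself must state the specification version")
    return {"errors": errors, "notes": notes}


# Template taken from the lifecycle example in this specification.
template = {
    "name": "testPipeline",
    "version": "1.0.0",
    "specification": "azure-devops",
    "specificationVersion": "1.0.0",
    "definition": {"organization": "andreagioia", "project": "opendatamesh",
                   "pipelineId": "3", "branch": "main"},
}

print(check_standard_definition(template))  # {'errors': [], 'notes': []}
```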
Standard Definition Component Example:⚓︎
{
  "description": "The API exposed by the Observability Port that exposes data product logs",
  "specification": "openapi",
  "specificationVersion": "3.1.0",
  "definition": {
    "mediaType": "text/json",
    "$href": "https://mycompany.com/api/v1/planes/utility/logging-services/openapi.json"
  },
  "externalDocs": {
    "mediaType": "text/html",
    "$href": "https://spec.openapis.org/oas/v3.1.0"
  }
}
Components Object⚓︎
The Components Object holds a set of reusable objects for different aspects of the DPDS.
All objects defined within the components object will have no effect on the Data Product Descriptor unless they are explicitly referenced from properties outside the components object.
Fixed Fields⚓︎
Field Name | Type | Description |
---|---|---|
inputPorts | Map[string , Input Port Component | Reference Object] |
An object to hold reusable Input Port Components. |
outputPorts | Map[string , Output Port Component | Reference Object] |
An object to hold reusable Output Port Components. |
discoveryPorts | Map[string , Discovery Port Component | Reference Object] |
An object to hold reusable Discovery Port Components. |
observabilityPorts | Map[string , Observability Port Component | Reference Object] |
An object to hold reusable Observability Port Components. |
controlPorts | Map[string , Control Port Component | Reference Object] |
An object to hold reusable Control Port Components. |
applicationComponents | Map[string , Application Component | Reference Object] |
An object to hold reusable Application Components. |
infrastructuralComponents | Map[string , Infrastructural Component | Reference Object] |
An object to hold reusable Infrastructural Components. |
apis | Map[string , Standard Definition Object | Reference Object] |
An object to hold reusable Standard Definition Objects that describe APIs. |
templates | Map[string , Standard Definition Object | Reference Object] |
An object to hold reusable Standard Definition Objects that describe templates. |
This object MAY be extended with Specification Extensions.
All the fixed fields declared above are objects that MUST use keys that match the following regular expression: ^[a-zA-Z0-9\.\-_]+$
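The key constraint on the Components Object can be checked directly with the regular expression given above. The sketch below walks only the fixed fields listed in the table, so extension properties are left alone. The function name and the sample keys are hypothetical.

```python
import re

# Pattern copied from the specification text.
COMPONENT_KEY = re.compile(r"^[a-zA-Z0-9\.\-_]+$")

FIXED_FIELDS = (
    "inputPorts", "outputPorts", "discoveryPorts", "observabilityPorts",
    "controlPorts", "applicationComponents", "infrastructuralComponents",
    "apis", "templates",
)


def invalid_component_keys(components: dict) -> list[str]:
    """Return 'section.key' for every key that breaks the pattern."""
    bad = []
    for section in FIXED_FIELDS:
        for key in components.get(section, {}):
            # fullmatch also rejects a key that ends with a newline.
            if not COMPONENT_KEY.fullmatch(key):
                bad.append(f"{section}.{key}")
    return bad


components = {
    "inputPorts": {"tmsTripCDC": {}, "bad key!": {}},
    "apis": {"logging.v1": {}},
}

print(invalid_component_keys(components))  # ['inputPorts.bad key!']
```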
Reference Object⚓︎
The Reference Object allows referencing other components in the Data Product Descriptor Document, internally and externally.
The $ref string value contains a URI (RFC 3986), which identifies the location of the value being referenced.
See the rules for resolving Relative References.
Fixed Fields⚓︎
Field Name | Type | Description |
---|---|---|
description | string |
A description which by default SHOULD override that of the referenced component. CommonMark syntax MAY be used for rich text representation. If the referenced object-type does not allow a description field, then this field has no effect. |
mediaType | string |
The media type of the referenced resource. It must conform to media type format, according to RFC6838. |
$ref | string:uri-reference |
REQUIRED. The reference identifier. This MUST be in the form of a URI. |
This object cannot be extended with additional properties and any properties added SHALL be ignored.
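To make the description override concrete, the sketch below resolves a same-document reference and applies the Reference Object's description on top of the target. It rests on assumptions that this section does not state: that internal references are fragment-only URIs using JSON Pointer syntax (RFC 6901), as in OpenAPI, and that the target is reached through nested objects only. It also assumes the referenced object type allows a description field. The name loggingApi is hypothetical.

```python
import copy


def resolve_internal_ref(document: dict, ref_object: dict) -> dict:
    """Resolve a '#/...' reference and apply the description override."""
    ref = ref_object["$ref"]
    if not ref.startswith("#/"):
        raise ValueError("only same-document references are handled here")
    target = document
    for token in ref[2:].split("/"):
        # JSON Pointer escapes: "~1" stands for "/", "~0" stands for "~".
        token = token.replace("~1", "/").replace("~0", "~")
        target = target[token]
    resolved = copy.deepcopy(target)  # never mutate the shared component
    if "description" in ref_object:
        resolved["description"] = ref_object["description"]
    return resolved


document = {"components": {"apis": {"loggingApi": {
    "name": "loggingApi", "description": "original"}}}}
reference = {"$ref": "#/components/apis/loggingApi",
             "description": "overridden"}

print(resolve_internal_ref(document, reference)["description"])  # overridden
```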
Reference Object Example⚓︎
Relative Schema Document Example⚓︎
Relative Documents With Embedded Schema Example⚓︎
External Resource Object⚓︎
The External Resource Object allows referencing an external resource like a documentation page.
Fixed Fields⚓︎
Field Name | Type | Description |
---|---|---|
description | string |
A description of the target resource. CommonMark syntax MAY be used for rich text representation. |
mediaType | string |
The media type of target resource. It must conform to media type format, according to RFC6838. |
$href | string:uri |
REQUIRED. The URI of the target resource. It must conform to the URI format, according to RFC3986. |
This object cannot be extended with additional properties and any properties added SHALL be ignored.
External Resource Object Example⚓︎
{
  "description": "Find more info here",
  "mediaType": "text/html",
  "$href": "https://example.com"
}
Specification Extension Point⚓︎
A Specification Extension Point marks specific parts of the Data Product Descriptor Specification that are left open to extensions or further evolution of the standard. A Standard Definition is a formal declaration that the description of a part of the Data Product Descriptor Specification is delegated to an external standard, in this version of the specification and in future ones. The same assumption does not hold for Specification Extension Points. Even though a Specification Extension Point can be extended at will, it is RECOMMENDED to prefix the field name of every added property with "x-" to avoid potential conflicts with future versions of the Data Product Descriptor Specification.
Fixed Fields⚓︎
Field Name | Type | Description |
---|---|---|
description | string | The extension point description. CommonMark syntax MAY be used for rich text representation. |
externalDocs | External Resource Object | Additional external documentation for the extension point. |
This object MAY be extended with Specification Extensions.
Specification Extensions⚓︎
While the Data Product Descriptor Specification tries to accommodate most use cases, additional data can be added to extend the specification at certain points.
The extension properties are implemented as patterned fields that are always prefixed by "x-".
The extensions may or may not be supported by the available tooling, but those may be extended as well to add requested support (if tools are internal or open-sourced).
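A tool that does not support a given extension can still process the rest of an object by separating the "x-" prefixed fields from the others. The sketch below shows one way to do that. The function name and the x-owner property are hypothetical.

```python
def split_extensions(obj: dict) -> tuple[dict, dict]:
    """Separate specification fields from 'x-' prefixed extension fields."""
    fixed = {k: v for k, v in obj.items() if not k.startswith("x-")}
    extensions = {k: v for k, v in obj.items() if k.startswith("x-")}
    return fixed, extensions


port = {"name": "tmsTripCDC", "version": "1.0.0", "x-owner": "logistics-team"}
fixed, extensions = split_extensions(port)

print(fixed)       # {'name': 'tmsTripCDC', 'version': '1.0.0'}
print(extensions)  # {'x-owner': 'logistics-team'}
```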
Appendix A: Revision History⚓︎
Version | Date | Notes |
---|---|---|
1.0.0 | 2023-Q1 | Release of the Data Product Descriptor Specification 1.0.0 |