Data Product Descriptor Specification⚓︎

Version 1.0.0 (DRAFT)⚓︎

The keywords "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in BCP 14 (RFC 2119 and RFC 8174) when, and only when, they appear in all capitals, as shown here.

This document is licensed under The Apache License, Version 2.0.

Disclaimer⚓︎

Part of this content has been taken from the great work done by the folks at the OpenAPI Initiative and the AsyncAPI Initiative. We decided not to reinvent the wheel and to base our work on these two specifications, mainly for the following reasons:

  • We think that the work done by the OpenAPI Initiative and the AsyncAPI Initiative is great :)
  • We want to make the learning curve for the Data Product Descriptor Specification as smooth as possible by aligning its definitions with those of two specifications that are already popular among software and data engineers
  • We think that OpenAPI and AsyncAPI are natural specifications for defining the interface of data product ports that expose an API endpoint. This specification does not impose any specific standard for a port's interface definition, but these two are highly recommended.

Introduction⚓︎

The Data Product Descriptor Specification (DPDS) defines a declarative and technology-independent standard to describe a data product in all its components. It allows human agents (e.g. analysts, data scientists, etc.) and digital agents (e.g. other data products, BI tools, planes of the underlying data mesh ops platform, etc.) to operate, discover and access a data product. When a data product is properly described, an external agent can understand and interact with it with a minimal amount of cognitive load and implementation logic.

The formalization of a standard data product descriptor document through an open specification is useful to enable the implementation of an ecosystem of interoperable data mesh tools. The following is a non-exhaustive list of tools that can benefit from the specification:

  • catalogs (search, document and collaborate)
  • design tools (create new products by the composition of reusable templates)
  • lifecycle management tools (deploy and operate)
  • access management tools (assign/track access grants and generate client code in different languages)
  • policy checking tools (enforce standard compliance and audit security)
  • observability tools (monitor and detect)
  • data lineage tools (trace data flows and perform forward/backward analysis)
  • mesh topology analysis tools (calculate value/trust scores and detect structural problems)
  • semantic tools (apply ontologies over mesh topology)
  • domain specific language tools (create a collection of interconnected data products that implement together a specific value stream)

Definitions⚓︎

Standard⚓︎

The set of shared rules used by different agents to describe an entity or process of common interest. The agents that follow the standard limit their autonomy by conforming to the set of shared rules to facilitate cooperation between them through interoperability.

Standard Specification⚓︎

The formal description of the rules that form a standard. A standard can have multiple specification versions associated with it. Sometimes the words standard and specification are used as synonyms.

Standard Definition⚓︎

The description of one specific entity or process, created using and conforming to the set of rules formally described in the standard specification.

Data Product⚓︎

The smallest unit that can be independently deployed and managed in a data architecture (i.e. architectural quantum). It is composed of all the structural components that it requires to perform its function: metadata, data, code, policies that govern the data and its dependencies on infrastructure. Each data product has a clear identifier, a version number and an owner.

Data Product Ports⚓︎

The interfaces exposed to external agents by a data product. Each port exposes a service or set of correlated services. These are the five types of ports supported by a data product:

  • Input port(s): an input port describes a set of services exposed by a data product to collect its source data and make it available for further internal transformation. An input port can receive data from one or more upstream sources in push mode (i.e. asynchronous subscription) or pull mode (i.e. synchronous query). Each data product may have one or more input ports.
  • Output port(s): an output port describes a set of services exposed by a data product to share the generated data in a way that can be understood and trusted. Each data product may have one or more output ports.
  • Discovery port(s): a discovery port describes a set of services exposed by a data product to provide information about its static role in the overall architecture like purpose, structure, location, etc. Each data product may have one or more discovery ports.
  • Observability port(s): an observability port describes a set of services exposed by a data product to provide information about its dynamic behavior in the overall architecture like logs, traces, audit trails, metrics, etc. Each data product may have one or more observability ports.
  • Control port(s): a control port describes a set of services exposed by a data product to configure local policies or perform highly privileged governance operations. Each data product may have one or more control ports.

The data product descriptor specification uses the following concepts of promise theory to formally describe the set of services exposed by each port regardless of the specific type:

  • Promises: Through promises, the data product declares the intent of the port. Promises are not a guarantee of the outcome, but the data product will behave according to them to realize its intent. The more consistently a data product keeps its promises over time, the more trustworthy it is, and the more trustworthy it is, the more likely potential consumers are to use it. Trust is based on verifying how well the data product has kept its promises in the past. This verification should be automated by the underlying platform and synthesized in a trust score shared with all potential consumers. Examples of promises are descriptions of services, API, SLO, deprecation policy, etc.
  • Expectations: Through expectations, the data product declares how it wants the port to be used by its consumers. Expectations are the inverse of promises. They are a way to explicitly state what promises the data product would like consumers to make regarding how they will use the port. Examples of expectations are intended usage, intended audience, etc.
  • Contracts: Through contracts, the data product declares promises and expectations that must be respected by the data product and its consumers. A contract is an explicit agreement between the data product and its consumers. It is used to group all the promises and expectations that, if not respected, can generate penalties like monetary sanctions or interruption of service. Examples of contracts are terms and conditions, SLA, billing policy, etc.

The governance can use these concepts to standardize the definition of these interfaces across all data products, while the platform can use them to provide the mechanisms to implement the described services in a compliant way.

Data Product Application Components⚓︎

The components of a data product that implement the services exposed through its ports (e.g. pipelines, microservices, etc.).

Data Product Infrastructural Components⚓︎

The components of a data product related to the infrastructural resources (e.g. storage, computing, etc.) used to run its application components.

Data Product Descriptor Document⚓︎

The document (or set of documents) that contains the standard definition of a data product created using and conforming to the Data Product Descriptor Specification.

Data Product Descriptor Specification⚓︎

The formal description of the rules to follow to create a standard-compliant Data Product Descriptor Document.

Specification⚓︎

Versions⚓︎

The Data Product Descriptor Specification is versioned using Semantic Versioning 2.0.0 (semver) and follows the semver specification.

The major.minor portion of the semver (for example 1.0) SHALL designate the DPDS feature set. Typically, .patch versions address errors in this document, not the feature set. Tooling which supports DPDS 1.0 SHOULD be compatible with all DPDS 1.0.* versions. The patch version SHOULD NOT be considered by tooling, making no distinction between 1.0.0 and 1.0.1 for example.

Each new minor version of the Data Product Descriptor Specification SHALL allow any Data Product Descriptor Document that is valid against any previous minor version of the Specification, within the same major version, to be updated to the new Specification version with equivalent semantics. Such an update MUST only require changing the dataProductDescriptor property to the new minor version.

For example, a valid Data Product Descriptor 1.0.2 document, upon changing its dataProductDescriptor property to 1.1.0, SHALL be a valid Data Product Descriptor 1.1.0 document, semantically equivalent to the original Data Product Descriptor 1.0.2 document. New minor versions of the Data Product Descriptor Specification MUST be written to ensure this form of backward compatibility.
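The versioning rules above can be sketched in a few lines of Python. This is only an illustration of how tooling might apply them, not part of the specification; the function names `feature_set`, `tooling_supports` and `can_retarget` are hypothetical.

```python
def feature_set(version: str) -> tuple[int, int]:
    """Return the (major, minor) pair that designates the DPDS feature set."""
    major, minor, _patch = version.split(".", 2)
    return int(major), int(minor)


def tooling_supports(tool_version: str, document_version: str) -> bool:
    """True when both versions share major.minor; the patch part is ignored."""
    return feature_set(tool_version) == feature_set(document_version)


def can_retarget(document_version: str, target_version: str) -> bool:
    """True when a document may move to target_version by changing only its
    dataProductDescriptor property (same major, same or newer minor)."""
    doc_major, doc_minor = feature_set(document_version)
    new_major, new_minor = feature_set(target_version)
    return doc_major == new_major and new_minor >= doc_minor
```

For instance, `tooling_supports("1.0.0", "1.0.1")` holds because only the patch part differs, while `can_retarget("1.0.2", "1.1.0")` mirrors the 1.0.2 to 1.1.0 example given above.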

Format⚓︎

A Data Product Descriptor Document that conforms to the Data Product Descriptor Specification is itself a JSON object, which may be represented either in JSON or YAML format.

For example, if a field has an array value, the JSON array representation will be used:

JSON
{
   "field": [ 1, 2, 3 ]
}

All field names in the specification are case-sensitive. This includes all fields that are used as keys in a map, except where explicitly noted that keys are case-insensitive.

The schema exposes two types of fields: Fixed fields, which have a declared name, and Patterned fields, which declare a regex pattern for the field name.

Patterned fields MUST have unique names within the containing object. To preserve the ability to round-trip between YAML and JSON formats, YAML version 1.2 is RECOMMENDED along with some additional constraints:

Document Structure⚓︎

A Data Product Descriptor Document MAY be made up of a single document or be divided into multiple, connected parts at the discretion of the user. In the latter case, a Reference Object is used.

It is RECOMMENDED that the root Data Product Descriptor Document be named: data-product-descriptor.json or data-product-descriptor.yaml.

Object Types⚓︎

A Data Product Descriptor Document has one and only one root object. The properties of an object are described by its fields. A field type can be another object or a primitive type. An addressable and versioned object is called an entity. The root object of the Data Product Descriptor Document is an entity object. Other entities that exist only in the scope of the root entity are called components.

Data Types⚓︎

Primitive data types in the DPDS are based on the types supported by the JSON Schema Specification.

Primitives have an optional modifier property: format. DPDS uses several known formats to define in fine detail the data type being used. However, to support documentation needs, the format property is an open string-valued property and can have any value. Formats such as "email", "uuid", and so on, MAY be used even though undefined by this specification. Types that are not accompanied by a format property follow the type definition in the JSON Schema. Tools that do not recognize a specific format MAY default back to the type alone as if the format is not specified.

The formats defined by the DPDS are:

| type | format | Comments |
| --- | --- | --- |
| integer | int32 | signed 32 bits |
| integer | int64 | signed 64 bits (a.k.a. long) |
| number | float | |
| number | double | |
| string | | |
| string | alphanumeric | a string that matches the regex ^[a-zA-Z0-9]+$ |
| string | name | a string that matches the regex ^[a-zA-Z][a-zA-Z0-9]+$ |
| string | fqn | a string that matches the regex ^[a-zA-Z][a-zA-Z0-9.:]+$ |
| string | version | a string that matches the regex given below the table |
| string | byte | base64 encoded characters |
| string | binary | any sequence of octets |
| string | uuid | a sequence of 16 octets as defined by RFC4122 |
| boolean | | |
| string | date | As defined by full-date - RFC3339 |
| string | date-time | As defined by date-time - RFC3339 |
| string | password | A hint to UIs to obscure input. |

The regex for the version format is:

^(0|[1-9]\d*)\.(0|[1-9]\d*)\.(0|[1-9]\d*)(?:-((?:0|[1-9]\d*|\d*[a-zA-Z-][0-9a-zA-Z-]*)(?:\.(?:0|[1-9]\d*|\d*[a-zA-Z-][0-9a-zA-Z-]*))*))?(?:\+([0-9a-zA-Z-]+(?:\.[0-9a-zA-Z-]+)*))?$
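As a rough illustration of how a tool could check the regex-based string formats listed above, the following Python sketch compiles the four patterns exactly as written in the table. The `FORMAT_PATTERNS` mapping and the `matches_format` helper are hypothetical names, and falling back to a plain string check for unknown formats is one possible reading of the rule that tools MAY fall back to the type alone.

```python
import re

# Patterns copied verbatim from the formats table of the specification.
FORMAT_PATTERNS = {
    "alphanumeric": re.compile(r"^[a-zA-Z0-9]+$"),
    "name": re.compile(r"^[a-zA-Z][a-zA-Z0-9]+$"),
    "fqn": re.compile(r"^[a-zA-Z][a-zA-Z0-9.:]+$"),
    "version": re.compile(
        r"^(0|[1-9]\d*)\.(0|[1-9]\d*)\.(0|[1-9]\d*)"
        r"(?:-((?:0|[1-9]\d*|\d*[a-zA-Z-][0-9a-zA-Z-]*)"
        r"(?:\.(?:0|[1-9]\d*|\d*[a-zA-Z-][0-9a-zA-Z-]*))*))?"
        r"(?:\+([0-9a-zA-Z-]+(?:\.[0-9a-zA-Z-]+)*))?$"
    ),
}


def matches_format(value: object, fmt: str) -> bool:
    """Check a value against a DPDS string format.

    Formats without a pattern here are treated as plain strings
    (an assumption of this sketch, not a rule of the specification).
    """
    if not isinstance(value, str):
        return False
    pattern = FORMAT_PATTERNS.get(fmt)
    if pattern is None:
        return True
    # fullmatch avoids accepting a trailing newline, which "$" alone allows.
    return pattern.fullmatch(value) is not None
```

Note that the name pattern requires at least two characters, so a single-letter name is rejected.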

Rich Text Formatting⚓︎

Throughout the specification, description fields are noted as supporting CommonMark markdown formatting. Where Data Product Descriptor tooling renders rich text, it MUST support, at a minimum, markdown syntax as described by CommonMark 0.27. Tooling MAY choose to ignore some CommonMark features to address security concerns.

Relative References in URLs⚓︎

Unless specified otherwise, all properties that are URLs SHOULD be absolute references. If a property explicitly specifies in its description that it allows a relative reference, its value MUST comply with RFC3986. Relative references MUST be resolved using the URLs defined in the property description as a Base URI.

Relative references used in $ref are processed as per JSON Reference, using the URL of the current document as the base URI. See also the Reference Object.
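Resolution of a relative reference against a base URI can be sketched with Python's standard library, whose `urljoin` implements RFC 3986 style resolution. The URLs below are made up for illustration, and `resolve_reference` is a hypothetical helper name.

```python
from urllib.parse import urljoin


def resolve_reference(base_uri: str, reference: str) -> str:
    """Resolve a possibly relative reference against the base URI.

    Absolute references are returned unchanged by urljoin.
    """
    return urljoin(base_uri, reference)


# Hypothetical location of the root descriptor document.
BASE = "https://example.com/products/trip-execution/data-product-descriptor.json"

sibling = resolve_reference(BASE, "ports/input.json#/tmsTripCDC")
parent = resolve_reference(BASE, "../shared/owner.json")
absolute = resolve_reference(BASE, "https://other.example.org/owner.json")
```

Here `sibling` resolves inside the `trip-execution` directory, `parent` climbs one level up, and `absolute` is left untouched.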

Schema⚓︎

In the following description, if a field is not explicitly REQUIRED or described with a MUST or SHALL, it can be considered OPTIONAL.

Data Product Descriptor Entity⚓︎

This is the root object of the Data Product Descriptor Document.

Fixed Fields⚓︎

| Field Name | Type | Description |
| --- | --- | --- |
| dataProductDescriptor | string:version | (REQUIRED) The semantic version number of the Data Product Descriptor Specification version that the Data Product Descriptor Document uses. The dataProductDescriptor field SHOULD be used by tooling specifications and clients to interpret the Data Product Descriptor Document. This is not related to the data product's info.version field. |
| info | Info Object | (REQUIRED) The general information about the data product. The information can be used by the platform or by consumers if needed. |
| interfaceComponents | Interface Components Object | (REQUIRED) The list of all interface entities exposed by the data product. |
| internalComponents | Internal Components Object | The list of all internal entities that compose the data product. |
| components | Components Object | An element to hold a set of reusable objects that can be referenced in other parts of the document. |
| tags | [string] | A list of tags associated with the data product. Tags can be used for logical grouping of data products. |
| externalDocs | External Resource Object | Additional external documentation. |

This object MAY be extended with Specification Extensions.
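To make the shape of the root object more concrete, here is a small Python sketch that models a descriptor as a dictionary and reports which REQUIRED root fields are missing. The helper `missing_root_fields` is hypothetical, and the values in `example_descriptor` that do not appear elsewhere in this document (the product version, the `operations` domain, the `tripStatus` port and the tag) are illustrative assumptions.

```python
# Root fields marked (REQUIRED) in the table above.
REQUIRED_ROOT_FIELDS = ("dataProductDescriptor", "info", "interfaceComponents")


def missing_root_fields(descriptor: dict) -> list[str]:
    """Return the REQUIRED root fields that are absent from the descriptor."""
    return [name for name in REQUIRED_ROOT_FIELDS if name not in descriptor]


example_descriptor = {
    "dataProductDescriptor": "1.0.0",
    "info": {
        "fullyQualifiedName": "urn:dpds:it.quantyca:dataproducts:tripExecution:1",
        "name": "tripExecution",
        "version": "1.0.0",          # illustrative value
        "domain": "operations",      # illustrative value
        "owner": {"id": "john.doe@company-xyz.com"},
    },
    "interfaceComponents": {
        "outputPorts": [
            {
                # "tripStatus" is a made-up port name for this sketch.
                "fullyQualifiedName": "urn:dpds:it.quantyca:dataproducts:tripExecution:1:outputports:tripStatus",
                "name": "tripStatus",
                "version": "1.0.0",
            }
        ]
    },
    "tags": ["trip"],                # illustrative value
}
```

A complete validator would also descend into `info` and `interfaceComponents`; this sketch only checks the top level.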

Info Object⚓︎

The Info Object contains general information about the data product. The information can be used by the platform or by consumers if needed.

Fixed Fields⚓︎

| Field Name | Type | Description |
| --- | --- | --- |
| id | string:uuid | (READONLY) A UUID version 5 (RFC 4122), generated server side during data product creation from the SHA-1 hash of the fullyQualifiedName. It MAY be used to reference the data product when calling the API exposed by the data product experience plane. Because the fullyQualifiedName is globally unique, the id is also globally unique. However, when calling an API other than those exposed by the data product experience plane, the fullyQualifiedName MUST always be used to reference the data product. Example: "id": "2b172838-73b1-5d6c-be45-cc75aee180a0" |
| fullyQualifiedName | string:fqn | (REQUIRED) The universally unique identifier of the data product. It MUST be a URN of the form urn:dpds:{mesh-namespace}:dataproducts:{product-name}:{product-major-version}. It is RECOMMENDED to use your company's domain name in reverse dot notation (e.g. it.quantyca) as the mesh-namespace, to ensure that the fullyQualifiedName is universally unique as REQUIRED. The product's domain (e.g. planning, operations, ...) MAY be appended to the mesh-namespace as a postfix; doing so is, however, NOT RECOMMENDED. Example: "fullyQualifiedName": "urn:dpds:it.quantyca:dataproducts:tripExecution:1" |
| entityType | string:alphanumeric | (READONLY) The type of the entity. It MUST be the constant value dataproduct. |
| name | string:name | (REQUIRED) The name of the data product. It MUST be unique within the mesh-namespace. It is RECOMMENDED to use a camel case formatted string. |
| version | string:version | (REQUIRED) The semantic version number of the data product definition contained in the given Data Product Descriptor Document. Every time the major version of one of the data product's ports changes, the major version of the product MUST also be incremented. It is RECOMMENDED to use 0 as the major version for data products that are not yet generally available. These data products can introduce breaking changes without incrementing their major version. It is nevertheless RECOMMENDED that a data product that is not yet generally available (i.e. major version equal to 0) increments at least its minor version for every breaking change it introduces. This field is not related to the dataProductDescriptor field. |
| displayName | string | The human-readable name of the data product. It SHOULD be used by frontend tools to display the data product's name in place of the name property. It is RECOMMENDED not to use the same displayName for different data products belonging to the same mesh-namespace. |
| description | string | The data product description. CommonMark syntax MAY be used for rich text representation. |
| domain | string:name | (REQUIRED) The domain to which the data product belongs. |
| owner | Owner Object | (REQUIRED) A collection of information related to the data product's owner. |
| contactPoints | [Contact Point Object] | A collection of contact information for the given data product. |

This object MAY be extended with Specification Extensions.
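The relationship between fullyQualifiedName and id can be illustrated with Python's `uuid` module. In RFC 4122, the name-based UUID variant that hashes with SHA-1 is version 5, which agrees with the version digit in the example id above. The specification text shown here does not say which namespace UUID the platform uses, so `ASSUMED_NAMESPACE` below is purely an assumption of this sketch and the result is not expected to reproduce the example id.

```python
import uuid

# ASSUMPTION: the namespace is not stated in this document; NAMESPACE_URL is
# used only so that the sketch runs. A real platform may use a different one.
ASSUMED_NAMESPACE = uuid.NAMESPACE_URL


def derive_id(fully_qualified_name: str) -> str:
    """Derive a deterministic, name-based (SHA-1, version 5) UUID from an FQN."""
    return str(uuid.uuid5(ASSUMED_NAMESPACE, fully_qualified_name))


product_id = derive_id("urn:dpds:it.quantyca:dataproducts:tripExecution:1")
```

Because the derivation is deterministic, calling `derive_id` twice with the same fullyQualifiedName yields the same id, which is what makes the id usable as a stable reference.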

Owner Object⚓︎

The Owner Object describes the data product's owner.

Fixed Fields⚓︎

| Field Name | Type | Description |
| --- | --- | --- |
| id | string | (REQUIRED) The identifier of the data product's owner. It is RECOMMENDED to use the owner's corporate email address as the identifier. |
| name | string | The full name of the data product's owner. |

This object MAY be extended with Specification Extensions.

Owner Object Example:⚓︎
JSON
{
  "id": "john.doe@company-xyz.com",
  "name": "John Doe"
}

Contact Point Object⚓︎

The Contact Point Object describes a data product's contact point.

Fixed Fields⚓︎

| Field Name | Type | Description |
| --- | --- | --- |
| name | string:name | The name of the contact point. |
| description | string | The contact point description. CommonMark syntax MAY be used for rich text representation. |
| channel | string | The channel used to address the contact point. It can be, for example, web, email or phone. |
| address | string | The address of the contact point. Depending on the channel it can be, for example, a URL, an email address or a phone number. |

This object MAY be extended with Specification Extensions.

Contact Point Object Example:⚓︎
JSON
{
  "name": "Support Team Mail",
  "description": "The email address of the team that supports this product",
  "channel": "email",
  "address": "trip-execution-support@company-xyz.com"
}

JSON
{
  "name": "Issue Tracker",
  "description": "The address of the issue tracker associated to this product",
  "channel": "web",
  "address": "https://readmine.company-xyz.com/trip-execution"
}

Interface Components Object⚓︎

The Interface Components Object is a collection of all interface entities exposed by a data product.

Fixed Fields⚓︎

| Field Name | Type | Description |
| --- | --- | --- |
| inputPorts | [Input Port Component \| Reference Object] | The input ports exposed by the data product. |
| outputPorts | [Output Port Component \| Reference Object] | (REQUIRED) The output ports exposed by the data product. |
| discoveryPorts | [Discovery Port Component \| Reference Object] | The discovery ports exposed by the data product. |
| observabilityPorts | [Observability Port Component \| Reference Object] | The observability ports exposed by the data product. |
| controlPorts | [Control Port Component \| Reference Object] | The control ports exposed by the data product. |

This object cannot be extended with additional properties and any properties added SHALL be ignored.
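One way a tool might honor the two rules stated for this object (outputPorts is REQUIRED, and any additional property is ignored) is sketched below. The function name `normalize_interface_components` and the choice to raise `ValueError` are assumptions of this example, not behavior mandated by the specification.

```python
# The five port lists defined for the Interface Components Object.
KNOWN_PORT_LISTS = (
    "inputPorts",
    "outputPorts",
    "discoveryPorts",
    "observabilityPorts",
    "controlPorts",
)


def normalize_interface_components(components: dict) -> dict:
    """Keep only the known port lists and insist on the REQUIRED outputPorts."""
    if "outputPorts" not in components:
        raise ValueError("interfaceComponents.outputPorts is REQUIRED")
    # Unknown properties are silently dropped, mirroring "SHALL be ignored".
    return {key: components[key] for key in KNOWN_PORT_LISTS if key in components}
```

For example, passing `{"outputPorts": [], "x-custom": 1}` returns `{"outputPorts": []}`, with the extra property dropped.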

Input Port Component⚓︎

The Input Port Component describes an input port of a data product.

Fixed Fields⚓︎

| Field Name | Type | Description |
| --- | --- | --- |
| id | string:uuid | (READONLY) A UUID version 5 (RFC 4122), generated server side during data product creation from the SHA-1 hash of the port's fullyQualifiedName. It MAY be used to reference the port when calling the API exposed by the data product experience plane. Because the fullyQualifiedName is globally unique, the id is also globally unique. However, when calling an API other than those exposed by the data product experience plane, the port's fullyQualifiedName MUST always be used to reference the port. Example: "id": "3235744b-8d2e-57b5-afba-f66862cc6a21" |
| fullyQualifiedName | string:fqn | (REQUIRED) The universally unique identifier of the port. It MUST be a URN of the form urn:dpds:{mesh-namespace}:dataproducts:{product-name}:{product-major-version}:inputports:{port-name}. Example: "fullyQualifiedName": "urn:dpds:it.quantyca:dataproducts:tripExecution:1:inputports:tmsTripCDC" |
| entityType | string:alphanumeric | The type of the entity. It MUST be the constant value inputport. |
| name | string:name | (REQUIRED) The name of the port. It MUST be unique among the input ports of the same data product. It is RECOMMENDED to use a camel case formatted string. Example: "name": "tmsTripCDC" |
| version | string:version | (REQUIRED) The semantic version number of the data product's port. Every time the major version of the port changes, the major version of the product MUST also be incremented. |
| displayName | string | The human-readable name of the port. It SHOULD be used by frontend tools to display the port's name in place of the name property. It is RECOMMENDED not to use the same displayName for different input ports belonging to the same data product. |
| description | string | The port description. CommonMark syntax MAY be used for rich text representation. |
| componentGroup | string:name | The name of the group this component belongs to. Grouping different components together is useful to define submodules within a data product. A submodule can be used as the basis for creating reusable templates. |
| promises | Promises Object \| Reference Object | The data product's promises declared over the port. |
| expectations | Expectation Object \| Reference Object | The data product's expectations declared over the port. |
| contracts | Contracts Object \| Reference Object | The data product's contracts declared over the port. |
| tags | [string] | A list of tags associated with the component. Tags can be used for logical grouping of a data product's components. |
| externalDocs | External Resource Object | Additional external documentation. |

This object MAY be extended with Specification Extensions.
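The URN layout of an input port's fullyQualifiedName can be illustrated with a pair of small Python helpers. The function names are hypothetical, and the parser assumes that no segment (in particular the mesh-namespace) contains a colon, which the specification text shown here does not state explicitly.

```python
def product_fqn(mesh_namespace: str, product_name: str, major_version: int) -> str:
    """Build urn:dpds:{mesh-namespace}:dataproducts:{product-name}:{major}."""
    return f"urn:dpds:{mesh_namespace}:dataproducts:{product_name}:{major_version}"


def input_port_fqn(product: str, port_name: str) -> str:
    """Append the inputports segment and the port name to a product FQN."""
    return f"{product}:inputports:{port_name}"


def parse_port_fqn(fqn: str) -> dict:
    """Split a port FQN into its parts (assumes colon-free segments)."""
    parts = fqn.split(":")
    if len(parts) != 8 or parts[:2] != ["urn", "dpds"] or parts[3] != "dataproducts":
        raise ValueError(f"not a DPDS port URN: {fqn!r}")
    return {
        "meshNamespace": parts[2],
        "productName": parts[4],
        "productMajorVersion": int(parts[5]),
        "portSegment": parts[6],
        "portName": parts[7],
    }


port = input_port_fqn(product_fqn("it.quantyca", "tripExecution", 1), "tmsTripCDC")
```

With the values from the table's example, `port` equals `urn:dpds:it.quantyca:dataproducts:tripExecution:1:inputports:tmsTripCDC`.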

Output Port Component⚓︎

The Output Port Component describes an output port of a data product.

Fixed Fields⚓︎

| Field Name | Type | Description |
| --- | --- | --- |
| id | string:uuid | (READONLY) A UUID version 5 (RFC 4122), generated server side during data product creation from the SHA-1 hash of the port's fullyQualifiedName. It MAY be used to reference the port when calling the API exposed by the data product experience plane. Because the fullyQualifiedName is globally unique, the id is also globally unique. However, when calling an API other than those exposed by the data product experience plane, the port's fullyQualifiedName MUST always be used to reference the port. Example: "id": "3235744b-8d2e-57b5-afba-f66862cc6a21" |
| fullyQualifiedName | string:fqn | (REQUIRED) The universally unique identifier of the port. It MUST be a URN of the form urn:dpds:{mesh-namespace}:dataproducts:{product-name}:{product-major-version}:outputports:{port-name}. Example: "fullyQualifiedName": "urn:dpds:it.quantyca:dataproducts:tripExecution:1:outputports:tmsTripCDC" |
| entityType | string:alphanumeric | The type of the entity. It MUST be the constant value outputport. |
| name | string:name | (REQUIRED) The name of the port. It MUST be unique among the output ports of the same data product. It is RECOMMENDED to use a camel case formatted string. Example: "name": "tmsTripCDC" |
| version | string:version | (REQUIRED) The semantic version number of the data product's port. Every time the major version of the port changes, the major version of the product MUST also be incremented. |
| displayName | string | The human-readable name of the port. It SHOULD be used by frontend tools to display the port's name in place of the name property. It is RECOMMENDED not to use the same displayName for different output ports belonging to the same data product. |
| description | string | The port description. CommonMark syntax MAY be used for rich text representation. |
| componentGroup | string:name | The name of the group this component belongs to. Grouping different components together is useful to define submodules within a data product. A submodule can be used as the basis for creating reusable templates. |
| promises | Promises Object \| Reference Object | The data product's promises declared over the port. |
| expectations | Expectation Object \| Reference Object | The data product's expectations declared over the port. |
| contracts | Contracts Object \| Reference Object | The data product's contracts declared over the port. |
| tags | [string] | A list of tags associated with the component. Tags can be used for logical grouping of a data product's components. |
| externalDocs | External Resource Object | Additional external documentation. |

This object MAY be extended with Specification Extensions.
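The rule attached to the port version field (a change in a port's major version requires a new product major version) lends itself to a simple release check. The sketch below is only one possible reading: ports are matched by name between two releases, the helper names are hypothetical, and the relaxed treatment that the Info Object allows for products at major version 0 is deliberately not modeled.

```python
def major(version: str) -> int:
    """Return the major part of a semantic version string."""
    return int(version.split(".", 1)[0])


def port_major_changed(old_ports: dict[str, str], new_ports: dict[str, str]) -> bool:
    """True when any port present in both releases changed its major version.

    Both arguments map a port name to its version string.
    """
    return any(
        name in old_ports and major(old_ports[name]) != major(version)
        for name, version in new_ports.items()
    )


def release_is_consistent(
    old_product: str,
    new_product: str,
    old_ports: dict[str, str],
    new_ports: dict[str, str],
) -> bool:
    """Check that a port major change comes with a product major increment."""
    if port_major_changed(old_ports, new_ports):
        return major(new_product) > major(old_product)
    return True
```

For example, moving the port `tmsTripCDC` from 1.0.0 to 2.0.0 while the product goes from 1.2.0 to 1.3.0 is flagged as inconsistent, whereas bumping the product to 2.0.0 passes.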

Discovery Port Component⚓︎

The Discovery Port Component describes a discovery port of a data product.

Fixed Fields⚓︎

| Field Name | Type | Description |
| --- | --- | --- |
| id | string:uuid | (READONLY) A UUID version 5 (RFC 4122), generated server side during data product creation from the SHA-1 hash of the port's fullyQualifiedName. It MAY be used to reference the port when calling the API exposed by the data product experience plane. Because the fullyQualifiedName is globally unique, the id is also globally unique. However, when calling an API other than those exposed by the data product experience plane, the port's fullyQualifiedName MUST always be used to reference the port. Example: "id": "3235744b-8d2e-57b5-afba-f66862cc6a21" |
| fullyQualifiedName | string:fqn | (REQUIRED) The universally unique identifier of the port. It MUST be a URN of the form urn:dpds:{mesh-namespace}:dataproducts:{product-name}:{product-major-version}:discoveryports:{port-name}. Example: "fullyQualifiedName": "urn:dpds:it.quantyca:dataproducts:tripExecution:1:discoveryports:tmsTripCDC" |
| entityType | string:alphanumeric | The type of the entity. It MUST be the constant value discoveryport. |
| name | string:name | (REQUIRED) The name of the port. It MUST be unique among the discovery ports of the same data product. It is RECOMMENDED to use a camel case formatted string. Example: "name": "tmsTripCDC" |
| version | string:version | (REQUIRED) The semantic version number of the data product's port. Every time the major version of the port changes, the major version of the product MUST also be incremented. |
| displayName | string | The human-readable name of the port. It SHOULD be used by frontend tools to display the port's name in place of the name property. It is RECOMMENDED not to use the same displayName for different discovery ports belonging to the same data product. |
| description | string | The port description. CommonMark syntax MAY be used for rich text representation. |
| componentGroup | string:name | The name of the group this component belongs to. Grouping different components together is useful to define submodules within a data product. A submodule can be used as the basis for creating reusable templates. |
| promises | Promises Object \| Reference Object | The data product's promises declared over the port. |
| expectations | Expectation Object \| Reference Object | The data product's expectations declared over the port. |
| contracts | Contracts Object \| Reference Object | The data product's contracts declared over the port. |
| tags | [string] | A list of tags associated with the component. Tags can be used for logical grouping of a data product's components. |
| externalDocs | External Resource Object | Additional external documentation. |

This object MAY be extended with Specification Extensions.

Observability Port Component⚓︎

The Observability Port Component describes an observability port of a data product.

Fixed Fields⚓︎

| Field Name | Type | Description |
| --- | --- | --- |
| id | string:uuid | (READONLY) A UUID version 5 (RFC 4122), generated server side during data product creation from the SHA-1 hash of the port's fullyQualifiedName. It MAY be used to reference the port when calling the API exposed by the data product experience plane. Because the fullyQualifiedName is globally unique, the id is also globally unique. However, when calling an API other than those exposed by the data product experience plane, the port's fullyQualifiedName MUST always be used to reference the port. Example: "id": "3235744b-8d2e-57b5-afba-f66862cc6a21" |
| fullyQualifiedName | string:fqn | (REQUIRED) The universally unique identifier of the port. It MUST be a URN of the form urn:dpds:{mesh-namespace}:dataproducts:{product-name}:{product-major-version}:observabilityports:{port-name}. Example: "fullyQualifiedName": "urn:dpds:it.quantyca:dataproducts:tripExecution:1:observabilityports:tmsTripCDC" |
| entityType | string:alphanumeric | The type of the entity. It MUST be the constant value observabilityport. |
| name | string:name | (REQUIRED) The name of the port. It MUST be unique among the observability ports of the same data product. It is RECOMMENDED to use a camel case formatted string. Example: "name": "tmsTripCDC" |
| version | string:version | (REQUIRED) The semantic version number of the data product's port. Every time the major version of the port changes, the major version of the product MUST also be incremented. |
| displayName | string | The human-readable name of the port. It SHOULD be used by frontend tools to display the port's name in place of the name property. It is RECOMMENDED not to use the same displayName for different observability ports belonging to the same data product. |
| description | string | The port description. CommonMark syntax MAY be used for rich text representation. |
| componentGroup | string:name | The name of the group this component belongs to. Grouping different components together is useful to define submodules within a data product. A submodule can be used as the basis for creating reusable templates. |
| promises | Promises Object \| Reference Object | The data product's promises declared over the port. |
| expectations | Expectation Object \| Reference Object | The data product's expectations declared over the port. |
| contracts | Contracts Object \| Reference Object | The data product's contracts declared over the port. |
| tags | [string] | A list of tags associated with the component. Tags can be used for logical grouping of a data product's components. |
| externalDocs | External Resource Object | Additional external documentation. |

This object MAY be extended with Specification Extensions.

Control Port Component⚓︎

The Control Port Component describes a control port of a data product.

Fixed Fields⚓︎

| Field Name | Type | Description |
| --- | --- | --- |
| id | string:uuid | (READONLY) A UUID version 5 (RFC 4122), generated server side during data product creation from the SHA-1 hash of the port's fullyQualifiedName. It MAY be used to reference the port when calling the API exposed by the data product experience plane. Because the fullyQualifiedName is globally unique, the id is also globally unique. However, when calling an API other than those exposed by the data product experience plane, the port's fullyQualifiedName MUST always be used to reference the port. Example: "id": "3235744b-8d2e-57b5-afba-f66862cc6a21" |
| fullyQualifiedName | string:fqn | (REQUIRED) The universally unique identifier of the port. It MUST be a URN of the form urn:dpds:{mesh-namespace}:dataproducts:{product-name}:{product-major-version}:controlports:{port-name}. Example: "fullyQualifiedName": "urn:dpds:it.quantyca:dataproducts:tripExecution:1:controlports:tmsTripCDC" |
| entityType | string:alphanumeric | The type of the entity. It MUST be the constant value controlport. |
| name | string:name | (REQUIRED) The name of the port. It MUST be unique among the control ports of the same data product. It is RECOMMENDED to use a camel case formatted string. Example: "name": "tmsTripCDC" |
| version | string:version | (REQUIRED) The semantic version number of the data product's port. Every time the major version of the port changes, the major version of the product MUST also be incremented. |
| displayName | string | The human-readable name of the port. It SHOULD be used by frontend tools to display the port's name in place of the name property. It is RECOMMENDED not to use the same displayName for different control ports belonging to the same data product. |
| description | string | The port description. CommonMark syntax MAY be used for rich text representation. |
| componentGroup | string:name | The name of the group this component belongs to. Grouping different components together is useful to define submodules within a data product. A submodule can be used as the basis for creating reusable templates. |
| promises | Promises Object \| Reference Object | The data product's promises declared over the port. |
| expectations | Expectation Object \| Reference Object | The data product's expectations declared over the port. |
| contracts | Contracts Object \| Reference Object | The data product's contracts declared over the port. |
| tags | [string] | A list of tags associated with the component. Tags can be used for logical grouping of a data product's components. |
| externalDocs | External Resource Object | Additional external documentation. |

This object MAY be extended with Specification Extensions.

Promises Object

The Promises Object describes the data product's promises declared over a given port.

Fixed Fields
Field Name Type Description
platform string The target technology platform on which the services associated with the given port operate. It usually contains the infrastructure provider and data center location. Optionally, it can also contain the specific runtime technology used. Examples: onprem:milan-1, aws:eu-south-1, aws:eu-south-1:redshift.
servicesType string The type of services associated with the given port. Examples: soap-services, rest-services, odata-services, streaming-services, datastore-services.
api Standard Definition Object The formal description of the API of the port's services. It is RECOMMENDED to use the OpenAPI Specification for RESTful services, the AsyncAPI Specification for streaming services, and the DataStore API Specification for services based on data store connections. Other specifications MAY be used as required.
deprecationPolicy Specification Extension Point The deprecation policy adopted by the port. It is RECOMMENDED to specify at least how long the deprecation period will last after the release of a new major version.
slo Specification Extension Point The service level objectives supported by the port. It is RECOMMENDED to group SLOs by category (e.g. operational SLOs, quality SLOs, etc.) and to specify them in a way that is easy to compute.

This object MAY be extended with Specification Extensions.
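As an illustration, a Promises Object for a REST port might look like the sketch below. The URL, the SLO values, and the "x-" field names inside slo are hypothetical and not defined by this specification. The api field follows the Standard Definition Object, with its definition expressed here as a Reference Object.

```json
{
  "platform": "aws:eu-south-1",
  "servicesType": "rest-services",
  "api": {
    "specification": "openapi",
    "version": "3.1.0",
    "definition": {
      "mediaType": "application/json",
      "$ref": "https://example.com/apis/trip-execution/openapi.json"
    }
  },
  "slo": {
    "description": "Hypothetical service level objectives, grouped by category",
    "x-operationalSlo": {
      "availability": "99.5%"
    },
    "x-qualitySlo": {
      "completeness": "99%"
    }
  }
}
```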

Expectations Object

The Expectations Object describes the data product's expectations declared over a given port.

Fixed Fields
Field Name Type Description
audience Specification Extension Point The audience of consumers for whom the port has been designed. It is RECOMMENDED to specify inclusion and exclusion criteria unambiguously.
usage Specification Extension Point The usage patterns for which the port has been designed.

This object MAY be extended with Specification Extensions.
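A minimal sketch of an Expectations Object follows. Because audience and usage are Specification Extension Points, only description and externalDocs are fixed fields. The "x-" properties and all values shown are hypothetical, chosen to show one possible way to state inclusion and exclusion criteria explicitly.

```json
{
  "audience": {
    "description": "Designed for logistics analysts; not intended for operational systems",
    "x-includes": ["logistics-analysts"],
    "x-excludes": ["operational-systems"]
  },
  "usage": {
    "description": "Designed for batch reads executed at most once a day",
    "x-patterns": ["daily-batch-read"]
  }
}
```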

Contracts Object

The Contracts Object describes the data product's contracts declared over a given port.

Fixed Fields
Field Name Type Description
termsAndConditions Specification Extension Point The terms and conditions defined on the port, which consumers must agree to and respect in order to use it.
billingPolicy Specification Extension Point The billing policy defined on the port, which consumers must agree to and respect in order to use it.
sla Specification Extension Point The service level agreements supported by the port. It is RECOMMENDED to group SLAs by category (e.g. operational SLAs, quality SLAs, etc.) and to specify them in a way that is easy to compute.

This object MAY be extended with Specification Extensions.
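The following sketch shows one possible shape of a Contracts Object. All three fields are Specification Extension Points, so the "x-" properties, the URL, and the values are hypothetical examples rather than fields defined by this specification.

```json
{
  "termsAndConditions": {
    "description": "Data may be used for internal analytics only",
    "externalDocs": {
      "mediaType": "text/html",
      "$href": "https://example.com/terms"
    }
  },
  "billingPolicy": {
    "description": "Hypothetical pay-per-use policy",
    "x-billingUnit": "query",
    "x-pricePerUnit": 0.01
  },
  "sla": {
    "description": "Hypothetical service level agreements, grouped by category",
    "x-operationalSla": {
      "availability": "99%"
    }
  }
}
```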

Internal Components Object

The Internal Components Object is a collection of all the internal entities that compose a data product.

Fixed Fields
Field Name Type Description
applicationComponents [Application Component] The list of application components that compose the data product.
infrastructuralComponents [Infrastructural Component] The list of infrastructural components that compose the data product.

This object cannot be extended with additional properties and any properties added SHALL be ignored.

Application Component

The Application Component describes an internal application component used by the data product to provide services through its ports.

Fixed Fields
Field Name Type Description
id string:uuid (READONLY) A version 5 UUID (see RFC-4122), generated server side during data product creation from the SHA-1 hash of the component's fullyQualifiedName. It MAY be used to address the component when calling the API exposed by the data product experience plane. Because the fullyQualifiedName is globally unique, the id is globally unique as well. However, when calling APIs other than those exposed by the data product experience plane, the component's fullyQualifiedName MUST always be used to address the component. Example: "id": "3235744b-8d2e-57b5-afba-f66862cc6a21".
fullyQualifiedName string:fqn (REQUIRED) The universally unique identifier of the component. It MUST be a URN of the form urn:dpds:{mesh-namespace}:dataproducts:{product-name}:{product-major-version}:applications:{app-name}. Example: "fullyQualifiedName": "urn:dpds:it.quantyca:dataproducts:tripExecution:1:applications:modelNormalizationJob".
entityType string:alphanumeric (READONLY) The type of the entity. It is the constant value application.
name string:name (REQUIRED) The name of the application component. It MUST be unique among the application components of the same data product. It is RECOMMENDED to use a camel case formatted string. Example: "name": "modelNormalizationJob".
version string:version (REQUIRED) The semantic version number of the data product's application component.
displayName string The human-readable name of the component. It SHOULD be used by frontend tools to display the application component's name in place of the name property. It is RECOMMENDED not to use the same displayName for different application components belonging to the same data product.
description string The application component description. CommonMark syntax MAY be used for rich text representation.
platform string The target technology platform on which the application will be deployed. It usually contains the infrastructure provider and data center location. Optionally, it can also contain the specific runtime technology used. Examples: onprem:milan-1, aws:eu-south-1, aws:eu-south-1:redshift.
applicationType string The type of the application. Examples: stream-sourcing, batch-sourcing, streaming-transformation, batch-transformation, housekeeping, etc.
buildInfo Build Info Object The information required to build the application component.
deployInfo Deploy Info Object The information required to deploy the application component.
consumesFrom [string:fqn] The list of ports or infrastructural components from which the application directly consumes data or services.
providesTo [string:fqn] The list of ports or infrastructural components to which the application directly provides data or services.
dependsOn [string:fqn] A list of other internal components on which this application directly depends. It is used during data product deployment to define a consistent deployment plan. Cyclic dependencies between components MUST be avoided.
componentGroup string:name The name of the group this component belongs to. Grouping components together is useful for defining submodules within a data product. A submodule can be used as the basis for creating reusable templates.
tags [string] A list of tags associated with the component. Tags can be used to logically group a data product's components.
externalDocs External Resource Object Additional external documentation.

This object MAY be extended with Specification Extensions.
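To make the fields above concrete, here is a sketch of an Application Component that reuses the URNs from the examples in this document. The version, displayName, description, and tag values are illustrative assumptions. The READONLY fields id and entityType are omitted because they are not set by the author of the descriptor.

```json
{
  "fullyQualifiedName": "urn:dpds:it.quantyca:dataproducts:tripExecution:1:applications:modelNormalizationJob",
  "name": "modelNormalizationJob",
  "version": "1.0.0",
  "displayName": "Model Normalization Job",
  "description": "Normalizes raw trip events into the internal model",
  "platform": "aws:eu-south-1",
  "applicationType": "batch-transformation",
  "consumesFrom": [
    "urn:dpds:it.quantyca:dataproducts:tripExecution:1:inputports:tmsTripCDC"
  ],
  "providesTo": [
    "urn:dpds:it.quantyca:dataproducts:tripExecution:1:infrastructure:stagingArea"
  ],
  "dependsOn": [
    "urn:dpds:it.quantyca:dataproducts:tripExecution:1:infrastructure:stagingArea"
  ],
  "tags": ["transformation"]
}
```

In this sketch the application depends on the stagingArea infrastructural component, so a deployment plan would provision that component before deploying the job.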

Build Info Object

The Build Info Object contains all the information required to build an Application Component.

Fixed Fields
Field Name Type Description
service Reference Object The endpoint of the service to call in order to build the application component.
template object | string | Reference Object Can be an inline JSON or a reference to an external resource. It contains the definition of the pipeline to execute in order to build the application. It is passed as is to the build service specified in the service field.
configurations object | string | Reference Object Can be an inline JSON or a reference to an external resource. It contains the configuration properties that can be used by the build service at build time. It is passed as is to the build service specified in the service field.

This object MAY be extended with Specification Extensions.
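A possible Build Info Object with an inline template is sketched below. The service URL is hypothetical, and the contents of template and configurations are opaque to this specification: they are passed as is to the build service, so the keys shown here are purely illustrative.

```json
{
  "service": {
    "description": "Hypothetical build service endpoint",
    "$ref": "https://build.example.com/api/v1/pipelines"
  },
  "template": {
    "steps": [
      { "name": "compile", "command": "mvn package" },
      { "name": "publish", "command": "mvn deploy" }
    ]
  },
  "configurations": {
    "stage": "dev"
  }
}
```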

Deploy Info Object

The Deploy Info Object contains all the information required to deploy an Application Component.

Fixed Fields
Field Name Type Description
service Reference Object The endpoint of the service to call in order to deploy the application component.
template object | string | Reference Object Can be an inline JSON or a reference to an external resource. It contains the definition of the pipeline to execute in order to deploy the application. It is passed as is to the deployment service specified in the service field.
configurations object | string | Reference Object Can be an inline JSON or a reference to an external resource. It contains the configuration properties that can be used by the deployment service at deploy time. It is passed as is to the deployment service specified in the service field.

This object MAY be extended with Specification Extensions.
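The sketch below shows a Deploy Info Object whose template is a Reference Object pointing to a separate file instead of an inline JSON. The service URL, the file name, and the configuration keys are hypothetical.

```json
{
  "service": {
    "description": "Hypothetical deployment service endpoint",
    "$ref": "https://deploy.example.com/api/v1/deployments"
  },
  "template": {
    "description": "Deployment pipeline definition kept in a separate file",
    "mediaType": "application/json",
    "$ref": "deploy-pipeline.json"
  },
  "configurations": {
    "targetEnvironment": "prod",
    "replicas": 2
  }
}
```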

Infrastructural Component

The Infrastructural Component describes an internal infrastructural component used by the data product to run its applications and store its data.

Fixed Fields
Field Name Type Description
id string:uuid (READONLY) A version 5 UUID (see RFC-4122), generated server side during data product creation from the SHA-1 hash of the component's fullyQualifiedName. It MAY be used to address the component when calling the API exposed by the data product experience plane. Because the fullyQualifiedName is globally unique, the id is globally unique as well. However, when calling APIs other than those exposed by the data product experience plane, the component's fullyQualifiedName MUST always be used to address the component. Example: "id": "3235744b-8d2e-57b5-afba-f66862cc6a21".
fullyQualifiedName string:fqn (REQUIRED) The universally unique identifier of the component. It MUST be a URN of the form urn:dpds:{mesh-namespace}:dataproducts:{product-name}:{product-major-version}:infrastructure:{infra-name}. Example: "fullyQualifiedName": "urn:dpds:it.quantyca:dataproducts:tripExecution:1:infrastructure:stagingArea".
entityType string:alphanumeric (READONLY) The type of the entity. It is the constant value infrastructure.
name string:name The name of the infrastructural component. It MUST be unique among the infrastructural components of the same data product. It is RECOMMENDED to use a camel case formatted string. Example: "name": "stagingArea".
version string:version (REQUIRED) The semantic version number of the data product's infrastructural component.
displayName string The human-readable name of the component. It SHOULD be used by frontend tools to display the infrastructural component's name in place of the name property. It is RECOMMENDED not to use the same displayName for different infrastructural components belonging to the same data product.
description string The infrastructural component description. CommonMark syntax MAY be used for rich text representation.
platform string The target technology platform on which the infrastructural component will be provisioned. It usually contains the infrastructure provider and data center location. Optionally, it can also contain the specific resource object that will be provisioned. Examples: onprem:milan-1, aws:eu-south-1, aws:eu-south-1:s3-bucket.
infrastructureType string The type of the infrastructural component. Examples: computation-resource, storage-resource, networking-resource, etc.
provisionInfo Provision Info Object The information required to provision the infrastructural component.
dependsOn [string:fqn] A list of other infrastructural components on which this component directly depends. It is used during infrastructure provisioning to define a consistent provisioning plan. Cyclic dependencies between infrastructural components MUST be avoided.
componentGroup string:name The name of the group this component belongs to. Grouping components together is useful for defining submodules within a data product. A submodule can be used as the basis for creating reusable templates.
tags [string] A list of tags associated with the component. Tags can be used to logically group a data product's components.
externalDocs External Resource Object Additional external documentation.

This object MAY be extended with Specification Extensions.
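For illustration, an Infrastructural Component describing the stagingArea used in the examples above could be written as follows. The version, displayName, description, and tag values are assumptions, and provisionInfo is left out to keep the sketch short.

```json
{
  "fullyQualifiedName": "urn:dpds:it.quantyca:dataproducts:tripExecution:1:infrastructure:stagingArea",
  "name": "stagingArea",
  "version": "1.0.0",
  "displayName": "Staging Area",
  "description": "Object storage used to stage incoming trip events",
  "platform": "aws:eu-south-1",
  "infrastructureType": "storage-resource",
  "tags": ["storage"]
}
```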

Provision Info Object

The Provision Info Object contains all the information required to provision an Infrastructural Component.

Fixed Fields
Field Name Type Description
service Reference Object The endpoint of the service to call in order to provision the infrastructural component.
template object | string | Reference Object Can be an inline JSON or a reference to an external resource. It contains the definition of the resources to provision. It is passed as is to the provisioning service specified in the service field.
configurations object | string | Reference Object Can be an inline JSON or a reference to an external resource. It contains the configuration properties that can be used by the provisioning service at provision time. It is passed as is to the provisioning service specified in the service field.

This object MAY be extended with Specification Extensions.
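A Provision Info Object might be sketched as below. The service URL, the template file name, and the configuration keys are hypothetical. The specification only requires that template and configurations be passed as is to the provisioning service.

```json
{
  "service": {
    "description": "Hypothetical provisioning service endpoint",
    "$ref": "https://provision.example.com/api/v1/stacks"
  },
  "template": {
    "mediaType": "application/json",
    "$ref": "staging-area-template.json"
  },
  "configurations": {
    "region": "eu-south-1",
    "bucketName": "trip-execution-staging"
  }
}
```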

Components Object

The Components Object holds a set of reusable objects for different aspects of the DPDS. All objects defined within the components object will have no effect on the Data Product Descriptor unless they are explicitly referenced from properties outside the components object.

Fixed Fields
Field Name Type Description
inputPorts Map[string, Input Port Component | Reference Object] An object to hold reusable Input Port Components.
outputPorts Map[string, Output Port Component | Reference Object] An object to hold reusable Output Port Components.
discoveryPorts Map[string, Discovery Port Component | Reference Object] An object to hold reusable Discovery Port Components.
observabilityPorts Map[string, Observability Port Component | Reference Object] An object to hold reusable Observability Port Components.
controlPorts Map[string, Control Port Component | Reference Object] An object to hold reusable Control Port Components.
applicationComponents Map[string, Application Component | Reference Object] An object to hold reusable Application Components.
infrastructuralComponents Map[string, Infrastructural Component | Reference Object] An object to hold reusable Infrastructural Components.

This object MAY be extended with Specification Extensions.

All the fixed fields declared above are objects that MUST use keys that match the regular expression: ^[a-zA-Z0-9\.\-_]+$.
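As a sketch, a Components Object holding two reusable infrastructural components could look like this. Both map keys match the regular expression above. The second entry is a Reference Object that points to a hypothetical external file, shared-components.json, and the version value is illustrative.

```json
{
  "infrastructuralComponents": {
    "stagingArea": {
      "fullyQualifiedName": "urn:dpds:it.quantyca:dataproducts:tripExecution:1:infrastructure:stagingArea",
      "name": "stagingArea",
      "version": "1.0.0",
      "infrastructureType": "storage-resource"
    },
    "shared.logging-bucket_v1": {
      "$ref": "shared-components.json#/loggingBucket"
    }
  }
}
```

Neither entry has any effect on the descriptor until it is referenced from a property outside the Components Object.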

Reference Object

The Reference Object allows referencing other components in the Data Product Descriptor Document, internally and externally.

The $ref string value contains a URI (RFC3986), which identifies the location of the value being referenced.

See the rules for resolving Relative References.

Fixed Fields
Field Name Type Description
description string A description which by default SHOULD override that of the referenced component. CommonMark syntax MAY be used for rich text representation. If the referenced object-type does not allow a description field, then this field has no effect.
mediaType string The media type of the referenced resource. It must conform to the media type format, according to RFC6838.
$ref string:uri-reference REQUIRED. The reference identifier. This MUST be in the form of a URI.

This object cannot be extended with additional properties and any properties added SHALL be ignored.

Reference Object Example
JSON
{
  "$ref": "#/components/schemas/Pet"
}
Relative Schema Document Example
JSON
{
  "$ref": "Pet.json"
}
Relative Documents With Embedded Schema Example
JSON
{
  "$ref": "definitions.json#/Pet"
}

External Resource Object

The External Resource Object allows referencing an external resource like a documentation page.

Fixed Fields
Field Name Type Description
description string A description of the target resource. CommonMark syntax MAY be used for rich text representation.
mediaType string The media type of the target resource. It must conform to the media type format, according to RFC6838.
$href string:uri REQUIRED. The URI of the target resource. It must conform to the URI format, according to RFC3986.

This object cannot be extended with additional properties and any properties added SHALL be ignored.

External Resource Object Example
JSON
{
  "description": "Find more info here",
  "mediaType": "text/html",
  "$href": "https://example.com"
}

Standard Definition Object

The Standard Definition Object formally describes an object of interest (e.g. API object, provision template object, etc.) following a given standard specification.

Fixed Fields
Field Name Type Description
description string The standard definition description. CommonMark syntax MAY be used for rich text representation.
specification string (REQUIRED) The external specification used in the definition.
version string The version of the external specification used in the definition. If not defined, the version MUST be included in the definition itself.
definition object | string | Reference Object (REQUIRED) The formal definition built using the specification declared in the specification field.
externalDocs External Resource Object Additional external documentation for the standard definition.

This object MAY be extended with Specification Extensions.

Standard Definition Object Example
JSON
{
  "description": "The API exposed by the Observability Port that exposes data product logs",
  "specification": "openapi",
  "version": "3.1.0",
  "definition": {
    "mediaType": "text/json",
    "$href": "https://mycompany.com/api/v1/planes/utility/logging-services/openapi.json"
  },
  "externalDocs": {
    "mediaType": "text/html",
    "$href": "https://spec.openapis.org/oas/v3.1.0"
  }
}

Specification Extension Point

A Specification Extension Point marks specific parts of the Data Product Descriptor Specification that are left open to extensions or further evolution of the standard. While a Standard Definition is a formal declaration that the description of a part of the Data Product Descriptor Specification is delegated to an external standard, in this version of the specification and in future ones, the same is not true for Specification Extension Points. Although a Specification Extension Point can be extended at will, it is RECOMMENDED to prefix the field names of all added properties with "x-" to avoid potential conflicts with future versions of the Data Product Descriptor Specification.

Fixed Fields
Field Name Type Description
description string The extension point description. CommonMark syntax MAY be used for rich text representation.
externalDocs External Resource Object Additional external documentation for the extension point.

This object MAY be extended with Specification Extensions.
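The sketch below shows a Specification Extension Point as it might appear under a field such as slo. Only description and externalDocs are fixed fields. The "x-" properties, the URL, and the values are hypothetical and follow the recommended prefix to avoid conflicts with future versions of the specification.

```json
{
  "description": "Hypothetical operational objectives for the port",
  "externalDocs": {
    "description": "Guidelines used to define the objectives",
    "mediaType": "text/html",
    "$href": "https://example.com/slo-guidelines"
  },
  "x-availability": "99.5%",
  "x-maxResponseTimeMs": 500
}
```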

Specification Extensions

While the Data Product Descriptor Specification tries to accommodate most use cases, additional data can be added to extend the specification at certain points. The extension properties are implemented as patterned fields that are always prefixed by "x-".

Field Pattern Type Description
^x- Any Allows extensions to the Data Product Descriptor Schema. The field name MUST begin with x-, for example, x-internal-id. The value can be any valid JSON value: null, a primitive, an array, or an object.

The extensions may or may not be supported by the available tooling, but those may be extended as well to add requested support (if tools are internal or open-sourced).
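For example, a Promises Object, which MAY be extended, could carry extension fields next to its fixed fields as sketched below. The field x-internal-id is the example name used above. Its value and the x-cost-center field are hypothetical.

```json
{
  "platform": "aws:eu-south-1",
  "servicesType": "rest-services",
  "x-internal-id": "PRJ-0042",
  "x-cost-center": {
    "code": "CC-LOGISTICS",
    "owner": "logistics-data-team"
  }
}
```

Tools that do not know these extensions can still process the fixed fields of the object.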

Appendix A: Revision History

Version Date Notes
1.0.0 2023-Q1 Release of the Data Product Descriptor Specification 1.0.0

Last update: June 4, 2023
Created: November 7, 2022