Schema Annotations Specification⚓︎

Introduction⚓︎

The Schema Annotation Specification (SAS) defines how to annotate structural elements within a schema describing a data model with metadata. SAS is independent of the Schema Definition Language (SDL) used, meaning it can be applied alongside any SDL (e.g., JSON Schema, Avro, Protobuf, XSD, etc.).

Defining schema annotations through an open specification is useful for:

  • Adding descriptive metadata directly within the schema that describes how data is structured, both at rest and in transit.
  • Allowing developers to use their preferred SDL while embedding metadata without needing to learn a separate formalization for specifying it.
  • Ensuring that wherever the schema is used, the metadata is also available.
  • Enabling tools that extract metadata from the schema to rely on standardized semantics, so they can interpret the metadata's meaning and use it appropriately.

Conventions and Terminology⚓︎

The keywords "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in BCP 14 (RFC 2119, RFC 8174) when, and only when, they appear in all capitals, as shown here.

Versions⚓︎

The SAS is versioned using Semantic Versioning 2.0.0 (semver) and follows the semver specification.

The major.minor portion of the semver (for example 1.0) SHALL designate the SAS feature set. Typically, .patch versions address errors in this document, not the feature set. Tooling which supports SAS 1.0 SHOULD be compatible with all SAS 1.0.* versions. The patch version SHOULD NOT be considered by tooling, making no distinction between 1.0.0 and 1.0.1 for example.

Each new minor version of the SAS SHALL produce annotations that are interpretable by consumers in the same way as in any previous minor version of the Specification, within the same major version. Such an update MUST only require changing the sas property to the new minor version.

For example, a valid schema document annotated using SAS 1.0.2, upon changing its sas property to 1.1.0, SHALL be a valid schema document annotated with SAS 1.1.0, semantically equivalent to the original schema document. New minor versions of the SAS MUST be written to ensure this form of backward compatibility.
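
The versioning rules above can be sketched in Python as follows. This is a minimal illustration, not part of the specification: the function names feature_set and is_supported are hypothetical, and the sketch assumes that a tool supporting a given minor version can also read documents annotated with earlier minor versions of the same major version.

```python
import re

# Matches "major.minor.patch" with an optional pre-release suffix such as "-DRAFT".
_SEMVER = re.compile(r"^(\d+)\.(\d+)\.(\d+)(?:-[0-9A-Za-z.-]+)?$")


def feature_set(version: str) -> tuple[int, int]:
    """Return the (major, minor) pair that designates the SAS feature set."""
    match = _SEMVER.match(version)
    if match is None:
        raise ValueError(f"not a semantic version: {version!r}")
    major, minor, _patch = match.groups()  # the patch component is deliberately ignored
    return int(major), int(minor)


def is_supported(document_version: str, tool_version: str) -> bool:
    """True when a tool built for tool_version can interpret document_version."""
    doc_major, doc_minor = feature_set(document_version)
    tool_major, tool_minor = feature_set(tool_version)
    return doc_major == tool_major and doc_minor <= tool_minor
```

Because the patch component is dropped before comparing, 1.0.0 and 1.0.1 are treated identically, as the text above recommends.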

License⚓︎

This document is licensed under The Apache License, Version 2.0.

Definitions⚓︎

Standard⚓︎

The set of shared rules used by different agents to describe an entity or process of common interest. By conforming to these rules, agents limit their autonomy to enable interoperability, allowing for smoother cooperation.

Standard Specification⚓︎

The formal description of the rules that form a standard. A standard can have multiple specification versions associated with it. Sometimes the words standard and specification are used as synonyms.

Standard Definition⚓︎

The description of one specific entity or process created using and conforming to the set of rules formally described in the standard specification.

Schema⚓︎

A schema is a machine-readable description of the structure of a dataset. It can be used to validate the structure of a dataset, to decide how to query it, and to encode the results.

Schema Definition Language⚓︎

A schema definition language is the formalism used to describe the schema of a dataset (e.g., JSON Schema, Avro Schema, Protobuf, XML Schema, etc.).

Schema Annotation⚓︎

A schema annotation is a piece of information embedded in the schema definition to add metadata related to a specific part of the structure. The annotation is not directly used for validating or encoding the data.

Schema Document⚓︎

The document (or set of documents) that contains the schema definition.

SAS Specification⚓︎

The list of standard annotations that can be used to describe the different parts of the schema defined in a Schema Document.

Specification⚓︎

Meta Model⚓︎

A dataset is a structured collection of data. Each entry within the dataset adheres to the same structure, which is referred to as its data model. The data model defines how data within each entry are arranged and related.

A meta model provides the framework for defining the data model of a dataset. In SAS, the meta model describes the structure of a dataset entry as an object (i.e. root object or schema) consisting of an unordered list of named properties.

[Figure: SAS meta model (1.0.0-DRAFT-sas-meta-model.png)]

Each property represents a portion of the data within a dataset entry and can take one of two forms:

  • Primitive Properties: When a property’s type is a primitive element of the meta model (e.g., boolean, number, string), it describes a single, specific data point within the entry.

  • Composite Properties: When a property’s type is a composite element of the meta model (e.g., array, object), it describes a nested collection of data points within the parent entry.

A data model consisting only of primitive properties is said to have a tabular structure, while one that incorporates composite properties is described as having a document structure.

Schema Definition Language⚓︎

A schema is a machine-readable representation of a data model defined using a specific Schema Definition Language (SDL). Each SDL is grounded in a foundational meta-model that shapes the keywords and syntax the language uses to describe the schema of a specific data model.

Keywords used by an SDL can be divided into three main categories:

  • Core keywords: These define the basic structure of the data model. For example, in JSON Schema, keywords like $id, object, properties, string, number, and boolean are used to shape the model.

  • Functional keywords: These are reserved keywords that trigger specific actions in the supporting tools. Their behavior depends on the context in which the SDL is defined. In most SDLs, functional keywords are used to validate schema instances (i.e., entries in the dataset). For example, in JSON Schema, the keyword required specifies which properties are mandatory in an object.

  • Annotation keywords: These keywords are not reserved and don’t trigger any specific behavior. Instead, they provide additional information to enhance the data model (metadata). For example, in JSON Schema, the description keyword is used to add a textual explanation of a property.

SAS specifies how to define and use annotation keywords within any SDL that can describe data models with both tabular and document structures.

JSON Schema⚓︎

In JSON Schema, any keyword not defined by the specification is considered an annotation keyword by default.

[from JSON Schema Specification] unrecognized individual keywords simply have their values collected as annotations

In the following example, author and unit are annotations. The author annotation specifies that Andrea created the schema, while the unit annotation indicates that the temperatures in the dataset must be expressed in degrees Celsius.

JSON
{
  "type": "object",

  "author": "Andrea",

  "properties": {
    "temperatures": {
      "type": "array",
      "items": {
        "type": "number",
        "unit": "Celsius"
      }
    },
    "sensor": {
      "type": "object",
      "properties": {
        "id": {
          "type": "string"
        },
        "location": {
          "type": "string"
        }
      },
      "required": ["id", "location"]
    }
  },
  "required": ["temperatures", "sensor"]
}
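
How a consumer might separate annotation keywords from the keywords defined by the SDL can be sketched in Python as follows. This is an illustration only: the function name collect_annotations is hypothetical, and KNOWN_KEYWORDS is a deliberately small subset of the JSON Schema keywords, chosen to cover the example above. A real tool would use the full keyword list of the JSON Schema version in use.

```python
# Illustrative subset of keywords defined by JSON Schema (an assumption of this sketch).
KNOWN_KEYWORDS = {"$id", "$schema", "type", "properties", "items", "required"}


def collect_annotations(schema: dict, path: str = "#") -> dict:
    """Map each schema location to the unrecognized keywords found there."""
    found = {}
    extras = {k: v for k, v in schema.items() if k not in KNOWN_KEYWORDS}
    if extras:
        found[path] = extras
    # Recurse into named properties; property names themselves are not keywords.
    for name, subschema in schema.get("properties", {}).items():
        found.update(collect_annotations(subschema, f"{path}/properties/{name}"))
    # Recurse into the schema of array items, when it is a single schema.
    if isinstance(schema.get("items"), dict):
        found.update(collect_annotations(schema["items"], f"{path}/items"))
    return found


schema = {
    "type": "object",
    "author": "Andrea",
    "properties": {
        "temperatures": {
            "type": "array",
            "items": {"type": "number", "unit": "Celsius"},
        },
    },
}
annotations = collect_annotations(schema)
```

For this input the result contains author at the root location and unit at the location of the array items.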

Avro⚓︎

In Avro, any keyword not defined by the specification is considered an annotation keyword by default.

[from Avro Specification] Attributes not defined in this document are permitted as metadata, but must not affect the format of serialized data.

The following example uses Avro to specify a data model structure equivalent to the one defined in the JSON Schema example of the previous section.

JSON
{
  "type": "record",
  "name": "TemperatureData",

  "author": "Andrea",

  "fields": [
    {
      "name": "temperatures",
      "type": {
        "type": "array",
        "items": "double",
        "unit": "Celsius"
      }
    },
    {
      "name": "sensor",
      "type": {
        "type": "record",
        "name": "Sensor",
        "fields": [
          {
            "name": "id",
            "type": "string"
          },
          {
            "name": "location",
            "type": "string"
          }
        ]
      }
    }
  ]
}

Protobuf⚓︎

In Protobuf, annotations must be defined through custom options, as shown in the following example:

Protocol Buffer
syntax = "proto3";

// Import custom options from sas-annotations.proto
import "sas-annotations.proto";  

message TemperatureData {

  // Apply message option for author
  option (author) = "Andrea";

  // Apply field option for temperature unit
  repeated double temperatures = 1 [(temperature_unit) = "Celsius"];

  message Sensor {
    string id = 1;
    string location = 2;
  }

  Sensor sensor = 2;
}

where sas-annotations.proto is defined as follows:

Protocol Buffer
syntax = "proto3";

import "google/protobuf/descriptor.proto";

// Define custom options for message annotations
extend google.protobuf.MessageOptions {
  string author = 50001;  // Custom option for message author
}

// Define custom options for field annotations
extend google.protobuf.FieldOptions {
  string temperature_unit = 50002;  // Custom option for field temperature unit
}

XSD⚓︎

In XML Schema (XSD), annotations are typically added using the xs:annotation element, as shown in the following example.

XML
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">

  <xs:annotation>
    <xs:documentation>author:Andrea</xs:documentation>
  </xs:annotation>

  <xs:element name="TemperatureData">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="temperatures" maxOccurs="unbounded">
          <xs:complexType>
            <xs:sequence>
              <xs:element name="temperature" type="xs:decimal">
                <xs:annotation>
                  <xs:documentation>unit:celsius</xs:documentation>
                </xs:annotation>
              </xs:element>
            </xs:sequence>
          </xs:complexType>
        </xs:element>
        <xs:element name="sensor">
          <xs:complexType>
            <xs:sequence>
              <xs:element name="id" type="xs:string"/>
              <xs:element name="location" type="xs:string"/>
            </xs:sequence>
          </xs:complexType>
        </xs:element>
      </xs:sequence>
    </xs:complexType>
  </xs:element>

</xs:schema>

Keywords⚓︎

A keyword is a string to which a value can be associated in order to annotate a schema element. In the following example, customKeyword is a keyword associated with the root object of the JSON Schema, and its value is Custom Keyword Value.

JSON
{
  "type": "object",

  "customKeyword": "Custom Keyword Value",

  "properties": {
    "username": {"type": "string"},
    "password": {"type": "string"}
  }
}

While the SAS does not mandate a specific format for a keyword, it is RECOMMENDED that the keyword be defined using an alphanumeric string. Special characters should only be included in exceptional cases and when absolutely necessary. This approach aims to minimize the risk of issues when using keywords in the context of specific SDLs, which may impose restrictions on keyword formats.

Defining a keyword and using it to annotate a schema is not enough. Each keyword must have a clear and precise definition to ensure its value can be effectively understood and used by potential consumers.

The definition of a keyword SHOULD include:

  • The keyword being defined (e.g., Creator)
  • The type of the keyword’s value (e.g., string, number, boolean, etc.)
  • A description of the keyword’s value (e.g., An entity responsible for creating the resource.)
  • Notes on best practices for setting the keyword's value (e.g., It is recommended to identify the creator using a URI. If this is not possible, a literal value that identifies the creator may be used.)
  • Any potential relationships with other keywords (e.g., Creator is a subproperty of Contributor).
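
The elements of a keyword definition listed above can be captured in a small data structure. The following Python sketch is only one possible representation: the class name KeywordDefinition and its field names are hypothetical, since the SAS does not prescribe a format for keyword definitions. The sample values are the ones used in the list above.

```python
from dataclasses import dataclass, field


@dataclass
class KeywordDefinition:
    keyword: str                  # the keyword being defined
    value_type: str               # the type of the keyword's value
    description: str              # what the keyword's value means
    best_practices: str = ""      # notes on how to set the value
    related_keywords: dict = field(default_factory=dict)  # relationship name -> keyword


creator = KeywordDefinition(
    keyword="Creator",
    value_type="string",
    description="An entity responsible for creating the resource.",
    best_practices="It is recommended to identify the creator using a URI.",
    related_keywords={"subPropertyOf": "Contributor"},
)
```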

Vocabularies⚓︎

A vocabulary is a collection of keyword definitions. Every vocabulary MUST have a unique URI that serves as its identifier, ensuring global distinctiveness and referenceability. It is RECOMMENDED to use a URL as the identifier, allowing potential consumers not only to identify the vocabulary but also to access it.

Each keyword in a vocabulary can be uniquely identified by appending it as a fragment identifier to the vocabulary's URI. For example, if the keyword customKeyword belongs to the vocabulary identified by the URI http://example.com/custom-vocabulary, then its unique identifier SHOULD be http://example.com/custom-vocabulary#customKeyword.

This makes the relationship between the vocabulary's URI and the keyword's identifier clearer.
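
Building and splitting keyword identifiers can be sketched in Python as follows. The function names keyword_uri and split_keyword_uri are hypothetical, and the sketch assumes that the vocabulary URI does not already carry a fragment of its own.

```python
from urllib.parse import urldefrag


def keyword_uri(vocabulary_uri: str, keyword: str) -> str:
    """Identify a keyword by appending it as a fragment to its vocabulary's URI."""
    return f"{vocabulary_uri}#{keyword}"


def split_keyword_uri(uri: str) -> tuple[str, str]:
    """Recover the vocabulary URI and the keyword from a keyword identifier."""
    vocabulary_uri, keyword = urldefrag(uri)
    return vocabulary_uri, keyword
```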

To promote clarity, consistency and composability, keywords within the same vocabulary SHOULD be associated with a specific metadata class, such as physical, logical, conceptual, quality, or security metadata.

The actual structure and format of a vocabulary is not prescribed and remains open to the discretion of the defining authority.

The "sas" Keyword⚓︎

The sas keyword, placed in the root object of a schema, specifies the SAS version used.

The value of this keyword MUST be a string.

This annotation is REQUIRED in order to properly interpret the other keywords used to annotate the schema.

The "sasSchemaId" Keyword⚓︎

The sasSchemaId keyword, placed in the root object of a schema, specifies the unique identifier of the annotated schema.

The value of this keyword MUST be a string.

This annotation is OPTIONAL.

The "sasDialect" Keyword⚓︎

The sasDialect keyword, placed in the root object of a schema, specifies the vocabularies in use. A dialect is a collection of vocabularies.

The value of this keyword MUST be either a string or a SasDialect Object, that is, a JSON object compliant with the following JSON Schema:

JSON
{
  "$schema": "http://json-schema.org/draft-07/schema#",
  "type": "object",
  "propertyNames": {
    "type": "string",
    "pattern": "^https?://.*$"
  },
  "additionalProperties": {
    "type": "object",
    "properties": {
      "prefix": {
        "type": "string"
      },
      "objectName": {
        "type": "string"
      },
      "excludedKeywords": {
        "type": "array"
      },
      "includedKeywords": {
        "type": "array"
      },
      "aliases": {
        "type": "object"
      },
      "required": {
        "type": "boolean",
        "default": false
      }
    },
    "required": ["prefix"],
    "additionalProperties": false
  }
}

If the sasDialect keyword value is a string, it can either store the URI identifying the dialect definition or contain the SasDialect Object encoded as a string. Using a URI is advantageous as it eliminates the need to repeatedly include the same dialect definition in multiple schemas. Additionally, referencing an external definition via a URI or encoding the SasDialect Object as a string is particularly useful for annotating schemas defined in SDLs that do not support JSON syntax (e.g., Protobuf).
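
A consumer has to distinguish between the forms that the sasDialect value can take. The following Python sketch shows one way to do it. The function name load_sas_dialect is hypothetical, and the rule "a string that does not parse as JSON is a URI" is an assumption of this sketch, not something the SAS mandates.

```python
import json


def load_sas_dialect(value):
    """Return ("object", dict) or ("uri", str) for a sasDialect keyword value."""
    if isinstance(value, dict):
        return "object", value
    if not isinstance(value, str):
        raise TypeError("sasDialect MUST be a string or a SasDialect Object")
    try:
        decoded = json.loads(value)
    except json.JSONDecodeError:
        # Not JSON, so the string is treated as the URI of a dialect definition.
        return "uri", value
    if isinstance(decoded, dict):
        return "object", decoded
    raise ValueError("an encoded sasDialect must decode to a JSON object")
```

Resolving the URI form to an actual dialect definition is left out, because how dialect definitions are published is outside the scope of this sketch.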

The SasDialect Object includes a property for each vocabulary used in the dialect. Each property's name is the URI of the vocabulary, and its value is an object that specifies the following information about the vocabulary:

| Field Name | Type | Description |
|---|---|---|
| prefix | string | The string that prefixes each keyword in the vocabulary. Using a prefix can help prevent keyword collisions with other vocabularies. |
| objectName | string | The name of the parent object that contains all the keywords of this vocabulary. Grouping all annotations based on the keywords of a vocabulary within a specific object can be useful to avoid collisions with keywords defined in other vocabularies, or to isolate all annotations in one place so that they are easier to distinguish from core and functional keywords. |
| excludedKeywords | [string] | A list of all keywords defined in the vocabulary but not used for annotations. The includedKeywords and excludedKeywords fields are mutually exclusive and MUST NOT both be defined. If neither includedKeywords nor excludedKeywords is defined, all keywords in the vocabulary are used for annotations. |
| includedKeywords | [string] | A list of keywords in use. Any keywords defined in the vocabulary but not included in this list are not used for annotations. The includedKeywords and excludedKeywords fields are mutually exclusive and MUST NOT both be defined. If neither includedKeywords nor excludedKeywords is defined, all keywords in the vocabulary are used for annotations. |
| aliases | object | An object that maps keywords defined in the vocabulary to the aliases used in the schema to refer to them. |
| required | boolean | If true, consumers that do not recognize the vocabulary MUST refuse to process the schema. |

This annotation is OPTIONAL. If omitted, the default SAS Dialect is assumed.

Example-1: A dialect based on two vocabularies

JSON
{
  "sasDialect": {
    "https://www.dublincore.org/specifications/dublin-core/dcmi-terms": {"prefix":"dct.", "required":false},
    "https://bitol-io.github.io/open-data-contract-standard/v3.0.0/#quality": {"prefix":"bitol.","required":false}
  }
}

Example-2: Grouping all annotations within an object

JSON
{
  "sasDialect": {
    "https://www.dublincore.org/specifications/dublin-core/dcmi-terms": {
      "objectName": "annotations", 
      "prefix":"dct.", 
      "required":false
    },
    "https://bitol-io.github.io/open-data-contract-standard/v3.0.0/#quality": {
      "objectName": "annotations", 
      "prefix":"bitol.",
      "required":false
    }
  }
}

Example-3: Including and excluding specific keywords

JSON
{
  "sasDialect": {
    "https://www.dublincore.org/specifications/dublin-core/dcmi-terms": {
      "prefix":"dct.",
      "includedKeywords": [],
      "required":false
    },
    "https://bitol-io.github.io/open-data-contract-standard/v3.0.0/#quality": {
      "prefix":"bitol.",
      "excludedKeywords": [],
      "required":false
    }
  }
}

Example-4: Defining aliases

JSON
{
  "sasDialect": {
    "https://www.dublincore.org/specifications/dublin-core/dcmi-terms": {
      "prefix":"dct.",
      "excludedKeywords": ["Title"],
      "required":false
    },
    "https://bitol-io.github.io/open-data-contract-standard/v3.0.0/#schema": {
      "prefix":"bitol.",
      "aliases": {"logicalName": ["title"]},
      "required":false
    }
  }
}

Example-5: Avro Schema

JSON
{
  "name": "EmailMessage",
  "type": "record",

  "sasDialect": {
    "https://www.dublincore.org/specifications/dublin-core/dcmi-terms": {"required":false}
  },

  "Title": "email",
  "Description": "This schema describes the basic structure of an email",

  "fields": [
    {
      "name": "subject",
      "type": "string"
    },
    {
      "name": "message",
      "type": "string"
    }
  ]
}

Example-6: Protobuf

Protocol Buffer
syntax = "proto3";

// Import custom options from sas-annotations.proto
import "sas-annotations.proto";  

message EmailMessage {

  // Declare the dialect as a SasDialect Object encoded as a string
  option (sasDialect) = "{\"https://www.dublincore.org/specifications/dublin-core/dcmi-terms\": {\"required\":false}}";

  // Annotate the message with Dublin Core terms
  option (Title) = "email";
  option (Description) = "This schema describes the basic structure of an email";

  string subject = 1;
  string message = 2;
}

where sas-annotations.proto is defined as follows:

Protocol Buffer
syntax = "proto3";

import "google/protobuf/descriptor.proto";

// Define custom options for message annotations
extend google.protobuf.MessageOptions {
  string sasDialect = 50001;

  // Dublin Core Terms vocabulary
  string Title = 50002;
  string Description = 50003;
}
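
How a consumer might use a SasDialect Object to trace an annotation back to its vocabulary can be sketched in Python as follows. The function name resolve_keyword is hypothetical, and the sketch rests on several assumptions that the SAS does not state: aliases are matched verbatim (without the prefix), vocabularies are tried in the order in which they are listed, objectName grouping is ignored, and the vocabulary definitions themselves are not loaded, so the sketch cannot check that a keyword really exists in the vocabulary that claims it.

```python
def resolve_keyword(name: str, dialect: dict):
    """Return (vocabulary_uri, keyword) for an annotation name, or None if unclaimed."""
    for vocabulary_uri, options in dialect.items():
        # Aliases map a vocabulary keyword to the names used for it in the schema.
        for keyword, aliases in options.get("aliases", {}).items():
            if name in aliases:
                return vocabulary_uri, keyword
        prefix = options.get("prefix", "")
        if not name.startswith(prefix):
            continue
        keyword = name[len(prefix):]
        included = options.get("includedKeywords")
        excluded = options.get("excludedKeywords")
        if included is not None and keyword not in included:
            continue  # an inclusion list is present and does not mention the keyword
        if excluded is not None and keyword in excluded:
            continue  # the keyword is explicitly excluded
        return vocabulary_uri, keyword
    return None


DCT = "https://www.dublincore.org/specifications/dublin-core/dcmi-terms"
BITOL = "https://bitol-io.github.io/open-data-contract-standard/v3.0.0/#schema"
dialect = {
    DCT: {"prefix": "dct.", "excludedKeywords": ["Title"], "required": False},
    BITOL: {"prefix": "bitol.", "aliases": {"logicalName": ["title"]}, "required": False},
}
```

With the dialect of Example-4, the name title resolves to the logicalName keyword of the second vocabulary, while dct.Title is not claimed by any vocabulary because Title is excluded.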

Default SAS Dialect⚓︎

If sasDialect is absent, the consumer MAY assume that all annotation keywords are defined within one of the vocabularies associated with the default SAS Dialect, defined as follows:

JSON
{
  "sasDialect": {
    "https://json-schema.org/draft/2020-12/vocab/meta-data": {"required":false},
    "https://dpds.opendatamesh.org/specifications/sas/1.0.0-DRAFT/vocab/meta-data-logical": {"required":false},
    "https://dpds.opendatamesh.org/specifications/sas/1.0.0-DRAFT/vocab/meta-data-physical": {"required":false},
    "https://dpds.opendatamesh.org/specifications/sas/1.0.0-DRAFT/vocab/constraints": {"required":false},
    "https://json-schema.org/draft/2020-12/vocab/content": {"required":false},
    "https://json-schema.org/draft/2020-12/vocab/format-annotation": {"required":false},
    "https://dpds.opendatamesh.org/specifications/sas/1.0.0-DRAFT/vocab/context-syntactic": {"required":false},
    "https://bitol-io.github.io/open-data-contract-standard/v3.0.0/#quality": {"required":false}
  }
}

The vocabularies included in the default SAS Dialect are described in the following sections.

Basic Meta-Data Annotations Vocabulary⚓︎

This general-purpose annotation vocabulary, defined in the JSON Schema Validation extension, provides commonly used information for documentation and user interface display purposes. It is not intended to form a comprehensive set of features. Rather, additional vocabularies can be defined for more complex annotation-based applications.

The current URI for this vocabulary, known as the Meta-Data Vocabulary, is: https://json-schema.org/draft/2020-12/vocab/meta-data.

The current URI for the corresponding meta-schema is: https://json-schema.org/draft/2020-12/meta/meta-data.

The keywords that compose this vocabulary are listed below with a short description for convenience. The normative definitions, however, are contained in the JSON Schema Validation extension.

Keywords applicable to schema or properties⚓︎

title⚓︎

This description is provided here for convenience, but it is not normative. The normative description is defined in the original vocabulary and can be found here

The value of this keyword MUST be a string.

The title keyword specifies the human-readable name of the element that can be used by frontend tools. It SHOULD be a short text.

This keyword is equivalent to:

description⚓︎

This description is provided here for convenience, but it is not normative. The normative description is defined in the original vocabulary and can be found here

The value of this keyword MUST be a string.

The description keyword specifies the description of the element. CommonMark syntax MAY be used for rich text representation.

This keyword is equivalent to:

default⚓︎

This description is provided here for convenience, but it is not normative. The normative description is defined in the original vocabulary and can be found here

There are no restrictions placed on the value of this keyword.

The default keyword specifies the value that MAY be used for the element when it is not explicitly defined.

deprecated⚓︎

This description is provided here for convenience, but it is not normative. The normative description is defined in the original vocabulary and can be found here

The value of this keyword MUST be a boolean.

When set to true, the deprecated keyword specifies that consumers SHOULD avoid using the declared property, as it may be removed in the future.

A schema containing the deprecated keyword set to true indicates that the entire resource being described MAY be removed in the future.

Omitting this keyword has the same behavior as a value of false.

Keywords applicable only to properties⚓︎

readOnly⚓︎

This description is provided here for convenience, but it is not normative. The normative description is defined in the original vocabulary and can be found here

The value of this keyword MUST be a boolean.

When set to true, the readOnly keyword indicates that the value of the element is managed exclusively by the owner of the underlying datastore, and attempts by an application to modify the value of this element are expected to be ignored or rejected by that owner.

A schema element that is marked as readOnly MAY be ignored if sent to the owning authority, or MAY result in an error, at the datastore owner's discretion.

Example-1:

For example, readOnly MAY be used to mark a database-generated serial number as read-only.

JSON
{
  "properties": {
    "userId": {
      "type": "string",
      "readOnly": true
    },
    "username": {
      "type": "string"
    }
  }
}

writeOnly⚓︎

This description is provided here for convenience, but it is not normative. The normative description is defined in the original vocabulary and can be found here

The value of this keyword MUST be a boolean.

When set to true, the writeOnly keyword indicates that the value of the element is never present when data is retrieved from the source datastore. It can be present when sent to the owner of the underlying datastore to create a new item or update an existing one.

A schema element that is marked as writeOnly MAY be returned as a blank document of some sort, or MAY produce an error upon retrieval, or have the retrieval request ignored, at the datastore owner's discretion.

Example-1:

For example, writeOnly MAY be used to mark a property whose value is a password.

JSON
{
  "properties": {
    "username": {
      "type": "string"
    },
    "password": {
      "type": "string",
      "writeOnly": true
    }
  }
}
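
How a datastore owner might act on the readOnly and writeOnly annotations can be sketched in Python as follows. This shows only one possible policy, since the behavior is left to the datastore owner's discretion. The function names visible_on_read and read_only_violations are hypothetical, and the sketch assumes flat records whose keys match the property names of the schema.

```python
def visible_on_read(record: dict, schema: dict) -> dict:
    """Drop the properties marked writeOnly before returning a record to a reader."""
    properties = schema.get("properties", {})
    return {
        name: value
        for name, value in record.items()
        if properties.get(name, {}).get("writeOnly") is not True
    }


def read_only_violations(update: dict, schema: dict) -> list:
    """List the properties of an update that are marked readOnly in the schema."""
    properties = schema.get("properties", {})
    return sorted(
        name for name in update
        if properties.get(name, {}).get("readOnly") is True
    )


user_schema = {
    "properties": {
        "userId": {"type": "string", "readOnly": True},
        "username": {"type": "string"},
        "password": {"type": "string", "writeOnly": True},
    }
}
```

A datastore owner could then either ignore the properties returned by read_only_violations or reject the whole update.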

Logical Meta-Data Annotations Vocabulary⚓︎

The Logical Meta-Data Annotations Vocabulary, defined as an extension of the SAS, provides commonly used metadata for annotating schemas at a logical level. The keywords in this vocabulary do not include, and will not include in the future, any information about the underlying datastore or details for schema or data validation.

The current URI for this vocabulary, known as the Logical Meta-Data Annotations Vocabulary, is: https://dpds.opendatamesh.org/specifications/sas/1.0.0-DRAFT/vocab/meta-data-logical.

The current URI for the corresponding meta-schema is: https://dpds.opendatamesh.org/specifications/sas/1.0.0/meta/meta-data-logical.

Keywords applicable to schema or properties⚓︎

summary⚓︎

The value of this keyword MUST be a string.

The summary keyword specifies a short human readable description of the element. It SHOULD be used by frontend tools to visualize the item description in lists or tooltips where there is not enough space for using the full description.

Example-1:

JSON
{
  "title": "Leg",
  "summary": "The association between Transport Units and Trips",
  "description": "A **Leg** is the association between a `Transport Unit` and a specific `Trip`. It represents how individual `Transport Units` are moved during a particular segment of their journey. Multiple `Leg` entries can exist for a `Transport Unit` if it is moved across different `Trips` before reaching its final destination. Similarly, a single `Trip` can involve `Transport Units` from multiple `Transport Orders`."
}

modelRole⚓︎

The value of this keyword MUST be a string.

The modelRole keyword specifies the role of a schema element within its specific modelling style, helping to clarify its function in the context of the chosen model. For example, if the modellingStyle is set to starSchema, the schema might define an entity with a modelRole of either fact or dimension. In this case, the properties of the schema can have a modelRole of either attribute or measure. On the other hand, if the modellingStyle is set to rawDataVault, the schema might define an entity with a modelRole of hub, satellite, or link. This distinction helps clarify the purpose of each element within the overall model.

tags⚓︎

The value of this keyword MUST be an array of strings.

The tags keyword specifies a list of tags associated with the element.

This keyword is equivalent to:

externalDocs⚓︎

The value of this keyword MUST be an array of objects.

The externalDocs keyword specifies a list of additional documentation for the given element. Each item in the list is a pointer to a specific documentation source, described as follows:

| Field Name | Type | Description |
|---|---|---|
| description | string | A description of the target resource. CommonMark syntax MAY be used for rich text representation. |
| mediaType | string | The media type of the target resource. It must conform to the media type format, according to RFC6838. |
| $href | string:uri | REQUIRED. The URI of the target resource. It must conform to the URI format, according to RFC3986. |

Example-1:

JSON
"externalDocs": [{
  "description": "Find more info here",
  "mediaType": "text/html",
  "$href": "https://example.com"
}]

This keyword can be mapped to:

  • bitol.schema.primaryKey

Keywords applicable only to schema⚓︎

owner⚓︎

The value of this keyword MUST be a string.

The owner keyword specifies the identifier of the subject who owns the schema. It SHOULD be a person or a team. If the schema is not shared, it MUST be equal to the owner of the dataset upon which the schema is defined.

This keyword is equivalent to:

domain⚓︎

The value of this keyword MUST be a string.

The domain keyword specifies the domain to which the dataset described by the schema belongs. If the schema is not shared, it MUST be equal to the domain of the dataset upon which the schema is defined.

This keyword is equivalent to:

schemaType⚓︎

The value of this keyword MUST be a string.

The schemaType keyword specifies the structure of the data described by the schema. It indicates whether the data is organized in a tabular format, typical of relational databases (e.g., tables with rows and columns), or in a nested document format, common in document-oriented databases, streaming platforms, and RESTful APIs (e.g., JSON or XML with hierarchical relationships). The possible values for this property are:

| Value | Description |
|---|---|
| tabular | When the schema doesn't contain properties of type object (i.e., the schema describes a tabular document) |
| document | When the schema contains properties of type object (i.e., the schema describes a nested document) |

Example-1

JSON
{
  "title": "transportOrder",
  "schemaType": "tabular",
  "type": "object",
  "properties": {
    "orderId": {"type": "integer"},
    "customerName": {"type": "string"},
    "orderDate": {"type": "string"},
    "deliveryDate": {"type": "string"},
    "destination": {"type": "string"},
    "orderStatus": {"type": "string"}
  }
}

Example-2

JSON
{
  "title": "transportOrderDetail",
  "schemaType": "document",
  "type": "object",
  "properties": {
    "orderId": {"type": "integer"},
    "customerName": {"type": "string"},
    "orderDate": {"type": "string"},
    "deliveryDate": {"type": "string"},
    "destination": {"type": "string"},
    "orderStatus": {"type": "string"},
    "transportUnits": {
      "type": "array",
      "description": "List of transport units associated with the transport order.",
      "items": {
        "type": "object",
        "properties": {
          "unitId": {"type": "integer"},
          "unitDescription": {"type": "string"}
        }
      }
    }
  }
}
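
A tool could also derive the schemaType value from the schema itself, for instance to check that the annotation is consistent with the structure. The following Python sketch follows the rule given in the table of possible values: a schema is a document as soon as one of its properties is of type object, either directly or as the item type of an array. The function names contains_object and infer_schema_type are hypothetical, and the sketch assumes that type is always a single string.

```python
def contains_object(schema: dict) -> bool:
    """True if the schema is of type object or is an array that nests one."""
    if schema.get("type") == "object":
        return True
    items = schema.get("items")
    return isinstance(items, dict) and contains_object(items)


def infer_schema_type(schema: dict) -> str:
    """Return "document" if any property nests an object, "tabular" otherwise."""
    properties = schema.get("properties", {})
    nested = any(contains_object(prop) for prop in properties.values())
    return "document" if nested else "tabular"
```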

modellingStyle⚓︎

The value of this keyword MUST be a string.

The modellingStyle keyword specifies the data modelling approach or framework within which the structure of the entity described by this schema is defined. This could include frameworks like starSchema, rawDataVault, or unifiedStarSchema, each with its own set of rules for how data is organized, related, and queried. Understanding the modelling style is useful for better contextualizing the schema's structure and gaining a clearer understanding of the roles and relationships of other elements defined within it (see modelRole keyword).

contactPoints⚓︎

The value of this keyword MUST be an array of objects.

The contactPoints keyword specifies a list of contact information for the given schema. Each item in the list is a valid contact point, described as follows:

| Field Name | Type | Description |
|---|---|---|
| name | string:name | The name of the contact point. |
| description | string | The contact point description. CommonMark syntax MAY be used for rich text representation. |
| channel | string | The channel used to address the contact point. It can be for example equal to web, mail, or phone. |
| address | string | The address of the contact point. Depending on the channel it can be for example a URL, an email address, or a phone number. |

Example-1

JSON
{
  "contactPoints": [{
    "name": "Support Team Mail",
      "description": "The mail address of the team that gives support on this product",
      "channel": "email",
      "address": "trip-execution-support@company-xyz.com"
  }, {
      "name": "Issue Tracker",
      "description": "The address of the issue tracker associated with this product",
      "channel": "web",
      "address": "https://readmine.company-xyz.com/trip-execution"
    }
  ]
}

This keyword can be mapped to: - bitol.support-and-communication-channels

status⚓︎

The value of this keyword MUST be a string.

The status keyword specifies the state of the schema, which MAY be development, test, or production, depending on how the schema's lifecycle is defined.

This keyword is equivalent to:

Keywords applicable only to properties⚓︎

primaryKey⚓︎

The value of this keyword MUST be a boolean.

When set to true, the primaryKey keyword indicates that the property is part of the dataset's primary key.

The default value is false.

This keyword is equivalent to:

primaryKeyPosition⚓︎

The value of this keyword MUST be an integer.

When the property is part of the primary key, the primaryKeyPosition keyword specifies the position of the property within the primary key. Positions start from 1.

The default value is -1.

Example-1: Given the table TRIP with a composite primary key defined as PRIMARY KEY (UNIT_ID, TRIP_ID, LEG_SEQUENCE), the properties in the schema are annotated as follows:

JSON
  {
    "properties": {
      "UNIT_ID": {
        "type": "string",
        "primaryKey": true,
        "primaryKeyPosition": 1
      },
      "TRIP_ID": {
        "type": "string",
        "primaryKey": true,
        "primaryKeyPosition": 2
      },
      "LEG_SEQUENCE": {
        "type": "integer",
        "primaryKey": true,
        "primaryKeyPosition": 3
      }
    }
  }
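A consumer could rebuild the ordered primary key from these annotations. The following Python sketch is illustrative only; the function name primary_key_columns is hypothetical, and it relies on the defaults stated above (false for primaryKey, -1 for primaryKeyPosition).

```python
def primary_key_columns(schema: dict) -> list[str]:
    """Return the names of the primary key properties, ordered by position."""
    keyed = [
        (annotations.get("primaryKeyPosition", -1), name)
        for name, annotations in schema.get("properties", {}).items()
        if annotations.get("primaryKey", False)
    ]
    return [name for _, name in sorted(keyed)]


# Properties are deliberately listed out of key order.
trip = {
    "properties": {
        "LEG_SEQUENCE": {"type": "integer", "primaryKey": True, "primaryKeyPosition": 3},
        "UNIT_ID": {"type": "string", "primaryKey": True, "primaryKeyPosition": 1},
        "TRIP_ID": {"type": "string", "primaryKey": True, "primaryKeyPosition": 2},
        "NOTES": {"type": "string"},
    }
}
print(primary_key_columns(trip))
```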

This keyword is equivalent to:

unique⚓︎

The value of this keyword MUST be a boolean.

When set to true, the unique keyword indicates that no two distinct entries in the dataset can have the same value for this property.

The default value is false.

Example-1:

JSON
{
  "title": "TransportOrder",
  "type": "object",
  "properties": {
    "orderId": {"type": "integer", "unique": true},
    "customerName": {"type": "string"}
  }
}

This keyword is equivalent to:

nullable⚓︎

The value of this keyword MUST be a boolean.

The nullable keyword specifies whether the property value can be null. The value of a property is considered null if the property is not defined or if its value is:

  • undefined
  • contained in the list of values specified by the nullValuesEnum keyword
  • matched by the regular expression specified by the nullValuesPattern keyword

The default value is true.

Example-1:

JSON
{
  "title": "TransportOrder",
  "type": "object",
  "properties": {
    "orderId": {"type": "integer"},
    "customerName": {"type": "string", "nullable": false},
    "orderDate": {"type": "string"},
    "deliveryDate": {"type": "string"},
    "destination": {"type": "string"},
    "orderStatus": {"type": "string"}
  }
}

This keyword is the opposite of: - bitol.schema.required

nullValuesEnum⚓︎

The value of this keyword MUST be an array whose item type is equal to the type of the property.

The nullValuesEnum keyword specifies a list of values for the property that are considered null.

The default value is [].

Example-1:

JSON
{
  "title": "TransportOrder",
  "type": "object",
  "properties": {
    "orderId": {"type": "integer"},
    "customerName": {
      "type": "string",
      "nullable": false,
      "nullValuesEnum": ["UNKNOWN", "TBD", "NA"]
    }
  }
}

nullValuesPattern⚓︎

The value of this keyword MUST be a string. This string SHOULD be a valid regular expression, according to the ECMA-262 regular expression dialect.

The nullValuesPattern keyword specifies a regular expression; property values that match it are considered null.

The default value is the empty string.

Example-1:

JSON
{
  "title": "TransportOrder",

  "type": "object",
  "properties": {
    "orderId": { "type": "integer"},
    "customerName": {
      "type": "string", 
      "nullable": false,
      "nullValuesPattern": "^\\s*(UNKNOWN|TBD|NA)?\\s*$"
    }
  }
}
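The null-detection rules described for nullable, nullValuesEnum, and nullValuesPattern can be sketched as follows. This is a minimal illustration, not a reference implementation: the function name is_null is hypothetical, Python's re module stands in for the ECMA-262 dialect (the two differ in some details), and an empty nullValuesPattern is treated as "no pattern", which is an assumption.

```python
import re


def is_null(value, annotations: dict) -> bool:
    """Decide whether a property value counts as null under the annotations."""
    if value is None:  # undefined or missing value
        return True
    if value in annotations.get("nullValuesEnum", []):
        return True
    pattern = annotations.get("nullValuesPattern", "")
    # Assumption: the empty default pattern means that no pattern is defined.
    if pattern and isinstance(value, str):
        return re.search(pattern, value) is not None
    return False


customer_name = {
    "type": "string",
    "nullable": False,
    "nullValuesEnum": ["UNKNOWN", "TBD", "NA"],
    "nullValuesPattern": r"^\s*(UNKNOWN|TBD|NA)?\s*$",
}
for candidate in ["ACME Corp", "TBD", "  NA ", "", None]:
    print(repr(candidate), is_null(candidate, customer_name))
```

With nullable set to false, a validator would report every entry for which is_null returns true.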

validityTime⚓︎

The value of this keyword MUST be a boolean.

When set to true, the validityTime keyword indicates that this property value represents when a fact is true in the real world. For example, a dataset's entry showing the employment status of an employee might have a valid time indicating when that employment status was valid in reality.

The default value is false.

creationTime⚓︎

The value of this keyword MUST be a boolean.

When set to true, the creationTime keyword indicates that this property value represents when a fact is recorded in the system. It's the transaction time of the entry's creation.

The default value is false.

lastUpdateTime⚓︎

The value of this keyword MUST be a boolean.

When set to true, the lastUpdateTime keyword indicates that this property value represents when a fact is recorded or updated in the system. It's the transaction time of the entry's last update.

The default value is false.

deletionTime⚓︎

The value of this keyword MUST be a boolean.

When set to true, the deletionTime keyword indicates that this property value represents when a fact is soft deleted from the system. It's the transaction time of the entry's deletion.

The default value is false.

sequenceKey⚓︎

The value of this keyword MUST be a boolean.

When set to true, the sequenceKey keyword indicates that this property can be used to order the dataset's entries from the oldest to the most recent, based on creation transaction time. The property is updated with each entry's change and is typically a timestamp or an incremental key.

The default value is true if the creationTime keyword is set to true for this property, false otherwise.

watermarkKey⚓︎

The value of this keyword MUST be a boolean.

When set to true, the watermarkKey keyword indicates that this property can be used to order the dataset's entries from the oldest to the most recent, based on the last update transaction time. The property is updated with each entry's change and is typically a timestamp or an incremental key.

The default value is true if the lastUpdateTime keyword is set to true for this property, false otherwise.
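The conditional defaults of sequenceKey and watermarkKey can be resolved as in the following Python sketch. The helper names are hypothetical; the logic only restates the default rules given above, where an explicit value always wins over the derived default.

```python
def is_sequence_key(annotations: dict) -> bool:
    """Explicit sequenceKey wins; otherwise fall back to creationTime."""
    if "sequenceKey" in annotations:
        return bool(annotations["sequenceKey"])
    return bool(annotations.get("creationTime", False))


def is_watermark_key(annotations: dict) -> bool:
    """Explicit watermarkKey wins; otherwise fall back to lastUpdateTime."""
    if "watermarkKey" in annotations:
        return bool(annotations["watermarkKey"])
    return bool(annotations.get("lastUpdateTime", False))


created_at = {"type": "string", "creationTime": True}
updated_at = {"type": "string", "lastUpdateTime": True}
opted_out = {"type": "string", "lastUpdateTime": True, "watermarkKey": False}
print(is_sequence_key(created_at), is_watermark_key(updated_at), is_watermark_key(opted_out))
```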

Physical Meta-Data Annotations Vocabulary⚓︎

The Physical Metadata Annotation Vocabulary, defined as an extension of the SAS, provides commonly used metadata for annotating schemas at a physical level. Exposing this information is sometimes necessary to enable the consumption of data from the underlying datastore. When it is not necessary, this information SHOULD NOT be exposed, so that internal implementation details remain hidden from the consumer.

The current URI for this vocabulary, known as the Physical Metadata Annotation Vocabulary, is: https://dpds.opendatamesh.org/specifications/sas/1.0.0-DRAFT/vocab/meta-data-physical.

The current URI for the corresponding meta-schema is: https://dpds.opendatamesh.org/specifications/sas/1.0.0/meta/meta-data-physical

Keywords applicable to schema or properties⚓︎

physicalName⚓︎

The value of this keyword MUST be a string.

The physicalName keyword specifies the name of the element in the source datastore.

Example-1:

JSON
  {
    "physicalName": "TRIP",
    "properties": {
      "tripId": {
        "type": "string",
        "physicalName": "TRIP_ID"
      }
    }
  }
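One possible use of physicalName is to translate logical property names into a query against the underlying datastore. The Python sketch below is an illustration under stated assumptions: the function name select_statement is hypothetical, the schema-level physicalName is taken as the table name, and a property without physicalName falls back to its logical name. Identifier quoting is omitted for brevity.

```python
def select_statement(schema: dict) -> str:
    """Build a SELECT that aliases physical column names to logical names."""
    # Assumption: fall back to the schema title when physicalName is absent.
    table = schema.get("physicalName", schema.get("title", "UNKNOWN"))
    columns = [
        f'{annotations.get("physicalName", name)} AS {name}'
        for name, annotations in schema.get("properties", {}).items()
    ]
    return f"SELECT {', '.join(columns)} FROM {table}"


trip_schema = {
    "physicalName": "TRIP",
    "properties": {
        "tripId": {"type": "string", "physicalName": "TRIP_ID"},
        "status": {"type": "string"},
    },
}
print(select_statement(trip_schema))
```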

This keyword is equivalent to:

physicalType⚓︎

The value of this keyword MUST be a string.

The physicalType keyword specifies the type of the element in the source datastore. For the schema and for properties of type object, it MAY be, for example, TABLE or VIEW. For properties of any other type, it MAY be, for example, VARCHAR or TINYINT.

Example-1:

JSON
  {
    "physicalType": "TABLE",
    "properties": {
      "tripId": {
        "type": "string",
        "physicalType": "INT"
      }
    }
  }
This keyword is equivalent to:

Keywords applicable only to properties⚓︎

partitionKey⚓︎

The value of this keyword MUST be a boolean.

When set to true, the partitionKey keyword indicates that the property is part of the dataset's partition key.

The default value is false.

This keyword is equivalent to:

partitionKeyPosition⚓︎

The value of this keyword MUST be an integer.

When the property is part of the partition key, the partitionKeyPosition keyword specifies the position of the property within the partition key. Positions start from 1.

The default value is -1.

This keyword is equivalent to:

Constraint Annotations Vocabulary⚓︎

The Constraint Annotations Vocabulary, defined as an extension of the SAS, provides commonly used metadata for annotating schemas at a physical level. Exposing this information is sometimes necessary to enable the consumption of data from the underlying datastore. When it is not necessary, this information SHOULD NOT be exposed, so that internal implementation details remain hidden from the consumer.

The current URI for this vocabulary, known as the Constraint Annotations Vocabulary, is: https://dpds.opendatamesh.org/specifications/sas/1.0.0-DRAFT/vocab/constraints.

The current URI for the corresponding meta-schema is: https://dpds.opendatamesh.org/specifications/sas/1.0.0/meta/constraint.

Keywords applicable to any properties⚓︎

enum⚓︎

The value of this keyword MUST be an array. This array SHOULD have at least one element. Elements in the array SHOULD be unique.

The enum keyword specifies the admissible values for the property.

There is no default value.

Example-1:

JSON
{
  "properties": {
    "weekDay": {
      "type": "string",
      "enum": [
        "Monday",
        "Tuesday",
        "Wednesday",
        "Thursday",
        "Friday",
        "Saturday",
        "Sunday"
      ]
    }
  }
}

This keyword is equivalent to: - json-schema-validation.enum

const⚓︎

The value of this keyword MAY be of any type.

The const keyword defines the only value the property is allowed to have. Use of this keyword is functionally equivalent to an enum with a single value.

There is no default value.

Example-1:

JSON
{
  "properties": {
    "pi": {
      "type": "number",
      "const": 3.14
    }
  }
}

This keyword is equivalent to: - json-schema-validation.const

Keywords applicable to numeric properties⚓︎

multipleOf⚓︎

The value of this keyword MUST be a number, strictly greater than 0.

The multipleOf keyword specifies that the property value is a multiple of the keyword value. When the property value is divided by the keyword value, the expected result is an integer.

There is no default value.

Example-1:

JSON
{
  "properties": {
    "randomEvenNumber": {
      "type": "number",
      "multipleOf": 2
    }
  }
}

This keyword is equivalent to:

maximum⚓︎

The value of this keyword MUST be a number.

The maximum keyword specifies an inclusive upper limit for the property value.

There is no default value.

Example-1:

JSON
{
  "properties": {
    "monthNumber": {
      "type": "integer",
      "maximum": 12
    }
  }
}

This keyword is equivalent to:

exclusiveMaximum⚓︎

The value of this keyword MUST be a number.

The exclusiveMaximum keyword specifies an exclusive upper limit for the property value.

There is no default value.

Example-1:

JSON
{
  "properties": {
    "speed": {
      "type": "number",
      "exclusiveMaximum": 186282
    }
  }
}

This keyword is equivalent to:

minimum⚓︎

The value of this keyword MUST be a number.

The minimum keyword specifies an inclusive lower limit for the property value.

There is no default value.

Example-1:

JSON
{
  "properties": {
    "monthNumber": {
      "type": "integer",
      "minimum": 1
    }
  }
}

This keyword is equivalent to:

exclusiveMinimum⚓︎

The value of this keyword MUST be a number.

The exclusiveMinimum keyword specifies an exclusive lower limit for the property value.

There is no default value.

Example-1:

JSON
{
  "properties": {
    "age": {
      "type": "integer",
      "exclusiveMinimum": 0
    }
  }
}
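The numeric keywords of this section (multipleOf, maximum, exclusiveMaximum, minimum, exclusiveMinimum) can be checked together. The following Python sketch is illustrative only: the function name numeric_violations is hypothetical, and the tolerance used for the multipleOf test on floating-point values is an implementation choice, not something the specification prescribes.

```python
import math


def numeric_violations(value, annotations: dict) -> list[str]:
    """Return the names of the numeric keywords that the value violates."""
    violations = []
    if "multipleOf" in annotations:
        quotient = value / annotations["multipleOf"]
        # The quotient must be an integer; allow for floating-point rounding.
        if not math.isclose(quotient, round(quotient), abs_tol=1e-9):
            violations.append("multipleOf")
    if "maximum" in annotations and value > annotations["maximum"]:
        violations.append("maximum")
    if "exclusiveMaximum" in annotations and value >= annotations["exclusiveMaximum"]:
        violations.append("exclusiveMaximum")
    if "minimum" in annotations and value < annotations["minimum"]:
        violations.append("minimum")
    if "exclusiveMinimum" in annotations and value <= annotations["exclusiveMinimum"]:
        violations.append("exclusiveMinimum")
    return violations


month_number = {"type": "integer", "minimum": 1, "maximum": 12}
print(numeric_violations(12, month_number), numeric_violations(13, month_number))
```

The inclusive limits accept the boundary value itself, while the exclusive limits reject it.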

This keyword is equivalent to:

Keywords applicable to string properties⚓︎

maxLength⚓︎

The value of this keyword MUST be a non-negative integer.

The maxLength keyword specifies the maximum number of characters (i.e. string length) that make up the property value, as defined by RFC 8259.

There is no default value.

Example-1:

JSON
{
  "properties": {
    "email": {
      "type": "string",
      "maxLength": 254
    }
  }
}

This keyword is equivalent to:

minLength⚓︎

The value of this keyword MUST be a non-negative integer.

The minLength keyword specifies the minimum number of characters (i.e. string length) that make up the property value, as defined by RFC 8259.

There is no default value.

Example-1:

JSON
{
  "properties": {
    "email": {
      "type": "string",
      "minLength": 3
    }
  }
}

This keyword is equivalent to:

pattern⚓︎

The value of this keyword MUST be a string. This string SHOULD be a valid regular expression, according to the ECMA-262 regular expression dialect.

The pattern keyword specifies the regular expression matched by the property value.

There is no default value.

Example-1:

JSON
{
  "properties": {
    "email": {
      "type": "string",
      "pattern": "^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\\.[a-zA-Z]{2,}$"
    }
  }
}
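The three string keywords can be combined in a single check. The Python sketch below is illustrative: the function name string_violations is hypothetical, len() counts Unicode code points (which is how string length is taken here), and Python's re module is used as a stand-in for the ECMA-262 dialect.

```python
import re


def string_violations(value: str, annotations: dict) -> list[str]:
    """Return the names of the string keywords that the value violates."""
    violations = []
    length = len(value)  # number of Unicode code points
    if "maxLength" in annotations and length > annotations["maxLength"]:
        violations.append("maxLength")
    if "minLength" in annotations and length < annotations["minLength"]:
        violations.append("minLength")
    if "pattern" in annotations and re.search(annotations["pattern"], value) is None:
        violations.append("pattern")
    return violations


email = {
    "type": "string",
    "minLength": 3,
    "maxLength": 254,
    "pattern": r"^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$",
}
print(string_violations("user@example.com", email), string_violations("a@", email))
```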

This keyword is equivalent to:

Keywords applicable to array properties⚓︎

maxItems⚓︎

The value of this keyword MUST be a non-negative integer.

The maxItems keyword specifies an inclusive upper limit for the number of items in the array property.

There is no default value.

Example-1:

JSON
{
  "properties": {
    "spokenLanguages": {
      "type": "array",
      "maxItems": 10,
      "items": {"type": "string"}
    }
  }
}

This keyword is equivalent to:

minItems⚓︎

The value of this keyword MUST be a non-negative integer.

The minItems keyword specifies an inclusive lower limit for the number of items in the array property.

There is no default value.

Example-1:

JSON
{
  "properties": {
    "spokenLanguages": {
      "type": "array",
      "minItems": 1,
      "items": {"type": "string"}
    }
  }
}
This keyword is equivalent to:

Keywords applicable to object properties⚓︎

maxProperties⚓︎

The value of this keyword MUST be a non-negative integer.

The maxProperties keyword sets an inclusive upper limit on the number of properties an object property can have.

There is no default value.

Example-1:

The following example specifies that no more than 3 environments can be defined within the given schema.

JSON
{
  "type": "object",
  "patternProperties": {
    "^env-[a-zA-Z0-9]+$": {
      "type": "string",
      "description": "Connection string for the specified environment."
    }
  },
  "maxProperties": 3,
  "additionalProperties": false
}

This keyword is equivalent to:

minProperties⚓︎

The value of this keyword MUST be a non-negative integer.

The minProperties keyword sets an inclusive lower limit on the number of properties an object property should have.

There is no default value.

Example-1:

The following example specifies that at least one environment should be defined within the given schema.

JSON
{
  "type": "object",
  "patternProperties": {
    "^env-[a-zA-Z0-9]+$": {
      "type": "string",
      "description": "Connection string for the specified environment."
    }
  },
  "minProperties": 1,
  "additionalProperties": false
}

This keyword is equivalent to:

required⚓︎

The value of this keyword MUST be an array. Elements of this array, if any, MUST be strings, and MUST be unique.

The required keyword specifies the properties that are required to exist in the object property.

The default value is [].

Example-1:

JSON
{
  "type": "object",
  "properties": {
    "phone": {
      "type": "string",
      "pattern": "^[0-9]{10}$"
    },
    "email": {
      "type": "string",
      "format": "email"
    }
  },
  "required": ["email"]
}

This keyword is equivalent to:

dependentRequired⚓︎

The value of this keyword MUST be an object. Properties in this object, if any, MUST be arrays. Elements in each array, if any, MUST be strings, and MUST be unique.

The dependentRequired keyword specifies properties that are required if a specific other property is present. Their requirement is dependent on the presence of the other property.

The default value is {}.

Example-1:

JSON
{
  "type": "object",
  "properties": {
    "hasPhone": {
      "type": "boolean"
    },
    "phone": {
      "type": "string",
      "pattern": "^[0-9]{10}$"
    },
    "email": {
      "type": "string",
      "format": "email"
    }
  },
  "dependentRequired": {
    "hasPhone": ["phone"]
  },
  "required": ["email"]
}
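The interaction between required and dependentRequired can be sketched in Python. The function name missing_properties is hypothetical. The check is based on the presence of the triggering property, not on its value, so an entry with hasPhone set to false still requires phone.

```python
def missing_properties(entry: dict, schema: dict) -> list[str]:
    """List required properties that are absent from the entry."""
    missing = [name for name in schema.get("required", []) if name not in entry]
    for trigger, dependents in schema.get("dependentRequired", {}).items():
        if trigger in entry:  # presence matters, not the value
            missing += [
                name for name in dependents
                if name not in entry and name not in missing
            ]
    return missing


contact_schema = {
    "type": "object",
    "dependentRequired": {"hasPhone": ["phone"]},
    "required": ["email"],
}
print(missing_properties({"email": "user@example.com"}, contact_schema))
print(missing_properties({"email": "user@example.com", "hasPhone": True}, contact_schema))
print(missing_properties({"hasPhone": False}, contact_schema))
```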

This keyword is equivalent to:

String-encoded Content Vocabulary⚓︎

This vocabulary, defined in the JSON Schema Validation specification, provides keywords to describe the type of content of a string property, how it is encoded, and/or how it may be validated. They do not function as validation assertions.

The current URI for this vocabulary, known as the Content vocabulary, is: https://json-schema.org/draft/2020-12/vocab/content.

The current URI for the corresponding meta-schema is: https://json-schema.org/draft/2020-12/meta/content.

Keywords applicable to string properties⚓︎

contentEncoding⚓︎

This description is provided here for convenience, but it is not normative. The normative description is defined in the original vocabulary and can be found here

The value of this keyword MUST be a string.

The contentEncoding keyword indicates that the property value represents encoded binary data and should be decoded using the specified encoding.

Possible values indicating base 16, 32, and 64 encodings with several variations are listed in RFC4648. Additionally, sections 6.7 and 6.8 of RFC2045 provide encodings used in MIME. This keyword is derived from MIME's Content-Transfer-Encoding header, which was designed to map binary data into ASCII characters. It is not related to HTTP's Content-Encoding header, which is used to encode (e.g. compress or encrypt) the content of HTTP requests and responses.

As "base64" is defined in both RFCs, the definition from RFC 4648 SHOULD be assumed unless the string is specifically intended for use in a MIME context. Note that all of these encodings result in strings consisting only of 7-bit ASCII characters. Therefore, this keyword has no meaning for strings containing characters outside of that range.

If this keyword is absent, but contentMediaType is present, this indicates that the encoding is the identity encoding, meaning that no transformation was needed in order to represent the content in a UTF-8 string.

Example-1: Using contentEncoding with Base64 encoding. This example demonstrates a property whose value is a binary file encoded in base64.

JSON
{
  "type": "object",
  "properties": {
    "profilePicture": {
      "type": "string",
      "contentEncoding": "base64",
      "description": "The profile picture encoded in base64 format."
    }
  },
  "required": ["profilePicture"]
}

Example-2: Using contentEncoding with UTF-8 encoding. This example demonstrates a property whose value is a text note encoded in UTF-8.

JSON
{
  "type": "object",
  "properties": {
    "notes": {
      "type": "string",
      "contentEncoding": "utf-8",
      "description": "A text note encoded in UTF-8 format."
    }
  }
}

Example-3: Using contentEncoding with GZIP encoding. This example demonstrates a property whose value is a binary file compressed using GZIP.

JSON
{
  "type": "object",
  "properties": {
    "compressedData": {
      "type": "string",
      "contentEncoding": "gzip",
      "description": "The compressed data encoded using GZIP."
    }
  }
}

Example-4: Using contentEncoding with identity encoding. This example demonstrates a case where no encoding is applied, meaning the data is in its raw form.

JSON
{
  "type": "object",
  "properties": {
    "rawData": {
      "type": "string",
      "contentEncoding": "identity",
      "description": "Raw data with no encoding applied."
    }
  }
}

contentMediaType⚓︎

This description is provided here for convenience, but it is not normative. The normative description is defined in the original vocabulary and can be found here

The value of this keyword MUST be a string, which MUST be a media type, as defined by RFC2046.

The contentMediaType keyword indicates the media type of the string property value.

Example-1

JSON
{
  "type": "object",
  "properties": {
    "profilePicture": {
      "type": "string",
      "contentEncoding": "base64",
      "contentMediaType": "image/jpeg",
      "description": "A profile picture encoded in base64 and in JPEG format."
    },
    "resume": {
      "type": "string",
      "contentEncoding": "base64",
      "contentMediaType": "application/pdf",
      "description": "A resume encoded in base64 and in PDF format."
    },
    "notes": {
      "type": "string",
      "contentEncoding": "utf-8",
      "contentMediaType": "text/plain",
      "description": "Textual notes encoded in UTF-8 and in plain text format."
    }
  }
}
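A consumer that honours contentEncoding has to turn the string back into bytes before interpreting it according to contentMediaType. The following Python sketch is illustrative only: the function name decode_content is hypothetical, only the RFC 4648 base 16, 32, and 64 encodings are handled, and a missing contentEncoding is treated as the identity encoding of a UTF-8 string.

```python
import base64

# Decoders for the RFC 4648 encodings supported by this sketch.
_DECODERS = {
    "base64": lambda text: base64.b64decode(text, validate=True),
    "base32": base64.b32decode,
    "base16": base64.b16decode,
}


def decode_content(value: str, annotations: dict) -> bytes:
    """Return the raw bytes represented by an annotated string value."""
    encoding = annotations.get("contentEncoding")
    if encoding is None:
        # Identity encoding: the content is the UTF-8 string itself.
        return value.encode("utf-8")
    if encoding not in _DECODERS:
        raise ValueError(f"unsupported contentEncoding: {encoding}")
    return _DECODERS[encoding](value)


picture = {"type": "string", "contentEncoding": "base64", "contentMediaType": "image/jpeg"}
notes = {"type": "string", "contentMediaType": "text/plain"}
print(decode_content("aGVsbG8=", picture), decode_content("hello", notes))
```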

contentSchema⚓︎

This description is provided here for convenience, but it is not normative. The normative description is defined in the original vocabulary and can be found here

The value of this keyword MUST be a valid JSON Schema.

If contentMediaType is present, the contentSchema keyword contains a schema that describes the structure of the string. This keyword MAY be used with any media type that can be mapped into JSON Schema's data model. It SHOULD be ignored if contentMediaType is not present.

Example-1: This example describes a JWT that is MACed using the HMAC SHA-256 algorithm and requires the "iss" and "exp" fields in its claim set.

JSON
{
  "type": "object",
  "properties": {
    "jwt": {
      "type": "string",
      "contentMediaType": "application/jwt",
      "contentSchema": {
        "type": "array",
        "minItems": 2,
        "prefixItems": [
          {
            "const": {
              "typ": "JWT",
              "alg": "HS256"
            }
          },
          {
            "type": "object",
            "required": ["iss", "exp"],
            "properties": {
              "iss": {"type": "string"},
              "exp": {"type": "integer"}
            }
          }
        ]
      }
    }
  }
}

This is a valid entry encoded as a JSON Document...

JSON
{
  "jwt": "[{\"typ\":\"JWT\",\"alg\":\"HS256\"},{\"iss\":\"exampleIssuer\",\"exp\":1710000000}]"
}

Format Annotation Vocabulary⚓︎

This vocabulary, defined in the JSON Schema Validation specification, describes the format keyword, which allows schema authors to convey syntactic information for a fixed subset of values that are accurately described by authoritative resources, be they RFCs or other external specifications.

The current URI for this vocabulary, known as the Format-Annotation vocabulary, is: https://json-schema.org/draft/2020-12/vocab/format-annotation.

The current URI for the corresponding meta-schema is: https://json-schema.org/draft/2020-12/meta/format-annotation.

Keywords applicable to properties⚓︎

format⚓︎

This description is provided here for convenience, but it is not normative. The normative description is defined in the original vocabulary and can be found here

The value of this keyword MUST be a string.

JSON Schema provides the following formats to specify more precisely the content of properties of type string:

type format Description Example
string date-time A full date and time with time zone as defined by the date-time ABNF rule in RFC 3339, section 5.6. 2025-01-05T15:30:00+01:00
string date A full date as defined by the full-date ABNF rule in RFC 3339, section 5.6. 2025-01-05
string time A full time with time zone as defined by the full-time ABNF rule in RFC 3339, section 5.6. 15:30:00+01:00
string duration A time span as defined by the duration ABNF rule in RFC 3339, appendix A. P1Y2M3DT4H5M6S
string email An email address as defined by the Mailbox ABNF rule in RFC 5321, section 4.1.2. user@example.com
string idn-email An email address as defined by the extended Mailbox ABNF rule in RFC 6531, section 3.3. 用户@例子.公司
string hostname A hostname as defined by RFC 1123, section 2.1, including host names produced using the Punycode algorithm specified in RFC 5891, section 4.4. www.example-domain.com
string ipv4 An IPv4 address according to the dotted-quad ABNF syntax as defined in RFC 2673, section 3.2. 192.168.1.1
string ipv6 An IPv6 address as defined in RFC 4291, section 2.2. 2001:0db8:85a3:0000:0000:8a2e:0370:7334
string uri A valid URI as defined in RFC3986. https://www.example.com/resource
string uri-reference A valid URI Reference (either a URI or a relative-reference) as defined in RFC3986. /relative/path or https://example.com
string iri A valid IRI as defined in RFC3987. https://example.com/path/to/resource?query=hello#section1
string uuid A valid string representation of a UUID as defined in RFC4122. 123e4567-e89b-12d3-a456-426614174000
string uri-template A valid URI Template (of any level) as defined in RFC6570. https://api.example.com/users/{userId}/posts{?limit,offset}
string json-pointer A valid JSON string representation of a JSON Pointer as defined in RFC6901, section 5. /users/0/name
string relative-json-pointer A valid Relative JSON Pointer. 1/users/0/name
string regex A valid regular expression according to the ECMA-262 regular expression dialect. ^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$

This specification introduces the following additional formats for commonly used string values, so you don't have to repeatedly define a regular expression with the pattern keyword when annotating properties.

type format Description Example
string alphanumeric a string that matches the following regex ^[a-zA-Z0-9]+$ abc123
string name a string that matches the following regex ^[a-zA-Z][a-zA-Z0-9]+$ John123
string fqn a string that matches the following regex ^[a-zA-Z][a-zA-Z0-9.:]+$ com.example:MyApp
string version a string that matches the following regex ^(0|[1-9]\d*)\.(0|[1-9]\d*)\.(0|[1-9]\d*)(?:-((?:0|[1-9]\d*|\d*[a-zA-Z-][0-9a-zA-Z-]*)(?:\.(?:0|[1-9]\d*|\d*[a-zA-Z-][0-9a-zA-Z-]*))*))?(?:\+([0-9a-zA-Z-]+(?:\.[0-9a-zA-Z-]+)*))?$ 3.2.1
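The additional string formats can be checked with a small lookup table of patterns. The Python sketch below is illustrative: the names FORMAT_PATTERNS and matches_format are hypothetical, Python's re module stands in for the ECMA-262 dialect, and the version pattern is the regular expression suggested on semver.org, which is an assumption about what the version row is meant to contain.

```python
import re

FORMAT_PATTERNS = {
    "alphanumeric": r"^[a-zA-Z0-9]+$",
    "name": r"^[a-zA-Z][a-zA-Z0-9]+$",
    "fqn": r"^[a-zA-Z][a-zA-Z0-9.:]+$",
    # Assumption: the pattern suggested on semver.org (numbered groups variant).
    "version": (
        r"^(0|[1-9]\d*)\.(0|[1-9]\d*)\.(0|[1-9]\d*)"
        r"(?:-((?:0|[1-9]\d*|\d*[a-zA-Z-][0-9a-zA-Z-]*)"
        r"(?:\.(?:0|[1-9]\d*|\d*[a-zA-Z-][0-9a-zA-Z-]*))*))?"
        r"(?:\+([0-9a-zA-Z-]+(?:\.[0-9a-zA-Z-]+)*))?$"
    ),
}


def matches_format(value: str, format_name: str) -> bool:
    """Check a string value against one of the additional string formats."""
    return re.search(FORMAT_PATTERNS[format_name], value) is not None


for value, format_name in [("abc123", "alphanumeric"), ("John123", "name"),
                           ("com.example:MyApp", "fqn"), ("1.0.0-DRAFT", "version")]:
    print(value, format_name, matches_format(value, format_name))
```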

This specification also defines the following additional formats for numeric values to provide guidance to the consumer on how to interpret and store them most efficiently.

type format Description Example
number int8 signed 8 bits (a.k.a byte) 127
number int16 signed 16 bits (a.k.a short) -32768
number int32 signed 32 bits (a.k.a integer) 2147483647
number int64 signed 64 bits (a.k.a long) 9223372036854775807
number float single-precision 32-bit IEEE 754 floating point 3.14
number double double-precision 64-bit IEEE 754 floating point 2.718281828459045
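For the integer formats, a consumer can verify that a value fits the declared width before choosing a storage type. The following Python sketch is illustrative; the names INTEGER_BITS and fits_integer_format are hypothetical, and the ranges are those of two's complement signed integers.

```python
INTEGER_BITS = {"int8": 8, "int16": 16, "int32": 32, "int64": 64}


def fits_integer_format(value: int, format_name: str) -> bool:
    """Check that a value is within the signed range of the given format."""
    bits = INTEGER_BITS[format_name]
    lower = -(2 ** (bits - 1))
    upper = 2 ** (bits - 1) - 1
    return lower <= value <= upper


print(fits_integer_format(127, "int8"), fits_integer_format(128, "int8"))
```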

As suggested in the Format Annotation Vocabulary, this specification does not define any formats beyond those listed above, and other vocabularies SHOULD NOT do so either.

Vocabularies do not support specifically declaring different value sets for keywords. Due to this limitation, and the historically uneven implementation of this keyword, it is RECOMMENDED to define additional keywords in a custom vocabulary rather than additional format attributes if interoperability is desired.

To convey syntactic information for a fixed subset of values which are accurately described by authoritative resources, be they RFCs or other external specifications, the keyword conformsTo SHOULD be used in place of extending the value sets for the format keyword.

Syntactic Context Annotations Vocabulary⚓︎

The Syntactic Context Annotations Vocabulary, defined as an extension of the SAS, provides commonly used metadata to syntactically contextualize schema elements.

The current URI for this vocabulary, known as the Syntactic Context Annotations Vocabulary, is: https://dpds.opendatamesh.org/specifications/sas/1.0.0-DRAFT/vocab/context-syntactic.

The current URI for the corresponding meta-schema is: https://dpds.opendatamesh.org/specifications/sas/1.0.0/meta/context-syntactic.

Keywords applicable only to properties⚓︎

conformsTo⚓︎

The value of this keyword MUST be a string.

The conformsTo keyword specifies an established standard to which the described resource conforms.

This keyword is equivalent to:

Example-1

JSON
{
  "type": "object",
  "properties": {
    "countryCode": {
      "type": "string",
      "maxLength": 2,
      "conformsTo": "ISO3166-1"
    }
  }
}

Semantic Context Annotations Vocabulary⚓︎

The Semantic Context Annotations Vocabulary, defined as an extension of the SAS, provides commonly used metadata to semantically contextualize schema elements.

The current URI for this vocabulary, known as the Semantic Context Annotations Vocabulary, is: https://dpds.opendatamesh.org/specifications/sas/1.0.0-DRAFT/vocab/context-semantic.

The current URI for the corresponding meta-schema is: https://dpds.opendatamesh.org/specifications/sas/1.0.0/meta/context-semantic.

Keywords applicable only to schema⚓︎

semanticContext⚓︎

The value of this keyword MUST be a string or a SemanticContext Object that is a JSON Object compliant with the following JSON Schema:

JSON
{}

🚧 WIP: see RFC-74 : Semantic Linking

Example-1:

JSON
{
  "$schema": "http://json-schema.org/draft-07/schema#",
  "title": "Simplified Movie Object (Compact)",
  "type": "object",
  "s-context": {
    "s-base": "https://schema.org",
    "s-type": "[Movie]",
    "movieId": null,
    "directorName": "director[Person].name",
    "directorCountryName": "director[Person].country[Country].name",
    "actors": "actor[Person].name",
    "copyright": {
      "s-type": "copyrightHolder[Organization]",
      "organizationId": null,
      "email": "contactPoint[ContactPoint].mail"
    }
  },
  "properties": {
    "movieId": {
      "type": "string"
    },
    "name": {
      "type": "string"
    },
    "directorName": {
      "type": "string"
    },
    "directorCountryName": {
      "type": "string"
    },
    "actors": {
      "type": "array",
      "items": {
        "type": "string"
      },
      "minItems": 1
    },
    "copyright": {
      "type": "object",
      "properties": {
        "organizationId": {
          "type": "string"
        },
        "legalName": {
          "type": "string"
        },
        "email": {
          "type": "string"
        }
      }
    }
  }
}

Explanation:

  • s-context: defines the semantic links. It can be defined inline or as an external reference
  • s-base: the base URL used to resolve concept names
  • s-type: the linked concept name or full concept URI, enclosed in square brackets. The name before the brackets is the property of the parent concept whose value is the linked concept. For example, "copyright": {"s-type": "copyrightHolder[Organization]"} maps the schema property copyright to the property copyrightHolder of the parent concept (Movie), whose value is an Organization concept. If the schema property name is already equal to the name of the referenced concept property, the name can be omitted from the mapping.
  • "movieId": null : the schema property movieId is not defined in the Movie concept. It exists only in the physical data.
  • "directorName": "director[Person].name" : the property directorName in the schema is linked to the property name of the Person who directs the Movie.
  • "directorCountryName": "director[Person].country[Country].name" : the property directorCountryName in the schema is linked to the property name of the Country of the Person who directs the Movie.
  • "actors": "actor[Person].name" : the property actors in the schema is linked to the property name of the Person who acts in the Movie. Because actors in the schema is an array, the values are the names of the actors.

Data Quality Annotation Vocabulary⚓︎

The Data Quality Annotation Vocabulary, defined as an extension of the ODCS, provides commonly used metadata to describe data quality rules and parameters.

The current URI for this vocabulary, known as the Data Quality Annotation Vocabulary, is: https://bitol-io.github.io/open-data-contract-standard/v3.0.0/#data-quality.

For a comprehensive definition of all supported keywords, along with their syntax and semantics, please refer to the ODCS specification available on the Bitol website.

Appendix A: Revision History⚓︎

Version Date Notes
1.0.0 TBD Release of the 1.0.0 version
1.0.0-DRAFT 2025-JANUARY Release of the 1.0.0-DRAFT version