Standards & Specifications¶
OpenMetadata Standards builds on established open standards and industry best practices.
JSON Schema Standards¶
JSON Schema Version¶
OpenMetadata schemas use JSON Schema Draft-07 as the primary version, with migration to Draft 2020-12 in progress.
Key features used:
$schema- Schema version declaration$ref- Schema composition and reuse$id- Schema identificationdefinitions- Reusable schema componentstype,properties,required- Basic validationpattern,minLength,maxLength- String validationminimum,maximum- Numeric validationformat- Semantic validation (email, uri, date-time)
Example:
{
"$schema": "http://json-schema.org/draft-07/schema#",
"$id": "https://open-metadata.org/schema/entity/data/table.json",
"title": "Table",
"description": "Schema for database table entity",
"type": "object",
"properties": {
"id": {
"$ref": "../../type/basic.json#/definitions/uuid"
}
},
"required": ["id", "name"]
}
Schema Organization¶
Schemas are organized hierarchically:
- Entities (
entity/): Complete object definitions - Types (
type/): Reusable type definitions - Common patterns: Shared across schemas via
$ref
Validation Rules¶
All schemas include comprehensive validation:
- Required fields: Essential properties
- Data types: Strict type checking
- Constraints: Length, range, pattern matching
- Formats: Email, URI, UUID, timestamp
- Enumerations: Fixed value sets
RDF/OWL Standards¶
RDF 1.1¶
OpenMetadata ontologies conform to W3C RDF 1.1:
- Triple model: Subject-Predicate-Object
- URIs: Persistent identifiers for resources
- Literals: Data values with types
- Blank nodes: Anonymous resources
Serialization formats:
- Turtle (
.ttl) - Primary format (human-readable) - RDF/XML - XML serialization
- N-Triples - Line-based format
- JSON-LD - JSON-based RDF
OWL 2¶
The OpenMetadata Ontology uses OWL 2 features:
- Classes: Entity type definitions
- Properties: Relationships and attributes
- Object Properties: Entity-to-entity relationships
- Datatype Properties: Entity-to-value relationships
- Class hierarchies:
rdfs:subClassOf - Property hierarchies:
rdfs:subPropertyOf - Domain and Range: Property constraints
- Inverse Properties: Bidirectional relationships
Example:
om:Table a owl:Class ;
rdfs:subClassOf om:DataAsset ;
rdfs:label "Table" ;
rdfs:comment "Database table or view" .
om:contains a owl:ObjectProperty ;
rdfs:domain om:Database ;
rdfs:range om:Table ;
owl:inverseOf om:belongsTo .
RDFS¶
Uses RDF Schema for:
rdfs:Class- Class definitionsrdfs:Property- Property definitionsrdfs:subClassOf- Class hierarchiesrdfs:subPropertyOf- Property hierarchiesrdfs:domain- Property subjectsrdfs:range- Property objectsrdfs:label- Human-readable namesrdfs:comment- Documentation
SHACL Standards¶
SHACL Core¶
OpenMetadata shapes use W3C SHACL:
Node Shapes:
om:TableShape a sh:NodeShape ;
sh:targetClass om:Table ;
sh:property [
sh:path om:name ;
sh:minCount 1 ;
sh:maxCount 1 ;
sh:datatype xsd:string ;
sh:minLength 1 ;
sh:maxLength 256 ;
] .
Property Shapes:
sh:minCount,sh:maxCount- Cardinalitysh:datatype- Data type constraintssh:class- Object type constraintssh:pattern- Regular expressionssh:minLength,sh:maxLength- String lengthsh:minInclusive,sh:maxInclusive- Numeric ranges
Validation:
sh:severity- Violation, Warning, Infosh:message- Custom error messages- Automatic validation reports
JSON-LD Standards¶
JSON-LD 1.1¶
JSON-LD contexts conform to JSON-LD 1.1:
Context features:
- @context: Context definitions
- @type: Type coercion
- @id: URI mapping
- @vocab: Default vocabulary
- @base: Base URI
- @language: Language tags
Example context:
{
"@context": {
"@vocab": "http://open-metadata.org/ontology#",
"name": "http://schema.org/name",
"owner": {
"@id": "http://open-metadata.org/ontology#owner",
"@type": "@id"
},
"tags": {
"@id": "http://open-metadata.org/ontology#tag",
"@container": "@set"
}
}
}
Semantic Mapping¶
JSON-LD enables mapping between JSON and RDF:
{
"@context": "https://open-metadata.org/contexts/dataAsset.jsonld",
"@id": "https://myorg.com/tables/customers",
"@type": "Table",
"name": "customers",
"owner": "https://myorg.com/users/john"
}
Converts to RDF:
<https://myorg.com/tables/customers>
a om:Table ;
om:name "customers" ;
om:owner <https://myorg.com/users/john> .
PROV-O Standards¶
W3C PROV-O¶
Lineage tracking uses PROV-O:
Core classes:
prov:Entity- Data and artifactsprov:Activity- Processes and actionsprov:Agent- People, systems, organizations
Core properties:
prov:wasDerivedFrom- Entity derivationprov:wasGeneratedBy- Activity generationprov:used- Activity usageprov:wasAssociatedWith- Agent associationprov:wasAttributedTo- Entity attribution
Example:
:customers_table a prov:Entity, om:Table ;
prov:wasDerivedFrom :raw_data ;
prov:wasGeneratedBy :etl_pipeline ;
prov:wasAttributedTo :data_team .
:etl_pipeline a prov:Activity, om:Pipeline ;
prov:used :raw_data ;
prov:startedAtTime "2024-01-15T10:00:00Z"^^xsd:dateTime .
Industry Standards¶
OpenAPI¶
API schemas can be converted to OpenAPI specifications for REST API documentation.
DCAT¶
Data Catalog Vocabulary (DCAT) mapping:
dcat:Dataset→om:Tabledcat:Distribution→ File formatsdcat:DataService→om:Service
Dublin Core¶
Dublin Core metadata elements:
dc:title→om:displayNamedc:description→om:descriptiondc:creator→om:ownerdc:created→om:createdAt
Schema.org¶
Schema.org vocabulary integration:
schema:name→om:nameschema:description→om:descriptionschema:author→om:owner
Compliance & Regulations¶
GDPR Compliance¶
Support for GDPR requirements:
- Data inventory: Catalog all personal data
- Purpose tracking: Record processing purposes
- Consent tracking: Link to consent records
- Right to erasure: Soft delete with restoration
- Data lineage: Track data flow
- Access controls: Policy-based access
Tags for GDPR:
{
"classification": "GDPR",
"tags": [
"GDPR.PersonalData",
"GDPR.SensitiveData",
"GDPR.ConsentRequired"
]
}
CCPA Compliance¶
California Consumer Privacy Act support:
- Personal information inventory
- Data selling tracking
- Consumer rights management
- Opt-out tracking
HIPAA Compliance¶
Healthcare data protection:
- PHI (Protected Health Information) tagging
- Access audit trails
- Encryption metadata
- Minimum necessary tracking
SOC 2 Compliance¶
Service Organization Control compliance:
- Access control documentation
- Change tracking
- Audit logging
- Data classification
Best Practices¶
Schema Versioning¶
- Semantic versioning (MAJOR.MINOR.PATCH)
- Backward compatibility maintenance
- Deprecation notices
- Migration guides
URI Design¶
- Persistent URIs
- Dereferenceable URIs
- Content negotiation
- HTTP URI scheme
Validation Strategy¶
- JSON Schema validation for structure
- SHACL validation for semantics
- Custom business rules
- Continuous validation
Interoperability¶
- Standard vocabularies where possible
- Clear namespace management
- Documented extensions
- Example data and mappings
Version History¶
Current Version: 1.0.0¶
- Initial release of schemas
- Complete entity coverage
- RDF ontology and SHACL shapes
- JSON-LD contexts
Roadmap¶
- JSON Schema 2020-12 migration
- Enhanced SHACL constraints
- Additional JSON-LD contexts
- DCAT alignment
- Schema.org mappings