OpenMetadata Standards

A comprehensive metadata standard for the modern data and AI ecosystem

What Are We Modeling?

OpenMetadata Standards provide a unified, open-source metadata model for your data and AI ecosystem. The model spans traditional data assets and modern AI systems, and covers both structured and unstructured data across the organization.

Comprehensive Coverage

Traditional Data Assets:

  • Databases, tables, schemas, and stored procedures
  • Data pipelines, workflows, and DAGs
  • Dashboards, reports, and visualizations
  • Message queues, topics, and event streams
  • APIs, endpoints, and service contracts

Unstructured Data & Documents:

  • Drive services (Google Drive, OneDrive, SharePoint)
  • Spreadsheets, worksheets, and collaborative documents
  • File systems, containers, and object storage
  • Directories, files, and document repositories

AI Governance & LLM Systems:

  • Large Language Models (LLMs) and foundation models
  • AI Agents and autonomous systems
  • Model Context Protocol (MCP) servers and tools
  • Prompts, templates, and prompt engineering
  • Vector databases and embeddings
  • AI applications and integrations

Data Governance & Quality:

  • Data quality tests, suites, and profiles
  • Classification, tags, and glossaries
  • Data contracts and SLAs
  • Lineage from source to consumption
  • Teams, users, roles, and ownership
  • Domains and data products

AI Governance Initiative

OpenMetadata is pioneering AI Governance by extending metadata standards to cover the entire AI lifecycle, from LLMs and agents to prompts and vector databases. This enables organizations to govern AI systems with the same rigor as traditional data assets.

Learn more: AI Governance Roadmap

What This Enables

  • Universal Interoperability


    Seamlessly connect and integrate across data platforms, document systems, and AI tools using standardized metadata schemas.

  • Semantic Understanding


    Enable rich semantic queries and reasoning through RDF ontologies and knowledge graphs built on W3C standards.

  • AI Governance


    Govern AI systems with the same rigor as data - track LLMs, agents, prompts, and model lineage end-to-end.

  • Unified Data Governance


    Apply consistent governance policies across structured databases, unstructured documents, and AI systems.

  • Data Quality


    Comprehensive testing, profiling, and validation frameworks ensuring data reliability across all asset types.

  • Complete Lineage


    Track data flow from raw sources through transformations, ML pipelines, to AI applications and dashboards.

  • Clear Ownership


    Define organizational structure, teams, roles, and responsibilities across all data and AI assets.

  • API-First Design


    RESTful APIs enable real-time metadata updates and integrations without heavyweight infrastructure.


The Metadata Stack

OpenMetadata Standards are expressed in multiple complementary formats:

📋 JSON Schema

Human-readable, machine-validatable schemas

  • JSON Schema Draft-07 specification
  • 700+ schemas covering all metadata entities
  • Strongly typed with validation rules
  • IDE autocomplete support
  • Used by OpenMetadata APIs

Explore JSON Schemas →
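To make "strongly typed with validation rules" concrete, here is a minimal sketch of checking a document against a schema. The schema is a hypothetical, heavily trimmed stand-in for a real entity schema, and the checker is a hand-rolled subset (only `required` and top-level `type`), not a Draft-07 validator. In practice a full JSON Schema library would be the better choice.

```python
# Hypothetical, heavily trimmed schema in JSON Schema Draft-07 style.
# The published schemas define many more fields and constraints.
TABLE_SCHEMA = {
    "$schema": "http://json-schema.org/draft-07/schema#",
    "type": "object",
    "required": ["name", "columns"],
    "properties": {
        "name": {"type": "string"},
        "description": {"type": "string"},
        "columns": {"type": "array"},
    },
}

# Maps JSON Schema type names to the Python types produced by json.loads.
_JSON_TYPES = {"string": str, "array": list, "object": dict, "boolean": bool}


def check_document(doc: dict, schema: dict) -> list:
    """Return a list of problems; covers only 'required' and top-level 'type'."""
    problems = []
    for field in schema.get("required", []):
        if field not in doc:
            problems.append(f"missing required field: {field}")
    for field, rule in schema.get("properties", {}).items():
        expected = _JSON_TYPES.get(rule.get("type"))
        if field in doc and expected and not isinstance(doc[field], expected):
            problems.append(f"{field}: expected {rule['type']}")
    return problems
```

An empty list means the document passed this limited check; each string in a non-empty list names one failed rule.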


🔗 RDF & OWL Ontology

Semantic web standards for knowledge graphs

  • W3C OWL ontology for formal semantics
  • RDFS classes and properties
  • Reasoning and inference capabilities
  • SPARQL queryable
  • Integration with semantic web tools

Explore RDF Ontology →


🌐 JSON-LD Contexts

Linked data for interoperability

  • JSON-LD 1.1 contexts
  • Maps JSON to RDF
  • Enables semantic annotations
  • Web-scale data integration
  • Compatible with schema.org

Explore JSON-LD →
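The "maps JSON to RDF" idea can be sketched in a few lines. The context below is hypothetical (the `https://example.org/openmetadata#` vocabulary IRI and the term names are placeholders, not the published context), and the expansion handles only flat documents with `prefix:suffix` terms. It is a simplification, not the JSON-LD 1.1 expansion algorithm.

```python
# Hypothetical context: the vocabulary IRI and term names are placeholders,
# not the published OpenMetadata JSON-LD context.
CONTEXT = {
    "om": "https://example.org/openmetadata#",
    "name": "om:name",
    "description": "om:description",
}


def expand_term(term: str, context: dict) -> str:
    """Resolve a JSON key to a full IRI using prefix:suffix compaction."""
    mapped = context.get(term, term)
    prefix, sep, suffix = mapped.partition(":")
    if sep and prefix in context:
        return context[prefix] + suffix
    return mapped  # unknown terms and absolute IRIs pass through unchanged


def to_triples(doc: dict, context: dict) -> list:
    """Turn a flat JSON document into (subject, predicate, object) triples."""
    subject = doc["@id"]
    return [
        (subject, expand_term(key, context), value)
        for key, value in doc.items()
        if not key.startswith("@")  # skip JSON-LD keywords such as @id
    ]
```

Each plain JSON key becomes a predicate IRI, which is what lets the same document be read both as ordinary JSON and as RDF.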


✅ SHACL Shapes

Validation constraints for RDF graphs

  • SHACL shapes for validation
  • Constraint checking
  • Data quality rules
  • Graph validation
  • Compliance verification

Explore SHACL →


The Hierarchical Model

OpenMetadata organizes entities in hierarchical service-based structures:

Database Stack

graph TD
    DS[Database Service<br/>MySQL, PostgreSQL, Snowflake] --> DB[Database]
    DB --> SCHEMA[Schema]
    SCHEMA --> TABLE[Table]
    SCHEMA --> SP[Stored Procedure]
    TABLE --> COL[Column]

    style DS fill:#667eea,color:#fff
    style DB fill:#4facfe,color:#fff
    style SCHEMA fill:#00f2fe,color:#333
    style TABLE fill:#43e97b,color:#333
    style SP fill:#43e97b,color:#333
    style COL fill:#e0f2fe,color:#333
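One practical consequence of this hierarchy is that an entity can be addressed by the path from its service down to itself. The sketch below assumes a dot-separated fully qualified name with double-quoting for names that contain a dot. Both conventions are assumptions for illustration, and the service name `postgres_prod` is hypothetical.

```python
def quote_part(name: str) -> str:
    """Wrap a name in double quotes when it contains the separator."""
    return f'"{name}"' if "." in name else name


def build_fqn(*parts: str) -> str:
    """Join hierarchy levels (service, database, schema, table, column)."""
    return ".".join(quote_part(p) for p in parts)


# Service -> Database -> Schema -> Table, following the diagram above.
table_fqn = build_fqn("postgres_prod", "crm_database", "public", "customers")
# Appending one more level addresses a column of that table.
column_fqn = build_fqn("postgres_prod", "crm_database", "public", "customers", "email")
```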

Pipeline Stack

graph TD
    PS[Pipeline Service<br/>Airflow, Dagster, Prefect, dbt] --> P[Pipeline]
    P --> T[Task]

    style PS fill:#667eea,color:#fff
    style P fill:#4facfe,color:#fff,stroke:#4c51bf,stroke-width:3px
    style T fill:#00f2fe,color:#333

Messaging Stack

graph TD
    MS[Messaging Service<br/>Kafka, Pulsar, Kinesis] --> TOP[Topic]
    TOP --> SCH[Message Schema]

    style MS fill:#667eea,color:#fff
    style TOP fill:#4facfe,color:#fff,stroke:#4c51bf,stroke-width:3px
    style SCH fill:#00f2fe,color:#333

Dashboard Stack

graph TD
    DBS[Dashboard Service<br/>Tableau, Looker, PowerBI] --> DM[Data Model]
    DBS --> DASH[Dashboard]
    DBS --> CH[Chart]

    style DBS fill:#667eea,color:#fff
    style DM fill:#4facfe,color:#fff
    style DASH fill:#4facfe,color:#fff,stroke:#4c51bf,stroke-width:3px
    style CH fill:#00f2fe,color:#333

ML Stack

graph TD
    MLS[ML Model Service<br/>MLflow, SageMaker] --> ML[ML Model]
    ML --> F[Features]
    ML --> H[Hyperparameters]
    ML --> M[Metrics]

    style MLS fill:#667eea,color:#fff
    style ML fill:#4facfe,color:#fff,stroke:#4c51bf,stroke-width:3px
    style F fill:#f093fb,color:#333
    style H fill:#f093fb,color:#333
    style M fill:#f093fb,color:#333

Storage Stack

graph TD
    SS[Storage Service<br/>S3, GCS, Azure Blob] --> C[Container]
    C --> F[Files]

    style SS fill:#667eea,color:#fff
    style C fill:#4facfe,color:#fff,stroke:#4c51bf,stroke-width:3px
    style F fill:#00f2fe,color:#333

Explore All Data Assets →


Cross-Cutting Concepts

Beyond data assets, OpenMetadata Standards model:

🔄 Lineage

Complete data flow tracking

Track transformations from source to dashboard to ML model using:

  • Column-level lineage
  • Asset-level lineage
  • W3C PROV-O provenance ontology
  • Pipeline execution lineage

Example: API Service → ETL Pipeline → Table → Dashboard

Explore Lineage Specification →


📚 Governance

Business context and classification

Model business knowledge and data sensitivity:

  • Glossaries: Business terminology
  • Glossary Terms: Definitions with relationships
  • Classifications: Hierarchical taxonomies (PII, PHI, Tier)
  • Tags: Labels for categorization

Example: Link the "Customer" glossary term to the customers table, and tag the email column as PII.Sensitive.Email
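A hierarchical tag such as PII.Sensitive.Email carries its classification in its first segment, which makes it easy to filter labels by taxonomy. The sketch below is illustrative only: the `TagLabel` class, its field names, and the `Glossary.Customer` identifier are assumptions, so check the Governance specification for the actual label structure.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TagLabel:
    """A label attached to an asset; field names are illustrative."""
    tag_fqn: str  # e.g. "PII.Sensitive.Email"
    source: str   # "Classification" or "Glossary" in this sketch

    @property
    def classification(self) -> str:
        """Top-level taxonomy the tag belongs to (first dotted segment)."""
        return self.tag_fqn.split(".")[0]


def labels_under(labels: list, classification: str) -> list:
    """Keep only classification tags that sit under the given taxonomy."""
    return [
        label for label in labels
        if label.source == "Classification" and label.classification == classification
    ]


# Hypothetical labels mirroring the example above.
LABELS = [
    TagLabel("PII.Sensitive.Email", "Classification"),
    TagLabel("Glossary.Customer", "Glossary"),
]
pii_labels = labels_under(LABELS, "PII")
```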

Explore Governance Specification →


✓ Data Quality

Testing and profiling framework

Define and track data quality:

  • Test Definitions: Reusable test templates
  • Test Cases: Applied to tables/columns
  • Test Suites: Organized test execution
  • Profiling: Statistical analysis

Example: Define a uniqueness test for customer_id, run it daily, and track the results

Explore Data Quality Specification →


👥 Teams & Users

Organizational structure and ownership

Model your organization:

  • Users: Individual people
  • Teams: Groups with hierarchies
  • Roles: Permission sets
  • Ownership: Asset assignments

Example: The Data Engineering team owns the customer_etl pipeline, with Jane Doe as the named individual owner

Explore Teams & Users Specification →


📜 Data Contracts

Formal agreements across all assets

Define expectations for any data asset:

  • Schema requirements
  • Quality SLAs
  • Freshness guarantees
  • Ownership commitments

Contracts are not limited to tables: they also apply to Topics, Dashboards, ML Models, APIs, and more

Explore Data Contract Specification →


🏢 Domains

Business domain organization

Organize data assets by business area or function:

  • Domain Hierarchy: Top-level and sub-domains
  • Asset Assignment: Assign tables, dashboards, pipelines to domains
  • Domain Ownership: Domain-specific owners and experts
  • Cross-Domain Dependencies: Track data flows across domains

Example: Sales domain contains customer tables, revenue dashboards, and sales pipelines

Explore Domain Specification →


📦 Data Products

Packaged data for consumption

Define curated data products for specific use cases:

  • Product Definition: Packaged collection of data assets
  • Assets: Tables, dashboards, ML models working together
  • SLAs: Quality, freshness, and availability guarantees
  • Consumers: Teams and applications using the product

Example: "Customer 360" data product includes customer tables, enrichment pipelines, and analytics dashboards

Explore Data Product Specification →


Deep Dive Documentation

Each metadata entity has comprehensive documentation explaining:

  • Overview: What it models and why
  • JSON Schema: Complete field reference
  • RDF Representation: Ontology classes and properties
  • JSON-LD: Semantic annotations
  • Examples: Real-world use cases
  • Relationships: How it connects to other entities

Example: Table Entity

Table is the core entity representing database tables and views.

Key Fields:

  • name, fullyQualifiedName, description
  • columns[]: Array of column definitions with types, constraints
  • tableType: Regular, View, MaterializedView, External
  • owner, domain, tags, glossaryTerms
  • dataModel: SQL query for views
  • tableConstraints: Primary/foreign keys
  • tableProfilerConfig: Profiling settings

Relationships:

  • Belongs to databaseSchema
  • Contains columns
  • Referenced by dashboards, mlModels
  • Has testCases for quality
  • Participates in lineage

View Complete Table Specification →
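As a rough illustration of how these key fields fit together, the sketch below builds a Table document as a plain dictionary and reads the primary key from `tableConstraints`. The nested shapes (`dataType`, `constraintType`, `columns` inside a constraint) and all values are assumptions for illustration. The complete Table specification is the authority on the real structure.

```python
import json

# Illustrative Table document using the key fields listed above.
# Nested field shapes and all values are assumptions, not the official schema.
table = {
    "name": "customers",
    "fullyQualifiedName": "postgres_prod.crm_database.public.customers",
    "description": "One row per registered customer.",
    "tableType": "Regular",
    "columns": [
        {"name": "customer_id", "dataType": "BIGINT"},
        {"name": "email", "dataType": "VARCHAR"},
    ],
    "tableConstraints": [
        {"constraintType": "PRIMARY_KEY", "columns": ["customer_id"]}
    ],
}


def primary_key_columns(doc: dict) -> list:
    """Collect column names from every PRIMARY_KEY table constraint."""
    names = []
    for constraint in doc.get("tableConstraints", []):
        if constraint.get("constraintType") == "PRIMARY_KEY":
            names.extend(constraint.get("columns", []))
    return names


# The document is plain JSON, so it serializes without custom encoders.
payload = json.dumps(table, indent=2)
```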


Standards in Action

Use Case: Customer Data Pipeline

Assets Modeled:

PostgreSQL Database Service
  └── crm_database
        └── public schema
              └── customers table
                    ├── customer_id (PK)
                    ├── email
                    ├── name
                    └── created_date

Airflow Pipeline Service
  └── customer_etl pipeline
        ├── extract_customers task
        ├── transform_customers task
        └── load_customers task

Tableau Dashboard Service
  └── Customer Analytics dashboard
        ├── Customer Growth chart
        └── Customer Segments chart

Lineage:

customers table
  → customer_etl pipeline
    → warehouse.customers_dim table
      → Customer Analytics dashboard
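The lineage chain above is a directed graph, so impact analysis ("what is affected if this table changes?") reduces to a graph walk. The following is a minimal sketch using a breadth-first traversal over an in-memory edge list. The edge-list representation and the function name are illustrative choices, not part of the lineage specification.

```python
from collections import deque

# Edges copied from the lineage listed above: (upstream, downstream).
EDGES = [
    ("customers table", "customer_etl pipeline"),
    ("customer_etl pipeline", "warehouse.customers_dim table"),
    ("warehouse.customers_dim table", "Customer Analytics dashboard"),
]


def downstream(start: str, edges: list) -> list:
    """Breadth-first walk returning every asset reachable from start."""
    children = {}
    for src, dst in edges:
        children.setdefault(src, []).append(dst)
    seen, order, queue = {start}, [], deque([start])
    while queue:
        node = queue.popleft()
        for nxt in children.get(node, []):
            if nxt not in seen:  # guard against revisiting shared descendants
                seen.add(nxt)
                order.append(nxt)
                queue.append(nxt)
    return order


impacted = downstream("customers table", EDGES)
```

Reversing each edge before the walk gives the upstream view instead, which answers "where did this dashboard's data come from?".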

Governance:

  • customers.email tagged as PII.Sensitive.Email
  • customers table linked to "Customer" glossary term
  • GDPR compliance tag applied

Data Quality:

  • Test: customer_id is unique
  • Test: email matches regex pattern
  • Test: created_date <= today
  • Profile: Track row count daily
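The four checks above can be expressed as simple predicates over rows. This sketch runs them against a few hypothetical in-memory rows. The sample data, the deliberately simple email pattern, and the result keys are all assumptions, and a real test case would run against the table itself.

```python
import re
from datetime import date

# Hypothetical sample rows; note the duplicate id and the malformed email.
ROWS = [
    {"customer_id": 1, "email": "ada@example.com", "created_date": date(2024, 1, 5)},
    {"customer_id": 2, "email": "not-an-email", "created_date": date(2024, 2, 9)},
    {"customer_id": 2, "email": "bob@example.com", "created_date": date(2024, 3, 1)},
]

# Deliberately simple pattern: something@something.something, no whitespace.
EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")


def run_tests(rows: list, today: date) -> dict:
    """Evaluate the three tests and the row-count profile metric."""
    ids = [row["customer_id"] for row in rows]
    return {
        "customer_id_unique": len(ids) == len(set(ids)),
        "email_matches_regex": all(EMAIL_RE.match(row["email"]) for row in rows),
        "created_date_not_future": all(row["created_date"] <= today for row in rows),
        "row_count": len(rows),
    }


results = run_tests(ROWS, today=date(2024, 6, 1))
```

Passing `today` in as a parameter keeps the date check reproducible when the tests are re-run later.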

Ownership:

  • Data Engineering team owns customer_etl
  • Analytics team owns Customer Analytics
  • Jane Doe is data steward

Data Contract:

  • customers table must update within 1 hour
  • Email completeness >= 99%
  • Row count between 10,000 and 10,000,000
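These three contract terms can be checked mechanically against a snapshot of table statistics. The sketch below reads "must update within 1 hour" as a maximum staleness of one hour, which is an interpretation. The snapshot field names (`last_updated`, `non_null_emails`, `row_count`) are hypothetical.

```python
from datetime import datetime, timedelta

# Thresholds taken from the contract above; snapshot fields are hypothetical.
MAX_STALENESS = timedelta(hours=1)
MIN_EMAIL_COMPLETENESS = 0.99
ROW_COUNT_RANGE = (10_000, 10_000_000)


def check_contract(snapshot: dict, now: datetime) -> list:
    """Return the names of violated contract terms (empty means compliant)."""
    violations = []
    if now - snapshot["last_updated"] > MAX_STALENESS:
        violations.append("freshness")
    rows = snapshot["row_count"]
    # Treat an empty table as 0% complete rather than dividing by zero.
    completeness = snapshot["non_null_emails"] / rows if rows else 0.0
    if completeness < MIN_EMAIL_COMPLETENESS:
        violations.append("email_completeness")
    low, high = ROW_COUNT_RANGE
    if not low <= rows <= high:
        violations.append("row_count")
    return violations
```

Returning the violated term names, rather than a single boolean, lets a caller report exactly which guarantee was missed.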

All modeled in:

  • ✅ JSON Schema with full validation
  • ✅ RDF ontology for semantic queries
  • ✅ JSON-LD for linked data
  • ✅ SHACL for constraint validation

Getting Started

1. Understand the Standards

Start with the JSON Schema overview to understand the core structures.

2. Explore Data Assets

Browse the hierarchical data assets organized by service type.

3. Learn Cross-Cutting Concepts

Understand lineage, governance, and data quality.

4. Deep Dive

Read detailed specifications for entities like Table, Pipeline, or Dashboard.

5. Use the Standards

Integrate OpenMetadata Standards into your tools using the API reference.


Why OpenMetadata Standards?

Open Source

Freely available, community-driven, transparent development

Comprehensive

Covers databases, pipelines, dashboards, ML, governance, quality, and more

Semantic

RDF and ontologies enable reasoning and knowledge graphs

Interoperable

JSON-LD enables integration with any semantic web tool

Extensible

Custom properties and types for your specific needs

Battle-Tested

Used in production by organizations managing petabytes of data


Community & Contribution


Next Steps

📋 JSON Schemas

Explore the complete JSON Schema reference

Go to JSON Schemas →

πŸ—‚οΈ Data AssetsΒΆ

Browse all data asset types by service

Go to Data Assets →

🔗 RDF Ontology

Understand the semantic web representation

Go to RDF →

📖 Examples

See real-world use cases and examples

Go to Examples →