OpenMetadata Standards
A comprehensive metadata standard for the modern data and AI ecosystem
What Are We Modeling?
OpenMetadata Standards provide a unified, open-source metadata model that describes every aspect of your data and AI ecosystem - from traditional data assets to modern AI systems, covering both structured and unstructured data across your entire organization.
Comprehensive Coverage
Traditional Data Assets:
- Databases, tables, schemas, and stored procedures
- Data pipelines, workflows, and DAGs
- Dashboards, reports, and visualizations
- Message queues, topics, and event streams
- APIs, endpoints, and service contracts
Unstructured Data & Documents:
- Drive services (Google Drive, OneDrive, SharePoint)
- Spreadsheets, worksheets, and collaborative documents
- File systems, containers, and object storage
- Directories, files, and document repositories
AI Governance & LLM Systems:
- Large Language Models (LLMs) and foundation models
- AI Agents and autonomous systems
- Model Context Protocol (MCP) servers and tools
- Prompts, templates, and prompt engineering
- Vector databases and embeddings
- AI applications and integrations
Data Governance & Quality:
- Data quality tests, suites, and profiles
- Classification, tags, and glossaries
- Data contracts and SLAs
- Lineage from source to consumption
- Teams, users, roles, and ownership
- Domains and data products
AI Governance Initiative
OpenMetadata is pioneering AI Governance by extending metadata standards to cover the entire AI lifecycle - from LLMs and agents to prompts and vector databases. This enables organizations to govern AI systems with the same rigor as traditional data assets.
Learn more: AI Governance Roadmap
What This Enables
- Universal Interoperability: Seamlessly connect and integrate across data platforms, document systems, and AI tools using standardized metadata schemas.
- Semantic Understanding: Enable rich semantic queries and reasoning through RDF ontologies and knowledge graphs built on W3C standards.
- AI Governance: Govern AI systems with the same rigor as data - track LLMs, agents, prompts, and model lineage end-to-end.
- Unified Data Governance: Apply consistent governance policies across structured databases, unstructured documents, and AI systems.
- Data Quality: Comprehensive testing, profiling, and validation frameworks ensuring data reliability across all asset types.
- Complete Lineage: Track data flow from raw sources through transformations, ML pipelines, to AI applications and dashboards.
- Clear Ownership: Define organizational structure, teams, roles, and responsibilities across all data and AI assets.
- API-First Design: RESTful APIs enable real-time metadata updates and integrations without heavyweight infrastructure.
The Metadata Stack
OpenMetadata Standards are expressed in multiple complementary formats:
JSON Schema
Human-readable, machine-validatable schemas
- JSON Schema Draft-07 specification
- 700+ schemas covering all metadata entities
- Strongly typed with validation rules
- IDE autocomplete support
- Used by OpenMetadata APIs
RDF & OWL Ontology
Semantic web standards for knowledge graphs
- W3C OWL ontology for formal semantics
- RDFS classes and properties
- Reasoning and inference capabilities
- SPARQL queryable
- Integration with semantic web tools
JSON-LD Contexts
Linked data for interoperability
- JSON-LD 1.1 contexts
- Maps JSON to RDF
- Enables semantic annotations
- Web-scale data integration
- Compatible with schema.org
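To make "maps JSON to RDF" concrete, here is a minimal sketch of how a context turns plain JSON keys into IRIs. It handles only a flat context with one prefix and is not the full JSON-LD 1.1 expansion algorithm. The `om` namespace IRI and the term mappings are illustrative assumptions, not the published OpenMetadata context.

```python
# Hypothetical context: the "om" namespace IRI and term mappings are
# illustrative assumptions, not the published OpenMetadata context.
CONTEXT = {
    "om": "https://example.org/openmetadata#",
    "name": "om:name",
    "description": "om:description",
    "tableType": "om:tableType",
}


def expand_term(term, context):
    """Resolve a JSON key to a full IRI using a flat context."""
    mapped = context.get(term, term)
    prefix, sep, suffix = mapped.partition(":")
    # Expand compact IRIs such as "om:name" when the prefix is defined.
    if sep and prefix in context and not suffix.startswith("//"):
        return context[prefix] + suffix
    return mapped


def expand_document(doc, context):
    """Rewrite the top-level keys of a JSON document as IRIs."""
    return {expand_term(key, context): value for key, value in doc.items()}


table = {"name": "customers", "tableType": "Regular"}
expanded = expand_document(table, CONTEXT)
```

Each expanded key can then serve as an RDF predicate, which is what lets the same JSON payload be read as a graph.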
SHACL Shapes
Validation constraints for RDF graphs
- SHACL shapes for validation
- Constraint checking
- Data quality rules
- Graph validation
- Compliance verification
The Hierarchical Model
OpenMetadata organizes entities in hierarchical service-based structures:
Database Stack
graph TD
DS[Database Service<br/>MySQL, PostgreSQL, Snowflake] --> DB[Database]
DB --> SCHEMA[Schema]
SCHEMA --> TABLE[Table]
SCHEMA --> SP[Stored Procedure]
TABLE --> COL[Column]
style DS fill:#667eea,color:#fff
style DB fill:#4facfe,color:#fff
style SCHEMA fill:#00f2fe,color:#333
style TABLE fill:#43e97b,color:#333
style SP fill:#43e97b,color:#333
style COL fill:#e0f2fe,color:#333
Pipeline Stack
graph TD
PS[Pipeline Service<br/>Airflow, Dagster, Prefect, dbt] --> P[Pipeline]
P --> T[Task]
style PS fill:#667eea,color:#fff
style P fill:#4facfe,color:#fff,stroke:#4c51bf,stroke-width:3px
style T fill:#00f2fe,color:#333
Messaging Stack
graph TD
MS[Messaging Service<br/>Kafka, Pulsar, Kinesis] --> TOP[Topic]
TOP --> SCH[Message Schema]
style MS fill:#667eea,color:#fff
style TOP fill:#4facfe,color:#fff,stroke:#4c51bf,stroke-width:3px
style SCH fill:#00f2fe,color:#333
Dashboard Stack
graph TD
DBS[Dashboard Service<br/>Tableau, Looker, PowerBI] --> DM[Data Model]
DBS --> DASH[Dashboard]
DBS --> CH[Chart]
style DBS fill:#667eea,color:#fff
style DM fill:#4facfe,color:#fff
style DASH fill:#4facfe,color:#fff,stroke:#4c51bf,stroke-width:3px
style CH fill:#00f2fe,color:#333
ML Stack
graph TD
MLS[ML Model Service<br/>MLflow, SageMaker] --> ML[ML Model]
ML --> F[Features]
ML --> H[Hyperparameters]
ML --> M[Metrics]
style MLS fill:#667eea,color:#fff
style ML fill:#4facfe,color:#fff,stroke:#4c51bf,stroke-width:3px
style F fill:#f093fb,color:#333
style H fill:#f093fb,color:#333
style M fill:#f093fb,color:#333
Storage Stack
graph TD
SS[Storage Service<br/>S3, GCS, Azure Blob] --> C[Container]
C --> F[Files]
style SS fill:#667eea,color:#fff
style C fill:#4facfe,color:#fff,stroke:#4c51bf,stroke-width:3px
style F fill:#00f2fe,color:#333
Cross-Cutting Concepts
Beyond data assets, OpenMetadata Standards model:
Lineage
Complete data flow tracking
Track transformations from source to dashboard to ML model using:
- Column-level lineage
- Asset-level lineage
- W3C PROV-O provenance ontology
- Pipeline execution lineage
Example: API Service → ETL Pipeline → Table → Dashboard
Explore Lineage Specification →
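As a rough illustration of asset-level lineage, the example chain can be held as a list of directed edges and walked to find everything downstream of an asset. This is a sketch only: the edge list and the `downstream` helper are hypothetical and do not reflect how OpenMetadata stores or queries lineage.

```python
from collections import deque

# Asset-level lineage edges (upstream, downstream) from the example chain.
EDGES = [
    ("API Service", "ETL Pipeline"),
    ("ETL Pipeline", "Table"),
    ("Table", "Dashboard"),
]


def downstream(start, edges):
    """Return every asset reachable from `start`, in breadth-first order."""
    graph = {}
    for source, target in edges:
        graph.setdefault(source, []).append(target)

    seen = {start}
    order = []
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in graph.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                order.append(nxt)
                queue.append(nxt)
    return order


impacted = downstream("ETL Pipeline", EDGES)
```

The same traversal answers impact-analysis questions such as "which dashboards break if this pipeline fails?".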
Governance
Business context and classification
Model business knowledge and data sensitivity:
- Glossaries: Business terminology
- Glossary Terms: Definitions with relationships
- Classifications: Hierarchical taxonomies (PII, PHI, Tier)
- Tags: Labels for categorization
Example: Link "Customer" glossary term to customer table, tag email column as PII.Sensitive.Email
Explore Governance Specification →
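Because classifications are hierarchical, a dotted tag such as PII.Sensitive.Email can be matched against any ancestor. The sketch below assumes tags are plain dotted strings. The `column_tags` mapping and the helper name are hypothetical, for illustration only.

```python
def tag_in_classification(tag_fqn, classification):
    """True when a dotted tag name sits at or under the given classification."""
    return tag_fqn == classification or tag_fqn.startswith(classification + ".")


# Hypothetical column-to-tag assignments for illustration.
column_tags = {
    "customers.email": ["PII.Sensitive.Email"],
    "customers.created_date": ["Tier.Tier2"],
}

# Find every column carrying any tag under the PII classification.
pii_columns = [
    column
    for column, tags in column_tags.items()
    if any(tag_in_classification(tag, "PII") for tag in tags)
]
```

Matching on the prefix plus a dot avoids false positives such as a classification named "PIIArchive".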
Data Quality
Testing and profiling framework
Define and track data quality:
- Test Definitions: Reusable test templates
- Test Cases: Applied to tables/columns
- Test Suites: Organized test execution
- Profiling: Statistical analysis
Example: Define uniqueness test for customer_id, run daily, track results
Explore Data Quality Specification →
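The split between a reusable test definition and a test case bound to a column can be sketched as follows. The class names, the `columnValuesToBeUnique` label, and the in-memory rows are assumptions for illustration. A real deployment would run tests against the data source rather than a Python list.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class DqTestDefinition:
    """Reusable test template: a name plus a check over column values."""
    name: str
    check: Callable[[list], bool]


@dataclass(frozen=True)
class DqTestCase:
    """A test definition applied to one column of one table."""
    definition: DqTestDefinition
    table: str
    column: str

    def run(self, rows):
        values = [row[self.column] for row in rows]
        return self.definition.check(values)


# Hypothetical definition name; the uniqueness rule comes from the example.
unique_values = DqTestDefinition(
    name="columnValuesToBeUnique",
    check=lambda values: len(values) == len(set(values)),
)

case = DqTestCase(unique_values, table="customers", column="customer_id")
passed = case.run([{"customer_id": 1}, {"customer_id": 2}, {"customer_id": 2}])
```

Here `passed` is False because customer_id 2 appears twice. The same definition could be reused for any other column.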
Teams & Users
Organizational structure and ownership
Model your organization:
- Users: Individual people
- Teams: Groups with hierarchies
- Roles: Permission sets
- Ownership: Asset assignments
Example: Data Engineering team owns customer_etl pipeline, Jane Doe is the owner
Explore Teams & Users Specification →
Data Contracts
Formal agreements across all assets
Define expectations for any data asset:
- Schema requirements
- Quality SLAs
- Freshness guarantees
- Ownership commitments
Not just tables - contracts apply to Topics, Dashboards, ML Models, APIs, and more
Explore Data Contract Specification →
Domains
Business domain organization
Organize data assets by business area or function:
- Domain Hierarchy: Top-level and sub-domains
- Asset Assignment: Assign tables, dashboards, pipelines to domains
- Domain Ownership: Domain-specific owners and experts
- Cross-Domain Dependencies: Track data flows across domains
Example: Sales domain contains customer tables, revenue dashboards, and sales pipelines
Explore Domain Specification →
Data Products
Packaged data for consumption
Define curated data products for specific use cases:
- Product Definition: Packaged collection of data assets
- Assets: Tables, dashboards, ML models working together
- SLAs: Quality, freshness, and availability guarantees
- Consumers: Teams and applications using the product
Example: "Customer 360" data product includes customer tables, enrichment pipelines, and analytics dashboards
Explore Data Product Specification →
Deep Dive Documentation
Each metadata entity has comprehensive documentation explaining:
- Overview: What it models and why
- JSON Schema: Complete field reference
- RDF Representation: Ontology classes and properties
- JSON-LD: Semantic annotations
- Examples: Real-world use cases
- Relationships: How it connects to other entities
Example: Table Entity
Table is the core entity representing database tables and views.
Key Fields:
- name, fullyQualifiedName, description
- columns[]: Array of column definitions with types, constraints
- tableType: Regular, View, MaterializedView, External
- owner, domain, tags, glossaryTerms
- dataModel: SQL query for views
- tableConstraints: Primary/foreign keys
- tableProfilerConfig: Profiling settings
Relationships:
- Belongs to databaseSchema
- Contains columns
- Referenced by dashboards, mlModels
- Has testCases for quality
- Participates in lineage
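A simplified sketch of how a few of these fields fit together is shown below. It assumes that fullyQualifiedName is the dot-joined path service.database.schema.table, with parts that contain a dot wrapped in quotes. That format, the class layout, and the `postgres_prod` service name are assumptions here, so check the Table JSON Schema for the authoritative definition.

```python
from dataclasses import dataclass, field


def quote_part(name):
    """Quote a name that itself contains a dot so the path stays unambiguous."""
    return f'"{name}"' if "." in name else name


@dataclass
class Column:
    name: str
    data_type: str


@dataclass
class Table:
    service: str
    database: str
    schema: str
    name: str
    table_type: str = "Regular"
    columns: list = field(default_factory=list)

    @property
    def fully_qualified_name(self):
        # Assumed format: service.database.schema.table
        parts = (self.service, self.database, self.schema, self.name)
        return ".".join(quote_part(part) for part in parts)


customers = Table(
    service="postgres_prod",  # hypothetical service name
    database="crm_database",
    schema="public",
    name="customers",
    columns=[Column("customer_id", "BIGINT"), Column("email", "VARCHAR")],
)
fqn = customers.fully_qualified_name
```

Deriving the name from the hierarchy keeps it consistent with the "Belongs to databaseSchema" relationship rather than storing it as free text.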
Standards in Action
Use Case: Customer Data Pipeline
Assets Modeled:
PostgreSQL Database Service
└── crm_database
    └── public schema
        └── customers table
            ├── customer_id (PK)
            ├── email
            ├── name
            └── created_date
Airflow Pipeline Service
└── customer_etl pipeline
    ├── extract_customers task
    ├── transform_customers task
    └── load_customers task
Tableau Dashboard Service
└── Customer Analytics dashboard
    ├── Customer Growth chart
    └── Customer Segments chart
Lineage:
customers table
→ customer_etl pipeline
→ warehouse.customers_dim table
→ Customer Analytics dashboard
Governance:
- customers.email tagged as PII.Sensitive.Email
- customers table linked to "Customer" glossary term
- GDPR compliance tag applied
Data Quality:
- Test: customer_id is unique
- Test: email matches regex pattern
- Test: created_date <= today
- Profile: Track row count daily
Ownership:
- Data Engineering team owns customer_etl
- Analytics team owns Customer Analytics
- Jane Doe is data steward
Data Contract:
- customers table must update within 1 hour
- Email completeness >= 99%
- Row count between 10,000 and 10,000,000
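The three contract terms above can be read as checks over a snapshot of table statistics. The sketch below is one hypothetical way to evaluate them. `TableStats`, its field names, and the violation messages are illustrative and not part of the Data Contract specification.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass
class TableStats:
    """Hypothetical snapshot of the facts a contract check needs."""
    last_updated: datetime
    row_count: int
    non_null_emails: int


def contract_violations(stats, now):
    """Return a message for each contract term the snapshot breaks."""
    violations = []
    if now - stats.last_updated > timedelta(hours=1):
        violations.append("freshness: last update older than 1 hour")
    completeness = stats.non_null_emails / stats.row_count if stats.row_count else 0.0
    if completeness < 0.99:
        violations.append("completeness: email below 99%")
    if not 10_000 <= stats.row_count <= 10_000_000:
        violations.append("volume: row count outside 10,000 to 10,000,000")
    return violations


now = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
stats = TableStats(
    last_updated=now - timedelta(minutes=30),
    row_count=50_000,
    non_null_emails=49_000,
)
result = contract_violations(stats, now)
```

With 49,000 of 50,000 emails populated (98%), only the completeness term is reported. Freshness and volume pass.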
All modeled in:
- JSON Schema with full validation
- RDF ontology for semantic queries
- JSON-LD for linked data
- SHACL for constraint validation
Getting Started
1. Understand the Standards
Start with the JSON Schema overview to understand the core structures.
2. Explore Data Assets
Browse the hierarchical data assets organized by service type.
3. Learn Cross-Cutting Concepts
Understand lineage, governance, and data quality.
4. Deep Dive
Read detailed specifications for entities like Table, Pipeline, or Dashboard.
5. Use the Standards
Integrate OpenMetadata Standards into your tools using the API reference.
Why OpenMetadata Standards?
Open Source
Freely available, community-driven, transparent development
Comprehensive
Covers databases, pipelines, dashboards, ML, governance, quality, and more
Semantic
RDF and ontologies enable reasoning and knowledge graphs
Interoperable
JSON-LD enables integration with any semantic web tool
Extensible
Custom properties and types for your specific needs
Battle-Tested
Used in production by organizations managing petabytes of data
Community & Contribution
- GitHub: open-metadata/OpenMetadataStandards
- Slack: #openmetadata-standards
- Contribute: See Contributing Guide