Use Cases¶

Real-world applications of OpenMetadata Standards.

Data Catalog¶

Build a comprehensive, searchable data catalog.

Challenge¶

Organizations struggle with:

Fragmented metadata across multiple tools
No centralized search for data assets
Difficulty discovering relevant datasets
Lack of context about data quality and ownership

Solution with OpenMetadata Standards¶

Use entity schemas to catalog all data assets:

{
  "@context": "https://open-metadata.org/contexts/dataAsset.jsonld",
  "@type": "Table",
  "name": "customer_transactions",
  "fullyQualifiedName": "analytics.public.customer_transactions",
  "description": "Daily customer transaction records",
  "owner": {
    "type": "team",
    "name": "data-engineering"
  },
  "tags": [
    {"tagFQN": "Finance.Revenue"},
    {"tagFQN": "PII.Transactional"}
  ],
  "columns": [...],
  "tableProfile": {
    "rowCount": 1500000,
    "sizeInBytes": 850000000
  },
  "dataQualityTests": [
    {
      "name": "no_null_transaction_ids",
      "status": "Success"
    }
  ]
}
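
The faceted filtering that such a catalog enables can be sketched in a few lines of Python. This is an illustrative in-memory filter over records shaped like the example above, not the OpenMetadata search API. The `facet_filter` helper and the `web_sessions` record are hypothetical.

```python
# Catalog entries shaped like the example above.
# The second entry is invented purely for illustration.
ASSETS = [
    {
        "name": "customer_transactions",
        "owner": {"type": "team", "name": "data-engineering"},
        "tags": [{"tagFQN": "Finance.Revenue"}, {"tagFQN": "PII.Transactional"}],
    },
    {
        "name": "web_sessions",
        "owner": {"type": "team", "name": "marketing"},
        "tags": [{"tagFQN": "Marketing.Web"}],
    },
]


def facet_filter(assets, owner=None, tag=None):
    """Return the names of assets that match every facet that is not None."""
    matches = []
    for asset in assets:
        tag_fqns = {t["tagFQN"] for t in asset.get("tags", [])}
        if owner is not None and asset.get("owner", {}).get("name") != owner:
            continue
        if tag is not None and tag not in tag_fqns:
            continue
        matches.append(asset["name"])
    return matches


print(facet_filter(ASSETS, owner="data-engineering", tag="Finance.Revenue"))
```

Because owner and tags live on the same record, one pass over the catalog can answer both facets at once.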

Benefits¶

Single source of truth for all data assets
Rich search with faceted filtering
Clear ownership and lineage
Data quality visibility

Data Governance¶

Implement enterprise-wide data governance.

Challenge¶

Ensuring compliance with regulations (GDPR, CCPA, HIPAA)
Tracking sensitive data (PII, PHI)
Enforcing access controls
Maintaining audit trails

Solution with OpenMetadata Standards¶

1. Define Classifications and Tags

{
  "name": "PII",
  "displayName": "Personally Identifiable Information",
  "description": "Data that can identify individuals",
  "children": [
    {
      "name": "PII.Sensitive",
      "tags": [
        "PII.Sensitive.SSN",
        "PII.Sensitive.CreditCard",
        "PII.Sensitive.HealthRecord"
      ]
    },
    {
      "name": "PII.NonSensitive",
      "tags": [
        "PII.NonSensitive.Email",
        "PII.NonSensitive.Phone"
      ]
    }
  ]
}

2. Apply Tags to Assets

{
  "name": "customers",
  "columns": [
    {
      "name": "email",
      "dataType": "VARCHAR",
      "tags": [{"tagFQN": "PII.NonSensitive.Email"}]
    },
    {
      "name": "ssn",
      "dataType": "VARCHAR",
      "tags": [{"tagFQN": "PII.Sensitive.SSN"}]
    }
  ]
}
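
Once columns carry hierarchical tags, locating sensitive data becomes a prefix match on `tagFQN`. The sketch below assumes the dotted naming shown above. `columns_under` is a hypothetical helper, not part of any OpenMetadata client.

```python
# The tagged table from the example above.
table = {
    "name": "customers",
    "columns": [
        {"name": "email", "dataType": "VARCHAR",
         "tags": [{"tagFQN": "PII.NonSensitive.Email"}]},
        {"name": "ssn", "dataType": "VARCHAR",
         "tags": [{"tagFQN": "PII.Sensitive.SSN"}]},
    ],
}


def columns_under(table, classification):
    """Return column names tagged with `classification` or anything nested under it."""
    nested_prefix = classification + "."
    return [
        col["name"]
        for col in table["columns"]
        if any(
            t["tagFQN"] == classification or t["tagFQN"].startswith(nested_prefix)
            for t in col.get("tags", [])
        )
    ]


print(columns_under(table, "PII.Sensitive"))  # ['ssn']
print(columns_under(table, "PII"))            # ['email', 'ssn']
```

Matching on the prefix plus a trailing dot keeps `PII.Sensitive` from accidentally matching a sibling such as `PII.SensitiveLegacy`.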

3. Define Policies

{
  "name": "PII_Access_Policy",
  "policyType": "AccessControl",
  "rules": [
    {
      "name": "restrict_sensitive_pii",
      "effect": "deny",
      "resources": ["tag:PII.Sensitive.*"],
      "operations": ["ViewAll"],
      "condition": "!hasRole('DataSteward')"
    }
  ]
}
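
How such a deny rule might be evaluated can be illustrated with a small sketch. It is not the OpenMetadata policy engine. The condition `!hasRole('DataSteward')` is hard-coded as a role check rather than parsed, and `is_denied` is a hypothetical helper.

```python
from fnmatch import fnmatchcase

# The rule from the policy above, minus the condition string.
RULE = {
    "name": "restrict_sensitive_pii",
    "effect": "deny",
    "resources": ["tag:PII.Sensitive.*"],
    "operations": ["ViewAll"],
}
# Assumption: stands in for the rule's condition "!hasRole('DataSteward')".
EXEMPT_ROLE = "DataSteward"


def is_denied(rule, column_tags, operation, user_roles):
    """Return True when the deny rule applies to this request."""
    if operation not in rule["operations"]:
        return False
    if EXEMPT_ROLE in user_roles:
        # The condition is false for data stewards, so the rule does not fire.
        return False
    patterns = [r[len("tag:"):] for r in rule["resources"] if r.startswith("tag:")]
    return any(fnmatchcase(tag, pattern) for tag in column_tags for pattern in patterns)


print(is_denied(RULE, ["PII.Sensitive.SSN"], "ViewAll", ["Analyst"]))      # True
print(is_denied(RULE, ["PII.Sensitive.SSN"], "ViewAll", ["DataSteward"]))  # False
```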

Benefits¶

Automated compliance tracking
Clear visibility into sensitive data
Enforced access controls
Complete audit trail

Data Lineage¶

Track end-to-end data flow.

Challenge¶

Understanding data origins
Tracking transformations
Impact analysis for changes
Debugging data quality issues

Solution with OpenMetadata Standards¶

Use PROV-O for lineage tracking:

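@prefix prov: <http://www.w3.org/ns/prov#> .
@prefix om: <http://open-metadata.org/ontology#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
# Placeholder namespace for the example resources (:raw_customers, etc.)
@prefix : <http://example.org/lineage#> .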

# Source data
:raw_customers a prov:Entity, om:Table ;
    om:name "raw_customers" ;
    om:database :staging_db .

# ETL Pipeline
:customer_etl a prov:Activity, om:Pipeline ;
    prov:startedAtTime "2024-01-15T10:00:00Z"^^xsd:dateTime ;
    prov:endedAtTime "2024-01-15T10:30:00Z"^^xsd:dateTime ;
    prov:wasAssociatedWith :data_engineering_team ;
    prov:used :raw_customers .

# Derived table
:dim_customers a prov:Entity, om:Table ;
    om:name "dim_customers" ;
    om:database :analytics_db ;
    prov:wasDerivedFrom :raw_customers ;
    prov:wasGeneratedBy :customer_etl .

# Dashboard usage
:sales_dashboard a om:Dashboard ;
    prov:used :dim_customers .
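
Impact analysis over this graph amounts to a downstream traversal. The sketch below transcribes the relationships from the Turtle example into a plain dictionary by hand instead of parsing RDF, so it runs with the standard library only. `impacted_by` is a hypothetical helper.

```python
from collections import deque

# Downstream edges transcribed by hand from the Turtle example:
#   customer_etl prov:used raw_customers
#   dim_customers prov:wasDerivedFrom raw_customers
#   dim_customers prov:wasGeneratedBy customer_etl
#   sales_dashboard prov:used dim_customers
DOWNSTREAM = {
    "raw_customers": ["customer_etl", "dim_customers"],
    "customer_etl": ["dim_customers"],
    "dim_customers": ["sales_dashboard"],
}


def impacted_by(node):
    """Breadth-first walk returning everything downstream of `node`."""
    seen = {node}
    queue = deque([node])
    impacted = []
    while queue:
        current = queue.popleft()
        for nxt in DOWNSTREAM.get(current, []):
            if nxt not in seen:
                seen.add(nxt)
                impacted.append(nxt)
                queue.append(nxt)
    return impacted


print(impacted_by("raw_customers"))
# ['customer_etl', 'dim_customers', 'sales_dashboard']
```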

Column-level lineage:

{
  "edge": {
    "fromEntity": {
      "fqn": "staging.raw_customers",
      "type": "table"
    },
    "toEntity": {
      "fqn": "analytics.dim_customers",
      "type": "table"
    },
    "lineageDetails": {
      "pipeline": {"fqn": "airflow.customer_etl"},
      "columnsLineage": [
        {
          "fromColumns": ["raw_customers.email"],
          "toColumn": "dim_customers.customer_email"
        },
        {
          "fromColumns": [
            "raw_customers.first_name",
            "raw_customers.last_name"
          ],
          "toColumn": "dim_customers.full_name",
          "function": "CONCAT(first_name, ' ', last_name)"
        }
      ]
    }
  }
}
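
A column-level edge like this one answers the question "which target columns change if this source column changes?". A minimal sketch, reading the edge as a plain dictionary (`downstream_columns` is a hypothetical helper):

```python
# The inner "edge" object from the example above, trimmed to the fields used here.
edge = {
    "fromEntity": {"fqn": "staging.raw_customers", "type": "table"},
    "toEntity": {"fqn": "analytics.dim_customers", "type": "table"},
    "lineageDetails": {
        "columnsLineage": [
            {
                "fromColumns": ["raw_customers.email"],
                "toColumn": "dim_customers.customer_email",
            },
            {
                "fromColumns": [
                    "raw_customers.first_name",
                    "raw_customers.last_name",
                ],
                "toColumn": "dim_customers.full_name",
            },
        ]
    },
}


def downstream_columns(edge, source_column):
    """Return the target columns that are derived from `source_column`."""
    return [
        mapping["toColumn"]
        for mapping in edge["lineageDetails"]["columnsLineage"]
        if source_column in mapping["fromColumns"]
    ]


print(downstream_columns(edge, "raw_customers.last_name"))
# ['dim_customers.full_name']
```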

Benefits¶

Complete visibility into data flow
Impact analysis for changes
Root cause analysis for data issues
Regulatory compliance (audit trails)

Data Quality¶

Define and monitor data quality.

Challenge¶

Ensuring data accuracy and completeness
Detecting data anomalies
Monitoring SLAs
Preventing downstream failures

Solution with OpenMetadata Standards¶

Define test cases:

{
  "name": "email_not_null",
  "testDefinition": "columnValuesToBeNotNull",
  "entityLink": "<#E::table::mydb.customers::columns::email>",
  "parameterValues": [
    {"name": "columnName", "value": "email"}
  ]
}
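
To make the semantics of a not-null test concrete, here is a sketch of what such a check computes over a batch of rows. It illustrates the idea only, not the OpenMetadata test runner. The sample rows are invented, and the result fields mirror those used in the "Track results" example in this section.

```python
def column_values_to_be_not_null(rows, column_name):
    """Count missing values in `column_name` and report a pass/fail result."""
    null_count = sum(1 for row in rows if row.get(column_name) is None)
    return {
        "testCaseStatus": "Success" if null_count == 0 else "Failed",
        "result": f"Found {null_count} null values",
    }


# Hypothetical sample rows; the second one is missing its email.
rows = [
    {"customer_id": 1000, "email": "a@example.com"},
    {"customer_id": 1001, "email": None},
]

print(column_values_to_be_not_null(rows, "email"))
# {'testCaseStatus': 'Failed', 'result': 'Found 1 null values'}
```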

Create test suites:

{
  "name": "customers_critical_tests",
  "executableEntityReference": {
    "type": "table",
    "fullyQualifiedName": "mydb.customers"
  },
  "tests": [
    "email_not_null",
    "email_valid_format",
    "customer_id_unique",
    "revenue_positive"
  ]
}

Track results:

{
  "testCase": "email_not_null",
  "timestamp": "2024-01-15T14:00:00Z",
  "testCaseStatus": "Failed",
  "result": "Found 42 null values",
  "sampleData": [
    "customer_id=1001",
    "customer_id=1042"
  ],
  "incidentId": "INC-2024-001"
}
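
Results in this shape are easy to roll up into a quality metric. The sketch below computes a pass rate for the `customers_critical_tests` suite. Only the `email_not_null` failure comes from the example above; the other three statuses are assumed for illustration, and `summarize` is a hypothetical helper.

```python
from collections import Counter

# Only the first result is taken from the example; the rest are assumed.
results = [
    {"testCase": "email_not_null", "testCaseStatus": "Failed"},
    {"testCase": "email_valid_format", "testCaseStatus": "Success"},
    {"testCase": "customer_id_unique", "testCaseStatus": "Success"},
    {"testCase": "revenue_positive", "testCaseStatus": "Success"},
]


def summarize(results):
    """Return the share of passing tests and the names of the non-passing ones."""
    counts = Counter(r["testCaseStatus"] for r in results)
    total = sum(counts.values())
    failed = sorted(r["testCase"] for r in results if r["testCaseStatus"] != "Success")
    return {
        "pass_rate": counts["Success"] / total if total else 0.0,
        "failed": failed,
    }


print(summarize(results))
# {'pass_rate': 0.75, 'failed': ['email_not_null']}
```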

Benefits¶

Proactive quality monitoring
Automated testing
Clear quality metrics
Integration with alerting

Knowledge Graph¶

Build enterprise knowledge graphs.

Challenge¶

Connecting disparate data sources
Discovering relationships
Semantic search
Knowledge discovery

Solution with OpenMetadata Standards¶

Convert metadata to RDF:

from rdflib import Graph
import json

# Load JSON-LD with context
metadata = {
    "@context": "https://open-metadata.org/contexts/dataAsset.jsonld",
    "@id": "https://myorg.com/tables/customers",
    "@type": "Table",
    "name": "customers",
    "owner": {
        "@id": "https://myorg.com/teams/data-engineering",
        "@type": "Team"
    },
    "relatedTerms": [
        {
            "@id": "https://myorg.com/glossary/Customer",
            "@type": "GlossaryTerm"
        }
    ]
}

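# Parse as RDF (rdflib 6+ bundles the JSON-LD parser; the remote
# @context is fetched at parse time, so this step needs network access)
g = Graph()
g.parse(data=json.dumps(metadata), format='json-ld')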

# Load into triple store
# ... upload to GraphDB, Stardog, etc.

Query with SPARQL:

PREFIX om: <http://open-metadata.org/ontology#>
PREFIX org: <https://myorg.com/>

SELECT ?table ?owner ?term
WHERE {
  ?table a om:Table ;
         om:owner ?owner ;
         om:relatedTerm ?term .

  ?owner a om:Team ;
         om:name "data-engineering" .

  ?term a om:GlossaryTerm .
}

Benefits¶

Semantic understanding of metadata
Relationship discovery
Enhanced search capabilities
Integration with enterprise knowledge bases

ML Model Management¶

Track ML models and features.

Challenge¶

Model versioning and lineage
Feature engineering tracking
Model performance monitoring
Reproducibility

Solution with OpenMetadata Standards¶

{
  "name": "customer_churn_predictor_v2",
  "algorithm": "XGBoost",
  "mlFeatures": [
    {
      "name": "customer_lifetime_value",
      "dataType": "numerical",
      "featureSources": [
        {
          "dataSource": "analytics.customer_metrics",
          "dataType": "table"
        }
      ]
    }
  ],
  "mlHyperParameters": [
    {"name": "max_depth", "value": "6"},
    {"name": "learning_rate", "value": "0.1"}
  ],
  "dashboard": {
    "type": "dashboard",
    "fullyQualifiedName": "mlflow.model_monitoring"
  },
  "version": "2.1.0"
}
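
Two things a consumer of this record typically needs are typed hyperparameters (they are stored as strings above) and the list of upstream tables feeding the features. A minimal sketch follows; both helper names are hypothetical.

```python
import ast

# The model record from the example above, trimmed to the fields used here.
model = {
    "name": "customer_churn_predictor_v2",
    "mlFeatures": [
        {
            "name": "customer_lifetime_value",
            "featureSources": [
                {"dataSource": "analytics.customer_metrics", "dataType": "table"}
            ],
        }
    ],
    "mlHyperParameters": [
        {"name": "max_depth", "value": "6"},
        {"name": "learning_rate", "value": "0.1"},
    ],
}


def typed_hyperparameters(model):
    """Convert string values to Python literals where possible, else keep the string."""
    params = {}
    for param in model["mlHyperParameters"]:
        try:
            params[param["name"]] = ast.literal_eval(param["value"])
        except (ValueError, SyntaxError):
            params[param["name"]] = param["value"]
    return params


def feature_sources(model):
    """Return the sorted, de-duplicated data sources behind the model's features."""
    return sorted(
        {
            source["dataSource"]
            for feature in model["mlFeatures"]
            for source in feature.get("featureSources", [])
        }
    )


print(typed_hyperparameters(model))  # {'max_depth': 6, 'learning_rate': 0.1}
print(feature_sources(model))        # ['analytics.customer_metrics']
```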

Benefits¶

Complete model lineage
Feature provenance
Performance tracking
Experiment reproducibility

Multi-Cloud Metadata¶

Unify metadata across cloud platforms.

Challenge¶

Data distributed across AWS, Azure, GCP
Different metadata formats
No unified catalog
Difficult to track cross-cloud lineage

Solution with OpenMetadata Standards¶

Define services for each cloud:

{
  "name": "aws_s3_prod",
  "serviceType": "S3",
  "connection": {
    "awsRegion": "us-east-1"
  }
}

{
  "name": "azure_adls_prod",
  "serviceType": "ADLS",
  "connection": {
    "accountName": "myorgstorage"
  }
}

Catalog assets with unified schema:

{
  "name": "customer_data",
  "containerType": "Bucket",
  "service": {
    "type": "storageService",
    "name": "aws_s3_prod"
  },
  "dataModel": {
    "columns": [...]
  }
}
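
Because every container points at its service by name, a unified cross-cloud view is a simple join. The sketch below groups containers by the `serviceType` of their service. The `customer_exports` container is invented so the Azure service has something to show, and `containers_by_service_type` is a hypothetical helper.

```python
from collections import defaultdict

# Services from the examples above, trimmed to the fields used here.
SERVICES = [
    {"name": "aws_s3_prod", "serviceType": "S3"},
    {"name": "azure_adls_prod", "serviceType": "ADLS"},
]

CONTAINERS = [
    {"name": "customer_data",
     "service": {"type": "storageService", "name": "aws_s3_prod"}},
    # Hypothetical container, added for illustration only.
    {"name": "customer_exports",
     "service": {"type": "storageService", "name": "azure_adls_prod"}},
]


def containers_by_service_type(services, containers):
    """Group container names under the service type that hosts them."""
    type_of = {service["name"]: service["serviceType"] for service in services}
    grouped = defaultdict(list)
    for container in containers:
        service_type = type_of.get(container["service"]["name"], "unknown")
        grouped[service_type].append(container["name"])
    return dict(grouped)


print(containers_by_service_type(SERVICES, CONTAINERS))
# {'S3': ['customer_data'], 'ADLS': ['customer_exports']}
```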

Benefits¶

Unified view across clouds
Consistent metadata format
Cross-cloud lineage tracking
Cloud-agnostic tooling

Next Steps¶

Schema Reference - Explore schemas for your use case
Examples - See detailed examples
RDF & Ontologies - Learn about semantic capabilities