
Operations Overview

Operations in OpenMetadata covers the systems and processes that manage, monitor, and maintain the metadata platform itself. This includes metadata ingestion, event notifications, automation workflows, and operational monitoring.

What is Operations?

Operations entities manage the day-to-day functioning of the metadata platform:

  • Metadata Ingestion: Automated extraction of metadata from data sources
  • Event Notifications: Real-time alerts and webhooks for metadata changes
  • Workflow Automation: Scheduled jobs and automated processes
  • Monitoring & Health: Platform health checks and operational metrics
  • Integration Management: Connections to external systems

Operations Entities

graph TB
    A[Operations] --> B1[Ingestion Pipelines]
    A --> B2[Webhooks]
    A --> B3[Workflows]
    A --> B4[Monitoring]

    B1 --> C1[Metadata Ingestion]
    B1 --> C2[Usage Ingestion]
    B1 --> C3[Lineage Ingestion]
    B1 --> C4[Profiler Jobs]

    B2 --> D1[Slack Notifications]
    B2 --> D2[Email Alerts]
    B2 --> D3[Custom Webhooks]
    B2 --> D4[Integration Events]

    style A fill:#667eea,color:#fff
    style B1 fill:#4facfe,color:#fff
    style B2 fill:#4facfe,color:#fff
    style B3 fill:#4facfe,color:#fff
    style B4 fill:#4facfe,color:#fff
    style C1 fill:#00f2fe,color:#333
    style C2 fill:#00f2fe,color:#333
    style C3 fill:#00f2fe,color:#333
    style C4 fill:#00f2fe,color:#333
    style D1 fill:#00f2fe,color:#333
    style D2 fill:#00f2fe,color:#333
    style D3 fill:#00f2fe,color:#333
    style D4 fill:#00f2fe,color:#333

Core Components

1. Ingestion Pipelines

Automated workflows that extract metadata from data sources:

  • Metadata Ingestion: Schemas, tables, columns, relationships
  • Usage Ingestion: Query logs, access patterns, popular queries
  • Lineage Ingestion: Data flow and transformation lineage
  • Profiler Ingestion: Statistical profiles and data quality metrics
  • Test Suite Execution: Automated data quality testing
  • dbt Integration: dbt model and test metadata

2. Webhooks

Event-driven notifications to external systems:

  • Real-time Events: Instant notifications on metadata changes
  • Custom Integrations: Connect to third-party tools
  • Alert Routing: Send alerts to appropriate channels
  • Audit Trail: Track all outbound notifications

3. Workflows (Future)

Automated operational workflows:

  • Scheduled metadata refreshes
  • Automated tagging and classification
  • Data quality monitoring
  • Compliance checks
  • Cleanup and archival

4. Monitoring (Future)

Platform health and operations monitoring:

  • Ingestion pipeline health
  • Webhook delivery status
  • System performance metrics
  • Error tracking and alerting

Operations Architecture

graph LR
    A[Data Sources] --> B[Ingestion Pipelines]
    B --> C[OpenMetadata Platform]

    C --> D[Event Stream]
    D --> E1[Webhooks]
    D --> E2[Search Index]
    D --> E3[Audit Log]

    E1 --> F1[Slack]
    E1 --> F2[Email]
    E1 --> F3[Custom Systems]

    G[Scheduler] --> B
    G --> H[Profiler Jobs]
    G --> I[Test Suites]

    style A fill:#764ba2,color:#fff
    style B fill:#4facfe,color:#fff,stroke:#4c51bf,stroke-width:3px
    style C fill:#667eea,color:#fff
    style D fill:#f093fb,color:#333
    style E1 fill:#4facfe,color:#fff,stroke:#4c51bf,stroke-width:3px
    style E2 fill:#00f2fe,color:#333
    style E3 fill:#ffd700,color:#333
    style F1 fill:#00f2fe,color:#333
    style F2 fill:#00f2fe,color:#333
    style F3 fill:#00f2fe,color:#333
    style G fill:#43e97b,color:#333
    style H fill:#f5576c,color:#fff
    style I fill:#f5576c,color:#fff

Use Cases

Automated Metadata Discovery

Automatically discover and catalog data assets:

graph LR
    A[New Table Created] --> B[Ingestion Pipeline]
    B --> C[Metadata Extracted]
    C --> D[OpenMetadata Catalog]
    D --> E[Webhook Notification]
    E --> F[Team Slack Channel]

    style A fill:#764ba2,color:#fff
    style B fill:#4facfe,color:#fff,stroke:#4c51bf,stroke-width:3px
    style C fill:#00f2fe,color:#333
    style D fill:#667eea,color:#fff
    style E fill:#4facfe,color:#fff,stroke:#4c51bf,stroke-width:3px
    style F fill:#43e97b,color:#333

Real-time Change Notifications

Get instant alerts when metadata changes:

  • Schema changes in production tables
  • New PII data tagged
  • Data quality test failures
  • Ownership changes
  • Policy violations

Data Quality Automation

Automate data quality monitoring:

  1. Schedule profiler jobs
  2. Run data quality tests
  3. Track quality metrics over time
  4. Alert on quality degradation
  5. Trigger remediation workflows
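
The loop above can be sketched as a simple degradation check. The `quality_degraded` helper and its tolerance are illustrative, not an OpenMetadata API:

```python
from statistics import mean

def quality_degraded(history, latest, tolerance=0.05):
    """Flag degradation when the latest pass rate drops more than
    `tolerance` below the recent average (names are illustrative)."""
    if not history:
        return False
    return latest < mean(history) - tolerance

# Example: test pass rates from the last five runs.
runs = [0.99, 0.98, 0.99, 0.97, 0.98]
assert not quality_degraded(runs, 0.97)  # within tolerance, no alert
assert quality_degraded(runs, 0.90)      # sharp drop, trigger an alert
```

An alerting step would fire a webhook or Slack message whenever the check returns `True`.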

Compliance Auditing

Maintain audit trail for compliance:

  • Track all metadata changes
  • Monitor data access patterns
  • Alert on policy violations
  • Generate compliance reports
  • Maintain change history

Ingestion Pipeline Types

Metadata Ingestion

Extract schema, structure, and relationships:

  • Tables, columns, data types
  • Primary and foreign keys
  • Relationships and constraints
  • Descriptions and tags
  • Ownership information

Usage Ingestion

Extract query logs and usage patterns:

  • Popular tables and columns
  • Query patterns and frequency
  • User access patterns
  • Join relationships
  • Query performance

Lineage Ingestion

Extract data flow and transformations:

  • Column-level lineage
  • Pipeline dependencies
  • Transformation logic
  • Data provenance
  • Impact analysis

Profiler Ingestion

Collect data quality metrics:

  • Row counts and table statistics
  • Column profiles and distributions
  • Null percentages
  • Unique value counts
  • Data type validation
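
A minimal sketch of the kind of statistics a profiler computes for a single column; the field names are illustrative, not the exact OpenMetadata profile schema:

```python
def profile_column(values):
    """Compute a minimal column profile: row count, null proportion,
    and distinct-value count (illustrative, not the real profiler)."""
    total = len(values)
    nulls = sum(1 for v in values if v is None)
    distinct = len({v for v in values if v is not None})
    return {
        "rowCount": total,
        "nullProportion": nulls / total if total else 0.0,
        "uniqueCount": distinct,
    }

profile = profile_column(["a", "b", "a", None])
# rowCount 4, nullProportion 0.25, uniqueCount 2
```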

Test Suite Execution

Run automated quality tests:

  • Schema validation
  • Data quality checks
  • Freshness verification
  • Completeness testing
  • Custom business rules

Webhook Event Types

Entity Events

  • entityCreated: New entity created
  • entityUpdated: Entity modified
  • entitySoftDeleted: Entity soft deleted
  • entityDeleted: Entity permanently deleted
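
A consumer might dispatch on the event type names listed above; the handler bodies below are placeholders:

```python
def handle_event(event):
    """Route an incoming change event by its event type.
    The type names mirror the list above; handler bodies are stubs."""
    handlers = {
        "entityCreated": lambda e: f"created {e['entityType']}",
        "entityUpdated": lambda e: f"updated {e['entityType']}",
        "entitySoftDeleted": lambda e: f"soft-deleted {e['entityType']}",
        "entityDeleted": lambda e: f"deleted {e['entityType']}",
    }
    handler = handlers.get(event["eventType"])
    return handler(event) if handler else "ignored"

print(handle_event({"eventType": "entityCreated", "entityType": "table"}))
```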

Quality Events

  • Data quality test failures
  • Profile metric anomalies
  • SLA violations
  • Data freshness issues

Schema Events

  • Schema changes detected
  • Breaking changes identified
  • New columns added
  • Columns removed

Access Events

  • Policy violations
  • Unauthorized access attempts
  • Permission changes
  • Security alerts

Best Practices

1. Choose the Right Ingestion Pattern

Select the appropriate pattern based on your needs:

Push-based (Real-time APIs):

  • Real-time application metadata
  • Event-driven updates
  • Immediate lineage tracking
  • Live quality metrics

Pull-based (Scheduled):

  • Batch metadata discovery
  • Periodic synchronization
  • Historical data profiling
  • Low-frequency updates

Schedule Frequency (for pull-based):

  • Hourly: Frequently changing data
  • Daily: Most production tables
  • Weekly: Static reference data
  • On-demand: Ad-hoc discovery
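
For pull-based pipelines, those frequencies map naturally onto cron schedules. The expressions below are examples, not OpenMetadata defaults:

```python
# Illustrative cron expressions for common ingestion frequencies.
SCHEDULES = {
    "hourly": "0 * * * *",   # top of every hour
    "daily": "0 2 * * *",    # 02:00 every day, off peak
    "weekly": "0 3 * * 0",   # 03:00 every Sunday
}

def cron_for(frequency):
    """Return a cron expression, or None for on-demand runs."""
    return SCHEDULES.get(frequency)
```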

2. Filter Wisely

Use filters to avoid ingesting unnecessary metadata:

  • Include/exclude patterns
  • Schema filtering
  • Table name patterns
  • Database filtering
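
Connector filters are typically expressed as include/exclude regex patterns. A minimal sketch of the usual semantics (exclude wins, and an empty include list allows everything):

```python
import re

def table_allowed(name, includes=None, excludes=None):
    """Apply include/exclude regex filters to a table name.
    Exclude patterns take precedence; no includes means allow all.
    This sketches common filter semantics, not the exact connector code."""
    if excludes and any(re.fullmatch(p, name) for p in excludes):
        return False
    if includes:
        return any(re.fullmatch(p, name) for p in includes)
    return True

assert table_allowed("sales_2024", includes=[r"sales_.*"])
assert not table_allowed("tmp_scratch", excludes=[r"tmp_.*"])
```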

3. Monitor Pipeline Health

Track ingestion pipeline execution:

  • Success/failure rates
  • Execution duration
  • Entities processed
  • Error patterns

4. Configure Retry Logic

Handle transient failures gracefully:

  • Exponential backoff
  • Maximum retry attempts
  • Error notification thresholds
  • Failure handling policies
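
Exponential backoff with jitter can be sketched in a few lines; the parameter names here are illustrative:

```python
import random

def backoff_delays(max_attempts=5, base=1.0, cap=60.0, jitter=True):
    """Exponentially growing retry delays in seconds, capped at `cap`.
    Jitter spreads retries out to avoid thundering-herd effects."""
    delays = [min(cap, base * (2 ** i)) for i in range(max_attempts)]
    if jitter:
        delays = [random.uniform(0, d) for d in delays]
    return delays

print(backoff_delays(jitter=False))  # [1.0, 2.0, 4.0, 8.0, 16.0]
```

A retry loop would sleep for each delay in turn and give up (and notify) after the last attempt.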

5. Secure Credentials

Protect data source credentials:

  • Use secrets management
  • Rotate credentials regularly
  • Least privilege access
  • Encrypted storage

6. Test Before Production

Validate pipelines in non-production:

  • Test filters and patterns
  • Verify metadata quality
  • Check performance impact
  • Validate transformations

7. Document Configurations

Maintain clear documentation:

  • Pipeline purpose and scope
  • Schedule rationale
  • Filter explanations
  • Troubleshooting guides

8. Optimize Webhook Delivery

Ensure reliable notification delivery:

  • Implement idempotency
  • Handle retries
  • Monitor delivery rates
  • Filter events appropriately
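
Idempotency on the receiving side usually means deduplicating by a unique event id, so a retried delivery is processed exactly once. A minimal sketch (field names are illustrative):

```python
class WebhookReceiver:
    """Idempotent webhook handler: deliveries are deduplicated by a
    unique event id, so retried deliveries are processed only once."""

    def __init__(self):
        self.seen = set()
        self.processed = []

    def handle(self, event):
        if event["id"] in self.seen:
            return "duplicate"          # retried delivery, safe no-op
        self.seen.add(event["id"])
        self.processed.append(event)
        return "processed"

rx = WebhookReceiver()
assert rx.handle({"id": "e1"}) == "processed"
assert rx.handle({"id": "e1"}) == "duplicate"
```

A production receiver would persist the seen ids (with a TTL) instead of keeping them in memory.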

Monitoring Operations

Key Metrics

| Metric | Description | Target |
| --- | --- | --- |
| Pipeline Success Rate | % of successful pipeline runs | > 99% |
| Ingestion Latency | Time to complete ingestion | < 1 hour |
| Webhook Delivery Rate | % of webhooks delivered successfully | > 99.9% |
| Event Processing Lag | Delay in event processing | < 1 minute |
| Error Rate | % of failed operations | < 1% |
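
Computing a metric like pipeline success rate from run records is straightforward; the list-of-status-strings shape below is illustrative, not an OpenMetadata API response:

```python
def pipeline_success_rate(runs):
    """Fraction of successful runs, or None if there is no history.
    `runs` is a list of status strings (illustrative shape)."""
    if not runs:
        return None
    return sum(1 for s in runs if s == "success") / len(runs)

rate = pipeline_success_rate(["success"] * 99 + ["failed"])
assert rate == 0.99  # at the edge of the > 99% target, worth an alert
```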

Alerts

Configure alerts for operational issues:

  • Pipeline failures
  • Webhook delivery failures
  • High error rates
  • Performance degradation
  • Resource exhaustion

Integration Patterns

OpenMetadata supports multiple ingestion patterns to meet different needs:

Ingestion Pattern Comparison

| Feature | Pull-based Ingestion | Push-based Ingestion (APIs) | Webhooks (Notifications) |
| --- | --- | --- | --- |
| Direction | OpenMetadata pulls from source | Source pushes to OpenMetadata | OpenMetadata pushes to destination |
| Timing | Scheduled (batch) | Real-time (immediate) | Real-time (immediate) |
| Latency | Minutes to hours | Milliseconds | Milliseconds |
| Use Case | Metadata discovery | Application-driven updates | External system notifications |
| Complexity | Higher (connector needed) | Lower (standard REST API) | Lower (standard webhooks) |
| Infrastructure | Requires scheduler | No additional infrastructure | No additional infrastructure |
| Examples | Database schema discovery | Lineage from Spark jobs | Slack alerts on schema changes |

Pull-based Ingestion

Scheduled extraction from source systems:

graph LR
    A[Scheduler] --> B[Ingestion Pipeline]
    B --> C[Data Source]
    C --> D[Metadata]
    D --> B
    B --> E[OpenMetadata]

    style A fill:#43e97b,color:#333
    style B fill:#4facfe,color:#fff,stroke:#4c51bf,stroke-width:3px
    style C fill:#764ba2,color:#fff
    style D fill:#00f2fe,color:#333
    style E fill:#667eea,color:#fff

Use Cases:

  • Batch metadata discovery
  • Scheduled profiling and quality checks
  • Historical data synchronization
  • Low-frequency updates

Push-based Ingestion (Real-time via APIs)

Real-time metadata updates through REST APIs:

graph LR
    A[Application/Service] --> B[OpenMetadata REST API]
    B --> C[Metadata Store]
    C --> D[Event Stream]
    D --> E[Real-time Updates]

    style A fill:#764ba2,color:#fff
    style B fill:#4facfe,color:#fff,stroke:#4c51bf,stroke-width:3px
    style C fill:#667eea,color:#fff
    style D fill:#f093fb,color:#333
    style E fill:#00f2fe,color:#333

Use Cases:

  • Real-time metadata updates from applications
  • Event-driven metadata synchronization
  • Immediate lineage tracking
  • Live data quality reporting
  • Dynamic schema registration
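
A push-based update is an ordinary HTTP call. The sketch below only builds the request body; the endpoint path and payload fields are illustrative, so check the OpenMetadata API reference for the exact schema of your server version:

```python
import json

# Hypothetical table payload for a push-based create/update.
payload = {
    "name": "orders",
    "databaseSchema": "shop.public",
    "columns": [{"name": "order_id", "dataType": "BIGINT"}],
}
body = json.dumps(payload)

# With an HTTP client this would be sent along the lines of:
# requests.put("https://<host>/api/v1/tables", data=body,
#              headers={"Authorization": "Bearer <token>",
#                       "Content-Type": "application/json"})
print(body)
```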

Why APIs are the Standard for Real-time:

OpenMetadata uses REST APIs for push-based, real-time ingestion, the same approach that powers much of the modern web. Just as Stripe processes millions of real-time payments and Twilio handles real-time communications through APIs, OpenMetadata delivers real-time metadata updates without requiring heavyweight message queues.

Key Advantages:

  • Simplicity: Standard HTTP/REST - no additional infrastructure required
  • Universal: Works with any programming language or platform
  • Reliable: Battle-tested pattern used by Stripe, Twilio, GitHub, and thousands of other services
  • Scalable: Modern API gateways handle millions of requests per second
  • Developer-friendly: Easy to integrate, test, and debug
  • Secure: Standard authentication and encryption (OAuth, JWT, TLS)

Real-time Without Kafka

While some vendors claim you need Kafka for "real-time" capabilities, the reality is different. Many of the world's most critical real-time systems, including payment processing, communication platforms, ride-sharing, and financial services, run on REST APIs. OpenMetadata follows this proven, simpler approach.

Learn more: Why OpenMetadata is the Right Choice for You

Push-based Notifications (Webhooks)

Real-time event delivery to external systems:

graph LR
    A[Metadata Change] --> B[Event Stream]
    B --> C[Webhook]
    C --> D[External System]

    style A fill:#00f2fe,color:#333
    style B fill:#f093fb,color:#333
    style C fill:#4facfe,color:#fff,stroke:#4c51bf,stroke-width:3px
    style D fill:#764ba2,color:#fff

Use Cases:

  • Notify external systems of metadata changes
  • Trigger downstream workflows
  • Send alerts to Slack, email, or other tools
  • Maintain synchronization with other platforms
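
Outbound webhooks are commonly signed so the receiver can verify the sender. A sketch using HMAC-SHA256; the header name and signing scheme are illustrative, not necessarily what your OpenMetadata version uses:

```python
import hashlib
import hmac
import json

def sign_payload(secret: bytes, payload: dict) -> str:
    """HMAC-SHA256 signature over the serialized webhook body,
    sent alongside the request (e.g. in an X-Signature header)."""
    body = json.dumps(payload, separators=(",", ":")).encode()
    return hmac.new(secret, body, hashlib.sha256).hexdigest()

sig = sign_payload(b"shared-secret", {"eventType": "entityUpdated"})
# The receiver recomputes the digest over the raw body and compares
# it with hmac.compare_digest to avoid timing attacks.
```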

Next Steps