Operations Overview¶
Operations in OpenMetadata encompasses the systems and processes that manage, monitor, and maintain the metadata platform itself. This includes metadata ingestion, event notifications, automation workflows, and operational monitoring.
What is Operations?¶
Operations entities manage the day-to-day functioning of the metadata platform:
- Metadata Ingestion: Automated extraction of metadata from data sources
- Event Notifications: Real-time alerts and webhooks for metadata changes
- Workflow Automation: Scheduled jobs and automated processes
- Monitoring & Health: Platform health checks and operational metrics
- Integration Management: Connections to external systems
Operations Entities¶
```mermaid
graph TB
    A[Operations] --> B1[Ingestion Pipelines]
    A --> B2[Webhooks]
    A --> B3[Workflows]
    A --> B4[Monitoring]
    B1 --> C1[Metadata Ingestion]
    B1 --> C2[Usage Ingestion]
    B1 --> C3[Lineage Ingestion]
    B1 --> C4[Profiler Jobs]
    B2 --> D1[Slack Notifications]
    B2 --> D2[Email Alerts]
    B2 --> D3[Custom Webhooks]
    B2 --> D4[Integration Events]
    style A fill:#667eea,color:#fff
    style B1 fill:#4facfe,color:#fff
    style B2 fill:#4facfe,color:#fff
    style B3 fill:#4facfe,color:#fff
    style B4 fill:#4facfe,color:#fff
    style C1 fill:#00f2fe,color:#333
    style C2 fill:#00f2fe,color:#333
    style C3 fill:#00f2fe,color:#333
    style C4 fill:#00f2fe,color:#333
    style D1 fill:#00f2fe,color:#333
    style D2 fill:#00f2fe,color:#333
    style D3 fill:#00f2fe,color:#333
    style D4 fill:#00f2fe,color:#333
```
Core Components¶
1. Ingestion Pipelines¶
Automated workflows that extract metadata from data sources:
- Metadata Ingestion: Schemas, tables, columns, relationships
- Usage Ingestion: Query logs, access patterns, popular queries
- Lineage Ingestion: Data flow and transformation lineage
- Profiler Ingestion: Statistical profiles and data quality metrics
- Test Suite Execution: Automated data quality testing
- dbt Integration: dbt model and test metadata
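A pipeline of any of these types is described by a declarative configuration. The sketch below expresses one as a Python dict whose field names mirror the style of OpenMetadata's YAML ingestion configs (source, sourceConfig, sink, workflowConfig); treat the exact schema, connector type, and host as illustrative rather than authoritative.

```python
# Sketch of a metadata-ingestion pipeline configuration. Field names follow
# the general shape of OpenMetadata ingestion configs; values are examples.
pipeline_config = {
    "source": {
        "type": "mysql",                      # connector type (example)
        "serviceName": "prod-mysql",          # logical name of the source service
        "sourceConfig": {
            "config": {
                "type": "DatabaseMetadata",   # metadata vs usage/lineage/profiler
                "schemaFilterPattern": {"includes": ["sales_.*"]},
                "tableFilterPattern": {"excludes": [".*_tmp$"]},
            }
        },
    },
    "sink": {"type": "metadata-rest"},        # push extracted metadata to the API
    "workflowConfig": {
        "openMetadataServerConfig": {"hostPort": "http://localhost:8585/api"}
    },
}

def pipeline_kind(config: dict) -> str:
    """Return which ingestion type this pipeline runs (metadata, usage, ...)."""
    return config["source"]["sourceConfig"]["config"]["type"]

print(pipeline_kind(pipeline_config))  # DatabaseMetadata
```

The same structure, with a different `sourceConfig.config.type`, would describe a usage, lineage, or profiler pipeline against the same service.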
2. Webhooks¶
Event-driven notifications to external systems:
- Real-time Events: Instant notifications on metadata changes
- Custom Integrations: Connect to third-party tools
- Alert Routing: Send alerts to appropriate channels
- Audit Trail: Track all outbound notifications
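On the receiving end, alert routing usually reduces to inspecting the event payload and picking a destination channel. A minimal consumer sketch, assuming a JSON body with an `eventType` field in the style of OpenMetadata change events (the channel names and fallback are invented for illustration):

```python
import json

# Route one webhook delivery to a notification channel based on event type.
ROUTES = {
    "entityCreated": "#data-catalog",      # new assets announced to the team
    "entityDeleted": "#data-governance",   # deletions need governance review
}

def route_event(raw_body: bytes) -> str:
    """Pick a notification channel for one webhook delivery."""
    event = json.loads(raw_body)
    # Fall back to a default channel for event types with no explicit route.
    return ROUTES.get(event.get("eventType"), "#data-ops")

body = json.dumps({"eventType": "entityCreated", "entityType": "table"}).encode()
print(route_event(body))  # #data-catalog
```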
3. Workflows (Future)¶
Automated operational workflows:
- Scheduled metadata refreshes
- Automated tagging and classification
- Data quality monitoring
- Compliance checks
- Cleanup and archival
4. Monitoring (Future)¶
Platform health and operations monitoring:
- Ingestion pipeline health
- Webhook delivery status
- System performance metrics
- Error tracking and alerting
Operations Architecture¶
```mermaid
graph LR
    A[Data Sources] --> B[Ingestion Pipelines]
    B --> C[OpenMetadata Platform]
    C --> D[Event Stream]
    D --> E1[Webhooks]
    D --> E2[Search Index]
    D --> E3[Audit Log]
    E1 --> F1[Slack]
    E1 --> F2[Email]
    E1 --> F3[Custom Systems]
    G[Scheduler] --> B
    G --> H[Profiler Jobs]
    G --> I[Test Suites]
    style A fill:#764ba2,color:#fff
    style B fill:#4facfe,color:#fff,stroke:#4c51bf,stroke-width:3px
    style C fill:#667eea,color:#fff
    style D fill:#f093fb,color:#333
    style E1 fill:#4facfe,color:#fff,stroke:#4c51bf,stroke-width:3px
    style E2 fill:#00f2fe,color:#333
    style E3 fill:#ffd700,color:#333
    style F1 fill:#00f2fe,color:#333
    style F2 fill:#00f2fe,color:#333
    style F3 fill:#00f2fe,color:#333
    style G fill:#43e97b,color:#333
    style H fill:#f5576c,color:#fff
    style I fill:#f5576c,color:#fff
```
Use Cases¶
Automated Metadata Discovery¶
Automatically discover and catalog data assets:
```mermaid
graph LR
    A[New Table Created] --> B[Ingestion Pipeline]
    B --> C[Metadata Extracted]
    C --> D[OpenMetadata Catalog]
    D --> E[Webhook Notification]
    E --> F[Team Slack Channel]
    style A fill:#764ba2,color:#fff
    style B fill:#4facfe,color:#fff,stroke:#4c51bf,stroke-width:3px
    style C fill:#00f2fe,color:#333
    style D fill:#667eea,color:#fff
    style E fill:#4facfe,color:#fff,stroke:#4c51bf,stroke-width:3px
    style F fill:#43e97b,color:#333
```
Real-time Change Notifications¶
Get instant alerts when metadata changes:
- Schema changes in production tables
- New PII data tagged
- Data quality test failures
- Ownership changes
- Policy violations
Data Quality Automation¶
Automate data quality monitoring:
- Schedule profiler jobs
- Run data quality tests
- Track quality metrics over time
- Alert on quality degradation
- Trigger remediation workflows
Compliance Auditing¶
Maintain audit trail for compliance:
- Track all metadata changes
- Monitor data access patterns
- Alert on policy violations
- Generate compliance reports
- Maintain change history
Ingestion Pipeline Types¶
Metadata Ingestion¶
Extract schema, structure, and relationships:
- Tables, columns, data types
- Primary and foreign keys
- Relationships and constraints
- Descriptions and tags
- Ownership information
Usage Ingestion¶
Extract query logs and usage patterns:
- Popular tables and columns
- Query patterns and frequency
- User access patterns
- Join relationships
- Query performance
Lineage Ingestion¶
Extract data flow and transformations:
- Column-level lineage
- Pipeline dependencies
- Transformation logic
- Data provenance
- Impact analysis
Profiler Ingestion¶
Collect data quality metrics:
- Row counts and table statistics
- Column profiles and distributions
- Null percentages
- Unique value counts
- Data type validation
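The core column metrics above are simple aggregates. A minimal sketch of what a profiler job computes per column, over an in-memory sample (the metric names loosely follow profiler terminology; the function itself is illustrative):

```python
# Profile one column: row count, null proportion, and distinct non-null values.
def profile_column(values):
    total = len(values)
    nulls = sum(1 for v in values if v is None)
    distinct = len({v for v in values if v is not None})
    return {
        "rowCount": total,
        "nullProportion": nulls / total if total else 0.0,
        "uniqueCount": distinct,
    }

print(profile_column(["a", "b", "a", None]))
# {'rowCount': 4, 'nullProportion': 0.25, 'uniqueCount': 2}
```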
Test Suite Execution¶
Run automated quality tests:
- Schema validation
- Data quality checks
- Freshness verification
- Completeness testing
- Custom business rules
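A freshness verification, for instance, reduces to comparing the table's last update time against an allowed staleness window. A sketch (the function and threshold are illustrative, not a test-suite API):

```python
from datetime import datetime, timedelta, timezone

# Freshness check: fail if the table's last update is older than max_age.
def freshness_ok(last_updated: datetime, max_age: timedelta) -> bool:
    return datetime.now(timezone.utc) - last_updated <= max_age

recent = datetime.now(timezone.utc) - timedelta(hours=2)
stale = datetime.now(timezone.utc) - timedelta(days=3)
print(freshness_ok(recent, timedelta(hours=24)))  # True
print(freshness_ok(stale, timedelta(hours=24)))   # False
```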
Webhook Event Types¶
Entity Events¶
- entityCreated: New entity created
- entityUpdated: Entity modified
- entitySoftDeleted: Entity soft deleted
- entityDeleted: Entity permanently deleted
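These four event types map naturally onto a dispatch table. A sketch that turns a change event into an audit-log line (the payload field names follow the general shape of OpenMetadata change events but are illustrative here):

```python
# Turn an entity change event into a human-readable audit line.
ACTIONS = {
    "entityCreated": "created",
    "entityUpdated": "updated",
    "entitySoftDeleted": "soft-deleted",
    "entityDeleted": "permanently deleted",
}

def audit_line(event: dict) -> str:
    action = ACTIONS.get(event["eventType"], "changed")
    return f"{event['entityType']} {event['entityFullyQualifiedName']} was {action}"

print(audit_line({
    "eventType": "entitySoftDeleted",
    "entityType": "table",
    "entityFullyQualifiedName": "prod-mysql.sales.orders",
}))  # table prod-mysql.sales.orders was soft-deleted
```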
Quality Events¶
- Data quality test failures
- Profile metric anomalies
- SLA violations
- Data freshness issues
Schema Events¶
- Schema changes detected
- Breaking changes identified
- New columns added
- Columns removed
Access Events¶
- Policy violations
- Unauthorized access attempts
- Permission changes
- Security alerts
Best Practices¶
1. Choose the Right Ingestion Pattern¶
Select the appropriate pattern based on your needs:
Push-based (Real-time APIs):
- Real-time application metadata
- Event-driven updates
- Immediate lineage tracking
- Live quality metrics

Pull-based (Scheduled):
- Batch metadata discovery
- Periodic synchronization
- Historical data profiling
- Low-frequency updates

Schedule Frequency (for pull-based):
- Hourly: Frequently changing data
- Daily: Most production tables
- Weekly: Static reference data
- On-demand: Ad-hoc discovery
2. Filter Wisely¶
Use filters to avoid ingesting unnecessary metadata:
- Include/exclude patterns
- Schema filtering
- Table name patterns
- Database filtering
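Include/exclude filtering follows a simple rule: an asset is ingested only if it matches at least one include pattern and no exclude pattern. A sketch with regular expressions (the pattern lists are examples):

```python
import re

# Decide whether a table name passes the include/exclude filter patterns.
def should_ingest(name, includes, excludes):
    # An empty include list means "include everything".
    if includes and not any(re.fullmatch(p, name) for p in includes):
        return False
    return not any(re.fullmatch(p, name) for p in excludes)

includes = [r"sales_.*", r"dim_.*"]
excludes = [r".*_tmp", r".*_backup"]

print(should_ingest("sales_orders", includes, excludes))      # True
print(should_ingest("sales_orders_tmp", includes, excludes))  # False
```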
3. Monitor Pipeline Health¶
Track ingestion pipeline execution:
- Success/failure rates
- Execution duration
- Entities processed
- Error patterns
4. Configure Retry Logic¶
Handle transient failures gracefully:
- Exponential backoff
- Maximum retry attempts
- Error notification thresholds
- Failure handling policies
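Exponential backoff with a retry cap is the standard shape for this. A sketch, with illustrative defaults for attempt count and base delay:

```python
import random
import time

# Retry a flaky operation with exponential backoff plus jitter.
def run_with_retries(operation, max_attempts=4, base_delay=1.0):
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except ConnectionError:
            if attempt == max_attempts:
                raise  # exhausted retries: surface the failure for alerting
            # Double the delay each attempt; jitter avoids thundering herds.
            delay = base_delay * 2 ** (attempt - 1) + random.uniform(0, 0.1)
            time.sleep(delay)

attempts = []
def flaky():
    attempts.append(1)
    if len(attempts) < 3:
        raise ConnectionError("transient network error")
    return "ok"

print(run_with_retries(flaky, base_delay=0.01))  # ok
```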
5. Secure Credentials¶
Protect data source credentials:
- Use secrets management
- Rotate credentials regularly
- Least privilege access
- Encrypted storage
6. Test Before Production¶
Validate pipelines in non-production:
- Test filters and patterns
- Verify metadata quality
- Check performance impact
- Validate transformations
7. Document Configurations¶
Maintain clear documentation:
- Pipeline purpose and scope
- Schedule rationale
- Filter explanations
- Troubleshooting guides
8. Optimize Webhook Delivery¶
Ensure reliable notification delivery:
- Implement idempotency
- Handle retries
- Monitor delivery rates
- Filter events appropriately
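Because retries mean the same event can arrive more than once, idempotency on the consumer side usually comes down to deduplicating on a stable event id. A sketch (an in-memory set stands in for whatever durable store you would use in practice):

```python
# Deduplicate webhook deliveries on a stable event id before acting.
processed_ids = set()

def handle_delivery(event: dict) -> bool:
    """Return True if the event was acted on, False if it was a duplicate."""
    event_id = event["id"]
    if event_id in processed_ids:
        return False  # already handled: safe to acknowledge and skip
    processed_ids.add(event_id)
    # ... act on the event (send alert, update downstream system) ...
    return True

event = {"id": "evt-123", "eventType": "entityUpdated"}
print(handle_delivery(event))  # True  (first delivery)
print(handle_delivery(event))  # False (retried duplicate, skipped)
```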
Monitoring Operations¶
Key Metrics¶
| Metric | Description | Target |
|---|---|---|
| Pipeline Success Rate | % of successful pipeline runs | > 99% |
| Ingestion Latency | Time to complete ingestion | < 1 hour |
| Webhook Delivery Rate | % of webhooks delivered successfully | > 99.9% |
| Event Processing Lag | Delay in event processing | < 1 minute |
| Error Rate | % of failed operations | < 1% |
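Checking a metric like pipeline success rate against its target is straightforward arithmetic over run records. A sketch (the record shape and status values are illustrative):

```python
# Compute pipeline success rate from run records and compare to the target.
def success_rate(runs):
    if not runs:
        return 0.0
    return sum(1 for r in runs if r["status"] == "success") / len(runs)

runs = [{"status": "success"}] * 199 + [{"status": "failed"}]
rate = success_rate(runs)
print(f"{rate:.1%}", "OK" if rate > 0.99 else "ALERT")  # 99.5% OK
```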
Alerts¶
Configure alerts for operational issues:
- Pipeline failures
- Webhook delivery failures
- High error rates
- Performance degradation
- Resource exhaustion
Integration Patterns¶
OpenMetadata supports multiple ingestion patterns to meet different needs:
Ingestion Pattern Comparison¶
| Feature | Pull-based Ingestion | Push-based Ingestion (APIs) | Webhooks (Notifications) |
|---|---|---|---|
| Direction | OpenMetadata pulls from source | Source pushes to OpenMetadata | OpenMetadata pushes to destination |
| Timing | Scheduled (batch) | Real-time (immediate) | Real-time (immediate) |
| Latency | Minutes to hours | Milliseconds | Milliseconds |
| Use Case | Metadata discovery | Application-driven updates | External system notifications |
| Complexity | Higher (connector needed) | Lower (standard REST API) | Lower (standard webhooks) |
| Infrastructure | Requires scheduler | No additional infrastructure | No additional infrastructure |
| Examples | Database schema discovery | Lineage from Spark jobs | Slack alerts on schema changes |
Pull-based Ingestion¶
Scheduled extraction from source systems:
```mermaid
graph LR
    A[Scheduler] --> B[Ingestion Pipeline]
    B --> C[Data Source]
    C --> D[Metadata]
    D --> B
    B --> E[OpenMetadata]
    style A fill:#43e97b,color:#333
    style B fill:#4facfe,color:#fff,stroke:#4c51bf,stroke-width:3px
    style C fill:#764ba2,color:#fff
    style D fill:#00f2fe,color:#333
    style E fill:#667eea,color:#fff
```
Use Cases:
- Batch metadata discovery
- Scheduled profiling and quality checks
- Historical data synchronization
- Low-frequency updates
Push-based Ingestion (Real-time via APIs)¶
Real-time metadata updates through REST APIs:
```mermaid
graph LR
    A[Application/Service] --> B[OpenMetadata REST API]
    B --> C[Metadata Store]
    C --> D[Event Stream]
    D --> E[Real-time Updates]
    style A fill:#764ba2,color:#fff
    style B fill:#4facfe,color:#fff,stroke:#4c51bf,stroke-width:3px
    style C fill:#667eea,color:#fff
    style D fill:#f093fb,color:#333
    style E fill:#00f2fe,color:#333
```
Use Cases:
- Real-time metadata updates from applications
- Event-driven metadata synchronization
- Immediate lineage tracking
- Live data quality reporting
- Dynamic schema registration
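A push-based update is an ordinary authenticated HTTP call made the moment something happens, such as a job finishing and reporting a lineage edge. The sketch below only builds the request rather than sending it; the endpoint path, payload shape, and field names are illustrative, so check your server's API reference before relying on them:

```python
import json

# Build the HTTP request an application would send to push one lineage edge.
# Endpoint and payload shape are illustrative, not a verified API contract.
def lineage_request(server, from_fqn, to_fqn, token):
    return {
        "method": "PUT",
        "url": f"{server}/api/v1/lineage",
        "headers": {
            "Authorization": f"Bearer {token}",   # standard JWT bearer auth
            "Content-Type": "application/json",
        },
        "body": json.dumps({
            "edge": {
                "fromEntity": {"type": "table", "fqn": from_fqn},
                "toEntity": {"type": "table", "fqn": to_fqn},
            }
        }),
    }

req = lineage_request("http://localhost:8585", "raw.orders", "mart.daily_orders", "TOKEN")
print(req["method"], req["url"])  # PUT http://localhost:8585/api/v1/lineage
```

Sending it is one `requests.put(...)` call; because it is plain HTTP, any language or platform can do the same without extra infrastructure, which is the point of the comparison below.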
Why APIs are the Standard for Real-time:
OpenMetadata uses REST APIs for push-based, real-time ingestion - the same approach that powers the entire internet. Just as Stripe processes millions of real-time payments and Twilio handles real-time communications through APIs, OpenMetadata delivers real-time metadata updates without requiring heavyweight message queues.
Key Advantages:
- Simplicity: Standard HTTP/REST - no additional infrastructure required
- Universal: Works with any programming language or platform
- Reliable: Battle-tested pattern used by Stripe, Twilio, GitHub, and thousands of other services
- Scalable: Modern API gateways handle millions of requests per second
- Developer-friendly: Easy to integrate, test, and debug
- Secure: Standard authentication and encryption (OAuth, JWT, TLS)
Real-time Without Kafka
While some vendors claim you need Kafka for "real-time" capabilities, the reality is different. The world's most critical real-time systems - payment processing, communication platforms, ride-sharing, and financial services - all run on REST APIs. OpenMetadata follows this proven, simpler approach.
Learn more: Why OpenMetadata is the Right Choice for You
Push-based Notifications (Webhooks)¶
Real-time event delivery to external systems:
```mermaid
graph LR
    A[Metadata Change] --> B[Event Stream]
    B --> C[Webhook]
    C --> D[External System]
    style A fill:#00f2fe,color:#333
    style B fill:#f093fb,color:#333
    style C fill:#4facfe,color:#fff,stroke:#4c51bf,stroke-width:3px
    style D fill:#764ba2,color:#fff
```
Use Cases:
- Notify external systems of metadata changes
- Trigger downstream workflows
- Send alerts to Slack, email, or other tools
- Maintain synchronization with other platforms
Related Entities¶
- Ingestion Pipeline: Automated metadata ingestion workflows
- Webhook: Event notification system
- Alert: Quality and operational alerts
- Data Profile: Results from profiler jobs
- Test Case: Automated quality tests
- Change Event: Metadata change events
- Database Service: Data sources for ingestion
Next Steps¶
- Ingestion Pipeline Entity: Detailed specification for ingestion pipelines
- Webhook Entity: Detailed specification for webhooks
- Change Events: Understanding metadata change events