Data Products¶
Productizing data assets for consumption and sharing
Data Products in OpenMetadata provide a framework for packaging related data assets into cohesive, discoverable, and consumable products. This product-thinking approach enables organizations to treat data as a product with defined ownership, SLAs, quality metrics, and clear value propositions for consumers.
Hierarchy Overview¶
OpenMetadata's data product structure enables packaging of data assets within business domains:
graph TB
subgraph "Domain Structure"
DOM[Domain: Sales]
DP1[Data Product<br/>Customer360]
DP2[Data Product<br/>Sales Analytics]
DP3[Data Product<br/>Real-time Pricing]
DOM --> DP1
DOM --> DP2
DOM --> DP3
end
subgraph "Data Product Assets"
DP1 --> TBL1[Table: customer_unified]
DP1 --> TBL2[Table: customer_interactions]
DP1 --> DASH1[Dashboard: Customer Insights]
DP1 --> ML1[ML Model: churn_predictor]
DP1 --> API1[API: customer_api]
DP2 --> TBL3[Table: sales_metrics]
DP2 --> DASH2[Dashboard: Sales Performance]
DP2 --> PIPE1[Pipeline: sales_aggregation]
DP3 --> TBL4[Table: pricing_data]
DP3 --> TOPIC1[Topic: pricing_events]
DP3 --> API2[API: pricing_api]
end
subgraph "Ownership"
TEAM1[Team: Customer Analytics]
USR1[User: Alice Smith]
TEAM1 -.->|owns| DP1
USR1 -.->|product owner| DP1
end
subgraph "Consumers"
TEAM2[Team: Marketing]
TEAM3[Team: Sales Ops]
TEAM2 -.->|consumes| DP1
TEAM3 -.->|consumes| DP1
end
style DOM fill:#8B5CF6,color:#fff
style DP1 fill:#F59E0B,color:#000
style DP2 fill:#F59E0B,color:#000
style DP3 fill:#F59E0B,color:#000
style TBL1 fill:#2563EB,color:#fff
style TBL2 fill:#2563EB,color:#fff
style TBL3 fill:#2563EB,color:#fff
style TBL4 fill:#2563EB,color:#fff
style DASH1 fill:#6900c7,color:#fff
style DASH2 fill:#6900c7,color:#fff
style PIPE1 fill:#4facfe,color:#fff
style ML1 fill:#f5576c,color:#fff
style TOPIC1 fill:#00B4D8,color:#fff
style API1 fill:#06B6D4,color:#fff
style API2 fill:#06B6D4,color:#fff
style TEAM1 fill:#00ac69,color:#fff
style USR1 fill:#0061f2,color:#fff
style TEAM2 fill:#94A3B8,color:#fff
style TEAM3 fill:#94A3B8,color:#fff Why Use Data Products?¶
Product-Thinking for Data¶
Data products apply product management principles to data assets. Instead of fragmented tables and dashboards, users discover complete, well-documented products designed for specific use cases.
Traditional Approach:
Database: sales_db
├── customers (what is this?)
├── customer_interactions (how fresh?)
├── customer_segments (who owns this?)
└── customer_scores (can I use this?)
Data Product Approach:
Customer360 Data Product
├── Purpose: Unified customer view for analytics
├── Owner: Customer Analytics Team
├── SLA: Updated hourly, 99.9% quality
├── Assets:
│ ├── customer_unified (source of truth)
│ ├── customer_interactions (behavioral data)
│ ├── customer_segments (ML-based segments)
│ └── customer_insights dashboard
├── Access: API + Direct query
├── Documentation: Complete user guide
└── Consumers: Marketing, Sales, Support teams
Clear Value Proposition¶
Each data product has a defined purpose, target consumers, and value proposition. Users understand what the product provides and why it exists.
Ownership and Accountability¶
Data products have designated product owners who are accountable for quality, freshness, documentation, and evolution of the product.
Service Level Agreements¶
Data products include SLAs for freshness, quality, availability, and support, setting clear expectations for consumers.
Data Product Characteristics¶
Discoverable¶
Data products are easily found through search, catalog browsing, and domain navigation. Rich metadata and documentation make them understandable.
Example: Marketing team searches for "customer segmentation" and finds Customer360 product with complete documentation and usage examples.
Addressable¶
Each data product has a unique identifier and can be accessed through consistent interfaces (APIs, SQL, dashboards).
Example:
- API:
https://api.company.com/data-products/customer360 - SQL:
SELECT * FROM data_products.customer360.customer_unified - Dashboard:
https://tableau.company.com/products/customer360
Trustworthy¶
Quality metrics, test results, and SLA compliance build trust. Consumers know they can rely on the data.
Example: Customer360 shows 99.7% quality score, last updated 15 minutes ago, all tests passing.
Self-Describing¶
Complete documentation, schema information, lineage, and usage examples make data products self-service.
Example: Customer360 includes: - Business purpose and use cases - Data dictionary for all fields - Sample queries and API examples - Known limitations and caveats - Contact information for support
Secure¶
Access control policies ensure only authorized consumers can use the product. Sensitive data is properly classified and protected.
Example: Customer360 PII fields are automatically masked for most users; full access requires data privacy training and approval.
Interoperable¶
Data products integrate with existing tools and workflows. Consumers access them through their preferred interfaces.
Example: Customer360 can be accessed via: - REST API for applications - SQL interface for analysts - Python SDK for data scientists - Pre-built dashboards for executives
Real-World Examples¶
Example 1: Customer360 Data Product¶
graph TB
subgraph "Customer360 Data Product"
DP[Customer360<br/>Product]
subgraph "Input Assets"
I1[CRM Data]
I2[Web Analytics]
I3[Support Tickets]
I4[Purchase History]
end
subgraph "Core Assets"
TBL1[customer_unified<br/>Source of Truth]
TBL2[customer_interactions<br/>Event History]
TBL3[customer_segments<br/>ML Segments]
ML1[churn_model<br/>Predictions]
end
subgraph "Output Interfaces"
DASH1[Customer Insights<br/>Dashboard]
API1[Customer API<br/>REST]
VIEW1[Analytics Views<br/>SQL]
end
I1 --> TBL1
I2 --> TBL1
I3 --> TBL2
I4 --> TBL2
TBL1 --> ML1
TBL2 --> ML1
ML1 --> TBL3
TBL1 --> DASH1
TBL2 --> DASH1
TBL3 --> DASH1
TBL1 --> API1
TBL3 --> API1
TBL1 --> VIEW1
TBL3 --> VIEW1
end
subgraph "Consumers"
C1[Marketing Team<br/>Segmentation]
C2[Sales Team<br/>Account Health]
C3[Support Team<br/>Customer Context]
end
DASH1 --> C1
API1 --> C2
VIEW1 --> C3
style DP fill:#F59E0B,color:#000,stroke-width:3px
style TBL1 fill:#2563EB,color:#fff
style TBL2 fill:#2563EB,color:#fff
style TBL3 fill:#2563EB,color:#fff
style ML1 fill:#f5576c,color:#fff
style DASH1 fill:#6900c7,color:#fff
style API1 fill:#06B6D4,color:#fff
style VIEW1 fill:#2563EB,color:#fff
style C1 fill:#00ac69,color:#fff
style C2 fill:#00ac69,color:#fff
style C3 fill:#00ac69,color:#fff Product Details:
- Owner: Customer Analytics Team
- Purpose: Unified customer view for marketing, sales, and support
- Assets: 3 tables, 1 ML model, 1 dashboard, 1 API
- SLA:
- Freshness: Updated hourly
- Quality: > 99.5% completeness
- Availability: 99.9% uptime
- Consumers: 150+ users across 12 teams
- Access: API, SQL, Dashboard
Example 2: Sales Analytics Data Product¶
graph TB
subgraph "Sales Analytics Data Product"
DP[Sales Analytics<br/>Product]
subgraph "Data Pipeline"
PIPE1[sales_etl<br/>Daily Batch]
PIPE2[real_time_sync<br/>Streaming]
end
subgraph "Core Tables"
TBL1[sales_metrics<br/>Aggregated]
TBL2[sales_transactions<br/>Detailed]
TBL3[sales_forecast<br/>Predictions]
end
subgraph "Analytics Layer"
DASH1[Sales Performance<br/>Executive Dashboard]
DASH2[Territory Analysis<br/>Regional View]
DASH3[Rep Scorecard<br/>Individual Metrics]
end
PIPE1 --> TBL1
PIPE2 --> TBL2
TBL1 --> TBL3
TBL2 --> TBL3
TBL1 --> DASH1
TBL2 --> DASH2
TBL3 --> DASH3
end
style DP fill:#F59E0B,color:#000,stroke-width:3px
style PIPE1 fill:#4facfe,color:#fff
style PIPE2 fill:#4facfe,color:#fff
style TBL1 fill:#2563EB,color:#fff
style TBL2 fill:#2563EB,color:#fff
style TBL3 fill:#2563EB,color:#fff
style DASH1 fill:#6900c7,color:#fff
style DASH2 fill:#6900c7,color:#fff
style DASH3 fill:#6900c7,color:#fff Product Details:
- Owner: Sales Operations Team
- Purpose: Comprehensive sales performance analytics
- Assets: 2 pipelines, 3 tables, 3 dashboards
- SLA:
- Freshness: Real-time for transactions, daily for aggregations
- Quality: > 99.9% accuracy
- Support: 24/7 Slack channel
- Consumers: Sales leadership, operations, individual reps
- Access: Dashboards (primary), SQL (advanced users)
Example 3: Real-time Pricing Data Product¶
graph TB
subgraph "Real-time Pricing Data Product"
DP[Real-time Pricing<br/>Product]
subgraph "Streaming Layer"
TOPIC1[pricing_events<br/>Kafka Topic]
TOPIC2[competitor_prices<br/>Kafka Topic]
end
subgraph "Processing"
STREAM1[price_aggregator<br/>Flink Job]
ML1[dynamic_pricing<br/>ML Model]
end
subgraph "Storage & Access"
TBL1[current_prices<br/>Real-time Table]
CACHE1[pricing_cache<br/>Redis]
API1[pricing_api<br/>REST/GraphQL]
end
TOPIC1 --> STREAM1
TOPIC2 --> STREAM1
STREAM1 --> ML1
ML1 --> TBL1
TBL1 --> CACHE1
CACHE1 --> API1
end
subgraph "Consumers"
APP1[E-commerce Site<br/>Live Prices]
APP2[Pricing Dashboard<br/>Monitoring]
APP3[Mobile App<br/>Product Prices]
end
API1 --> APP1
API1 --> APP2
API1 --> APP3
style DP fill:#F59E0B,color:#000,stroke-width:3px
style TOPIC1 fill:#00B4D8,color:#fff
style TOPIC2 fill:#00B4D8,color:#fff
style STREAM1 fill:#4facfe,color:#fff
style ML1 fill:#f5576c,color:#fff
style TBL1 fill:#2563EB,color:#fff
style CACHE1 fill:#8B5CF6,color:#fff
style API1 fill:#06B6D4,color:#fff
style APP1 fill:#00ac69,color:#fff
style APP2 fill:#00ac69,color:#fff
style APP3 fill:#00ac69,color:#fff Product Details:
- Owner: Pricing Team
- Purpose: Real-time product pricing for all channels
- Assets: 2 Kafka topics, 1 streaming job, 1 ML model, 1 table, 1 API
- SLA:
- Latency: < 100ms API response
- Freshness: Real-time (< 1 second)
- Availability: 99.99% uptime
- Consumers: E-commerce platform, mobile apps, pricing analysts
- Access: REST API (primary), GraphQL (advanced)
Benefits¶
1. Simplified Discovery¶
Users find complete data products instead of individual tables. "Customer360" is easier to discover than "dim_customer_v3_final".
2. Clear Ownership¶
Product owners are accountable for quality, documentation, and evolution. Consumers know who to contact for support.
3. Quality Assurance¶
Built-in quality metrics, automated tests, and SLA monitoring ensure trustworthy data.
4. Self-Service¶
Complete documentation and multiple access methods enable self-service consumption without constant support requests.
5. Reusability¶
Well-packaged products are reused across teams, reducing duplicate data pipelines and inconsistent metrics.
6. Governance¶
Domain-scoped products inherit governance policies. Access control, classification, and compliance are centrally managed.
7. Lifecycle Management¶
Products have clear versioning, deprecation policies, and evolution paths. Consumers understand when changes will occur.
8. Value Tracking¶
Track product adoption, usage patterns, and consumer satisfaction. Measure ROI of data investments.
Data Product Lifecycle¶
1. Discovery¶
Identify Opportunity: Recognize repeated data needs across teams
Example: Multiple teams building their own customer segmentation models
Activities:
- Stakeholder interviews
- Use case analysis
- Value assessment
- Feasibility study
2. Development¶
Build the Product: Create assets, pipelines, and interfaces
Example: Build Customer360 with unified customer table, ML models, and API
Activities:
- Data pipeline development
- Quality testing
- Documentation
- Access interface creation
- SLA definition
3. Publishing¶
Make Available: Release product to consumers
Example: Publish Customer360 to catalog with complete documentation
Activities:
- Catalog registration
- Access provisioning
- Consumer onboarding
- Training materials
- Launch announcement
4. Consumption¶
Active Use: Consumers use the product
Example: Marketing team uses Customer360 API for campaign targeting
Activities:
- Monitoring usage
- Collecting feedback
- Providing support
- SLA monitoring
- Quality reporting
5. Evolution¶
Continuous Improvement: Enhance based on feedback and new requirements
Example: Add social media data to Customer360 based on user requests
Activities:
- Feature requests
- Performance optimization
- Schema evolution
- New interfaces
- Deprecation of old versions
Entity Specifications¶
Explore the complete data product entity specification:
| Entity | Description | Specification |
|---|---|---|
| Data Product | Packaged data assets ready for consumption | View Spec |
The data product specification includes: - Complete field reference - JSON Schema definition - RDF/OWL ontology representation - JSON-LD context and examples - Relationship mappings - API operations
View Data Product Entity Specification →
Best Practices¶
1. Start with Consumer Needs¶
Design data products based on actual consumer use cases, not just available data.
2. Clear Product Owner¶
Assign a dedicated product owner who is accountable for the product's success.
3. Define SLAs¶
Set explicit expectations for freshness, quality, availability, and support response times.
4. Comprehensive Documentation¶
Include business context, technical details, usage examples, and limitations.
5. Multiple Access Methods¶
Support different consumption patterns (API, SQL, dashboards) for different user types.
6. Quality First¶
Implement automated quality checks and publish quality metrics transparently.
7. Version Management¶
Use semantic versioning and communicate breaking changes well in advance.
8. Monitor Usage¶
Track who's using the product and how to inform prioritization and investment decisions.
Next Steps¶
- Explore Data Product Entity - See complete data product specification
- Identify Opportunities - Find repeated data needs that could become products
- Define Product Vision - Articulate purpose, consumers, and value proposition
- Assign Ownership - Designate product owner and team
- Build Assets - Create tables, pipelines, models, and interfaces
- Publish to Catalog - Register product with complete metadata
- Onboard Consumers - Train users and enable self-service
- Iterate and Improve - Collect feedback and continuously enhance