Data Assets¶
Comprehensive metadata models for all data resources in your organization
Overview¶
OpenMetadata Standards model 10+ data asset types organized hierarchically by service type. Each service type contains specific asset entities that represent actual data resources.
Service Hierarchy¶
All data assets follow a consistent pattern:
graph TD
A[Service] --> B1[Container 1]
A[Service] --> B2[Container 2]
B1 --> C1[Asset 1]
B1 --> C2[Asset 2]
B2 --> C3[Asset 3]
C1 --> D1[Sub-Asset 1]
C1 --> D2[Sub-Asset 2]
C3 --> D3[Sub-Asset 3]
style A fill:#667eea,color:#fff
style B1 fill:#764ba2,color:#fff
style B2 fill:#764ba2,color:#fff
style C1 fill:#f093fb,color:#fff
style C2 fill:#f093fb,color:#fff
style C3 fill:#f093fb,color:#fff
style D1 fill:#4facfe,color:#fff
style D2 fill:#4facfe,color:#fff
style D3 fill:#4facfe,color:#fff Examples:
DatabaseService → Database → Schema → Table → Column
PipelineService → Pipeline → Task
MessagingService → Topic → Schema Fields
StorageService → Container → Files
This hierarchical organization provides: - Consistent structure across all asset types - Clear ownership at the service level - Logical grouping of related assets - Simplified navigation and discovery
Database Assets¶
Service Type: DatabaseService
Relational and analytical databases across all platforms.
Hierarchy¶
graph TD
A[DatabaseService<br/>PostgreSQL, MySQL, Snowflake] --> B1[Database:<br/>ecommerce]
A --> B2[Database:<br/>analytics]
B1 --> C1[Schema:<br/>public]
B1 --> C2[Schema:<br/>sales]
C1 --> D1[Table:<br/>customers]
C1 --> D2[Table:<br/>orders]
C2 --> D3[StoredProcedure:<br/>calculate_revenue]
D1 --> E1[Column: id]
D1 --> E2[Column: email]
D2 --> E3[Column: order_id]
style A fill:#667eea,color:#fff
style B1 fill:#764ba2,color:#fff
style B2 fill:#764ba2,color:#fff
style C1 fill:#f093fb,color:#fff
style C2 fill:#f093fb,color:#fff
style D1 fill:#4facfe,color:#fff
style D2 fill:#4facfe,color:#fff
style D3 fill:#00f2fe,color:#fff
style E1 fill:#43e97b,color:#fff
style E2 fill:#43e97b,color:#fff
style E3 fill:#43e97b,color:#fff Entities¶
Database Service¶
Connection to database systems - credentials, configuration, connection strings
Supported Platforms: PostgreSQL, MySQL, Oracle, SQL Server, Snowflake, BigQuery, Redshift, Databricks, Hive, Presto, Trino, ClickHouse, DynamoDB, MongoDB, Cassandra, and 50+ more
Database¶
Database container grouping schemas
Properties: Name, description, owner, tags, retention policy
Schema¶
Namespace within a database containing tables and procedures
Properties: Name, database reference, contained tables, retention settings
Table 📘 Deep Dive¶
Database tables, views, materialized views - the core data structure
Properties:
- Columns with types, constraints, descriptions
- Table type (Regular, View, MaterializedView, External, Temporary, SecureView, Transient)
- Primary/foreign keys
- Partitioning configuration
- Owner, domain, tags, glossary terms
- Data quality tests
- Lineage relationships
- Profiling results
View Complete Table Specification →
Stored Procedure¶
Database procedures and functions
Properties: Procedure code, parameters, return type, language (SQL, PL/SQL, T-SQL), dependencies
Pipeline Assets¶
Service Type: PipelineService
Data orchestration and transformation workflows.
Hierarchy¶
graph TD
A[PipelineService<br/>Airflow, Dagster, Prefect] --> B1[Pipeline:<br/>daily_etl_pipeline]
A --> B2[Pipeline:<br/>ml_training_workflow]
B1 --> C1[Task:<br/>extract_data]
B1 --> C2[Task:<br/>transform_data]
B1 --> C3[Task:<br/>load_to_warehouse]
B2 --> C4[Task:<br/>prepare_features]
B2 --> C5[Task:<br/>train_model]
style A fill:#667eea,color:#fff
style B1 fill:#764ba2,color:#fff
style B2 fill:#764ba2,color:#fff
style C1 fill:#f093fb,color:#fff
style C2 fill:#f093fb,color:#fff
style C3 fill:#f093fb,color:#fff
style C4 fill:#f093fb,color:#fff
style C5 fill:#f093fb,color:#fff Entities¶
Pipeline Service¶
Connection to orchestration platforms
Supported Platforms: Airflow, Dagster, Prefect, Fivetran, dbt, Glue, Data Factory, NiFi, Airbyte
Pipeline 📘 Deep Dive¶
Complete data pipeline with tasks and dependencies
Properties:
- Pipeline schedule/trigger
- Tasks with DAG structure
- Upstream/downstream task dependencies
- Execution history and status
- Owner and tags
- Lineage to source and target tables
View Complete Pipeline Specification →
Task¶
Individual task within a pipeline
Properties: Task type (SQL, Python, Spark, dbt, Shell, Container), dependencies, configuration
Messaging Assets¶
Service Type: MessagingService
Event streaming and message queue platforms.
Hierarchy¶
graph TD
A[MessagingService<br/>Kafka, Pulsar, Kinesis] --> B1[Topic:<br/>user_events]
A --> B2[Topic:<br/>order_notifications]
B1 --> C1[Schema Field:<br/>user_id]
B1 --> C2[Schema Field:<br/>event_type]
B1 --> C3[Schema Field:<br/>timestamp]
B2 --> C4[Schema Field:<br/>order_id]
B2 --> C5[Schema Field:<br/>status]
style A fill:#667eea,color:#fff
style B1 fill:#764ba2,color:#fff
style B2 fill:#764ba2,color:#fff
style C1 fill:#f093fb,color:#fff
style C2 fill:#f093fb,color:#fff
style C3 fill:#f093fb,color:#fff
style C4 fill:#f093fb,color:#fff
style C5 fill:#f093fb,color:#fff Entities¶
Messaging Service¶
Connection to message brokers
Supported Platforms: Kafka, Pulsar, Kinesis, RabbitMQ, SQS, Azure Event Hub
Topic 📘 Deep Dive¶
Message queue topic for event streaming
Properties:
- Partition count and replication factor
- Retention policy (time and size)
- Cleanup policy (delete, compact)
- Message schema (Avro, Protobuf, JSON Schema)
- Schema evolution and versions
- Consumer groups
- Owner and tags
View Complete Topic Specification →
Dashboard Assets¶
Service Type: DashboardService
Business intelligence and analytics platforms.
Hierarchy¶
graph TD
A[DashboardService<br/>Tableau, Looker, PowerBI] --> B1[Dashboard:<br/>Sales Performance]
A --> B2[Dashboard:<br/>Customer Analytics]
A --> B3[Data Model:<br/>Sales Metrics]
B1 --> C1[Chart:<br/>Monthly Revenue]
B1 --> C2[Chart:<br/>Top Products]
B2 --> C3[Chart:<br/>Customer Segments]
B2 --> C4[Chart:<br/>Retention Rate]
style A fill:#667eea,color:#fff
style B1 fill:#764ba2,color:#fff
style B2 fill:#764ba2,color:#fff
style B3 fill:#764ba2,color:#fff
style C1 fill:#f093fb,color:#fff
style C2 fill:#f093fb,color:#fff
style C3 fill:#f093fb,color:#fff
style C4 fill:#f093fb,color:#fff Entities¶
Dashboard Service¶
Connection to BI platforms
Supported Platforms: Tableau, Looker, PowerBI, Superset, Metabase, Mode, QuickSight, Redash, Sigma
Dashboard Data Model¶
Semantic layer and data model (LookML, Tableau Data Source)
Properties: Model definition, dimensions, measures, relationships, SQL generation logic
Dashboard 📘 Deep Dive¶
Complete dashboard with visualizations
Properties:
- Dashboard URL and project
- Contained charts
- Data sources (lineage to tables)
- Dashboard-level filters
- View count and usage statistics
- Owner and tags
View Complete Dashboard Specification →
Chart¶
Individual visualization within dashboards
Properties: Chart type (Bar, Line, Pie, Table, Scatter), query, filters, configuration
ML Assets¶
Service Type: MLModelService
Machine learning models and serving endpoints.
Hierarchy¶
graph TD
A[MLModelService<br/>MLflow, SageMaker, Vertex AI] --> B1[MLModel:<br/>churn_predictor]
A --> B2[MLModel:<br/>recommendation_engine]
B1 --> C1[Feature:<br/>user_tenure]
B1 --> C2[Feature:<br/>activity_score]
B1 --> C3[Feature:<br/>support_tickets]
B2 --> C4[Feature:<br/>user_preferences]
B2 --> C5[Feature:<br/>purchase_history]
style A fill:#667eea,color:#fff
style B1 fill:#764ba2,color:#fff
style B2 fill:#764ba2,color:#fff
style C1 fill:#f093fb,color:#fff
style C2 fill:#f093fb,color:#fff
style C3 fill:#f093fb,color:#fff
style C4 fill:#f093fb,color:#fff
style C5 fill:#f093fb,color:#fff Entities¶
ML Model Service¶
Connection to ML platforms
Supported Platforms: MLflow, SageMaker, Vertex AI, Azure ML, Databricks ML
ML Model 📘 Deep Dive¶
Trained machine learning model
Properties:
- Algorithm (XGBoost, Random Forest, Neural Network, etc.)
- Model type (Classification, Regression, Clustering, etc.)
- Features with sources (lineage to tables)
- Hyperparameters (learning rate, max depth, etc.)
- Performance metrics (accuracy, precision, recall, AUC, F1)
- Training data references
- Model version
- Dashboard for monitoring
- Owner and tags
View Complete ML Model Specification →
Storage Assets¶
Service Type: StorageService
Object storage and data lakes.
Hierarchy¶
graph TD
A[StorageService<br/>S3, GCS, Azure Blob] --> B1[Container:<br/>data-lake-raw]
A --> B2[Container:<br/>data-lake-processed]
B1 --> C1[File:<br/>events/2024/01/data.parquet]
B1 --> C2[File:<br/>events/2024/02/data.parquet]
B2 --> C3[File:<br/>analytics/users.parquet]
B2 --> C4[File:<br/>analytics/orders.parquet]
style A fill:#667eea,color:#fff
style B1 fill:#764ba2,color:#fff
style B2 fill:#764ba2,color:#fff
style C1 fill:#f093fb,color:#fff
style C2 fill:#f093fb,color:#fff
style C3 fill:#f093fb,color:#fff
style C4 fill:#f093fb,color:#fff Entities¶
Storage Service¶
Connection to object storage
Supported Platforms: S3, GCS, Azure Blob Storage, ADLS, MinIO
Container 📘 Deep Dive¶
Storage bucket or container
Properties:
- Full path and naming
- File formats (Parquet, CSV, JSON, Avro, ORC)
- Schema information
- Partitioning scheme
- Object count and total size
- Access patterns
- Owner and tags
View Complete Container Specification →
API Assets¶
Service Type: APIService
REST APIs and endpoints.
Hierarchy¶
graph TD
A[APIService<br/>REST API Platform] --> B1[APICollection:<br/>User Management]
A --> B2[APICollection:<br/>Payment Processing]
B1 --> C1[APIEndpoint:<br/>GET /users]
B1 --> C2[APIEndpoint:<br/>POST /users]
B1 --> C3[APIEndpoint:<br/>PUT /users/:id]
B2 --> C4[APIEndpoint:<br/>POST /payments]
B2 --> C5[APIEndpoint:<br/>GET /payments/:id]
style A fill:#667eea,color:#fff
style B1 fill:#764ba2,color:#fff
style B2 fill:#764ba2,color:#fff
style C1 fill:#f093fb,color:#fff
style C2 fill:#f093fb,color:#fff
style C3 fill:#f093fb,color:#fff
style C4 fill:#f093fb,color:#fff
style C5 fill:#f093fb,color:#fff Entities¶
API Service¶
Connection to API platforms
Properties: Base URL, authentication, version
API Collection¶
Group of related API endpoints
Properties: Collection name, description, endpoints
API Endpoint 📘 Deep Dive¶
Individual REST API endpoint
Properties:
- HTTP method (GET, POST, PUT, DELETE)
- Endpoint URL and path parameters
- Request schema (headers, body)
- Response schema (status codes, body)
- Authentication requirements
- Rate limits
- Owner and tags
View Complete API Endpoint Specification →
Search Assets¶
Service Type: SearchService
Search indexes from Elasticsearch, OpenSearch.
Hierarchy¶
graph TD
A[SearchService<br/>Elasticsearch, OpenSearch] --> B1[SearchIndex:<br/>products]
A --> B2[SearchIndex:<br/>customers]
B1 --> C1[Field:<br/>product_name]
B1 --> C2[Field:<br/>description]
B1 --> C3[Field:<br/>price]
B2 --> C4[Field:<br/>customer_id]
B2 --> C5[Field:<br/>email]
style A fill:#667eea,color:#fff
style B1 fill:#764ba2,color:#fff
style B2 fill:#764ba2,color:#fff
style C1 fill:#f093fb,color:#fff
style C2 fill:#f093fb,color:#fff
style C3 fill:#f093fb,color:#fff
style C4 fill:#f093fb,color:#fff
style C5 fill:#f093fb,color:#fff Entities¶
Search Service¶
Connection to search platforms
Supported Platforms: Elasticsearch, OpenSearch
Search Index¶
Search index with mappings and settings
Properties: Index name, field mappings, analyzers, document count, index settings, owner
Cross-Asset Concepts¶
These entities apply across all asset types:
Data Contracts¶
Formal SLA agreements for any data asset - tables, topics, dashboards, ML models, APIs
Lineage¶
Relationships showing data flow between any assets
Data Quality¶
Tests and profiling applicable to tables, topics, containers
Governance¶
Tags, glossary terms, and classifications on all assets
Ownership¶
Users and teams owning any asset
Common Properties¶
All data assets share these properties:
Identity¶
id: UUIDname: Entity namefullyQualifiedName: Complete hierarchical namedisplayName: Human-readable name
Documentation¶
description: Markdown descriptiontags[]: Classification tagsglossaryTerms[]: Business definitions
Ownership¶
owner: User or teamdomain: Business domainexperts[]: Subject matter experts
Versioning¶
version: Metadata versionupdatedAt: Last update timestampupdatedBy: User who updatedchangeDescription: Change details
Lifecycle¶
deleted: Soft delete flagextension: Custom propertieshref: API resource link
Next Steps¶
Explore by Service Type¶
Choose a service type to see detailed entity specifications:
- Databases - Tables, schemas, stored procedures
- Pipelines - Workflows and tasks
- Messaging - Topics and schemas
- Dashboards - Dashboards and charts
- ML Models - Models and features
- Storage - Buckets and containers
- APIs - REST endpoints
Understand Cross-Cutting Concepts¶
Learn how these concepts apply to all assets:
- Lineage - Data flow tracking
- Governance - Tags and glossaries
- Data Quality - Testing and profiling
- Data Contracts - Formal agreements
See Examples¶
View real-world examples:
- Examples - Complete use cases