Column¶
A Column represents an individual column or field within a database table, view, or stored procedure. While columns are typically modeled as properties of tables in OpenMetadata, this entity specification defines the comprehensive metadata model for column-level information.
Overview¶
Columns in OpenMetadata capture detailed metadata about:
- Schema Information: Data type, length, precision, scale, nullability
- Constraints: Primary key, foreign key, unique, not null constraints
- Semantics: Business descriptions, glossary term mappings
- Classification: PII tags, sensitivity levels, data classifications
- Lineage: Column-level lineage showing data flow transformations
- Quality: Column-specific quality metrics and test results
- Profiling: Statistical profiles including distributions and patterns
- Usage: Query patterns, access frequency, popular joins
Columns are essential for: - Data discovery and understanding - Privacy and compliance management - Data lineage and impact analysis - Query optimization - Data quality monitoring - Semantic understanding
Hierarchy¶
graph LR
A[Table/View] --> B[Column]
B --> C1[Data Type]
B --> C2[Constraints]
B --> C3[Tags]
B --> C4[Profile]
style A fill:#667eea,color:#fff
style B fill:#4facfe,color:#fff,stroke:#4c51bf,stroke-width:3px
style C1 fill:#00f2fe,color:#333
style C2 fill:#00f2fe,color:#333
style C3 fill:#00f2fe,color:#333
style C4 fill:#00f2fe,color:#333
click A "../../data-assets/databases/table/" "Table"
click B "#column" "Column"
click C1 "#data-type" "Data Type" Click on any node to learn more about that entity.
Relationships¶
Columns have relationships with various entities in the metadata ecosystem:
graph TD
subgraph Column Structure
A[Column:<br/>customer_email] --> B1[DataType:<br/>VARCHAR 255]
A --> B2[Constraint:<br/>NOT NULL]
A --> B3[Constraint:<br/>UNIQUE]
end
subgraph Parent Entity
A -.->|belongs to| C1[Table:<br/>customers]
C1 -.->|in schema| C2[Schema:<br/>public]
C2 -.->|in database| C3[Database:<br/>production]
end
subgraph Relationships
A -.->|foreign key to| D1[Column:<br/>account.email]
A -.->|referenced by| D2[Column:<br/>orders.customer_email]
A -.->|part of index| D3[Index:<br/>idx_customer_email]
end
subgraph Semantics
A -.->|mapped to| E1[GlossaryTerm:<br/>Customer Email]
A -.->|tagged with| E2[Tag:<br/>PII.Email]
A -.->|tagged with| E3[Tag:<br/>GDPR]
A -.->|described by| E4[Description]
end
subgraph Lineage
A -.->|derived from| F1[Column:<br/>raw.user_email]
A -.->|feeds into| F2[Column:<br/>analytics.email]
A -.->|transformed by| F3[Query/Pipeline]
end
subgraph Quality & Profile
A -.->|tested by| G1[TestCase:<br/>Email Format Check]
A -.->|tested by| G2[TestCase:<br/>Uniqueness Check]
A -.->|profiled in| G3[ColumnProfile:<br/>Statistics]
end
subgraph Usage
A -.->|used in| H1[Query:<br/>Customer Report]
A -.->|filtered by| H2[Dashboard Filter]
A -.->|joined with| H3[Column:<br/>orders.id]
end
style A fill:#4facfe,color:#fff,stroke:#4c51bf,stroke-width:3px
style B1 fill:#00f2fe,color:#333
style B2 fill:#00f2fe,color:#333
style B3 fill:#00f2fe,color:#333
style C1 fill:#764ba2,color:#fff
style C2 fill:#764ba2,color:#fff
style C3 fill:#764ba2,color:#fff
style D1 fill:#764ba2,color:#fff
style D2 fill:#764ba2,color:#fff
style D3 fill:#00f2fe,color:#333
style E1 fill:#f093fb,color:#333
style E2 fill:#fa709a,color:#fff
style E3 fill:#fa709a,color:#fff
style E4 fill:#ffd700,color:#333
style F1 fill:#764ba2,color:#fff
style F2 fill:#764ba2,color:#fff
style F3 fill:#667eea,color:#fff
style G1 fill:#f5576c,color:#fff
style G2 fill:#f5576c,color:#fff
style G3 fill:#ffd700,color:#333
style H1 fill:#00f2fe,color:#333
style H2 fill:#00f2fe,color:#333
style H3 fill:#764ba2,color:#fff
click A "#column" "Column"
click C1 "./table/" "Table"
click C2 "./database-schema/" "Schema"
click C3 "./database/" "Database"
click E1 "../../governance/glossary-term/" "Glossary Term"
click E2 "../../governance/tag/" "Tag"
click F3 "../../data-assets/pipelines/pipeline/" "Pipeline"
click G1 "../../data-quality/test-case/" "Test Case"
click G3 "../../data-quality/data-profile/" "Data Profile"
click H1 "./query/" "Query" Key Relationships:
- Column Structure: Data type, constraints, and properties
- Parent Entity: Table, Schema, and Database hierarchy
- Relationships: Foreign keys, references, and indexes
- Semantics: Glossary terms, tags, descriptions for business context
- Lineage: Source and target columns, transformation queries
- Quality & Profile: Test cases and statistical profiles
- Usage: Queries, dashboards, and joins using the column
Schema Specifications¶
{
"$id": "https://open-metadata.org/schema/entity/data/column.json",
"$schema": "http://json-schema.org/draft-07/schema#",
"title": "Column",
"description": "A Column represents a field within a database table or view.",
"type": "object",
"javaType": "org.openmetadata.schema.entity.data.Column",
"definitions": {
"dataType": {
"description": "Data type of the column",
"javaType": "org.openmetadata.schema.type.ColumnDataType",
"type": "string",
"enum": [
"NUMBER",
"TINYINT",
"SMALLINT",
"INT",
"BIGINT",
"FLOAT",
"DOUBLE",
"DECIMAL",
"NUMERIC",
"TIMESTAMP",
"TIME",
"DATE",
"DATETIME",
"INTERVAL",
"STRING",
"MEDIUMTEXT",
"TEXT",
"CHAR",
"VARCHAR",
"BOOLEAN",
"BINARY",
"VARBINARY",
"ARRAY",
"BLOB",
"LONGBLOB",
"MEDIUMBLOB",
"MAP",
"STRUCT",
"UNION",
"SET",
"GEOGRAPHY",
"ENUM",
"JSON",
"UUID",
"BYTEA"
]
},
"constraint": {
"description": "Column constraint type",
"type": "string",
"enum": [
"NULL",
"NOT_NULL",
"UNIQUE",
"PRIMARY_KEY",
"FOREIGN_KEY"
]
},
"columnProfile": {
"type": "object",
"description": "Statistical profile of the column",
"properties": {
"name": {
"description": "Column name",
"type": "string"
},
"valuesCount": {
"description": "Total count of values in the column",
"type": "number"
},
"valuesPercentage": {
"description": "Percentage of values in the column",
"type": "number"
},
"validCount": {
"description": "Count of valid values",
"type": "number"
},
"duplicateCount": {
"description": "Count of duplicate values",
"type": "number"
},
"nullCount": {
"description": "Count of null values",
"type": "number"
},
"nullProportion": {
"description": "Proportion of null values",
"type": "number"
},
"uniqueCount": {
"description": "Count of unique values",
"type": "number"
},
"uniqueProportion": {
"description": "Proportion of unique values",
"type": "number"
},
"min": {
"description": "Minimum value",
"type": ["number", "string"]
},
"max": {
"description": "Maximum value",
"type": ["number", "string"]
},
"mean": {
"description": "Mean value for numeric columns",
"type": "number"
},
"median": {
"description": "Median value",
"type": "number"
},
"stddev": {
"description": "Standard deviation",
"type": "number"
},
"sum": {
"description": "Sum of values for numeric columns",
"type": "number"
},
"distinctCount": {
"description": "Count of distinct values",
"type": "number"
},
"distinctProportion": {
"description": "Proportion of distinct values",
"type": "number"
},
"histogram": {
"description": "Histogram of value distribution",
"type": "object"
},
"timestamp": {
"description": "Timestamp of the profile",
"$ref": "../../type/basic.json#/definitions/timestamp"
}
}
},
"dataTypeDisplay": {
"description": "Display name for the data type (e.g., varchar(100), decimal(10,2))",
"type": "string"
},
"columnConstraint": {
"description": "Column constraint",
"type": "object",
"properties": {
"constraintType": {
"$ref": "#/definitions/constraint"
},
"columns": {
"description": "List of columns involved in the constraint",
"type": "array",
"items": {
"type": "string"
}
}
}
}
},
"properties": {
"name": {
"description": "Name of the column",
"$ref": "../../type/basic.json#/definitions/entityName"
},
"displayName": {
"description": "Display name for the column",
"type": "string"
},
"dataType": {
"$ref": "#/definitions/dataType"
},
"dataTypeDisplay": {
"$ref": "#/definitions/dataTypeDisplay"
},
"arrayDataType": {
"description": "Data type of array elements for ARRAY type",
"$ref": "#/definitions/dataType"
},
"dataLength": {
"description": "Length of the data type (for CHAR, VARCHAR)",
"type": "integer"
},
"precision": {
"description": "Precision for numeric types",
"type": "integer"
},
"scale": {
"description": "Scale for numeric types",
"type": "integer"
},
"description": {
"description": "Description of the column",
"$ref": "../../type/basic.json#/definitions/markdown"
},
"fullyQualifiedName": {
"description": "Fully qualified name of the column",
"$ref": "../../type/basic.json#/definitions/fullyQualifiedEntityName"
},
"tags": {
"description": "Tags associated with the column",
"type": "array",
"items": {
"$ref": "../../type/tagLabel.json"
}
},
"constraint": {
"$ref": "#/definitions/constraint"
},
"ordinalPosition": {
"description": "Ordinal position of the column in the table",
"type": "integer"
},
"jsonSchema": {
"description": "JSON schema for JSON data type",
"type": "string"
},
"children": {
"description": "Child columns for nested types (STRUCT, MAP)",
"type": "array",
"items": {
"$ref": "#"
}
},
"customMetrics": {
"description": "Custom metrics for the column",
"type": "array",
"items": {
"$ref": "../../type/customProperty.json"
}
},
"profile": {
"$ref": "#/definitions/columnProfile"
}
},
"required": [
"name",
"dataType"
],
"additionalProperties": false
}
@prefix om: <https://open-metadata.org/schema/> .
@prefix om-entity: <https://open-metadata.org/schema/entity/> .
@prefix om-column: <https://open-metadata.org/schema/entity/data/> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix owl: <http://www.w3.org/2002/07/owl#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
@prefix dcterms: <http://purl.org/dc/terms/> .
@prefix skos: <http://www.w3.org/2004/02/skos/core#> .
# Column Class Definition
om-column:Column a owl:Class ;
rdfs:label "Column" ;
rdfs:comment "A column or field within a database table or view" ;
rdfs:subClassOf om-entity:Entity ;
rdfs:isDefinedBy om: .
# Data Type Class
om-column:DataType a owl:Class ;
rdfs:label "Data Type" ;
rdfs:comment "Data type of a column" ;
rdfs:isDefinedBy om: .
# Constraint Class
om-column:Constraint a owl:Class ;
rdfs:label "Constraint" ;
rdfs:comment "Constraint on a column" ;
rdfs:isDefinedBy om: .
# Column Profile Class
om-column:ColumnProfile a owl:Class ;
rdfs:label "Column Profile" ;
rdfs:comment "Statistical profile of a column" ;
rdfs:isDefinedBy om: .
# Properties
om-column:dataType a owl:ObjectProperty ;
rdfs:label "data type" ;
rdfs:comment "Data type of the column" ;
rdfs:domain om-column:Column ;
rdfs:range om-column:DataType .
om-column:dataTypeDisplay a owl:DatatypeProperty ;
rdfs:label "data type display" ;
rdfs:comment "Display representation of the data type" ;
rdfs:domain om-column:Column ;
rdfs:range xsd:string .
om-column:dataLength a owl:DatatypeProperty ;
rdfs:label "data length" ;
rdfs:comment "Length of the data type" ;
rdfs:domain om-column:Column ;
rdfs:range xsd:integer .
om-column:precision a owl:DatatypeProperty ;
rdfs:label "precision" ;
rdfs:comment "Precision for numeric types" ;
rdfs:domain om-column:Column ;
rdfs:range xsd:integer .
om-column:scale a owl:DatatypeProperty ;
rdfs:label "scale" ;
rdfs:comment "Scale for numeric types" ;
rdfs:domain om-column:Column ;
rdfs:range xsd:integer .
om-column:constraint a owl:ObjectProperty ;
rdfs:label "constraint" ;
rdfs:comment "Constraint on the column" ;
rdfs:domain om-column:Column ;
rdfs:range om-column:Constraint .
om-column:ordinalPosition a owl:DatatypeProperty ;
rdfs:label "ordinal position" ;
rdfs:comment "Position of the column in the table" ;
rdfs:domain om-column:Column ;
rdfs:range xsd:integer .
om-column:belongsTo a owl:ObjectProperty ;
rdfs:label "belongs to" ;
rdfs:comment "Table to which the column belongs" ;
rdfs:domain om-column:Column ;
rdfs:range om-entity:Table .
om-column:foreignKeyTo a owl:ObjectProperty ;
rdfs:label "foreign key to" ;
rdfs:comment "Column referenced by foreign key" ;
rdfs:domain om-column:Column ;
rdfs:range om-column:Column .
om-column:derivedFrom a owl:ObjectProperty ;
rdfs:label "derived from" ;
rdfs:comment "Source column from which this column is derived" ;
rdfs:domain om-column:Column ;
rdfs:range om-column:Column .
om-column:feedsInto a owl:ObjectProperty ;
rdfs:label "feeds into" ;
rdfs:comment "Target column that this column feeds" ;
rdfs:domain om-column:Column ;
rdfs:range om-column:Column .
om-column:hasProfile a owl:ObjectProperty ;
rdfs:label "has profile" ;
rdfs:comment "Statistical profile of the column" ;
rdfs:domain om-column:Column ;
rdfs:range om-column:ColumnProfile .
om-column:testedBy a owl:ObjectProperty ;
rdfs:label "tested by" ;
rdfs:comment "Test cases validating the column" ;
rdfs:domain om-column:Column ;
rdfs:range om-entity:TestCase .
# Constraint Individuals
om-column:NULL a om-column:Constraint ;
rdfs:label "NULL" ;
skos:definition "Column allows NULL values" .
om-column:NOT_NULL a om-column:Constraint ;
rdfs:label "NOT NULL" ;
skos:definition "Column does not allow NULL values" .
om-column:UNIQUE a om-column:Constraint ;
rdfs:label "UNIQUE" ;
skos:definition "Column values must be unique" .
om-column:PRIMARY_KEY a om-column:Constraint ;
rdfs:label "PRIMARY KEY" ;
skos:definition "Column is part of the primary key" .
om-column:FOREIGN_KEY a om-column:Constraint ;
rdfs:label "FOREIGN KEY" ;
skos:definition "Column is a foreign key reference" .
{
"@context": {
"@vocab": "https://open-metadata.org/schema/entity/data/",
"rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
"rdfs": "http://www.w3.org/2000/01/rdf-schema#",
"owl": "http://www.w3.org/2002/07/owl#",
"xsd": "http://www.w3.org/2001/XMLSchema#",
"dcterms": "http://purl.org/dc/terms/",
"skos": "http://www.w3.org/2004/02/skos/core#",
"om": "https://open-metadata.org/schema/",
"Column": {
"@id": "om:Column",
"@type": "@id"
},
"name": {
"@id": "om:name",
"@type": "xsd:string"
},
"displayName": {
"@id": "om:displayName",
"@type": "xsd:string"
},
"description": {
"@id": "dcterms:description",
"@type": "xsd:string"
},
"dataType": {
"@id": "om:dataType",
"@type": "@id"
},
"dataTypeDisplay": {
"@id": "om:dataTypeDisplay",
"@type": "xsd:string"
},
"dataLength": {
"@id": "om:dataLength",
"@type": "xsd:integer"
},
"precision": {
"@id": "om:precision",
"@type": "xsd:integer"
},
"scale": {
"@id": "om:scale",
"@type": "xsd:integer"
},
"constraint": {
"@id": "om:constraint",
"@type": "@id"
},
"ordinalPosition": {
"@id": "om:ordinalPosition",
"@type": "xsd:integer"
},
"belongsTo": {
"@id": "om:belongsTo",
"@type": "@id"
},
"foreignKeyTo": {
"@id": "om:foreignKeyTo",
"@type": "@id"
},
"derivedFrom": {
"@id": "om:derivedFrom",
"@type": "@id"
},
"feedsInto": {
"@id": "om:feedsInto",
"@type": "@id"
},
"tags": {
"@id": "om:tags",
"@type": "@id",
"@container": "@set"
},
"profile": {
"@id": "om:hasProfile",
"@type": "@id"
},
"children": {
"@id": "om:hasChild",
"@type": "@id",
"@container": "@set"
}
}
}
Use Cases¶
Simple Column Definition¶
A basic varchar column with NOT NULL constraint:
{
"name": "customer_email",
"displayName": "Customer Email",
"dataType": "VARCHAR",
"dataLength": 255,
"constraint": "NOT_NULL",
"ordinalPosition": 3,
"description": "Email address of the customer",
"tags": [
{
"tagFQN": "PII.Email"
},
{
"tagFQN": "Compliance.GDPR"
}
]
}
Primary Key Column¶
An integer primary key column:
{
"name": "customer_id",
"displayName": "Customer ID",
"dataType": "BIGINT",
"constraint": "PRIMARY_KEY",
"ordinalPosition": 1,
"description": "Unique identifier for customers"
}
Decimal Column with Precision¶
A decimal column for monetary values:
{
"name": "total_amount",
"displayName": "Total Amount",
"dataType": "DECIMAL",
"dataTypeDisplay": "decimal(10,2)",
"precision": 10,
"scale": 2,
"constraint": "NOT_NULL",
"ordinalPosition": 5,
"description": "Total order amount in USD",
"tags": [
{
"tagFQN": "BusinessCritical"
}
]
}
JSON Column¶
A JSON column with schema definition:
{
"name": "customer_metadata",
"displayName": "Customer Metadata",
"dataType": "JSON",
"ordinalPosition": 10,
"description": "Additional customer attributes in JSON format",
"jsonSchema": "{\"type\": \"object\", \"properties\": {\"preferences\": {\"type\": \"object\"}, \"tags\": {\"type\": \"array\"}}}",
"tags": [
{
"tagFQN": "SemiStructured"
}
]
}
Nested Struct Column¶
A struct/nested column (common in data warehouses):
{
"name": "address",
"displayName": "Address",
"dataType": "STRUCT",
"dataTypeDisplay": "struct<street:string, city:string, state:string, zip:string>",
"ordinalPosition": 4,
"description": "Customer mailing address",
"children": [
{
"name": "street",
"dataType": "STRING",
"ordinalPosition": 1
},
{
"name": "city",
"dataType": "STRING",
"ordinalPosition": 2
},
{
"name": "state",
"dataType": "STRING",
"dataLength": 2,
"ordinalPosition": 3
},
{
"name": "zip",
"dataType": "STRING",
"dataLength": 10,
"ordinalPosition": 4
}
],
"tags": [
{
"tagFQN": "PII.Address"
}
]
}
Column with Profile¶
A column with statistical profile:
{
"name": "age",
"displayName": "Age",
"dataType": "INT",
"ordinalPosition": 6,
"description": "Customer age in years",
"profile": {
"name": "age",
"valuesCount": 1000000,
"nullCount": 5000,
"nullProportion": 0.005,
"uniqueCount": 80,
"distinctCount": 80,
"min": 18,
"max": 95,
"mean": 42.5,
"median": 41,
"stddev": 15.2,
"timestamp": 1705320000000
}
}
Data Types¶
Numeric Types¶
NUMBER,TINYINT,SMALLINT,INT,BIGINTFLOAT,DOUBLE,DECIMAL,NUMERIC
String Types¶
STRING,TEXT,MEDIUMTEXTCHAR,VARCHAR
Date/Time Types¶
DATE,TIME,DATETIME,TIMESTAMPINTERVAL
Boolean¶
BOOLEAN
Binary Types¶
BINARY,VARBINARY,BLOB,LONGBLOB,MEDIUMBLOB,BYTEA
Complex Types¶
ARRAY: Array of valuesMAP: Key-value pairsSTRUCT: Nested structureUNION: Union typeJSON: JSON documents
Special Types¶
UUID: Universally unique identifierGEOGRAPHY: Geographic coordinatesENUM: Enumerated valuesSET: Set of values
Column Constraints¶
| Constraint | Description | Example |
|---|---|---|
| NULL | Column allows NULL values | Default for most columns |
| NOT_NULL | Column cannot be NULL | Required fields |
| UNIQUE | All values must be unique | Email addresses |
| PRIMARY_KEY | Uniquely identifies rows | Customer ID |
| FOREIGN_KEY | References another table's primary key | Order customer_id → Customer id |
Column Profiling¶
Column profiles provide statistical insights:
{
"profile": {
"valuesCount": 1000000,
"validCount": 995000,
"nullCount": 5000,
"nullProportion": 0.005,
"uniqueCount": 50000,
"uniqueProportion": 0.05,
"distinctCount": 50000,
"distinctProportion": 0.05,
"duplicateCount": 945000,
"min": 1,
"max": 999999,
"mean": 500000,
"median": 498765,
"stddev": 288675,
"sum": 500000000000,
"timestamp": 1705320000000
}
}
Column-Level Lineage¶
Columns can have lineage relationships showing data transformations:
graph LR
A[source.raw_data.user_email] -->|UPPER TRIM| B[staging.users.email]
B -->|VALIDATE| C[production.customers.email]
C -->|AGGREGATE| D[analytics.customer_summary.email]
style A fill:#764ba2,color:#fff
style B fill:#764ba2,color:#fff
style C fill:#4facfe,color:#fff,stroke:#4c51bf,stroke-width:3px
style D fill:#764ba2,color:#fff Best Practices¶
1. Descriptive Names¶
Use clear, descriptive column names that convey business meaning.
2. Data Type Precision¶
Choose appropriate data types with correct precision and scale.
3. Tag PII Data¶
Tag all personally identifiable information for compliance.
4. Document Business Logic¶
Provide clear descriptions explaining column purpose and business rules.
5. Map to Glossary¶
Link columns to glossary terms for semantic understanding.
6. Profile Regularly¶
Run column profiling to understand data distributions and quality.
7. Test Data Quality¶
Create test cases for critical columns (nullability, uniqueness, format).
8. Track Lineage¶
Maintain column-level lineage for impact analysis.
Custom Properties¶
This entity supports custom properties through the extension field. Common custom properties include:
- Data Classification: Sensitivity level
- Cost Center: Billing allocation
- Retention Period: Data retention requirements
- Application Owner: Owning application/team
See Custom Properties for details on defining and using custom properties.
Related Entities¶
- Table: Parent table containing the column
- Database Schema: Schema containing the table
- Database: Database containing the schema
- Glossary Term: Business terms mapped to columns
- Tag: Tags for classification and compliance
- TestCase: Quality tests for columns
- DataProfile: Statistical profiles of columns
- Query: Queries accessing the column