File¶
A File represents a document or resource stored in a Drive Service such as Google Drive, OneDrive, or other file sharing platforms. Files can be documents, spreadsheets, presentations, images, videos, PDFs, and other content types.
Overview¶
Files in OpenMetadata capture metadata about:
- File Properties: Name, path, size, MIME type, file extension
- File Types: Documents, Spreadsheets, Presentations, Images, Videos, Audio, PDF, CSV, Code, Data
- Storage Location: Drive service, parent directory, full path
- Access Information: Sharing status, web view links, download links
- Version Information: File version, modification timestamps
- File Metadata: Checksum, creation/modification times, last modified by user
Files are essential for: - Document management and collaboration - File sharing and access control - Compliance and data governance - Knowledge management - Content organization and discovery
Hierarchy¶
graph LR
A[Drive Service] --> B[Directory]
B --> C[File]
C --> D1[File Type]
C --> D2[Version Info]
C --> D3[Tags]
style A fill:#667eea,color:#fff
style B fill:#764ba2,color:#fff
style C fill:#4facfe,color:#fff,stroke:#4c51bf,stroke-width:3px
style D1 fill:#00f2fe,color:#333
style D2 fill:#00f2fe,color:#333
style D3 fill:#00f2fe,color:#333
click A "./drive-service/" "Drive Service"
click B "./directory/" "Directory"
click C "#file" "File" Click on any node to learn more about that entity.
Relationships¶
Files have relationships with various entities in the metadata ecosystem:
graph TD
subgraph File Properties
A[File:<br/>Q4_Report.xlsx] --> B1[FileType:<br/>Spreadsheet]
A --> B2[Size:<br/>15 MB]
A --> B3[MimeType:<br/>application/xlsx]
A --> B4[Version:<br/>v2.1]
end
subgraph Storage Location
A -.->|stored in| C1[Directory:<br/>Reports/Q4]
C1 -.->|in service| C2[DriveService:<br/>Google Drive]
A -.->|path| C3[Path:<br/>/shared/reports/Q4_Report.xlsx]
end
subgraph Access & Sharing
A -.->|web view| D1[WebViewLink]
A -.->|download| D2[DownloadLink]
A -.->|shared| D3[IsShared:<br/>true]
end
subgraph Governance
A -.->|owned by| E1[User:<br/>analyst]
A -.->|tagged with| E2[Tag:<br/>Confidential]
A -.->|in domain| E3[Domain:<br/>Finance]
A -.->|followed by| E4[Followers]
end
subgraph Metadata & Version
A -.->|created at| F1[CreatedTime]
A -.->|modified at| F2[ModifiedTime]
A -.->|modified by| F3[User:<br/>editor]
A -.->|checksum| F4[Hash:<br/>abc123...]
end
subgraph Lifecycle
A -.->|status| G1[EntityStatus]
A -.->|lifecycle| G2[LifeCycle]
A -.->|certification| G3[Certification]
A -.->|usage| G4[UsageSummary]
end
style A fill:#4facfe,color:#fff,stroke:#4c51bf,stroke-width:3px
style B1 fill:#00f2fe,color:#333
style B2 fill:#00f2fe,color:#333
style B3 fill:#00f2fe,color:#333
style B4 fill:#00f2fe,color:#333
style C1 fill:#764ba2,color:#fff
style C2 fill:#667eea,color:#fff
style C3 fill:#ffd700,color:#333
style D1 fill:#00f2fe,color:#333
style D2 fill:#00f2fe,color:#333
style D3 fill:#00f2fe,color:#333
style E1 fill:#43e97b,color:#333
style E2 fill:#fa709a,color:#fff
style E3 fill:#f093fb,color:#333
style E4 fill:#43e97b,color:#333
style F1 fill:#ffd700,color:#333
style F2 fill:#ffd700,color:#333
style F3 fill:#43e97b,color:#333
style F4 fill:#ffd700,color:#333
style G1 fill:#f5576c,color:#fff
style G2 fill:#f5576c,color:#fff
style G3 fill:#f5576c,color:#fff
style G4 fill:#ffd700,color:#333
click A "#file" "File"
click C1 "./directory/" "Directory"
click C2 "./drive-service/" "Drive Service"
click E1 "../../teams-users/user/" "User"
click E2 "../../governance/tag/" "Tag"
click E3 "../../domains/domain/" "Domain" Key Relationships:
- File Properties: File type, MIME type, size, and version information
- Storage Location: Directory, drive service, and full path
- Access & Sharing: Web view links, download links, and sharing status
- Governance: Ownership, tags, domains, and followers
- Metadata & Version: Creation/modification timestamps, modified by user, checksum
- Lifecycle: Entity status, lifecycle stages, certification, and usage tracking
Schema Specifications¶
{
"$id": "https://open-metadata.org/schema/entity/data/file.json",
"$schema": "http://json-schema.org/draft-07/schema#",
"title": "File",
"$comment": "@om-entity-type",
"description": "This schema defines the File entity. A File is a document or resource stored in a Drive Service.",
"type": "object",
"javaType": "org.openmetadata.schema.entity.data.File",
"javaInterfaces": [
"org.openmetadata.schema.EntityInterface"
],
"definitions": {
"fileType": {
"javaType": "org.openmetadata.schema.type.FileType",
"description": "Type of file based on content",
"type": "string",
"enum": [
"Document",
"Spreadsheet",
"Presentation",
"Image",
"Video",
"Audio",
"PDF",
"Archive",
"Code",
"Data",
"CSV",
"Text",
"Other"
]
}
},
"properties": {
"id": {
"description": "Unique identifier of this file instance.",
"$ref": "../../type/basic.json#/definitions/uuid"
},
"name": {
"description": "Name of the file.",
"$ref": "../../type/basic.json#/definitions/entityName"
},
"fullyQualifiedName": {
"description": "Fully qualified name of the file.",
"$ref": "../../type/basic.json#/definitions/fullyQualifiedEntityName"
},
"displayName": {
"description": "Display Name that identifies this file.",
"type": "string"
},
"description": {
"description": "Description of the file.",
"$ref": "../../type/basic.json#/definitions/markdown"
},
"service": {
"description": "Link to the drive service",
"$ref": "../../type/entityReference.json"
},
"serviceType": {
"description": "Type of drive service",
"$ref": "../../entity/services/driveService.json#/definitions/driveServiceType"
},
"directory": {
"description": "Parent directory containing this file",
"$ref": "../../type/entityReference.json"
},
"fileType": {
"$ref": "#/definitions/fileType"
},
"mimeType": {
"description": "MIME type of the file",
"type": "string"
},
"fileExtension": {
"description": "File extension",
"type": "string"
},
"path": {
"description": "Full path to the file",
"type": "string"
},
"size": {
"description": "File size in bytes",
"type": "integer"
},
"checksum": {
"description": "File checksum/hash",
"type": "string"
},
"webViewLink": {
"description": "Web link to view the file",
"type": "string",
"format": "uri"
},
"downloadLink": {
"description": "Direct download link",
"type": "string",
"format": "uri"
},
"isShared": {
"description": "Whether this file is shared",
"type": "boolean",
"default": false
},
"fileVersion": {
"description": "File version information",
"type": "string"
},
"createdTime": {
"description": "File creation timestamp",
"$ref": "../../type/basic.json#/definitions/timestamp"
},
"modifiedTime": {
"description": "Last modification timestamp",
"$ref": "../../type/basic.json#/definitions/timestamp"
},
"lastModifiedBy": {
"description": "User who last modified the file",
"$ref": "../../type/entityReference.json"
},
"sourceUrl": {
"description": "Link to this file in the source system.",
"$ref": "../../type/basic.json#/definitions/sourceUrl"
},
"href": {
"description": "Link to the resource corresponding to this file.",
"$ref": "../../type/basic.json#/definitions/href"
},
"owners": {
"description": "Owners of this file.",
"$ref": "../../type/entityReferenceList.json"
},
"followers": {
"description": "Followers of this entity.",
"$ref": "../../type/entityReferenceList.json"
},
"tags": {
"description": "Tags associated with this file.",
"type": "array",
"items": {
"$ref": "../../type/tagLabel.json"
},
"default": null
},
"version": {
"description": "Metadata version of the entity.",
"$ref": "../../type/entityHistory.json#/definitions/entityVersion"
},
"updatedAt": {
"description": "Last update time corresponding to the new version of the entity in Unix epoch time milliseconds.",
"$ref": "../../type/basic.json#/definitions/timestamp"
},
"updatedBy": {
"description": "User who made the update.",
"type": "string"
},
"impersonatedBy": {
"description": "Bot user that performed the action on behalf of the actual user.",
"$ref": "../../type/basic.json#/definitions/impersonatedBy"
},
"changeDescription": {
"description": "Change that lead to this version of the entity.",
"$ref": "../../type/entityHistory.json#/definitions/changeDescription"
},
"incrementalChangeDescription": {
"description": "Change that lead to this version of the entity.",
"$ref": "../../type/entityHistory.json#/definitions/changeDescription"
},
"deleted": {
"description": "When `true` indicates the entity has been soft deleted.",
"type": "boolean",
"default": false
},
"domains": {
"description": "Domains the File belongs to.",
"$ref": "../../type/entityReferenceList.json"
},
"dataProducts": {
"description": "List of data products this entity is part of.",
"$ref": "../../type/entityReferenceList.json"
},
"lifeCycle": {
"description": "Life Cycle of the entity",
"$ref": "../../type/lifeCycle.json"
},
"sourceHash": {
"description": "Source hash of the entity",
"type": "string",
"minLength": 1,
"maxLength": 32
},
"extension": {
"description": "Entity extension data with custom attributes added to the entity.",
"$ref": "../../type/basic.json#/definitions/entityExtension"
},
"votes": {
"description": "Votes on the entity.",
"$ref": "../../type/votes.json"
},
"certification": {
"$ref": "../../type/assetCertification.json"
},
"usageSummary": {
"description": "Latest usage information for this file.",
"$ref": "../../type/usageDetails.json",
"default": null
},
"entityStatus": {
"description": "Status of the File.",
"$ref": "../../type/status.json"
}
},
"required": ["id", "name", "service"],
"additionalProperties": false
}
@prefix om: <https://open-metadata.org/schema/> .
@prefix om-entity: <https://open-metadata.org/schema/entity/> .
@prefix om-file: <https://open-metadata.org/schema/entity/data/> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix owl: <http://www.w3.org/2002/07/owl#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
@prefix dcterms: <http://purl.org/dc/terms/> .
@prefix skos: <http://www.w3.org/2004/02/skos/core#> .
@prefix dcat: <http://www.w3.org/ns/dcat#> .
# File Class Definition
om-file:File a owl:Class ;
rdfs:label "File" ;
rdfs:comment "A document or resource stored in a Drive Service" ;
rdfs:subClassOf dcat:Distribution ;
rdfs:subClassOf om-entity:Entity ;
rdfs:isDefinedBy om: .
# File Type Class
om-file:FileType a owl:Class ;
rdfs:label "File Type" ;
rdfs:comment "Type of file based on content (Document, Spreadsheet, etc.)" ;
rdfs:isDefinedBy om: .
# Properties
om-file:fileType a owl:ObjectProperty ;
rdfs:label "file type" ;
rdfs:comment "Type of the file" ;
rdfs:domain om-file:File ;
rdfs:range om-file:FileType .
om-file:mimeType a owl:DatatypeProperty ;
rdfs:label "MIME type" ;
rdfs:comment "MIME type of the file" ;
rdfs:domain om-file:File ;
rdfs:range xsd:string ;
rdfs:subPropertyOf dcat:mediaType .
om-file:fileExtension a owl:DatatypeProperty ;
rdfs:label "file extension" ;
rdfs:comment "File extension" ;
rdfs:domain om-file:File ;
rdfs:range xsd:string .
om-file:path a owl:DatatypeProperty ;
rdfs:label "path" ;
rdfs:comment "Full path to the file" ;
rdfs:domain om-file:File ;
rdfs:range xsd:string .
om-file:size a owl:DatatypeProperty ;
rdfs:label "size" ;
rdfs:comment "File size in bytes" ;
rdfs:domain om-file:File ;
rdfs:range xsd:integer ;
rdfs:subPropertyOf dcat:byteSize .
om-file:checksum a owl:DatatypeProperty ;
rdfs:label "checksum" ;
rdfs:comment "File checksum/hash" ;
rdfs:domain om-file:File ;
rdfs:range xsd:string .
om-file:webViewLink a owl:DatatypeProperty ;
rdfs:label "web view link" ;
rdfs:comment "Web link to view the file" ;
rdfs:domain om-file:File ;
rdfs:range xsd:anyURI ;
rdfs:subPropertyOf dcat:accessURL .
om-file:downloadLink a owl:DatatypeProperty ;
rdfs:label "download link" ;
rdfs:comment "Direct download link" ;
rdfs:domain om-file:File ;
rdfs:range xsd:anyURI ;
rdfs:subPropertyOf dcat:downloadURL .
om-file:isShared a owl:DatatypeProperty ;
rdfs:label "is shared" ;
rdfs:comment "Whether this file is shared" ;
rdfs:domain om-file:File ;
rdfs:range xsd:boolean .
om-file:fileVersion a owl:DatatypeProperty ;
rdfs:label "file version" ;
rdfs:comment "File version information" ;
rdfs:domain om-file:File ;
rdfs:range xsd:string .
om-file:hasService a owl:ObjectProperty ;
rdfs:label "has service" ;
rdfs:comment "Drive service storing the file" ;
rdfs:domain om-file:File ;
rdfs:range om-entity:DriveService .
om-file:hasDirectory a owl:ObjectProperty ;
rdfs:label "has directory" ;
rdfs:comment "Parent directory containing the file" ;
rdfs:domain om-file:File ;
rdfs:range om-entity:Directory .
om-file:createdTime a owl:DatatypeProperty ;
rdfs:label "created time" ;
rdfs:comment "File creation timestamp" ;
rdfs:domain om-file:File ;
rdfs:range xsd:dateTime ;
rdfs:subPropertyOf dcterms:created .
om-file:modifiedTime a owl:DatatypeProperty ;
rdfs:label "modified time" ;
rdfs:comment "Last modification timestamp" ;
rdfs:domain om-file:File ;
rdfs:range xsd:dateTime ;
rdfs:subPropertyOf dcterms:modified .
om-file:lastModifiedBy a owl:ObjectProperty ;
rdfs:label "last modified by" ;
rdfs:comment "User who last modified the file" ;
rdfs:domain om-file:File ;
rdfs:range om-entity:User .
# File Type Individuals
om-file:Document a om-file:FileType ;
rdfs:label "Document" ;
skos:definition "Text document file" .
om-file:Spreadsheet a om-file:FileType ;
rdfs:label "Spreadsheet" ;
skos:definition "Spreadsheet file" .
om-file:Presentation a om-file:FileType ;
rdfs:label "Presentation" ;
skos:definition "Presentation file" .
om-file:Image a om-file:FileType ;
rdfs:label "Image" ;
skos:definition "Image file" .
om-file:Video a om-file:FileType ;
rdfs:label "Video" ;
skos:definition "Video file" .
om-file:Audio a om-file:FileType ;
rdfs:label "Audio" ;
skos:definition "Audio file" .
om-file:PDF a om-file:FileType ;
rdfs:label "PDF" ;
skos:definition "Portable Document Format file" .
om-file:CSV a om-file:FileType ;
rdfs:label "CSV" ;
skos:definition "Comma-separated values file" .
om-file:Code a om-file:FileType ;
rdfs:label "Code" ;
skos:definition "Source code file" .
om-file:Data a om-file:FileType ;
rdfs:label "Data" ;
skos:definition "Data file" .
{
"@context": {
"@vocab": "https://open-metadata.org/schema/entity/data/",
"rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
"rdfs": "http://www.w3.org/2000/01/rdf-schema#",
"owl": "http://www.w3.org/2002/07/owl#",
"xsd": "http://www.w3.org/2001/XMLSchema#",
"dcterms": "http://purl.org/dc/terms/",
"dcat": "http://www.w3.org/ns/dcat#",
"skos": "http://www.w3.org/2004/02/skos/core#",
"om": "https://open-metadata.org/schema/",
"File": {
"@id": "om:File",
"@type": "@id"
},
"id": {
"@id": "om:id",
"@type": "xsd:string"
},
"name": {
"@id": "om:name",
"@type": "xsd:string"
},
"fullyQualifiedName": {
"@id": "om:fullyQualifiedName",
"@type": "xsd:string"
},
"displayName": {
"@id": "om:displayName",
"@type": "xsd:string"
},
"description": {
"@id": "dcterms:description",
"@type": "xsd:string"
},
"service": {
"@id": "om:hasService",
"@type": "@id"
},
"serviceType": {
"@id": "om:serviceType",
"@type": "xsd:string"
},
"directory": {
"@id": "om:hasDirectory",
"@type": "@id"
},
"fileType": {
"@id": "om:fileType",
"@type": "@id"
},
"mimeType": {
"@id": "dcat:mediaType",
"@type": "xsd:string"
},
"fileExtension": {
"@id": "om:fileExtension",
"@type": "xsd:string"
},
"path": {
"@id": "om:path",
"@type": "xsd:string"
},
"size": {
"@id": "dcat:byteSize",
"@type": "xsd:integer"
},
"checksum": {
"@id": "om:checksum",
"@type": "xsd:string"
},
"webViewLink": {
"@id": "dcat:accessURL",
"@type": "xsd:anyURI"
},
"downloadLink": {
"@id": "dcat:downloadURL",
"@type": "xsd:anyURI"
},
"isShared": {
"@id": "om:isShared",
"@type": "xsd:boolean"
},
"fileVersion": {
"@id": "om:fileVersion",
"@type": "xsd:string"
},
"createdTime": {
"@id": "dcterms:created",
"@type": "xsd:dateTime"
},
"modifiedTime": {
"@id": "dcterms:modified",
"@type": "xsd:dateTime"
},
"lastModifiedBy": {
"@id": "om:lastModifiedBy",
"@type": "@id"
},
"sourceUrl": {
"@id": "om:sourceUrl",
"@type": "xsd:anyURI"
},
"href": {
"@id": "om:href",
"@type": "xsd:anyURI"
},
"owners": {
"@id": "om:owners",
"@type": "@id",
"@container": "@set"
},
"followers": {
"@id": "om:followers",
"@type": "@id",
"@container": "@set"
},
"tags": {
"@id": "om:tags",
"@type": "@id",
"@container": "@set"
},
"domains": {
"@id": "om:domains",
"@type": "@id",
"@container": "@set"
},
"dataProducts": {
"@id": "om:dataProducts",
"@type": "@id",
"@container": "@set"
},
"lifeCycle": {
"@id": "om:lifeCycle",
"@type": "@id"
},
"certification": {
"@id": "om:certification",
"@type": "@id"
},
"usageSummary": {
"@id": "om:usageSummary",
"@type": "@id"
},
"entityStatus": {
"@id": "om:entityStatus",
"@type": "xsd:string"
}
}
}
Use Cases¶
Spreadsheet in Google Drive¶
{
"id": "123e4567-e89b-12d3-a456-426614174000",
"name": "Q4_Sales_Report.xlsx",
"fullyQualifiedName": "google_drive.Finance/Reports/Q4_Sales_Report.xlsx",
"displayName": "Q4 Sales Report",
"description": "Quarterly sales report for Q4 2024",
"service": {
"id": "abc-123",
"type": "driveService",
"name": "GoogleDrive"
},
"serviceType": "GoogleDrive",
"directory": {
"id": "dir-456",
"type": "directory",
"name": "Finance/Reports"
},
"fileType": "Spreadsheet",
"mimeType": "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet",
"fileExtension": "xlsx",
"path": "/Finance/Reports/Q4_Sales_Report.xlsx",
"size": 15728640,
"checksum": "md5:abc123def456...",
"webViewLink": "https://docs.google.com/spreadsheets/d/abc123/edit",
"downloadLink": "https://docs.google.com/spreadsheets/d/abc123/export",
"isShared": true,
"fileVersion": "v2.1",
"createdTime": 1705320000000,
"modifiedTime": 1705406400000,
"lastModifiedBy": {
"id": "user-789",
"type": "user",
"name": "finance.analyst"
},
"owners": [
{
"id": "team-456",
"type": "team",
"name": "Finance"
}
],
"tags": [
{
"tagFQN": "Confidential"
},
{
"tagFQN": "QuarterlyReport"
}
],
"domains": [
{
"id": "domain-123",
"type": "domain",
"name": "Finance"
}
]
}
Document in OneDrive¶
{
"id": "456e7890-e89b-12d3-a456-426614174111",
"name": "Product_Requirements.docx",
"fullyQualifiedName": "onedrive.Engineering/Product/Product_Requirements.docx",
"displayName": "Product Requirements Document",
"description": "Detailed requirements for the new product launch",
"service": {
"id": "onedrive-001",
"type": "driveService",
"name": "OneDrive"
},
"serviceType": "OneDrive",
"directory": {
"id": "dir-789",
"type": "directory",
"name": "Engineering/Product"
},
"fileType": "Document",
"mimeType": "application/vnd.openxmlformats-officedocument.wordprocessingml.document",
"fileExtension": "docx",
"path": "/Engineering/Product/Product_Requirements.docx",
"size": 2097152,
"checksum": "sha256:xyz789abc123...",
"webViewLink": "https://onedrive.live.com/view.aspx?id=xyz789",
"downloadLink": "https://onedrive.live.com/download?id=xyz789",
"isShared": false,
"fileVersion": "3.0",
"createdTime": 1705233600000,
"modifiedTime": 1705492800000,
"lastModifiedBy": {
"id": "user-456",
"type": "user",
"name": "product.manager"
},
"owners": [
{
"id": "user-456",
"type": "user",
"name": "product.manager"
}
],
"tags": [
{
"tagFQN": "ProductManagement"
}
]
}
Presentation in Google Drive¶
{
"id": "789e0123-e89b-12d3-a456-426614174222",
"name": "Company_Overview_2024.pptx",
"fullyQualifiedName": "google_drive.Marketing/Presentations/Company_Overview_2024.pptx",
"displayName": "Company Overview 2024",
"description": "Annual company overview presentation for stakeholders",
"service": {
"id": "gdrive-123",
"type": "driveService",
"name": "GoogleDrive"
},
"serviceType": "GoogleDrive",
"directory": {
"id": "dir-marketing",
"type": "directory",
"name": "Marketing/Presentations"
},
"fileType": "Presentation",
"mimeType": "application/vnd.openxmlformats-officedocument.presentationml.presentation",
"fileExtension": "pptx",
"path": "/Marketing/Presentations/Company_Overview_2024.pptx",
"size": 8388608,
"webViewLink": "https://docs.google.com/presentation/d/xyz789/edit",
"downloadLink": "https://docs.google.com/presentation/d/xyz789/export",
"isShared": true,
"fileVersion": "1.5",
"createdTime": 1705233600000,
"modifiedTime": 1705492800000,
"lastModifiedBy": {
"id": "user-marketing",
"type": "user",
"name": "marketing.lead"
},
"owners": [
{
"id": "team-marketing",
"type": "team",
"name": "Marketing"
}
],
"followers": [
{
"id": "user-ceo",
"type": "user",
"name": "ceo"
}
],
"tags": [
{
"tagFQN": "Public"
},
{
"tagFQN": "AnnualReport"
}
]
}
PDF Document in SharePoint¶
{
"id": "abc12345-e89b-12d3-a456-426614174333",
"name": "Data_Governance_Policy_v2.pdf",
"fullyQualifiedName": "sharepoint.Governance/Policies/Data_Governance_Policy_v2.pdf",
"displayName": "Data Governance Policy v2.0",
"description": "Company-wide data governance policy and procedures",
"service": {
"id": "sharepoint-001",
"type": "driveService",
"name": "SharePoint"
},
"serviceType": "SharePoint",
"directory": {
"id": "dir-policies",
"type": "directory",
"name": "Governance/Policies"
},
"fileType": "PDF",
"mimeType": "application/pdf",
"fileExtension": "pdf",
"path": "/Governance/Policies/Data_Governance_Policy_v2.pdf",
"size": 2097152,
"checksum": "sha256:policy123hash456...",
"webViewLink": "https://company.sharepoint.com/sites/governance/Data_Governance_Policy_v2.pdf",
"downloadLink": "https://company.sharepoint.com/sites/governance/_layouts/download.aspx?SourceUrl=Data_Governance_Policy_v2.pdf",
"isShared": true,
"fileVersion": "2.0",
"createdTime": 1705320000000,
"modifiedTime": 1705579200000,
"lastModifiedBy": {
"id": "user-compliance",
"type": "user",
"name": "compliance.officer"
},
"owners": [
{
"id": "user-compliance",
"type": "user",
"name": "compliance.officer"
}
],
"tags": [
{
"tagFQN": "Governance.Policy"
},
{
"tagFQN": "Official"
}
],
"domains": [
{
"id": "domain-governance",
"type": "domain",
"name": "Governance"
}
],
"certification": {
"tagLabel": "Certified"
}
}
CSV Data File in Google Drive¶
{
"id": "def45678-e89b-12d3-a456-426614174444",
"name": "customer_export_2024.csv",
"fullyQualifiedName": "google_drive.Data/Exports/customer_export_2024.csv",
"displayName": "Customer Data Export 2024",
"description": "Customer data export for analysis",
"service": {
"id": "gdrive-data",
"type": "driveService",
"name": "GoogleDrive"
},
"serviceType": "GoogleDrive",
"directory": {
"id": "dir-exports",
"type": "directory",
"name": "Data/Exports"
},
"fileType": "CSV",
"mimeType": "text/csv",
"fileExtension": "csv",
"path": "/Data/Exports/customer_export_2024.csv",
"size": 5242880,
"checksum": "md5:csv123hash456...",
"webViewLink": "https://drive.google.com/file/d/csv789/view",
"downloadLink": "https://drive.google.com/uc?export=download&id=csv789",
"isShared": false,
"fileVersion": "1.0",
"createdTime": 1705492800000,
"modifiedTime": 1705492800000,
"lastModifiedBy": {
"id": "user-data-eng",
"type": "user",
"name": "data.engineer"
},
"owners": [
{
"id": "team-data",
"type": "team",
"name": "DataEngineering"
}
],
"tags": [
{
"tagFQN": "PII.Sensitive"
},
{
"tagFQN": "CustomerData"
}
],
"domains": [
{
"id": "domain-customer",
"type": "domain",
"name": "CustomerData"
}
]
}
File Types¶
| Type | Extension | Description | Use Case |
|---|---|---|---|
| Document | .doc, .docx, .txt | Text documents | Requirements, reports, documentation |
| Spreadsheet | .xlsx, .xls, .gsheet | Spreadsheets | Data analysis, financial reports |
| Presentation | .pptx, .ppt, .gslides | Presentations | Slides, pitches, overviews |
| Image | .jpg, .png, .gif | Images | Graphics, photos, diagrams |
| Video | .mp4, .mov, .avi | Videos | Training videos, recordings |
| Audio | .mp3, .wav, .m4a | Audio files | Podcasts, voice recordings |
| Portable documents | Reports, contracts, policies | ||
| Archive | .zip, .tar, .rar | Compressed archives | Backups, file collections |
| Code | .py, .js, .java | Source code | Scripts, programs |
| Data | .dat, .bin | Data files | Binary data, datasets |
| CSV | .csv | Comma-separated values | Data exports, tabular data |
| Text | .txt, .md | Plain text | Notes, markdown files |
| Other | various | Other file types | Miscellaneous files |
Best Practices¶
1. Consistent Naming Conventions¶
Use descriptive, consistent file names:
2. Include Metadata¶
Add comprehensive metadata for discoverability:
{
"displayName": "Q4 Sales Report 2024",
"description": "Quarterly sales analysis and forecasting for Q4 2024",
"tags": [{"tagFQN": "Confidential"}, {"tagFQN": "QuarterlyReport"}],
"owners": [{"type": "team", "name": "Finance"}],
"domains": [{"type": "domain", "name": "Finance"}]
}
3. Track File Versions¶
Maintain version information for important documents:
{
"fileVersion": "v2.1",
"createdTime": 1705320000000,
"modifiedTime": 1705406400000,
"lastModifiedBy": {
"type": "user",
"name": "finance.analyst"
}
}
4. Manage Sharing and Access¶
Control file sharing appropriately:
{
"isShared": true,
"webViewLink": "https://docs.google.com/spreadsheets/d/abc123/edit",
"downloadLink": "https://docs.google.com/spreadsheets/d/abc123/export",
"tags": [{"tagFQN": "Confidential"}]
}
5. Use Proper File Types¶
Classify files correctly for better organization:
{
"fileType": "Spreadsheet",
"mimeType": "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet",
"fileExtension": "xlsx"
}
6. Organize with Directories¶
Use proper directory structures:
{
"directory": {
"type": "directory",
"name": "Finance/Reports/Quarterly"
},
"path": "/Finance/Reports/Quarterly/Q4_Sales_Report.xlsx"
}
Custom Properties¶
This entity supports custom properties through the extension field. Common custom properties include:
- Document Classification: Sensitivity level, compliance category
- Review Cycle: Review frequency and next review date
- Approval Status: Document approval workflow status
- Related Projects: Associated projects or initiatives
See Custom Properties for details on defining and using custom properties.
Related Entities¶
- Drive Service: Service hosting the file (Google Drive, OneDrive, SharePoint)
- Directory: Parent directory containing the file
- User: File owners, modifiers, and followers
- Team: Teams owning or collaborating on files
- Tag: File classifications and labels
- Domain: Domain organization
- Data Product: Data products including the file
- Glossary Term: Business terms associated with the file