Data Lake
Centralized storage for raw data in its native format, whether structured, semi-structured, or unstructured.
A Data Lake is designed to ingest data as-is, without requiring an upfront structure or predefined schema. This 'Schema-on-Read' approach, in which structure is applied only when the data is read, lets data scientists and engineers store raw logs, images, sensor data, and social media feeds in a single location. It serves as the primary source for Big Data processing and machine learning training, where the raw context of the data is often more valuable than a cleaned, structured subset.
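As a rough illustration of Schema-on-Read, the sketch below stores heterogeneous records untouched and imposes a shape only when they are read. The record layout, the `RAW_EVENTS` sample, and the `read_temperatures` helper are hypothetical and chosen for illustration. A real lake would typically hold such files in object storage rather than an in-memory string.

```python
import json

# Raw events kept exactly as they arrived (newline-delimited JSON).
# Records do not share a common shape; nothing was enforced at write time.
RAW_EVENTS = """\
{"device": "sensor-1", "temp_c": "21.5", "ts": "2024-01-01T00:00:00Z"}
{"device": "sensor-2", "temp_c": 19, "extra": {"fw": "1.2"}}
{"user": "alice", "action": "login"}
"""


def read_temperatures(raw_text):
    """Apply a schema at read time: keep records with a device and a
    temperature, and coerce the temperature to float."""
    rows = []
    for line in raw_text.splitlines():
        if not line.strip():
            continue
        record = json.loads(line)
        # Records that do not fit this reader's schema are skipped,
        # but they remain in the raw data for other readers.
        if "device" in record and "temp_c" in record:
            rows.append(
                {"device": str(record["device"]), "temp_c": float(record["temp_c"])}
            )
    return rows


temperatures = read_temperatures(RAW_EVENTS)
print(temperatures)
```

A different consumer could read the same raw text with a different schema (for example, only login events) without any change to what is stored.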
graph LR
Center["Data Lake"]:::main
Pre_big_data["big-data"]:::pre --> Center
click Pre_big_data "/terms/big-data"
Pre_distributed_computing["distributed-computing"]:::pre --> Center
click Pre_distributed_computing "/terms/distributed-computing"
Rel_data_warehouse["data-warehouse"]:::related -.-> Center
click Rel_data_warehouse "/terms/data-warehouse"
Rel_machine_learning["machine-learning"]:::related -.-> Center
click Rel_machine_learning "/terms/machine-learning"
Rel_oracle_network["oracle-network"]:::related -.-> Center
click Rel_oracle_network "/terms/oracle-network"
classDef main fill:#7c3aed,stroke:#8b5cf6,stroke-width:2px,color:white,font-weight:bold,rx:5,ry:5;
classDef pre fill:#0f172a,stroke:#3b82f6,color:#94a3b8,rx:5,ry:5;
classDef child fill:#0f172a,stroke:#10b981,color:#94a3b8,rx:5,ry:5;
classDef related fill:#0f172a,stroke:#8b5cf6,stroke-dasharray: 5 5,color:#94a3b8,rx:5,ry:5;
linkStyle default stroke:#4b5563,stroke-width:2px;
🧒 Explain Like I'm 5
🌊 A massive digital storage space where you keep all your company's information in its raw form until you're ready to analyze it.
🤓 Expert Deep Dive
## Avoiding the Data Swamp
Without proper Metadata Management, a Data Lake quickly becomes a 'Data Swamp': a store of files that nobody can find, interpret, or trust. To prevent this, data engineering teams must implement:
1. Data Discovery: Automated crawlers that scan and tag new files.
2. Access Control: Strict IAM roles at the bucket/prefix level.
3. Quality Checks: Automated validation as data moves from Bronze (raw) to Silver (cleaned) zones.
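The third item, validation between zones, can be sketched as a small quality gate. This is a minimal sketch under stated assumptions: the `REQUIRED_FIELDS` contract, the record shapes, and the `validate` and `promote` helpers are hypothetical, and real pipelines usually run such checks inside a processing framework rather than over in-memory lists.

```python
from dataclasses import dataclass, field

# Hypothetical contract for the Silver zone: field name -> expected type.
REQUIRED_FIELDS = {"event_id": str, "amount": float}


@dataclass
class PromotionResult:
    silver: list = field(default_factory=list)    # records that passed
    rejected: list = field(default_factory=list)  # (record, problems) pairs


def validate(record):
    """Return a list of problems; an empty list means the record passes."""
    problems = []
    for name, expected_type in REQUIRED_FIELDS.items():
        if name not in record:
            problems.append(f"missing field: {name}")
        elif not isinstance(record[name], expected_type):
            problems.append(f"wrong type for {name}")
    return problems


def promote(bronze_records):
    """Move valid Bronze records to Silver; keep rejects with their reasons."""
    result = PromotionResult()
    for record in bronze_records:
        problems = validate(record)
        if problems:
            result.rejected.append((record, problems))
        else:
            result.silver.append(record)
    return result


bronze = [
    {"event_id": "a1", "amount": 10.0},
    {"event_id": "a2"},                  # missing amount
    {"event_id": "a3", "amount": "12"},  # amount stored as a string
]
result = promote(bronze)
print(len(result.silver), "promoted,", len(result.rejected), "rejected")
```

Keeping rejected records alongside the reasons they failed, instead of dropping them, leaves the Bronze data auditable and makes recurring quality problems visible.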