Data Lake

Centralized storage for raw data in its native format, whether structured, semi-structured, or unstructured.


A Data Lake ingests data as-is, with no predefined schema required at write time. Under this 'Schema-on-Read' approach, structure is applied only when the data is read, so data scientists and engineers can store raw logs, images, sensor data, and social media feeds in a single location. A Data Lake often serves as the primary source for Big Data processing and machine learning training, where the raw context of the data can be more valuable than a cleaned, structured subset.
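To make Schema-on-Read concrete, here is a minimal sketch in Python. The sample records, the field names (`device`, `temp_c`), and the helper `read_with_schema` are all hypothetical and chosen only for illustration. The point is that nothing validates the data when it is stored; the reader brings its own schema and applies it at query time.

```python
import json

# Hypothetical raw events, stored exactly as they arrived (no schema enforced on write).
RAW_LINES = [
    '{"device": "t-1", "temp_c": "21.5", "ts": "2024-01-01T00:00:00Z"}',
    '{"device": "t-2", "temp_c": 19, "extra": {"fw": "1.2"}}',
    'not json at all',
]

# The schema belongs to the reader: field name -> converter (an assumption for this sketch).
READ_SCHEMA = {"device": str, "temp_c": float}


def read_with_schema(lines, schema):
    """Parse raw lines and project them onto `schema`, skipping unreadable records."""
    rows = []
    for line in lines:
        try:
            record = json.loads(line)
            rows.append({name: cast(record[name]) for name, cast in schema.items()})
        except (ValueError, KeyError, TypeError):
            # The raw line stays in the lake untouched; this reader just cannot use it.
            continue
    return rows


rows = read_with_schema(RAW_LINES, READ_SCHEMA)
```

A different consumer could read the same raw lines with a different schema (for example, one that also extracts `ts`), which is the flexibility the approach is meant to provide.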

graph LR
  Center["Data Lake"]:::main
  Pre_big_data["big-data"]:::pre --> Center
  click Pre_big_data "/terms/big-data"
  Pre_distributed_computing["distributed-computing"]:::pre --> Center
  click Pre_distributed_computing "/terms/distributed-computing"
  Rel_data_warehouse["data-warehouse"]:::related -.-> Center
  click Rel_data_warehouse "/terms/data-warehouse"
  Rel_machine_learning["machine-learning"]:::related -.-> Center
  click Rel_machine_learning "/terms/machine-learning"
  Rel_oracle_network["oracle-network"]:::related -.-> Center
  click Rel_oracle_network "/terms/oracle-network"
  classDef main fill:#7c3aed,stroke:#8b5cf6,stroke-width:2px,color:white,font-weight:bold,rx:5,ry:5;
  classDef pre fill:#0f172a,stroke:#3b82f6,color:#94a3b8,rx:5,ry:5;
  classDef child fill:#0f172a,stroke:#10b981,color:#94a3b8,rx:5,ry:5;
  classDef related fill:#0f172a,stroke:#8b5cf6,stroke-dasharray: 5 5,color:#94a3b8,rx:5,ry:5;
  linkStyle default stroke:#4b5563,stroke-width:2px;

      

🧒 Explain Like I'm 5

🌊 A massive digital storage space where you keep all your company's information in its raw form until you're ready to analyze it.

🤓 Expert Deep Dive

## Avoiding the Data Swamp
Without proper Metadata Management, a Data Lake quickly becomes a 'Data Swamp.' To prevent this, data engineering teams must implement:
1. Data Discovery: Automated crawlers that scan and tag new files.
2. Access Control: Strict IAM roles at the bucket/prefix level.
3. Quality Checks: Automated validation as data moves from Bronze (raw) to Silver (cleaned) zones.
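The quality-check step, together with a very small form of metadata capture, can be sketched as follows. This is an illustrative sketch only: the directory layout (`bronze/`, `silver/`), the JSON Lines file format, the rule set in `REQUIRED_FIELDS`, and the functions `is_valid` and `promote` are assumptions made for the example, not a prescribed implementation. Access control is not shown, since it is normally configured in the storage platform rather than in pipeline code.

```python
import json
import tempfile
from pathlib import Path

REQUIRED_FIELDS = {"device", "temp_c"}  # hypothetical validation rule


def is_valid(record):
    """A record passes if it is a dict with all required fields and a numeric temp_c."""
    return (
        isinstance(record, dict)
        and REQUIRED_FIELDS.issubset(record)
        and isinstance(record["temp_c"], (int, float))
        and not isinstance(record["temp_c"], bool)
    )


def promote(bronze_dir, silver_dir):
    """Copy valid records from Bronze to Silver and return per-file metadata."""
    catalog = {}
    silver_dir.mkdir(parents=True, exist_ok=True)
    for path in sorted(bronze_dir.glob("*.jsonl")):
        good, rejected = [], 0
        for line in path.read_text().splitlines():
            try:
                record = json.loads(line)
            except ValueError:
                rejected += 1
                continue
            if is_valid(record):
                good.append(record)
            else:
                rejected += 1
        out = "".join(json.dumps(r) + "\n" for r in good)
        (silver_dir / path.name).write_text(out)
        # Recording counts per file is a tiny stand-in for a metadata catalog.
        catalog[path.name] = {"valid": len(good), "rejected": rejected}
    return catalog


with tempfile.TemporaryDirectory() as root:
    bronze, silver = Path(root) / "bronze", Path(root) / "silver"
    bronze.mkdir()
    (bronze / "sensors.jsonl").write_text(
        '{"device": "t-1", "temp_c": 21.5}\n'
        '{"device": "t-2"}\n'
        'corrupt line\n'
    )
    catalog = promote(bronze, silver)
    silver_lines = (silver / "sensors.jsonl").read_text().splitlines()
```

Note that the Bronze file is never modified: rejected records remain available in raw form, and only the Silver copy is filtered.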

🔗 Related terms

Prerequisites:

📚 Sources