Kafka
A distributed system for publishing, subscribing to, storing, and processing streams of records in real-time.
Apache Kafka is a distributed event streaming platform designed for building real-time data pipelines and streaming applications. It functions as a highly scalable, fault-tolerant, and durable publish-subscribe messaging system.

The core architecture revolves around 'topics', which are categories or feeds of records. Producers publish records to topics, and consumers subscribe to topics to read them. Kafka brokers, forming a cluster, store these records durably and serve them to consumers. Records are organized into partitions within topics, allowing parallel processing and high throughput; each partition is an ordered, immutable sequence of records. Replication across brokers ensures fault tolerance: if a broker fails, another replica can take over. Kafka's durability comes from writing records to disk.

Key components include Producers, Consumers, Brokers, and ZooKeeper (for cluster coordination, though newer versions are moving towards KRaft). Trade-offs involve the operational complexity of managing a distributed cluster, especially the ZooKeeper dependency, and the need for careful capacity planning. In return, Kafka offers very high throughput, low latency, and strong durability guarantees, making it well suited to high-volume data streams.
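The partitioned-log model described above can be sketched as a small in-memory simulation. This is purely illustrative (the class and method names `Topic`, `append`, and `poll` are invented for the sketch, not Kafka's real API); it shows how per-partition append-only logs and independent consumer offsets fit together.

```python
# Minimal in-memory model of Kafka's core abstractions: a topic split
# into partitions, append-only per-partition logs, and a consumer that
# tracks its own offset per partition. Illustrative only, not real Kafka.

class Topic:
    def __init__(self, name, num_partitions):
        self.name = name
        # Each partition is an ordered, append-only sequence of records.
        self.partitions = [[] for _ in range(num_partitions)]

    def append(self, partition, record):
        """Producer path: append a record, return its offset."""
        log = self.partitions[partition]
        log.append(record)
        return len(log) - 1  # offsets are per-partition and monotonic

class Consumer:
    def __init__(self, topic):
        self.topic = topic
        # Each consumer tracks its own read position per partition,
        # so progress is independent of other consumers.
        self.offsets = {p: 0 for p in range(len(topic.partitions))}

    def poll(self, partition, max_records=10):
        """Read from the tracked offset, then advance it."""
        start = self.offsets[partition]
        log = self.topic.partitions[partition]
        batch = log[start:start + max_records]
        self.offsets[partition] += len(batch)
        return batch

topic = Topic("clicks", num_partitions=2)
topic.append(0, "user-1 clicked")
topic.append(0, "user-1 scrolled")
topic.append(1, "user-2 clicked")

c = Consumer(topic)
print(c.poll(0))  # ['user-1 clicked', 'user-1 scrolled']
print(c.poll(0))  # [] -- offset is already at the end of partition 0
```

Note how re-polling the same partition returns nothing until new records arrive: the consumer's offset, not the broker, decides what "unread" means.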
```mermaid
graph LR
Center["Kafka"]:::main
Rel_ipfs["ipfs"]:::related -.-> Center
click Rel_ipfs "/terms/ipfs"
Rel_file_systems["file-systems"]:::related -.-> Center
click Rel_file_systems "/terms/file-systems"
Rel_distributed_ledger_technology_dlt["distributed-ledger-technology-dlt"]:::related -.-> Center
click Rel_distributed_ledger_technology_dlt "/terms/distributed-ledger-technology-dlt"
classDef main fill:#7c3aed,stroke:#8b5cf6,stroke-width:2px,color:white,font-weight:bold,rx:5,ry:5;
classDef pre fill:#0f172a,stroke:#3b82f6,color:#94a3b8,rx:5,ry:5;
classDef child fill:#0f172a,stroke:#10b981,color:#94a3b8,rx:5,ry:5;
classDef related fill:#0f172a,stroke:#8b5cf6,stroke-dasharray: 5 5,color:#94a3b8,rx:5,ry:5;
linkStyle default stroke:#4b5563,stroke-width:2px;
```
🧒 Explain Like I'm 5
Kafka is like a giant, super-fast, digital post office that never loses a letter. Companies use it to send 'messages' (like a user click or a bank [transaction](/en/terms/transaction)) between different parts of their system instantly, even if some parts are busy or broken.
🤓 Expert Deep Dive
Kafka's architecture is built around the concept of a distributed commit log. Topics are partitioned, and each partition is an ordered sequence of messages. Brokers store these partitions, and replication ensures fault tolerance, with leader-follower dynamics per partition: producers write messages to a partition's leader, and followers replicate the data from it. Consumers maintain their own offset within each partition, allowing them to track their progress independently. This decoupled nature enables high throughput and scalability.

Trade-offs include ordering semantics: Kafka guarantees order within a partition, but not across partitions of a topic, so consumers needing a total order must read from a single partition or order by key. Managing consumer group rebalancing during failures or scaling events adds further complexity. ZooKeeper has historically been critical for metadata management, leader election, and broker registration, but the KRaft protocol removes this dependency.

Vulnerabilities can include insecure inter-broker communication, insufficient access control leading to data breaches, and potential denial-of-service attacks targeting brokers or ZooKeeper.
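The per-partition ordering guarantee is why key-based partitioning matters: records that share a key always hash to the same partition, so they stay ordered relative to each other. A minimal sketch, assuming a CRC32 hash as a stand-in for Kafka's default partitioner (which actually uses murmur2; the keys and event names here are invented):

```python
# Sketch of Kafka-style key hashing: records sharing a key land in the
# same partition, preserving their relative order. crc32 stands in for
# Kafka's real default hash (murmur2); keys and events are illustrative.
import zlib

def partition_for(key: bytes, num_partitions: int) -> int:
    # Deterministic hash of the key, modulo the partition count.
    return zlib.crc32(key) % num_partitions

events = [
    (b"acct-42", "deposit"),
    (b"acct-7", "open"),
    (b"acct-42", "withdraw"),
]

placed = {}  # key -> list of (partition, event), in publish order
for key, event in events:
    p = partition_for(key, 4)
    placed.setdefault(key, []).append((p, event))

# Every event for acct-42 maps to exactly one partition, so a consumer
# of that partition sees "deposit" before "withdraw".
parts = {p for p, _ in placed[b"acct-42"]}
print(len(parts))  # 1
```

This is also the ordering pitfall in practice: two events for *different* keys may land in different partitions and be consumed in any interleaving.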