Overview
What is Apache Kafka?
Apache Kafka is a distributed event streaming platform used for building real-time data pipelines and streaming applications.
It was originally developed at LinkedIn and later open-sourced; it is now maintained by the Apache Software Foundation.
Kafka is designed for high-throughput, low-latency, fault-tolerant, and scalable message streaming.
Key Concepts
- Producer → Sends data (messages/events) into Kafka.
- Consumer → Reads data from Kafka.
- Topic → A category or feed name to which records are published.
- Partition → Each topic is split into partitions for parallelism and scalability.
- Broker → A Kafka server that stores data and serves client requests.
- Cluster → A group of Kafka brokers working together.
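The concepts above can be sketched as a toy in-memory model (illustrative only, not the real Kafka client API): a topic is a set of partitions, each partition is an append-only log, and a consumer reads sequentially from an offset.

```python
from dataclasses import dataclass, field

@dataclass
class Topic:
    """Toy stand-in for a Kafka topic: a list of append-only partition logs."""
    name: str
    num_partitions: int = 3
    logs: list = field(default_factory=list)

    def __post_init__(self):
        self.logs = [[] for _ in range(self.num_partitions)]

    def produce(self, key: str, value: str) -> int:
        # Kafka hashes the record key to choose a partition (murmur2 in the
        # real client; plain hash() here), so records with the same key land
        # in the same partition and keep their relative order.
        partition = hash(key) % self.num_partitions
        self.logs[partition].append(value)
        return partition

    def consume(self, partition: int, offset: int) -> list:
        # A consumer reads sequentially from its last committed offset.
        return self.logs[partition][offset:]

topic = Topic("orders")
p = topic.produce("user-42", "order created")
q = topic.produce("user-42", "order shipped")
assert p == q  # same key -> same partition, so per-key ordering is preserved
print(topic.consume(p, 0))  # ['order created', 'order shipped']
```

The key-to-partition mapping is why ordering in Kafka is guaranteed per partition, not per topic.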
Core Features
- High throughput & scalability – Can handle millions of messages per second.
- Durability – Messages are written to disk and replicated across brokers.
- Fault tolerance – Survives broker failures without losing acknowledged data, because partitions are replicated.
- Integration – Works with many data sources (databases, microservices, cloud services).
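The durability and fault-tolerance features rest on replication. A minimal sketch of the idea (a toy model, not Kafka's actual replication protocol): each write is copied to every live replica before it is acknowledged, so losing one broker does not lose the data.

```python
class Broker:
    """Toy stand-in for a Kafka broker holding one replica of a partition."""
    def __init__(self, broker_id):
        self.broker_id = broker_id
        self.log = []
        self.alive = True

def replicate(brokers, message):
    # Append the message to every live replica; in real Kafka, acks="all"
    # means the producer is acknowledged only after all in-sync replicas
    # have the record.
    live = [b for b in brokers if b.alive]
    for b in live:
        b.log.append(message)
    return len(live)

cluster = [Broker(i) for i in range(3)]  # replication factor 3
replicate(cluster, "event-1")
cluster[0].alive = False                 # one broker fails
survivors = [b for b in cluster if b.alive]
assert all("event-1" in b.log for b in survivors)  # no data loss
```

This is the trade-off behind Kafka's `acks` producer setting: waiting for more replicas costs latency but tightens the durability guarantee.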
Common Use Cases
- Real-time analytics (e.g., monitoring financial transactions).
- Event-driven architectures (microservices communication).
- Log aggregation (centralized logs from many systems).
- Data pipelines (stream data from source → Kafka → data warehouse).
- IoT streaming (sensor data ingestion).
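The data-pipeline use case has a common shape: a source publishes events into a topic, and a sink consumes them into a warehouse. The sketch below shows that shape with a plain `deque` standing in for the Kafka topic (all names are illustrative, not a real connector API).

```python
from collections import deque

topic = deque()   # stands in for a Kafka topic
warehouse = []    # stands in for the destination warehouse table

def source(records):
    # Producer side: publish raw events into the topic.
    for r in records:
        topic.append(r)

def sink():
    # Consumer side: drain the topic and load enriched rows into the sink.
    while topic:
        event = topic.popleft()
        warehouse.append({"event": event, "loaded": True})

source(["click", "purchase"])
sink()
print(len(warehouse))  # 2
```

Decoupling source and sink through the topic is the point: either side can be scaled, restarted, or replaced without the other noticing.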