Overview
What is Apache Kafka?
Apache Kafka is a distributed event streaming platform used for building real-time data pipelines and streaming applications.
It was originally developed at LinkedIn and later open-sourced; it is now maintained by the Apache Software Foundation.
Kafka is designed for high-throughput, low-latency, fault-tolerant, and scalable message streaming.
Key Concepts
- Producer → Sends data (messages/events) into Kafka.
- Consumer → Reads data from Kafka.
- Topic → A category or feed name to which records are published.
- Partition → Each topic is split into partitions for parallelism and scalability.
- Broker → A Kafka server that stores data and serves client requests.
- Cluster → A group of Kafka brokers working together.
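The concepts above can be sketched as a toy in-memory model (illustrative only, not the real Kafka client API): a topic is a set of partitions, each partition is an append-only log, and a consumer reads sequentially from an offset.

```python
from dataclasses import dataclass, field

@dataclass
class Topic:
    """Toy stand-in for a Kafka topic: a list of append-only partition logs."""
    name: str
    num_partitions: int = 3
    logs: list = field(default_factory=list)

    def __post_init__(self):
        self.logs = [[] for _ in range(self.num_partitions)]

    def produce(self, key: str, value: str) -> int:
        # Kafka hashes the record key to choose a partition (murmur2 in the
        # real client; plain hash() here), so records with the same key land
        # in the same partition and keep their relative order.
        partition = hash(key) % self.num_partitions
        self.logs[partition].append(value)
        return partition

    def consume(self, partition: int, offset: int) -> list:
        # A consumer reads sequentially from its last committed offset.
        return self.logs[partition][offset:]

topic = Topic("orders")
p = topic.produce("user-42", "order created")
q = topic.produce("user-42", "order shipped")
assert p == q  # same key -> same partition, so per-key ordering is preserved
print(topic.consume(p, 0))  # ['order created', 'order shipped']
```

The key-to-partition mapping is why ordering in Kafka is guaranteed per partition, not per topic.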
Core Features
- High throughput & scalability – Can handle millions of messages per second.
- Durability – Messages are written to disk and replicated across brokers.
- Fault tolerance – Survives broker failures without losing acknowledged data, because partitions are replicated.
- Integration – Works with many data sources (databases, microservices, cloud services).
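The durability and fault-tolerance features rest on replication. A minimal sketch of the idea (a toy model, not Kafka's actual replication protocol): each write is copied to every live replica before it is acknowledged, so losing one broker does not lose the data.

```python
class Broker:
    """Toy stand-in for a Kafka broker holding one replica of a partition."""
    def __init__(self, broker_id):
        self.broker_id = broker_id
        self.log = []
        self.alive = True

def replicate(brokers, message):
    # Append the message to every live replica; in real Kafka, acks="all"
    # means the producer is acknowledged only after all in-sync replicas
    # have the record.
    live = [b for b in brokers if b.alive]
    for b in live:
        b.log.append(message)
    return len(live)

cluster = [Broker(i) for i in range(3)]  # replication factor 3
replicate(cluster, "event-1")
cluster[0].alive = False                 # one broker fails
survivors = [b for b in cluster if b.alive]
assert all("event-1" in b.log for b in survivors)  # no data loss
```

This is the trade-off behind Kafka's `acks` producer setting: waiting for more replicas costs latency but tightens the durability guarantee.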
Common Use Cases
- Real-time analytics (e.g., monitoring financial transactions).
- Event-driven architectures (microservices communication).
- Log aggregation (centralized logs from many systems).
- Data pipelines (stream data from source → Kafka → data warehouse).
- IoT streaming (sensor data ingestion).
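The data-pipeline use case has a common shape: a source publishes events into a topic, and a sink consumes them into a warehouse. The sketch below shows that shape with a plain `deque` standing in for the Kafka topic (all names are illustrative, not a real connector API).

```python
from collections import deque

topic = deque()   # stands in for a Kafka topic
warehouse = []    # stands in for the destination warehouse table

def source(records):
    # Producer side: publish raw events into the topic.
    for r in records:
        topic.append(r)

def sink():
    # Consumer side: drain the topic and load enriched rows into the sink.
    while topic:
        event = topic.popleft()
        warehouse.append({"event": event, "loaded": True})

source(["click", "purchase"])
sink()
print(len(warehouse))  # 2
```

Decoupling source and sink through the topic is the point: either side can be scaled, restarted, or replaced without the other noticing.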