📨 Partitions & Offsets

The Anatomy of a Kafka Topic — and How to Choose Partition Counts

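A Kafka topic is split into partitions: ordered, append-only logs. Each message gets an offset (a sequence number) within its partition. Producers choose a partition by hashing the message key; messages without a key are spread across partitions (round-robin or sticky batching, depending on the client version). Within a consumer group, partitions are divided among the members, so each partition is read by exactly one consumer in that group.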
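To make the model concrete, here is a minimal in-memory sketch of a topic. It is illustrative only: the `Topic` class is hypothetical, and `zlib.crc32` stands in for the murmur2 hash that Kafka's default partitioner applies to keys.

```python
import zlib
from itertools import count
from typing import Optional


class Topic:
    """Toy model of a topic: a list of append-only partition logs."""

    def __init__(self, num_partitions: int) -> None:
        self.partitions = [[] for _ in range(num_partitions)]
        self._round_robin = count()  # used only for messages without a key

    def send(self, value: bytes, key: Optional[bytes] = None) -> tuple:
        """Append a message and return (partition, offset)."""
        if key is None:
            partition = next(self._round_robin) % len(self.partitions)
        else:
            # Stand-in hash; real Kafka clients use murmur2 on the key bytes.
            partition = zlib.crc32(key) % len(self.partitions)
        log = self.partitions[partition]
        log.append(value)
        return partition, len(log) - 1  # offset = position within the partition


topic = Topic(num_partitions=4)
first = topic.send(b"login", key=b"user-42")
second = topic.send(b"logout", key=b"user-42")
print(first, second)  # same partition, consecutive offsets
```

Offsets are only meaningful inside one partition: two messages in different partitions can share offset 0, and nothing orders them relative to each other.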

📤 Producer: How Messages Land in Partitions

[Interactive demo: messages routed across 4 partitions, with counters for messages sent, keys seen, hottest partition, and skew.]

👥 Consumer Groups: Partition Assignment

Within a group, each partition is assigned to exactly one consumer. With more consumers than partitions, the extra consumers sit idle.
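A rough sketch of why extra consumers go idle. The `assign_round_robin` helper is hypothetical and much simpler than Kafka's real assignors (range, round-robin, sticky), but the counting argument is the same.

```python
def assign_round_robin(num_partitions, consumers):
    """Deal partitions out to consumers one at a time."""
    assignment = {c: [] for c in consumers}
    for p in range(num_partitions):
        assignment[consumers[p % len(consumers)]].append(p)
    return assignment


print(assign_round_robin(4, ["c1", "c2"]))
# {'c1': [0, 2], 'c2': [1, 3]}

crowded = assign_round_robin(4, ["c1", "c2", "c3", "c4", "c5", "c6"])
idle = [c for c, parts in crowded.items() if not parts]
print(idle)  # ['c5', 'c6']
```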


📏 How Many Partitions?

Rule of thumb: target throughput ÷ per-partition throughput, rounded up. If you need 100 MB/s and each partition handles ~10 MB/s, use at least 10 partitions. More partitions mean more parallelism, but also more overhead (file handles, replication traffic, leader elections).

Partitions ≥ max(target_throughput / partition_throughput, consumer_count)
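The formula above as a small calculation. The function name and the sample numbers are illustrative; measure per-partition throughput on your own hardware before relying on the result.

```python
import math


def min_partitions(target_mb_s, per_partition_mb_s, consumer_count):
    """Lower bound on partitions from throughput and consumer parallelism."""
    by_throughput = math.ceil(target_mb_s / per_partition_mb_s)
    return max(by_throughput, consumer_count)


print(min_partitions(100, 10, 6))   # 10: throughput is the binding constraint
print(min_partitions(100, 10, 16))  # 16: consumer count is the binding constraint
```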

🔑 Key-Based Ordering

Messages with the same key always go to the same partition (as long as the partition count is unchanged), which guarantees ordering per key. This is how you get ordered event streams per user or entity. But beware: a hot key (say, one key carrying 80% of traffic) overloads its partition while the others sit underused.
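A quick way to see the hot-key effect. The traffic mix below is made up, and `zlib.crc32` again stands in for Kafka's murmur2-based partitioner, so the partition numbers will differ from a real cluster.

```python
import zlib
from collections import Counter

NUM_PARTITIONS = 4


def partition_for(key):
    # Stand-in for Kafka's default key hashing (murmur2 in real clients).
    return zlib.crc32(key.encode()) % NUM_PARTITIONS


# Hypothetical traffic: one hot key sends 80 of 100 messages.
keys = ["tenant-hot"] * 80 + [f"tenant-{i}" for i in range(20)]
load = Counter(partition_for(k) for k in keys)

hottest, hottest_count = load.most_common(1)[0]
print(f"hottest partition: {hottest}, share: {hottest_count / len(keys):.0%}")
```

No partition count fixes this: all messages for the hot key still hash to one partition. Common workarounds are salting the key or splitting the entity, at the cost of per-key ordering.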

📊 Offset Tracking

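Each consumer tracks its position per partition via committed offsets (stored in the __consumer_offsets topic). On restart, consumers resume from their last commit. Auto-commit (the default) is convenient, but offsets are committed on a timer and may not line up with what has actually been processed. Manual commit after processing gives at-least-once delivery: nothing is skipped, but a crash can cause reprocessing. Exactly-once needs transactions or idempotent processing on top.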
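The at-least-once behavior of "process, then commit" can be sketched without a broker. This is a simulation, not client code: the list `log` plays one partition, and `committed` plays the group's stored offset (the next offset to read).

```python
log = ["m0", "m1", "m2", "m3", "m4"]  # one partition's messages
committed = 0                          # next offset to read after a restart
processed = []                         # side effects seen downstream


def run(crash_before_commit_at=None):
    """Process from the committed offset, committing after each message."""
    global committed
    for offset in range(committed, len(log)):
        processed.append(log[offset])  # do the work first...
        if offset == crash_before_commit_at:
            return                     # simulated crash: work done, commit lost
        committed = offset + 1         # ...then commit the next offset to read


run(crash_before_commit_at=2)  # m0, m1, m2 processed; only m0 and m1 committed
run()                          # restart resumes at offset 2
print(processed)               # ['m0', 'm1', 'm2', 'm2', 'm3', 'm4']
```

The duplicate `m2` is the price of never losing a message. Committing before processing flips the trade: no duplicates, but a crash can skip a message.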

⚖️ Rebalancing

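When consumers join or leave a group, Kafka rebalances, reassigning partitions among the current members. With eager assignors, every consumer gives up all its partitions and consumption pauses (stop-the-world) until the new assignment arrives. Use CooperativeStickyAssignor to move only the partitions that must change owners, so the rest keep consuming and downtime shrinks.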
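To see why stickiness matters, compare how many partitions change owner when a third consumer joins. Both functions are simplified stand-ins written for this illustration; the real CooperativeStickyAssignor is more involved and also guarantees balance in cases this sketch does not.

```python
def round_robin(partitions, consumers):
    """Eager-style reassignment: deal every partition out from scratch."""
    result = {c: [] for c in consumers}
    for i, p in enumerate(partitions):
        result[consumers[i % len(consumers)]].append(p)
    return result


def sticky(before, consumers):
    """Simplified sticky reassignment: owners keep partitions up to a fair
    quota, and only the surplus goes to the least-loaded consumers."""
    total = sum(len(parts) for parts in before.values())
    quota = -(-total // len(consumers))  # ceiling division
    result = {c: list(before.get(c, []))[:quota] for c in consumers}
    kept = {p for parts in result.values() for p in parts}
    surplus = sorted(p for parts in before.values() for p in parts if p not in kept)
    for p in surplus:
        least_loaded = min(consumers, key=lambda c: len(result[c]))
        result[least_loaded].append(p)
    return result


def moved(before, after):
    """Partitions whose owner changed between two assignments."""
    old = {p: c for c, parts in before.items() for p in parts}
    new = {p: c for c, parts in after.items() for p in parts}
    return sorted(p for p in old if old[p] != new[p])


before = round_robin(range(6), ["c1", "c2"])          # c1: 0,2,4  c2: 1,3,5
members = ["c1", "c2", "c3"]                          # c3 joins the group
print(moved(before, round_robin(range(6), members)))  # [2, 3, 4, 5]
print(moved(before, sticky(before, members)))         # [4, 5]
```

Every moved partition is one whose consumer must stop, hand over, and let the new owner resume from the committed offset, so fewer moves means less interrupted work.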

🛠️ Operational Best Practices

📈 Monitor Consumer Lag

# localhost:9092 is a placeholder broker address
kafka-consumer-groups.sh --bootstrap-server localhost:9092 --describe --group mygroup

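The tool reports lag per partition in messages (log end offset minus committed offset). If lag grows steadily, you need faster processing or more consumers (up to the partition count). Alert when the group falls more than N minutes behind.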
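Lag is simple arithmetic on two offsets per partition. The numbers below are a made-up snapshot, and the time estimate assumes a steady consume rate, which real workloads rarely have.

```python
# Hypothetical snapshot for one consumer group: partition -> offset.
log_end_offsets = {0: 1500, 1: 1480, 2: 1510}
committed_offsets = {0: 1500, 1: 1200, 2: 1490}

lag = {p: log_end_offsets[p] - committed_offsets[p] for p in log_end_offsets}
total_lag = sum(lag.values())
print(lag, total_lag)  # {0: 0, 1: 280, 2: 20} 300

# Rough conversion to time, assuming the group consumes 50 messages/s overall.
consume_rate_per_s = 50
print(total_lag / consume_rate_per_s)  # 6.0 seconds behind
```

Per-partition lag is often more telling than the total: here nearly all of it sits on partition 1, which points at a slow consumer or a hot key rather than a group that is too small.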

🔢 Never Decrease Partitions

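# You can only ADD partitions, not remove them (broker address and topic name are placeholders)
kafka-topics.sh --bootstrap-server localhost:9092 --alter --topic mytopic --partitions 12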

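Adding partitions changes key routing: new messages with a given key may land in a different partition than earlier ones, which breaks per-key ordering across the change. Existing messages are not moved. Plan ahead.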
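A small experiment shows how much routing shifts when a topic grows from 8 to 12 partitions. The key names are invented and `zlib.crc32` is a stand-in for Kafka's murmur2 hash, so exact counts will differ on a real cluster, but the modulo effect is the same.

```python
import zlib


def partition_for(key, num_partitions):
    # Stand-in for the default partitioner: hash(key) mod partition count.
    return zlib.crc32(key.encode()) % num_partitions


keys = [f"user-{i}" for i in range(1000)]
moved = [k for k in keys if partition_for(k, 8) != partition_for(k, 12)]
print(f"{len(moved)} of {len(keys)} keys now route to a different partition")
```

With a uniform hash, roughly two thirds of keys change partition in this 8-to-12 case, so it is usually safer to over-provision partitions up front than to resize a keyed topic later.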

⏰ Retention & Compaction

# 7 days; applies when cleanup.policy includes delete
retention.ms=604800000
# or delete, or compact,delete
cleanup.policy=compact

Delete: remove old segments once they pass the time (retention.ms) or size (retention.bytes) limit. Compact: keep at least the latest message per key. Use compact for changelog topics.
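What compaction leaves behind can be sketched in a few lines. The `compact` function is a toy: real log compaction runs in the background on closed segments and also handles tombstones (null values), which this sketch ignores.

```python
def compact(log):
    """Keep only the latest record per key, preserving original offsets."""
    latest = {}
    for offset, (key, value) in enumerate(log):
        latest[key] = (offset, value)  # later records overwrite earlier ones
    return sorted((offset, key, value) for key, (offset, value) in latest.items())


changelog = [("user-1", "v1"), ("user-2", "v1"), ("user-1", "v2")]
print(compact(changelog))
# [(1, 'user-2', 'v1'), (2, 'user-1', 'v2')]
```

Surviving records keep their original offsets, so the compacted log has gaps (offset 0 is gone here) and consumers must not assume offsets are contiguous.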

🔧 Producer Tuning

batch.size=65536
linger.ms=5
acks=all

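Larger batches plus a short linger trade a few milliseconds of latency for throughput. acks=all waits for all in-sync replicas (durability). acks=1 waits only for the leader (lower latency, but a leader failure can lose acknowledged writes). Never acks=0 in production: the producer gets no confirmation at all.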
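How batch.size and linger.ms interact can be approximated with a small model. This is a simplification made for illustration: the real producer batches per partition and flushes on a timer rather than on the next record's arrival, and the record sizes and timestamps below are invented.

```python
def batches(records, batch_size=65536, linger_ms=5):
    """Group (timestamp_ms, size_bytes) records into producer-style batches.

    A batch is closed when the next record would exceed batch_size, or when
    linger_ms has passed since the batch's first record.
    """
    out, current, used, started = [], [], 0, None
    for ts, size in records:
        if current and (used + size > batch_size or ts - started >= linger_ms):
            out.append(current)
            current, used = [], 0
        if not current:
            started = ts  # the linger clock starts with the first record
        current.append((ts, size))
        used += size
    if current:
        out.append(current)
    return out


records = [(0, 30000), (1, 30000), (2, 30000), (9, 1000), (10, 1000)]
print([len(b) for b in batches(records)])  # [2, 1, 2]
```

The first batch closes because it is full, the second because the linger window expired. Under heavy load the size limit dominates; under light load linger.ms bounds the added latency.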

🏥 Under-Replicated Partitions

# localhost:9092 is a placeholder broker address
kafka-topics.sh --bootstrap-server localhost:9092 --describe --under-replicated-partitions

Non-zero URP = data at risk. Check broker health, disk I/O, network. This is your #1 Kafka alert.
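The condition behind the alert is simple: a partition is under-replicated when its in-sync replica set (ISR) is smaller than its assigned replica list. The metadata below is a hypothetical snapshot with made-up broker IDs.

```python
# Hypothetical partition metadata: assigned replicas vs. in-sync replicas (ISR).
partitions = {
    0: {"replicas": [1, 2, 3], "isr": [1, 2, 3]},
    1: {"replicas": [2, 3, 1], "isr": [2]},
}

under_replicated = [
    p for p, meta in partitions.items() if len(meta["isr"]) < len(meta["replicas"])
]
print(under_replicated)  # [1]
```

Partition 1 is still readable and writable through broker 2, but it has no redundancy left: one more failure loses data or availability.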

🎯 Rack-Aware Replication

broker.rack=us-east-1a

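Set broker.rack on every broker so Kafka spreads each partition's replicas across racks/AZs. Losing one rack shouldn't leave any partition without an in-sync replica (or below min.insync.replicas).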
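One way to audit a replica layout is to ask how many replicas of each partition survive the worst single-rack failure. The broker-to-rack map and replica assignments below are invented for illustration.

```python
from collections import Counter

# Hypothetical cluster layout: broker id -> rack, partition -> replica brokers.
broker_rack = {1: "us-east-1a", 2: "us-east-1b", 3: "us-east-1c", 4: "us-east-1a"}
replicas = {0: [1, 2, 3], 1: [1, 4, 2], 2: [1, 4]}


def worst_case_survivors(brokers):
    """Replicas left after losing the rack that holds the most of them."""
    racks = Counter(broker_rack[b] for b in brokers)
    return len(brokers) - max(racks.values())


print({p: worst_case_survivors(b) for p, b in replicas.items()})
# {0: 2, 1: 1, 2: 0}
```

Partition 2 has both replicas in us-east-1a, so losing that rack takes it offline. Partition 1 survives with a single replica, which would fall below min.insync.replicas=2 and block acks=all writes.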