📨 Partitions & Offsets

The Anatomy of a Kafka Topic, and How to Choose Partition Counts

A Kafka topic is split into partitions: ordered, append-only logs. Each message gets an offset (a sequence number) within its partition. Producers choose a partition by hashing the message key, or spread messages across partitions (for example round-robin) when there is no key. A consumer group divides the partitions among its members, so within a group each partition is read by exactly one consumer.

📤 Producer: How Messages Land in Partitions
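
The key-to-partition mapping can be sketched in a few lines. This is a minimal illustration, not Kafka's implementation: `partition_for` and `partition_counts` are hypothetical helper names, and `zlib.crc32` stands in for the murmur2 hash that Kafka's Java client applies to the key bytes.

```python
import zlib
from collections import Counter

def partition_for(key: bytes, num_partitions: int) -> int:
    """Hash the key, then take the result modulo the partition count.
    ASSUMPTION: crc32 is a stand-in; Kafka's Java client uses murmur2."""
    return zlib.crc32(key) % num_partitions

def partition_counts(keys, num_partitions):
    """Count how many messages land in each partition."""
    counts = Counter(partition_for(k, num_partitions) for k in keys)
    return [counts.get(p, 0) for p in range(num_partitions)]

# 1000 messages spread over 50 distinct keys, on a 6-partition topic
keys = [f"user-{i % 50}".encode() for i in range(1000)]
counts = partition_counts(keys, 6)
print(counts)
```

The same key always yields the same partition number, which is the property the rest of this page relies on.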


👥 Consumer Groups: Partition Assignment

Within a consumer group, each partition is assigned to exactly one consumer. If the group has more consumers than the topic has partitions, the extra consumers sit idle.
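
A toy assignment shows where idle consumers come from. This sketch hands out partitions round-robin; `assign_round_robin` is a hypothetical helper and the consumer names are made up, so treat it as an illustration rather than the logic of any real Kafka assignor.

```python
def assign_round_robin(partitions, consumers):
    """Give each partition to exactly one consumer, cycling through the members."""
    assignment = {c: [] for c in consumers}
    for i, p in enumerate(partitions):
        assignment[consumers[i % len(consumers)]].append(p)
    return assignment

# 4 partitions, 6 consumers: two consumers end up with nothing to read
assignment = assign_round_robin([0, 1, 2, 3], ["c1", "c2", "c3", "c4", "c5", "c6"])
idle = [c for c, parts in assignment.items() if not parts]
print(assignment)
print("idle:", idle)  # idle: ['c5', 'c6']
```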

How Many Partitions?

Rule of thumb: target throughput ÷ per-partition throughput. If you need 100 MB/s and each partition handles ~10 MB/s, use 10 partitions. More partitions = more parallelism, but also more overhead (file handles, replication traffic, leader elections).

Partitions ≥ max(target_throughput / partition_throughput, consumer_count)
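
The rule of thumb translates directly into code. In this sketch `min_partitions` is a hypothetical helper; it rounds the throughput ratio up, since a fractional partition is not possible.

```python
import math

def min_partitions(target_mb_s, per_partition_mb_s, consumer_count):
    """Lower bound on partitions: enough for throughput AND one per consumer."""
    by_throughput = math.ceil(target_mb_s / per_partition_mb_s)
    return max(by_throughput, consumer_count)

print(min_partitions(100, 10, 4))   # 10: throughput is the binding constraint
print(min_partitions(100, 10, 16))  # 16: consumer count is the binding constraint
```

The result is a floor, not a target. Since partitions cannot be removed later, some headroom above this number is a common choice.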

🔑 Key-Based Ordering

Messages with the same key always go to the same partition (as long as the partition count does not change) → guaranteed ordering per key. This is how you get ordered event streams per user/entity. But beware: hot keys (for example, one key with 80% of traffic) create partition hotspots.
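
To see the hot-key effect, the sketch below routes a made-up traffic mix in which one key carries 80% of messages. The key names, the counts, and the `partition_load` helper are all hypothetical, and `zlib.crc32` again stands in for Kafka's murmur2 hash.

```python
import zlib

def partition_load(key_counts, num_partitions):
    """Sum message counts per partition for a {key: message_count} mapping.
    ASSUMPTION: crc32 is a stand-in for the client's real key hash."""
    load = [0] * num_partitions
    for key, n in key_counts.items():
        load[zlib.crc32(key.encode()) % num_partitions] += n
    return load

traffic = {"tenant-big": 800}                            # one hot key: 800 messages
traffic.update({f"tenant-{i}": 10 for i in range(20)})   # 20 small keys: 200 messages

load = partition_load(traffic, 4)
hottest = max(range(4), key=load.__getitem__)
share = load[hottest] / sum(load)
print(load, "hottest partition:", hottest, f"share: {share:.0%}")
```

Whichever partition the hot key hashes to receives at least 80% of the traffic, and adding partitions does not help because a single key cannot be split.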

📊 Offset Tracking

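Each consumer group tracks its position per partition via committed offsets (stored in the __consumer_offsets topic). On restart, consumers resume from the last committed offset. Auto-commit (the default) is convenient; manual commit lets you commit only after a record has been processed, which gives at-least-once delivery. Manual commit alone does not provide exactly-once; that also needs transactions or idempotent processing.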
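
The resume-from-last-commit behavior can be simulated without a broker. This is a toy model under stated assumptions: `run` is a hypothetical process-then-commit loop over an in-memory list, and `crash_at` simulates a consumer dying after processing a record but before committing it.

```python
def run(log, committed, crash_at=None):
    """Process records starting at `committed`, committing after each one.
    Returns (processed_values, committed_offset)."""
    processed = []
    for offset in range(committed, len(log)):
        processed.append(log[offset])          # process the record first...
        if offset == crash_at:
            return processed, committed        # crash: this record's commit is lost
        committed = offset + 1                 # ...then commit the NEXT offset to read
    return processed, committed

log = ["a", "b", "c", "d"]
first, committed = run(log, 0, crash_at=2)     # processes a, b, c; last commit points at offset 2
second, committed = run(log, committed)        # restart resumes at offset 2
print(first, second)  # ['a', 'b', 'c'] ['c', 'd']
```

Record "c" is processed twice. That redelivery is what at-least-once means, and it is why downstream processing usually needs to be idempotent.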

⚖️ Rebalancing

When consumers join or leave, Kafka rebalances, reassigning partitions among the current members. With the eager protocol, the whole group stops consuming during a rebalance (stop-the-world). Use CooperativeStickyAssignor to minimize partition movement and reduce downtime.
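
The benefit of a sticky assignment is easiest to see by counting how many partitions change owner. In this sketch both "after" assignments are written by hand for illustration; they are not computed by Kafka's assignors, and `moved_partitions` is a hypothetical helper.

```python
def moved_partitions(before, after):
    """Return the partitions whose owning consumer differs between two assignments."""
    owner_before = {p: c for c, parts in before.items() for p in parts}
    owner_after = {p: c for c, parts in after.items() for p in parts}
    return sorted(p for p, c in owner_after.items() if owner_before.get(p) != c)

before = {"c1": [0, 1, 2], "c2": [3, 4, 5]}

# Consumer c3 joins. A sticky-style result leaves most partitions where they were:
sticky_after = {"c1": [0, 1], "c2": [3, 4], "c3": [2, 5]}
# Reassigning from scratch (round-robin here) moves more of them:
reshuffled_after = {"c1": [0, 3], "c2": [1, 4], "c3": [2, 5]}

print(moved_partitions(before, sticky_after))      # [2, 5]
print(moved_partitions(before, reshuffled_after))  # [1, 2, 3, 5]
```

Fewer moved partitions means fewer consumers that have to stop, hand over state, and restart reading.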

🛠️ Operational Best Practices

📈 Monitor Consumer Lag

# localhost:9092 is a placeholder broker address
kafka-consumer-groups.sh --bootstrap-server localhost:9092 --describe --group mygroup

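If lag grows steadily, you need more consumers (up to the partition count) or faster processing. The tool reports lag in messages; alert when it exceeds a threshold, or convert it to time using your consume rate and alert when lag > N minutes.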
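
Lag per partition is the log end offset minus the committed offset. The sketch below computes it from made-up offsets and converts the total to an approximate time behind; `consumer_lag`, `seconds_behind`, and the consume rate of 200 messages/s are all assumptions for illustration.

```python
def consumer_lag(log_end_offsets, committed_offsets):
    """Per-partition lag in messages: log end offset minus committed offset."""
    return {p: end - committed_offsets.get(p, 0) for p, end in log_end_offsets.items()}

def seconds_behind(total_lag, consume_rate_per_s):
    """Rough time to drain the backlog if no new messages arrived."""
    return total_lag / consume_rate_per_s

lag = consumer_lag({0: 1500, 1: 1200, 2: 900}, {0: 1500, 1: 1100, 2: 400})
total = sum(lag.values())
print(lag)                         # {0: 0, 1: 100, 2: 500}
print(seconds_behind(total, 200))  # 3.0
```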

🔢 Never Decrease Partitions

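# You can only ADD partitions, not remove them
# (localhost:9092 and my-topic are placeholders)
kafka-topics.sh --bootstrap-server localhost:9092 --alter --topic my-topic --partitions 12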

Adding partitions changes key routing: new messages for a key may land in a different partition than earlier messages with the same key, which breaks per-key ordering across the change. Plan ahead.
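
The routing change is a direct consequence of hash-modulo partitioning. This sketch grows a topic from 6 to 12 partitions and lists the keys that change partition; the key names are made up and `zlib.crc32` is a stand-in for Kafka's murmur2 hash.

```python
import zlib

def partition_for(key: str, num_partitions: int) -> int:
    """ASSUMPTION: crc32 stands in for the client's real key hash."""
    return zlib.crc32(key.encode()) % num_partitions

keys = [f"order-{i}" for i in range(1000)]
moved = [k for k in keys if partition_for(k, 6) != partition_for(k, 12)]
print(f"{len(moved)} of {len(keys)} keys now route to a different partition")
```

When the count doubles, every key that moves goes from partition p to partition p + 6, so roughly half of the keys are affected. Other growth factors reshuffle keys less predictably.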

โฐ Retention & Compaction

# 7 days
retention.ms=604800000
# or delete
cleanup.policy=compact

Delete: remove old messages by time/size. Compact: keep latest per key. Use compact for changelog topics.
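
Compaction can be pictured as keeping the last record seen for each key. The sketch below is a simplified model over an in-memory list: `compact` is a hypothetical helper, and it ignores details of the real log cleaner such as tombstones and the uncompacted active segment.

```python
def compact(log):
    """Keep only the latest (offset, key, value) per key, in offset order."""
    latest = {}
    for offset, (key, value) in enumerate(log):
        latest[key] = (offset, value)          # later records overwrite earlier ones
    return sorted((off, k, v) for k, (off, v) in latest.items())

log = [("u1", "a"), ("u2", "b"), ("u1", "c"), ("u3", "d"), ("u2", "e")]
print(compact(log))  # [(2, 'u1', 'c'), (3, 'u3', 'd'), (4, 'u2', 'e')]
```

Surviving records keep their original offsets, so the compacted log has gaps in its offset sequence.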

🔧 Producer Tuning

batch.size=65536
linger.ms=5
acks=all

Batch + linger for throughput. acks=all for durability. acks=1 for latency. Never acks=0 in production.

๐Ÿฅ Under-Replicated Partitions

# localhost:9092 is a placeholder broker address
kafka-topics.sh --bootstrap-server localhost:9092 --describe --under-replicated-partitions

Non-zero URP = data at risk. Check broker health, disk I/O, network. This is your #1 Kafka alert.
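
A partition counts as under-replicated when its in-sync replica set (ISR) is smaller than its assigned replica list. The sketch below applies that check to made-up partition metadata; the dictionary layout and the `under_replicated` helper are assumptions for illustration, not the output format of any Kafka tool.

```python
def under_replicated(partitions):
    """Return ids of partitions whose ISR is smaller than the replica list."""
    return [p["id"] for p in partitions if len(p["isr"]) < len(p["replicas"])]

partitions = [
    {"id": 0, "replicas": [1, 2, 3], "isr": [1, 2, 3]},  # healthy
    {"id": 1, "replicas": [2, 3, 1], "isr": [2]},        # two followers fell behind
    {"id": 2, "replicas": [3, 1, 2], "isr": [3, 1]},     # one follower fell behind
]
print(under_replicated(partitions))  # [1, 2]
```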

🎯 Rack-Aware Replication

broker.rack=us-east-1a

Ensure each partition's replicas are spread across racks/AZs. Losing one rack should not take every in-sync replica of any partition offline.
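
One way to audit replica placement is to flag partitions whose replicas all sit in a single rack. The broker ids, rack names, replica assignment, and the `partitions_on_single_rack` helper below are made up for illustration.

```python
def partitions_on_single_rack(assignment, broker_rack):
    """Return partitions whose replicas all live in one rack (a single-rack outage loses them)."""
    return [
        p for p, replicas in assignment.items()
        if len({broker_rack[b] for b in replicas}) < 2
    ]

broker_rack = {1: "us-east-1a", 2: "us-east-1a", 3: "us-east-1b", 4: "us-east-1c"}
assignment = {
    0: [1, 3, 4],  # spans three racks
    1: [1, 2],     # both replicas in us-east-1a
    2: [2, 3],     # spans two racks
}
print(partitions_on_single_rack(assignment, broker_rack))  # [1]
```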