📬 Design a Message Queue

Kafka, RabbitMQ & SQS Architecture — Producer-Consumer Patterns, Partition Routing & Capacity Planning

Message queues enable asynchronous communication between services, decoupling producers from consumers. They're the backbone of event-driven microservices, handling everything from order processing to real-time analytics pipelines. The three dominant implementations are Apache Kafka (log-based, high-throughput streaming), RabbitMQ (smart broker, flexible routing), and Amazon SQS (fully managed, serverless simplicity).

🏗️ Architecture Comparison


📮

Apache Kafka

Log-based pub/sub. Messages are appended to an immutable ordered log. Partitions enable parallel consumers.

Append-only log, Partition replication, Consumer offset, Retention-based
Ordering: Per partition
Delivery: At-least-once / Exactly-once
Throughput: Millions of messages/sec
Use when: Event streaming, log aggregation, CDC
🐰

RabbitMQ

Smart broker with routing. Exchanges route messages to queues based on binding rules. Flexible topologies.

Exchange + Binding, Queue-based, Smart broker, Acknowledgment
Ordering: Per queue (FIFO)
Delivery: At-least-once / At-most-once
Throughput: Tens of thousands of messages/sec
Use when: Task queues, complex routing, legacy systems
☁️

Amazon SQS

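Fully managed and serverless. There are no brokers to run and little to tune; Standard queues scale automatically with load.

Fully managed, Auto-scaling, No ordering guarantee (Standard queues), Pay-per-use
Ordering: Standard: none / FIFO: per message group
Delivery: Standard: at-least-once (duplicates possible) / FIFO: exactly-once processing (deduplication within a 5-minute window)
Throughput: Standard: nearly unlimited (auto-scaling) / FIFO: capped (300 API calls/sec per action by default, 3,000 messages/sec with batching)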
Use when: AWS workloads, no ops, rapid prototyping
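
The RabbitMQ card says exchanges route messages to queues based on binding rules. A minimal in-memory sketch of that idea, assuming topic-exchange semantics where `*` matches exactly one dot-separated word and `#` matches zero or more. This is not RabbitMQ client code: `topicMatches`, `route`, and the queue names are hypothetical.

```javascript
// In-memory sketch of topic-exchange routing (hypothetical helper names,
// not the RabbitMQ client API).
// "*" matches exactly one word, "#" matches zero or more words.
function topicMatches(pattern, routingKey) {
  const p = pattern.split(".");
  const k = routingKey.split(".");

  function match(pi, ki) {
    if (pi === p.length) return ki === k.length;
    if (p[pi] === "#") {
      // "#" may absorb zero words (skip it) or one more word (stay on it)
      return match(pi + 1, ki) || (ki < k.length && match(pi, ki + 1));
    }
    if (ki === k.length) return false;
    if (p[pi] === "*" || p[pi] === k[ki]) return match(pi + 1, ki + 1);
    return false;
  }
  return match(0, 0);
}

// A binding ties a pattern to a queue; a message is copied to every queue
// whose binding pattern matches its routing key.
function route(bindings, routingKey) {
  const queues = bindings
    .filter((b) => topicMatches(b.pattern, routingKey))
    .map((b) => b.queue);
  return [...new Set(queues)];
}

const bindings = [
  { pattern: "order.*", queue: "order-events" },
  { pattern: "order.#", queue: "order-audit" },
  { pattern: "#", queue: "firehose" },
];

console.log(route(bindings, "order.created"));
// [ 'order-events', 'order-audit', 'firehose' ]
console.log(route(bindings, "order.item.added"));
// [ 'order-audit', 'firehose' ]
```

The routing decision lives in the broker's bindings rather than in the producer, which is what "smart broker" refers to: producers only choose a routing key.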

🔀 Kafka Partition Routing

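Kafka's default partitioner hashes the serialized key (murmur2) and takes the result modulo the partition count. Same key → same partition, which preserves order for that key as long as the partition count does not change. Keyless messages are spread across partitions: round-robin in older clients, sticky per batch since Kafka 2.4.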

[Diagram: a Producer sends user_123 {"event": "purchase", "amount": 99} to one of Partition 0, Partition 1, or Partition 2; a Consumer Group (C0, C1) reads the partitions.]
Kafka partitioner logic (Java)
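import java.util.List;
import org.apache.kafka.common.TopicPartition;
import org.apache.kafka.common.utils.Utils;

// Same key → same partition, so ordering holds per key within that partition.
// This mirrors the keyed path of the default partitioner. toPositive() avoids
// Math.abs(Integer.MIN_VALUE), which stays negative and would give a bad index.
static int partitionFor(byte[] keyBytes, int numPartitions) {
    return Utils.toPositive(Utils.murmur2(keyBytes)) % numPartitions;
}
// Null keys never reach this hash: the producer picks the partition itself
// (sticky per batch since Kafka 2.4, round-robin in older clients).

// Option 1: manual assignment, no group rebalancing
consumer.assign(List.of(
    new TopicPartition("topic", 0),
    new TopicPartition("topic", 1)));  // C0 reads P0, P1
// Option 2: group-managed assignment; a rebalance assigns partitions
consumer.subscribe(List.of("topic"));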
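
To make the "same key → same partition" property concrete, here is a small standalone sketch. It uses 32-bit FNV-1a as a stand-in hash, not Kafka's murmur2, so the partition numbers it produces will not match a real cluster. `fnv1a` and `partitionForKey` are hypothetical helpers.

```javascript
// Stand-in hash: 32-bit FNV-1a. Kafka's default partitioner uses murmur2,
// so the partition numbers here will not match a real Kafka cluster.
function fnv1a(bytes) {
  let hash = 0x811c9dc5;
  for (const b of bytes) {
    hash ^= b;
    hash = Math.imul(hash, 0x01000193);
  }
  return hash >>> 0; // force an unsigned 32-bit value
}

// Hypothetical helper: deterministic key → partition mapping.
function partitionForKey(key, numPartitions) {
  const bytes = new TextEncoder().encode(key);
  return fnv1a(bytes) % numPartitions;
}

// Every event for user_123 lands on one partition, so those events keep
// their relative order. Different keys may still share a partition.
const first = partitionForKey("user_123", 3);
const second = partitionForKey("user_123", 3);
console.log(first === second); // true
```

Because the mapping depends on the modulus, adding partitions to a topic can move existing keys to a different partition, which is why key ordering is only guaranteed while the partition count stays fixed.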

⚖️ Consumer Group Rebalancing

As consumers join or leave a group, Kafka triggers a rebalance that redistributes the topic's partitions among the current members. Each partition is owned by at most one consumer in the group at a time, so consumers beyond the partition count sit idle.

Example: topic orders with 6 partitions and 3 consumers gives each consumer 2 partitions.
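
A rough sketch of what a rebalance does to partition ownership, assuming a naive round-robin strategy (partition i goes to consumer i mod N). Kafka ships several assignors with different behavior; `assignRoundRobin` and `movedPartitions` are hypothetical helpers, not Kafka's assignor API.

```javascript
// Simplified round-robin assignment (hypothetical helper, not Kafka's
// assignor API). Partition i goes to consumer i mod N.
function assignRoundRobin(partitionCount, consumers) {
  const assignment = new Map(consumers.map((c) => [c, []]));
  if (consumers.length === 0) return assignment;
  for (let p = 0; p < partitionCount; p++) {
    assignment.get(consumers[p % consumers.length]).push(p);
  }
  return assignment;
}

// Count the partitions whose owner differs between two assignments.
function movedPartitions(before, after) {
  const ownerBefore = new Map();
  for (const [c, parts] of before) parts.forEach((p) => ownerBefore.set(p, c));
  let moved = 0;
  for (const [c, parts] of after) {
    for (const p of parts) if (ownerBefore.get(p) !== c) moved++;
  }
  return moved;
}

const before = assignRoundRobin(6, ["C0", "C1", "C2"]);
// C0: [0, 3], C1: [1, 4], C2: [2, 5]
const after = assignRoundRobin(6, ["C0", "C1", "C2", "C3"]);
// C0: [0, 4], C1: [1, 5], C2: [2], C3: [3]
console.log(movedPartitions(before, after)); // 3
```

With this naive strategy, one new consumer moves three of the six partitions. A sticky strategy would try to keep existing owners and move fewer, which matters because a moved partition interrupts consumption while ownership changes.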

🔄 SQS Message Lifecycle

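SQS handles messages differently from Kafka. Instead of tracking offsets in a retained log, a consumer receives a message, which hides it from other consumers for the visibility timeout, and then deletes it explicitly after successful processing. If the delete never happens, the message becomes visible again and is redelivered; after the maximum receive count it moves to the dead-letter queue (DLQ), if one is configured. Standard queues deliver at least once, so handlers should be idempotent.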

[Diagram: Producers → SQS Queue (visibility timeout: 30s) → Consumer; messages that fail after max receives go to the DLQ.]
SQS receive-delete pattern
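// AWS SDK for JavaScript v2 (the .promise() style used here)
const AWS = require("aws-sdk");
const sqs = new AWS.SQS();
const QUEUE_URL = process.env.QUEUE_URL;  // assumed to be set in the environment

async function pollOnce(handleMessage) {
  // Poll for messages (long polling reduces empty responses)
  const result = await sqs.receiveMessage({
    QueueUrl: QUEUE_URL,
    MaxNumberOfMessages: 10,  // SQS maximum per call
    WaitTimeSeconds: 20,      // long polling (maximum wait)
    VisibilityTimeout: 30,    // seconds the batch stays hidden from other consumers
  }).promise();

  // Messages is absent when the poll returns nothing
  for (const msg of result.Messages ?? []) {
    // Named handleMessage because `process` is a Node.js global
    await handleMessage(msg.Body);

    // Delete only after successful processing; if the handler throws,
    // the message reappears once the visibility timeout expires
    await sqs.deleteMessage({
      QueueUrl: QUEUE_URL,
      ReceiptHandle: msg.ReceiptHandle,
    }).promise();
  }
}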
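
The lifecycle in this section (receive, hide for the visibility timeout, delete or redeliver, then DLQ after too many receives) can be modeled in memory without touching AWS. The sketch below is a simplified model, not the SQS implementation: `SketchQueue` is a hypothetical class, time is passed in explicitly as seconds, and only one message is returned per receive.

```javascript
// In-memory model of the receive → process → delete cycle (hypothetical
// class, not the AWS SDK). Time is passed in explicitly, in seconds.
class SketchQueue {
  constructor({ visibilityTimeout = 30, maxReceiveCount = 3 } = {}) {
    this.visibilityTimeout = visibilityTimeout;
    this.maxReceiveCount = maxReceiveCount;
    this.messages = []; // { id, body, visibleAt, receiveCount }
    this.dlq = [];
    this.nextId = 1;
  }

  send(body, now) {
    this.messages.push({ id: this.nextId++, body, visibleAt: now, receiveCount: 0 });
  }

  // Returns one visible message and hides it, or null if none is visible.
  receive(now) {
    for (const msg of [...this.messages]) {
      if (msg.visibleAt > now) continue; // still in flight with a consumer
      if (msg.receiveCount >= this.maxReceiveCount) {
        // Received too many times without a delete: move to the DLQ
        this.messages = this.messages.filter((m) => m !== msg);
        this.dlq.push(msg);
        continue;
      }
      msg.receiveCount++;
      msg.visibleAt = now + this.visibilityTimeout;
      return { id: msg.id, body: msg.body };
    }
    return null;
  }

  // Deleting is how the consumer signals successful processing.
  delete(id) {
    this.messages = this.messages.filter((m) => m.id !== id);
  }
}

const q = new SketchQueue({ visibilityTimeout: 30, maxReceiveCount: 2 });
q.send("order-1", 0);
const firstDelivery = q.receive(0);  // delivered, hidden until t=30
const whileHidden = q.receive(10);   // null: message is in flight
const redelivery = q.receive(30);    // delivered again, never deleted
const afterLimit = q.receive(60);    // null: moved to the DLQ instead
console.log(q.dlq.length); // 1
```

The model shows why the visibility timeout should exceed the worst-case processing time: a consumer that is merely slow looks the same to the queue as one that crashed, and its message gets redelivered.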

📐 Capacity Planning Calculator

Estimate required infrastructure based on your throughput requirements.

Throughput: MB/s ingestion
Storage Needed: GB raw storage
Min Partitions: for parallelism
Broker Disk I/O: MB/s write per broker
Consumer Lag: max lag (messages)
Multi-AZ Brokers: with RF=3
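
The calculator's own formulas are not shown in the text, so the following is only a back-of-the-envelope sketch of how such estimates are commonly derived. Every formula, parameter name, and default below is an assumption for illustration (`planCapacity`, `perPartitionMBps`, `perBrokerWriteMBps`, and `consumerOutageSec` are hypothetical); real limits depend on hardware, message size, and configuration.

```javascript
// Back-of-the-envelope sizing. Every formula and default below is an
// assumption for illustration, not the calculator's actual logic.
function planCapacity({
  messagesPerSec,
  avgMessageBytes,
  retentionDays,
  replicationFactor = 3,
  perPartitionMBps = 10,    // assumed sustainable throughput per partition
  perBrokerWriteMBps = 100, // assumed sustainable disk write per broker
  consumerOutageSec = 300,  // assumed worst-case consumer stall
}) {
  const throughputMBps = (messagesPerSec * avgMessageBytes) / 1e6;
  const replicatedMBps = throughputMBps * replicationFactor;
  const rawStorageGB = (throughputMBps * 86400 * retentionDays) / 1000;
  const brokers = Math.max(
    replicationFactor, // at least one broker per replica, e.g. one per AZ
    Math.ceil(replicatedMBps / perBrokerWriteMBps)
  );
  return {
    throughputMBps,
    rawStorageGB,                                  // before replication
    replicatedStorageGB: rawStorageGB * replicationFactor,
    minPartitions: Math.max(1, Math.ceil(throughputMBps / perPartitionMBps)),
    brokers,
    diskWriteMBpsPerBroker: replicatedMBps / brokers,
    maxLagMessages: messagesPerSec * consumerOutageSec,
  };
}

const plan = planCapacity({
  messagesPerSec: 50000,
  avgMessageBytes: 1000,
  retentionDays: 7,
});
console.log(plan.throughputMBps, plan.rawStorageGB, plan.minPartitions, plan.brokers);
// 50 30240 5 3
```

For this input, 50,000 messages/sec at 1,000 bytes each is 50 MB/s, which over 7 days is about 30 TB before replication and about 91 TB at RF=3. Estimates like these are a starting point for load testing rather than a substitute for it.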

🎯 When to Use What

Scenario | Kafka | RabbitMQ | SQS
High-throughput event streaming | ✓ Kafka | - | -
Complex routing (topic/exchange) | - | ✓ RabbitMQ | -
AWS-native, no ops | - | - | ✓ SQS
Exactly-once processing | ✓ Kafka + transactions | - | -
Task queues / job processing | Possible | ✓ RabbitMQ | ✓ SQS
CDC from databases | ✓ Debezium + Kafka | - | -
Millisecond latency | ✓ Kafka | ~1-5ms | ~100ms+