⚖️ Design a Load Balancer

Compare algorithms side-by-side — round-robin, weighted, least connections, consistent hashing

A load balancer distributes incoming requests across multiple backend servers to maximize throughput, minimize latency, and ensure no single server is overwhelmed. Different algorithms optimize for different goals: even distribution, awareness of server capacity, session affinity, or minimal disruption when the pool is scaled.

📋 Algorithm Comparison

Algorithm | Selection cost | Load awareness | Session affinity | Scaling disruption | Best for
Round Robin | O(1) | ❌ None | ❌ None | ✅ None | Homogeneous servers, stateless
Weighted Round Robin | O(1) | ✅ Static capacity | ❌ None | ✅ None | Heterogeneous server capacity
Least Connections | O(n) or O(log n) | ✅ Real-time load | ❌ None | ✅ Adaptive | Variable request duration
Consistent Hashing | O(log n) | ⚠️ Indirect | ✅ Natural | ✅ Minimal remap | Caches, stateful services

🧠 Deep Dive

How Load Balancers Work

A load balancer sits between clients and a pool of backend servers. It accepts incoming connections and forwards each connection or request to one of the available servers based on the chosen algorithm. Modern load balancers operate at Layer 4 (TCP/UDP: fast, connection-level) or Layer 7 (HTTP: content-aware, able to route on URL, headers, and cookies). Hardware load balancers (F5, Citrix) are increasingly being replaced by software such as NGINX and HAProxy and by cloud-native services (AWS ALB/NLB, GCP Load Balancing).

Round Robin

The simplest algorithm: requests are distributed sequentially across servers in a circular order. Server 1 → Server 2 → Server 3 → Server 1 → ... This works well when all servers are identical and requests are roughly uniform in cost. The downside is that it ignores server load: a slow server accumulates a backlog while fast servers sit idle. DNS round-robin is a common variant where the DNS server returns IPs in rotating order, but it has no health checking, and because resolvers and clients cache answers until the TTL expires, traffic shifts away from a failed server only slowly.
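
As a rough illustration of the circular order described above, here is a minimal Python sketch. The class name `RoundRobinBalancer` and the server labels are made up for this example and do not come from any particular load balancer.

```python
from itertools import cycle


class RoundRobinBalancer:
    """Hands out servers in a fixed circular order, ignoring their load."""

    def __init__(self, servers):
        servers = list(servers)
        if not servers:
            raise ValueError("at least one server is required")
        # cycle() repeats the list forever: s1, s2, s3, s1, ...
        self._rotation = cycle(servers)

    def pick(self):
        # O(1): advance the rotation by one position.
        return next(self._rotation)


balancer = RoundRobinBalancer(["s1", "s2", "s3"])
picks = [balancer.pick() for _ in range(5)]
print(picks)  # ['s1', 's2', 's3', 's1', 's2']
```

Note that nothing in `pick` looks at how busy a server is, which is exactly the weakness called out above.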

Weighted Round Robin

An extension where each server gets a weight proportional to its capacity. A server with weight 3 receives three times as many requests as one with weight 1. This handles heterogeneous fleets: a 16-core machine gets more traffic than a 4-core one. Weights can be static (configured) or dynamic (adjusted based on health checks). In NGINX, weights are set with the weight parameter on each server entry in an upstream block. The smooth weighted round-robin variant (used by NGINX) interleaves picks so that a high-weight server does not receive its whole share in one burst.
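
The smooth variant can be sketched as follows. This is an illustrative Python version of the commonly described scheme (each server accumulates its weight every round, the highest accumulated value wins and is then reduced by the total weight), not a copy of NGINX's source. The class name and the weights `a=5, b=1, c=1` are assumptions chosen for the example.

```python
class SmoothWeightedRoundRobin:
    """Weighted round robin that spreads a heavy server's turns out evenly."""

    def __init__(self, weights):
        # weights: mapping of server name -> positive integer weight
        self._weights = dict(weights)
        self._current = {name: 0 for name in self._weights}
        self._total = sum(self._weights.values())

    def pick(self):
        # 1. Every server gains its configured weight.
        for name, weight in self._weights.items():
            self._current[name] += weight
        # 2. The server with the highest running value is chosen
        #    (ties go to the server listed first).
        chosen = max(self._current, key=self._current.get)
        # 3. The winner pays back the total weight, pushing it down the queue.
        self._current[chosen] -= self._total
        return chosen


swrr = SmoothWeightedRoundRobin({"a": 5, "b": 1, "c": 1})
sequence = [swrr.pick() for _ in range(7)]
print(sequence)  # ['a', 'a', 'b', 'a', 'c', 'a', 'a']
```

Over one full cycle of 7 picks, `a` still receives 5 of them, but they are interleaved with `b` and `c` rather than delivered as five consecutive requests.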

Least Connections

Routes each request to the server with the fewest active connections. This naturally adapts to varying request durations: a server tied up with expensive queries holds its connections longer, so it is offered fewer new ones. Variants include weighted least connections (accounts for server capacity) and least response time (factors in latency). The load balancer must track active connections per server, which adds state. HAProxy's leastconn balance algorithm is commonly used for long-lived connections such as database (SQL) and LDAP sessions.
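
The bookkeeping this requires can be sketched in a few lines of Python. The class and method names (`LeastConnectionsBalancer`, `acquire`, `release`) are hypothetical. Selection here is a linear O(n) scan; a heap or ordered structure would be one way to reach O(log n).

```python
class LeastConnectionsBalancer:
    """Tracks active connections per server and picks the least busy one."""

    def __init__(self, servers):
        self._active = {server: 0 for server in servers}
        if not self._active:
            raise ValueError("at least one server is required")

    def acquire(self):
        # O(n) scan for the smallest count; ties go to the first server listed.
        server = min(self._active, key=self._active.get)
        self._active[server] += 1
        return server

    def release(self, server):
        # Must be called when the connection closes, or counts drift upward.
        if self._active[server] == 0:
            raise ValueError(f"{server} has no active connections")
        self._active[server] -= 1


lb = LeastConnectionsBalancer(["s1", "s2"])
first = lb.acquire()   # both idle, so the first server wins the tie
second = lb.acquire()  # s1 now has 1 connection, so s2 is chosen
lb.release(second)     # the request on s2 finishes quickly
third = lb.acquire()   # s2 is idle again and gets the next request
print(first, second, third)  # s1 s2 s2
```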

Consistent Hashing

Maps both servers and request keys onto a hash ring. Each request is routed to the first server found moving clockwise from the request's position on the ring. The key insight: when a server is added or removed, only about 1/n of keys are remapped, whereas simple modular hashing (hash mod n) remaps nearly all of them. Virtual nodes (multiple points per server on the ring) improve uniformity. This matters for distributed caches (for example, Memcached clients that use ketama-style hashing), CDNs, and sharded databases where moving data is expensive. Amazon's Dynamo design and Apache Cassandra use consistent hashing for partition assignment. Redis Cluster takes a related but different approach, mapping keys to a fixed set of 16384 hash slots.
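
To make the ring concrete, here is a small Python sketch with virtual nodes. The names (`HashRing`, `_hash`, the `vnodes` parameter) and the choice of SHA-256 truncated to 64 bits are assumptions for this example; real systems use a variety of hash functions and ring layouts.

```python
import bisect
import hashlib


def _hash(key: str) -> int:
    # Any well-mixed hash works; this takes the first 8 bytes of SHA-256.
    return int.from_bytes(hashlib.sha256(key.encode()).digest()[:8], "big")


class HashRing:
    def __init__(self, servers=(), vnodes=100):
        self._vnodes = vnodes
        self._points = []  # sorted ring positions
        self._owner = {}   # ring position -> server name
        for server in servers:
            self.add(server)

    def add(self, server):
        # Each server is placed at `vnodes` pseudo-random ring positions.
        for i in range(self._vnodes):
            point = _hash(f"{server}#{i}")
            self._owner[point] = server
            bisect.insort(self._points, point)

    def remove(self, server):
        self._points = [p for p in self._points if self._owner[p] != server]
        self._owner = {p: s for p, s in self._owner.items() if s != server}

    def lookup(self, key):
        if not self._points:
            raise LookupError("ring is empty")
        # First point clockwise from the key, wrapping around at the end.
        idx = bisect.bisect_right(self._points, _hash(key)) % len(self._points)
        return self._owner[self._points[idx]]


keys = [f"user-{i}" for i in range(2000)]
ring = HashRing(["s1", "s2", "s3", "s4"])
before = {k: ring.lookup(k) for k in keys}
ring.add("s5")
after = {k: ring.lookup(k) for k in keys}

moved = [k for k in keys if before[k] != after[k]]
ring_fraction = len(moved) / len(keys)
modulo_fraction = sum(_hash(k) % 4 != _hash(k) % 5 for k in keys) / len(keys)
print(f"ring: {ring_fraction:.0%} moved, modulo: {modulo_fraction:.0%} moved")
```

Growing from 4 to 5 servers, the ring should move roughly a fifth of the keys, all of them onto the new server, while `hash mod n` moves roughly four fifths. The exact percentages depend on the hash and the key set.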

Health Checks & Failover

Production load balancers continuously probe backends with health checks (HTTP GET /health, TCP connect, or gRPC health protocol). Unhealthy servers are removed from the pool. Patterns include active health checks (periodic probes), passive health checks (tracking error rates), and circuit breaking (removing a server after N consecutive failures, re-adding after a cooldown). Connection draining ensures in-flight requests complete before a server is removed.
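
The circuit-breaking pattern (eject after N consecutive failures, re-admit after a cooldown) can be sketched as below. The class name `BackendHealth`, its parameters, and the injectable `clock` are hypothetical choices made so the example can be run without waiting; production load balancers implement this with their own configuration options.

```python
import time


class BackendHealth:
    """Passive health tracking for one backend."""

    def __init__(self, max_failures=3, cooldown=30.0, clock=time.monotonic):
        self.max_failures = max_failures
        self.cooldown = cooldown
        self._clock = clock
        self._failures = 0        # consecutive failures seen so far
        self._ejected_at = None   # time of ejection, or None if in the pool

    def record_success(self):
        # Any success resets the streak and fully re-admits the backend.
        self._failures = 0
        self._ejected_at = None

    def record_failure(self):
        self._failures += 1
        if self._failures >= self.max_failures:
            # Eject (or re-eject after a failed trial) and restart the cooldown.
            self._ejected_at = self._clock()

    def is_available(self):
        if self._ejected_at is None:
            return True
        # After the cooldown the backend may receive trial traffic again.
        return self._clock() - self._ejected_at >= self.cooldown


now = [0.0]  # fake clock so the example does not have to sleep
health = BackendHealth(max_failures=2, cooldown=10.0, clock=lambda: now[0])
health.record_failure()
print(health.is_available())  # True: one failure is below the threshold
health.record_failure()
print(health.is_available())  # False: ejected after 2 consecutive failures
now[0] = 10.0
print(health.is_available())  # True: cooldown elapsed, trial traffic allowed
```

In this sketch a single failure during the trial period ejects the backend again immediately, because the failure streak is only cleared by a success.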

💡 Interview Tips

L4 vs L7

L4 load balancers (TCP/UDP) are faster but can't inspect request content. L7 load balancers (HTTP) can route on URL path, headers, and cookies, which enables A/B testing, canary deployments, and cookie-based sticky sessions.
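
As a small illustration of content-aware routing, the Python sketch below picks a backend pool by longest matching URL path prefix. The route table and pool names (`api-pool`, `static-pool`, `web-pool`) are invented for this example. An L4 balancer could not do this, because it never sees the path.

```python
# Hypothetical route table: path prefix -> backend pool name.
ROUTES = {
    "/api/": "api-pool",
    "/static/": "static-pool",
    "/": "web-pool",  # catch-all
}


def route(path, routes=ROUTES):
    """Return the pool whose prefix is the longest match for the path."""
    matches = [prefix for prefix in routes if path.startswith(prefix)]
    if not matches:
        raise LookupError(f"no route for {path!r}")
    return routes[max(matches, key=len)]


print(route("/api/users/42"))  # api-pool
print(route("/index.html"))    # web-pool
```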

Global server load balancing

For multi-region deployments, use DNS-based GSLB (GeoDNS) to route users to the nearest data center, and combine it with local load balancers within each DC. Anycast IP is another approach (Cloudflare, Google).

Sticky sessions

When servers hold session state, use cookie-based or IP-hash affinity. Better approach: externalize state to Redis/Memcached so any server can handle any request. Stateless > sticky.
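
IP-hash affinity can be sketched in Python as follows. The function name `pick_by_ip` and the use of SHA-256 are assumptions for this example. Because it uses a plain modulo, changing the number of servers will reassign most clients, which is one more reason to prefer externalized state.

```python
import hashlib


def pick_by_ip(client_ip, servers):
    """Map a client IP to a server so repeat requests land on the same one."""
    digest = hashlib.sha256(client_ip.encode()).digest()
    index = int.from_bytes(digest[:8], "big") % len(servers)
    return servers[index]


servers = ["s1", "s2", "s3"]
first_visit = pick_by_ip("203.0.113.7", servers)
second_visit = pick_by_ip("203.0.113.7", servers)
print(first_visit == second_visit)  # True: same client, same server
```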

Auto-scaling integration

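Load balancers work with auto-scaling groups. Key points: pre-warming or slow start (ramp traffic to newly registered instances gradually), connection draining (let in-flight requests finish before an instance is removed), and health check grace periods (don't terminate instances that are still starting up).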
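
One possible reading of "register new instances gradually" is to ramp a new instance's weight linearly over a warm-up window. The Python sketch below shows that idea. The function name `ramped_weight` and the 60-second default are assumptions, not settings from any specific product.

```python
def ramped_weight(full_weight, seconds_since_registration, warmup_seconds=60.0):
    """Scale a new instance's weight linearly from 1 up to full_weight."""
    if warmup_seconds <= 0 or seconds_since_registration >= warmup_seconds:
        return full_weight
    fraction = max(seconds_since_registration, 0.0) / warmup_seconds
    # Never return 0, so the instance always receives a trickle of traffic.
    return max(1, round(full_weight * fraction))


print(ramped_weight(10, 0))   # 1
print(ramped_weight(10, 30))  # 5
print(ramped_weight(10, 60))  # 10
```

The resulting value could be fed into a weighted algorithm so that a freshly started instance with cold caches is not hit with its full share of traffic at once.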