⚖️ Design a Load Balancer
Compare algorithms side-by-side — round-robin, weighted, least connections, consistent hashing
A load balancer distributes incoming requests across multiple backend servers to maximize throughput, minimize latency, and ensure no single server is overwhelmed. Different algorithms optimize for different goals: even distribution, server capacity awareness, session affinity, and minimal disruption on scaling.
🧠 Deep Dive
How Load Balancers Work
A load balancer sits between clients and a pool of backend servers. It accepts incoming connections and forwards each connection or request to one of the available servers based on the chosen algorithm. Modern load balancers operate at Layer 4 (TCP/UDP: fast, connection-level forwarding) or Layer 7 (HTTP: content-aware, able to route on URL, headers, or cookies). Hardware appliances (F5, Citrix) are increasingly being replaced by software such as NGINX and HAProxy and by cloud-native services (AWS ALB/NLB, Google Cloud Load Balancing).
Round Robin
The simplest algorithm: requests are distributed sequentially across servers in a circular order. Server 1 → Server 2 → Server 3 → Server 1 → ... This works well when all servers are identical and requests are roughly uniform in cost. The downside is that it ignores server load: a slow server accumulates a backlog while fast servers sit idle. DNS round-robin is a common variant where the DNS server returns IPs in rotating order, but it has no health checking, and because resolvers and clients cache records until the TTL expires, a failed server can keep receiving traffic.
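A minimal Python sketch of the rotation described above. The class name RoundRobinBalancer and the server names are illustrative assumptions, not part of any real load balancer's API.

```python
from itertools import cycle


class RoundRobinBalancer:
    """Hands out servers in a fixed circular order, ignoring their load."""

    def __init__(self, servers):
        servers = list(servers)
        if not servers:
            raise ValueError("at least one server is required")
        # cycle() repeats the list forever: s1, s2, s3, s1, ...
        self._rotation = cycle(servers)

    def pick(self):
        """Return the next server in the rotation."""
        return next(self._rotation)


balancer = RoundRobinBalancer(["server-1", "server-2", "server-3"])
picks = [balancer.pick() for _ in range(4)]
print(picks)  # the fourth pick wraps around to server-1
```

Note that pick() never looks at how busy a server is, which is exactly the weakness the paragraph points out.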
Weighted Round Robin
An extension where each server gets a weight proportional to its capacity. A server with weight 3 receives three times as many requests as one with weight 1. This handles heterogeneous fleets: a 16-core machine gets more traffic than a 4-core one. Weights can be static (configured) or dynamic (adjusted based on health checks). NGINX supports this through the weight parameter on each server entry in an upstream block, and its smooth weighted round-robin variant interleaves picks so that high-weight servers do not receive their whole share in one burst.
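The smooth variant can be sketched as follows. This is a simplified Python illustration of the commonly described algorithm (raise every server's running score by its weight, pick the highest, then subtract the total weight from the winner), not NGINX's actual source. The class name and the weights a=5, b=1, c=1 are assumptions chosen for the example.

```python
class SmoothWeightedRoundRobin:
    """Weighted selection that interleaves picks instead of bursting."""

    def __init__(self, weights):
        if not weights or any(w <= 0 for w in weights.values()):
            raise ValueError("weights must be a non-empty mapping of positive numbers")
        self._weights = dict(weights)
        self._current = {name: 0 for name in self._weights}
        self._total = sum(self._weights.values())

    def pick(self):
        # Step 1: every server's running score grows by its configured weight.
        for name, weight in self._weights.items():
            self._current[name] += weight
        # Step 2: the highest score wins (ties go to the earliest-listed server).
        chosen = max(self._current, key=self._current.get)
        # Step 3: the winner pays back the total, letting the others catch up.
        self._current[chosen] -= self._total
        return chosen


swrr = SmoothWeightedRoundRobin({"a": 5, "b": 1, "c": 1})
sequence = [swrr.pick() for _ in range(7)]
print(sequence)  # ['a', 'a', 'b', 'a', 'c', 'a', 'a']
```

Over one full cycle of 7 picks, a is chosen 5 times and b and c once each, but the picks for b and c are spread through the cycle rather than queued behind five consecutive picks of a.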
Least Connections
Routes each request to the server with the fewest active connections. This naturally adapts to varying request durations: a server tied up with expensive queries holds its connections longer and so receives fewer new ones. Variants include weighted least connections (accounts for server capacity) and least response time (factors in latency). The load balancer must track active connections per server, which adds state. HAProxy's leastconn algorithm is widely used for long-lived connections such as database sessions.
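The per-server state this requires can be sketched in a few lines of Python. The class and method names (LeastConnectionsBalancer, acquire, release) are hypothetical; a real proxy would update these counters as connections open and close.

```python
class LeastConnectionsBalancer:
    """Tracks active connections and picks the least-busy server."""

    def __init__(self, servers):
        servers = list(servers)
        if not servers:
            raise ValueError("at least one server is required")
        self._active = {server: 0 for server in servers}

    def acquire(self):
        """Pick the server with the fewest active connections and count one more."""
        server = min(self._active, key=self._active.get)
        self._active[server] += 1
        return server

    def release(self, server):
        """Call when a connection to `server` closes."""
        if self._active[server] > 0:
            self._active[server] -= 1


lb = LeastConnectionsBalancer(["s1", "s2"])
first = lb.acquire()   # both idle, so the first-listed server wins the tie
second = lb.acquire()  # s1 now has one connection, so s2 is chosen
lb.release(second)     # the short request on s2 finishes; s1 is still busy
third = lb.acquire()   # s2 is idle again and is chosen over the busy s1
print(first, second, third)
```

The long-running request on s1 keeps its counter at 1, so new work flows to s2 until s1 frees up.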
Consistent Hashing
Maps both servers and request keys onto a hash ring. Each request is routed to the first server found clockwise from its key's position on the ring. The key insight: when a server is added or removed, only about 1/n of keys are remapped, whereas modular hashing (hash mod n) remaps nearly all of them. Virtual nodes (multiple points per server on the ring) improve uniformity. This matters for distributed caches (such as Memcached client libraries), CDNs, and sharded databases where moving data is expensive. Apache Cassandra and Amazon's Dynamo design use consistent hashing for partition assignment; Redis Cluster, by contrast, assigns keys to a fixed set of 16384 hash slots rather than a ring.
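A small Python sketch of a hash ring with virtual nodes, to make the remapping behaviour concrete. The HashRing class, the choice of MD5 as the ring hash, the "server#i" virtual-node naming, and the default of 100 virtual nodes per server are all assumptions made for illustration.

```python
import bisect
import hashlib


def ring_hash(key: str) -> int:
    """Map a string to a 64-bit position on the ring."""
    return int.from_bytes(hashlib.md5(key.encode("utf-8")).digest()[:8], "big")


class HashRing:
    def __init__(self, servers=(), vnodes=100):
        self._vnodes = vnodes
        self._points = []  # sorted ring positions
        self._owner = {}   # ring position -> server name
        for server in servers:
            self.add(server)

    def add(self, server):
        # Each server is placed at `vnodes` pseudo-random points on the ring.
        # (Hash collisions between points are ignored in this sketch.)
        for i in range(self._vnodes):
            point = ring_hash(f"{server}#{i}")
            self._owner[point] = server
            bisect.insort(self._points, point)

    def remove(self, server):
        self._points = [p for p in self._points if self._owner[p] != server]
        self._owner = {p: s for p, s in self._owner.items() if s != server}

    def lookup(self, key):
        if not self._points:
            raise LookupError("ring is empty")
        # First point clockwise from the key, wrapping past the end of the list.
        idx = bisect.bisect_right(self._points, ring_hash(key)) % len(self._points)
        return self._owner[self._points[idx]]


ring = HashRing(["s1", "s2", "s3", "s4"])
keys = [f"user-{i}" for i in range(1000)]
before = {k: ring.lookup(k) for k in keys}
ring.add("s5")
after = {k: ring.lookup(k) for k in keys}
moved = [k for k in keys if before[k] != after[k]]
print(f"{len(moved)} of {len(keys)} keys moved, all to s5")
```

Going from 4 to 5 servers, roughly a fifth of the keys should move, and every key that moves lands on the new server; the rest keep their original owner.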
Health Checks & Failover
Production load balancers continuously probe backends with health checks (HTTP GET /health, TCP connect, or gRPC health protocol). Unhealthy servers are removed from the pool. Patterns include active health checks (periodic probes), passive health checks (tracking error rates), and circuit breaking (removing a server after N consecutive failures, re-adding after a cooldown). Connection draining ensures in-flight requests complete before a server is removed.
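The eject-and-cooldown pattern can be sketched as a small per-backend tracker in Python. BackendHealth, its parameter names, and the thresholds (3 failures, 30 seconds) are hypothetical; the clock is injectable only so the example can simulate time passing.

```python
import time


class BackendHealth:
    """Ejects a backend after N consecutive failures, re-admits after a cooldown."""

    def __init__(self, max_failures=3, cooldown_seconds=30.0, clock=time.monotonic):
        self._max_failures = max_failures
        self._cooldown = cooldown_seconds
        self._clock = clock
        self._failures = 0
        self._ejected_at = None  # None means the backend is in the pool

    def record_success(self):
        self._failures = 0
        self._ejected_at = None

    def record_failure(self):
        self._failures += 1
        if self._failures >= self._max_failures and self._ejected_at is None:
            self._ejected_at = self._clock()

    def is_available(self):
        if self._ejected_at is None:
            return True
        if self._clock() - self._ejected_at >= self._cooldown:
            # Cooldown elapsed: re-admit, but a single new failure ejects again.
            self._ejected_at = None
            self._failures = self._max_failures - 1
            return True
        return False


now = [0.0]  # fake clock so the example does not need to sleep
health = BackendHealth(max_failures=3, cooldown_seconds=30.0, clock=lambda: now[0])
for _ in range(3):
    health.record_failure()
ejected = not health.is_available()
now[0] = 31.0
readmitted = health.is_available()
print(ejected, readmitted)  # True True
```

An active checker would call record_success or record_failure after each periodic probe, while a passive one would call them based on the outcome of real requests.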
💡 Interview Tips
L4 vs L7
L4 load balancers (TCP) are faster but can't inspect content. L7 (HTTP) can route based on URL path, headers, cookies — enabling A/B testing, canary deployments, and sticky sessions.
Global server load balancing
For multi-region, use DNS-based GSLB (GeoDNS) to route users to the nearest data center. Combine with local load balancers within each DC. Anycast IP is another approach (Cloudflare, Google).
Sticky sessions
When servers hold session state, use cookie-based or IP-hash affinity. Better approach: externalize state to Redis/Memcached so any server can handle any request. Stateless > sticky.
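IP-hash affinity can be sketched as below, assuming a fixed server list. The function name pick_by_client_ip and the use of SHA-256 are illustrative choices, not a specific product's behaviour.

```python
import hashlib


def pick_by_client_ip(client_ip, servers):
    """Map a client IP to a server so repeat requests land on the same backend."""
    if not servers:
        raise ValueError("at least one server is required")
    digest = hashlib.sha256(client_ip.encode("utf-8")).digest()
    # Modular hashing: stable while the pool is unchanged, but adding or
    # removing a server reshuffles most clients (and loses their sessions).
    index = int.from_bytes(digest[:8], "big") % len(servers)
    return servers[index]


servers = ["s1", "s2", "s3"]
first_visit = pick_by_client_ip("203.0.113.7", servers)
second_visit = pick_by_client_ip("203.0.113.7", servers)
print(first_visit == second_visit)  # True
```

The reshuffling on pool changes is one more reason to prefer externalized state over affinity.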
Auto-scaling integration
Load balancers work with auto-scaling groups. Key: slow start (ramp traffic to newly registered instances gradually), connection draining (let in-flight requests finish before deregistering an instance), and health check grace periods (don't terminate instances that are still starting up).