Design a Rate Limiter

Token Bucket, Sliding Window, Distributed Redis Counters, and HTTP 429 Handling

A rate limiter controls the rate of traffic a client can send to an API. It prevents abuse, protects resources, and ensures fair usage. Core algorithms: token bucket (allows bursts, used by AWS/Stripe), sliding window log (precise but memory-heavy), fixed window counter (simple but has boundary spikes), and sliding window counter (balanced trade-off). At scale, you need distributed rate limiting with Redis for sub-ms decisions across multiple API servers.

Token Bucket Algorithm

Tokens are added at a fixed rate. Each request consumes one token. If the bucket is empty, the request is rejected. Adjust fill rate and bucket size to see the effect.

Bucket
10
Capacity: 10
Request Log
Tokens10
Accepted0
Rejected0
Accept Rateβ€”

Sliding Window Log

Tracks the exact timestamp of each request. The window slides with real time β€” requests outside the window expire. Limit: 5 requests per window.

In Window0
Accepted0
Rejected0
Window Usage0%

Fixed Window vs Sliding Window

The fixed window boundary problem: a burst at the end of one window + start of the next can exceed the intended rate by 2Γ—. Sliding window eliminates this.

Fixed Window Counter

Limit: 5/window Peak throughput: β€”

Sliding Window Counter

Limit: 5/window Peak throughput: β€”

Architecture

Client
β†’
API Gateway
β†’
Rate Limiter
↓
Redis Cluster
Counters + TTL
↓
Rules Engine
Rate configs
β†’
App Servers
If allowed
Response Headers
X-RateLimit-Remaining: 4
X-RateLimit-Limit: 100
X-RateLimit-Reset: 1672531200
When Exceeded
HTTP 429 Too Many Requests
Retry-After: 30

Algorithm Comparison

Token Bucket vs Leaking Bucket

Token Bucket
  • Allows controlled bursts
  • Two params: fill rate + size
  • Used by AWS, Stripe, GitHub
  • Memory-efficient
vs
Leaking Bucket
  • Strict constant output rate
  • FIFO queue-based
  • Smooths traffic completely
  • Can delay recent requests

Fixed Window vs Sliding Window

Fixed Window Counter
  • Simple: one counter per window
  • Low memory (1 counter)
  • Boundary burst problem
  • Can allow 2Γ— rate at edges
vs
Sliding Window Counter
  • Weighted avg of current + prev
  • Smooths boundary spikes
  • Only 2 counters needed
  • Best accuracy/memory ratio

Distributed Rate Limiting with Redis

Use Redis INCR + EXPIRE in a Lua script for atomic counter operations. The Lua script: INCR key, check against limit, EXPIRE key ttl if new. This avoids race conditions from separate INCR and EXPIRE commands. For sliding window, use sorted sets with ZADD (timestamp as score) and ZRANGEBYSCORE to count entries in the window.

Race Conditions & Solutions

Problem: Two servers read counter=4 (limit=5), both increment β†’ counter=6 (exceeds limit). Solution 1: Redis Lua scripts for atomic read-check-increment. Solution 2: Redis MULTI/EXEC transactions. Solution 3: Per-server local rate limiters with a global sync (eventually consistent, but fast).

Capacity Estimation

Estimate the Redis memory needed for your rate limiting infrastructure.

Total countersβ€”
Redis memoryβ€”
Ops/sec (peak)β€”
Redis nodesβ€”

A must-know system design topic β€” understand the trade-offs between accuracy, memory, and distributed coordination.