Design a Rate Limiter

Token Bucket, Sliding Window, Distributed Redis Counters, and HTTP 429 Handling

A rate limiter controls the rate of traffic a client can send to an API. It prevents abuse, protects resources, and ensures fair usage. Core algorithms: token bucket (allows bursts, used by AWS/Stripe), sliding window log (precise but memory-heavy), fixed window counter (simple but has boundary spikes), and sliding window counter (balanced trade-off). At scale, you need distributed rate limiting with Redis for sub-ms decisions across multiple API servers.

Token Bucket Algorithm

Tokens are added at a fixed rate. Each request consumes one token. If the bucket is empty, the request is rejected. Adjust fill rate and bucket size to see the effect.

Fill rate: 5 tokens/sec

Bucket size: 10 tokens

Bucket

Capacity: 10

Request Log

Tokens10

Accepted0

Rejected0

Accept Rate—

Sliding Window Log

Tracks the exact timestamp of each request. The window slides with real time — requests outside the window expire. Limit: 5 requests per window.

Window size (seconds): 10

Request limit per window: 5

In Window0

Accepted0

Rejected0

Window Usage0%

Fixed Window vs Sliding Window

The fixed window boundary problem: a burst at the end of one window + start of the next can exceed the intended rate by 2×. Sliding window eliminates this.

Fixed Window Counter

Limit: 5/window Peak throughput: —

Sliding Window Counter

Limit: 5/window Peak throughput: —

Architecture

Client

→

API Gateway

→

Rate Limiter

↓

Redis Cluster

Counters + TTL

↓

Rules Engine

Rate configs

→

App Servers

If allowed

Response Headers

X-RateLimit-Remaining: 4
X-RateLimit-Limit: 100
X-RateLimit-Reset: 1672531200

When Exceeded

HTTP 429 Too Many Requests
Retry-After: 30

Algorithm Comparison

Token Bucket vs Leaking Bucket Token Bucket Allows controlled bursts
Two params: fill rate + size
Used by AWS, Stripe, GitHub
Memory-efficient
 
vs
 Leaking Bucket Strict constant output rate
FIFO queue-based
Smooths traffic completely
Can delay recent requests
 

Fixed Window vs Sliding Window Fixed Window Counter Simple: one counter per window
Low memory (1 counter)
Boundary burst problem
Can allow 2× rate at edges
 
vs
 Sliding Window Counter Weighted avg of current + prev
Smooths boundary spikes
Only 2 counters needed
Best accuracy/memory ratio
 

Distributed Rate Limiting with Redis

Use Redis INCR + EXPIRE in a Lua script for atomic counter operations. The Lua script: INCR key, check against limit, EXPIRE key ttl if new. This avoids race conditions from separate INCR and EXPIRE commands. For sliding window, use sorted sets with ZADD (timestamp as score) and ZRANGEBYSCORE to count entries in the window.

Race Conditions & Solutions

Problem: Two servers read counter=4 (limit=5), both increment → counter=6 (exceeds limit). Solution 1: Redis Lua scripts for atomic read-check-increment. Solution 2: Redis MULTI/EXEC transactions. Solution 3: Per-server local rate limiters with a global sync (eventually consistent, but fast).

Capacity Estimation

Estimate the Redis memory needed for your rate limiting infrastructure.

Unique users (millions): 10

Rules per user: 5

Algorithm: Sliding Window Counter

Total counters—

Redis memory—

Ops/sec (peak)—

Redis nodes—

Token Bucket Algorithm

Sliding Window Log

Fixed Window vs Sliding Window

Fixed Window Counter

Sliding Window Counter

Architecture

Algorithm Comparison

Token Bucket vs Leaking Bucket

Fixed Window vs Sliding Window

Distributed Rate Limiting with Redis

Race Conditions & Solutions

Capacity Estimation

🔗 Related Topics