Design a URL Shortener
Base62 Encoding, Hash Collision Resolution, Redirect Strategies, and Scalable ID Generation
A URL shortener (e.g., bit.ly, tinyurl.com) maps long URLs to short aliases like sho.rt/a3Xb9k. The core challenges: generating unique short keys at scale, handling hash collisions, choosing between 301 (permanent) and 302 (temporary) redirects for analytics, and building a system that handles billions of reads with sub-10ms latency. A 7-character Base62 key gives 62β· β 3.5 trillion possible URLs.
Base62 Encoding Visualizer
Enter a numeric ID (from an auto-increment counter or Snowflake ID) to see it encoded to a Base62 short key. Each digit maps to 0-9a-zA-Z.
0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ Capacity Estimation
Estimate storage, bandwidth, and cache requirements for your URL shortening service.
Architecture
Hash Collision Resolution
Type a URL to see how hashing + collision detection works. If the short key already exists in the "database", the system appends a counter and rehashes until a unique key is found.
Redirect Flow: 301 vs 302
301 Permanent vs 302 Temporary Redirect
- Browser caches the redirect
- Subsequent requests skip your server
- Lower server load
- Lose analytics on repeat visits
- Every click hits your server
- Full click analytics & tracking
- Higher server load
- Can change destination later
API Design
POST /api/shorten β {"url": "https://..."} β {"short_url": "sho.rt/a3Xb9k"}
GET /:shortKey β 301/302 redirect to original URL
GET /api/stats/:shortKey β click count, referrers, geo, timestamps
Database Schema
urls table: short_key (PK, varchar(7)), original_url (text), created_at (timestamp), user_id (bigint), expires_at (timestamp nullable).
Index on short_key for O(1) lookups. Shard by short_key hash for horizontal scaling.
Key Design Decisions
ID Generation: Counter vs Hash
- Zero collisions by construction
- Short, predictable keys
- Needs distributed counter (Zookeeper)
- Sequentially guessable
- No coordination needed
- Same URL β same hash
- Collision risk β need retry logic
- Slightly longer keys
Rate Limiting
Protect the write endpoint: limit to 100 URLs/min per user via token bucket. Use API keys for authenticated users and IP-based limiting for anonymous traffic. Return 429 Too Many Requests with Retry-After header.
Analytics Pipeline
On each redirect, emit a click event to Kafka asynchronously. A Flink/Spark consumer aggregates clicks by geo, referrer, device, and time bucket. Store aggregates in a time-series DB or ClickHouse for the analytics dashboard. Never block the redirect for analytics.