Design a URL Shortener

Base62 Encoding, Hash Collision Resolution, Redirect Strategies, and Scalable ID Generation

A URL shortener (e.g., bit.ly, tinyurl.com) maps long URLs to short aliases like sho.rt/a3Xb9k. The core challenges: generating unique short keys at scale, handling hash collisions, choosing between 301 (permanent) and 302 (temporary) redirects for analytics, and building a system that handles billions of reads with sub-10ms latency. A 7-character Base62 key gives 62⁷ β‰ˆ 3.5 trillion possible URLs.

Base62 Encoding Visualizer

Enter a numeric ID (from an auto-increment counter or Snowflake ID) to see it encoded to a Base62 short key. Each digit maps to 0-9a-zA-Z.

Short Key β€”
Key Length β€”
Full Short URL β€”
Base62 charset: 0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ

Capacity Estimation

Estimate storage, bandwidth, and cache requirements for your URL shortening service.

Writes/secβ€”
Reads/secβ€”
Total URLs (5yr)β€”
Storage neededβ€”
Cache (20% hot)β€”
Bandwidth (read)β€”
Key space usedβ€”
Key length neededβ€”

Architecture

Client
β†’
API Gateway
β†’
App Servers
↓
Cache (Redis)
Hot URLs
↓
Database
URL Mappings
↓
Zookeeper
ID Range Alloc
↓
Analytics (Kafka)
↓
Data Warehouse

Hash Collision Resolution

Type a URL to see how hashing + collision detection works. If the short key already exists in the "database", the system appends a counter and rehashes until a unique key is found.

Click "Shorten" to start β€” try the same URL twice to see collision handling
Simulated Database
Empty β€” no URLs shortened yet

Redirect Flow: 301 vs 302

301 Permanent vs 302 Temporary Redirect

301 Permanent
  • Browser caches the redirect
  • Subsequent requests skip your server
  • Lower server load
  • Lose analytics on repeat visits
vs
302 Temporary
  • Every click hits your server
  • Full click analytics & tracking
  • Higher server load
  • Can change destination later

API Design

POST /api/shorten β€” {"url": "https://..."} β†’ {"short_url": "sho.rt/a3Xb9k"}
GET /:shortKey β†’ 301/302 redirect to original URL
GET /api/stats/:shortKey β†’ click count, referrers, geo, timestamps

Database Schema

urls table: short_key (PK, varchar(7)), original_url (text), created_at (timestamp), user_id (bigint), expires_at (timestamp nullable).
Index on short_key for O(1) lookups. Shard by short_key hash for horizontal scaling.

Key Design Decisions

ID Generation: Counter vs Hash

Counter + Base62
  • Zero collisions by construction
  • Short, predictable keys
  • Needs distributed counter (Zookeeper)
  • Sequentially guessable
vs
Hash (MD5/SHA256) + Truncate
  • No coordination needed
  • Same URL β†’ same hash
  • Collision risk β€” need retry logic
  • Slightly longer keys

Rate Limiting

Protect the write endpoint: limit to 100 URLs/min per user via token bucket. Use API keys for authenticated users and IP-based limiting for anonymous traffic. Return 429 Too Many Requests with Retry-After header.

Analytics Pipeline

On each redirect, emit a click event to Kafka asynchronously. A Flink/Spark consumer aggregates clicks by geo, referrer, device, and time bucket. Store aggregates in a time-series DB or ClickHouse for the analytics dashboard. Never block the redirect for analytics.

A classic system design question β€” master encoding, scaling reads, and trade-off analysis.