Design Instagram
Photo Storage, CDN Distribution, Feed Ranking, Stories Architecture, and Image Processing Pipeline
Instagram serves 2B+ monthly active users uploading 100M+ photos per day. The core challenges: building an image processing pipeline that resizes and optimizes every upload into multiple formats, distributing billions of images globally via a CDN with sub-100ms latency, designing a feed ranking algorithm that balances recency, engagement, and relationship signals, implementing ephemeral Stories with 24-hour TTL, and managing a social graph where celebrity accounts have 500M+ followers. At scale, that means petabytes of new storage per year and millions of read QPS served from edge caches.
Image Upload Pipeline
Every photo upload triggers a multi-step pipeline: the client uploads the raw image, the API gateway validates and routes it, the image processor generates multiple resolutions, objects are stored in S3, CDN edges are warmed, and metadata is persisted to the database.
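The resize step can be sketched as a small planning function. This is a minimal illustration, not Instagram's actual processor: the target widths, the `Variant` type, and the object-key layout are all assumptions made for the example.

```python
from dataclasses import dataclass

# Hypothetical target widths in pixels; the article does not specify them.
TARGET_WIDTHS = (150, 320, 640, 1080)


@dataclass(frozen=True)
class Variant:
    name: str    # label for the rendition, e.g. "w640"
    width: int
    height: int
    key: str     # assumed object-storage key layout


def plan_variants(photo_id: str, width: int, height: int) -> list:
    """Plan the renditions the image processor would generate.

    Aspect ratio is preserved and the source is never upscaled, so a
    small upload yields several renditions with identical dimensions.
    """
    variants = []
    for target in TARGET_WIDTHS:
        w = min(target, width)
        h = max(1, round(height * w / width))
        key = f"photos/{photo_id}/w{target}.jpg"
        variants.append(Variant(f"w{target}", w, h, key))
    return variants
```

A real pipeline would hand each planned variant to an image library for the actual resize and then write the result to object storage under the planned key.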
CDN Architecture
Images are served from the nearest CDN Point of Presence (PoP). A cache hit returns the image in single-digit milliseconds; a miss requires fetching from the origin, adding hundreds of milliseconds.
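To make the hit/miss difference concrete, here is a toy model of a single PoP as an LRU cache. The latency constants and the `EdgeCache` class are assumptions chosen to match the orders of magnitude above, not measurements or a real CDN API.

```python
from collections import OrderedDict
from typing import Callable, Tuple

HIT_MS = 5        # assumed latency of an edge cache hit
ORIGIN_MS = 200   # assumed extra latency of an origin fetch


class EdgeCache:
    """A tiny LRU cache standing in for one CDN PoP."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self._items = OrderedDict()  # key -> image bytes, oldest first

    def get(self, key: str, fetch_origin: Callable[[str], bytes]) -> Tuple[bytes, int]:
        """Return (body, simulated latency in milliseconds)."""
        if key in self._items:
            self._items.move_to_end(key)        # mark as most recently used
            return self._items[key], HIT_MS
        body = fetch_origin(key)                # miss: go back to the origin
        self._items[key] = body
        if len(self._items) > self.capacity:
            self._items.popitem(last=False)     # evict the least recently used
        return body, HIT_MS + ORIGIN_MS
```

The first request for an image from a given region pays the origin round trip; repeat requests from that region are served at edge latency until the entry is evicted.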
Feed Ranking Algorithm
Instagram's feed is ranked, not chronological. A simplified model scores each post as a weighted sum: score = w1*recency + w2*engagement + w3*relationship + w4*contentType. The weights w1 through w4 are tunable, and changing them reorders the feed.
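The weighted sum translates directly into code. In this sketch the default weights, the exponential half-life used for recency, and the assumption that every signal is normalized to [0, 1] are illustrative choices, not values from Instagram.

```python
from dataclasses import dataclass


@dataclass
class Post:
    post_id: str
    age_hours: float
    engagement: float     # assumed normalized to [0, 1]
    relationship: float   # assumed normalized to [0, 1]
    content_type: float   # assumed normalized to [0, 1]


def score(post, w1=0.4, w2=0.3, w3=0.2, w4=0.1, half_life_hours=6.0):
    """score = w1*recency + w2*engagement + w3*relationship + w4*contentType."""
    # Assumed recency model: 1.0 for a brand-new post, halving every half-life.
    recency = 0.5 ** (post.age_hours / half_life_hours)
    return (w1 * recency + w2 * post.engagement
            + w3 * post.relationship + w4 * post.content_type)


def rank(posts, **weights):
    """Return posts ordered from highest to lowest score."""
    return sorted(posts, key=lambda p: score(p, **weights), reverse=True)
```

With the default weights, a day-old post from a close connection with strong engagement can outrank a brand-new post from a distant one; setting w1 to 1 and the other weights to 0 recovers a purely chronological feed.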
Capacity Estimation
Back-of-the-envelope math for Instagram at scale: storage, bandwidth, and QPS requirements all follow from a few input parameters.
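A rough version of that arithmetic is sketched below. Only the 100M uploads per day figure comes from the introduction; the stored bytes per photo, the bytes served per read, and the read-to-upload ratio are assumptions picked for illustration.

```python
SECONDS_PER_DAY = 86_400


def estimate(uploads_per_day=100_000_000,
             stored_bytes_per_photo=2_000_000,   # assumed: all renditions combined
             served_bytes_per_read=200_000,      # assumed: one mid-size rendition
             reads_per_upload=1_000):            # assumed read-to-write ratio
    """Return headline capacity numbers as a dict of floats."""
    upload_qps = uploads_per_day / SECONDS_PER_DAY
    read_qps = upload_qps * reads_per_upload
    storage_bytes_per_day = uploads_per_day * stored_bytes_per_photo
    return {
        "upload_qps": upload_qps,
        "read_qps": read_qps,
        "storage_tb_per_day": storage_bytes_per_day / 1e12,
        "storage_pb_per_year": storage_bytes_per_day * 365 / 1e15,
        "egress_gbps": read_qps * served_bytes_per_read * 8 / 1e9,
    }
```

Under these assumptions the defaults give roughly 1,157 uploads per second, about 1.16 million reads per second, 200 TB of new storage per day (73 PB per year), and around 1.85 Tbps of image egress, nearly all of which the CDN is expected to absorb.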
Stories System Architecture
Stories are ephemeral content with a 24-hour TTL. Design decisions around storage, delivery, and expiration significantly impact system complexity and cost.
Storage: Separate vs Shared
Separate store for Stories
- + Optimized TTL cleanup
- + Independent scaling
- + Different replication policy
- - Data duplication for reposts
Shared store with posts
- + Unified media pipeline
- + Simpler ops
- - TTL logic mixed in
- - Over-provisioned durability
Delivery: Push vs Pull
Push (fan-out on write)
- + Instant story ring updates
- + Pre-computed story feeds
- - Write amplification
- - Wasted for inactive users
Pull (fan-out on read)
- + No wasted writes
- + Simpler pipeline
- - Higher read latency
- - Thundering herd on open
24-Hour TTL: Expiration Strategies
Lazy expiration
- Check TTL on read, skip expired
- Background cleanup job hourly
- Simple, eventually consistent
Active expiration
- Redis EXPIRE for feed entries
- Change Data Capture deletes S3 objects
- Precise, cost-efficient storage
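The check-on-read plus background-cleanup approach can be sketched with an in-memory store. `StoryStore` and its method names are hypothetical, and timestamps are passed in explicitly so the behavior is easy to follow; a production system would use a real datastore and clock.

```python
STORY_TTL_SECONDS = 24 * 60 * 60


class StoryStore:
    """In-memory stand-in for a Stories metadata store."""

    def __init__(self):
        self._stories = {}  # story_id -> (created_at, media_key)

    def put(self, story_id, media_key, now):
        self._stories[story_id] = (now, media_key)

    def get(self, story_id, now):
        """Check the TTL on read: expired stories are hidden immediately."""
        entry = self._stories.get(story_id)
        if entry is None:
            return None
        created_at, media_key = entry
        if now - created_at >= STORY_TTL_SECONDS:
            return None  # still stored, but never returned to readers
        return media_key

    def sweep(self, now):
        """Background cleanup: delete expired stories, return their ids."""
        expired = [sid for sid, (created_at, _) in self._stories.items()
                   if now - created_at >= STORY_TTL_SECONDS]
        for sid in expired:
            del self._stories[sid]
        return expired
```

Readers never see an expired story, while the storage itself is reclaimed only when the periodic sweep runs, which is the eventual consistency trade-off this strategy accepts.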
High-Level Architecture
The system decomposes into independent services connected through a message queue, with shared infrastructure for storage, caching, and content delivery.
Key Design Decisions
Image Storage Strategy
Store images in object storage (S3) behind a CDN, never in the database. The DB stores only metadata: photo_id, user_id, S3 URL, dimensions, and timestamps. Object storage is designed for 11 nines (99.999999999%) of durability, scales without up-front capacity planning, and costs roughly $0.023/GB/month versus $0.10+/GB/month for database storage. The CDN serves 95%+ of reads, which cuts both S3 egress cost and latency.
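As a rough illustration, the metadata row and the cost comparison might look like the following. The `PhotoMetadata` type, its field types, and the example bucket URL are assumptions; only the field names and the per-GB prices come from the text above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PhotoMetadata:
    """What the database keeps: a pointer to the image, not its bytes."""
    photo_id: int
    user_id: int
    s3_url: str
    width: int
    height: int
    created_at: int  # assumed to be Unix seconds


def monthly_storage_cost_usd(gigabytes: float, usd_per_gb_month: float) -> float:
    """Flat-rate storage cost; ignores request, egress, and tiering charges."""
    return gigabytes * usd_per_gb_month


row = PhotoMetadata(1, 2, "s3://example-bucket/photos/1/w1080.jpg",
                    1080, 810, 1_700_000_000)
```

For 1 PB (1,000,000 GB), the quoted prices work out to about $23,000 per month in object storage versus $100,000 or more in a database.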
Feed Generation Strategy
Fan-out on write for normal users: when they post, push the post_id into every follower's Redis feed. Fan-out on read for celebrities (>500K followers in this design): their posts are merged into feeds at read time to avoid millions of writes per post. The threshold is tuned to the available write capacity, and some designs place the cutoff as low as ~10K followers. This hybrid approach handles both the celebrity problem and the common case efficiently.
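The hybrid strategy can be sketched with in-memory dictionaries standing in for the Redis feeds. `FeedService`, its method names, and the default threshold are hypothetical, and the sketch ignores what happens to older posts when an account crosses the threshold.

```python
from collections import defaultdict

CELEBRITY_THRESHOLD = 500_000  # assumed cutoff; tune to write capacity


class FeedService:
    def __init__(self, followers, following, threshold=CELEBRITY_THRESHOLD):
        self.followers = followers       # author_id -> set of follower ids
        self.following = following       # user_id -> set of followed author ids
        self.threshold = threshold
        self.inbox = defaultdict(list)   # user_id -> [(ts, post_id)], pushed on write
        self.outbox = defaultdict(list)  # celebrity author_id -> [(ts, post_id)]

    def is_celebrity(self, author_id):
        return len(self.followers.get(author_id, ())) > self.threshold

    def publish(self, author_id, post_id, ts):
        if self.is_celebrity(author_id):
            # Fan-out on read: one write, regardless of follower count.
            self.outbox[author_id].append((ts, post_id))
        else:
            # Fan-out on write: one write per follower.
            for follower in self.followers.get(author_id, ()):
                self.inbox[follower].append((ts, post_id))

    def read_feed(self, user_id, limit=20):
        entries = list(self.inbox[user_id])
        for author in self.following.get(user_id, ()):
            if self.is_celebrity(author):
                entries.extend(self.outbox[author])  # merged at read time
        entries.sort(reverse=True)                   # newest first
        return [post_id for _, post_id in entries[:limit]]
```

A celebrity post costs a single write no matter how many followers the account has, and each reader pays a small merge cost only for the celebrities they follow.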
Sharding Strategy
Shard by user_id using consistent hashing. All photos, followers, and feed data for a user live on the same shard, enabling single-shard queries for profile views and feed generation. Cross-shard queries (search, explore) use a separate index. Photo IDs embed the shard key in a 64-bit layout: shard_id (16 bits) + timestamp (32 bits) + sequence (16 bits), so the owning shard can be derived from the ID alone. This keeps data locality high and avoids scatter-gather for the hot path.
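The ID layout can be packed and unpacked with a few bit operations. This sketch assumes the fields are ordered from the most significant bits down (shard_id, then timestamp, then sequence) and that the timestamp is in whole seconds; the text does not state either detail.

```python
SHARD_BITS, TS_BITS, SEQ_BITS = 16, 32, 16


def make_photo_id(shard_id: int, timestamp: int, sequence: int) -> int:
    """Pack the three fields into one 64-bit integer."""
    if not 0 <= shard_id < (1 << SHARD_BITS):
        raise ValueError("shard_id out of range")
    if not 0 <= timestamp < (1 << TS_BITS):
        raise ValueError("timestamp out of range")
    if not 0 <= sequence < (1 << SEQ_BITS):
        raise ValueError("sequence out of range")
    return (shard_id << (TS_BITS + SEQ_BITS)) | (timestamp << SEQ_BITS) | sequence


def parse_photo_id(photo_id: int):
    """Return (shard_id, timestamp, sequence) recovered from a photo ID."""
    sequence = photo_id & ((1 << SEQ_BITS) - 1)
    timestamp = (photo_id >> SEQ_BITS) & ((1 << TS_BITS) - 1)
    shard_id = photo_id >> (TS_BITS + SEQ_BITS)
    return shard_id, timestamp, sequence


def shard_of(photo_id: int) -> int:
    """Route a request to its shard using only the ID, with no lookup."""
    return photo_id >> (TS_BITS + SEQ_BITS)
```

With the shard in the high bits, IDs sort by time only within a shard; a layout that needs global time ordering would put the timestamp first instead.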
Consistency Model
Eventual consistency is acceptable for most of Instagram's read paths. A post appearing 2-3 seconds late in a follower's feed is fine, and like counts can lag. The critical path (upload confirmation, follow/unfollow) is strongly consistent via synchronous writes. Feed and story delivery use async fan-out through Kafka. Read-your-own-writes consistency is maintained by reading from the leader when a user views their own profile.
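The read-your-own-writes rule amounts to a routing decision on each profile read. The class below is a hypothetical in-memory model in which replication happens only when `replicate()` is called, which makes the lag visible; it is not a real database client.

```python
class ReplicatedProfileStore:
    """One leader and one lagging follower replica, both in memory."""

    def __init__(self):
        self.leader = {}    # user_id -> list of post_ids, always current
        self.follower = {}  # stale until replicate() runs

    def add_post(self, user_id, post_id):
        # Writes always go to the leader.
        self.leader.setdefault(user_id, []).append(post_id)

    def replicate(self):
        # Stand-in for asynchronous replication catching up.
        self.follower = {u: list(posts) for u, posts in self.leader.items()}

    def profile_posts(self, viewer_id, owner_id):
        # Users viewing their own profile read from the leader;
        # everyone else reads the possibly stale follower.
        source = self.leader if viewer_id == owner_id else self.follower
        return list(source.get(owner_id, []))
```

The poster sees a new post immediately, while other viewers see it once replication catches up, which matches the few seconds of lag described above.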