Design Twitter / X
Tweet Fanout Strategies, Timeline Generation, Search Indexing, Capacity Estimation, and Hybrid Push-Pull Architecture
Twitter/X serves hundreds of millions of daily active users who post 500M+ tweets per day. The core challenges are:

- choosing between push (fan-out on write) and pull (fan-out on read) for timeline delivery;
- handling the celebrity problem, where a single account may have 100M+ followers;
- building a real-time search index over the full tweet corpus;
- designing a Redis timeline cache that serves home timelines in under 50ms;
- scaling a system whose read-to-write ratio exceeds 600:1.

The defining architectural insight is the hybrid fanout approach: push for normal users, pull for celebrities.
Tweet Fanout
When a user tweets, how does the tweet reach followers? In push (fan-out on write), the tweet is written to every follower's timeline cache. In pull (fan-out on read), followers fetch tweets on demand. Comparing the cost of each approach as the follower count grows shows why Twitter uses a hybrid model.
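The trade-off can be put in numbers with a rough cost model. This is a minimal sketch that counts cache operations, not latency. Push cost scales with the author's follower count. Pull cost scales with the number of accounts the reader follows. `CELEBRITY_THRESHOLD` and all function names are hypothetical, and the cutoff is not Twitter's real value.

```python
from dataclasses import dataclass

# Hypothetical push/pull cutoff for this sketch; not Twitter's real value.
CELEBRITY_THRESHOLD = 100_000


@dataclass
class FanoutCost:
    writes_per_tweet: int           # timeline-cache writes triggered by one tweet
    lookups_per_timeline_read: int  # lookups needed to build one home timeline


def push_cost(author_followers: int) -> FanoutCost:
    # Fan-out on write: one insert into each follower's cached timeline;
    # a timeline read is then a single lookup of the reader's own list.
    return FanoutCost(writes_per_tweet=author_followers, lookups_per_timeline_read=1)


def pull_cost(reader_followees: int) -> FanoutCost:
    # Fan-out on read: the tweet is stored once, but every timeline read
    # must query each account the reader follows.
    return FanoutCost(writes_per_tweet=1, lookups_per_timeline_read=reader_followees)


def choose_strategy(author_followers: int) -> str:
    # Hybrid rule: push for normal users, pull for celebrities.
    return "pull" if author_followers >= CELEBRITY_THRESHOLD else "push"
```

Consider an account with 100M followers. Under push, `push_cost(100_000_000)` reports 100M writes for a single tweet. Under pull, a reader following 300 accounts pays 300 lookups on every refresh. The hybrid rule avoids the worst case of each.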
Timeline Generation Pipeline
A tweet travels from creation to followers' timelines along one of two paths. A normal user's tweet takes the push path. A celebrity's tweet takes the hybrid path: it is not fanned out, and is instead merged into each follower's timeline at read time.
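Below is a minimal in-memory sketch of both paths. It rests on several assumptions:

- a dict of Python lists stands in for Redis, with insert-then-truncate mimicking `LPUSH` followed by `LTRIM`;
- cached timelines are capped at a hypothetical 800 entries;
- tweet IDs are time-sortable, so sorting by ID sorts by recency.

The class and method names are illustrative only.

```python
from collections import defaultdict

TIMELINE_CAP = 800  # assumed cap on cached home-timeline length


class HybridTimelineService:
    """Hypothetical service; dicts of lists stand in for Redis timeline lists."""

    def __init__(self, celebrity_threshold: int = 100_000):
        self.celebrity_threshold = celebrity_threshold  # hypothetical cutoff
        self.followers = defaultdict(set)     # author -> follower IDs
        self.following = defaultdict(set)     # user -> followee IDs
        self.user_tweets = defaultdict(list)  # author -> tweet IDs, newest first
        self.home_cache = defaultdict(list)   # user -> tweet IDs, newest first

    def follow(self, follower: str, followee: str) -> None:
        self.followers[followee].add(follower)
        self.following[follower].add(followee)

    def is_celebrity(self, user: str) -> bool:
        return len(self.followers[user]) >= self.celebrity_threshold

    def post(self, author: str, tweet_id: int) -> None:
        self.user_tweets[author].insert(0, tweet_id)  # author's own timeline
        if self.is_celebrity(author):
            return  # pull path: no fanout; merged into readers' timelines later
        for follower in self.followers[author]:  # push path
            timeline = self.home_cache[follower]
            timeline.insert(0, tweet_id)  # like LPUSH
            del timeline[TIMELINE_CAP:]   # like LTRIM 0 TIMELINE_CAP-1

    def home_timeline(self, user: str, limit: int = 20) -> list:
        # Read path: cached (pushed) IDs plus recent tweets from followed celebrities.
        pulled = [
            tweet_id
            for followee in self.following[user]
            if self.is_celebrity(followee)
            for tweet_id in self.user_tweets[followee][:limit]
        ]
        merged = set(self.home_cache[user]) | set(pulled)  # dedupe
        return sorted(merged, reverse=True)[:limit]  # newest first by ID
```

The union with a set tolerates an author who crosses the threshold and ends up with some tweets both cached and pulled.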
Capacity Estimation
Back-of-the-envelope math for Twitter at scale shows how QPS, storage, cache, and bandwidth requirements change as the input parameters change.
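This is a minimal sketch of the arithmetic. It uses the 500M tweets per day and 600:1 read-to-write ratio quoted earlier. Every other default is an assumption, labeled in the code:

- about 300 bytes per tweet including metadata;
- 200M daily active users;
- 800 cached IDs per home timeline;
- 8-byte (64-bit) tweet IDs.

```python
from dataclasses import dataclass

SECONDS_PER_DAY = 86_400


@dataclass
class Estimate:
    write_qps: float
    read_qps: float
    storage_gb_per_day: float
    timeline_cache_tb: float
    read_bandwidth_gbps: float


def estimate(
    tweets_per_day: int = 500_000_000,
    read_write_ratio: int = 600,
    bytes_per_tweet: int = 300,             # assumed: text plus metadata
    daily_active_users: int = 200_000_000,  # assumed input; adjust freely
    cached_ids_per_timeline: int = 800,     # assumed timeline cache length
    bytes_per_id: int = 8,                  # 64-bit tweet ID
) -> Estimate:
    write_qps = tweets_per_day / SECONDS_PER_DAY
    read_qps = write_qps * read_write_ratio
    storage_bytes_per_day = tweets_per_day * bytes_per_tweet
    # Only active users get a cached home timeline in this model.
    cache_bytes = daily_active_users * cached_ids_per_timeline * bytes_per_id
    # Egress if every read returns one full tweet payload.
    read_bits_per_sec = read_qps * bytes_per_tweet * 8
    return Estimate(
        write_qps=write_qps,
        read_qps=read_qps,
        storage_gb_per_day=storage_bytes_per_day / 1e9,
        timeline_cache_tb=cache_bytes / 1e12,
        read_bandwidth_gbps=read_bits_per_sec / 1e9,
    )
```

With the defaults, the model gives:

- roughly 5,787 writes/s and 3.47M reads/s;
- 150 GB of new tweet data per day;
- about 1.28 TB of timeline cache;
- about 8.3 Gbit/s of read egress.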
Search Indexing
Each tweet is tokenized and added to an inverted index, which maps every word to the list of IDs of tweets containing it. The index grows with every tweet added.
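Below is a minimal sketch of tokenization and posting lists. It assumes a simple regex tokenizer that keeps `#hashtags` and `@mentions` and only lowercases, with no stemming. Queries use AND semantics over all terms. `TOKEN_RE`, `tokenize`, and `InvertedIndex` are hypothetical names.

```python
import re
from collections import defaultdict

# Words, optionally prefixed by # (hashtag) or @ (mention).
TOKEN_RE = re.compile(r"[#@]?\w+")


def tokenize(text):
    # Lowercase only; a production pipeline would also normalize and stem.
    return [token.lower() for token in TOKEN_RE.findall(text)]


class InvertedIndex:
    def __init__(self):
        # term -> tweet IDs in insertion order (ascending if IDs are time-sorted)
        self.postings = defaultdict(list)

    def add(self, tweet_id, text):
        for term in set(tokenize(text)):  # each tweet appears once per term
            self.postings[term].append(tweet_id)

    def search(self, query):
        # AND semantics: intersect the posting lists of all query terms.
        terms = tokenize(query)
        if not terms:
            return []
        hits = set(self.postings.get(terms[0], []))
        for term in terms[1:]:
            hits &= set(self.postings.get(term, []))
        return sorted(hits, reverse=True)  # newest first
```

Here is an example. After `add(1, "Hello world")` and `add(2, "Hello #Twitter")`, the term `hello` maps to `[1, 2]`. A query for `"hello world"` returns only tweet 1.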
Architecture
Key Design Decisions
Push vs Pull vs Hybrid Fanout
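Push (fan-out on write)
- + Timeline reads are a single O(1) cache lookup
- + Sub-millisecond read latency
- - A celebrity tweet needs one write per follower (N million writes for N million followers)
- - Wasted writes for inactive followers
Pull (fan-out on read)
- + Tweet write is O(1)
- + No wasted work for inactive users
- - Slow reads: must query every followee
- - N+1 query problem at read time
Hybrid
- + Push for normal users, pull for celebrities, avoiding both million-write fanouts and per-read queries to every followee
- - Read path must merge the cached timeline with recent celebrity tweets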
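On the pull path, each home-timeline read fetches every followee's recent tweets and merges them. This is a minimal sketch that assumes each followee's tweets are already held newest-first as time-sortable IDs. `heapq.merge(..., reverse=True)` does a lazy k-way merge, so only `limit` items are produced. The function name and dict layout are hypothetical.

```python
import heapq
from itertools import islice


def pull_home_timeline(followees, tweets_by_user, limit=20):
    """Hypothetical pull-path read: one lookup per followee, then a k-way merge."""
    # One fetch per followee: the N+1 pattern that makes pull reads slow.
    per_followee = [tweets_by_user.get(user, []) for user in followees]
    # Inputs are newest-first (descending IDs), so merge with reverse=True.
    merged = heapq.merge(*per_followee, reverse=True)
    return list(islice(merged, limit))
```

Here is an example. With `{"a": [9, 5, 1], "b": [8, 7], "c": [6]}` and `limit=4`, the result is `[9, 8, 7, 6]`. The merge itself is cheap. The cost lies in the per-followee fetches, which grow with the number of accounts followed.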
Tweet Storage: SQL vs NoSQL
Tweet data (text, user_id, timestamp) fits well in a relational database. Twitter historically used MySQL with custom sharding (Gizzard). The tweets table is append-heavy and rarely updated. Sharding by user_id keeps each user-timeline query on a single shard. Sharding by tweet_id spreads random-access lookups evenly. Use a Snowflake ID for time-sortable, globally unique IDs.
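Twitter's open-sourced Snowflake packs three fields into a 64-bit ID:

- a 41-bit millisecond timestamp measured from a custom epoch (1288834974657);
- a 10-bit machine ID;
- a 12-bit per-millisecond sequence.

Below is a simplified single-process sketch of that layout, not Twitter's implementation. The class name, the injectable `clock`, and the error handling are illustrative choices.

```python
import threading
import time

TWITTER_EPOCH_MS = 1288834974657
MACHINE_BITS = 10
SEQUENCE_BITS = 12
MAX_MACHINE_ID = (1 << MACHINE_BITS) - 1
MAX_SEQUENCE = (1 << SEQUENCE_BITS) - 1


class SnowflakeGenerator:
    """Hypothetical generator: timestamp | machine ID | sequence."""

    def __init__(self, machine_id, clock=lambda: time.time_ns() // 1_000_000):
        if not 0 <= machine_id <= MAX_MACHINE_ID:
            raise ValueError("machine_id must fit in 10 bits")
        self.machine_id = machine_id
        self.clock = clock  # returns current time in milliseconds
        self.last_ms = -1
        self.sequence = 0
        self.lock = threading.Lock()

    def next_id(self):
        with self.lock:
            now = self.clock()
            if now < self.last_ms:
                raise RuntimeError("clock moved backwards")
            if now == self.last_ms:
                self.sequence = (self.sequence + 1) & MAX_SEQUENCE
                if self.sequence == 0:
                    # 4096 IDs used this millisecond: wait for the next one.
                    while now <= self.last_ms:
                        now = self.clock()
            else:
                self.sequence = 0
            self.last_ms = now
            return (
                ((now - TWITTER_EPOCH_MS) << (MACHINE_BITS + SEQUENCE_BITS))
                | (self.machine_id << SEQUENCE_BITS)
                | self.sequence
            )


def timestamp_ms(snowflake_id):
    # Recover the creation time; this is what makes IDs time-sortable.
    return (snowflake_id >> (MACHINE_BITS + SEQUENCE_BITS)) + TWITTER_EPOCH_MS
```

The timestamp occupies the high bits, so sorting IDs numerically sorts them by creation time. Timeline and search code can therefore order tweets by ID alone.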
Search Architecture
Twitter's search is an inverted index built on top of Earlybird (custom Lucene). Tweets are tokenized, stemmed, and indexed in near real-time. The index is partitioned by time (recent tweets on faster hardware) and by hash for horizontal scaling. Queries fan out to all partitions and results are merged by relevance and recency.
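Below is a minimal scatter-gather sketch under these assumptions:

- tweets are routed to partitions by `tweet_id % len(partitions)`, which covers hash partitioning only and omits the time tier;
- tweets are indexed in increasing ID order;
- results are merged by recency alone, since relevance scoring is out of scope.

`Partition`, `index_tweet`, and `scatter_gather` are hypothetical names.

```python
import heapq
from concurrent.futures import ThreadPoolExecutor
from itertools import islice


class Partition:
    """Hypothetical index shard: term -> tweet IDs, newest first."""

    def __init__(self):
        self.postings = {}

    def add(self, tweet_id, terms):
        # Assumes increasing tweet IDs, so prepending keeps lists newest-first.
        for term in set(terms):
            self.postings.setdefault(term, []).insert(0, tweet_id)

    def search(self, term, limit):
        return self.postings.get(term, [])[:limit]


def index_tweet(partitions, tweet_id, terms):
    # Hash partitioning by tweet ID spreads index writes across shards.
    partitions[tweet_id % len(partitions)].add(tweet_id, terms)


def scatter_gather(partitions, term, limit=10):
    # Scatter: query every partition concurrently; each returns its local top `limit`.
    with ThreadPoolExecutor(max_workers=len(partitions)) as pool:
        per_partition = list(pool.map(lambda p: p.search(term, limit), partitions))
    # Gather: merge the newest-first lists and keep the global top `limit`.
    return list(islice(heapq.merge(*per_partition, reverse=True), limit))
```

Each partition only needs to return its own top `limit` hits. The global top `limit` must come from the union of those local lists, which keeps the gather step small however many tweets match.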