Chat System
A modern chat system at WhatsApp / Messenger scale handles 100 B+ messages per day, holds always-on connections from a billion devices, delivers in under 200 ms across continents, and increasingly does it all end-to-end encrypted, so the operator cannot read messages even in principle. The defining design choices are the connection layer (WebSocket fan-out), the message store (Cassandra wide-row), presence (heartbeat scale), and E2EE (Signal protocol).
Architecture
Capacity Estimation
| Metric | Value | Notes |
|---|---|---|
| Daily active users | ~2 B | WhatsApp scale |
| Messages/day | ~100 B | ~50 per DAU |
| Peak send/s | ~5 M | ~4× the daily mean of ~1.2 M/s |
| Concurrent connections | ~1 B | online users |
| Connections per gateway | ~500 K | tuned Linux box |
| Gateway count | ~2 K | 1 B / 500 K |
| Storage growth | ~20 TB/day | 100 B msgs × 200 B/msg compressed, before replication |
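The derived rows can be sanity-checked with a few lines of arithmetic. The sketch below uses only inputs stated in this document (the ~10 KB per connection figure comes from the fan-out section); the variable names are illustrative, and the storage figure is a raw single-copy estimate.

```python
# Back-of-envelope check of the capacity table, using the table's own inputs.
SECONDS_PER_DAY = 86_400

messages_per_day = 100e9         # ~100 B messages/day
concurrent_connections = 1e9     # ~1 B online users
connections_per_gateway = 500e3  # ~500 K per tuned Linux box
bytes_per_message = 200          # compressed size per message
bytes_per_connection = 10e3      # ~10 KB per idle connection

mean_send_per_s = messages_per_day / SECONDS_PER_DAY
gateway_count = concurrent_connections / connections_per_gateway
gateway_conn_memory_gb = connections_per_gateway * bytes_per_connection / 1e9
raw_storage_tb_per_day = messages_per_day * bytes_per_message / 1e12

print(f"mean send rate:        {mean_send_per_s / 1e6:.2f} M msg/s")
print(f"gateways needed:       {gateway_count:.0f}")
print(f"conn memory per box:   {gateway_conn_memory_gb:.1f} GB")
print(f"raw storage per day:   {raw_storage_tb_per_day:.1f} TB")
```

Replication, indexes, and the denormalized inbox table multiply the raw storage figure, so it is best read as a lower bound.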
WebSocket Fan-out
Long-lived bidirectional connections are the foundation. WebSocket (or MQTT for mobile-battery efficiency) over TLS, kept alive with ~30 s pings. The gateway tier:
- Accepts the connection, authenticates (JWT or signed cookie), registers the mapping user_id → gateway_id in Redis with a TTL refreshed by heartbeat.
- Subscribes to a per-user Kafka partition (or a queue keyed by user) for incoming messages.
- Delivers to the WebSocket socket, awaits ack, marks the message delivered in the message store.
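The registration step can be sketched with an in-memory stand-in for the Redis mapping. This is a minimal illustration, not a Redis client: `SessionRegistry`, its method names, and the 60 s TTL are assumptions made for the example, and the clock is injectable so expiry can be exercised without waiting.

```python
import time
from typing import Callable, Dict, Optional, Tuple


class SessionRegistry:
    """In-memory stand-in for the Redis mapping user_id -> gateway_id with a TTL."""

    def __init__(self, ttl_s: float = 60.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._ttl_s = ttl_s
        self._clock = clock
        # user_id -> (gateway_id, expires_at)
        self._entries: Dict[str, Tuple[str, float]] = {}

    def register(self, user_id: str, gateway_id: str) -> None:
        """Called on connect and on every heartbeat; refreshes the TTL."""
        self._entries[user_id] = (gateway_id, self._clock() + self._ttl_s)

    def lookup(self, user_id: str) -> Optional[str]:
        """Return the gateway holding the user's socket, or None if expired."""
        entry = self._entries.get(user_id)
        if entry is None:
            return None
        gateway_id, expires_at = entry
        if self._clock() >= expires_at:
            del self._entries[user_id]  # lazy expiry, like a TTL'd key vanishing
            return None
        return gateway_id


# Usage with a fake clock: the entry survives as long as heartbeats keep coming.
now = [0.0]
registry = SessionRegistry(ttl_s=60.0, clock=lambda: now[0])
registry.register("alice", "gw-17")
now[0] = 30.0
registry.register("alice", "gw-17")   # heartbeat pushes expiry to t=90
now[0] = 80.0
still_routed = registry.lookup("alice")
now[0] = 95.0
after_expiry = registry.lookup("alice")
```

A sender's gateway would call `lookup` to find where to route a message; a `None` result means the recipient is offline and delivery falls back to the stored copy plus a push notification.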
Connection density: a tuned Linux box can hold ~500 K idle WebSockets, and the limit is mostly memory (~10 KB per connection, so ~5 GB of connection state). Use SO_REUSEPORT for multi-worker accept, raise net.core.somaxconn and fs.file-max (along with the per-process file-descriptor limit), and prefer a compact custom binary wire format over JSON to save bytes per heartbeat.
Message Storage: Cassandra Wide-row
The hot read pattern is "give me the last 50 messages in this conversation". The hot write pattern is "append a message to a conversation". Both map perfectly to a Cassandra wide row:
Table messages_by_conversation: partition key = conversation_id, clustering key = (created_at DESC, message_id). A SELECT with LIMIT 50 reads exactly one partition, newest rows first. An append is a single write into that partition.
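The access pattern can be modelled in a few lines without a database. The sketch below is an in-memory analogy for the wide row, not Cassandra client code: `partitions`, `append_message`, and `latest_messages` are hypothetical names, and timestamps are assumed to be integer milliseconds.

```python
import bisect
from collections import defaultdict
from typing import Dict, List, Tuple

# One sorted list per partition (conversation_id). Each row is stored as
# (-created_at_ms, message_id, body): negating the timestamp makes the list's
# ascending order equal to "created_at DESC, message_id".
Row = Tuple[int, str, str]
partitions: Dict[str, List[Row]] = defaultdict(list)


def append_message(conversation_id: str, created_at_ms: int,
                   message_id: str, body: str) -> None:
    """Write one row into the conversation's partition, keeping clustering order."""
    bisect.insort(partitions[conversation_id], (-created_at_ms, message_id, body))


def latest_messages(conversation_id: str, limit: int = 50) -> List[Row]:
    """Analogue of SELECT ... WHERE conversation_id = ? LIMIT ?: one partition, newest first."""
    rows = partitions.get(conversation_id, [])[:limit]
    return [(-neg_ts, message_id, body) for neg_ts, message_id, body in rows]


# Usage: rows arrive out of order but are read back newest first.
append_message("c1", 1000, "m1", "hello")
append_message("c1", 3000, "m2", "latest")
append_message("c1", 2000, "m3", "middle")
newest_two = latest_messages("c1", limit=2)
```

Because the read touches a single key and a prefix of an already sorted list, its cost does not depend on how many other conversations exist, which is the property the wide row is chosen for.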
For per-user inbox views (read receipts, last-read pointer), maintain a separate user_conversations table partitioned by user_id, sorted by last_message_at DESC, with denormalized last-message snippet. This is the table that backs the conversation list screen.
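See Cassandra design for the wide-row trade-offs, especially the partition-size limit: common guidance is to keep a partition under roughly 100 MB, so a long-lived conversation needs a time bucket added to the partition key rather than one unbounded row.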
Online / Offline Presence
Presence is "is user X online right now?" served at hundreds of thousands of QPS. Implement:
- Heartbeat: client sends PING every 30 s. Gateway updates presence:user_id → lastSeen in Redis with TTL 60 s.
- Read: GET presence:user_id returns lastSeen; client compares to wall clock.
- Subscriptions: clients in a chat want notifications when the peer comes online. Use Redis pub/sub or a dedicated stream so the gateway can push status changes without polling.
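The heartbeat and subscription items above can be sketched as follows. This is an in-memory illustration under stated assumptions: `PresenceTracker` and `sweep` are hypothetical names, time is passed in explicitly, and the returned transitions stand in for the events that would be published to subscribed peers.

```python
from typing import Dict, List, Tuple

PRESENCE_TTL_S = 60.0  # same 60 s TTL as the Redis key


def is_online(last_seen: float, now: float, ttl_s: float = PRESENCE_TTL_S) -> bool:
    """A user counts as online while the last heartbeat is younger than the TTL."""
    return (now - last_seen) < ttl_s


class PresenceTracker:
    """Tracks lastSeen per user and reports online/offline transitions."""

    def __init__(self) -> None:
        self._last_seen: Dict[str, float] = {}
        self._online: Dict[str, bool] = {}

    def heartbeat(self, user_id: str, now: float) -> None:
        """Record a PING from the client."""
        self._last_seen[user_id] = now

    def sweep(self, now: float) -> List[Tuple[str, bool]]:
        """Return (user_id, online) only for users whose status changed since the last sweep."""
        changes: List[Tuple[str, bool]] = []
        for user_id, last_seen in self._last_seen.items():
            online = is_online(last_seen, now)
            if self._online.get(user_id) != online:
                self._online[user_id] = online
                changes.append((user_id, online))
        return changes
```

Publishing only transitions, rather than every heartbeat, keeps the subscription traffic proportional to how often users actually go online or offline.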
Typing indicators are presence's noisy cousin: throttle to 1 update per 3 s, never persist, send only to peers actively viewing the chat.
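A throttle of this kind is small enough to show in full. The sketch below assumes the gateway applies it per (sender, conversation) pair; `TypingThrottle` and `should_forward` are illustrative names, and nothing is persisted beyond the in-memory timestamp.

```python
from typing import Dict, Tuple


class TypingThrottle:
    """Lets through at most one typing event per (sender, conversation) per interval."""

    def __init__(self, min_interval_s: float = 3.0) -> None:
        self._min_interval_s = min_interval_s
        self._last_sent: Dict[Tuple[str, str], float] = {}

    def should_forward(self, sender_id: str, conversation_id: str, now: float) -> bool:
        """Return True if this typing event should be pushed to peers."""
        key = (sender_id, conversation_id)
        last = self._last_sent.get(key)
        if last is not None and now - last < self._min_interval_s:
            return False  # too soon after the previously forwarded event: drop it
        self._last_sent[key] = now
        return True
```

Dropped events are simply discarded; because typing state is ephemeral, there is nothing to retry or replay.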
Read Receipts
Three states (sent, delivered, read), each a separate event:
- Sent: server accepted the message.
- Delivered: peer's device received it (gateway emits ack to the message bus).
- Read: peer's app foregrounded the conversation past this message.
For 1:1 chat, store the highest-seen message_id per (user, conversation). For groups, store one such pointer per member per conversation; counting the members whose pointer is at or past a given message gives the "seen by N" indicator.
Privacy toggles: many users disable read receipts. The server still receives the event but withholds it from the sender; never make the toggle client-only or the privacy guarantee leaks.
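The pointer model and the server-side privacy toggle can be sketched together. The example assumes messages carry a per-conversation integer sequence number (used here in place of message_id so pointers can be compared); `ReadReceipts` and its methods are hypothetical names.

```python
from typing import Dict, Set


class ReadReceipts:
    """Per-member last-read pointers with a server-enforced privacy toggle."""

    def __init__(self) -> None:
        # conversation_id -> {user_id: highest sequence number read}
        self._last_read: Dict[str, Dict[str, int]] = {}
        self._receipts_disabled: Set[str] = set()

    def disable_receipts(self, user_id: str) -> None:
        """The user opted out: the server keeps the pointer but never reveals it."""
        self._receipts_disabled.add(user_id)

    def mark_read(self, conversation_id: str, user_id: str, seq: int) -> None:
        """Record a read event; the pointer only ever moves forward."""
        pointers = self._last_read.setdefault(conversation_id, {})
        pointers[user_id] = max(seq, pointers.get(user_id, 0))

    def seen_by(self, conversation_id: str, seq: int, sender_id: str) -> int:
        """Count other members whose visible pointer is at or past seq."""
        pointers = self._last_read.get(conversation_id, {})
        return sum(
            1
            for user_id, last in pointers.items()
            if user_id != sender_id
            and last >= seq
            and user_id not in self._receipts_disabled
        )
```

Filtering inside `seen_by` is the point of the design: the opted-out user's pointer still exists for their own unread counts, but it never leaves the server in a form the sender can observe.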
Group Chats
1:1 chat is fan-out of 1; groups are fan-out of N. Two storage approaches:
- Fan-out on write: insert N rows (one per recipient's inbox). Read is O(1); write is O(N). Works up to ~256-member groups.
- Fan-out on read: one canonical message in messages_by_conversation, each user pulls from the partition. Write is O(1); read is more involved (last-read pointer per user). Used for large groups and broadcast lists.
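The two approaches can be contrasted in a small in-memory sketch. The 256-member threshold is the figure quoted above; the data structures and function names (`inboxes`, `conversation_log`, `send`, `pull`) are assumptions made for illustration.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

FANOUT_ON_WRITE_MAX_MEMBERS = 256  # threshold from the text; tune per workload

inboxes: Dict[str, List[str]] = defaultdict(list)           # user_id -> message copies
conversation_log: Dict[str, List[str]] = defaultdict(list)  # conversation_id -> canonical messages
read_offset: Dict[Tuple[str, str], int] = defaultdict(int)  # (user_id, conversation_id) -> next unread index


def send(conversation_id: str, members: List[str], message: str) -> str:
    """Store a message using the strategy that fits the group size."""
    if len(members) <= FANOUT_ON_WRITE_MAX_MEMBERS:
        for user_id in members:            # O(N) writes, O(1) reads later
            inboxes[user_id].append(message)
        return "fanout_on_write"
    conversation_log[conversation_id].append(message)  # O(1) write
    return "fanout_on_read"


def pull(user_id: str, conversation_id: str) -> List[str]:
    """Fan-out on read: fetch everything after the user's last-read pointer."""
    log = conversation_log[conversation_id]
    start = read_offset[(user_id, conversation_id)]
    read_offset[(user_id, conversation_id)] = len(log)
    return log[start:]
```

The per-user pointer in `pull` is the "more involved" part of the read path: each member keeps only an offset, while the message itself is stored once.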
WhatsApp draws the line at 1024 members; Telegram allows groups of up to 200 K members and, separately, broadcast-only channels with no subscriber cap. The architecture diverges at scale: a broadcast channel with hundreds of thousands of subscribers is not a "group chat" but a publishing platform built on top.
End-to-End Encryption: Signal Protocol
The Signal protocol (also used by WhatsApp, Messenger, Google Messages) gives:
- Forward secrecy — compromise today does not expose past messages.
- Future secrecy — compromise today does not expose tomorrow's messages once a new ratchet step happens.
- Deniability — no signature binds Alice's identity to the ciphertext.
It combines an X3DH (Extended Triple Diffie-Hellman) handshake with the Double Ratchet (a DH ratchet on each message round trip plus a symmetric KDF chain for consecutive messages from the same sender). Group chat uses a sender-keys scheme: each sender generates a sender-key chain, distributes it once to every member over the pairwise encrypted sessions, and ratchets it forward per message. The server stores prekey bundles so a session can be initiated while the recipient is offline.
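The symmetric KDF chain is the simplest part to illustrate. The sketch below follows the HMAC-SHA256 construction with the single-byte constants 0x01 and 0x02 that the Double Ratchet specification suggests; it covers only the symmetric chain (no X3DH, no DH ratchet, no encryption), the starting key is a placeholder, and it is an illustration rather than production cryptography.

```python
import hashlib
import hmac
from typing import List, Tuple


def kdf_chain_step(chain_key: bytes) -> Tuple[bytes, bytes]:
    """Advance the symmetric chain by one message: return (next_chain_key, message_key)."""
    message_key = hmac.new(chain_key, b"\x01", hashlib.sha256).digest()
    next_chain_key = hmac.new(chain_key, b"\x02", hashlib.sha256).digest()
    return next_chain_key, message_key


def derive_message_keys(initial_chain_key: bytes, count: int) -> List[bytes]:
    """Derive the message keys for `count` consecutive messages in one chain."""
    keys: List[bytes] = []
    chain_key = initial_chain_key
    for _ in range(count):
        chain_key, message_key = kdf_chain_step(chain_key)
        keys.append(message_key)
    return keys


# Placeholder chain key; in the real protocol it comes out of the DH ratchet.
example_chain_key = bytes(32)
example_keys = derive_message_keys(example_chain_key, 3)
```

Each step is one-way: holding the current chain key does not let an attacker recompute earlier chain keys or message keys, which is where the forward-secrecy property of the chain comes from.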
Implications for the system: the server cannot read messages, so it cannot offer server-side search, keyword filtering, or classic content moderation. Replacement strategies: client-side reporting, metadata-based abuse signals, and content-hash matching performed on the client before encryption (the approach proposed for CSAM detection).
WhatsApp vs Telegram Architectures
- WhatsApp: Erlang/OTP gateway, XMPP-derived protocol, Signal-protocol E2EE by default for chats, Mnesia + Cassandra. Famous for a tiny ops team running 1 B+ users.
- Telegram: custom MTProto protocol, server-side encryption by default (only "Secret Chats" are E2EE), Erlang/Java gateway, custom storage. Trades E2EE-by-default for cloud sync (you log in and see history on a new device without copying keys).
Cross-device sync is the design fork: WhatsApp's E2EE-by-default makes multi-device hard (every linked device needs its own keys and sessions, and history has to be transferred device to device); Telegram's server-encrypted-by-default makes multi-device trivial. Pick your tradeoff explicitly.
Failure Modes
- Gateway flap — node restart bounces 500 K connections; clients reconnect simultaneously. Stagger reconnect via exponential backoff with jitter on the client.
- Mobile battery drain — aggressive heartbeats kill battery. Cooperate with OS doze modes; FCM/APNs as the wakeup channel for background delivery.
- Hot conversation — viral group with 1024 members all replying. Cassandra partition gets hot; rate-limit per-conversation writes.
- Lost ack — client got the message but the server thinks not. Idempotent message_ids and per-conversation seq let the client report "I have up to seq=N"; the server replays gaps.
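The client-side mitigation for a gateway flap can be sketched as exponential backoff with full jitter. The base delay, cap, and function name below are assumptions chosen for the example, not values from any particular client.

```python
import random
from typing import Optional


def reconnect_delay(attempt: int, base_s: float = 1.0, cap_s: float = 60.0,
                    rng: Optional[random.Random] = None) -> float:
    """Full-jitter backoff: a uniform delay in [0, min(cap_s, base_s * 2**attempt)]."""
    if rng is None:
        rng = random.Random()
    exponent = min(attempt, 32)  # bound the exponent so huge attempt counts stay cheap
    ceiling = min(cap_s, base_s * (2 ** exponent))
    return rng.uniform(0.0, ceiling)


# Usage: delays for the first few reconnect attempts of one client.
example_rng = random.Random(7)
example_delays = [reconnect_delay(attempt, rng=example_rng) for attempt in range(6)]
```

Drawing the whole delay at random, instead of adding a small random offset to a fixed schedule, spreads the 500 K reconnects across the full window so the replacement gateway does not see synchronized waves.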
FAQ
Why MQTT instead of plain WebSocket?
MQTT has tighter framing for tiny packets, persistent sessions across reconnect, and QoS levels. On flaky mobile networks, MQTT's session and QoS machinery redelivers messages that a bare WebSocket would drop unless the application adds its own acks. WebSocket is fine for desktop/web; MQTT shines on mobile.
How do you do search in an E2EE chat?
Client-side. The client maintains an index over locally decrypted messages. Server-side search is impossible without breaking E2EE.
What about message ordering?
Per-conversation Lamport-style sequence stamped at the server. Across conversations, no global order — users do not notice.
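On the client, a per-conversation sequence makes ordering, deduplication, and gap detection straightforward. The sketch below assumes sequence numbers start at 1 and increase by one per message; `ConversationReceiver` and its methods are illustrative names.

```python
from typing import Dict, List


class ConversationReceiver:
    """Client-side view of one conversation's server-assigned sequence numbers."""

    def __init__(self) -> None:
        self.contiguous_up_to = 0            # the client can report "I have up to seq=N"
        self._pending: Dict[int, str] = {}   # out-of-order messages waiting for a gap to fill

    def receive(self, seq: int, body: str) -> List[str]:
        """Accept a message; return the bodies that became deliverable, in order."""
        if seq <= self.contiguous_up_to or seq in self._pending:
            return []  # duplicate delivery is a no-op
        self._pending[seq] = body
        delivered: List[str] = []
        while self.contiguous_up_to + 1 in self._pending:
            self.contiguous_up_to += 1
            delivered.append(self._pending.pop(self.contiguous_up_to))
        return delivered

    def missing(self) -> List[int]:
        """Sequence numbers the server should replay to close the gaps."""
        if not self._pending:
            return []
        return [s for s in range(self.contiguous_up_to + 1, max(self._pending))
                if s not in self._pending]
```

After a reconnect the client sends `contiguous_up_to` to the server, which replays anything newer; the buffered out-of-order messages are released as soon as the gap closes.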
How do voice and video calls fit?
Different stack: WebRTC (or a SIP-based stack) for media, with signaling carried over the chat connection. Media is relayed through TURN servers when a direct peer-to-peer path fails; the chat layer only carries call setup and invites.