Architecture

Diagram (components, in request-flow order):

  • Client A, Client B
  • Edge gateway: WebSocket / MQTT, millions of connections
  • Routing: user_id → gateway via Redis presence map
  • Message bus: Kafka, partitioned by conversation
  • Presence service: heartbeats, lastSeen, typing indicators
  • Cassandra: wide-row by (user_id, conv_id)
  • Push notifications: APNs/FCM for offline delivery

Capacity Estimation

Metric | Value | Notes
Daily active users | ~2 B | WhatsApp scale
Messages/day | ~100 B | ~50 per DAU
Peak send/s | ~5 M | ~4× the daily mean of ~1.2 M/s
Concurrent connections | ~1 B | online users
Connections per gateway | ~500 K | tuned Linux box
Gateway count | ~2 K | 1 B / 500 K
Storage growth | ~20 TB/day | 200 bytes/msg compressed, before replication
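The estimates can be rechecked with a few lines of arithmetic. This is a minimal sketch that uses only the table's round inputs; the variable names are illustrative. The storage figure is raw payload only, so replication and per-row overhead would push the real number higher.

```python
# Back-of-envelope capacity arithmetic from the table's round inputs.
DAU = 2_000_000_000
MSGS_PER_DAU = 50
CONCURRENT_CONNS = 1_000_000_000
CONNS_PER_GATEWAY = 500_000
BYTES_PER_MSG = 200            # compressed payload size
SECONDS_PER_DAY = 86_400

messages_per_day = DAU * MSGS_PER_DAU
mean_send_per_s = messages_per_day / SECONDS_PER_DAY
gateway_count = CONCURRENT_CONNS // CONNS_PER_GATEWAY
raw_storage_tb_per_day = messages_per_day * BYTES_PER_MSG / 1e12

print(f"messages/day:        {messages_per_day:,}")
print(f"mean sends/s:        {mean_send_per_s:,.0f}")
print(f"gateways needed:     {gateway_count:,}")
print(f"raw storage TB/day:  {raw_storage_tb_per_day:.1f}")
```

Peak traffic is then the mean multiplied by whatever peak-to-mean ratio the traffic pattern shows, which is a measured input rather than something this arithmetic can derive.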

WebSocket Fan-out

Long-lived bidirectional connections are the foundation. WebSocket (or MQTT for mobile-battery efficiency) over TLS, kept alive with ~30 s pings. The gateway tier:

  • Accepts the connection, authenticates (JWT or signed cookie), registers the mapping user_id → gateway_id in Redis with a TTL refreshed by heartbeat.
  • Subscribes to the stream of incoming messages for that user (a queue keyed by user, or a per-gateway topic; one Kafka partition per user does not scale to billions of users).
  • Delivers over the WebSocket, awaits the client's ack, then marks the message delivered in the message store.
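The routing entry described in the first step can be sketched as follows. This is an in-memory stand-in for the Redis map, not a Redis client; the class name `GatewayRegistry`, the 60 s default TTL, and the injectable clock are assumptions made for illustration.

```python
import time
from typing import Callable, Dict, Optional, Tuple


class GatewayRegistry:
    """In-memory stand-in for the Redis map user_id -> gateway_id with a TTL."""

    def __init__(self, ttl_s: float = 60.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._ttl_s = ttl_s      # assumed value; tune against the heartbeat period
        self._clock = clock      # injectable so expiry can be tested without sleeping
        # user_id -> (gateway_id, expires_at)
        self._entries: Dict[str, Tuple[str, float]] = {}

    def register(self, user_id: str, gateway_id: str) -> None:
        """Record which gateway holds the user's socket (after authentication)."""
        self._entries[user_id] = (gateway_id, self._clock() + self._ttl_s)

    def heartbeat(self, user_id: str) -> bool:
        """Refresh the TTL; returns False if the entry has already expired."""
        gateway_id = self.lookup(user_id)
        if gateway_id is None:
            return False
        self.register(user_id, gateway_id)
        return True

    def lookup(self, user_id: str) -> Optional[str]:
        """Return the gateway to route to, or None if the user has no live entry."""
        entry = self._entries.get(user_id)
        if entry is None:
            return None
        gateway_id, expires_at = entry
        if self._clock() >= expires_at:
            del self._entries[user_id]
            return None
        return gateway_id
```

A sender's gateway would call `lookup` to find the recipient's gateway; a `None` result means the message falls through to the offline path (store, then push notification).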

Connection density: a tuned Linux box can hold 500 K idle WebSockets (mostly memory: at ~10 KB/conn that is about 5 GB). Use SO_REUSEPORT for multi-worker accept, raise net.core.somaxconn, fs.file-max, and the per-process open-file limit, and prefer a compact binary wire protocol over JSON to save bytes per heartbeat.

Message Storage: Cassandra Wide-row

The hot read pattern is "give me the last 50 messages in this conversation". The hot write pattern is "append a message to a conversation". Both map perfectly to a Cassandra wide row:

Table messages_by_conversation: partition key = conversation_id; clustering key = (created_at DESC, message_id). A SELECT with LIMIT 50 reads exactly one partition, and an append is a single write to that partition.
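To make the access pattern concrete, here is a small in-memory model of that table. It is a sketch of the data layout only, not Cassandra client code; the class and method names are hypothetical. Each partition is kept sorted newest-first, so "last 50 messages" is a prefix read of one partition.

```python
import bisect
from collections import defaultdict
from typing import DefaultDict, List, Tuple

# Stored row: (-created_at_ms, message_id, body). Negating the timestamp makes
# ascending tuple order equal to "created_at DESC, message_id ASC".
Row = Tuple[int, str, str]


class MessagesByConversation:
    def __init__(self) -> None:
        self._partitions: DefaultDict[str, List[Row]] = defaultdict(list)

    def append(self, conversation_id: str, created_at_ms: int,
               message_id: str, body: str) -> None:
        """One write, touching only the conversation's own partition."""
        row = (-created_at_ms, message_id, body)
        bisect.insort(self._partitions[conversation_id], row)

    def latest(self, conversation_id: str,
               limit: int = 50) -> List[Tuple[int, str, str]]:
        """Newest-first slice of a single partition, like SELECT ... LIMIT n."""
        rows = self._partitions.get(conversation_id, [])[:limit]
        return [(-neg_ts, message_id, body) for neg_ts, message_id, body in rows]
```

Including message_id in the clustering key keeps two messages with the same timestamp from overwriting each other.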

For per-user inbox views (read receipts, last-read pointer), maintain a separate user_conversations table partitioned by user_id, sorted by last_message_at DESC, with denormalized last-message snippet. This is the table that backs the conversation list screen.

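See Cassandra design for the wide-row trade-offs, especially partition size: the usual guidance is to keep a partition to roughly 100 MB, so one long-lived conversation cannot sit in a single partition forever and needs bucketing (for example, adding a time bucket to the partition key).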

Online / Offline Presence

Presence is "is user X online right now?" served at hundreds of thousands of QPS. Implement:

  • Heartbeat: client sends PING every 30 s. Gateway updates presence:user_id → lastSeen in Redis with TTL 60 s.
  • Read: GET presence:user_id returns lastSeen; client compares to wall clock.
  • Subscriptions: clients in a chat want notifications when the peer comes online. Use Redis pub/sub or a dedicated stream so the gateway can push status changes without polling.
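The heartbeat, read, and subscription steps above can be sketched together in one small class. This is an in-process illustration rather than Redis code: `PresenceService`, the callback signature, and the explicit `sweep` call (standing in for key expiry) are all assumptions.

```python
import time
from collections import defaultdict
from typing import Callable, DefaultDict, Dict, List, Set

StatusCallback = Callable[[str, str], None]   # (user_id, "online" | "offline")


class PresenceService:
    def __init__(self, ttl_s: float = 60.0,
                 clock: Callable[[], float] = time.time) -> None:
        self._ttl_s = ttl_s
        self._clock = clock
        self._last_seen: Dict[str, float] = {}
        self._online: Set[str] = set()   # users whose "online" event was published
        self._subscribers: DefaultDict[str, List[StatusCallback]] = defaultdict(list)

    def subscribe(self, user_id: str, callback: StatusCallback) -> None:
        """A peer viewing the chat asks to be told about status changes."""
        self._subscribers[user_id].append(callback)

    def heartbeat(self, user_id: str) -> None:
        """Record lastSeen; publish "online" only on the offline -> online edge."""
        self._last_seen[user_id] = self._clock()
        if user_id not in self._online:
            self._online.add(user_id)
            self._notify(user_id, "online")

    def is_online(self, user_id: str) -> bool:
        seen = self._last_seen.get(user_id)
        return seen is not None and self._clock() - seen < self._ttl_s

    def sweep(self) -> None:
        """Publish "offline" for users whose heartbeat is older than the TTL."""
        for user_id in list(self._online):
            if not self.is_online(user_id):
                self._online.discard(user_id)
                self._notify(user_id, "offline")

    def _notify(self, user_id: str, status: str) -> None:
        for callback in self._subscribers.get(user_id, []):
            callback(user_id, status)
```

Publishing only on transitions keeps subscription traffic proportional to status changes rather than to heartbeats.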

Typing indicators are presence's noisy cousin: throttle to 1 update per 3 s, never persist, send only to peers actively viewing the chat.
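The "1 update per 3 s" rule is a plain per-key throttle. A minimal sketch, assuming the throttle is keyed by (user, conversation) and that the caller forwards the event only when `should_send` returns True; the class name is hypothetical.

```python
import time
from typing import Callable, Dict, Tuple


class TypingThrottle:
    def __init__(self, min_interval_s: float = 3.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._min_interval_s = min_interval_s
        self._clock = clock
        # (user_id, conversation_id) -> time the last indicator was forwarded
        self._last_sent: Dict[Tuple[str, str], float] = {}

    def should_send(self, user_id: str, conversation_id: str) -> bool:
        """True if a typing event may be forwarded now; state is in memory only."""
        now = self._clock()
        key = (user_id, conversation_id)
        last = self._last_sent.get(key)
        if last is not None and now - last < self._min_interval_s:
            return False    # drop: an indicator went out too recently
        self._last_sent[key] = now
        return True
```

Nothing here is persisted, which matches the rule that typing state is never written to the message store.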

Read Receipts

Three states (sent, delivered, read), each a separate event:

  • Sent: server accepted the message.
  • Delivered: peer's device received it (gateway emits ack to the message bus).
  • Read: peer's app foregrounded the conversation past this message.

For 1:1 chat, store the highest-seen message_id per (user, conversation). For groups, store one such pointer per member per conversation; counting the members whose pointer is at or past a given message yields the "seen by N" indicator.
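A sketch of the pointer scheme follows. It assumes message ids are integers that increase within a conversation, so that "highest seen" is well defined; the source does not specify the id format, and the class and method names are hypothetical.

```python
from collections import defaultdict
from typing import DefaultDict, Dict


class ReadPointers:
    def __init__(self) -> None:
        # conversation_id -> user_id -> highest message id the user has read
        self._pointers: DefaultDict[str, Dict[str, int]] = defaultdict(dict)

    def mark_read(self, conversation_id: str, user_id: str, message_id: int) -> None:
        """Advance the pointer; late or duplicate events never move it backwards."""
        current = self._pointers[conversation_id].get(user_id, 0)
        self._pointers[conversation_id][user_id] = max(current, message_id)

    def seen_by(self, conversation_id: str, message_id: int, sender_id: str) -> int:
        """Number of members other than the sender who have read message_id."""
        pointers = self._pointers.get(conversation_id, {})
        return sum(1 for user_id, pointer in pointers.items()
                   if user_id != sender_id and pointer >= message_id)
```

Storing one pointer per member, rather than one row per (member, message), keeps the write cost of a read event constant regardless of how many messages it covers.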

Privacy toggles: many users disable read receipts. The server still receives the event but withholds it from the sender; never make the toggle client-only or the privacy guarantee leaks.

Group Chats

1:1 chat is fan-out of 1; groups are fan-out of N. Two storage approaches:

  • Fan-out on write — insert N rows (one per recipient's inbox). Read is O(1); write is O(N). Works up to ~256-member groups.
  • Fan-out on read — one canonical message in messages_by_conversation, each user pulls from the partition. Write is O(1); read is more involved (last-read pointer per user). Used for large groups and broadcast lists.
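The cost difference between the two approaches shows up even in a toy model. The sketch below uses in-memory structures with hypothetical class names; `send` returns the number of rows written so the O(N) versus O(1) write cost is visible.

```python
from collections import defaultdict
from typing import DefaultDict, Dict, List, Tuple


class FanOutOnWrite:
    """One copy of each message per recipient inbox."""

    def __init__(self) -> None:
        self.inboxes: DefaultDict[str, List[str]] = defaultdict(list)

    def send(self, members: List[str], message: str) -> int:
        for user_id in members:
            self.inboxes[user_id].append(message)
        return len(members)            # rows written: O(N)

    def read(self, user_id: str) -> List[str]:
        return list(self.inboxes.get(user_id, []))


class FanOutOnRead:
    """One canonical log per conversation plus a last-read pointer per user."""

    def __init__(self) -> None:
        self.log: DefaultDict[str, List[str]] = defaultdict(list)
        self.last_read: Dict[Tuple[str, str], int] = {}

    def send(self, conversation_id: str, message: str) -> int:
        self.log[conversation_id].append(message)
        return 1                       # rows written: O(1)

    def read_new(self, user_id: str, conversation_id: str) -> List[str]:
        """Return messages past the user's pointer, then advance the pointer."""
        messages = self.log.get(conversation_id, [])
        start = self.last_read.get((user_id, conversation_id), 0)
        self.last_read[(user_id, conversation_id)] = len(messages)
        return messages[start:]
```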

WhatsApp draws the line at 1024 members; Telegram allows groups of up to 200 K members and offers broadcast-only channels with no subscriber cap. The architecture diverges at scale: a channel or a 200 K-member group is less a "group chat" than a publishing platform built on top.

End-to-End Encryption: Signal Protocol

The Signal protocol (also used by WhatsApp, Messenger, Google Messages) gives:

  • Forward secrecy — compromise today does not expose past messages.
  • Future secrecy — compromise today does not expose tomorrow's messages once a new ratchet step happens.
  • Deniability — no signature binds Alice's identity to the ciphertext.

It combines an X3DH (Extended Triple Diffie-Hellman) handshake with the Double Ratchet: a DH ratchet step whenever a new ratchet key arrives from the peer, plus a symmetric KDF chain that derives a fresh key for each message in between. Group chat uses a sender-keys scheme: each sender generates a sender-key chain, distributes it once to every member over the pairwise encrypted sessions, and then ratchets it per message. The server stores prekey bundles so a session can be initiated while the recipient is offline.
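The symmetric KDF chain is the easiest part to illustrate. The sketch below covers only that chain: it uses HMAC-SHA256 with the single-byte constants 0x01 and 0x02 suggested in the Double Ratchet specification, and it omits X3DH, the DH ratchet, and message encryption entirely. The function names are hypothetical, and this is an illustration rather than a usable implementation.

```python
import hashlib
import hmac
from typing import List, Tuple


def kdf_chain_step(chain_key: bytes) -> Tuple[bytes, bytes]:
    """Advance the chain one step: returns (next_chain_key, message_key)."""
    message_key = hmac.new(chain_key, b"\x01", hashlib.sha256).digest()
    next_chain_key = hmac.new(chain_key, b"\x02", hashlib.sha256).digest()
    return next_chain_key, message_key


def derive_message_keys(chain_key: bytes, count: int) -> List[bytes]:
    """Derive `count` consecutive message keys starting from chain_key."""
    keys: List[bytes] = []
    for _ in range(count):
        chain_key, message_key = kdf_chain_step(chain_key)
        keys.append(message_key)
    return keys
```

Because HMAC is one-way, holding the current chain key does not reveal earlier message keys, provided the old chain keys are deleted after each step. That deletion is what provides forward secrecy within a chain.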

Implications for the system: the server cannot read messages, so it cannot search them, filter them by keyword, or apply classic content moderation. Replacement strategies: client-side reporting, metadata-based abuse signals, and hash matching performed on the client before encryption (the approach proposed for CSAM detection).

WhatsApp vs Telegram Architectures

  • WhatsApp: Erlang/OTP servers, an XMPP-derived protocol, Signal-protocol E2EE by default for chats, Mnesia + Cassandra. Famous for a tiny ops team running 1B+ users.
  • Telegram: custom MTProto protocol, server-side encryption by default (only "Secret Chats" are E2EE), Erlang/Java gateway, custom storage. Trades E2EE-by-default for cloud sync (you log in and see history on a new device without copying keys).

Cross-device sync is the design fork: WhatsApp's E2EE-by-default makes multi-device hard (every linked device needs its own keys and sessions, and history has to be transferred between devices in encrypted form); Telegram's server-readable cloud chats make multi-device trivial. Pick your tradeoff explicitly.

Failure Modes

  • Gateway flap — node restart bounces 500 K connections; clients reconnect simultaneously. Stagger reconnect via exponential backoff with jitter on the client.
  • Mobile battery drain — aggressive heartbeats kill battery. Cooperate with OS doze modes; FCM/APNs as the wakeup channel for background delivery.
  • Hot conversation — viral group with 1024 members all replying. Cassandra partition gets hot; rate-limit per-conversation writes.
  • Lost ack — client got the message but the server thinks not. Idempotent message_ids and per-conversation seq let the client report "I have up to seq=N"; the server replays gaps.
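For the gateway-flap case, the client-side mitigation is exponential backoff with jitter. A minimal sketch of the "full jitter" variant, where the delay is drawn uniformly between zero and an exponentially growing ceiling; the 1 s base and 60 s cap are assumed values, not figures from the text.

```python
import random


def reconnect_delay(attempt: int, rng: random.Random,
                    base_s: float = 1.0, cap_s: float = 60.0) -> float:
    """Seconds to wait before reconnect attempt number `attempt` (0-based)."""
    exponent = min(attempt, 16)                 # bound the power of two
    ceiling = min(cap_s, base_s * (2 ** exponent))
    return rng.uniform(0.0, ceiling)            # full jitter: anywhere in [0, ceiling]


if __name__ == "__main__":
    rng = random.Random()
    for attempt in range(6):
        print(f"attempt {attempt}: wait {reconnect_delay(attempt, rng):.2f} s")
```

Drawing from the whole interval, rather than adding a small random offset to a fixed delay, spreads the 500 K reconnects across the window instead of clustering them at its end.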

FAQ

Why MQTT over WebSocket?

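MQTT has tighter framing for tiny packets, persistent sessions across reconnects, and QoS levels. On flaky mobile networks, a QoS 1 or 2 message that was in flight when the connection dropped is redelivered after reconnect; with raw WebSocket the application has to build that itself. WebSocket is fine for desktop/web; MQTT shines on mobile.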

How do you do search in an E2EE chat?

Client-side. The client maintains an index over locally decrypted messages. Server-side search is impossible without breaking E2EE.

What about message ordering?

A per-conversation sequence number stamped by the server (Lamport-style: it orders messages without relying on client clocks). Across conversations there is no global order, and users do not notice.
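Server-side stamping combines naturally with idempotent message ids and gap replay. The sketch below models a single conversation in memory; `ConversationLog` and its methods are hypothetical names, and a real system would keep the counter and the log in durable storage.

```python
from typing import Dict, List, Tuple


class ConversationLog:
    def __init__(self) -> None:
        self._bodies: List[str] = []           # seq n is stored at index n - 1
        self._seq_by_id: Dict[str, int] = {}   # client message_id -> assigned seq

    def append(self, message_id: str, body: str) -> int:
        """Stamp the next seq; a retried message_id returns its original seq."""
        existing = self._seq_by_id.get(message_id)
        if existing is not None:
            return existing
        self._bodies.append(body)
        seq = len(self._bodies)
        self._seq_by_id[message_id] = seq
        return seq

    def replay_after(self, have_up_to_seq: int) -> List[Tuple[int, str]]:
        """Messages the client is missing after reporting 'I have up to seq=N'."""
        missing = self._bodies[have_up_to_seq:]
        return list(enumerate(missing, start=have_up_to_seq + 1))
```

A client that reconnects reports its highest contiguous seq, and the server answers with `replay_after`. Duplicate sends caused by a lost ack collapse onto the same seq.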

How do voice and video calls fit?

Different stack: SIP/WebRTC for media, with call signaling carried over the chat connection. Media is relayed through TURN servers when a direct peer-to-peer path fails; the chat layer only carries call setup and invites.