Redis Replication
Redis replication is asynchronous, master-to-replica, with optional partial resync and
diskless full-sync. A replica that connects fresh receives an RDB snapshot streamed by a
forked child (or directly through a pipe), then a continuous stream of every write the
master applies. The protocol — PSYNC — supports resuming from a known offset
after brief disconnects via the master's replication backlog, avoiding the
cost of re-shipping the entire dataset on every blip. This page walks through the handshake,
the steady-state stream, the partial-resync algorithm, and the operational knobs that
determine durability versus throughput.
The Replication Backlog
A circular buffer of recent stream bytes, the heart of partial resync.
The master maintains a fixed-size ring buffer (repl-backlog-size, default 1 MB)
containing the most recent bytes of the replication stream. Every write that gets propagated
to replicas is also appended to the backlog, with a counter tracking the absolute offset.
When a replica reconnects after a disconnect, it sends:
PSYNC <master-runID> <last-applied-offset>
The master checks: is offset N still in my backlog? If yes, it replies
+CONTINUE and streams from offset N onward. If not (the requested offset has
rolled out of the ring buffer), it replies +FULLRESYNC and starts a full RDB sync. The
runID must also match: a master that's been restarted has a new runID, so even if offsets
coincidentally line up, the protocol falls back to a full sync.
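The master-side decision can be sketched as follows. This is a simplified model: `handle_psync`, the dict fields, and the bytearray backlog are illustrative names, not Redis source identifiers.

```python
def handle_psync(master, replica_run_id, replica_offset):
    """Decide between partial resync (+CONTINUE) and full sync (+FULLRESYNC)."""
    # Oldest byte still buffered: total offset minus what the ring holds.
    backlog_start = master["offset"] - len(master["backlog"])
    same_history = replica_run_id == master["run_id"]
    in_range = backlog_start <= replica_offset <= master["offset"]
    if same_history and in_range:
        # Partial resync: ship only the missing tail of the stream.
        missing = master["backlog"][replica_offset - backlog_start:]
        return ("+CONTINUE", bytes(missing))
    # Restarted master (new runID) or offset rolled out of the ring:
    # fall back to a full RDB sync.
    return ("+FULLRESYNC", None)
```

For example, a master at offset 100 with a 40-byte backlog can serve a partial resync for any offset from 60 up, and nothing below that.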
Sizing the backlog is a tradeoff. Too small and brief disconnects trigger full RDB syncs —
expensive on big datasets. Too large and you waste RAM. Rule of thumb: pick a size such
that the backlog covers your expected disconnect duration at peak write rate. If you write
10 MB/s and want to tolerate 60s disconnects, set repl-backlog-size 600mb.
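The rule of thumb is a single multiplication; as a sketch (function name illustrative):

```python
def backlog_size_bytes(peak_write_mb_per_s, tolerated_disconnect_s):
    """Backlog must hold the stream produced during the longest
    disconnect that partial resync should survive."""
    return peak_write_mb_per_s * tolerated_disconnect_s * 1024 * 1024

# 10 MB/s of writes, 60 s tolerance -> the 600 MB figure above.
print(backlog_size_bytes(10, 60) // (1024 * 1024))  # -> 600
```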
Full Sync via RDB
When partial fails, the master ships an RDB snapshot.
Full sync flow:
1. Master receives a PSYNC that requires a full resync.
2. Master forks a child to generate the RDB.
3. While the child runs, the master buffers writes in a per-replica output buffer.
4. The child finishes the RDB; the master streams it to the replica over the socket.
5. The master streams the buffered writes.
6. The master continues streaming new writes in real time.
Step 3 is critical. If the per-replica output buffer overflows
(client-output-buffer-limit replica, default 256mb hard / 64mb soft for 60s),
the master kicks the replica off — and it has to start the full sync from scratch. On a busy
master with a slow replica, this can loop indefinitely. Tune the limit higher for slow links.
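The failure mode in step 3 can be sketched as follows, under a deliberately simplified model: every write accepted while the RDB child runs also lands in the replica's output buffer, and crossing the hard limit disconnects the replica. Names and the single-limit check are illustrative (the real check also tracks a soft limit sustained over a time window).

```python
HARD_LIMIT = 256 * 1024 * 1024  # default hard limit for replica clients

def buffer_during_fork(write_sizes, hard_limit=HARD_LIMIT):
    """Accumulate writes accepted during RDB generation; overflowing the
    hard limit aborts the sync and the replica starts over."""
    buffered = 0
    for size in write_sizes:
        buffered += size
        if buffered > hard_limit:
            return "disconnect: full sync must restart"
    return "ok: flush buffer after RDB transfer"
```

Three 100 MB bursts during one fork are already enough to blow the default 256 MB hard limit, which is why a busy master plus a slow replica can loop forever at defaults.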
Diskless Replication
repl-diskless-sync yes streams RDB directly from the child to the socket.
Classic full sync writes the RDB to disk first and then sends it, so the master pays for disk
I/O it otherwise wouldn't. Diskless replication, the default since Redis 7.0, uses the
forked child's pipe to stream RDB bytes straight to the replica's socket. No file is written.
Tunable: repl-diskless-sync-delay (default 5s) waits for additional replicas
to join before kicking off the fork, so multiple replicas share one stream. Useful when
replicas tend to reconnect in waves (post-failover, post-deploy).
The replica side has its own switch: repl-diskless-load. disabled (the default)
always spills the incoming RDB to a temp file before loading; on-empty-db parses it
in memory, but only when the replica's keyspace is empty; swapdb parses it in memory
and atomically swaps databases once loading succeeds.
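Put together, the diskless knobs above look like this in redis.conf (values illustrative, not a recommendation):

```
# Master side: stream the RDB from the forked child straight to the socket,
# waiting 5 s so replicas reconnecting in a wave can share one fork.
repl-diskless-sync yes
repl-diskless-sync-delay 5

# Replica side: parse the incoming RDB in memory only when the local
# keyspace is empty; otherwise spill to a temp file first.
repl-diskless-load on-empty-db
```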
Streaming Steady State
After handshake, every write is forwarded.
Once a replica is in online state, the master writes every command (in RESP
format) to the replica's socket buffer. Writes are flushed at the end of each command-batch
processing tick. The replica reads, parses, and applies each command exactly as if a normal
client had sent it — same code path as command execution.
Replicas send REPLCONF ACK <offset> back to the master every second so the
master knows how far each replica has caught up. This drives min-slaves-to-write
and the WAIT command's blocking behavior.
WAIT N timeout-ms blocks the calling client until at least N replicas have
ack'd the write, or the timeout fires. It's a per-write durability primitive layered on
top of async replication: useful for high-importance writes (charge a credit card, then
WAIT 2 1000) without paying the latency cost on every operation.
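WAIT's satisfaction rule can be modeled in a few lines, assuming we track the last offset each replica has ACKed via REPLCONF ACK. `acked_offsets` and the function name are illustrative; this sketches the semantics, not the server code, and the timeout path is not modeled.

```python
def wait_satisfied(acked_offsets, write_offset, num_replicas):
    """WAIT unblocks once at least `num_replicas` replicas have ACKed an
    offset at or past the write (or the timeout fires, not shown)."""
    caught_up = sum(1 for off in acked_offsets if off >= write_offset)
    return caught_up >= num_replicas
```

In a real client this is one extra call after the critical write; in redis-py, for instance, `r.wait(2, 1000)`.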
min-slaves-to-write Durability
A soft synchronous guarantee for partition scenarios.
The configs min-slaves-to-write N and min-slaves-max-lag M (aliased
min-replicas-to-write / min-replicas-max-lag since Redis 5.0) tell the
master: "refuse writes unless at least N replicas have a lag ≤ M seconds." If the master is
partitioned from its replicas, after M seconds it starts replying with errors to
writes, preventing the accepted-then-lost writes that the partition would otherwise produce.
This is not transactional. There's still a window of writes that get accepted before the lag detection kicks in. But for many caching workloads it's good enough — a few seconds of lost writes on partition is acceptable, an indefinite bleed is not.
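The gate itself is a freshness count over ACK timestamps. A sketch, assuming `last_ack_times` maps each replica to the time of its last REPLCONF ACK (all names illustrative, not Redis internals):

```python
import time

def writes_allowed(last_ack_times, min_replicas, max_lag_s, now=None):
    """Allow writes only if at least `min_replicas` replicas have ACKed
    within the last `max_lag_s` seconds."""
    now = time.time() if now is None else now
    fresh = sum(1 for t in last_ack_times.values() if now - t <= max_lag_s)
    return fresh >= min_replicas
```

Note the window this leaves open: writes accepted in the final `max_lag_s` seconds before the gate trips can still be lost, which is exactly the "not transactional" caveat above.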
Replica Chains
A replica of a replica: chained replication, first-class since Redis 4.0.
A replica can itself act as a master to other replicas. The intermediate node receives the master's stream, applies it locally, and forwards it to its own replicas:
master ─→ mid-replica ─→ leaf-replica ×N
         (fan-out = 1 at the master, N at the mid)

vs.

master ─→ leaf-replica ×N
         (fan-out = N at the master)

Useful for many-replica deployments where the master's CPU or NIC becomes the bottleneck of replication fan-out. The cost: leaf replicas see double the lag (two replication hops), and a problem at the mid-replica disconnects all leaves. Don't chain unless you've measured that the master is genuinely fan-out-limited.
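The bandwidth argument is back-of-envelope arithmetic. A sketch comparing master egress for direct fan-out versus a single mid-replica chain, at write rate W MB/s with N leaves (numbers illustrative):

```python
def master_egress_mb_s(write_rate_mb_s, leaves, chained):
    """Direct: the master sends the stream once per leaf.
    Chained: the master sends it once, to the mid-replica."""
    return write_rate_mb_s * (1 if chained else leaves)

print(master_egress_mb_s(50, 8, chained=False))  # direct: 400 MB/s off the master
print(master_egress_mb_s(50, 8, chained=True))   # chained: 50 MB/s
```

At 50 MB/s of writes and 8 leaves, direct fan-out alone is near the limit of a 5 Gb/s NIC; that is the kind of measurement that justifies a chain.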
FAQ
What's the difference between PSYNC and SYNC?
SYNC was the original Redis 1.x protocol: the replica connects, the master generates a full RDB and streams it, then forwards every subsequent command. PSYNC (partial sync), added in 2.8, lets a replica that briefly disconnected resume from the master's replication backlog if the offset is still in range, avoiding the full RDB. PSYNC2 in Redis 4.0 made offsets survive failover. Today PSYNC is universal: a replica with no known runID/offset sends PSYNC ? -1, and plain SYNC survives only as a fallback for masters too old to support PSYNC.
What is the replication backlog?
A circular in-memory buffer on the master holding the most recent N bytes of replication stream. Default repl-backlog-size is 1 MB; raise it for high write rate or unstable networks. When a replica reconnects after a brief disconnect, it sends PSYNC <runID> <offset>; if the requested offset is still in the backlog, the master streams just the missing tail. If it's been overwritten, it falls back to full sync via RDB.
What is min-slaves-to-write actually doing?
When set to N, the master refuses writes if fewer than N replicas have ack'd within min-slaves-max-lag seconds. This is a soft synchronous-replication guarantee: if the master partitions away from its replicas, it stops accepting writes after the lag window expires, preventing accepted-then-lost writes. It's not transactional — there's still a window of accepted writes before lag is detected. For ledger-grade durability, look at Sentinel quorum or Redis Cluster.
When does diskless replication win?
When the master's local disk is slow but the network is fast. Cloud instances with ephemeral SSD often have 100-200 MB/s disk vs 1.25 GB/s network. Without diskless (repl-diskless-sync no), full sync writes the RDB to disk then sends it — ~30s for 5 GB on slow disk. Diskless streams from the forked child directly through a pipe to the socket, saving the disk round-trip. The downside: if the connection drops mid-stream, the next replica that arrives has to wait for a fresh fork.
Can a chain of replicas (replica → replica) work?
Yes, since 4.0. Set replicaof &lt;master-host&gt; &lt;port&gt; on the intermediate node and replicaof &lt;intermediate-host&gt; &lt;port&gt; on the leaf. Replicas relay the stream they receive. This reduces network load on the master when you have many replicas, at the cost of higher replication lag for leaves (master → mid → leaf, two hops instead of one). Useful for cross-region read replicas where the WAN hop should happen once, not N times.
Why is replica-priority interesting?
It's the tiebreaker for which replica gets promoted on failover. Priority 0 means 'never promote me', useful for analytics replicas you want to stay as replicas. Lower numbers win (with ties broken by replication offset). Sentinel honors this automatically. A common pattern: priority 100 for primary failover candidates, priority 200 for secondary candidates, priority 0 for analytics.