Redis Persistence

Redis is a memory-resident database that pretends to be durable. The pretense is held up by two orthogonal mechanisms: RDB, a periodic binary snapshot of the entire keyspace produced by a forked child, and AOF, an append-only log of every write command since the last rewrite. Each has its own failure mode and recovery profile, and modern deployments typically use both at once through the hybrid aof-use-rdb-preamble format. This page walks through how each format is structured on disk, what the fsync knobs actually buy you, how AOF rewrite avoids unbounded log growth, and how persistence interacts with replication.

Key Numbers

Default fsync	everysec
Window of loss	≤1s on crash
Default RDB save	3600s/1, 300s/100, 60s/10000
AOF rewrite trigger	100% growth
Min rewrite size	64 MB
RDB magic	REDIS0011
Fork cost	~1-3 ms / GB RSS (100-300 ms at 100 GB)

Why Two Formats Exist

RDB: cheap & fast

A 100 GB Redis loads from RDB in 30-90 seconds because the format is essentially a pre-serialized memcpy of every value. AOF replay is command-by-command and 5-10x slower. RDB is also far more compact: a single integer key in RDB is ~6 bytes, while in AOF it is the full SET key value command in RESP framing, \r\n terminators included.

AOF: minimal data loss

With the default rules, RDB snapshots anywhere from once a minute (busy server) to once an hour (quiet one). If the host dies between snapshots, all writes since the last save are gone. AOF with everysec caps loss at about one second of writes, which is usually acceptable for cache-with-state workloads.

Hybrid: both wins

Since Redis 4.0, the AOF file can begin with a binary RDB section followed by command-format entries for everything since. Restart loads the RDB chunk fast, then replays only the recent tail. You get RDB-fast restarts and AOF-bounded loss at the same time.

RDB: The Snapshot Format

A point-in-time binary dump of every database, written by a forked child.

RDB is triggered by the SAVE command (synchronous, blocks the main thread — almost never used in production), BGSAVE (asynchronous, forks a child), the save directive in redis.conf, or replication. The default save policy is the famous three-rule cascade:

# redis.conf does not accept trailing comments, so each note sits on its own line
# after 3600s if at least 1 change was made
save 3600 1
# after 300s if at least 100 changes were made
save 300 100
# after 60s if at least 10000 changes were made
save 60 10000

The semantics: a BGSAVE is triggered when any rule's time window has elapsed and that rule's threshold of changes has been met. So a busy server saves every minute; a quiet one saves once an hour. Each successful save resets the change counter.
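The rule cascade can be read as a simple "any rule satisfied" check. The sketch below is an illustration of that reading, not Redis's actual code; the function name should_bgsave and the constant DEFAULT_SAVE_RULES are hypothetical, and the exact boundary comparison (>= versus >) is an assumption.

```python
from typing import Iterable, Tuple

# (seconds, changes) pairs mirroring the default save directives.
DEFAULT_SAVE_RULES = [(3600, 1), (300, 100), (60, 10000)]


def should_bgsave(
    seconds_since_last_save: float,
    changes_since_last_save: int,
    rules: Iterable[Tuple[int, int]] = DEFAULT_SAVE_RULES,
) -> bool:
    """Return True if any rule has both its time window and change threshold met."""
    return any(
        seconds_since_last_save >= seconds and changes_since_last_save >= changes
        for seconds, changes in rules
    )
```

A server with 10000 changes in the last 61 seconds satisfies the third rule; one with a single change has to wait for the one-hour rule.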

The on-disk format starts with the magic REDIS followed by a 4-byte ASCII version number (0011 at the time of writing — the format is incremented when new value encodings or metadata are added, and recent Redis can read older RDBs but not vice versa). Then comes a stream of length-prefixed records:

REDIS0011                             magic + version
FA  redis-ver 7.2.4                   AUX field: server version
FA  redis-bits 64                     AUX field: pointer width
FA  ctime <unix-time>                 AUX field: time of dump
FA  used-mem <bytes>                  AUX field: dataset size hint
FE 00                                 SELECTDB: database 0
FB <db-size> <expires-size>           RESIZEDB: pre-size the dict
00 <key> <value>                      STRING (type 0)
01 <key> <value>                      LIST (encoded variant)
04 <key> <value>                      HASH
FC <expiry-ms> <type> <key> <value>   key with EXPIRE (FD is the older seconds variant)
FF <crc64>                            end of file + checksum

Notice that string values that parse as integers are stored in a compact 1-, 2-, or 4-byte binary integer encoding rather than ASCII; list and hash values use the same listpack encoding they have in memory (for many encodings Redis can copy the in-memory representation directly into the RDB); and the file ends with a CRC64 of all preceding bytes. On load, a non-matching CRC aborts with a clear error unless rdbchecksum no was set.
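As a small illustration of the header layout, the sketch below checks the five-byte magic and decodes the four ASCII version digits. The function name parse_rdb_header is hypothetical, and the code deliberately stops at the header: it does not attempt to parse records or verify the CRC64.

```python
def parse_rdb_header(data: bytes) -> int:
    """Validate the 9-byte RDB header and return the format version as an int."""
    if len(data) < 9 or data[:5] != b"REDIS":
        raise ValueError("not an RDB file: missing REDIS magic")
    version_field = data[5:9]
    if not version_field.isdigit():
        raise ValueError("RDB version field is not four ASCII digits")
    return int(version_field.decode("ascii"))
```

Feeding it the first nine bytes of a dump produced by the version described here would yield 11.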

Because the dump is produced by a forked child that shares the parent's memory, every page the parent writes to while the child is alive gets COW-duplicated. A 50 GB Redis taking 30s to dump under a 10 MB/s write rate adds roughly 300 MB of COW pages (more if the writes are small and scattered, since each one copies a whole 4 KB page). The fork(2) latency itself is the bigger concern: on Linux without huge pages, the kernel copies 8 bytes of page table per 4 KB of RSS, so a 100 GB instance has around 200 MB of page table and fork takes 100-300 ms. That latency lands on the main thread as a blip in p99.
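The two back-of-envelope figures in this paragraph can be reproduced with a few lines of arithmetic. This is only a sketch of the estimate: the helper names are hypothetical, the 4 KB page and 8-byte page-table entry are the assumptions stated in the text, and the COW figure is a rough estimate rather than a bound.

```python
PAGE_SIZE = 4096  # bytes per page, assuming no huge pages
PTE_SIZE = 8      # bytes of page table per mapped page


def page_table_bytes(rss_bytes: int) -> int:
    """Approximate page-table size the kernel must copy on fork."""
    return rss_bytes // PAGE_SIZE * PTE_SIZE


def cow_bytes_estimate(write_rate_bytes_per_s: int, dump_seconds: int) -> int:
    """Rough copy-on-write overhead: bytes written by the parent during the dump."""
    return write_rate_bytes_per_s * dump_seconds


GIB = 1024 ** 3
MIB = 1024 ** 2
print(page_table_bytes(100 * GIB) // MIB)             # 200 (MiB of page table)
print(cow_bytes_estimate(10_000_000, 30) // 10 ** 6)  # 300 (MB of COW pages)
```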

AOF: The Append-Only File

Every write command, serialized in the RESP protocol, appended to a file.

AOF logs every command that mutates the dataset in the same RESP wire protocol used between client and server. Reading an AOF file is identical to replaying a session of writes. A single SET foo bar appends:

*3\r\n
$3\r\nSET\r\n
$3\r\nfoo\r\n
$3\r\nbar\r\n

That's 31 bytes for a 6-byte logical write, so AOF is verbose. For pure-numeric or counter workloads it's far less efficient than RDB, but the upside is that the log is plain RESP: any tool that can parse the protocol can read the file. This makes ad-hoc debugging easy and forms the basis of cross-version migrations (dump AOF on old Redis, replay on new).
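To make the framing overhead concrete, here is a minimal sketch of a RESP command encoder. The function name encode_resp_command is hypothetical; it covers only the array-of-bulk-strings form that commands use, not the full protocol.

```python
def encode_resp_command(*args: bytes) -> bytes:
    """Encode a command as a RESP array of bulk strings."""
    parts = [b"*%d\r\n" % len(args)]  # array header: number of arguments
    for arg in args:
        # Each argument: $<length>\r\n<bytes>\r\n
        parts.append(b"$%d\r\n%s\r\n" % (len(arg), arg))
    return b"".join(parts)


entry = encode_resp_command(b"SET", b"foo", b"bar")
print(len(entry))  # 31
```

The array header costs 4 bytes and each three-character argument costs 9, which is where the 31-byte total for SET foo bar comes from.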

The fsync policy is set with appendfsync:

# Pick one. redis.conf does not accept trailing comments, so notes sit above each line.
# fsync after every command: ~5-50x throughput cut
appendfsync always
# fsync once per second from a bg thread (default)
appendfsync everysec
# never fsync, leave to kernel writeback (~30s)
appendfsync no

Under everysec, the main thread writes buffered commands into the kernel page cache via write(2) at the end of each event-loop iteration, which is fast: microseconds of latency. A separate background I/O thread (bio_aof_fsync) runs fsync(2) on the AOF file descriptor about once a second, flushing the page cache to disk. If the previous fsync is still running when the main thread wants to write again, Redis postpones the write for up to two seconds and then writes anyway, at which point write(2) can block behind the in-flight fsync. This is why slow disks under heavy write load cause Redis latency spikes despite the "async" promise of everysec: the kernel will eventually serialize.
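The division of labor between the writing thread and the fsync thread can be sketched in a few lines. This is an illustrative model only, not how Redis is implemented: the class name EverysecLog and its interval parameter are hypothetical, and it omits the postponement and back-pressure behavior.

```python
import os
import threading


class EverysecLog:
    """Append-only log: writes hit the page cache, a background thread fsyncs."""

    def __init__(self, path: str, interval: float = 1.0) -> None:
        self.fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o644)
        self.interval = interval
        self._stop = threading.Event()
        self._thread = threading.Thread(target=self._fsync_loop, daemon=True)
        self._thread.start()

    def append(self, data: bytes) -> None:
        # Cheap: the bytes land in the kernel page cache, not yet on disk.
        os.write(self.fd, data)

    def _fsync_loop(self) -> None:
        # Wake every `interval` seconds until close() sets the stop event.
        while not self._stop.wait(self.interval):
            os.fsync(self.fd)

    def close(self) -> None:
        self._stop.set()
        self._thread.join()
        os.fsync(self.fd)  # final flush so nothing is left only in the cache
        os.close(self.fd)
```

Anything appended after the most recent fsync exists only in the page cache, which is exactly the window of loss the everysec policy accepts.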

appendfsync always is rarely justified. It cuts write throughput to whatever your disk's sync IOPS can sustain — on cloud network-attached disks, often 1000-5000 fsyncs per second. Workloads that genuinely need it usually want a synchronous-replica setup instead, where the durability point is "two replicas have it" rather than "one disk fsynced it".

AOF Rewrite (Compaction)

Without rewrite, AOF grows unbounded. Rewrite forks and emits a fresh, compacted file.

A million SETs on the same key produce a million entries in AOF, even though the only one that matters is the last. Rewrite collapses this. It is triggered automatically by auto-aof-rewrite-percentage 100 and auto-aof-rewrite-min-size 64mb, meaning when the AOF has doubled in size since the last rewrite and exceeds 64 MB. It can also be triggered manually with BGREWRITEAOF.
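The automatic trigger combines the two settings as a growth check plus a size floor. The sketch below models that condition; the function name should_rewrite_aof and its parameter names are hypothetical, and treating a percentage of 0 as "disabled" follows the redis.conf documentation.

```python
MB = 1024 * 1024


def should_rewrite_aof(
    current_size: int,
    base_size: int,
    percentage: int = 100,
    min_size: int = 64 * MB,
) -> bool:
    """Return True if the AOF has grown enough since the last rewrite."""
    if percentage == 0:
        return False  # a percentage of zero disables automatic rewrite
    if current_size <= min_size:
        return False  # too small to be worth rewriting
    base = base_size if base_size > 0 else 1  # avoid division by zero
    growth = current_size * 100 // base - 100  # percent growth over the base
    return growth >= percentage
```

With the defaults, a 200 MB AOF whose size after the last rewrite was 100 MB qualifies, while a 150 MB one does not.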

Rewrite is implemented much like BGSAVE: fork a child, walk the in-memory dataset, emit one minimal command sequence per key reproducing its current state, write that to a temp file. The child does not read the existing AOF, since that would be much slower and require conflict resolution with concurrent writes. While the child writes, the parent keeps appending real-time mutations to two places: the original AOF (for crash safety in case the rewrite fails) and a separate aof-rewrite-buf (an in-memory buffer of post-fork commands). This is the single-file design used before Redis 7.0; from 7.0 on, the parent instead writes post-fork commands to a new incremental AOF file tracked by a manifest, which removes the in-memory buffer while keeping the same fork-and-walk structure.

When the child finishes, it signals the parent. The parent appends the rewrite-buf contents to the temp file, fsyncs it, and renames it onto appendonly.aof. The atomic rename via rename(2) is what makes the swap safe: the path switches from the old file to the new one in a single POSIX operation, so a crash leaves either the complete old file or the complete new one, never a mix. The old AOF file is unlinked, freeing disk.
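The write-temp, fsync, rename sequence is a general pattern for replacing a file safely. A minimal sketch follows, assuming a POSIX filesystem; the function name atomic_replace and the ".tmp" suffix are hypothetical and not the names Redis uses.

```python
import os


def atomic_replace(path: str, data: bytes) -> None:
    """Replace the file at `path` with `data` so readers never see a partial file."""
    tmp_path = path + ".tmp"  # hypothetical temp name, same directory as the target
    with open(tmp_path, "wb") as f:
        f.write(data)
        f.flush()
        os.fsync(f.fileno())  # make sure the new contents are on disk first
    # Atomic on POSIX: the name now points at the new file in one step.
    os.replace(tmp_path, path)
```

The temp file has to live on the same filesystem as the target, because a rename across filesystems is not atomic.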

Common failure mode: the rewrite buffer grows faster than the child can finish. On a server under sustained write load with a slow disk, the buffer can balloon to gigabytes before the child writes the snapshot. Watch aof_rewrite_buffer_length in INFO Persistence (reported by Redis versions before 7.0). If it's growing toward your free RAM, you may need to slow the workload, move to faster disks, or set no-appendfsync-on-rewrite yes (which skips fsync on the live AOF during rewrite, sacrificing durability for throughput just during that window).

The Hybrid Format (RDB Preamble + AOF Tail)

Introduced in Redis 4.0, this is what production should use.

With aof-use-rdb-preamble yes (available since 4.0, the default since 5.0), the AOF rewrite child produces a binary RDB chunk as the start of the new AOF file, then appends commands collected during the rewrite. The format on disk is:

REDIS0011 ...rdb bytes... FF      ← rewrite output (binary)
*3\r\n$3\r\nSET\r\n$3\r\nfoo\r\n$3\r\nbar\r\n     ← new commands (text)
*3\r\n$5\r\nLPUSH\r\n$1\r\nq\r\n$1\r\nx\r\n     ← appended forever after

On load, Redis detects the REDIS magic at byte zero, loads the RDB section (fast — same as a normal RDB load), then switches to RESP-stream mode for the rest. This is the best of both: the bulk of the dataset is in compact binary (loads in seconds, not minutes), and only the post-rewrite tail is verbose RESP (small, by definition — it's the writes that have happened since the last rewrite).
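The format detection described here amounts to a check on the first bytes of the file. The sketch below shows that idea only; the function name aof_format and its return labels are hypothetical.

```python
def aof_format(head: bytes) -> str:
    """Classify an AOF by its leading bytes."""
    if head.startswith(b"REDIS"):
        return "rdb-preamble"  # binary RDB section first, RESP tail after it
    if head.startswith(b"*"):
        return "resp"          # pure command log, starts with a RESP array header
    return "unknown"
```

Passing the first few bytes of an AOF is enough, for example aof_format(open(path, "rb").read(5)), where path is whatever file you are inspecting.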

Backward compatibility: a Redis 3.x cannot load a hybrid AOF — it sees the RDB magic and bails. If you need to downgrade, set aof-use-rdb-preamble no, run BGREWRITEAOF, and the resulting file is pure RESP and loadable anywhere.

Persistence and Replication

A surprising amount of replication machinery is just persistence in motion.

When a replica connects to a master for full sync (PSYNC ? -1), the master generates an RDB snapshot and streams it to the replica. There are two variants. The classic flow writes RDB to the master's local disk first, then sends it; this works on any kernel but is twice the I/O. Diskless replication (repl-diskless-sync yes) streams RDB bytes directly from the forked child's pipe over the socket to the replica, skipping the master's disk entirely.

Diskless is unambiguously better for masters with slow disks but fast networks (cloud ephemeral disk vs 10 Gbps NIC), and the default became yes in Redis 7.0. The catch: once a diskless transfer has started, a replica that arrives late cannot join it and has to wait for the next one. The repl-diskless-sync-delay setting mitigates this by waiting a configurable number of seconds for additional replicas to connect before kicking off the fork, so that they can all share the same RDB stream.

Replicas are often run with persistence turned off: save "" in the replica config disables RDB, and appendonly no disables AOF. This is an operator choice rather than a built-in default. The reasoning: the master is the durability point; if the master fails, you fail over to a replica that already has the data in RAM, and you take a fresh snapshot then. Persisting on every replica multiplies disk cost without proportional benefit. However, if you want a replica to survive a host reboot without re-syncing GBs from the master, enable persistence on it, because recovery from local AOF is much faster than network sync.

Operational Tradeoffs

Aspect	RDB only	AOF only	Hybrid (recommended)
Restart speed (100 GB)	30-90 s	10-30 min	30-90 s + small tail
Worst-case data loss	save interval (minutes)	~1 s (everysec)	~1 s
File size	compact (binary)	verbose (RESP × N)	compact + small tail
Fork frequency	per save trigger	per rewrite trigger	per rewrite trigger
Disk write rate	spiky (per save)	continuous + spike	continuous + spike
Cross-version portable	major-version only	any version	any version (after rewrite)

FAQ

RDB vs AOF — which one should I pick?

Run both. RDB gives you cheap, point-in-time snapshots that compress well and load fast on restart; AOF gives you near-zero-data-loss durability if you set appendfsync=everysec. The mixed RDB+AOF format introduced in Redis 4.0 (aof-use-rdb-preamble yes) writes a binary RDB chunk at the start of the AOF file followed by an incremental log of subsequent commands — best of both. The only reason to pick one is operational simplicity: if you genuinely don't need durability (a cache fronting a database) RDB alone is fine; if you cannot tolerate even a few seconds of data loss and don't care about restart speed, pure AOF works.

What does appendfsync=everysec actually risk?

Up to one second of acknowledged writes lost on a kernel/host crash. The Redis main thread appends each command to the AOF buffer synchronously, and a background thread calls fsync(2) once per second. If the host loses power between fsyncs, the buffered writes — already acknowledged to clients — are gone. appendfsync=always issues an fsync after every write command, which is durable but cuts throughput by 5-50x depending on disk; appendfsync=no leaves fsync entirely to the kernel (typically 30s on Linux), which is fast but loses far more on crash.

When does AOF rewrite trigger and what does it cost?

Automatically when the AOF file has grown by auto-aof-rewrite-percentage (default 100%) over its size at the last rewrite, and exceeds auto-aof-rewrite-min-size (default 64 MB). The rewrite forks a child that snapshots the in-memory dataset to a new AOF file. While the child runs, the parent keeps appending new commands to a buffer; when the child finishes, the buffered commands are flushed to the new file and it atomically replaces the old. The fork uses copy-on-write, so memory overhead is bounded by the write rate during the rewrite window.

Does RDB block the main thread?

Only for the fork(2) call. After fork, the child is a separate process that walks the in-memory dataset and writes RDB to disk while the parent continues serving requests. The fork itself is the expensive part: on a 100 GB instance, fork can take 100-300 ms depending on page size and kernel version. Use the latency monitor, or check latest_fork_usec in INFO Stats and rdb_last_bgsave_time_sec in INFO Persistence. Disable Linux transparent huge pages (echo never > /sys/kernel/mm/transparent_hugepage/enabled): with THP on, each write by the parent after the fork copies a 2 MB page instead of a 4 KB one, which inflates copy-on-write memory and latency.

What happens to persistence on a replica?

Whatever you configure: a replica has no special persistence defaults, but many deployments turn persistence off on replicas. Such a replica receives an RDB snapshot during initial sync, replays the replication stream into memory, and serves reads, but on restart it has to re-sync from the master. Set a save rule (RDB) and appendonly yes on replicas if you want them to come back from a restart, or outlive a master failure, with their own copy of the data. Important: if you're using diskless replication (repl-diskless-sync yes), the master streams the RDB over the socket without writing it to its own disk, but the replica still has to load it.

How do I recover from a partially-written AOF?

Redis ships with redis-check-aof. Running redis-check-aof --fix /path/to/appendonly.aof will scan the file, find the last fully-formed command, and truncate the trailing partial. The tool truncates in place after asking for confirmation, so copy the file somewhere first if you want to keep the discarded tail. Most production failures look like this: the host crashed mid-write, so the last few KB are garbage. The fix recovers everything before the corruption. If aof-load-truncated yes (the default), Redis does this automatically on startup: it warns and continues.
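The "find the last fully-formed command" step can be sketched as a small RESP scanner. This is an illustration of the idea, not the redis-check-aof implementation: the function names are hypothetical, and the sketch only handles a pure RESP log, not a file that starts with an RDB preamble.

```python
from typing import Optional, Tuple


def _read_line(buf: bytes, pos: int) -> Optional[Tuple[bytes, int]]:
    """Return (line, next_pos) for the CRLF-terminated line at pos, or None."""
    end = buf.find(b"\r\n", pos)
    if end == -1:
        return None
    return buf[pos:end], end + 2


def _parse_command(buf: bytes, pos: int) -> Optional[int]:
    """Return the offset just past one complete command, or None if incomplete."""
    header = _read_line(buf, pos)
    if header is None or not header[0].startswith(b"*"):
        return None
    try:
        argc = int(header[0][1:])
    except ValueError:
        return None
    pos = header[1]
    for _ in range(argc):
        bulk = _read_line(buf, pos)
        if bulk is None or not bulk[0].startswith(b"$"):
            return None
        try:
            length = int(bulk[0][1:])
        except ValueError:
            return None
        end = bulk[1] + length
        # The payload must be fully present and followed by CRLF.
        if length < 0 or buf[end:end + 2] != b"\r\n":
            return None
        pos = end + 2
    return pos


def last_complete_command_offset(buf: bytes) -> int:
    """Offset at which to truncate so only fully-formed commands remain."""
    good = 0
    while good < len(buf):
        end = _parse_command(buf, good)
        if end is None:
            break
        good = end
    return good
```

Truncating the file to the returned offset drops the partial command at the tail and keeps everything before it.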