🗂️ LSM-Tree & Compaction
The Write-Optimized Storage Engine Behind TiKV
📝 The Level Structure
MemTable (RAM) — sorted in-memory buffer
L0 — overlapping SSTs (fast writes: each file is sorted, but key ranges overlap)
L1 — sorted, non-overlapping SSTs
L2 — 10x larger, sorted
💡 Why LSM-Trees?
Traditional B-trees update data in-place, requiring random disk I/O for every write. LSM-trees batch writes in memory, then flush sequentially — turning random writes into sequential writes. This makes writes 10-100x faster on SSDs and HDDs.
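The write path described above can be sketched in a few lines. This is a toy model, not TiKV's actual code: writes accumulate in an in-memory buffer and are later flushed as one sorted, sequential run (an SST in a real engine).

```python
# Minimal LSM write-path sketch (illustrative, not TiKV/RocksDB internals).
class MemTable:
    def __init__(self, flush_threshold=3):
        self.data = {}                      # key -> value, held in RAM
        self.flush_threshold = flush_threshold

    def put(self, key, value):
        # Pure in-memory write: no disk I/O, overwrites are free.
        self.data[key] = value
        return len(self.data) >= self.flush_threshold

    def flush(self):
        # One sequential write of a sorted run -- this is the trick that
        # turns many random writes into a single sequential one.
        run = sorted(self.data.items())
        self.data = {}
        return run

mt = MemTable(flush_threshold=3)
sst = None
for k, v in [("b", 1), ("a", 2), ("c", 3)]:
    if mt.put(k, v):
        sst = mt.flush()   # sst is now [("a", 2), ("b", 1), ("c", 3)]
```

Note that the overwrite of a key before a flush costs nothing on disk; the B-tree equivalent would be a random page write each time.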
⚖️ The Write Amplification Tradeoff
The catch: data gets rewritten multiple times as it compacts through levels. A single key might be written to L0, then compacted to L1, then to L2... Write amplification is the ratio of bytes written to disk vs. bytes written by the app. Typical values: 10-30x for RocksDB.
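A back-of-envelope model makes the ratio concrete. The formula below is an illustrative upper bound for leveled compaction, not RocksDB's actual accounting: one write for the WAL, one for the memtable flush, and up to one fanout-sized rewrite per compacted level.

```python
def leveled_write_amp(compacted_levels, fanout=10):
    """Rough upper bound on write amplification for leveled compaction.

    Each app byte is written once to the WAL, once at flush, and may be
    rewritten up to `fanout` times as it merges into each deeper level.
    """
    return 2 + fanout * compacted_levels

print(leveled_write_amp(2))  # → 22, within the 10-30x range cited above
```

Real workloads land lower because not every byte reaches the deepest level, but the model shows why a larger level multiplier trades read cost for write cost.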
🔍 Read Amplification
To read a key, we might check: MemTable → L0 (multiple SSTs!) → L1 → L2. Bloom filters help skip SSTs that definitely don't contain the key. In the worst case, reads check every level — this is read amplification.
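The read path with Bloom filters can be sketched as follows. This is a toy model with hypothetical names (`BloomFilter`, `SST`, `read`), not RocksDB's API: each SST carries a filter, and a definite "no" lets the reader skip that file entirely.

```python
# Read-path sketch: MemTable -> L0 (every SST) -> deeper levels.
# A Bloom filter answers "definitely not here" or "maybe here".
import hashlib

class BloomFilter:
    def __init__(self, size=64, hashes=3):
        self.size, self.hashes, self.bits = size, hashes, 0

    def _positions(self, key):
        for i in range(self.hashes):
            h = hashlib.sha256(f"{i}:{key}".encode()).digest()
            yield int.from_bytes(h[:4], "big") % self.size

    def add(self, key):
        for p in self._positions(key):
            self.bits |= 1 << p

    def might_contain(self, key):
        return all(self.bits & (1 << p) for p in self._positions(key))

class SST:
    def __init__(self, entries):
        self.entries = dict(entries)
        self.bloom = BloomFilter()
        for k in self.entries:
            self.bloom.add(k)

    def get(self, key):
        if not self.bloom.might_contain(key):
            return None                    # skipped: no "disk read" needed
        return self.entries.get(key)       # maybe-hit: must actually look

def read(key, memtable, levels):
    if key in memtable:                    # newest data wins
        return memtable[key]
    for level in levels:                   # L0 may hold several overlapping SSTs
        for sst in level:
            v = sst.get(key)
            if v is not None:
                return v
    return None

memtable = {"x": "mem"}
l0 = [SST([("a", "0a")]), SST([("b", "0b")])]
l1 = [SST([("c", "1c")])]
value = read("b", memtable, [l0, l1])      # found in an L0 SST
```

Bloom filters can produce false positives (a wasted lookup) but never false negatives, which is why they are safe to use as a skip test.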
🎛️ Tuning Knobs
MemTable Size — larger = fewer flushes, but more memory and longer recovery
  write-buffer-size = 128MB

L0 Trigger — compact to L1 when L0 has this many SSTs
  level0-file-num-compaction-trigger = 4

Level Ratio — each level is Nx larger than the previous
  max-bytes-for-level-multiplier = 10

Compression — balance CPU vs. disk space per level
  compression-per-level = [no, no, lz4, lz4, lz4, zstd, zstd]
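In TiKV these knobs live in the RocksDB sections of the config file. A sketch of how they might appear, assuming the `[rocksdb.defaultcf]` section of a standard tikv.toml (verify key names against your TiKV version):

```toml
# Hypothetical tikv.toml fragment collecting the knobs above.
[rocksdb.defaultcf]
write-buffer-size = "128MB"
level0-file-num-compaction-trigger = 4
max-bytes-for-level-multiplier = 10
compression-per-level = ["no", "no", "lz4", "lz4", "lz4", "zstd", "zstd"]
```

Cheap-or-no compression on the upper levels keeps flush and L0 compaction fast, while zstd on the bottom levels saves the most space, since that is where most data lives.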