🎚️ Raft Region Size

The Invisible Lever for Distributed Database Performance

[Interactive slider: region size from 1 MB to 1 PB, with zones labeled 🐜 Tiny (1 MB), 📦 Standard, and 🏔️ Massive (10+ GB). Shown set to 256 MB, which it marks as the ✓ Optimal Zone.]

⚡ Parallelism

More regions mean more Raft leaders, and more leaders mean more operations can run in parallel, like opening more checkout lanes in a store.
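To make the parallelism lever concrete, here is a minimal sketch under simplifying assumptions: every region is filled to the configured size, each region has exactly one leader, and leaders are spread round-robin across a hypothetical 3-store cluster. The helpers `region_count` and `leaders_per_store` are illustrative names, not TiKV or PD APIs, and real leader placement is decided by PD's scheduling rather than by this formula. With a hypothetical 4 GB dataset, 256 MB regions yield 16 regions and therefore 16 leaders.

```python
import math

MB = 1024 ** 2
GB = 1024 ** 3


def region_count(data_bytes: int, region_bytes: int) -> int:
    """Regions needed if every region is filled up to region_bytes (toy model)."""
    return max(1, math.ceil(data_bytes / region_bytes))


def leaders_per_store(regions: int, stores: int) -> list[int]:
    """One leader per region, spread round-robin across stores as evenly as possible."""
    base, extra = divmod(regions, stores)
    return [base + (1 if i < extra else 0) for i in range(stores)]


data = 4 * GB  # hypothetical dataset size
for size_mb in (1, 96, 256, 1024):
    n = region_count(data, size_mb * MB)
    # Each region contributes one leader, so n is also the number of parallel leaders.
    print(f"{size_mb:>5} MB regions -> {n:>5} leaders, per store: {leaders_per_store(n, 3)}")
```

Shrinking the region size multiplies the number of independent leaders, which is exactly the parallelism the checkout-lane analogy describes; the next card covers what that multiplication costs.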

📊 Overhead

Each region is its own Raft group with its own heartbeats, leader elections, and metadata. With too many regions, the cluster drowns in coordination.
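A rough sketch of how coordination traffic scales with region count, assuming every region is active: each leader sends a Raft heartbeat to each follower every `raft_heartbeat_s` seconds and reports to PD every `pd_heartbeat_s` seconds. Both intervals, the 3-replica default, and the 10 TB cluster size are hypothetical placeholders, not TiKV defaults, and `coordination_msgs_per_sec` is an illustrative name.

```python
MB = 1024 ** 2
TB = 1024 ** 4


def coordination_msgs_per_sec(regions: int,
                              replicas: int = 3,
                              raft_heartbeat_s: float = 2.0,   # hypothetical interval
                              pd_heartbeat_s: float = 60.0) -> float:  # hypothetical interval
    """Toy steady-state estimate of heartbeat traffic, assuming all regions are active."""
    raft_msgs = regions * (replicas - 1) / raft_heartbeat_s  # leader -> each follower
    pd_msgs = regions / pd_heartbeat_s                        # leader -> PD report
    return raft_msgs + pd_msgs


data = 10 * TB  # hypothetical cluster size
for size_mb in (1, 96, 256):
    regions = -(-data // (size_mb * MB))  # ceiling division
    print(f"{size_mb:>4} MB regions: {regions:>10,} regions, "
          f"~{coordination_msgs_per_sec(regions):,.0f} coordination msgs/s")
```

Because the estimate is linear in the region count, cutting the region size by 256x raises the coordination load by roughly the same factor, while the amount of stored data stays the same.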

🔄 Recovery Speed

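Regions are the unit of failover and data movement. The larger each region, the more data must be copied before a lost replica is restored, so recovery becomes slower and riskier.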
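The recovery cost per region can be approximated as region size divided by transfer bandwidth. This minimal sketch assumes a steady, hypothetical 128 MB/s per-transfer throughput and a hypothetical 60 s snapshot timeout; `snapshot_seconds` and `fits_timeout` are illustrative helpers, not real TiKV settings or functions.

```python
MB = 1024 ** 2
GB = 1024 ** 3


def snapshot_seconds(region_bytes: int, bandwidth_bytes_per_s: float) -> float:
    """Time to ship one full region copy at a steady bandwidth (toy model)."""
    return region_bytes / bandwidth_bytes_per_s


def fits_timeout(region_bytes: int, bandwidth_bytes_per_s: float, timeout_s: float) -> bool:
    """True if a single region snapshot finishes before the (hypothetical) timeout."""
    return snapshot_seconds(region_bytes, bandwidth_bytes_per_s) <= timeout_s


bandwidth = 128 * MB  # hypothetical per-transfer throughput in bytes/s
timeout = 60.0        # hypothetical snapshot timeout in seconds
for label, size in (("96 MB", 96 * MB), ("256 MB", 256 * MB), ("10 GB", 10 * GB)):
    secs = snapshot_seconds(size, bandwidth)
    print(f"{label:>7}: {secs:6.2f} s, within timeout: {fits_timeout(size, bandwidth, timeout)}")
```

Under these assumptions a 256 MB region moves in 2 s, while a 10 GB region needs 80 s and overruns the 60 s limit, which is the mechanism behind the snapshot timeouts and ballooning failover times described under the extremes.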

[Interactive panels: Region Distribution (Your Data), showing 16 regions with 16 leaders at the selected size; Shipping Container Analogy; Failover Recovery Time, on a scale from 0 s through ~2 s (fast) to 60 s+.]

⚠️ What Breaks at Each Extreme

🐜 Too Small

🔥 High PD CPU/Memory

PD must track millions of regions and drowns in metadata.

⚡ Leader Election Storms

Too many Raft groups mean constant election overhead.

📡 Heartbeat Traffic Explosion

Every region reports to PD, adding up to millions of messages.

🔄 TiFlash Ingestion Churn

Excessive replication fan-out: TiFlash spends more effort managing regions than analyzing data.

🏔️ Too Large

🐌 Snapshot Transfers Time Out

Moving multi-gigabyte regions over the network becomes infeasible.

🎯 Hotspots Can't Be Isolated

Giant regions can't be split to spread load.

⏰ Ballooning Failover Times

Recovery takes minutes instead of seconds.

📉 Low TiFlash CPU Utilization

Too few regions to parallelize analytical scans.
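The failure modes above can be turned into a rough checklist. This sketch flags which extreme a given data size and region size fall into, using entirely hypothetical thresholds (a 1,000,000-region budget, 128 MB/s transfers, a 60 s snapshot timeout, 64 analytical cores); the `Limits` and `warnings` names are illustrative only. Hotspot isolation is not modeled because it depends on the access pattern, not just on sizes.

```python
from dataclasses import dataclass

MB = 1024 ** 2
GB = 1024 ** 3
TB = 1024 ** 4


@dataclass(frozen=True)
class Limits:
    """Hypothetical thresholds; real limits depend on hardware and workload."""
    max_regions: int = 1_000_000              # above this, assume PD and heartbeat load hurt
    bandwidth_bytes_per_s: float = 128 * MB   # assumed snapshot throughput
    snapshot_timeout_s: float = 60.0          # assumed snapshot timeout
    analytic_cores: int = 64                  # assumed TiFlash cores to keep busy


def warnings(data_bytes: int, region_bytes: int, limits: Limits = Limits()) -> list[str]:
    regions = -(-data_bytes // region_bytes)  # ceiling division
    found = []
    if regions > limits.max_regions:
        found.append("too small: PD load, heartbeat traffic, election storms, TiFlash churn")
    if region_bytes / limits.bandwidth_bytes_per_s > limits.snapshot_timeout_s:
        found.append("too large: snapshot timeouts, slow failover")
    if regions < limits.analytic_cores:
        found.append("too large: too few regions to parallelize analytical scans")
    return found


for size in (1 * MB, 256 * MB, 10 * GB):
    print(f"{size // MB:>6} MB:", warnings(10 * TB, size) or ["ok"])
```

The thresholds are the part to replace with measurements from a real cluster; the structure simply encodes that the "too small" problems scale with region count and the "too large" problems scale with per-region size.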

🧠 The Mental Model

Region size is not about storage. It is about movement: how fast data can move between nodes during failures, rebalancing, and growth.

The best region size is the one that lets your data move as fast as your problems appear.
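One way to turn this rule of thumb into a number is a toy search: pick the largest candidate size that still moves within a target time, while keeping the total region count under a coordination budget. Larger is preferred because it means fewer regions and less overhead. The function `pick_region_size`, the candidate list, the 128 MB/s bandwidth, the 2 s target, and the 1,000,000-region budget are all hypothetical inputs for illustration, not TiKV defaults.

```python
from typing import Optional

MB = 1024 ** 2
GB = 1024 ** 3
TB = 1024 ** 4


def pick_region_size(data_bytes: int,
                     candidates: list[int],
                     bandwidth_bytes_per_s: float,
                     target_move_s: float,
                     max_regions: int) -> Optional[int]:
    """Largest candidate that moves within target_move_s and keeps region count <= max_regions."""
    best = None
    for size in sorted(candidates):
        moves_fast_enough = size / bandwidth_bytes_per_s <= target_move_s
        regions = -(-data_bytes // size)  # ceiling division
        if moves_fast_enough and regions <= max_regions:
            best = size  # later (larger) feasible sizes overwrite smaller ones
    return best


sizes = [1 * MB, 96 * MB, 256 * MB, 1 * GB, 10 * GB]
choice = pick_region_size(10 * TB, sizes, 128 * MB, target_move_s=2.0, max_regions=1_000_000)
print(None if choice is None else f"{choice // MB} MB")
```

With these inputs, 1 MB exceeds the region budget and 1 GB or more exceeds the 2 s move target, leaving 256 MB as the largest feasible choice. If no candidate satisfies both constraints the function returns `None`, which signals that the hardware or the targets, not the region size, need to change.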