🎚️ Raft Region Size

The Invisible Lever for Distributed Database Performance

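[Interactive slider: region size from 1 MB to 1 PB, with stops at 48 MB, 96 MB, 256 MB, 1 GB, and 10 GB. The ends are labeled 🐜 Tiny (1 MB) and 🏔️ Massive (10+ GB), with 📦 Standard in between. Selected value: 256 MB, ✓ Optimal Zone.]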

Parallelism

More regions = more leaders = more parallel operations. Like checkout lanes in a store.

📊 Overhead

Each region needs heartbeats, elections, metadata. Too many regions = drowning in coordination.
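
The tension between parallelism and overhead can be put into rough numbers. The sketch below is illustrative only: the 10 TiB dataset is hypothetical, the helper names `region_count` and `heartbeats_per_second` are made up for this example, and it assumes each region sends one heartbeat to PD every 60 seconds, which may not match your cluster's settings.

```python
MIB = 1024 ** 2
TIB = 1024 ** 4


def region_count(data_bytes: int, region_bytes: int) -> int:
    """Number of regions needed to hold the data, rounded up."""
    return (data_bytes + region_bytes - 1) // region_bytes


def heartbeats_per_second(regions: int, interval_s: float = 60.0) -> float:
    """Average heartbeats per second reaching PD.

    Assumption: one heartbeat per region per interval.
    """
    return regions / interval_s


data = 10 * TIB  # hypothetical dataset size
for size_mib in (1, 96, 256):
    n = region_count(data, size_mib * MIB)
    rate = heartbeats_per_second(n)
    print(f"{size_mib:>4} MiB regions: {n:>10,} regions, {rate:>10,.1f} heartbeats/s")
```

Under these assumptions, 1 MiB regions produce about 10.5 million regions for PD to track, while 256 MiB regions produce 40,960.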

🔄 Recovery Speed

Regions are the unit of failover. Large regions = slow, risky recovery.
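
Part of recovery is rebuilding a lost replica on another node, which means copying the whole region across the network. Here is a minimal sketch, assuming copy time is simply region size divided by available bandwidth. The 100 MiB/s figure and the `transfer_seconds` helper are assumptions for illustration; real transfers also pay for snapshot generation, throttling, and applying the data.

```python
MIB = 1024 ** 2
GIB = 1024 ** 3


def transfer_seconds(region_bytes: int, bandwidth_bytes_per_s: float) -> float:
    """Idealized time to copy one region replica over the network."""
    return region_bytes / bandwidth_bytes_per_s


bandwidth = 100 * MIB  # assumed usable bandwidth per transfer, in bytes/s
for label, size in (("96 MiB", 96 * MIB), ("256 MiB", 256 * MIB), ("10 GiB", 10 * GIB)):
    print(f"{label:>8}: {transfer_seconds(size, bandwidth):6.2f} s")
```

With these numbers a 256 MiB region copies in about 2.6 seconds, while a 10 GiB region takes more than 100 seconds.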

📦 Region Distribution (Your Data)

[Interactive panel: 16 regions, 16 leaders]

🚢 Shipping Container Analogy

⏱️ Failover Recovery Time

[Gauge: 0 s to 60+ s, with about 2 s marked as fast]

⚠️ What Breaks at Each Extreme

When regions are too small:

🔥 High PD CPU/Memory
PD tracks millions of regions and drowns in metadata.
⚡ Leader Election Storms
Too many Raft groups mean constant election overhead.
📡 Heartbeat Traffic Explosion
Every region reports to PD, which adds up to millions of messages.
🔄 TiFlash Ingestion Churn
Excessive replication fan-out: TiFlash spends more time managing regions than analyzing data.

When regions are too large:

🐌 Snapshot Transfers Time Out
Moving multi-GB regions over the network becomes infeasible.
🎯 Hotspots Can't Be Isolated
Giant regions can't split to spread load.
⏰ Ballooning Failover Times
Recovery takes minutes instead of seconds.
📉 Low TiFlash CPU Utilization
Too few regions to parallelize analytical scans.
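
The scan-parallelism limit can be illustrated with a small sketch. It assumes, as a simplification, that an analytical scan runs at most one task per region, so the number of busy workers is capped by the number of regions the table spans. The table size, thread count, and the `scan_parallelism` helper are hypothetical.

```python
MIB = 1024 ** 2
GIB = 1024 ** 3


def scan_parallelism(table_bytes: int, region_bytes: int, worker_threads: int) -> int:
    """Workers that can be busy at once, assuming one scan task per region."""
    regions = max(1, (table_bytes + region_bytes - 1) // region_bytes)
    return min(regions, worker_threads)


table = 50 * GIB  # hypothetical table size
threads = 64      # hypothetical number of scan worker threads
for label, size in (("256 MiB", 256 * MIB), ("10 GiB", 10 * GIB)):
    busy = scan_parallelism(table, size, threads)
    print(f"{label:>8} regions: {busy}/{threads} workers busy ({busy / threads:.0%})")
```

In this model a 50 GiB table split into 10 GiB regions keeps only 5 of 64 workers busy, while 256 MiB regions give every worker something to scan.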

🧠 The Mental Model

Region size is not about storage. It's about movement — how fast data can move between nodes, during failures, during rebalancing, during growth.

The best region size is the one that lets your data move as fast as your problems appear.