🔐 Distributed Transactions
Percolator 2PC: How TiDB Commits Across Regions
TiDB uses a modified version of Google's Percolator protocol. Every transaction gets a start timestamp (snapshot isolation) and commits via two-phase commit — first locking all keys, then making them visible atomically. The trick: one key is the primary. If the primary commits, the whole transaction commits.
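The two phases can be sketched with a toy in-memory store. This is a minimal illustration, not TiKV's implementation: the `Store` class, its three dictionaries, and the function names are hypothetical, and the whole prewrite is checked atomically here, whereas a real cluster prewrites keys in parallel across nodes.

```python
from dataclasses import dataclass, field


@dataclass
class Store:
    """Toy in-memory stand-in for one storage node (hypothetical layout)."""
    data: dict = field(default_factory=dict)   # (key, start_ts) -> value
    lock: dict = field(default_factory=dict)   # key -> (primary_key, start_ts)
    write: dict = field(default_factory=dict)  # (key, commit_ts) -> start_ts


def prewrite(store, mutations, primary, start_ts):
    """Phase 1: lock every key and stage its value; refuse on any conflict."""
    for key in mutations:
        if key in store.lock:
            return False  # another transaction already holds a lock
        if any(k == key and commit_ts >= start_ts for (k, commit_ts) in store.write):
            return False  # a version was committed after our snapshot
    for key, value in mutations.items():
        store.lock[key] = (primary, start_ts)
        store.data[(key, start_ts)] = value
    return True


def commit(store, keys, primary, start_ts, commit_ts):
    """Phase 2: commit the primary first, then the secondaries."""
    ordered = [primary] + [k for k in keys if k != primary]
    for key in ordered:
        store.write[(key, commit_ts)] = start_ts  # version becomes visible
        del store.lock[key]                       # lock is released


store = Store()
prewrite(store, {"alice": 100, "bob": 50}, primary="alice", start_ts=10)
commit(store, ["alice", "bob"], primary="alice", start_ts=10, commit_ts=11)
```

After the last line, both keys have a Write record at `commit_ts=11` that points back to the data staged at `start_ts=10`, and no locks remain.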
📡 Transaction Flow
🗃️ Column Families (What's on Disk)
💡 Why a Primary Key?
Crash safety without distributed consensus on every commit. The primary key is the single source of truth: if its Write record exists → committed. If its Lock still exists → either still running or needs cleanup. If neither exists → the transaction was rolled back. Secondary keys just point back to the primary. This means that once the primary commits, the secondaries can be committed asynchronously: even if TiDB crashes, any reader that encounters a secondary lock can resolve it by checking the primary.
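A reader that trips over a leftover secondary lock could resolve it roughly as follows. This is a sketch under assumptions: `lock_cf` and `write_cf` are plain dictionaries standing in for the Lock and Write records, `resolve_secondary` is a hypothetical name, and the rolled-back branch follows the original Percolator design. A real implementation would also check the lock's TTL before cleaning up a pending transaction.

```python
def resolve_secondary(lock_cf, write_cf, secondary_key):
    """Decide the fate of a secondary lock by inspecting its primary.

    lock_cf:  key -> (primary_key, start_ts)
    write_cf: (key, commit_ts) -> start_ts
    """
    primary, start_ts = lock_cf[secondary_key]

    # Look for the primary's Write record that belongs to this transaction.
    commit_ts = next(
        (c for (k, c), s in write_cf.items() if k == primary and s == start_ts),
        None,
    )
    if commit_ts is not None:
        # Primary committed: roll the secondary forward with the same commit_ts.
        write_cf[(secondary_key, commit_ts)] = start_ts
        del lock_cf[secondary_key]
        return "committed"

    if lock_cf.get(primary) == (primary, start_ts):
        # Primary is still locked: the transaction is running or needs cleanup.
        return "pending"

    # No Write record and no Lock on the primary: treat as rolled back.
    del lock_cf[secondary_key]
    return "rolled_back"


locks = {"bob": ("alice", 10)}
writes = {("alice", 11): 10}
outcome = resolve_secondary(locks, writes, "bob")
```

Here the primary `alice` already has a Write record for `start_ts=10`, so the lock on `bob` is rolled forward and `outcome` is `"committed"`.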
⚔️ Conflict Resolution
What happens when two transactions write to the same key?
Transaction A
UPDATE account SET balance = 100 WHERE id = 'alice'
start_ts: (not yet assigned)
Transaction B
UPDATE account SET balance = 200 WHERE id = 'alice'
start_ts: (not yet assigned)
Optimistic: locks acquired only at commit time. Conflicts detected late.
🎯 Optimistic (Default pre-4.0)
No locks until commit. Fast if conflicts are rare. But if Txn B tries to commit and finds Txn A's lock → Txn B must retry from scratch. Wasted work. Bad for hot keys.
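The cost of a late conflict shows up as a retry loop around the whole transaction. The sketch below is illustrative only: `run_optimistic`, `txn_body`, and `try_commit` are hypothetical names, and the commit function simply simulates one conflict before succeeding.

```python
def run_optimistic(txn_body, try_commit, max_attempts=3):
    """Run a transaction optimistically, redoing all work after each conflict."""
    for attempt in range(1, max_attempts + 1):
        writes = txn_body()      # the whole body is re-executed on every attempt
        if try_commit(writes):   # conflicts only surface here, at commit time
            return attempt
    raise RuntimeError("gave up after repeated write conflicts")


conflicts_left = [1]  # simulate exactly one conflicting commit


def flaky_commit(writes):
    if conflicts_left[0] > 0:
        conflicts_left[0] -= 1
        return False
    return True


attempts = run_optimistic(lambda: {"alice": 200}, flaky_commit)
```

With one simulated conflict, the body runs twice before the commit goes through, which is the wasted work described above.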
🔒 Pessimistic (Default 4.0+)
Acquire locks during UPDATE/DELETE, before commit.
If key is locked → wait (with timeout).
Less wasted work, behaves like MySQL. But more lock traffic.
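The "wait with timeout" behaviour can be imitated with an ordinary per-key lock table. This is a single-process sketch, assuming a hypothetical `PessimisticLockTable` class. It says nothing about how TiKV stores or replicates pessimistic locks.

```python
import threading


class PessimisticLockTable:
    """Per-key locks taken while a statement executes, before commit."""

    def __init__(self):
        self._locks = {}
        self._guard = threading.Lock()  # protects the dictionary itself

    def acquire(self, key, timeout_s):
        """Wait up to timeout_s seconds for the key; return True if locked."""
        with self._guard:
            lock = self._locks.setdefault(key, threading.Lock())
        return lock.acquire(timeout=timeout_s)

    def release(self, key):
        self._locks[key].release()


table = PessimisticLockTable()
a_got_it = table.acquire("alice", timeout_s=0.1)   # Txn A locks the row
b_got_it = table.acquire("alice", timeout_s=0.05)  # Txn B waits, then times out
table.release("alice")                             # Txn A finishes
b_retry = table.acquire("alice", timeout_s=0.05)   # now Txn B can proceed
```

Txn B does no work until it holds the lock, so nothing has to be redone once the lock is granted.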
⏳ MVCC: Versions Over Time
Every write creates a new version. Readers see a consistent snapshot.
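A snapshot read then reduces to "pick the newest version committed at or before my timestamp". A minimal sketch, assuming the versions of one key are available as a list of `(commit_ts, value)` pairs (`snapshot_read` is a hypothetical helper):

```python
def snapshot_read(versions, read_ts):
    """Return the value of the newest version with commit_ts <= read_ts."""
    visible = [(commit_ts, value) for commit_ts, value in versions
               if commit_ts <= read_ts]
    if not visible:
        return None  # the key did not exist yet at this snapshot
    return max(visible, key=lambda pair: pair[0])[1]


history = [(5, 100), (12, 200)]        # balance written at ts 5 and ts 12
old_view = snapshot_read(history, 10)  # sees the version committed at ts 5
new_view = snapshot_read(history, 12)  # sees the version committed at ts 12
```

A reader that started at timestamp 10 keeps seeing the older balance even after the newer version is written.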
📊 Latency Breakdown
Where time goes in a distributed transaction
⚡ Async Commit (TiDB 5.0)
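Instead of waiting for the primary's Write record, TiDB returns success as soon as all prewrites succeed. The commit_ts is derived from the prewrite responses: each prewrite reports a min_commit_ts, and the transaction commits at the maximum of those values. This saves one full RTT, which matters most in cross-region deployments.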
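The timestamp arithmetic can be sketched as below. This is a simplified model under assumptions: each store is taken to propose a minimum commit timestamp just above both the transaction's start_ts and the largest read timestamp it has served, and the function names are hypothetical.

```python
def prewrite_min_commit_ts(start_ts, max_read_ts_seen):
    """Smallest commit timestamp a store can accept for this prewrite.

    It must exceed start_ts and every read timestamp the store has already
    served, so that no earlier snapshot read is invalidated.
    """
    return max(start_ts, max_read_ts_seen) + 1


def async_commit_ts(start_ts, max_read_ts_by_key):
    """Commit timestamp = max over the per-key minimums from prewrite."""
    return max(
        prewrite_min_commit_ts(start_ts, seen)
        for seen in max_read_ts_by_key.values()
    )


# Txn started at ts 10; the store holding 'bob' has served a read at ts 15.
commit_ts = async_commit_ts(10, {"alice": 8, "bob": 15})
```

In this example `alice` proposes 11 and `bob` proposes 16, so the transaction commits at 16 without a separate round trip to fetch a commit timestamp.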
🎯 1PC Optimization
If all keys land in one TiKV region, skip the two-phase dance entirely. Prewrite + Commit in a single RPC. Latency drops to 1 RTT.
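Whether a transaction qualifies comes down to a range lookup: regions are contiguous key ranges, so every key maps to exactly one region. The check below is a sketch with hypothetical helpers (`region_of`, `can_use_1pc`) and a made-up two-region layout. Real eligibility rules may include further conditions.

```python
import bisect


def region_of(key, region_start_keys):
    """Index of the region whose range contains key.

    region_start_keys is sorted; region i covers [start_i, start_{i+1}).
    """
    return bisect.bisect_right(region_start_keys, key) - 1


def can_use_1pc(keys, region_start_keys):
    """True when every key of the transaction falls into the same region."""
    regions = {region_of(k, region_start_keys) for k in keys}
    return len(regions) == 1


starts = ["", "m"]                               # region 0: ["", "m"), region 1: ["m", ...)
same_region = can_use_1pc(["alice", "bob"], starts)
two_regions = can_use_1pc(["alice", "zoe"], starts)
```

`alice` and `bob` both sort before `"m"` and share region 0, while `zoe` lands in region 1 and forces the regular two-phase path.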