🔐 Distributed Transactions

Percolator 2PC: How TiDB Commits Across Regions

TiDB uses a modified version of Google's Percolator protocol. Every transaction gets a start timestamp (snapshot isolation) and commits via two-phase commit — first locking all keys, then making them visible atomically. The trick: one key is the primary. If the primary commits, the whole transaction commits.
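The full flow can be sketched with a toy in-memory store. This is a minimal model, not TiKV's actual RPC API: `prewrite` and `commit` mirror the two phases, and the `locks`/`writes` dicts stand in for the Lock and Write column families.

```python
# Toy Percolator-style 2PC over an in-memory store.
# locks[key]  -> (primary_key, start_ts, value)   (in-flight lock)
# writes[key] -> list of (commit_ts, start_ts, value)   (committed versions)
locks, writes = {}, {}

def prewrite(keys_values, primary, start_ts):
    """Phase 1: lock every key, pointing each lock at the primary."""
    if any(key in locks for key in keys_values):   # conflicting in-flight txn
        return False
    for key, value in keys_values.items():
        locks[key] = (primary, start_ts, value)
    return True

def commit(keys, primary, start_ts, commit_ts):
    """Phase 2: commit the primary first; secondaries may follow asynchronously."""
    for key in [primary] + [k for k in keys if k != primary]:
        _, _, value = locks.pop(key)
        writes.setdefault(key, []).append((commit_ts, start_ts, value))

# A transfer touching keys in two different regions:
assert prewrite({"alice": 100, "bob": 200}, primary="alice", start_ts=10)
commit(["alice", "bob"], primary="alice", start_ts=10, commit_ts=11)
print(writes["alice"])   # [(11, 10, 100)]
```

Once the primary's lock is replaced by a write record, the transaction is durably committed regardless of what happens to the secondaries.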

📡 Transaction Flow

Participants: Client / TiDB, PD (the TSO, which hands out timestamps), TiKV Region 1, TiKV Region 2.

🗃️ Column Families (What's on Disk)

🔒 Lock CF — in-flight locks; each lock records its primary key and start_ts
📄 Default CF — the actual row data, keyed by (key, start_ts)
✅ Write CF — commit records mapping commit_ts back to the start_ts of the data
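After committing `alice = 100` at `start_ts=10`, `commit_ts=11`, the three column families hold roughly the following. This is a readable sketch; real TiKV encodes keys and timestamps in binary.

```python
# Simplified view of the three column families after one committed write.
default_cf = {("alice", 10): b"100"}        # data, keyed by (key, start_ts)
write_cf   = {("alice", 11): ("PUT", 10)}   # commit record: commit_ts -> start_ts
lock_cf    = {}                             # empty once the commit lands

# A reader at snapshot ts >= 11 finds the write record first,
# then follows its start_ts into Default CF to get the value:
kind, start_ts = write_cf[("alice", 11)]
assert default_cf[("alice", start_ts)] == b"100"
```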

💡 Why a Primary Key?

Crash safety without distributed consensus on every commit. The primary key is the single source of truth: if its Write record exists → committed. If its Lock still exists → either still running or needs cleanup. Secondary keys just point back to the primary. This means after the primary commits, the secondaries can be committed asynchronously — even if TiDB crashes, any reader encountering a secondary lock can resolve it by checking the primary.
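That resolution rule can be sketched as follows, with toy dicts standing in for the Lock and Write CFs (the rollback path is simplified; real TiKV also writes a rollback record):

```python
def resolve(key, lock_cf, write_cf):
    """A reader blocked by a lock on `key` checks the lock's primary:
    primary committed -> roll the secondary forward; primary lock gone
    with no commit record -> roll back; otherwise the txn is still running."""
    primary, start_ts = lock_cf[key]
    commit_ts = next((cts for cts, sts in write_cf.get(primary, {}).items()
                      if sts == start_ts), None)
    if commit_ts is not None:
        del lock_cf[key]                             # roll forward
        write_cf.setdefault(key, {})[commit_ts] = start_ts
        return "rolled forward"
    if primary not in lock_cf:
        del lock_cf[key]                             # txn aborted: roll back
        return "rolled back"
    return "txn still running"

# Primary 'alice' committed at ts 11, but TiDB crashed before committing 'bob':
lock_cf  = {"bob": ("alice", 10)}
write_cf = {"alice": {11: 10}}
assert resolve("bob", lock_cf, write_cf) == "rolled forward"
```

The reader never needs to contact the crashed TiDB instance; the primary's state alone decides the outcome.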

⚔️ Conflict Resolution

What happens when two transactions write to the same key?

Transaction A: UPDATE account SET balance = 100 WHERE id = 'alice'

Transaction B: UPDATE account SET balance = 200 WHERE id = 'alice'

Optimistic: locks acquired only at commit time. Conflicts detected late.

🎯 Optimistic (Default pre-4.0)

No locks until commit. Fast if conflicts are rare. But if Txn B tries to commit and finds Txn A's lock → Txn B must retry from scratch. Wasted work. Bad for hot keys.

🔒 Pessimistic (Default 4.0+)

Acquire locks during UPDATE/DELETE, before commit. If key is locked → wait (with timeout). Less wasted work, behaves like MySQL. But more lock traffic.
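The two modes differ in when the conflict surfaces. A toy sketch of both outcomes (the lock table and return strings are illustrative, not TiDB's API):

```python
# Optimistic vs pessimistic handling of the same write-write conflict.
locks = {"alice": "txn_A"}   # Txn A already holds the lock on 'alice'

def optimistic_commit(key, txn):
    # Locks taken only at commit: a conflict means retrying the whole txn.
    holder = locks.setdefault(key, txn)
    return "commit ok" if holder == txn else "conflict: retry from scratch"

def pessimistic_write(key, txn, waited_out=False):
    # Lock taken at UPDATE time: on conflict, wait for the holder (with timeout).
    if locks.get(key) not in (None, txn):
        return "lock wait timeout" if waited_out else "waiting for txn_A"
    locks[key] = txn
    return "lock acquired"

assert optimistic_commit("alice", "txn_B") == "conflict: retry from scratch"
assert pessimistic_write("alice", "txn_B") == "waiting for txn_A"
```

Under contention, waiting for a lock is usually cheaper than re-executing the transaction's reads and writes from scratch, which is why pessimistic became the default.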

⏳ MVCC: Versions Over Time

Every write creates a new version. Readers see a consistent snapshot.
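The read rule is: return the newest version whose commit_ts is at or below the reader's start_ts. A minimal sketch:

```python
# MVCC snapshot read: newest version with commit_ts <= the reader's start_ts.
versions = [(5, "v1"), (9, "v2"), (14, "v3")]   # (commit_ts, value), ascending

def snapshot_read(versions, start_ts):
    visible = [value for cts, value in versions if cts <= start_ts]
    return visible[-1] if visible else None

assert snapshot_read(versions, start_ts=10) == "v2"   # ts-14 write is invisible
assert snapshot_read(versions, start_ts=4) is None    # nothing committed yet
```

A writer committing at ts 14 never disturbs a reader that started at ts 10; that reader keeps seeing "v2" for its entire lifetime.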

📊 Latency Breakdown

Where time goes in a distributed transaction


⚡ Async Commit (TiDB 5.0)

Instead of waiting for the primary's Write record, TiDB returns success after all prewrites succeed. The commit_ts is calculated from the max prewrite timestamps. Saves one full RTT — huge for cross-region deployments.
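The timestamp derivation is the key trick: each region's prewrite response carries a min_commit_ts for its keys, and the final commit_ts is simply their maximum, so no second TSO round trip is needed. Sketch (values are illustrative):

```python
# Async commit: derive commit_ts from prewrite responses instead of
# asking PD's TSO a second time. Each prewritten region reports the
# minimum commit_ts it can accept for its keys.
min_commit_ts_from_regions = [13, 17, 15]   # one per prewritten region

commit_ts = max(min_commit_ts_from_regions)
assert commit_ts == 17   # safe for every region that holds a key
```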

🎯 1PC Optimization

If all keys land in one TiKV region, skip the two-phase dance entirely. Prewrite + Commit in a single RPC. Latency drops to 1 RTT.
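The eligibility check is just "do all keys map to one region?" A sketch, with hypothetical key ranges standing in for real region boundaries:

```python
# 1PC applies only when every key lives in a single region (sketch;
# these [start, end) key ranges are illustrative, not real metadata).
regions = {"r1": ("a", "m"), "r2": ("m", "zz")}

def region_of(key):
    return next(r for r, (lo, hi) in regions.items() if lo <= key < hi)

def can_use_1pc(keys):
    return len({region_of(k) for k in keys}) == 1

assert can_use_1pc(["alice", "bob"])       # both in r1 -> single combined RPC
assert not can_use_1pc(["alice", "yara"])  # crosses regions -> full 2PC
```

Because the single region's Raft group commits the whole mutation atomically anyway, the primary-key machinery adds nothing and can be skipped.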