🍃 Sharding & Chunks
How MongoDB Distributes Data — and How to Avoid Hotspots
MongoDB shards a collection by splitting it into chunks, each covering a contiguous range of the shard key. Chunks default to 64 MB (128 MB since MongoDB 6.0). The balancer migrates chunks between shards to keep data evenly distributed. The shard key you choose determines everything: good keys distribute writes evenly; bad keys create hotspots that kill performance.
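The routing idea can be sketched in a few lines. This is a toy model, not MongoDB internals: `CHUNKS` and `route` are made-up names, and a real mongos consults cached chunk metadata from the config servers.

```python
import bisect

# Toy router: chunk boundaries for one collection, each chunk owning the
# range from its lower bound (inclusive) up to the next entry's bound.
CHUNKS = [
    (float("-inf"), "shard0"),
    (1000, "shard1"),
    (2000, "shard2"),
]

def route(shard_key):
    # Binary-search the boundaries to find the chunk covering this key.
    bounds = [lo for lo, _ in CHUNKS]
    idx = bisect.bisect_right(bounds, shard_key) - 1
    return CHUNKS[idx][1]

print(route(5))      # shard0
print(route(1500))   # shard1
print(route(99999))  # shard2
```

Because routing is range-based, a query that includes the shard key can be sent to exactly the shards owning the matching ranges.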
🔑 Shard Key Simulator
Watch how different shard key strategies distribute writes across shards
🚨 Monotonic Keys (ObjectId, Timestamp)
New documents always fall into the final, open-ended chunk, so a single shard absorbs every insert. The balancer can't help: it moves existing chunks, but new writes keep piling onto the same shard. This is the #1 sharding mistake.
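A quick simulation makes the hotspot obvious. Boundaries and shard names here are illustrative; the point is that every monotonically increasing key sorts above all existing chunk bounds.

```python
import bisect
from collections import Counter

bounds = [0, 1000, 2000]              # chunk lower bounds
owners = ["shard0", "shard1", "shard2"]

writes = Counter()
for ts in range(3000, 4000):          # e.g. timestamps: always increasing
    idx = bisect.bisect_right(bounds, ts) - 1
    writes[owners[idx]] += 1

print(writes)  # all 1000 writes land on shard2
```

Two shards sit idle while the last one takes 100% of the insert load, which is exactly the pattern an ObjectId or timestamp shard key produces.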
✅ Hashed Shard Keys
MongoDB hashes the key value before routing, giving near-uniform distribution and no hotspots. Downside: range queries on the key become scatter-gather (they must hit every shard).
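The trade-off can be demonstrated with a stand-in hash. MongoDB's hashed index uses its own 64-bit hash; here md5 mod the shard count is an assumed substitute that shows the same behavior.

```python
import hashlib
from collections import Counter

NUM_SHARDS = 4

def hashed_shard(key):
    # Stand-in for MongoDB's hashed index: hash the key, then reduce
    # mod the shard count to pick an owner.
    digest = hashlib.md5(str(key).encode()).hexdigest()
    return int(digest, 16) % NUM_SHARDS

# 10,000 strictly increasing keys (the worst case for range sharding)
# spread close to evenly: roughly 2,500 per shard.
counts = Counter(hashed_shard(i) for i in range(10_000))

# The cost: adjacent keys hash to unrelated shards, so a range query
# over keys 100..110 typically has to be broadcast (scatter-gather).
targets = {hashed_shard(k) for k in range(100, 111)}
```

Even a monotonic key becomes write-balanced once hashed, but any query expressed as a key range loses its ability to target a single shard.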
✅ Compound Keys (region + timestamp)
A high-cardinality prefix (like region or user_id) distributes chunks across shards, while the timestamp suffix keeps data sorted within each prefix. Best of both worlds.
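A sketch of why the compound key wins, under the simplifying assumption that each region's key range lands on its own shard (names like `REGION_TO_SHARD` are invented for illustration):

```python
from collections import Counter, defaultdict

# Assumed placement: each region prefix owns a key range on one shard.
REGION_TO_SHARD = {"us-east": "shard0", "eu-west": "shard1", "ap-south": "shard2"}

writes = Counter()
per_region = defaultdict(list)

# Interleaved events from three regions, timestamps increasing.
events = [(region, ts) for ts in range(100)
          for region in ("us-east", "eu-west", "ap-south")]

for region, ts in events:
    writes[REGION_TO_SHARD[region]] += 1   # writes spread across shards
    per_region[region].append(ts)          # time-ordered within a prefix

# A query like {region: "us-east", ts: {$gte: a, $lt: b}} targets one shard
# and reads a contiguous, time-sorted range.
```

Writes fan out across shards by prefix, while each region's documents stay sorted by timestamp, so per-region time-range queries remain single-shard.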
✅ Random Keys (UUID)
UUIDs distribute evenly. Similar to hashed, but you can still do range queries on the key itself. Watch out for index bloat: random inserts land all over the B-tree, so pages split and fill unevenly.
⚖️ Chunk Splitting & Balancing
When a chunk grows past the configured maximum (64 MB by default; 128 MB since MongoDB 6.0), it splits. When shards become uneven, the balancer migrates chunks to even them out (since 6.0 it balances on data size rather than chunk count).
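A toy model of the split-then-balance cycle, with sizes in MB and a simplified "chunk counts differ by at most one" balancing rule standing in for the real thresholds:

```python
MAX_CHUNK_MB = 64

def split(chunk_mb):
    # A chunk that outgrows the max is split into two halves.
    return [chunk_mb / 2, chunk_mb / 2] if chunk_mb > MAX_CHUNK_MB else [chunk_mb]

shards = {"shard0": [100, 90], "shard1": [30]}

# 1) Split oversized chunks in place.
for name, chunks in shards.items():
    shards[name] = [piece for c in chunks for piece in split(c)]

# 2) Balance: migrate chunks from the most- to the least-loaded shard
#    until chunk counts differ by at most one.
def chunk_counts():
    return {n: len(c) for n, c in shards.items()}

while max(chunk_counts().values()) - min(chunk_counts().values()) > 1:
    src = max(shards, key=lambda n: len(shards[n]))
    dst = min(shards, key=lambda n: len(shards[n]))
    shards[dst].append(shards[src].pop())
```

Splitting is cheap metadata work; migration actually copies documents between shards, which is why the balancer window below matters.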
🛠️ Operational Best Practices
📊 Monitor Chunk Distribution
db.collection.getShardDistribution() Run regularly. If one shard has 2x+ the chunks of others, investigate.
⏰ Balancer Window
db.getSiblingDB("config").settings.updateOne({ _id: "balancer" }, { $set: { activeWindow: { start: "02:00", stop: "06:00" } } }, { upsert: true }) Chunk migrations are I/O-heavy; the window lives in config.settings. Schedule it during off-peak hours.
📏 Jumbo Chunks
sh.splitAt('db.coll', { key: splitPoint }) Chunks that can't split automatically (too many documents share one shard-key value) get flagged "jumbo" and won't migrate. Split manually at a remaining distinct value; a chunk holding a single key value can't be split at all.
🔍 Scatter-Gather Detection
db.coll.find({...}).explain('executionStats') If the explain output lists every shard under "shards", the query isn't being targeted by the shard key. Add the shard key to the filter.
🏷️ Zone Sharding
sh.addShardToZone("shard0", "US-EAST") Pin data to specific shards by region for compliance or latency: assign shards to zones, then map shard-key ranges to them with sh.updateZoneKeyRange(). Chunks stay in their zone.
⚠️ Pre-Split Before Bulk Load
sh.splitAt('db.coll', { _id: ... }) Before large imports, pre-split chunks to avoid funneling the whole load through one shard. Critical for initial loads; for hashed keys, use the numInitialChunks option of sh.shardCollection() instead.
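One common way to choose the split points is to take quantiles of a sample of the keys you are about to import. This helper is a hypothetical sketch (name and signature are made up), producing values you would then feed to sh.splitAt one by one.

```python
def split_points(sample, n_chunks):
    """Pick n_chunks - 1 boundary values that cut a key sample into
    roughly equal-sized ranges (empirical quantiles)."""
    s = sorted(sample)
    return [s[len(s) * i // n_chunks] for i in range(1, n_chunks)]

# e.g. a sample of the keys in the file about to be bulk-loaded
points = split_points(list(range(0, 1_000_000, 13)), 8)
# Then, before the import, run sh.splitAt('db.coll', { _id: p }) for each p.
```

With the ranges pre-cut (and pre-distributed by the balancer), the bulk load writes to all shards in parallel instead of hammering the single initial chunk.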