🍃 Sharding & Chunks
How MongoDB Distributes Data — and How to Avoid Hotspots
MongoDB shards a collection by splitting it into chunks, each covering a contiguous range of the shard key. Chunks default to 64 MB (128 MB since MongoDB 6.0). The balancer migrates chunks between shards to keep data evenly distributed. The shard key you choose determines everything: good keys distribute writes evenly; bad keys create hotspots that kill performance.
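The routing idea can be sketched in a few lines. This is a toy model, not MongoDB internals: `CHUNKS` and `route` are made-up names, and a real mongos consults cached chunk metadata from the config servers.

```python
import bisect

# Toy router: chunk boundaries for one collection, each chunk owning the
# range from its lower bound (inclusive) up to the next entry's bound.
CHUNKS = [
    (float("-inf"), "shard0"),
    (1000, "shard1"),
    (2000, "shard2"),
]

def route(shard_key):
    # Binary-search the boundaries to find the chunk covering this key.
    bounds = [lo for lo, _ in CHUNKS]
    idx = bisect.bisect_right(bounds, shard_key) - 1
    return CHUNKS[idx][1]

print(route(5))      # shard0
print(route(1500))   # shard1
print(route(99999))  # shard2
```

Because routing is range-based, a query that includes the shard key can be sent to exactly the shards owning the matching ranges.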
🔑 Shard Key Simulator
Watch how different shard key strategies distribute writes across shards
🚨 Monotonic Keys (ObjectId, Timestamp)
New documents always fall into the final, open-ended chunk, so a single shard absorbs every insert. The balancer can't help: it moves existing chunks, but new writes keep piling onto the same shard. This is the #1 sharding mistake.
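A quick simulation makes the hotspot obvious. Boundaries and shard names here are illustrative; the point is that every monotonically increasing key sorts above all existing chunk bounds.

```python
import bisect
from collections import Counter

bounds = [0, 1000, 2000]              # chunk lower bounds
owners = ["shard0", "shard1", "shard2"]

writes = Counter()
for ts in range(3000, 4000):          # e.g. timestamps: always increasing
    idx = bisect.bisect_right(bounds, ts) - 1
    writes[owners[idx]] += 1

print(writes)  # all 1000 writes land on shard2
```

Two shards sit idle while the last one takes 100% of the insert load, which is exactly the pattern an ObjectId or timestamp shard key produces.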
✅ Hashed Shard Keys
MongoDB hashes the key value before routing, giving near-uniform distribution and no hotspots. Downside: range queries on the key become scatter-gather (they must hit every shard).
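The trade-off can be demonstrated with a stand-in hash. MongoDB's hashed index uses its own 64-bit hash; here md5 mod the shard count is an assumed substitute that shows the same behavior.

```python
import hashlib
from collections import Counter

NUM_SHARDS = 4

def hashed_shard(key):
    # Stand-in for MongoDB's hashed index: hash the key, then reduce
    # mod the shard count to pick an owner.
    digest = hashlib.md5(str(key).encode()).hexdigest()
    return int(digest, 16) % NUM_SHARDS

# 10,000 strictly increasing keys (the worst case for range sharding)
# spread close to evenly: roughly 2,500 per shard.
counts = Counter(hashed_shard(i) for i in range(10_000))

# The cost: adjacent keys hash to unrelated shards, so a range query
# over keys 100..110 typically has to be broadcast (scatter-gather).
targets = {hashed_shard(k) for k in range(100, 111)}
```

Even a monotonic key becomes write-balanced once hashed, but any query expressed as a key range loses its ability to target a single shard.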
✅ Compound Keys (region + timestamp)
A high-cardinality prefix (like region or user_id) distributes chunks across shards, while the timestamp suffix keeps data sorted within each prefix. Best of both worlds.
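A sketch of why the compound key wins, under the simplifying assumption that each region's key range lands on its own shard (names like `REGION_TO_SHARD` are invented for illustration):

```python
from collections import Counter, defaultdict

# Assumed placement: each region prefix owns a key range on one shard.
REGION_TO_SHARD = {"us-east": "shard0", "eu-west": "shard1", "ap-south": "shard2"}

writes = Counter()
per_region = defaultdict(list)

# Interleaved events from three regions, timestamps increasing.
events = [(region, ts) for ts in range(100)
          for region in ("us-east", "eu-west", "ap-south")]

for region, ts in events:
    writes[REGION_TO_SHARD[region]] += 1   # writes spread across shards
    per_region[region].append(ts)          # time-ordered within a prefix

# A query like {region: "us-east", ts: {$gte: a, $lt: b}} targets one shard
# and reads a contiguous, time-sorted range.
```

Writes fan out across shards by prefix, while each region's documents stay sorted by timestamp, so per-region time-range queries remain single-shard.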
✅ Random Keys (UUID)
UUIDs distribute evenly. Similar to hashed, but you can still do range queries on the key itself. Watch out for index bloat: random inserts land all over the B-tree, so pages split and fill unevenly.
⚖️ Chunk Splitting & Balancing
When a chunk grows past the configured maximum (64 MB by default; 128 MB since MongoDB 6.0), it splits. When shards become uneven, the balancer migrates chunks to even them out (since 6.0 it balances on data size rather than chunk count).
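A toy model of the split-then-balance cycle, with sizes in MB and a simplified "chunk counts differ by at most one" balancing rule standing in for the real thresholds:

```python
MAX_CHUNK_MB = 64

def split(chunk_mb):
    # A chunk that outgrows the max is split into two halves.
    return [chunk_mb / 2, chunk_mb / 2] if chunk_mb > MAX_CHUNK_MB else [chunk_mb]

shards = {"shard0": [100, 90], "shard1": [30]}

# 1) Split oversized chunks in place.
for name, chunks in shards.items():
    shards[name] = [piece for c in chunks for piece in split(c)]

# 2) Balance: migrate chunks from the most- to the least-loaded shard
#    until chunk counts differ by at most one.
def chunk_counts():
    return {n: len(c) for n, c in shards.items()}

while max(chunk_counts().values()) - min(chunk_counts().values()) > 1:
    src = max(shards, key=lambda n: len(shards[n]))
    dst = min(shards, key=lambda n: len(shards[n]))
    shards[dst].append(shards[src].pop())
```

Splitting is cheap metadata work; migration actually copies documents between shards, which is why the balancer window below matters.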
🛠️ Operational Best Practices
📊 Monitor Chunk Distribution
db.collection.getShardDistribution() Run regularly. If one shard has 2x+ the chunks of others, investigate.
⏰ Balancer Window
db.getSiblingDB("config").settings.updateOne({ _id: "balancer" }, { $set: { activeWindow: { start: "02:00", stop: "06:00" } } }, { upsert: true }) Chunk migrations are I/O-heavy; the window lives in config.settings. Schedule it during off-peak hours.
📏 Jumbo Chunks
sh.splitAt('db.coll', { key: splitPoint }) Chunks that can't split automatically (too many documents share one shard-key value) get flagged "jumbo" and won't migrate. Split manually at a remaining distinct value; a chunk holding a single key value can't be split at all.
🔍 Scatter-Gather Detection
db.coll.find({...}).explain('executionStats') If the explain output lists every shard under "shards", the query isn't being targeted by the shard key. Add the shard key to the filter.
🏷️ Zone Sharding
sh.addShardToZone("shard0", "US-EAST") Pin data to specific shards by region for compliance or latency: assign shards to zones, then map shard-key ranges to them with sh.updateZoneKeyRange(). Chunks stay in their zone.
⚠️ Pre-Split Before Bulk Load
sh.splitAt('db.coll', { _id: ... }) Before large imports, pre-split chunks to avoid funneling the whole load through one shard. Critical for initial loads; for hashed keys, use the numInitialChunks option of sh.shardCollection() instead.
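One common way to choose the split points is to take quantiles of a sample of the keys you are about to import. This helper is a hypothetical sketch (name and signature are made up), producing values you would then feed to sh.splitAt one by one.

```python
def split_points(sample, n_chunks):
    """Pick n_chunks - 1 boundary values that cut a key sample into
    roughly equal-sized ranges (empirical quantiles)."""
    s = sorted(sample)
    return [s[len(s) * i // n_chunks] for i in range(1, n_chunks)]

# e.g. a sample of the keys in the file about to be bulk-loaded
points = split_points(list(range(0, 1_000_000, 13)), 8)
# Then, before the import, run sh.splitAt('db.coll', { _id: p }) for each p.
```

With the ranges pre-cut (and pre-distributed by the balancer), the bulk load writes to all shards in parallel instead of hammering the single initial chunk.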