Database Sharding

Horizontal Partitioning Strategies, Shard Key Selection, and Scalable Data Distribution

Sharding splits a database into horizontal partitions called shards, each stored on a separate node. The goal: scale writes, storage, and reads beyond what a single database can handle, typically past 1M rows/sec or 10TB. The core challenge: choosing a shard key that evenly distributes data, minimizes cross-shard operations, and survives node additions without massive reshuffling. Get it wrong and you'll create hotspots, rebalancing storms, or impossible JOINs. Get it right and you can scale linearly.

When Do You Need Sharding?

(Interactive calculator in the original: enter total storage and raw writes/sec to see shards needed for writes, shards needed for storage, total nodes at RF=3, and a shard recommendation.)
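
A minimal sketch of the arithmetic behind that calculator, in Python. The per-shard capacity constants are illustrative assumptions, not benchmarks; tune them to your own hardware and storage engine.

import math

PER_SHARD_WRITES_PER_SEC = 10_000   # assumed single-shard write ceiling
PER_SHARD_STORAGE_TB = 2            # assumed usable storage per shard
REPLICATION_FACTOR = 3              # RF=3, as in the calculator above

def shard_plan(total_storage_tb: float, raw_writes_per_sec: float) -> dict:
    # Size for whichever dimension (writes or storage) needs more shards.
    shards_for_writes = math.ceil(raw_writes_per_sec / PER_SHARD_WRITES_PER_SEC)
    shards_for_storage = math.ceil(total_storage_tb / PER_SHARD_STORAGE_TB)
    shards = max(shards_for_writes, shards_for_storage, 1)
    return {
        "shards_for_writes": shards_for_writes,
        "shards_for_storage": shards_for_storage,
        "shards": shards,
        "total_nodes": shards * REPLICATION_FACTOR,
    }

print(shard_plan(total_storage_tb=12, raw_writes_per_sec=45_000))
# {'shards_for_writes': 5, 'shards_for_storage': 6, 'shards': 6, 'total_nodes': 18}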

Shard Key Strategies

The shard key determines how data is distributed. Pick the wrong one and you'll create bottlenecks.

Hash-Based

Even Distribution

hash(key) mod N maps keys uniformly across shards. Best for: random access patterns, user IDs, tenant IDs.

shard = hash(user_id) % num_shards
✓ Even distribution   ✓ Simple
✗ Range scans span all shards   ✗ Resharding requires full remap

Range-Based

Query-Friendly

Shard by ranges of the key: e.g., A-E on shard 1, F-J on shard 2. Best for: time-series, geographic data.

shard = getRangeShard(timestamp)
✓ Efficient range scans   ✓ Easy to understand
✗ Hotspots (newest partition)   ✗ Uneven if access is skewed

Directory-Based

Most Flexible

Lookup table maps keys → shards. Best for: multi-tenant SaaS, mixed access patterns, gradual migration.

shard = lookup_table.get(shard_key)
✓ No key remapping on add   ✓ Flexible routing
✗ Lookup overhead   ✗ Lookup table becomes SPOF
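
To make the three strategies concrete, here is a minimal sketch of all three routers side by side. The four-shard setup, the date split points, and the tenant directory are invented for illustration; none of this is a specific library's API.

import bisect
import hashlib

NUM_SHARDS = 4  # assumed cluster size

def hash_shard(key: str) -> int:
    # Hash-based: a stable hash (not Python's per-process salted hash()) mod N.
    digest = hashlib.md5(key.encode()).hexdigest()
    return int(digest, 16) % NUM_SHARDS

# Range-based: sorted upper bounds map key ranges to shards 0..3.
RANGE_BOUNDS = ["2023-01", "2024-01", "2025-01"]  # keys past the last bound land on shard 3

def range_shard(month_key: str) -> int:
    return bisect.bisect_right(RANGE_BOUNDS, month_key)

# Directory-based: an explicit, editable lookup table.
DIRECTORY = {"tenant_a": 0, "tenant_b": 2}

def directory_shard(tenant: str) -> int:
    return DIRECTORY[tenant]  # a missing tenant simply hasn't been placed yet

print(hash_shard("user:1234"), range_shard("2024-07"), directory_shard("tenant_b"))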

Consistent Hashing Ring

(Interactive demo in the original: add or remove nodes and compare how many keys are remapped under naive modulo (hash % N) versus consistent hashing.)
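
A compact consistent-hash ring in Python, sketched under two assumptions: MD5 as the ring hash and 64 virtual nodes per physical node. Both are common choices, neither is mandatory.

import bisect
import hashlib

VNODES = 64  # virtual nodes per physical node; smooths the key distribution

def ring_hash(s: str) -> int:
    return int(hashlib.md5(s.encode()).hexdigest(), 16)

class HashRing:
    def __init__(self, nodes=()):
        self._ring = []  # sorted list of (point, node) pairs
        for node in nodes:
            self.add_node(node)

    def add_node(self, node: str) -> None:
        # Each physical node owns VNODES points scattered around the ring.
        for i in range(VNODES):
            bisect.insort(self._ring, (ring_hash(f"{node}#{i}"), node))

    def remove_node(self, node: str) -> None:
        self._ring = [(p, n) for p, n in self._ring if n != node]

    def get_node(self, key: str) -> str:
        # A key belongs to the first ring point clockwise from its hash.
        points = [p for p, _ in self._ring]
        i = bisect.bisect_right(points, ring_hash(key)) % len(self._ring)
        return self._ring[i][1]

ring = HashRing([f"shard-{i}" for i in range(4)])
print(ring.get_node("user:1234"))  # e.g., shard-2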

Resharding: Adding a Node

When you add a shard, naive modulo reassigns most keys. Consistent hashing reassigns only ~1/N.

(Interactive demo in the original: going from 4 shards to 5, it shows the percentage of keys moved under naive modulo, the percentage moved under consistent hashing, and the keys saved from remapping.)
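
Reusing the HashRing and ring_hash helpers from the sketch above, you can reproduce those numbers yourself; the 100,000 sample keys are arbitrary.

# Measure remapping when growing from 4 to 5 shards.
keys = [f"user:{i}" for i in range(100_000)]

naive_moved = sum(ring_hash(k) % 4 != ring_hash(k) % 5 for k in keys)

ring = HashRing([f"shard-{i}" for i in range(4)])
before = {k: ring.get_node(k) for k in keys}
ring.add_node("shard-4")
moved = sum(before[k] != ring.get_node(k) for k in keys)

print(f"naive modulo moved {naive_moved / len(keys):.0%}")   # ~80% of keys
print(f"consistent hashing moved {moved / len(keys):.0%}")   # ~20% of keys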

Cross-Shard Query Patterns

Some queries must touch multiple shards. Choose the right strategy based on your access pattern.
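
The workhorse pattern is scatter-gather: fan the query out to every shard in parallel, then merge, re-sort, and re-limit in the application. A sketch, assuming each shard object exposes a hypothetical query() method returning rows as dicts, with created_at as an equally hypothetical sort column:

from concurrent.futures import ThreadPoolExecutor

def scatter_gather(shards, sql, limit=10):
    # Fan out: run the same query on every shard concurrently.
    with ThreadPoolExecutor(max_workers=len(shards)) as pool:
        partials = list(pool.map(lambda shard: shard.query(sql), shards))
    # Gather: merge partial results, then re-apply ORDER BY / LIMIT globally.
    rows = [row for part in partials for row in part]
    return sorted(rows, key=lambda r: r["created_at"], reverse=True)[:limit]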

Hot Key Mitigation

A single popular shard key (e.g., celebrity user) can overwhelm one shard. Here's how to handle it.

🔀 Key Splitting

user:1234 → user:1234:0, user:1234:1, user:1234:2

When: one key has >20% of traffic

📋 Local Cache

Redis L1 cache on each shard node for hot keys

When: read-heavy hot keys

      πŸ—³οΈ Replica Read

      Route hot key reads to read replicas, writes to primary

      βœ— Replication lag may show stale reads

🧂 Salting

Add random suffix to distribute load: user:1234:salt → shard N

When: hot key pattern is predictable
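
A sketch combining key splitting for reads with salted sub-keys for writes; the split factor of 4 and the key format are made up for illustration.

import random

SPLITS = 4  # assumed number of sub-keys for a known-hot key

def write_key(base: str) -> str:
    # Writes pick a random sub-key, spreading load across shards.
    return f"{base}:{random.randrange(SPLITS)}"

def read_keys(base: str) -> list[str]:
    # Reads fan in across every sub-key and merge in the application.
    return [f"{base}:{i}" for i in range(SPLITS)]

print(write_key("user:1234"))  # e.g., user:1234:2
print(read_keys("user:1234"))  # ['user:1234:0', ..., 'user:1234:3']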

Shard Key Decision Framework

📊 Query Pattern First

Identify your top 3 most common queries. Can they be answered by a single shard? If yes, use a shard key aligned to that query. If all queries are cross-shard, lean toward denormalization.

      βš–οΈ Write Distribution

      Will writes hit a few keys heavily (time-series, social media)? Use hash-based for even write distribution. Writes uniformly distributed? Range-based works fine.

🔒 Cardinality

Shard key must have high cardinality (thousands of distinct values or more). User ID is good; country code (~200 values) is not. Low cardinality = few large shards.

      πŸ›‘οΈ Resharding Cost

      Plan for 2Γ— expected growth to avoid resharding in <12 months. If resharding is likely, use directory-based or consistent hashing from day one.

✅ Golden Rule

If you can't answer "which shard does this query hit?" before executing, your shard key needs work. Every unnecessary scatter-gather query multiplies latency by the number of shards.

Frequently Asked Questions

When should I shard vs. just adding read replicas?

Add replicas first. Replicas handle read scaling, improve availability, and require zero application changes. Shard when: (1) you hit write throughput limits, (2) storage exceeds a single node, or (3) p99 latency spikes under write load. Typically this is 5–50M+ rows/day or 1TB+ of hot data.

What happens to foreign keys and JOINs?

Foreign keys don't automatically span shards. You have three options: denormalize (duplicate data for co-location), scatter-gather queries (run the sub-query on each shard and merge the results), or application-level joins (fetch from each shard and join in your service layer). Denormalization is most common in sharded systems.
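
An application-level join in miniature; the two literal result sets below stand in for rows fetched from different shards.

# Results fetched from the user's shard and the catalog shard, respectively.
orders = [{"id": 1, "user_id": 42, "product_id": 7},
          {"id": 2, "user_id": 42, "product_id": 9}]
products = [{"id": 7, "name": "keyboard"}, {"id": 9, "name": "mouse"}]

# The join happens in the service layer, not in the database.
by_id = {p["id"]: p for p in products}
joined = [{**order, "product": by_id[order["product_id"]]} for order in orders]
print(joined)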

Can I shard without a middleware proxy?

Yes. Some databases (MongoDB, Cassandra) handle shard routing natively. Proxy-based sharding (e.g., Vitess) adds a routing layer that your app talks to like a single database. Distributed SQL databases (CockroachDB, YugabyteDB, Spanner) shard transparently under the hood. A proxy adds latency but gives you full control; native sharding is simpler but less flexible.

How do I choose between hash and range sharding?

Hash sharding when: queries are random-access by ID, writes are uniformly distributed, you want even load. Range sharding when: queries naturally cluster (time ranges, alphabetical ranges, geographic regions), you frequently scan contiguous ranges, and data distribution is naturally chunkable.

What's the ideal number of shards?

Design for 2× your expected 18-month growth. Start with enough shards that each can handle your peak load with 50% headroom. For most mid-size systems: 4–16 shards is a good range. Too few shards = insufficient scale. Too many = coordination overhead, more cross-shard queries, harder operations. You can always add shards with consistent hashing.

How do I do cross-shard transactions?

Distributed transactions (two-phase commit, the saga pattern) are expensive and slow. The sharding philosophy prefers eventual consistency: writes go to one shard, and async events propagate to others. For true ACID across shards, use a distributed transaction coordinator (Vitess, CockroachDB, Google Spanner). Prefer single-shard transactions plus async reconciliation wherever possible.
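
A toy version of that philosophy using the transactional-outbox idea: the write and its outgoing event commit together on one shard, and a relay propagates the event later. Everything below is an in-memory stand-in, not a real driver or bus API.

# One shard's state: local data plus an outbox, committed together.
shard = {"balances": {42: 100}, "outbox": []}

def debit(user_id: int, amount: int) -> None:
    # Single-shard "transaction": both mutations succeed or neither does.
    if shard["balances"][user_id] < amount:
        raise ValueError("insufficient funds")
    shard["balances"][user_id] -= amount
    shard["outbox"].append({"event": "debited", "user": user_id, "amount": amount})

def relay() -> None:
    # In production this is a separate process draining the outbox to an
    # event bus; other shards consume the events and converge asynchronously.
    while shard["outbox"]:
        print("publish:", shard["outbox"].pop(0))

debit(42, 30)
relay()  # publish: {'event': 'debited', 'user': 42, 'amount': 30}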