Design a File Storage Service (Dropbox-style)
A consumer file-sync service like Dropbox or Google Drive uploads files from a user's device, deduplicates and stores them in object storage, and synchronizes changes to every other device the user is logged in on. The hard problems are efficient sync (don't upload the whole file when 2 KB changed), conflict resolution (two devices edited the same file offline), version history, and sharing semantics (a folder shared with 100 people becomes a fan-out problem).
Architecture
Capacity Estimation
| Metric | Value | Notes |
|---|---|---|
| Users | ~700 M | Dropbox + Drive scale |
| Avg storage / user | ~5 GB | free tier dominant |
| Total storage | ~3.5 EB | 700 M × 5 GB, before dedup |
| Dedup ratio | ~30% | shared content |
| Daily uploads | ~5 PB | ~0.15% of stored bytes per day |
| Block size | 4 MB | tradeoff dedup/overhead |
| Metadata rows | ~1 T | per-block + per-file; 700 M × 5 GB / 4 MB ≈ 0.9 T blocks alone |
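
The table's inputs can be cross-checked with a few lines of arithmetic. This is a back-of-envelope sketch, assuming decimal units (1 GB = 10^9 bytes), that the dedup ratio means 30% of bytes are eliminated, and that every byte lives in a full 4 MB block; the function and key names are illustrative.

```python
GB, PB, EB = 10**9, 10**15, 10**18  # decimal units (assumption)
BLOCK = 4 * 10**6                   # 4 MB block, decimal (assumption)


def estimate(users=700_000_000, gb_per_user=5,
             dedup_saving_pct=30, daily_upload_pb=5):
    """Derive totals from the per-user inputs in the capacity table."""
    total = users * gb_per_user * GB  # bytes before dedup
    return {
        "total_eb": total / EB,
        "after_dedup_eb": total * (100 - dedup_saving_pct) / 100 / EB,
        "daily_turnover_pct": 100 * daily_upload_pb * PB / total,
        # Lower bound on block count: small files still occupy one block each.
        "full_blocks": total // BLOCK,
    }


if __name__ == "__main__":
    for name, value in estimate().items():
        print(f"{name}: {value}")
```

Because files smaller than one block still need their own block, the real block count would sit above the full-block figure computed here.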
Chunking and Dedup
Files are split into fixed-size 4 MB blocks. Each block's SHA-256 hash is its address; the object store is content-addressed. Two devices uploading the same file (or the same intro to a video) deduplicate: the server stores one copy.
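A minimal sketch of a content-addressed block store, assuming an in-memory dict in place of real object storage. The names BlockStore, put, and put_file are illustrative and do not come from any real service.

```python
import hashlib

BLOCK_SIZE = 4 * 1024 * 1024  # 4 MB blocks, as in the text


def split_blocks(data: bytes, block_size: int = BLOCK_SIZE) -> list[bytes]:
    """Split data into fixed-size blocks; the last block may be shorter."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


class BlockStore:
    """Toy content-addressed store: the SHA-256 hex digest is the key."""

    def __init__(self) -> None:
        self.blocks: dict[str, bytes] = {}

    def put(self, block: bytes) -> str:
        digest = hashlib.sha256(block).hexdigest()
        # Storing under the hash makes a duplicate upload a no-op.
        self.blocks.setdefault(digest, block)
        return digest

    def put_file(self, data: bytes, block_size: int = BLOCK_SIZE) -> list[str]:
        """Store every block and return the manifest (ordered list of hashes)."""
        return [self.put(block) for block in split_blocks(data, block_size)]
```

Two files that share a block produce manifests that share a hash, and the store holds that block only once.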
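Fixed-size chunking is simple but vulnerable to the "byte insertion" problem: insert one byte at offset 0 and every subsequent block shifts, changes hash, and stops deduplicating. Mitigation: content-defined chunking (for example Rabin fingerprinting) places block boundaries wherever a rolling hash over the content matches a chosen pattern, so an insertion only changes the block it lands in (and occasionally its neighbor). Backup tools such as Borg and restic chunk this way, as do some deduplicating file systems; rsync uses a related rolling checksum to find matching blocks at arbitrary offsets.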
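The difference can be seen in a small experiment. The sketch below uses a simple polynomial rolling hash (Rabin-Karp style) as a stand-in for true Rabin fingerprinting. The window size, cut mask, and tiny chunk sizes are arbitrary choices made so the demo runs quickly, not production values.

```python
import hashlib
import random

WINDOW = 16                       # bytes covered by the rolling hash
MASK = (1 << 6) - 1               # cut where the low 6 bits are zero (~1 in 64)
BASE, MOD = 257, (1 << 31) - 1
TOP = pow(BASE, WINDOW - 1, MOD)  # weight of the byte leaving the window


def fixed_chunks(data: bytes, size: int) -> list[bytes]:
    return [data[i:i + size] for i in range(0, len(data), size)]


def cdc_chunks(data: bytes) -> list[bytes]:
    """Cut after positions where the hash of the last WINDOW bytes matches."""
    chunks, start, h = [], 0, 0
    for i, byte in enumerate(data):
        if i >= WINDOW:
            h = (h - data[i - WINDOW] * TOP) % MOD  # drop the oldest byte
        h = (h * BASE + byte) % MOD                 # take in the newest byte
        # Enforce a minimum chunk length of WINDOW bytes.
        if i + 1 - start >= WINDOW and (h & MASK) == 0:
            chunks.append(data[start:i + 1])
            start = i + 1
    if start < len(data):
        chunks.append(data[start:])
    return chunks


def shared_fraction(old: list[bytes], new: list[bytes]) -> float:
    """Fraction of new chunks whose hash already exists among the old ones."""
    known = {hashlib.sha256(c).digest() for c in old}
    return sum(hashlib.sha256(c).digest() in known for c in new) / len(new)


def demo(n: int = 20_000, seed: int = 0) -> tuple[float, float]:
    """Prepend one byte to random data; return (CDC reuse, fixed-size reuse)."""
    rng = random.Random(seed)
    data = bytes(rng.getrandbits(8) for _ in range(n))
    shifted = b"!" + data
    return (
        shared_fraction(cdc_chunks(data), cdc_chunks(shifted)),
        shared_fraction(fixed_chunks(data, 64), fixed_chunks(shifted, 64)),
    )
```

Because a cut point depends only on the preceding WINDOW bytes, the boundaries realign shortly after the inserted byte and most chunks keep their hashes. The fixed-size split reuses essentially nothing.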
Most consumer services pick fixed 4 MB for simplicity; the byte-insertion case is rare in user content (people don't prepend to video files). Backup-focused services use rolling hash.
Sync (Delta Sync)
The sync client's job: upload only what changed. Algorithm:
- Local client splits file into 4 MB blocks; computes hashes.
- Sends hash list to server: "for file X, my blocks are h1, h2, h3, h4."
- Server compares against the file's last known manifest and its block index, then replies with the hashes it does not already hold.
- Client uploads only new blocks.
- Server commits the new manifest (file_id, version, block_list) atomically.
For downloads, the inverse: client gets the new manifest, downloads only blocks it doesn't already have. The server never re-sends blocks the client cached locally.
This is why a 2 KB edit to a 10 MB file re-uploads a single 4 MB block rather than the whole file, and can finish in about a second on a fast link.
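The upload side of the exchange can be sketched as a set difference over block hashes. This assumes the server exposes the set of hashes it already holds; manifest and blocks_to_upload are hypothetical helper names.

```python
import hashlib


def manifest(data: bytes, block_size: int) -> list[str]:
    """Ordered SHA-256 hashes of the file's fixed-size blocks."""
    return [
        hashlib.sha256(data[i:i + block_size]).hexdigest()
        for i in range(0, len(data), block_size)
    ]


def blocks_to_upload(client_manifest: list[str], server_has: set[str]) -> list[int]:
    """Indices of the client's blocks whose hashes the server does not hold."""
    seen = set(server_has)
    missing = []
    for index, digest in enumerate(client_manifest):
        if digest not in seen:
            missing.append(index)
            seen.add(digest)  # a repeated block within the file is sent once
    return missing


if __name__ == "__main__":
    block = 4                    # stand-in for 4 MB, scaled down for the demo
    old = b"aaaabbbbcc"          # three blocks: 4 + 4 + 2 bytes
    new = b"aaaabXbbcc"          # one byte changed inside the second block
    print(blocks_to_upload(manifest(new, block), set(manifest(old, block))))
```

The download direction would use the same comparison with the roles swapped: the client diffs the new manifest against its local block cache.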
Conflict Resolution
Two devices edit the same file offline; both come back online; both send updated manifests. Strategies:
- Last-writer-wins: the loser's edit becomes a renamed file, so foo.txt is joined by foo (Alice's conflicted copy 2024-05-03).txt. Used by Dropbox. Deterministic and loses no data, but merging is manual.
- Three-way merge: for text files, run diff3 against the common ancestor. Auto-resolves non-overlapping changes and prompts for overlaps. This is how Git merges; Google Docs is a different beast (real-time operational transformation, not file sync).
- Block-level merge: if device A changed only blocks 1–3 and device B changed only blocks 4–7, the merged file is A's 1–3 plus B's 4–7. Works for sparse, independent changes, but the result is a file neither user ever saw, so it can surprise users or break formats with internal cross-references.
For a generic file-sync service, last-writer-wins + conflicted-copy is the only safe default.
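A sketch of the conflicted-copy default, assuming each commit carries the server version the device last synced. The Commit fields and the naming pattern follow the example above but are otherwise illustrative.

```python
from dataclasses import dataclass, field
from datetime import date
from pathlib import PurePosixPath


@dataclass
class Commit:
    path: str
    base_version: int      # server version this device last synced
    owner: str             # shown in the conflicted-copy name
    blocks: list[str] = field(default_factory=list)


def conflicted_copy_name(path: str, owner: str, day: date) -> str:
    """foo.txt -> foo (Alice's conflicted copy 2024-05-03).txt"""
    p = PurePosixPath(path)
    name = f"{p.stem} ({owner}'s conflicted copy {day.isoformat()}){p.suffix}"
    return str(p.with_name(name))


def apply_commit(server_version: int, commit: Commit, day: date) -> tuple[str, int]:
    """Return (path written, resulting server version of the original path)."""
    if commit.base_version == server_version:
        # The device edited the latest version: a normal fast-forward.
        return commit.path, server_version + 1
    # Stale base: keep the server's copy and save this edit beside it.
    return conflicted_copy_name(commit.path, commit.owner, day), server_version
```

Whichever device commits first advances the version. The second device's commit arrives with a stale base_version and is written next to the original instead of over it.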
Version History
Every save is a new version of the manifest, not a destructive overwrite. Schema: file_versions(file_id, version, manifest, created_at, created_by). Keep all versions for 30 days for free users, and for 1 year or indefinitely for paid users.
Garbage collection: when a block is referenced by no live version (across all users), it can be deleted from object storage. Reference-count via a periodic Spark job over the metadata DB; do not delete eagerly — one block can become unreachable in one user's tree but still reachable through dedup in another's.
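The non-eager policy can be sketched as a mark-and-sweep pass with a grace period. The 7-day grace value and the in-memory sets are assumptions for illustration; a production job would scan the metadata DB in batches.

```python
from datetime import datetime, timedelta

GRACE = timedelta(days=7)  # assumed grace period, not a figure from the text


def sweep(stored_blocks: set[str],
          live_manifests: list[list[str]],
          unreferenced_since: dict[str, datetime],
          now: datetime) -> set[str]:
    """Return hashes that have been unreferenced for at least GRACE.

    unreferenced_since is persistent state that this pass updates in place.
    """
    # Mark: every hash reachable from any live version, across all users.
    live: set[str] = set()
    for manifest in live_manifests:
        live.update(manifest)

    deletable: set[str] = set()
    for digest in stored_blocks:
        if digest in live:
            # Referenced again (for example via dedup): clear any old mark.
            unreferenced_since.pop(digest, None)
            continue
        first_seen = unreferenced_since.setdefault(digest, now)
        if now - first_seen >= GRACE:
            deletable.add(digest)
    return deletable
```

A block only becomes deletable after it has been seen unreferenced on passes at least GRACE apart, which gives a re-reference through another user's tree time to rescue it.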
Sharing Model
A shared folder is a mount point: the same underlying namespace appears in multiple users' trees. Implementation:
- Mount table per user: (user_id, mount_path, namespace_id).
- Namespace ID: a shared namespace is one namespace_id; each user's tree references the namespace at their chosen mount path.
- ACL per (namespace_id, user_id, role): owner, editor, commenter, viewer.
- Notify fan-out on a change in a shared namespace: enumerate the mount table, push notifications to every device session of every member.
The fan-out for a 1000-member shared folder during a heavy upload session is the expensive part. Batch notifications, debounce per device, and use long-polling rather than per-event push.
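One way to batch is to coalesce a burst of changes into a single "namespace changed" nudge per device session, after which the device pulls the details itself. This sketch assumes in-memory tables; fan_out and the row shapes are illustrative.

```python
from collections import defaultdict

# Mount table rows: (user_id, mount_path, namespace_id)
MountRow = tuple[str, str, int]


def members_of(mounts: list[MountRow], namespace_id: int) -> set[str]:
    """Users who have mounted the given namespace somewhere in their tree."""
    return {user for user, _path, ns in mounts if ns == namespace_id}


def fan_out(changes: list[tuple[int, str]],
            mounts: list[MountRow],
            sessions: dict[str, list[str]]) -> dict[str, set[int]]:
    """Collapse a batch of (namespace_id, file_path) changes into one
    notification per device session, listing only the changed namespaces."""
    pending: dict[str, set[int]] = defaultdict(set)
    for namespace_id in {ns for ns, _path in changes}:
        for user in members_of(mounts, namespace_id):
            for device in sessions.get(user, []):
                pending[device].add(namespace_id)
    return dict(pending)
```

With this shape, a 500-file upload into one shared folder costs one notification per device session rather than 500.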
S3-like vs Custom Storage (Magic Pocket)
- S3 / GCS / Azure Blob — fast to start, 11 nines durability, unbeatable economics at small scale. Cost dominates as you grow past ~100 PB.
- Custom (Dropbox Magic Pocket, Backblaze) — build your own erasure-coded multi-DC store, save 50–75% on storage cost at the price of a multi-year engineering investment.
Dropbox famously moved more than 90% of its data off AWS onto Magic Pocket between 2014 and 2016, and its S-1 filing reported roughly $75M in infrastructure cost savings over the following two years. The lesson: hyperscale-cloud storage is great until your usage is a meaningful percentage of the cloud's margin.
Failure Modes
- Notify lag: user saves on phone, laptop doesn't see the change for minutes. Long-poll with a 30 s timeout; cursor-based catch-up on reconnect.
- Manifest race: two devices commit manifest version N+1 simultaneously. Optimistic concurrency: UPDATE file SET version = N+1 WHERE file_id = ? AND version = N; the loser sees zero rows updated and retries with conflicted-copy resolution.
- Block GC bug: eager deletion of a still-referenced block corrupts files for the dedup-sharing user. Be paranoid: reference-count with a grace period, soft-delete first.
- Storage hot spot: a viral shared file fans out to millions of clients downloading the same blocks. CDN-cache popular blocks at the edge.
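The manifest-race guard can be sketched with a compare-and-set UPDATE. This uses SQLite from the standard library purely for illustration; the files table and its columns are assumed, not taken from a real schema.

```python
import sqlite3


def make_db() -> sqlite3.Connection:
    """In-memory database with a hypothetical one-row-per-file table."""
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE files (file_id TEXT PRIMARY KEY, version INTEGER, manifest TEXT)"
    )
    return db


def commit_manifest(db: sqlite3.Connection, file_id: str,
                    expected_version: int, manifest: str) -> bool:
    """Advance the file to expected_version + 1 only if nobody else has.

    Returns False when another device won the race; the caller would then
    fall back to conflicted-copy resolution.
    """
    cur = db.execute(
        "UPDATE files SET version = version + 1, manifest = ? "
        "WHERE file_id = ? AND version = ?",
        (manifest, file_id, expected_version),
    )
    db.commit()
    return cur.rowcount == 1  # zero rows updated means the version moved on
```

Both devices may read version N, but only one UPDATE matches the WHERE clause. The other sees rowcount 0 and knows its base is stale.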
FAQ
Encryption?
Server-side at rest by default. Client-side E2EE (e.g. Tresorit, Proton Drive) is harder: it defeats server-side dedup and requires careful key management for sharing.
How big should blocks be?
4 MB is the typical sweet spot: small enough that a small edit re-uploads only one block, large enough that metadata overhead is negligible (roughly 50 bytes of metadata per 4 MB of content).
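The two sides of the tradeoff can be tabulated directly. This sketch assumes decimal megabytes and takes the 50-bytes-per-block metadata figure from the answer above; the candidate block sizes are arbitrary.

```python
METADATA_BYTES_PER_BLOCK = 50  # figure quoted in the answer above
MB = 10**6                     # decimal megabytes (assumption)


def tradeoff(block_size: int) -> tuple[float, int]:
    """Return (metadata overhead as a fraction of content,
    bytes re-uploaded when a tiny edit touches one block)."""
    return METADATA_BYTES_PER_BLOCK / block_size, block_size


def table(sizes_mb=(1, 4, 16, 64)) -> list[tuple[int, float, int]]:
    """Rows of (block size in MB, overhead fraction, re-upload bytes)."""
    return [(mb, *tradeoff(mb * MB)) for mb in sizes_mb]


if __name__ == "__main__":
    for mb, overhead, reupload in table():
        print(f"{mb:>3} MB block: overhead {overhead:.6%}, small edit re-uploads {reupload:,} bytes")
```

Metadata overhead falls linearly with block size while the cost of a small edit rises linearly, so the choice is about which cost matters more for the workload.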
Can two users dedupe identical content?
Yes — that's the dedup. Privacy implication: a server can detect that two users have the same file (the hash matches). Mitigations: convergent encryption (deterministic key from content) preserves dedup but reveals same-content; pure E2EE removes both.
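The convergent-encryption idea can be illustrated with a toy: derive the key from the plaintext, so identical blocks encrypt to identical ciphertext and still dedupe. The XOR keystream built from SHAKE-256 below is for demonstration only and is not a vetted cipher; a real system would use a standard, reviewed encryption scheme.

```python
import hashlib


def convergent_encrypt(plaintext: bytes) -> tuple[bytes, bytes]:
    """Return (key, ciphertext); the key is derived from the content itself."""
    key = hashlib.sha256(plaintext).digest()
    # Toy keystream: SHAKE-256 output of the key, XORed with the plaintext.
    stream = hashlib.shake_256(key).digest(len(plaintext))
    ciphertext = bytes(p ^ s for p, s in zip(plaintext, stream))
    return key, ciphertext


def convergent_decrypt(key: bytes, ciphertext: bytes) -> bytes:
    stream = hashlib.shake_256(key).digest(len(ciphertext))
    return bytes(c ^ s for c, s in zip(ciphertext, stream))


def block_address(ciphertext: bytes) -> str:
    """The server addresses blocks by the hash of what it stores."""
    return hashlib.sha256(ciphertext).hexdigest()
```

Two users who encrypt the same block independently arrive at the same block address, which is exactly what preserves dedup and also what lets the server see that their content matches.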
How do you handle 100 GB files?
Multipart upload via HTTP, parallel block uploads, resumable on disconnect. Same chunking as smaller files; just more blocks per file.
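Resumability falls out of content addressing: after a disconnect, the client asks which hashes the server holds and sends only the rest. A sketch, assuming a caller-supplied send(digest, block) function and a dict standing in for the server's block index:

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor
from typing import Callable


def upload_file(blocks: list[bytes],
                server_blocks: dict[str, bytes],
                send: Callable[[str, bytes], None],
                workers: int = 4) -> int:
    """Send the blocks the server lacks, in parallel; return how many were
    attempted. Safe to call again after a failure: blocks that already
    landed are skipped on the next call."""
    missing: dict[str, bytes] = {}
    for block in blocks:
        digest = hashlib.sha256(block).hexdigest()
        if digest not in server_blocks:
            missing[digest] = block
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # list() waits for results in order and re-raises the first failure.
        list(pool.map(lambda item: send(*item), missing.items()))
    return len(missing)
```

A 100 GB file is about 25,000 blocks of 4 MB, so a dropped connection costs at most the blocks that were in flight, not the whole transfer.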