Design Spotify
Audio Chunking, Adaptive Bitrate Streaming, Recommendation Engine, and Playlist Management at Scale
A music streaming platform like Spotify serves 100M+ tracks to 600M+ users worldwide. The core challenges span audio encoding and chunking for low-latency streaming, adaptive bitrate selection that adjusts quality to network conditions, personalized recommendations via collaborative filtering and content-based models, playlist management at scale (4B+ playlists), offline sync with DRM-protected downloads, rights management across regions, and search across a catalog of 100M+ tracks with fuzzy matching on titles, artists, albums, and lyrics. The system must handle peak concurrent streams in the tens of millions while maintaining sub-second start times and gapless playback.
Audio Streaming Pipeline
When a user presses play, audio is served as a series of small chunks. Each chunk is fetched from the CDN, decoded, and fed to the audio buffer. The player prefetches chunks ahead of the playback position to ensure smooth, uninterrupted listening.
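The prefetch behaviour described above can be sketched roughly as follows. This is a minimal illustration, assuming a hypothetical `fetch_chunk(index)` callable that stands in for the CDN request and a fixed lookahead measured in chunks. The class and parameter names are invented for this example and are not taken from any real player.

```python
from collections import deque


class PrefetchBuffer:
    """Keeps up to `lookahead` chunks fetched ahead of the playback position."""

    def __init__(self, fetch_chunk, total_chunks, lookahead=3):
        self.fetch_chunk = fetch_chunk    # hypothetical callable: index -> bytes
        self.total_chunks = total_chunks
        self.lookahead = lookahead
        self.next_to_fetch = 0
        self.buffer = deque()             # (index, data) pairs in playback order

    def fill(self):
        """Fetch chunks until the buffer is full or the track is exhausted."""
        while (len(self.buffer) < self.lookahead
               and self.next_to_fetch < self.total_chunks):
            index = self.next_to_fetch
            self.buffer.append((index, self.fetch_chunk(index)))
            self.next_to_fetch += 1

    def next_chunk(self):
        """Return the next (index, data) pair for the decoder, or None at end of track."""
        self.fill()
        if not self.buffer:
            return None
        chunk = self.buffer.popleft()
        self.fill()                       # top the buffer back up right away
        return chunk
```

A real client would fetch asynchronously so that decoding never waits on the network; the synchronous loop here only shows the bookkeeping.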
Adaptive Bitrate Calculator
The streaming client continuously monitors network conditions and adjusts audio quality to prevent buffering while maximizing fidelity. Higher bandwidth unlocks higher quality tiers.
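One way to sketch this selection logic is shown below. It assumes the four bitrate tiers the article lists for pre-encoded tracks (96, 160, 256, 320 kbps); the safety factor, the low-buffer threshold, and the smoothing constant are hypothetical tuning values chosen only for illustration.

```python
BITRATE_TIERS_KBPS = [96, 160, 256, 320]  # encoding ladder listed in the article


def smooth_throughput(previous_kbps, sample_kbps, alpha=0.3):
    """Exponentially weighted moving average of measured throughput.

    alpha is an assumed smoothing constant; higher values react faster.
    """
    if previous_kbps is None:
        return sample_kbps
    return alpha * sample_kbps + (1 - alpha) * previous_kbps


def select_bitrate(throughput_kbps, buffer_seconds,
                   safety_factor=0.8, low_buffer_seconds=5.0):
    """Pick the highest tier that fits within a conservative bandwidth budget."""
    budget = throughput_kbps * safety_factor
    if buffer_seconds < low_buffer_seconds:
        budget *= 0.5                     # buffer is nearly empty: be more cautious
    affordable = [tier for tier in BITRATE_TIERS_KBPS if tier <= budget]
    # Fall back to the lowest tier rather than stalling playback entirely.
    return max(affordable) if affordable else BITRATE_TIERS_KBPS[0]
```

For example, `select_bitrate(1000, 10)` picks the 320 kbps tier, while the same call with a 250 kbps estimate drops to 160 kbps.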
Recommendation Engine Visualizer
Collaborative filtering finds users with similar taste and recommends songs they liked that you haven't heard yet. This demo shows a simplified user-song rating matrix and predicts missing ratings based on user similarity.
Songs in the demo matrix: Rock Ballad, Pop Hit, Jazz Solo, EDM Drop, Indie Folk, Hip Hop, Classical, R&B.
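The prediction step can be illustrated with a small user-based collaborative filtering sketch. The users and ratings below are made up for the example, and cosine similarity over co-rated songs is one common choice among several; the original demo may compute similarity differently.

```python
from math import sqrt

# Hypothetical ratings on a 1-5 scale; missing songs are simply absent.
ratings = {
    "alice": {"Rock Ballad": 5, "Pop Hit": 3, "Jazz Solo": 4},
    "bob":   {"Rock Ballad": 4, "Pop Hit": 3, "Jazz Solo": 5, "EDM Drop": 2},
    "carol": {"Rock Ballad": 1, "Pop Hit": 5, "EDM Drop": 5},
}


def cosine_similarity(a, b):
    """Cosine similarity computed over the songs both users have rated."""
    shared = set(a) & set(b)
    if not shared:
        return 0.0
    dot = sum(a[s] * b[s] for s in shared)
    norm_a = sqrt(sum(a[s] ** 2 for s in shared))
    norm_b = sqrt(sum(b[s] ** 2 for s in shared))
    return dot / (norm_a * norm_b)


def predict(user, song, all_ratings):
    """Similarity-weighted average of other users' ratings for `song`."""
    weighted_sum = 0.0
    weight_total = 0.0
    for other, their_ratings in all_ratings.items():
        if other == user or song not in their_ratings:
            continue
        sim = cosine_similarity(all_ratings[user], their_ratings)
        weighted_sum += sim * their_ratings[song]
        weight_total += abs(sim)
    return weighted_sum / weight_total if weight_total else None


prediction = predict("alice", "EDM Drop", ratings)
```

Because alice's ratings are closer to bob's than to carol's, the predicted rating for "EDM Drop" lands nearer to bob's 2 than to carol's 5. Production systems usually mean-center ratings first, since raw cosine on all-positive ratings tends to make every pair of users look similar.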
Capacity Estimation
Estimate the infrastructure requirements for a Spotify-scale music streaming platform based on user base and listening patterns.
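A back-of-the-envelope version of this estimate might look like the sketch below. Only the 600M user count, the 100M track count, the bitrate tiers, and the 95% cache hit ratio come from the article; the daily-active ratio, listening minutes, peak concurrency ratio, and average track length are assumptions picked purely to make the arithmetic concrete.

```python
def estimate_capacity(monthly_users, dau_ratio, minutes_per_day,
                      avg_bitrate_kbps, peak_concurrency_ratio, cdn_hit_ratio):
    """Rough streaming egress estimate; every ratio here is an input assumption."""
    dau = monthly_users * dau_ratio
    bytes_per_second = avg_bitrate_kbps * 1000 / 8
    daily_egress_bytes = dau * minutes_per_day * 60 * bytes_per_second
    peak_streams = dau * peak_concurrency_ratio
    peak_egress_bps = peak_streams * avg_bitrate_kbps * 1000
    return {
        "daily_active_users": dau,
        "daily_egress_pb": daily_egress_bytes / 1e15,
        "peak_streams": peak_streams,
        "peak_egress_tbps": peak_egress_bps / 1e12,
        # Traffic that misses the CDN and reaches origin storage.
        "origin_egress_tbps": peak_egress_bps * (1 - cdn_hit_ratio) / 1e12,
    }


def catalog_storage_pb(track_count, avg_track_seconds, bitrates_kbps):
    """Storage for every track encoded once at each listed bitrate."""
    bytes_per_track = avg_track_seconds * sum(bitrates_kbps) * 1000 / 8
    return track_count * bytes_per_track / 1e15


estimate = estimate_capacity(
    monthly_users=600_000_000,
    dau_ratio=0.4,                # assumed
    minutes_per_day=60,           # assumed
    avg_bitrate_kbps=160,         # assumed mix of quality tiers
    peak_concurrency_ratio=0.10,  # assumed
    cdn_hit_ratio=0.95,
)
storage = catalog_storage_pb(100_000_000, 210, [96, 160, 256, 320])  # 3.5 min tracks assumed
```

Under these assumptions the model gives about 24 million peak concurrent streams, roughly 17 PB of daily egress, and a little over 2 PB of encoded audio, which is in line with the "tens of millions" of concurrent streams mentioned in the introduction.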
Playlist & Search Architecture
Key architectural decisions for managing billions of playlists, enabling fast search across 100M+ tracks, and supporting offline playback.
Playlist Storage
Denormalized (Fast Reads): Store full track metadata in each playlist entry. Read a playlist in a single query, but updates to track metadata (e.g., album art change) require fan-out writes to every playlist containing that track.
Normalized (Consistency): Playlist entries store only track IDs. Reads require a join or second lookup, but metadata updates are instant. Spotify uses a hybrid: denormalized for hot data with async consistency reconciliation.
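The read/write trade-off between the two layouts can be made concrete with a small in-memory sketch. Both classes are hypothetical stand-ins for real database tables, and the `update_track` return value simply counts how many records a metadata change has to touch.

```python
class NormalizedStore:
    """Playlists hold track IDs only; metadata lives in one place."""

    def __init__(self):
        self.tracks = {}       # track_id -> metadata dict
        self.playlists = {}    # playlist_id -> list of track_ids

    def read_playlist(self, playlist_id):
        # A second lookup per entry (a join in SQL terms).
        return [self.tracks[tid] for tid in self.playlists[playlist_id]]

    def update_track(self, track_id, **changes):
        self.tracks[track_id].update(changes)
        return 1               # a single record is written


class DenormalizedStore:
    """Each playlist entry carries its own copy of the track metadata."""

    def __init__(self):
        self.playlists = {}    # playlist_id -> list of metadata copies
        self.containing = {}   # track_id -> playlist_ids, used for fan-out

    def add_track(self, playlist_id, track):
        self.playlists.setdefault(playlist_id, []).append(dict(track))
        self.containing.setdefault(track["track_id"], set()).add(playlist_id)

    def read_playlist(self, playlist_id):
        return self.playlists[playlist_id]   # one read, no extra lookups

    def update_track(self, track_id, **changes):
        written = 0
        for playlist_id in self.containing.get(track_id, ()):
            for entry in self.playlists[playlist_id]:
                if entry["track_id"] == track_id:
                    entry.update(changes)
                    written += 1
        return written         # fan-out: one write per playlist entry
```

With a popular track that sits in millions of playlists, the denormalized write count grows with the number of playlists, which is why such updates are typically reconciled asynchronously.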
Search Infrastructure
Elasticsearch for full-text search across track names, artist names, album titles, and lyrics. Supports fuzzy matching, autocomplete, and typo tolerance via n-gram tokenizers.
Inverted index for tag-based and genre-based queries. Combined with a popularity signal to rank results: searching "love" should surface popular hits first, not obscure tracks.
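The idea behind n-gram matching and popularity-weighted ranking can be sketched without a search cluster. The code below is a pure-Python stand-in, not Elasticsearch: it scores text by trigram overlap and boosts the result with a logarithm of the play count. The track titles, play counts, and the `popularity_weight` value are all hypothetical.

```python
import math


def ngrams(text, n=3):
    """Character n-grams of the lowercased text, padded with spaces."""
    padded = f" {text.lower()} "
    return {padded[i:i + n] for i in range(len(padded) - n + 1)}


def text_score(query, title):
    """Jaccard overlap of trigram sets; tolerant of small typos."""
    q, t = ngrams(query), ngrams(title)
    union = q | t
    return len(q & t) / len(union) if union else 0.0


def rank(query, tracks, popularity_weight=0.3):
    """Order (title, play_count) pairs by text relevance boosted by popularity."""
    scored = []
    for title, plays in tracks:
        relevance = text_score(query, title)
        if relevance == 0:
            continue                       # no shared trigrams: not a match
        boost = 1 + popularity_weight * math.log10(1 + plays)
        scored.append((relevance * boost, title))
    return [title for _, title in sorted(scored, reverse=True)]


catalog = [("Love Letters", 1_200), ("Love Signals", 5_000_000), ("Jazz Solo", 90_000)]
results = rank("love", catalog)
```

Here both "Love" titles match the query equally well on text alone, so the play count decides the order and the popular track comes first. The log scaling keeps a hugely popular but weakly matching track from swamping a precise match.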
Offline Sync
Selective download: Users choose playlists/albums to cache locally. The app downloads tracks at the user's preferred quality setting, encrypted with device-bound DRM keys.
Delta sync: When a playlist changes, sync only the added and removed tracks rather than re-downloading the entire playlist. Record each track's download state in a local SQLite database.
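A delta sync along these lines could be sketched with Python's built-in `sqlite3` module. The `downloads` table layout and the `download` / `delete_file` callbacks are assumptions made for this example; a real client would also handle encryption, retries, and partial downloads.

```python
import sqlite3


def open_state(path=":memory:"):
    """Open (or create) the local download-state database."""
    db = sqlite3.connect(path)
    db.execute(
        "CREATE TABLE IF NOT EXISTS downloads ("
        " playlist_id TEXT NOT NULL,"
        " track_id TEXT NOT NULL,"
        " PRIMARY KEY (playlist_id, track_id))"
    )
    return db


def is_cached(db, track_id):
    """True if any downloaded playlist still references this track."""
    row = db.execute(
        "SELECT 1 FROM downloads WHERE track_id = ? LIMIT 1", (track_id,)
    ).fetchone()
    return row is not None


def delta_sync(db, playlist_id, server_track_ids, download, delete_file):
    """Apply only the difference between the server playlist and local state."""
    local = {row[0] for row in db.execute(
        "SELECT track_id FROM downloads WHERE playlist_id = ?", (playlist_id,))}
    server = set(server_track_ids)
    added, removed = server - local, local - server

    for track_id in sorted(added):
        if not is_cached(db, track_id):    # another playlist may already hold it
            download(track_id)
        db.execute("INSERT INTO downloads (playlist_id, track_id) VALUES (?, ?)",
                   (playlist_id, track_id))
    for track_id in sorted(removed):
        db.execute("DELETE FROM downloads WHERE playlist_id = ? AND track_id = ?",
                   (playlist_id, track_id))
        if not is_cached(db, track_id):    # keep files shared with other playlists
            delete_file(track_id)
    db.commit()
    return added, removed
```

Checking whether another playlist still references a track avoids deleting or re-downloading audio that is shared between several offline playlists.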
Content Addressing
Audio files are stored in object storage (S3/GCS) with content-hash-based keys. Each track is pre-encoded at multiple bitrates (96, 160, 256, 320 kbps) in Ogg Vorbis format.
A CDN layer (Fastly/CloudFront) caches the top ~1% of tracks that account for ~80% of plays, achieving a 95%+ cache hit ratio for audio content.
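Content-hash keys can be derived as in the sketch below. SHA-256 and the `audio/<prefix>/<digest>` key layout are assumptions for illustration; the article does not say which hash or naming scheme is used.

```python
import hashlib


def object_key(encoded_audio: bytes, bitrate_kbps: int, codec: str = "ogg") -> str:
    """Build an object-storage key from the SHA-256 of the encoded file.

    The two-character prefix spreads keys across storage partitions; the
    bitrate and codec suffix is redundant with the hash but aids debugging.
    """
    digest = hashlib.sha256(encoded_audio).hexdigest()
    return f"audio/{digest[:2]}/{digest}_{bitrate_kbps}.{codec}"


key = object_key(b"example encoded audio bytes", 320)
```

Because the key is a function of the bytes, re-uploading an identical file maps to the same object, and any change to the audio produces a new key. That property is what makes long CDN TTLs safe.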
High-Level Architecture
The system decomposes into specialized services connected through an API gateway, backed by purpose-built data stores.
Key Design Decisions
Four critical decisions that shape the architecture of a music streaming system.
Audio Storage Strategy
Pre-encode every track to multiple bitrates (96, 160, 256, 320 kbps) at ingest time. Store chunked audio files in object storage (S3/GCS) with content-hash keys. Distribute via CDN with long TTLs, since audio files are immutable once encoded.
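Trade-off: storing four encodings per track raises storage cost (together the four files are roughly 2.6x the size of the 320 kbps encoding alone, since 96 + 160 + 256 + 320 = 832 kbps), but it eliminates real-time transcoding and enables instant quality switching without re-buffering.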
Streaming Protocol
HTTP-based progressive download using byte-range requests rather than true streaming protocols (RTMP/WebRTC). The client requests audio in ~512 KB chunks via standard HTTP GET with Range headers.
Benefits: works through firewalls/proxies, cacheable by CDNs, simple retry logic. The client maintains a 5-10 second buffer and prefetches the next chunk while playing the current one.
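The byte-range mechanics can be sketched with the standard library alone. The 512 KB chunk size comes from the text above; the `fetch_range` helper and any URL passed to it are hypothetical, and the function is shown only to illustrate the request shape.

```python
import urllib.request

CHUNK_SIZE = 512 * 1024  # ~512 KB, as described above


def chunk_ranges(file_size, chunk_size=CHUNK_SIZE):
    """Yield HTTP Range header values covering the whole file.

    Byte ranges are inclusive on both ends, hence the `- 1`.
    """
    for start in range(0, file_size, chunk_size):
        end = min(start + chunk_size, file_size) - 1
        yield f"bytes={start}-{end}"


def chunk_duration_seconds(bitrate_kbps, chunk_size=CHUNK_SIZE):
    """Approximate seconds of audio held in one chunk at a constant bitrate."""
    return chunk_size * 8 / (bitrate_kbps * 1000)


def fetch_range(url, range_header):
    """Fetch one chunk; a server that honours the range replies with 206."""
    request = urllib.request.Request(url, headers={"Range": range_header})
    with urllib.request.urlopen(request) as response:
        if response.status != 206:
            raise RuntimeError(f"expected 206 Partial Content, got {response.status}")
        return response.read()
```

At 320 kbps a 512 KB chunk holds about 13 seconds of audio, and proportionally more at lower bitrates, so a single chunk in flight is usually enough to cover the playback buffer.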
Recommendation Approach
Hybrid model: Collaborative filtering (users who listened to X also listened to Y) + content-based features (tempo, key, energy, danceability from audio analysis) + contextual signals (time of day, device type, recent listening history).
Spotify's Discover Weekly uses matrix factorization on the user-track interaction matrix combined with NLP on playlist titles to understand cultural context around tracks.
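A hybrid scorer of this kind can be sketched as a weighted blend of the three signals. The feature vectors (tempo, energy, danceability scaled to 0-1), the candidate scores, and the blend weights are all hypothetical values; production systems typically learn such weights rather than hard-coding them.

```python
from math import sqrt


def cosine(a, b):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sqrt(sum(x * x for x in a))
    norm_b = sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def hybrid_score(cf_score, content_score, context_score, weights=(0.6, 0.3, 0.1)):
    """Blend collaborative, content-based, and contextual signals (weights assumed)."""
    w_cf, w_content, w_context = weights
    return w_cf * cf_score + w_content * content_score + w_context * context_score


def recommend(user_profile, candidates, weights=(0.6, 0.3, 0.1)):
    """Rank candidate tracks by their blended score, best first."""
    def score(candidate):
        content = cosine(user_profile, candidate["features"])
        return hybrid_score(candidate["cf"], content, candidate["context"], weights)
    return [c["track_id"] for c in sorted(candidates, key=score, reverse=True)]


# Features are [tempo, energy, danceability], each normalized to 0-1.
profile = [0.8, 0.9, 0.7]
candidates = [
    {"track_id": "t2", "cf": 0.2, "features": [0.1, 0.2, 0.9], "context": 0.5},
    {"track_id": "t1", "cf": 0.9, "features": [0.8, 0.9, 0.7], "context": 0.5},
]
ranking = recommend(profile, candidates)
```

The content-based term is what lets a brand-new track with no listening history still receive a score, which pure collaborative filtering cannot do.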
Rights Management
DRM (Widevine/FairPlay): Audio streams are encrypted. Playback requires a license key fetched from the DRM server, tied to the user's subscription status and device.
Regional licensing: Not all tracks are available everywhere. The catalog service checks licensing per region at play time.
Royalty tracking: Every stream longer than 30 seconds triggers a royalty event routed to the payment pipeline.
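The region check and the 30-second royalty rule can be sketched as two small functions. The `licensed_regions` mapping and the `RoyaltyEvent` fields are hypothetical shapes chosen for this example; only the 30-second threshold comes from the text.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Set

ROYALTY_THRESHOLD_SECONDS = 30  # a stream must exceed this to count


@dataclass(frozen=True)
class RoyaltyEvent:
    track_id: str
    region: str
    seconds_played: float


def is_licensed(licensed_regions: Dict[str, Set[str]], track_id: str, region: str) -> bool:
    """Check at play time whether the track may be streamed in this region."""
    return region in licensed_regions.get(track_id, set())


def on_stream_finished(track_id: str, region: str,
                       seconds_played: float) -> Optional[RoyaltyEvent]:
    """Return a royalty event for qualifying streams, otherwise None."""
    if seconds_played > ROYALTY_THRESHOLD_SECONDS:
        return RoyaltyEvent(track_id, region, seconds_played)
    return None
```

In a full system the returned event would be published to a queue consumed by the payment pipeline rather than handled inline on the playback path.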