Video Streaming
Streaming video at scale is a pipeline of five intersecting problems: encoding the source into a bitrate ladder of renditions, packaging it into chunked formats (HLS, DASH), distributing it via a global CDN, adapting each viewer's stream to fluctuating bandwidth, and protecting premium content with DRM. The choices that matter most are the bitrate ladder design, the split between the live and VOD paths, and where DRM keys are issued.
Architecture
Capacity Estimation
| Metric | Value | Notes |
|---|---|---|
| Concurrent streams (peak) | ~50 M | major-event scale |
| Per-stream bitrate | 3–15 Mbps | HD–4K HEVC |
| Aggregate egress | ~250 Tbps | 50 M × 5 Mbps avg |
| VOD storage / title | ~100 GB | ~10 ladder rungs + 4K master |
| Catalog size | ~50 PB | 500 K titles × 100 GB |
| Live-stream latency | 30 s → 3 s | HLS to LL-HLS |
| Transcode CPU | ~1× realtime/stream | per ladder rung |
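The egress and catalog rows are straight multiplication. A quick check in Python (decimal units, taking the 5 Mbps blended average from the Notes column as an assumption) reproduces them:

```python
# Back-of-envelope check of the capacity table, in decimal SI units.
PEAK_STREAMS = 50_000_000        # concurrent streams at peak
AVG_BITRATE_BPS = 5_000_000      # assumed blended average bitrate (5 Mbps)
TITLES = 500_000                 # catalog size in titles
BYTES_PER_TITLE = 100 * 10**9    # ~100 GB per title (all rungs + master)


def egress_tbps(streams: int, bitrate_bps: int) -> float:
    """Aggregate egress in terabits per second."""
    return streams * bitrate_bps / 1e12


def catalog_pb(titles: int, bytes_per_title: int) -> float:
    """Total catalog storage in petabytes."""
    return titles * bytes_per_title / 1e15


if __name__ == "__main__":
    print(f"egress ~ {egress_tbps(PEAK_STREAMS, AVG_BITRATE_BPS):.0f} Tbps")  # 250 Tbps
    print(f"catalog ~ {catalog_pb(TITLES, BYTES_PER_TITLE):.0f} PB")          # 50 PB
```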
Adaptive Bitrate: HLS and DASH
The video is encoded into a bitrate ladder: typically 6–10 rungs from 240p 0.4 Mbps to 4K 15 Mbps. Each rung is split into segments (2–6 s) and indexed by a manifest. The player observes its bandwidth and buffer health, then chooses which rung to fetch the next segment from.
- HLS (Apple, .m3u8) — mandatory on iOS Safari; widely supported. Uses TS or fMP4 segments.
- MPEG-DASH (.mpd) — ISO standard, more flexible (multiple period and adaptation set abstractions). Used by Netflix and YouTube.
- CMAF — common segment format (fMP4) usable by both HLS and DASH manifests. Lets you store one set of segments and serve both protocols, halving storage and CDN cache footprint compared with keeping separate TS and fMP4 copies.
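To make the HLS side concrete, here is a minimal sketch that renders a multivariant (master) playlist with one `EXT-X-STREAM-INF` entry per rung. The ladder values, codec strings, and `NNNp/index.m3u8` paths are illustrative assumptions, and the `Rung` type and `multivariant_playlist` helper are hypothetical:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rung:
    """One rendition in the bitrate ladder (illustrative values)."""
    width: int
    height: int
    bandwidth_bps: int  # peak bitrate, advertised via BANDWIDTH
    codecs: str         # RFC 6381 codec string
    uri: str            # media playlist for this rung (hypothetical path)


def multivariant_playlist(ladder: list[Rung]) -> str:
    """Render an HLS multivariant playlist, one variant per rung, lowest bitrate first."""
    lines = ["#EXTM3U", "#EXT-X-VERSION:7", "#EXT-X-INDEPENDENT-SEGMENTS"]
    for rung in sorted(ladder, key=lambda item: item.bandwidth_bps):
        lines.append(
            f"#EXT-X-STREAM-INF:BANDWIDTH={rung.bandwidth_bps},"
            f'RESOLUTION={rung.width}x{rung.height},CODECS="{rung.codecs}"'
        )
        lines.append(rung.uri)
    return "\n".join(lines) + "\n"


LADDER = [
    Rung(1920, 1080, 6_000_000, "avc1.640028,mp4a.40.2", "1080p/index.m3u8"),
    Rung(426, 240, 400_000, "avc1.42c015,mp4a.40.2", "240p/index.m3u8"),
    Rung(1280, 720, 3_000_000, "avc1.64001f,mp4a.40.2", "720p/index.m3u8"),
]

if __name__ == "__main__":
    print(multivariant_playlist(LADDER))
```

A DASH player would read the same renditions from an MPD; with CMAF, both manifests can point at the same fMP4 segments.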
ABR algorithms have evolved from throughput-based (rate estimation) through buffer-based (e.g. BOLA) to ML-based (Pensieve-style RL agents). Production players (hls.js, Shaka, AVPlayer) typically run a hybrid: they lean on the bandwidth estimate at startup, when the buffer is nearly empty, and on buffer level once playback is stable.
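A minimal sketch of the hybrid idea, assuming a six-rung ladder and hand-picked thresholds (`SAFETY`, `RESERVOIR_S`, `CUSHION_S` are hypothetical). It combines a simple BBA-style buffer map with an EWMA throughput estimate; it is not BOLA and not the logic any of the players above actually ship:

```python
from typing import Optional

# Ladder bitrates in bits per second, lowest to highest (illustrative values).
LADDER_BPS = [400_000, 800_000, 1_500_000, 3_000_000, 6_000_000, 15_000_000]
SAFETY = 0.8         # spend only 80% of the throughput estimate
RESERVOIR_S = 10.0   # below this buffer level, trust the throughput estimate
CUSHION_S = 20.0     # buffer span mapped linearly across the ladder


class EwmaThroughput:
    """Exponentially weighted moving average of per-segment download throughput."""

    def __init__(self, alpha: float = 0.3) -> None:
        self.alpha = alpha
        self.estimate_bps: Optional[float] = None

    def add_sample(self, size_bytes: int, seconds: float) -> float:
        sample_bps = size_bytes * 8 / seconds
        if self.estimate_bps is None:
            self.estimate_bps = sample_bps
        else:
            self.estimate_bps = self.alpha * sample_bps + (1 - self.alpha) * self.estimate_bps
        return self.estimate_bps


def throughput_rung(estimate_bps: float) -> int:
    """Highest rung whose bitrate fits within SAFETY * estimate (rung 0 if none fits)."""
    budget = SAFETY * estimate_bps
    best = 0
    for i, bps in enumerate(LADDER_BPS):
        if bps <= budget:
            best = i
    return best


def buffer_rung(buffer_s: float) -> int:
    """Map buffer level in [RESERVOIR_S, RESERVOIR_S + CUSHION_S] linearly onto rung indices."""
    frac = (buffer_s - RESERVOIR_S) / CUSHION_S
    frac = min(max(frac, 0.0), 1.0)
    return int(frac * (len(LADDER_BPS) - 1))


def choose_rung(estimate_bps: float, buffer_s: float) -> int:
    if buffer_s < RESERVOIR_S:
        # Startup or low buffer: follow the throughput estimate.
        return throughput_rung(estimate_bps)
    # Stable buffer: follow buffer level, capped one rung above what throughput supports.
    return min(buffer_rung(buffer_s), throughput_rung(estimate_bps) + 1)
```

Capping the buffer-driven choice at one rung above the throughput-safe rung is a design choice: a full buffer can absorb a brief overshoot without the player jumping straight to the top of the ladder.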
CDN Edge Caching
Video is fundamentally a CDN problem: 99% of the bytes you serve must come from edge PoPs, not your origin. Three patterns:
- Pull-through — first request misses and pulls from origin; subsequent same-segment requests hit. Default for VOD; cache-key on (segment URL, byte range).
- Pre-warm — for a Netflix premiere, push popular ladders to edges before launch. Removes the cold-start penalty for early viewers.
- Multi-CDN — stream from Akamai, Cloudflare, and Fastly and let the player pick the best. This mitigates regional outages and gives leverage on pricing.
Cache TTL: VOD segments are immutable, so cache them forever (until a purge on takedown). Manifests need short TTLs (seconds for live, up to an hour for VOD) so the player picks up new ladder rungs and availability windows.
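One way to express this policy is as origin `Cache-Control` headers keyed on file type. The sketch assumes `.m3u8`/`.mpd` manifests and `.ts`/`.m4s`/`.mp4` segments; the 2 s live manifest TTL, the `no-store` fallback, and the `cache_control` helper are hypothetical choices, not values taken from any CDN's documentation:

```python
def cache_control(path: str, live: bool) -> str:
    """Return a Cache-Control header value for a streaming object (hypothetical policy)."""
    if path.endswith((".m3u8", ".mpd")):
        # Manifests change: a few seconds for live, up to an hour for VOD.
        max_age = 2 if live else 3600
        return f"public, max-age={max_age}"
    if path.endswith((".ts", ".m4s", ".mp4")):
        # Segments never change once published; removal happens via CDN purge.
        return "public, max-age=31536000, immutable"
    # Anything unrecognized is not cached at the edge.
    return "no-store"
```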
Transcoding Pipeline
One source file becomes a fan-out of N rungs × M codecs. Production pipelines:
- Chunked transcoding — split source into 30-s GOP-aligned chunks, dispatch to a worker pool, encode in parallel, concatenate. A 2-hour movie transcodes in 5–10 minutes on 50 workers; serial would take hours.
- Per-title encoding (Netflix) — instead of a fixed ladder, run a calibration pass to pick optimal bitrates for this title's complexity. A talking-head documentary reaches the same visual quality at a much lower bitrate than an action movie, so its ladder tops out lower.
- FFmpeg + x264 / x265 / SVT-AV1 / libvpx — AV1 cuts bitrate roughly 30% vs HEVC (and more vs H.264) at the cost of 5–10× the encode CPU. Worth it for widely watched VOD titles, where the one-time encode cost is amortized over many views; not yet for live.
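The chunked-transcoding arithmetic can be sketched as follows, assuming closed GOPs of fixed length so that any multiple of the GOP duration is a safe cut point. `plan_chunks` and `wall_clock_minutes` are hypothetical helpers; the actual encoding and concatenation (e.g. with FFmpeg) are not shown:

```python
import math


def plan_chunks(duration_s: float, gop_s: float, gops_per_chunk: int) -> list[tuple[float, float]]:
    """Split a source into (start, length) chunks whose cuts fall on GOP boundaries."""
    chunk_s = gop_s * gops_per_chunk
    n_chunks = math.ceil(duration_s / chunk_s)
    chunks = []
    for i in range(n_chunks):
        start = i * chunk_s
        # The final chunk may be shorter than chunk_s.
        chunks.append((start, min(chunk_s, duration_s - start)))
    return chunks


def wall_clock_minutes(n_chunks: int, chunk_s: float, rungs: int, workers: int,
                       speed_x_realtime: float = 1.0) -> float:
    """Estimate wall time when every (chunk, rung) pair is an independent job.

    speed_x_realtime: encode speed as a multiple of realtime (1.0 = realtime).
    """
    jobs = n_chunks * rungs
    waves = math.ceil(jobs / workers)
    return waves * (chunk_s / speed_x_realtime) / 60
```

The estimate is very sensitive to `rungs` and `speed_x_realtime`: each (chunk, rung) pair is a separate job, so adding rungs multiplies the work, while faster-than-realtime workers divide it.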
Live vs VOD
The pipelines diverge:
- VOD: ingest once, transcode at leisure, store in S3, manifests are static. Player can prebuffer aggressively. End-to-end latency is irrelevant.
- Live: continuous ingest (RTMP, SRT), real-time transcode, segments published to CDN as they finalize, manifests updated every segment. End-to-end latency is the killer metric — 30 s for legacy HLS, 3–5 s for LL-HLS / LL-DASH.
Live also needs DVR (rewind), ad insertion, and origin failover — if the encoder crashes, you have minutes (not hours) to recover before the audience leaves.
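For the live path, "manifests updated every segment" usually means a sliding-window media playlist. A minimal sketch, assuming a fixed window and a hypothetical `LiveMediaPlaylist` class; the tags are standard HLS, and the absence of `#EXT-X-ENDLIST` is what tells the player the stream is still live:

```python
from collections import deque


class LiveMediaPlaylist:
    """Sliding-window HLS media playlist for one live rung (hypothetical helper)."""

    def __init__(self, target_duration_s: int, window: int = 5) -> None:
        self.target_duration_s = target_duration_s
        # Each entry: (media sequence number, duration in seconds, segment URI).
        self.segments: deque = deque(maxlen=window)
        self.next_seq = 0

    def publish(self, duration_s: float, uri: str) -> None:
        """Append a finalized segment; the oldest one drops out of the window."""
        self.segments.append((self.next_seq, duration_s, uri))
        self.next_seq += 1

    def render(self) -> str:
        first_seq = self.segments[0][0] if self.segments else self.next_seq
        lines = [
            "#EXTM3U",
            "#EXT-X-VERSION:6",
            f"#EXT-X-TARGETDURATION:{self.target_duration_s}",
            # Sequence number of the first segment still listed.
            f"#EXT-X-MEDIA-SEQUENCE:{first_seq}",
        ]
        for _, duration_s, uri in self.segments:
            lines.append(f"#EXTINF:{duration_s:.3f},")
            lines.append(uri)
        # No #EXT-X-ENDLIST: the stream is still live.
        return "\n".join(lines) + "\n"
```

A DVR window is the same structure with a larger `window`, so viewers can seek back over more retained segments.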
Low-Latency HLS (LL-HLS)
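Standard HLS uses 6 s segments, and players typically start three segments behind the live edge, so the buffer alone contributes 18 s of glass-to-glass latency before encode and delivery delays are added. LL-HLS (Apple, 2019) introduces:
- Partial segments (parts of ~250 ms) published as they are encoded, so the player can fetch media before the full segment is finished.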
- Blocking playlist reload — manifest server holds the request until a new chunk is ready, eliminating poll-wait.
- Preload hints — manifest tells the player which next chunk to prefetch.
Result: ~3 s latency. Still slower than WebRTC (sub-second) but with HLS's CDN economics. WebRTC is the right choice for video conferencing where < 200 ms matters and audience size is bounded.
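Blocking playlist reload is driven by the `_HLS_msn` and `_HLS_part` query parameters defined in the LL-HLS specification: the player asks for the playlist that contains the next part, and the server holds the request until that part exists. The sketch assumes every segment has the same number of parts (6 s / 250 ms = 24), which real streams need not guarantee, and assumes the playlist URL has no existing query string; the helper names are hypothetical:

```python
from urllib.parse import urlencode


def next_blocking_request(last_msn: int, last_part: int, parts_per_segment: int) -> tuple[int, int]:
    """Return the (media sequence number, part index) to block on next.

    Assumes a constant number of parts per segment.
    """
    if last_part + 1 < parts_per_segment:
        return last_msn, last_part + 1
    # Last part of this segment seen: wait for part 0 of the next segment.
    return last_msn + 1, 0


def blocking_reload_url(playlist_url: str, msn: int, part: int) -> str:
    """Build a blocking playlist reload URL (playlist_url must not already have a query)."""
    return f"{playlist_url}?{urlencode({'_HLS_msn': msn, '_HLS_part': part})}"
```

For example, after seeing part 23 of segment 270, the player would request `.../720p.m3u8?_HLS_msn=271&_HLS_part=0`.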
DRM: Widevine, PlayReady, FairPlay
Premium content (movies, sports rights) requires hardware-rooted DRM:
- Widevine — Google, mandatory on Android and Chrome. Three security levels: L1 (hardware-rooted, required for HD/4K), L2 (software with some hardware), L3 (pure software, SD only).
- PlayReady — Microsoft, Edge, Xbox, smart TVs.
- FairPlay — Apple, Safari + iOS + tvOS.
You must support all three to cover devices. Common Encryption (CENC) standardizes the encrypted bitstream so one file can serve all three, provided it uses the cbcs scheme that FairPlay requires; only the key delivery differs. The license server issues short-lived keys per authenticated session, with policies (offline allowed, output protections, expiry).
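The policy half of that license server can be sketched as a pure function. Everything here is an assumption for illustration: the `issue_policy` helper, the resolution caps per Widevine level (HD/4K only at L1, SD for L2 and L3), the one-hour expiry, and requiring output protection for anything above SD. It does not implement the Widevine, PlayReady, or FairPlay license protocols:

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone

# Hypothetical policy table: tallest picture each Widevine level may receive.
MAX_HEIGHT_BY_WIDEVINE_LEVEL = {"L1": 2160, "L2": 480, "L3": 480}


@dataclass(frozen=True)
class LicensePolicy:
    max_height: int
    expires_at: datetime
    offline_allowed: bool
    require_hdcp: bool  # output protection on the display link


def issue_policy(security_level: str, offline: bool, now: datetime,
                 ttl: timedelta = timedelta(hours=1)) -> LicensePolicy:
    """Decide what a license for this session may unlock (hypothetical rules)."""
    if security_level not in MAX_HEIGHT_BY_WIDEVINE_LEVEL:
        raise ValueError(f"unknown security level: {security_level}")
    max_height = MAX_HEIGHT_BY_WIDEVINE_LEVEL[security_level]
    return LicensePolicy(
        max_height=max_height,
        expires_at=now + ttl,               # short-lived key
        offline_allowed=offline,
        require_hdcp=max_height > 480,      # protect HD/4K outputs only
    )


if __name__ == "__main__":
    print(issue_policy("L1", offline=False, now=datetime.now(timezone.utc)))
```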
The MPEG-DASH Manifest Model
The DASH .mpd is a hierarchy: Period → AdaptationSet → Representation → SegmentTemplate. Period boundaries enable splicing (ad insertion, content stitching). AdaptationSet groups alternatives the player chooses among (different bitrates of the same video; different audio languages). Representation is one rung; SegmentTemplate gives the URL pattern.
This expressiveness is a large part of why Netflix uses DASH: server-side ad insertion (SSAI), regional content variants, and stream switching are all manifest tricks rather than separate pipelines.
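The hierarchy maps directly onto XML. A minimal sketch that builds a static (VOD) MPD with Python's standard `xml.etree.ElementTree`, assuming one video AdaptationSet and a `video/$RepresentationID$/$Number$.m4s` URL layout (the paths and the `build_mpd` helper are hypothetical; a production MPD would also carry codecs, audio sets, and DRM signaling):

```python
import xml.etree.ElementTree as ET


def build_mpd(duration_s: int, segment_s: int, rungs: list[tuple[str, int, int, int]]) -> str:
    """Build a minimal static MPD. rungs: (id, bandwidth_bps, width, height)."""
    mpd = ET.Element("MPD", {
        "xmlns": "urn:mpeg:dash:schema:mpd:2011",
        "type": "static",
        "profiles": "urn:mpeg:dash:profile:isoff-live:2011",
        "mediaPresentationDuration": f"PT{duration_s}S",
        "minBufferTime": f"PT{segment_s}S",
    })
    # Period: the unit that splicing (ads, stitched content) operates on.
    period = ET.SubElement(mpd, "Period", {"id": "main", "start": "PT0S"})
    # AdaptationSet: the alternatives the player chooses among.
    video = ET.SubElement(period, "AdaptationSet",
                          {"mimeType": "video/mp4", "segmentAlignment": "true"})
    for rep_id, bandwidth, width, height in rungs:
        # Representation: one ladder rung.
        rep = ET.SubElement(video, "Representation", {
            "id": rep_id, "bandwidth": str(bandwidth),
            "width": str(width), "height": str(height),
        })
        # SegmentTemplate: URL pattern; $Number$ counts up from startNumber.
        ET.SubElement(rep, "SegmentTemplate", {
            "timescale": "1",
            "duration": str(segment_s),
            "startNumber": "1",
            "initialization": "video/$RepresentationID$/init.mp4",
            "media": "video/$RepresentationID$/$Number$.m4s",
        })
    return ET.tostring(mpd, encoding="unicode")


if __name__ == "__main__":
    print(build_mpd(600, 4, [("720p", 3_000_000, 1280, 720), ("1080p", 6_000_000, 1920, 1080)]))
```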
Failure Modes
- Origin shield miss — CDN miss storm during a live event hits origin with millions of RPS. Use tiered cache (regional shield) and pre-warm on schedule.
- Encoder fall-behind — live transcoder can't keep realtime; latency creeps. Auto-fail to a lower-quality fallback ladder while ops investigates.
- License server flood — major release; everyone hits the DRM endpoint at the same moment. Cache license metadata at edge; rate-limit per device.
- ABR oscillation — player flips rungs every segment. Buffer-aware ABR with dwell time on each rung; punish frequent switches in the cost function.
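For the license-server flood, per-device rate limiting is commonly done with a token bucket. A minimal in-memory sketch, assuming a hypothetical `DeviceRateLimiter` with illustrative rate and burst values; a real deployment would need state shared across license-server instances and eviction of idle devices:

```python
import time
from typing import Callable


class DeviceRateLimiter:
    """Per-device token bucket (hypothetical parameters, in-memory only)."""

    def __init__(self, rate_per_s: float, burst: int,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.rate = rate_per_s      # tokens refilled per second
        self.burst = float(burst)   # bucket capacity
        self.clock = clock
        self.buckets: dict[str, tuple[float, float]] = {}  # device -> (tokens, last refill)

    def allow(self, device_id: str) -> bool:
        now = self.clock()
        tokens, last = self.buckets.get(device_id, (self.burst, now))
        # Refill for the elapsed time, never above capacity.
        tokens = min(self.burst, tokens + (now - last) * self.rate)
        if tokens >= 1.0:
            self.buckets[device_id] = (tokens - 1.0, now)
            return True
        self.buckets[device_id] = (tokens, now)
        return False
```

Injecting `clock` keeps the limiter deterministic under test; in production the default `time.monotonic` avoids jumps from wall-clock adjustments.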
FAQ
Why not WebRTC for everything?
WebRTC is sub-second but uses peer-to-peer or SFU topology that doesn't scale to millions of viewers economically. HLS/DASH ride the CDN cost curve.
How big should segments be?
VOD: 6 s (legacy HLS). Live: 2–4 s (LL-HLS, partial segments give sub-second granularity). Smaller = lower latency, more requests; larger = better cache efficiency.
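The request-volume side of that trade-off is simple arithmetic. A sketch, assuming one media request per segment (or part) per viewer and, for LL-HLS, roughly one blocking playlist reload per part; `edge_rps` is a hypothetical helper:

```python
def edge_rps(viewers: int, interval_s: float, playlist_reloads_per_fetch: int = 0) -> float:
    """Approximate edge requests per second.

    One media fetch per interval per viewer, plus an optional number of
    playlist reloads per fetch (about 1 per part for LL-HLS blocking reload).
    """
    return viewers * (1 + playlist_reloads_per_fetch) / interval_s
```

At 50 M viewers, 6 s segments work out to about 8.3 M edge requests per second, 2 s segments to 25 M, and 250 ms parts with one reload each to 400 M.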
Should thumbnails be stored or computed?
Pre-extract them at upload time into thumbnail strips (sprite sheets: frames sampled at a fixed interval and tiled into one image). The player fetches a sheet once and crops the tile for the scrub position, so scrubbing needs no per-frame requests.
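Locating a thumbnail in a sprite sheet is index arithmetic. A sketch, assuming a fixed sampling interval, a fixed grid width, and long titles split across several sheets; `tile_for_time` and its parameters are hypothetical:

```python
def tile_for_time(t_s: float, interval_s: float, columns: int,
                  tile_w: int, tile_h: int, tiles_per_sheet: int) -> tuple[int, int, int]:
    """Return (sheet_index, x, y): which sheet holds the thumbnail for t_s and
    the pixel offset of its top-left corner within that sheet."""
    index = int(t_s // interval_s)                 # which sampled frame
    sheet, within = divmod(index, tiles_per_sheet)  # which sheet, position in it
    row, col = divmod(within, columns)              # grid position, row-major
    return sheet, col * tile_w, row * tile_h
```

With a 10 s interval, 10 columns, 160×90 tiles, and 100 tiles per sheet, t = 125 s maps to sheet 0 at offset (320, 90).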
How do you bill bandwidth?
Most CDNs charge per GB of egress, with regional pricing. Multi-CDN routing optimizes cost: a cheap CDN for cold regions, a premium one for premium markets. Egress cost typically dominates the infrastructure budget.