Video Streaming — HLS, DASH, transcoding, low-latency, DRM

Architecture

Capacity Estimation

Metric	Value	Notes
Concurrent streams (peak)	~50 M	major-event scale
Per-stream bitrate	3–15 Mbps	HD–4K HEVC
Aggregate egress	~250 Tbps	50 M × 5 Mbps avg
VOD storage / title	~100 GB	10 ladders × 4K master
Catalog size	~50 PB	500 K titles × 100 GB
Live-stream latency	30 s → 3 s	HLS to LL-HLS
Transcode CPU	~1× realtime/stream	per ladder rung

Adaptive Bitrate: HLS and DASH

The video is encoded into a bitrate ladder: typically 6–10 rungs from 240p 0.4 Mbps to 4K 15 Mbps. Each rung is split into segments (2–6 s) and indexed by a manifest. The player observes its bandwidth and buffer health, then chooses which rung to fetch the next segment from.

HLS (Apple, .m3u8) — mandatory on iOS Safari; widely supported. Uses TS or fMP4 segments.
MPEG-DASH (.mpd) — ISO standard, more flexible (multiple period and adaptation set abstractions). Used by Netflix and YouTube.
CMAF — common segment format (fMP4) usable by both HLS and DASH manifests. Lets you store one set of segments and serve both protocols, halving CDN footprint.

ABR algorithms have evolved: throughput-based (BOLA), buffer-based, and now ML-based (Pensieve-style RL agents). Production players (hls.js, Shaka, AVPlayer) typically run a hybrid that prefers buffer level when stable and bandwidth estimate when starting up.

CDN Edge Caching

Video is fundamentally a CDN problem: 99% of the bytes you serve must come from edge PoPs, not your origin. Three patterns:

Pull-through — first request misses and pulls from origin; subsequent same-segment requests hit. Default for VOD; cache-key on (segment URL, byte range).
Pre-warm — for a Netflix premiere, push popular ladders to edges before launch. Removes the cold-start penalty for early viewers.
Multi-CDN — stream from Akamai + Cloudflare + Fastly, the player picks the best. Mitigates regional outages and gives leverage on pricing.

Cache TTL: VOD segments are immutable; cache forever (until purge on takedown). Manifests must be short-TTL'd (seconds for live, hour for VOD) so the player picks up new ladder rungs / availability windows.

Transcoding Pipeline

One source file becomes a fan-out of N rungs × M codecs. Production pipelines:

Chunked transcoding — split source into 30-s GOP-aligned chunks, dispatch to a worker pool, encode in parallel, concatenate. A 2-hour movie transcodes in 5–10 minutes on 50 workers; serial would take hours.
Per-title encoding (Netflix) — instead of a fixed ladder, run a calibration pass to pick optimal bitrates for this title's complexity. A talking-head doc gets a low bitrate for the same quality as an action movie at high.
FFmpeg + x264 / x265 / SVT-AV1 / libvpx — AV1 cuts bitrate ~30% vs H.264 at the cost of 5–10× encode CPU. Worth it for a long-tail VOD library; not yet for live.

Live vs VOD

The pipelines diverge:

VOD: ingest once, transcode at leisure, store in S3, manifests are static. Player can prebuffer aggressively. End-to-end latency is irrelevant.
Live: continuous ingest (RTMP, SRT), real-time transcode, segments published to CDN as they finalize, manifests updated every segment. End-to-end latency is the killer metric — 30 s for legacy HLS, 3–5 s for LL-HLS / LL-DASH.

Live also needs DVR (rewind), ad insertion, and origin failover — if the encoder crashes, you have minutes (not hours) to recover before the audience leaves.

Low-Latency HLS (LL-HLS)

Standard HLS has 6 s segments, 3-segment buffer = 18 s glass-to-glass. LL-HLS (Apple, 2019) introduces:

Partial segments (chunks of ~250 ms) published as they encode, so the player fetches sub-segment intervals.
Blocking playlist reload — manifest server holds the request until a new chunk is ready, eliminating poll-wait.
Preload hints — manifest tells the player which next chunk to prefetch.

Result: ~3 s latency. Still slower than WebRTC (sub-second) but with HLS's CDN economics. WebRTC is the right choice for video conferencing where < 200 ms matters and audience size is bounded.

DRM: Widevine, PlayReady, FairPlay

Premium content (movies, sports rights) requires hardware-rooted DRM:

Widevine — Google, mandatory on Android and Chrome. Three security levels: L1 (hardware-rooted, required for HD/4K), L2 (software with some hardware), L3 (pure software, SD only).
PlayReady — Microsoft, Edge, Xbox, smart TVs.
FairPlay — Apple, Safari + iOS + tvOS.

You must support all three to cover devices. Common Encryption (CENC) standardizes the bitstream so one encrypted file works with all three; only the key delivery differs. The license server issues short-lived keys per authenticated session, with policies (offline allowed, output protections, expiry).

The MPEG-DASH Manifest Model

The DASH .mpd is a hierarchy: Period → AdaptationSet → Representation → SegmentTemplate. Period boundaries enable splicing (ad insertion, content stitching). AdaptationSet groups alternatives the player chooses among (different bitrates of the same video; different audio languages). Representation is one rung; SegmentTemplate gives the URL pattern.

This expressiveness is why Netflix uses DASH: server-side ad insertion (SSAI), regional content variants, and stream switching are all manifest tricks rather than separate pipelines.

Failure Modes

Origin shield miss — CDN miss storm during a live event hits origin with millions of RPS. Use tiered cache (regional shield) and pre-warm on schedule.
Encoder fall-behind — live transcoder can't keep realtime; latency creeps. Auto-fail to a lower-quality fallback ladder while ops investigates.
License server flood — major release; everyone hits the DRM endpoint at the same moment. Cache license metadata at edge; rate-limit per device.
ABR oscillation — player flips rungs every segment. Buffer-aware ABR with dwell time on each rung; punish frequent switches in the cost function.

FAQ

Why not WebRTC for everything?

WebRTC is sub-second but uses peer-to-peer or SFU topology that doesn't scale to millions of viewers economically. HLS/DASH ride the CDN cost curve.

How big should segments be?

VOD: 6 s (legacy HLS). Live: 2–4 s (LL-HLS, partial segments give sub-second granularity). Smaller = lower latency, more requests; larger = better cache efficiency.

Storing or computing thumbnails?

Pre-extract thumbnail strips at upload time (one image with all timestamps tiled). Players use byte ranges to show scrub-thumbnails without per-frame requests.

How do you bill bandwidth?

Most CDNs charge per GB egress with regional pricing. Multi-CDN routing optimizes cost: cheap CDN for cold regions, premium for premium markets. Cost dominates infrastructure budget.