Architecture

  • Source: camera, mezzanine
  • Transcoder: FFmpeg, parallel; H.264, AV1, HEVC
  • Packager: CMAF segments; HLS + DASH manifests
  • DRM: Widevine, FairPlay, PlayReady
  • Origin: S3; .ts, .m4s segments
  • CDN edges: PoPs, cache TTL; prefetch hot titles
  • License server: key issuance per authenticated session
  • Player (ABR)

Capacity Estimation

Metric                     Value                 Notes
Concurrent streams (peak)  ~50 M                 major-event scale
Per-stream bitrate         3–15 Mbps             HD–4K HEVC
Aggregate egress           ~250 Tbps             50 M × 5 Mbps avg
VOD storage / title        ~100 GB               10 ladders × 4K master
Catalog size               ~50 PB                500 K titles × 100 GB
Live-stream latency        30 s → 3 s            HLS to LL-HLS
Transcode CPU              ~1× realtime/stream   per ladder rung
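
The aggregate rows in the table follow from simple multiplication. A minimal sketch of that arithmetic, assuming decimal units (1 Tbps = 10^6 Mbps, 1 PB = 10^6 GB); the function names are illustrative only:

```python
def aggregate_egress_tbps(concurrent_streams: int, avg_mbps: float) -> float:
    """Peak egress in Tbps for a given audience and average bitrate."""
    return concurrent_streams * avg_mbps / 1_000_000  # Mbps -> Tbps


def catalog_size_pb(titles: int, gb_per_title: float) -> float:
    """Total catalog storage in PB (decimal units)."""
    return titles * gb_per_title / 1_000_000  # GB -> PB


print(aggregate_egress_tbps(50_000_000, 5))  # 50 M streams at 5 Mbps average
print(catalog_size_pb(500_000, 100))         # 500 K titles at 100 GB each
```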

Adaptive Bitrate: HLS and DASH

The video is encoded into a bitrate ladder: typically 6–10 rungs, from 240p at 0.4 Mbps to 4K at 15 Mbps. Each rung is split into segments (2–6 s) and indexed by a manifest. The player observes its bandwidth and buffer health, then chooses which rung to fetch the next segment from.

  • HLS (Apple, .m3u8) — mandatory on iOS Safari; widely supported. Uses TS or fMP4 segments.
  • MPEG-DASH (.mpd) — ISO standard, more flexible (multiple period and adaptation set abstractions). Used by Netflix and YouTube.
  • CMAF — common segment format (fMP4) usable by both HLS and DASH manifests. Lets you store one set of segments and serve both protocols, halving CDN footprint.
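
To make the ladder-plus-manifest idea concrete, here is a minimal sketch that renders an HLS multivariant (master) playlist from a ladder. The `#EXTM3U` and `#EXT-X-STREAM-INF` tags with their `BANDWIDTH` and `RESOLUTION` attributes are standard HLS; the `Rung` type, the directory layout, and the 720p bitrate are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rung:
    name: str           # e.g. "240p"; used here as the directory name (assumed layout)
    width: int
    height: int
    bandwidth_bps: int  # bitrate advertised to the player, in bits per second


def master_playlist(ladder: list[Rung]) -> str:
    """Render an HLS multivariant playlist, lowest rung first."""
    lines = ["#EXTM3U"]
    for rung in sorted(ladder, key=lambda r: r.bandwidth_bps):
        lines.append(
            f"#EXT-X-STREAM-INF:BANDWIDTH={rung.bandwidth_bps},"
            f"RESOLUTION={rung.width}x{rung.height}"
        )
        lines.append(f"{rung.name}/playlist.m3u8")
    return "\n".join(lines) + "\n"


LADDER = [
    Rung("240p", 426, 240, 400_000),
    Rung("720p", 1280, 720, 3_000_000),   # illustrative middle rung
    Rung("2160p", 3840, 2160, 15_000_000),
]
print(master_playlist(LADDER))
```

Each variant line points at a per-rung media playlist, which in turn lists that rung's segments.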

ABR algorithms have evolved: throughput-based, buffer-based (BOLA), and now ML-based (Pensieve-style RL agents). Production players (hls.js, Shaka, AVPlayer) typically run a hybrid that prefers buffer level when stable and bandwidth estimate when starting up.
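
A minimal sketch of such a hybrid, assuming a simple linear blend: with an empty buffer the choice follows the throughput estimate, and as the buffer fills the choice shifts toward a buffer-occupancy mapping. The function names, the 0.8 safety factor, and the 30 s target buffer are assumptions for illustration; this is not how hls.js, Shaka, or AVPlayer are actually implemented.

```python
def throughput_rung(ladder_bps, throughput_bps, safety=0.8):
    """Highest rung whose bitrate fits within a safety fraction of throughput."""
    budget = throughput_bps * safety
    fitting = [i for i, bps in enumerate(ladder_bps) if bps <= budget]
    return fitting[-1] if fitting else 0


def buffer_rung(ladder_bps, buffer_s, target_buffer_s=30.0):
    """Map buffer occupancy linearly onto the ladder (fuller buffer, higher rung)."""
    fill = max(0.0, min(1.0, buffer_s / target_buffer_s))
    return round(fill * (len(ladder_bps) - 1))


def choose_rung(ladder_bps, throughput_bps, buffer_s, target_buffer_s=30.0):
    """Blend both signals; the weight moves toward the buffer as it fills."""
    w = max(0.0, min(1.0, buffer_s / target_buffer_s))
    blended = (w * buffer_rung(ladder_bps, buffer_s, target_buffer_s)
               + (1.0 - w) * throughput_rung(ladder_bps, throughput_bps))
    return round(blended)


# ladder_bps must be sorted ascending; the return value is an index into it.
LADDER_BPS = [400_000, 1_500_000, 3_000_000, 6_000_000, 15_000_000]
print(choose_rung(LADDER_BPS, throughput_bps=5_000_000, buffer_s=0.0))
```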

CDN Edge Caching

Video is fundamentally a CDN problem: 99% of the bytes you serve must come from edge PoPs, not your origin. Three patterns:

  • Pull-through — first request misses and pulls from origin; subsequent same-segment requests hit. Default for VOD; cache-key on (segment URL, byte range).
  • Pre-warm — for a Netflix premiere, push popular ladders to edges before launch. Removes the cold-start penalty for early viewers.
  • Multi-CDN — stream from Akamai + Cloudflare + Fastly, the player picks the best. Mitigates regional outages and gives leverage on pricing.

Cache TTL: VOD segments are immutable, so cache them indefinitely (until a takedown purge). Manifests need short TTLs (seconds for live, about an hour for VOD) so the player picks up new ladder rungs and availability windows.
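
The TTL split can be expressed as a small header-selection rule at the origin. A minimal sketch, assuming assets are classified by file extension, that "forever" is approximated by a one-year `max-age`, and that live manifests get a 2-second TTL; the function name and these exact values are illustrative.

```python
def cache_control(path: str, is_live: bool) -> str:
    """Pick a Cache-Control header value for a streaming asset by extension."""
    if path.endswith((".m4s", ".ts", ".mp4")):
        # Segments never change once published, so cache them for a year.
        return "public, max-age=31536000, immutable"
    if path.endswith((".m3u8", ".mpd")):
        # Manifests change: seconds for live, about an hour for VOD.
        return "public, max-age=2" if is_live else "public, max-age=3600"
    return "no-store"  # unknown asset type: do not cache by default


print(cache_control("title42/720p/seg-00017.m4s", is_live=False))
print(cache_control("event7/master.m3u8", is_live=True))
```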

Transcoding Pipeline

One source file becomes a fan-out of N rungs × M codecs. Production pipelines:

  • Chunked transcoding — split source into 30-s GOP-aligned chunks, dispatch to a worker pool, encode in parallel, concatenate. A 2-hour movie transcodes in 5–10 minutes on 50 workers; serial would take hours.
  • Per-title encoding (Netflix) — instead of a fixed ladder, run a calibration pass to pick optimal bitrates for this title's complexity. A talking-head doc gets a low bitrate for the same quality as an action movie at high.
  • FFmpeg + x264 / x265 / SVT-AV1 / libvpx — AV1 cuts bitrate by roughly 30–50% vs H.264 at comparable quality (the saving depends on content and encoder settings), at the cost of 5–10× encode CPU. Worth it for a long-tail VOD library; not yet for live.
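
The speed-up from chunked transcoding can be estimated with a little arithmetic. A rough sketch, assuming fixed 30 s chunks, one job per (chunk, rung) pair, encoding at about 1× realtime per job, and no scheduling or concatenation overhead; the function names and the four-rung example are assumptions.

```python
import math


def plan_chunks(duration_s: float, chunk_s: float = 30.0) -> list[tuple[float, float]]:
    """Return (start, end) times for fixed-length chunks covering the source."""
    count = math.ceil(duration_s / chunk_s)
    return [(i * chunk_s, min((i + 1) * chunk_s, duration_s)) for i in range(count)]


def wall_clock_s(duration_s: float, workers: int, rungs: int = 1,
                 chunk_s: float = 30.0, speed: float = 1.0) -> float:
    """Estimate elapsed time when every (chunk, rung) job runs on a worker pool.

    speed is the encode rate relative to realtime (1.0 means a 30 s chunk
    takes 30 s to encode). Jobs are assumed to run in full waves.
    """
    jobs = len(plan_chunks(duration_s, chunk_s)) * rungs
    waves = math.ceil(jobs / workers)
    return waves * chunk_s / speed


two_hours = 2 * 60 * 60
print(wall_clock_s(two_hours, workers=50, rungs=4))  # parallel estimate, seconds
print(wall_clock_s(two_hours, workers=1, rungs=4))   # serial estimate, seconds
```

Under these assumptions a 2-hour source with four rungs takes 600 s (10 minutes) on 50 workers versus 28,800 s (8 hours) on one, which is in line with the figures quoted above.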

Live vs VOD

The pipelines diverge:

  • VOD: ingest once, transcode at leisure, store in S3, manifests are static. Player can prebuffer aggressively. End-to-end latency is irrelevant.
  • Live: continuous ingest (RTMP, SRT), real-time transcode, segments published to CDN as they finalize, manifests updated every segment. End-to-end latency is the killer metric — 30 s for legacy HLS, 3–5 s for LL-HLS / LL-DASH.

Live also needs DVR (rewind), ad insertion, and origin failover — if the encoder crashes, you have minutes (not hours) to recover before the audience leaves.

Low-Latency HLS (LL-HLS)

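Standard HLS uses 6 s segments, and players typically buffer three of them, so the buffer alone adds 18 s; encoding, packaging, and CDN delay push glass-to-glass latency toward the ~30 s cited above. LL-HLS (Apple, 2019) introduces: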

  • Partial segments (chunks of ~250 ms) published as they encode, so the player fetches sub-segment intervals.
  • Blocking playlist reload — manifest server holds the request until a new chunk is ready, eliminating poll-wait.
  • Preload hints — manifest tells the player which next chunk to prefetch.

Result: ~3 s latency. Still slower than WebRTC (sub-second) but with HLS's CDN economics. WebRTC is the right choice for video conferencing where < 200 ms matters and audience size is bounded.
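
Two pieces of the above are easy to sketch: the buffer arithmetic behind the legacy latency, and the shape of a blocking playlist request. `_HLS_msn` and `_HLS_part` are the query parameters LL-HLS defines for asking the server to hold the response until a given media sequence number and part exist; the helper names and the example URL are assumptions.

```python
from urllib.parse import urlencode


def buffer_latency_s(segment_s: float, buffered_segments: int) -> float:
    """Latency contributed by the player's buffer alone."""
    return segment_s * buffered_segments


def blocking_reload_url(playlist_url: str, next_msn: int, next_part: int) -> str:
    """Build a blocking playlist request for a specific upcoming partial segment."""
    query = urlencode({"_HLS_msn": next_msn, "_HLS_part": next_part})
    return f"{playlist_url}?{query}"


print(buffer_latency_s(6, 3))  # three buffered 6 s segments
print(blocking_reload_url("https://example.com/live/720p.m3u8", 273, 2))
```

The server answers the second request only once part 2 of segment 273 is in the playlist, so the player does not need to poll.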

DRM: Widevine, PlayReady, FairPlay

Premium content (movies, sports rights) requires hardware-rooted DRM:

  • Widevine — Google, mandatory on Android and Chrome. Three security levels: L1 (hardware-rooted, required for HD/4K), L2 (software with some hardware), L3 (pure software, SD only).
  • PlayReady — Microsoft, Edge, Xbox, smart TVs.
  • FairPlay — Apple, Safari + iOS + tvOS.

You must support all three to cover devices. Common Encryption (CENC) standardizes the bitstream so one encrypted file works with all three, provided it uses an encryption scheme every target supports (FairPlay requires cbcs); only the key delivery differs. The license server issues short-lived keys per authenticated session, with policies (offline allowed, output protections, expiry).
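
The per-session policy decision can be sketched independently of any real DRM SDK. Everything below is hypothetical: the `LicensePolicy` type, the 4-hour expiry, and the resolution caps per Widevine level (in particular the cap for L2, which the text above does not specify) are assumptions for illustration, not a vendor API.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class LicensePolicy:
    max_height: int        # highest resolution the key may unlock
    offline_allowed: bool
    expires_at: datetime


# Hypothetical mapping from Widevine security level to a resolution cap.
MAX_HEIGHT_BY_LEVEL = {"L1": 2160, "L2": 480, "L3": 480}


def issue_policy(session_authenticated: bool, security_level: str,
                 now: datetime, ttl: timedelta = timedelta(hours=4)) -> LicensePolicy:
    """Return the policy to attach to a license for one playback session."""
    if not session_authenticated:
        raise PermissionError("license requests require an authenticated session")
    if security_level not in MAX_HEIGHT_BY_LEVEL:
        raise ValueError(f"unknown security level: {security_level}")
    return LicensePolicy(
        max_height=MAX_HEIGHT_BY_LEVEL[security_level],
        offline_allowed=False,
        expires_at=now + ttl,
    )


policy = issue_policy(True, "L1", datetime.now(timezone.utc))
print(policy.max_height, policy.expires_at.isoformat())
```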

The MPEG-DASH Manifest Model

The DASH .mpd is a hierarchy: Period → AdaptationSet → Representation → SegmentTemplate. Period boundaries enable splicing (ad insertion, content stitching). AdaptationSet groups alternatives the player chooses among (different bitrates of the same video; different audio languages). Representation is one rung; SegmentTemplate gives the URL pattern.
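
The hierarchy is easiest to see by building one. A minimal sketch using Python's standard `xml.etree.ElementTree` that produces a single-period, video-only MPD; the element and attribute names are standard DASH, while the rung values, URL pattern, 4 s segment duration, and helper name are assumptions. The output is illustrative and may omit details a production packager would include.

```python
import xml.etree.ElementTree as ET


def build_mpd(rungs: list[dict], duration_iso: str = "PT2H") -> ET.Element:
    """Build a single-period MPD with one Representation per ladder rung."""
    mpd = ET.Element("MPD", {
        "xmlns": "urn:mpeg:dash:schema:mpd:2011",
        "type": "static",
        "mediaPresentationDuration": duration_iso,
        "minBufferTime": "PT4S",
        "profiles": "urn:mpeg:dash:profile:isoff-live:2011",
    })
    period = ET.SubElement(mpd, "Period", {"id": "main"})
    video = ET.SubElement(period, "AdaptationSet",
                          {"mimeType": "video/mp4", "segmentAlignment": "true"})
    for rung in rungs:
        rep = ET.SubElement(video, "Representation", {
            "id": rung["id"],
            "codecs": rung["codecs"],
            "bandwidth": str(rung["bandwidth"]),
            "width": str(rung["width"]),
            "height": str(rung["height"]),
        })
        # 4 s segments: duration / timescale = 4000 / 1000.
        ET.SubElement(rep, "SegmentTemplate", {
            "timescale": "1000",
            "duration": "4000",
            "startNumber": "1",
            "initialization": "$RepresentationID$/init.mp4",
            "media": "$RepresentationID$/seg-$Number$.m4s",
        })
    return mpd


RUNGS = [
    {"id": "720p", "codecs": "avc1.640028", "bandwidth": 3_000_000,
     "width": 1280, "height": 720},
    {"id": "1080p", "codecs": "avc1.640028", "bandwidth": 6_000_000,
     "width": 1920, "height": 1080},
]
print(ET.tostring(build_mpd(RUNGS), encoding="unicode"))
```

A second Period appended to the same MPD is how an ad break or stitched content would be spliced in.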

This expressiveness is why Netflix uses DASH: server-side ad insertion (SSAI), regional content variants, and stream switching are all manifest tricks rather than separate pipelines.

Failure Modes

  • Origin shield miss — CDN miss storm during a live event hits origin with millions of RPS. Use tiered cache (regional shield) and pre-warm on schedule.
  • Encoder fall-behind — the live transcoder can't keep up with realtime, so latency creeps up. Automatically fail over to a lower-quality fallback ladder while ops investigates.
  • License server flood — major release; everyone hits the DRM endpoint at the same moment. Cache license metadata at edge; rate-limit per device.
  • ABR oscillation — player flips rungs every segment. Buffer-aware ABR with dwell time on each rung; punish frequent switches in the cost function.
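
The dwell-time idea from the last bullet can be sketched as a thin wrapper around whatever rung the ABR algorithm proposes. The class name, the 10 s dwell, and the 4 s "panic" buffer threshold that lets emergency downswitches bypass the dwell are all assumptions.

```python
class SwitchDamper:
    """Hold each rung for a minimum dwell time before allowing another switch."""

    def __init__(self, min_dwell_s: float = 10.0, start_rung: int = 0):
        self.min_dwell_s = min_dwell_s
        self.current = start_rung
        self.last_switch_s = 0.0

    def decide(self, proposed: int, now_s: float, buffer_s: float,
               panic_buffer_s: float = 4.0) -> int:
        """Return the rung to use, given the ABR algorithm's proposal."""
        if proposed == self.current:
            return self.current
        # A downswitch with a nearly empty buffer must not wait for the dwell.
        emergency = proposed < self.current and buffer_s < panic_buffer_s
        dwelled = now_s - self.last_switch_s >= self.min_dwell_s
        if emergency or dwelled:
            self.current = proposed
            self.last_switch_s = now_s
        return self.current


damper = SwitchDamper(min_dwell_s=10.0, start_rung=2)
print(damper.decide(proposed=3, now_s=4.0, buffer_s=20.0))  # too soon to switch
```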

FAQ

Why not WebRTC for everything?

WebRTC is sub-second but uses a peer-to-peer or SFU topology that doesn't scale economically to millions of viewers. HLS/DASH ride the CDN cost curve.

How big should segments be?

VOD: 6 s (legacy HLS). Live: 2–4 s (LL-HLS, partial segments give sub-second granularity). Smaller = lower latency, more requests; larger = better cache efficiency.
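
The request-count side of that trade-off is simple arithmetic. A rough sketch, assuming each viewer fetches one video segment per segment duration and ignoring manifest and audio requests; the function name is illustrative.

```python
def request_rate_per_s(viewers: int, segment_s: float) -> float:
    """Steady-state segment requests per second across the whole audience."""
    return viewers / segment_s


for seg in (2, 4, 6):
    print(seg, request_rate_per_s(50_000_000, seg))
```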

Storing or computing thumbnails?

Pre-extract thumbnail strips at upload time (one image with all timestamps tiled). Players use byte ranges to show scrub-thumbnails without per-frame requests.
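
One common way to address a tile inside such a strip is by pixel offset. A minimal sketch, assuming one thumbnail every 10 s laid out row by row in a grid 10 tiles wide, each 160×90 pixels; all of these values and the function name are assumptions.

```python
def thumbnail_rect(t_s: float, interval_s: float = 10.0, columns: int = 10,
                   tile_w: int = 160, tile_h: int = 90) -> tuple[int, int, int, int]:
    """Return (x, y, width, height) of the tile covering playback time t_s."""
    index = int(t_s // interval_s)      # which thumbnail covers this time
    row, col = divmod(index, columns)   # position in the row-major grid
    return (col * tile_w, row * tile_h, tile_w, tile_h)


print(thumbnail_rect(125.0))
```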

How do you bill bandwidth?

Most CDNs charge per GB of egress, with regional pricing. Multi-CDN routing optimizes cost: a cheaper CDN for low-traffic regions, a premium one for high-value markets. Egress cost typically dominates the infrastructure budget.