Envoy Internals
Envoy is an L7 proxy that carries a large share of east-west (service-to-service) traffic in cloud-native deployments. Lyft open-sourced it in 2016 to solve a single problem: every microservice in a large fleet was reimplementing the same connection pool, retry policy, timeout, and circuit breaker, and doing it badly. Envoy moves all of that out of application code and into a dedicated proxy, usually deployed as a sidecar, that speaks HTTP/1.1, HTTP/2, HTTP/3, gRPC, TCP, and a half-dozen other protocols. It is the data plane behind Istio, Consul Connect, AWS App Mesh, Google Cloud Service Mesh, and the gateways at Lyft, Stripe, Square, Pinterest, Reddit, and Apple.
Built on a single-threaded-per-worker event loop in C++14, Envoy can sustain hundreds of thousands of requests per second per core with sub-millisecond p99 added latency.
Listeners, Filter Chains, and the Request Path
A listener binds a socket, typically a single port like :443
or :15001 for iptables-redirected sidecar outbound traffic in Istio (inbound capture uses :15006).
When a TCP connection arrives, Envoy walks the listener's filter chain matcher using SNI, ALPN,
source IP, transport protocol, and destination port to pick the right chain. A filter chain
pairs a transport socket (for example, a TLS terminator) with an ordered list of network
filters, typically ending in the http_connection_manager, which itself owns a sub-chain of HTTP
filters (JWT auth, RBAC, ext_authz, rate limit, fault injection, Lua, WASM)
and finally the terminal router filter that selects a cluster and dispatches the request
upstream.
```yaml
static_resources:
  listeners:
  - name: ingress
    address: {socket_address: {address: 0.0.0.0, port_value: 443}}
    filter_chains:
    - filter_chain_match: {server_names: ["api.example.com"]}
      transport_socket:                                # TLS termination
        name: envoy.transport_sockets.tls
      filters:
      - name: envoy.filters.network.http_connection_manager
        typed_config:
          stat_prefix: ingress_http
          http_filters:
          - name: envoy.filters.http.jwt_authn         # verify JWT
          - name: envoy.filters.http.ext_authz         # call OPA / authz svc
          - name: envoy.filters.http.local_ratelimit   # token bucket per route
          - name: envoy.filters.http.router            # MUST be last
          route_config:
            virtual_hosts:
            - domains: ["*"]
              routes:
              - match: {prefix: "/orders"}
                route: {cluster: orders, timeout: 2s, retry_policy: {retry_on: 5xx, num_retries: 2}}
```
Each filter sees a typed event stream — onNewConnection, onData,
decodeHeaders, decodeData, decodeTrailers,
encodeHeaders, encodeData, encodeTrailers — and
can return StopIteration to pause the chain (e.g. while the JWT filter awaits
a JWKS fetch) or Continue to fall through. A filter that fully replies (a 401
from JWT auth) short-circuits everything below it; the router never runs and no upstream
connection is opened.
Listeners reload without dropping connections. When LDS pushes a new
configuration, Envoy creates the new listener, hot-restarts the bound socket via
SO_REUSEPORT on Linux, drains the old listener for the configured grace
period (default 10 minutes), and only then closes idle connections. Live requests on
the old listener finish on the old config — the in-flight HTTP/2 stream does not see
a configuration change mid-flight.
Clusters, Endpoints, and Load Balancing
A cluster is the upstream side of the proxy — a logical group of hosts that serve a service, plus the policy for talking to them. Each cluster owns:
- Endpoints: the actual IP:port pairs, learned statically, via DNS, or via EDS (the streaming xDS endpoint discovery service).
- Load balancer: ROUND_ROBIN, LEAST_REQUEST (P2C: pick two random hosts, take the one with fewer outstanding requests), RING_HASH (consistent hashing for stickiness), MAGLEV (Google's faster consistent hash), or RANDOM.
- Connection pool: per worker thread, per upstream host, separately for HTTP/1, HTTP/2, and HTTP/3. Envoy multiplexes by default: a single HTTP/2 connection carries up to max_concurrent_streams concurrent streams.
- Circuit breakers: hard caps on max_connections, max_pending_requests, max_requests (in-flight), max_retries, and max_connection_pools. Crossing a threshold triggers an immediate 503 with the x-envoy-overloaded header set and the UO (upstream overflow) response flag in the access log; there is no queueing and no waiting.
- Outlier detection: passive health checking. After N consecutive 5xx (default 5) or N consecutive gateway failures (default 5), the host is ejected for a base time (default 30s) multiplied by the number of times it has been ejected in a row, up to a cap (max_ejection_time).
- Active health checks: periodic HTTP, gRPC, or TCP probes that mark hosts unhealthy independently of live traffic.
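The components listed above map onto a single Cluster resource. The fragment below is a minimal sketch against the v3 API rather than a production config: the cluster name `orders`, the `/healthz` path, and the probe intervals are illustrative assumptions, and the circuit-breaker and outlier-detection values mostly restate common defaults.

```yaml
# Hypothetical "orders" cluster; names and numbers are illustrative.
clusters:
- name: orders
  type: EDS                          # endpoints arrive over xDS
  connect_timeout: 1s
  lb_policy: LEAST_REQUEST           # power-of-two-choices when weights are equal
  eds_cluster_config:
    eds_config: {resource_api_version: V3, ads: {}}
  circuit_breakers:
    thresholds:
    - priority: DEFAULT
      max_connections: 1024
      max_pending_requests: 1024
      max_requests: 1024             # in-flight requests
      max_retries: 3
  outlier_detection:                 # passive: judged from live traffic
    consecutive_5xx: 5
    consecutive_gateway_failure: 5
    interval: 10s
    base_ejection_time: 30s
    max_ejection_time: 300s
    max_ejection_percent: 10
  health_checks:                     # active: periodic probes
  - timeout: 1s
    interval: 5s
    unhealthy_threshold: 3
    healthy_threshold: 2
    http_health_check: {path: /healthz}
```

Keeping passive and active checks side by side is a deliberate choice in this sketch: the probe catches hosts that receive no traffic, while outlier detection catches hosts that pass the probe but fail real requests.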
Endpoints carry weights and priorities. Priority 0 is preferred; priority 1 is used only
when priority 0 has too few healthy hosts (controlled by overprovisioning_factor,
default 1.4). This is how Envoy implements zone-aware routing: same-zone endpoints get
priority 0, cross-zone get priority 1, and traffic spills cross-zone only when the local
zone's healthy fraction drops below the overprovisioning threshold.
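As a worked example of the spillover rule: with the default factor of 1.4, a priority 0 level that is 60% healthy is treated as able to carry 60% x 1.4 = 84% of traffic, so roughly 16% spills to priority 1; at about 72% healthy or better, nothing spills. The endpoint assignment below is a hedged sketch of how a control plane might express this. The addresses, zone names, and cluster name are hypothetical.

```yaml
# Hypothetical ClusterLoadAssignment for the "orders" cluster.
cluster_name: orders
policy:
  overprovisioning_factor: 140       # percent; 140 corresponds to 1.4
endpoints:
- locality: {region: us-east-1, zone: us-east-1a}
  priority: 0                        # preferred: same zone as this proxy
  lb_endpoints:
  - endpoint: {address: {socket_address: {address: 10.0.1.10, port_value: 8080}}}
    load_balancing_weight: 1
  - endpoint: {address: {socket_address: {address: 10.0.1.11, port_value: 8080}}}
    load_balancing_weight: 1
- locality: {region: us-east-1, zone: us-east-1b}
  priority: 1                        # used only when priority 0 degrades
  lb_endpoints:
  - endpoint: {address: {socket_address: {address: 10.0.2.10, port_value: 8080}}}
    load_balancing_weight: 1
```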
xDS: The Control Plane Protocol
Envoy is a pure data plane. It does not watch Kubernetes, query Consul, or talk to your service registry. Instead it speaks xDS, a family of gRPC streaming APIs, to a control plane (Istio's istiod, formerly Pilot; Contour; go-control-plane; java-control-plane; the AWS App Mesh controller; or your own implementation). The major xDS services:
| API | Resource | What it carries |
|---|---|---|
| LDS | Listener | Sockets to bind, filter chains, TLS contexts |
| RDS | RouteConfiguration | Virtual hosts, route matches, retry/timeout, cluster targets |
| CDS | Cluster | Upstreams, LB policy, circuit breakers, outlier detection |
| EDS | ClusterLoadAssignment | The actual endpoints (IP:port, weight, locality, priority) |
| SDS | Secret | TLS certificates and keys, rotated without restart |
| ADS | (aggregated) | Single ordered stream multiplexing all of the above |
The original wire mode is SOTW ("State of the World") — every push is a
complete snapshot of all resources of that type. Convenient, simple, and quadratic on
large fleets: a 50 KB cluster change forces re-shipping every other cluster too. The
modern mode is incremental xDS: only changed resources are sent, with
explicit removed_resources tombstones. Envoy ACKs each push by echoing the
version it accepted, or NACKs with an error detail, which the control plane uses to roll
back. ADS — Aggregated Discovery Service — sends LDS, RDS, CDS, and EDS over a single
stream so that the order of arrival is deterministic and an EDS update never references
a cluster that hasn't yet arrived.
Reload is in-memory and atomic per resource. A cluster update doesn't drop existing connections; it just creates a new cluster instance, swaps the pointer, and lets the old one drain. The same machinery powers Istio's "hot" config push at thousands of pods per second.
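On the Envoy side, opting into ADS with incremental (delta) delivery is a bootstrap setting. The following is a minimal sketch using v3 bootstrap field names; the node id, the `xds_cluster` name, and the control plane address `xds.example.internal:18000` are assumptions made for illustration.

```yaml
# Hypothetical bootstrap: all xDS resources arrive over one delta ADS stream.
node:
  id: orders-sidecar-1
  cluster: orders
dynamic_resources:
  ads_config:
    api_type: DELTA_GRPC             # use GRPC here for State of the World
    transport_api_version: V3
    grpc_services:
    - envoy_grpc: {cluster_name: xds_cluster}
  lds_config: {resource_api_version: V3, ads: {}}
  cds_config: {resource_api_version: V3, ads: {}}
static_resources:
  clusters:
  - name: xds_cluster                # the control plane itself, defined statically
    type: STRICT_DNS
    connect_timeout: 1s
    typed_extension_protocol_options:
      envoy.extensions.upstreams.http.v3.HttpProtocolOptions:
        "@type": type.googleapis.com/envoy.extensions.upstreams.http.v3.HttpProtocolOptions
        explicit_http_config:
          http2_protocol_options: {}   # gRPC requires HTTP/2
    load_assignment:
      cluster_name: xds_cluster
      endpoints:
      - lb_endpoints:
        - endpoint: {address: {socket_address: {address: xds.example.internal, port_value: 18000}}}
```

The control plane cluster has to be static because Envoy needs it before any dynamic resource can arrive.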
HTTP/2 Multiplexing and Connection Pooling
Envoy speaks HTTP/2 to upstreams when the cluster is configured with
http2_protocol_options; otherwise it defaults to HTTP/1.1. A single TCP+TLS connection carries up to
max_concurrent_streams concurrent in-flight requests (the limit is configurable, and the HTTP/2
specification recommends a value no smaller than 100), framed
as interleaved HEADERS and DATA frames on numbered streams. Two scarce resources govern
throughput:
- Stream concurrency: once the upstream's SETTINGS_MAX_CONCURRENT_STREAMS is reached, new requests queue inside Envoy until either an in-flight stream completes or another connection is opened. The connection pool will open additional connections up to max_connections.
- Flow control windows: HTTP/2 has both a per-stream window and a per-connection window. Under the protocol defaults both start at 65,535 bytes; SETTINGS_INITIAL_WINDOW_SIZE adjusts the per-stream value, while the connection window grows only through WINDOW_UPDATE frames. Envoy exposes the two as initial_stream_window_size and initial_connection_window_size, and a common tuning for edge deployments is 64 KiB per stream and 1 MiB per connection. A slow consumer that doesn't drain its window will stall the producer at the L7 layer, separately from TCP backpressure.
The result: one Envoy worker can hold tens of thousands of concurrent streams across a few hundred upstream connections. This is why Envoy adds so little CPU per request — most of the cost of a request is amortized across the multiplexed connection rather than paid per-handshake.
For HTTP/1.1 upstreams Envoy falls back to a one-stream-per-connection pool. The pool
is sized by max_connections, and idle connections are closed after
idle_timeout (default 1 hour). Mixing HTTP/1 and HTTP/2 in the same cluster
requires auto_config upstream protocol selection, in which Envoy negotiates the
protocol per connection via ALPN.
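In the v3 API these upstream protocol settings live under the cluster's typed_extension_protocol_options. The fragment below is a sketch under stated assumptions: the cluster name is hypothetical, and the stream limit and window sizes are example tunings rather than Envoy's defaults.

```yaml
# Hypothetical upstream protocol settings for the "orders" cluster.
clusters:
- name: orders
  typed_extension_protocol_options:
    envoy.extensions.upstreams.http.v3.HttpProtocolOptions:
      "@type": type.googleapis.com/envoy.extensions.upstreams.http.v3.HttpProtocolOptions
      common_http_protocol_options:
        idle_timeout: 3600s                      # close idle upstream connections after 1 hour
      explicit_http_config:
        http2_protocol_options:
          max_concurrent_streams: 100            # streams per connection
          initial_stream_window_size: 65536      # 64 KiB per stream
          initial_connection_window_size: 1048576  # 1 MiB per connection
      # To negotiate HTTP/1.1 or HTTP/2 per connection via ALPN instead,
      # replace explicit_http_config with:
      # auto_config:
      #   http_protocol_options: {}
      #   http2_protocol_options: {}
```

ALPN negotiation only works over TLS, so the auto_config variant also needs a TLS transport socket on the cluster.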
Retries, Timeouts, and Hedging
Retry behavior in Envoy is configured at two layers. The route-level retry policy
controls when a request is retryable through retry_on: 5xx, retriable-4xx,
connect-failure, refused-stream, retriable-status-codes, or the gRPC conditions
cancelled, deadline-exceeded, and resource-exhausted. num_retries caps the count; per_try_timeout
caps each attempt; the route-level timeout caps the entire user-visible
request including all retries. The cluster level then limits how many retries may be
in flight at once, regardless of which route issued them.
Two important budget mechanisms prevent retry storms:
- Retry budget: a configurable percentage cap (default 20% once a retry_budget is set; otherwise the fixed max_retries limit applies) on the ratio of retries to active requests for a cluster. When the cluster is unhealthy and every request retries, the budget caps total fan-out and prevents the classic "retry amplification" failure mode where a backend slowdown becomes a backend outage.
- Retry back-off: exponential with jitter, base 25ms, max 250ms by default. When rate_limited_retry_back_off is configured, an upstream can also steer the back-off interval through response headers such as Retry-After.
Envoy also supports request hedging for tail-latency reduction: with
hedge_on_per_try_timeout enabled, Envoy sends a second copy of the request when
per_try_timeout elapses without cancelling the first, takes whichever response arrives
first, and cancels the loser. This is useful for read traffic to replicated
backends where a slow tail is the dominant latency contributor.
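The two layers and the hedging switch can be sketched together. This is an illustrative fragment, not a recommended policy: the route prefix, cluster name, and timeout values are assumptions, and the back-off and budget numbers restate the defaults mentioned above.

```yaml
# Route side: when to retry, how long each attempt may take, and hedging.
routes:
- match: {prefix: "/orders"}
  route:
    cluster: orders
    timeout: 2s                      # caps the whole request, retries included
    retry_policy:
      retry_on: "5xx,connect-failure,refused-stream"
      num_retries: 2
      per_try_timeout: 0.5s
      retry_back_off: {base_interval: 0.025s, max_interval: 0.25s}
    hedge_policy:
      hedge_on_per_try_timeout: true   # on per-try timeout, race a second attempt
---
# Cluster side: cap retries as a share of active requests.
circuit_breakers:
  thresholds:
  - priority: DEFAULT
    retry_budget:
      budget_percent: {value: 20.0}
      min_retry_concurrency: 3
```

Hedging only takes effect when the retry policy retries at least one condition and allows at least one retry, which is why the two are configured together here.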
mTLS, SDS, and Identity
In a service mesh deployment, every Envoy is both a TLS server (for inbound traffic from
peer Envoys) and a TLS client (for outbound traffic to peer Envoys). The certificate it
presents identifies the workload — typically a SPIFFE URI like
spiffe://cluster.local/ns/orders/sa/orders-sa. Peer Envoys validate that
URI against an authorization policy ("orders-sa is allowed to call payments-sa on
POST /charge"), implemented by the RBAC HTTP filter or by ext_authz to an
external policy engine like OPA.
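The quoted policy could be expressed with the RBAC HTTP filter roughly as follows. This is a hedged sketch: the policy name is hypothetical, the filter would run on the payments side, and a real mesh would normally generate this from a higher-level authorization resource.

```yaml
# Hypothetical RBAC filter on the payments sidecar:
# allow orders-sa to call POST /charge, deny everything else.
- name: envoy.filters.http.rbac
  typed_config:
    "@type": type.googleapis.com/envoy.extensions.filters.http.rbac.v3.RBAC
    rules:
      action: ALLOW
      policies:
        orders-to-payments:
          permissions:
          - and_rules:
              rules:
              - header: {name: ":method", string_match: {exact: "POST"}}
              - url_path: {path: {exact: "/charge"}}
          principals:
          - authenticated:
              principal_name: {exact: "spiffe://cluster.local/ns/orders/sa/orders-sa"}
```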
Certificates are short-lived (typically 24 hours, often less) and rotated via SDS — Secret Discovery Service. The control plane streams new certificates to Envoy over a dedicated gRPC stream without restarting the proxy and without dropping connections. This is one of the load-bearing properties of a production mesh: rotating a million certificates a day across a fleet is a routine background task, not an outage.
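A listener picks up SDS-delivered certificates by naming secrets instead of embedding them. The sketch below assumes the secrets are served over the ADS stream and uses the hypothetical secret names `default` and `ROOTCA`; the actual names depend on the control plane.

```yaml
# Hypothetical inbound TLS context; certificates and trust roots arrive via SDS.
transport_socket:
  name: envoy.transport_sockets.tls
  typed_config:
    "@type": type.googleapis.com/envoy.extensions.transport_sockets.tls.v3.DownstreamTlsContext
    require_client_certificate: true           # mutual TLS
    common_tls_context:
      tls_certificate_sds_secret_configs:
      - name: default                          # workload certificate and key
        sds_config: {resource_api_version: V3, ads: {}}
      validation_context_sds_secret_config:
        name: ROOTCA                           # trust bundle for peer validation
        sds_config: {resource_api_version: V3, ads: {}}
```

Because the context references secrets by name, a rotation is just a new Secret resource on the stream; the listener config itself does not change.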
Observability: Stats, Tracing, Logging
Envoy is observable to a degree that is sometimes embarrassing — every cluster, every upstream host, every listener, every filter, every HTTP method emits its own counter, gauge, and histogram. The defaults expose roughly 200-400 stats per cluster and per listener; a moderately complex sidecar emits tens of thousands of distinct time series.
- Stats sinks: statsd UDP/TCP, dog_statsd (with tags), Prometheus scrape on the admin port (/stats/prometheus), Hystrix-style streaming, OpenTelemetry metrics.
- Tracing: Envoy generates spans for every request, propagates B3, W3C Trace Context, and OpenTelemetry headers, and exports to Zipkin, Jaeger, Datadog, Lightstep, X-Ray, and OTLP. Sampling is per-listener with rate, override headers, and the standard x-b3-sampled contract.
- Access logs: file, gRPC streaming (ALS), or stderr, with a format string supporting 100+ command operators (%REQ(:METHOD)%, %RESPONSE_CODE%, %DURATION%, %UPSTREAM_HOST%, etc.) and structured JSON output.
- Admin interface: /clusters, /listeners, /config_dump, /runtime, /server_info, /stats, plus the live-modify /runtime_modify for feature flagging.
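As an illustration of the access-log and admin settings, the fragment below sketches a JSON access log built from the command operators mentioned above, plus a loopback-only admin listener. The JSON key names and the port 9901 are assumptions; the access_log entry would sit inside the http_connection_manager config.

```yaml
# Inside the http_connection_manager typed_config:
access_log:
- name: envoy.access_loggers.file
  typed_config:
    "@type": type.googleapis.com/envoy.extensions.access_loggers.file.v3.FileAccessLog
    path: /dev/stdout
    log_format:
      json_format:                   # keys are arbitrary; values are command operators
        method: "%REQ(:METHOD)%"
        status: "%RESPONSE_CODE%"
        flags: "%RESPONSE_FLAGS%"
        duration_ms: "%DURATION%"
        upstream: "%UPSTREAM_HOST%"
---
# At the top level of the bootstrap:
admin:
  address: {socket_address: {address: 127.0.0.1, port_value: 9901}}
```

Binding the admin interface to loopback is a cautious default in this sketch, since endpoints such as /runtime_modify change live behavior.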
Extensibility: Lua, WASM, and ext_proc
Envoy supports four extensibility models, in roughly ascending order of decoupling:
- Built-in C++ filter — fastest, but requires forking and rebuilding Envoy. Used by all the in-tree filters.
- Lua filter — embed a small Lua script per route or per listener, can mutate headers, body, perform out-of-band HTTP calls. Microsecond overhead, but limited to the Lua sandbox API.
- WASM filter — compile Rust, Go (TinyGo), AssemblyScript, or C++ to WebAssembly and load via xDS. Sandboxed, hot-loaded, language-agnostic. Roughly 10-30 microseconds of overhead per filter invocation depending on host call frequency.
- ext_proc / ext_authz — call out over gRPC to a separate process. Fully decoupled, any language, but adds a network hop (typically ~500 microseconds within the same pod).
The general guidance: ext_authz for authorization, WASM for header rewriting and edge logic, ext_proc when you need to mutate the body in a separate language, and C++ only for high-throughput edge cases like custom protocol parsers.
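To make the Lua option concrete, here is a minimal sketch of an inline Lua filter that rejects requests lacking an Authorization header before they reach the router. The check itself is a made-up example, and the filter would be placed in http_filters ahead of envoy.filters.http.router.

```yaml
# Hypothetical inline Lua filter: short-circuit with a 401 when credentials are missing.
- name: envoy.filters.http.lua
  typed_config:
    "@type": type.googleapis.com/envoy.extensions.filters.http.lua.v3.Lua
    default_source_code:
      inline_string: |
        function envoy_on_request(handle)
          -- headers():get returns nil when the header is absent
          if handle:headers():get("authorization") == nil then
            handle:respond({[":status"] = "401"}, "missing credentials")
          end
        end
```

Because the filter replies locally, no upstream connection is opened for rejected requests.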
Envoy vs Other Proxies
| | Envoy | NGINX | HAProxy | Linkerd2-proxy |
|---|---|---|---|---|
| Language | C++14 | C | C | Rust |
| HTTP/2 upstream | Native, multiplexed | Available, less battle-tested | Native | Native |
| HTTP/3 / QUIC | Yes (BoringSSL/quiche) | Experimental | Yes | No |
| Hot reload model | xDS streaming, no restart | SIGHUP + worker handoff | SIGUSR2 + binary swap | Destination API streaming |
| Config substrate | Protobuf / xDS | nginx.conf DSL | cfg DSL | destination + policy CRDs |
| Service mesh use | Istio, AWS App Mesh, Consul, Kuma | NGINX Service Mesh (deprecated) | — | Linkerd |
| Best at | L7 mesh data plane, edge | Static + reverse proxy, files | Pure load balancer, TCP | Lightweight mesh, Kubernetes-native |
| Memory per worker | ~30-100 MB baseline | ~5-20 MB | ~5-15 MB | ~10-20 MB |
Tradeoffs and Honest Weaknesses
- Memory footprint — a sidecar Envoy idles at 30-100 MB per pod. In a 100k-pod fleet that's 3-10 TB of RAM spent purely on the data plane. Lighter proxies like linkerd2-proxy exist precisely because of this.
- Configuration complexity — the protobuf schema for one listener with TLS + JWT + RBAC + rate limit + tracing is hundreds of lines. Most teams generate it from a higher-level abstraction (Istio VirtualService, Gateway API, custom CRDs). Hand-editing Envoy config in production is a code smell.
- Cold-start cost — a fresh Envoy needs xDS to push every listener, route, cluster, and endpoint before it can serve traffic. On a large mesh that can be 10-30 seconds. Pre-warming and config caching matter.
- Per-thread state — connection pools and load balancer state are per-worker, not shared. A cluster with 1000 healthy backends and 16 worker threads will, in steady state, hold 16 separate P2C choices — slightly suboptimal load distribution at low concurrency.
- WASM is still maturing — V8-based WASM filters add allocator pressure, and the Proxy-WASM ABI has churned. Many production deployments still prefer Lua or ext_authz for stability.
- Debugging deep filter chains — when a request fails on the wire, tracing exactly which filter returned which response code requires liberal access logging or the trace-level admin debugger. Envoy's flame graphs are not yet first-class.
Frequently Asked Questions
Why is Envoy single-threaded per worker instead of using a thread pool?
How does Envoy do graceful drain on a binary upgrade?
parent_shutdown_time (default 15 minutes for sidecars). Because connections are pinned to a worker, in-flight HTTP/2 streams complete on the old binary while new connections land on the new binary. Zero downtime, zero connection loss.
What's the difference between SOTW and incremental xDS?
DeltaDiscoveryRequest/DeltaDiscoveryResponse), different semantics. Production Istio defaults to delta xDS.
Does Envoy support gRPC natively, or just HTTP/2?
grpc-status code, gRPC retry semantics (DEADLINE_EXCEEDED, RESOURCE_EXHAUSTED, and UNAVAILABLE map to the retry_on conditions deadline-exceeded, resource-exhausted, and unavailable), gRPC-specific health checks, gRPC-Web translation for browsers, and gRPC reflection proxying. The router filter has gRPC-aware code paths for header-based routing on :path = /svc.Service/Method.
How does outlier detection differ from active health checks?
consecutive_5xx, consecutive_gateway_failure, or whose success rate falls outside success_rate_stdev_factor standard deviations of the cluster mean. Outlier detection is free (it's just counting existing responses) but only catches symptoms that real traffic exercises — a host that 500s only on POST won't be ejected if you only send GETs.