Envoy Internals

Envoy is an L7 proxy that carries a large share of east-west traffic in modern service meshes. Lyft built it and open-sourced it in 2016 to solve a single problem: every microservice in a large fleet was reimplementing the same connection pool, retry policy, timeout, and circuit breaker, and usually doing it badly. Envoy moves all of that out of application code and into a dedicated proxy, usually deployed as a sidecar, that speaks HTTP/1.1, HTTP/2, HTTP/3, gRPC, TCP, and a half-dozen other protocols. It is the data plane behind Istio, Consul Connect, AWS App Mesh, Google Cloud Service Mesh, and the gateways at Lyft, Stripe, Square, Pinterest, Reddit, and Apple.

Built on a one-event-loop-per-worker-thread model in modern C++, Envoy targets high per-core throughput with sub-millisecond added latency. The exact numbers depend heavily on the filter chain, TLS, and payload size.

[Figure: Envoy Architecture Overview]

Key Numbers

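Metric	Value
Threading	One worker thread per core
Default concurrency	Number of CPUs
xDS transport	gRPC streams
HTTP/2 max concurrent streams	100 (a common setting; the built-in default is higher and version-dependent)
Initial stream window	256 KiB (a typical tuned value; the built-in default is version-dependent)
Initial connection window	1 MiB (a typical tuned value; the built-in default is version-dependent)
Outlier ejection default	5 consecutive 5xx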

Why Envoy Exists

The Problem

Every service at a large microservice shop reimplemented retries, timeouts, circuit breakers, mTLS, observability — usually in a per-language library that drifted between Java, Go, Python, and Node. Bugs lived in N codebases at once. Upgrades took quarters. A single misconfigured retry could DDoS your own backend.

The Move

Pull all that logic out of the application and into a sidecar process that lives next to every service. Application code makes a plain HTTP call to localhost; Envoy handles the wire, the policy, and the metrics. The application no longer cares whether the upstream is on the same host, in another zone, or behind mTLS.

The Payoff

A single proxy binary, dynamically reconfigured by a control plane via xDS, is the substrate for every modern service mesh. The same binary runs at the edge as an L7 load balancer, in the middle as an east-west proxy, and on the host as a sidecar — without restarts, without dropped connections.

Listeners, Filter Chains, and the Request Path

A listener binds a socket, typically a single port such as :443 at the edge or :15001 for a sidecar's iptables-redirected outbound traffic (Istio uses :15006 for inbound). When a TCP connection arrives, Envoy evaluates the listener's filter chain match criteria (SNI, ALPN, source IP, transport protocol, destination port) to pick a filter chain. A filter chain pairs an optional transport socket, which is where TLS terminates, with an ordered list of network filters. The last network filter is typically http_connection_manager, which owns its own chain of HTTP filters: JWT auth, RBAC, ext_authz, rate limit, fault injection, Lua, WASM, and finally the terminal router filter, which selects a cluster and dispatches the request upstream.

# Abridged: the TLS context, the per-filter typed_config blocks,
# and the "orders" cluster definition are omitted.
static_resources:
  listeners:
  - name: ingress
    address: {socket_address: {address: 0.0.0.0, port_value: 443}}
    filter_chains:
    - filter_chain_match: {server_names: ["api.example.com"]}
      transport_socket:                       # TLS termination
        name: envoy.transport_sockets.tls
      filters:
      - name: envoy.filters.network.http_connection_manager
        typed_config:
          "@type": type.googleapis.com/envoy.extensions.filters.network.http_connection_manager.v3.HttpConnectionManager
          stat_prefix: ingress_http
          http_filters:
          - name: envoy.filters.http.jwt_authn       # verify JWT
          - name: envoy.filters.http.ext_authz       # call OPA / authz svc
          - name: envoy.filters.http.local_ratelimit # token bucket per route
          - name: envoy.filters.http.router          # MUST be last
          route_config:
            virtual_hosts:
            - name: default
              domains: ["*"]
              routes:
              - match: {prefix: "/orders"}
                route: {cluster: orders, timeout: 2s, retry_policy: {retry_on: 5xx, num_retries: 2}}

Filters see typed callbacks. Network filters receive onNewConnection and onData; HTTP filters receive decodeHeaders, decodeData, and decodeTrailers on the request path and encodeHeaders, encodeData, and encodeTrailers on the response path. A callback returns Continue to pass control to the next filter, or StopIteration to pause the chain, for example while the JWT filter awaits a JWKS fetch. A filter that replies on its own (a 401 from JWT auth) short-circuits every filter after it: the router never runs and no upstream connection is opened.
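The short-circuit behavior can be sketched in a few lines of Python. This is only a toy model, not Envoy's C++ filter API: the names LocalReply, jwt_authn, router, and run_chain are made up for illustration, the JWT check is a placeholder, and the asynchronous StopIteration pause is left out.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class LocalReply:
    status: int
    body: str = ""


# A filter looks at the request headers and returns None ("Continue")
# or a LocalReply, which ends the chain right there.
HeaderFilter = Callable[[dict], Optional[LocalReply]]


def jwt_authn(headers: dict) -> Optional[LocalReply]:
    # Hypothetical check; a real filter verifies the token signature.
    if not headers.get("authorization", "").startswith("Bearer "):
        return LocalReply(401, "missing bearer token")
    return None


def router(headers: dict) -> Optional[LocalReply]:
    # Terminal filter: stands in for cluster selection and the upstream call.
    return LocalReply(200, "proxied " + headers[":path"])


def run_chain(filters, headers):
    """Run filters in order; return (reply, names of the filters that ran)."""
    visited = []
    for http_filter in filters:
        visited.append(http_filter.__name__)
        reply = http_filter(headers)
        if reply is not None:
            return reply, visited
    raise RuntimeError("chain ended without a terminal filter")
```

Calling run_chain([jwt_authn, router], {":path": "/orders"}) yields the 401 and a visited list that contains only jwt_authn, which mirrors the point that the router never runs for a rejected request.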

Listeners update without a restart. When LDS pushes a changed listener, Envoy builds and warms the new listener, directs new connections for that address to it, and puts the old listener into draining for the drain period (--drain-time-s, default 600 seconds, i.e. 10 minutes). Requests already in flight on the old listener finish under the old configuration, so an in-flight HTTP/2 stream never sees a configuration change. Connections that are still open when the drain period ends are closed.

Clusters, Endpoints, and Load Balancing

A cluster is the upstream side of the proxy — a logical group of hosts that serve a service, plus the policy for talking to them. Each cluster owns:

Endpoints: the actual IP:port pairs, learned statically, via DNS, or via EDS (the streaming xDS endpoint discovery service).
Load balancer: ROUND_ROBIN, LEAST_REQUEST (P2C: pick two random hosts, take the one with fewer outstanding requests), RING_HASH (consistent hashing for stickiness), MAGLEV (Google's faster consistent hash), or RANDOM.
Connection pool: per worker thread, per upstream host, separately for HTTP/1, HTTP/2, and HTTP/3. HTTP/2 and HTTP/3 pools multiplex: a single connection carries many concurrent streams, bounded by max_concurrent_streams (100 is a common setting).
Circuit breakers: per-cluster caps on max_connections, max_pending_requests, max_requests (in flight), max_retries, and max_connection_pools. Requests that cannot get a connection wait in the pending queue, but only up to max_pending_requests. Once that queue or max_requests is full, Envoy fails the request immediately with a 503, sets the x-envoy-overloaded response header, records the UO (upstream overflow) response flag in the access log, and increments a counter such as upstream_rq_pending_overflow.
Outlier detection: passive health checking. After N consecutive 5xx responses (default 5) or N consecutive gateway failures (default 5), the host is ejected for the base ejection time (default 30s) multiplied by the number of times it has been ejected, up to a cap (max_ejection_time, default 300s).
Active health checks: periodic HTTP, gRPC, or TCP probes that mark hosts unhealthy independently of live traffic.
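The P2C idea behind LEAST_REQUEST is easy to see in a small simulation. The sketch below is an illustration under simplifying assumptions, not Envoy's implementation: all hosts have equal weight, requests never complete, and the function names pick_least_request and simulate are invented here.

```python
import random


def pick_least_request(active, rng):
    """Power of two choices: sample two hosts, keep the less loaded one."""
    hosts = list(active)
    a, b = rng.choice(hosts), rng.choice(hosts)
    return a if active[a] <= active[b] else b


def simulate(n_hosts=10, n_requests=10_000, seed=1):
    """Assign requests that never complete and return the per-host counts."""
    rng = random.Random(seed)
    active = {f"host-{i}": 0 for i in range(n_hosts)}
    for _ in range(n_requests):
        active[pick_least_request(active, rng)] += 1
    return active
```

Comparing max(counts.values()) with min(counts.values()) after simulate() shows how tightly two random probes keep the hosts balanced, without any worker having to scan the full host list.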

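Endpoints carry weights and priorities. Priority 0 is preferred; priority 1 receives traffic only when priority 0 has too few healthy hosts. The threshold comes from the overprovisioning factor (default 1.4): priority 0 keeps all the traffic until its healthy fraction drops below 1/1.4, about 71%, and below that point the shortfall spills to priority 1. Control planes use this for locality failover: same-zone endpoints get priority 0, cross-zone endpoints get priority 1, and traffic leaves the zone only when local health drops below that threshold. Envoy's separate zone-aware routing feature balances across localities within a single priority level.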
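A rough sketch of the spillover arithmetic, assuming the commonly documented rule that a priority level's capacity is its healthy fraction times the overprovisioning factor, capped at 100%. The function name priority_loads is hypothetical, and panic-mode handling is deliberately left out.

```python
def priority_loads(healthy_fraction, overprovisioning_factor=1.4):
    """Return the share of traffic sent to each priority level.

    healthy_fraction[i] is the fraction (0.0 to 1.0) of healthy hosts
    in priority i. Panic mode is deliberately ignored.
    """
    effective = [min(1.0, overprovisioning_factor * h) for h in healthy_fraction]
    total = sum(effective)
    if total == 0:
        return [0.0] * len(effective)
    # If every level together cannot cover 100%, scale the shares up.
    normalizer = min(1.0, total)
    loads, remaining = [], 1.0
    for capacity in effective:
        share = min(remaining, capacity / normalizer)
        loads.append(share)
        remaining -= share
    return loads
```

With priority 0 at 80% healthy the function keeps everything local; at 50% healthy it sends roughly 70% to priority 0 and the remaining 30% to priority 1.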

xDS: The Control Plane Protocol

Envoy is a pure data plane. It does not watch Kubernetes, query Consul, or talk to your service registry. Instead it speaks xDS, a family of discovery APIs usually served over gRPC streams, to a control plane (Istio's istiod, formerly Pilot; Contour; go-control-plane; java-control-plane; the AWS App Mesh controller; or your own implementation). The major xDS services:

API	Resource	What it carries
LDS	Listener	Sockets to bind, filter chains, TLS contexts
RDS	RouteConfiguration	Virtual hosts, route matches, retry/timeout, cluster targets
CDS	Cluster	Upstreams, LB policy, circuit breakers, outlier detection
EDS	ClusterLoadAssignment	The actual endpoints (IP:port, weight, locality, priority)
SDS	Secret	TLS certificates and keys, rotated without restart
ADS	(aggregated)	Single ordered stream multiplexing all of the above

The original wire mode is SOTW ("State of the World"): every push is a complete snapshot of all resources of that type. It is simple, but each update costs bandwidth and CPU proportional to the total number of resources rather than to the size of the change, so a 50 KB change to one cluster forces re-shipping every other cluster too. The newer mode is incremental (delta) xDS: only changed resources are sent, and deletions are listed explicitly in removed_resources. Envoy ACKs each push by echoing the version and nonce it accepted, or NACKs it with an error_detail while it keeps running its last accepted config; the control plane can use the NACK to alert or roll back. ADS (Aggregated Discovery Service) carries LDS, RDS, CDS, and EDS over a single stream, which lets the control plane sequence updates so that, for example, an EDS update never references a cluster that has not arrived yet.
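The difference between the two modes can be sketched with plain dictionaries. This is a toy model rather than the xDS protobuf API: resources are name-to-version strings, and the functions sotw_push, delta_push, and apply_delta are hypothetical names chosen for the example.

```python
def sotw_push(server_state):
    """State of the world: ship every resource of the type, every time."""
    return {"resources": dict(server_state)}


def delta_push(client_state, server_state):
    """Incremental: ship only what changed, plus the names that went away."""
    changed = {
        name: version
        for name, version in server_state.items()
        if client_state.get(name) != version
    }
    removed = sorted(name for name in client_state if name not in server_state)
    return {"resources": changed, "removed_resources": removed}


def apply_delta(client_state, push):
    """Return the client's state after applying an incremental push."""
    new_state = dict(client_state)
    new_state.update(push["resources"])
    for name in push["removed_resources"]:
        new_state.pop(name, None)
    return new_state
```

For a client that already holds three clusters, changing one and deleting another produces a delta push with one resource and one removed name, while the SOTW push still carries every remaining cluster.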

Reload is in-memory and atomic per resource. A cluster update doesn't drop existing connections; it just creates a new cluster instance, swaps the pointer, and lets the old one drain. The same machinery powers Istio's "hot" config push at thousands of pods per second.

HTTP/2 Multiplexing and Connection Pooling

Envoy speaks HTTP/2 to an upstream only when the cluster is configured for it, for example with http2_protocol_options; otherwise it uses HTTP/1.1. A single TCP+TLS connection then carries up to max_concurrent_streams in-flight requests (100 is a common setting; the built-in default is higher and depends on the Envoy version), with each request's HEADERS and DATA frames interleaved on numbered streams. Two scarce resources govern throughput:

Stream concurrency: once the upstream's SETTINGS_MAX_CONCURRENT_STREAMS is reached on a connection, new requests queue inside Envoy until an in-flight stream completes or another connection is opened. The connection pool opens additional connections up to max_connections.
Flow control windows: HTTP/2 has both a per-stream window and a per-connection window. The protocol default for each is 65,535 bytes; SETTINGS_INITIAL_WINDOW_SIZE changes the per-stream value, while the connection window grows only through WINDOW_UPDATE frames. In Envoy the two are set with initial_stream_window_size and initial_connection_window_size (256 KiB per stream and 1 MiB per connection are typical tuned values; the built-in defaults are larger and differ between Envoy versions). A slow consumer that does not drain its window stalls the producer at the L7 layer, separately from TCP backpressure.
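The interaction between the two windows can be sketched as a small bookkeeping class. It models only the sender-side accounting described in the HTTP/2 specification; the class name FlowControlledSender and its methods are invented for this example, and framing is ignored.

```python
class FlowControlledSender:
    """Tracks how many DATA bytes a sender may still put on one stream."""

    def __init__(self, stream_window=65_535, conn_window=65_535):
        self.stream_window = stream_window
        self.conn_window = conn_window

    def send(self, nbytes):
        """Send up to nbytes; return how many bytes both windows allowed."""
        allowed = max(0, min(nbytes, self.stream_window, self.conn_window))
        self.stream_window -= allowed
        self.conn_window -= allowed
        return allowed

    def window_update(self, stream_increment=0, conn_increment=0):
        """The receiver consumed data and returned credit to the sender."""
        self.stream_window += stream_increment
        self.conn_window += conn_increment
```

A sender that exhausts both windows stays blocked until credit arrives on both: a stream-level update alone is not enough while the connection window is still at zero.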

The result: one Envoy worker can hold tens of thousands of concurrent streams across a few hundred upstream connections. This is why Envoy adds so little CPU per request — most of the cost of a request is amortized across the multiplexed connection rather than paid per-handshake.

For HTTP/1.1 upstreams Envoy falls back to a pool where each connection carries one request at a time. The pool is capped by max_connections, and idle connections are closed after idle_timeout (default 1 hour). Supporting both HTTP/1.1 and HTTP/2 upstreams in the same cluster requires automatic protocol selection (auto_config in the cluster's HttpProtocolOptions), which negotiates the protocol per connection via ALPN.

Retries, Timeouts, and Hedging

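Retry behavior in Envoy is governed at two layers. The route-level retry policy controls when a request is retryable: retry_on accepts conditions such as 5xx, gateway-error, retriable-4xx, connect-failure, refused-stream, and retriable-status-codes, plus the gRPC conditions cancelled, deadline-exceeded, and resource-exhausted. num_retries caps the count; per_try_timeout caps each attempt; the route-level timeout caps the entire user-visible request including all retries. The cluster-level layer caps how many retries may be in flight at once across all requests, through the max_retries circuit breaker or a retry budget.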

Two mechanisms keep retries from turning into retry storms:

Retry budget: a cap on concurrent retries, expressed as a percentage (budget_percent, default 20%) of the cluster's active and pending requests, with a floor of min_retry_concurrency (default 3). When the cluster is unhealthy and every request wants to retry, the budget caps total fan-out and prevents the classic "retry amplification" failure mode where a backend slowdown becomes a backend outage.
Retry back-off: exponential with full jitter, base interval 25ms and maximum interval 250ms by default. Envoy can also take its back-off from the upstream's own response headers, such as Retry-After, when rate_limited_retry_back_off is configured.
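Both mechanisms reduce to a few lines of arithmetic. The sketch below assumes the behavior described in Envoy's documentation (retry n waits a random time below base times 2^n minus 1, capped at the maximum interval, and the budget has a floor of 3 concurrent retries); the function names backoff_ms and retry_allowed are hypothetical, and the exact rounding inside Envoy may differ.

```python
import random


def backoff_ms(attempt, rng, base_ms=25, max_ms=250):
    """Fully jittered exponential back-off for retry number `attempt` (1-based)."""
    upper = min(max_ms, base_ms * (2 ** attempt - 1))
    return rng.uniform(0, upper)


def retry_allowed(active_requests, active_retries,
                  budget_percent=20.0, min_retry_concurrency=3):
    """True if one more concurrent retry fits inside the cluster's retry budget."""
    limit = max(min_retry_concurrency, active_requests * budget_percent / 100.0)
    return active_retries < limit
```

With 100 active requests the budget admits a 20th concurrent retry but not a 21st, and with only 5 active requests the floor of 3 applies instead of the percentage.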

Envoy also supports request hedging for tail-latency reduction: with hedge_on_per_try_timeout enabled, an attempt that exceeds per_try_timeout is not cancelled. Instead Envoy sends a second copy of the request alongside it, takes whichever response arrives first, and cancels the loser. This is useful for read traffic to replicated backends where a slow tail is the dominant latency contributor.
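The hedging pattern itself is independent of Envoy and can be sketched with asyncio. This is an illustration of the general technique, not Envoy's router code: the hedged function and its call parameter are assumptions, and call is expected to be a coroutine function that takes the attempt number.

```python
import asyncio


async def hedged(call, per_try_timeout):
    """Start one attempt; if it is still running after per_try_timeout,
    start a second attempt and return whichever finishes first."""
    first = asyncio.create_task(call(0))
    done, _ = await asyncio.wait({first}, timeout=per_try_timeout)
    if done:
        return first.result()
    # The first attempt keeps running while the hedge is in flight.
    second = asyncio.create_task(call(1))
    done, pending = await asyncio.wait(
        {first, second}, return_when=asyncio.FIRST_COMPLETED
    )
    for task in pending:
        task.cancel()  # cancel the loser
    return done.pop().result()
```

Hedging doubles the load for every request that crosses the timeout, which is why it is usually reserved for idempotent reads and paired with a retry budget.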

mTLS, SDS, and Identity

In a service mesh deployment, every Envoy is both a TLS server (for inbound traffic from peer Envoys) and a TLS client (for outbound traffic to peer Envoys). The certificate it presents identifies the workload — typically a SPIFFE URI like spiffe://cluster.local/ns/orders/sa/orders-sa. Peer Envoys validate that URI against an authorization policy ("orders-sa is allowed to call payments-sa on POST /charge"), implemented by the RBAC HTTP filter or by ext_authz to an external policy engine like OPA.
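As a rough illustration of the identity check, the sketch below parses a SPIFFE URI of the Kubernetes-style form used above and looks the caller up in an allow list. Everything here is hypothetical: the ALLOWED_CALLS table, the function names, and the path layout check stand in for what the RBAC filter or an external policy engine would do against the validated peer certificate.

```python
from urllib.parse import urlparse

# Hypothetical policy: (caller service account, callee service account, method, path)
ALLOWED_CALLS = {("orders-sa", "payments-sa", "POST", "/charge")}


def parse_spiffe_id(uri):
    """Split spiffe://<trust-domain>/ns/<namespace>/sa/<service-account>."""
    parsed = urlparse(uri)
    if parsed.scheme != "spiffe" or not parsed.netloc:
        raise ValueError(f"not a SPIFFE ID: {uri!r}")
    parts = parsed.path.strip("/").split("/")
    if len(parts) != 4 or parts[0] != "ns" or parts[2] != "sa":
        raise ValueError(f"unexpected SPIFFE path layout: {parsed.path!r}")
    return {
        "trust_domain": parsed.netloc,
        "namespace": parts[1],
        "service_account": parts[3],
    }


def is_allowed(peer_uri, local_service_account, method, path):
    """Check the peer's identity against the allow list."""
    peer = parse_spiffe_id(peer_uri)
    return (peer["service_account"], local_service_account, method, path) in ALLOWED_CALLS
```

The important property is that the decision keys on the identity in the certificate, not on the caller's IP address, so it survives pod rescheduling and address reuse.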

Certificates are short-lived (typically 24 hours, often less) and rotated via SDS — Secret Discovery Service. The control plane streams new certificates to Envoy over a dedicated gRPC stream without restarting the proxy and without dropping connections. This is one of the load-bearing properties of a production mesh: rotating a million certificates a day across a fleet is a routine background task, not an outage.

Observability: Stats, Tracing, Logging

Envoy is observable to a degree that is sometimes embarrassing — every cluster, every upstream host, every listener, every filter, every HTTP method emits its own counter, gauge, and histogram. The defaults expose roughly 200-400 stats per cluster and per listener; a moderately complex sidecar emits tens of thousands of distinct time series.

Stats sinks — statsd UDP/TCP, dog_statsd (with tags), Prometheus scrape on the admin port (/stats/prometheus), Hystrix-style streaming, OpenTelemetry metrics.
Tracing — Envoy generates spans for every request, propagates B3, W3C Trace Context, and OpenTelemetry headers, and exports to Zipkin, Jaeger, Datadog, Lightstep, X-Ray, and OTLP. Sampling is per-listener with rate, override headers, and the standard x-b3-sampled contract.
Access logs — file, gRPC streaming (ALS), or stderr, with a format string supporting 100+ command operators (%REQ(:METHOD)%, %RESPONSE_CODE%, %DURATION%, %UPSTREAM_HOST%, etc.) and structured JSON output.
Admin interface — /clusters, /listeners, /config_dump, /runtime, /server_info, /stats, plus the live-modify /runtime_modify for feature flagging.

Extensibility: Lua, WASM, and ext_proc

Envoy supports four extensibility models, in roughly ascending order of decoupling:

Built-in C++ filter — fastest, but requires forking and rebuilding Envoy. Used by all the in-tree filters.
Lua filter — embed a small Lua script per route or per listener, can mutate headers, body, perform out-of-band HTTP calls. Microsecond overhead, but limited to the Lua sandbox API.
WASM filter — compile Rust, Go (TinyGo), AssemblyScript, or C++ to WebAssembly and load via xDS. Sandboxed, hot-loaded, language-agnostic. Roughly 10-30 microseconds of overhead per filter invocation depending on host call frequency.
ext_proc / ext_authz — call out over gRPC to a separate process. Fully decoupled, any language, but adds a network hop (typically ~500 microseconds within the same pod).

The general guidance: ext_authz for authorization, WASM for header rewriting and edge logic, ext_proc when you need to mutate the body in a separate language, and C++ only for high-throughput edge cases like custom protocol parsers.

Envoy vs Other Proxies

	Envoy	NGINX	HAProxy	Linkerd2-proxy
Language	C++ (C++17 or later)	C	C	Rust
HTTP/2 upstream	Native, multiplexed	Available, less battle-tested	Native	Native
HTTP/3 / QUIC	Yes (Google QUICHE on BoringSSL)	Experimental	Yes	No
Hot reload model	xDS streaming, no restart	SIGHUP + worker handoff	SIGUSR2 + binary swap	Destination API streaming
Config substrate	Protobuf / xDS	nginx.conf DSL	cfg DSL	destination + policy CRDs
Service mesh use	Istio, AWS App Mesh, Consul, Kuma	NGINX Service Mesh (deprecated)	—	Linkerd
Best at	L7 mesh data plane, edge	Static + reverse proxy, files	Pure load balancer, TCP	Lightweight mesh, Kubernetes-native
Memory per worker	~30-100 MB baseline	~5-20 MB	~5-15 MB	~10-20 MB

Tradeoffs and Honest Weaknesses

Memory footprint — a sidecar Envoy idles at 30-100 MB per pod. In a 100k-pod fleet that's 3-10 TB of RAM spent purely on the data plane. Lighter proxies like linkerd2-proxy exist precisely because of this.
Configuration complexity — the protobuf schema for one listener with TLS + JWT + RBAC + rate limit + tracing is hundreds of lines. Most teams generate it from a higher-level abstraction (Istio VirtualService, Gateway API, custom CRDs). Hand-editing Envoy config in production is a code smell.
Cold-start cost — a fresh Envoy needs xDS to push every listener, route, cluster, and endpoint before it can serve traffic. On a large mesh that can be 10-30 seconds. Pre-warming and config caching matter.
Per-thread state — connection pools and load balancer state are per-worker, not shared. A cluster with 1000 healthy backends and 16 worker threads will, in steady state, hold 16 separate P2C choices — slightly suboptimal load distribution at low concurrency.
WASM is still maturing — V8-based WASM filters add allocator pressure, and the Proxy-WASM ABI has churned. Many production deployments still prefer Lua or ext_authz for stability.
Debugging deep filter chains — when a request fails on the wire, tracing exactly which filter returned which response code requires liberal access logging or the trace-level admin debugger. Envoy's flame graphs are not yet first-class.

Frequently Asked Questions

Why is Envoy single-threaded per worker instead of using a thread pool?

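Each worker thread runs its own libevent-based event loop and shares almost no state with the others. A connection is accepted by one worker and stays on that worker for its lifetime; on Linux, listeners typically use SO_REUSEPORT, so the kernel spreads new connections across the workers' sockets by hashing the connection 4-tuple. This keeps locks off the data path. The slow path (config updates) goes through a publish/subscribe pattern built on thread-local storage slots ("TLS" here means thread-local storage, not Transport Layer Security): the main thread posts new data to each worker's event loop. The result is close to linear scaling with cores. A traditional thread-pool model with shared connection pools would contend on locks on every request.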
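The pinning idea can be illustrated with a hash over the connection 4-tuple. This is only a sketch of the principle: the kernel uses its own hash function for SO_REUSEPORT socket selection, and pick_worker with its CRC32 hash is an assumption made for the example.

```python
import zlib


def pick_worker(src_ip, src_port, dst_ip, dst_port, n_workers):
    """Map a connection 4-tuple to a worker index, stable for that tuple."""
    key = f"{src_ip}:{src_port}->{dst_ip}:{dst_port}".encode()
    return zlib.crc32(key) % n_workers
```

Because the mapping depends only on the 4-tuple, every packet of a connection reaches the same worker, and no cross-thread handoff is needed after accept.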

How does Envoy do graceful drain on a binary upgrade?

Hot restart runs the new Envoy binary next to the old one. A wrapper such as the hot-restarter script launches the new process, and the two processes coordinate over a Unix domain socket. The new process obtains the listen sockets from the old one by file descriptor passing and starts accepting connections. The old process stops accepting, drains over --drain-time-s (default 600 seconds), and is shut down after --parent-shutdown-time-s (default 900 seconds, i.e. 15 minutes). In-flight HTTP/2 streams complete on the old binary while new connections land on the new one, so there is no downtime; only connections that are still open when the old process exits are cut.

What's the difference between SOTW and incremental xDS?

SOTW (State of the World): every push contains every resource of that type. It is simple to implement, but the size of each push scales with the number of resources: changing 1 cluster pushes all 10,000 in a 10k-cluster fleet. Incremental (delta) xDS sends only added, changed, or removed resources, identified by name. The two modes use different messages: SOTW uses DiscoveryRequest/DiscoveryResponse, and delta uses DeltaDiscoveryRequest/DeltaDiscoveryResponse. Recent Istio releases default to delta xDS.

Does Envoy support gRPC natively, or just HTTP/2?

Both. gRPC rides on HTTP/2, so the connection-management primitives are shared. On top of that Envoy understands gRPC trailers, the grpc-status code, gRPC retry semantics (the statuses CANCELLED, DEADLINE_EXCEEDED, RESOURCE_EXHAUSTED, and UNAVAILABLE map to the retry_on conditions cancelled, deadline-exceeded, resource-exhausted, and unavailable), gRPC-specific health checks, gRPC-Web translation for browsers, and gRPC reflection proxying. The router filter has gRPC-aware code paths for header-based routing on :path = /svc.Service/Method.

How does outlier detection differ from active health checks?

Active health checks send a synthetic probe (HTTP, gRPC, TCP) on a schedule and mark a host unhealthy on N consecutive failures. They cost real connections per host per interval. Outlier detection is passive: it observes real traffic and ejects hosts that return consecutive_5xx, consecutive_gateway_failure, or whose success rate falls outside success_rate_stdev_factor standard deviations of the cluster mean. Outlier detection is free (it's just counting existing responses) but only catches symptoms that real traffic exercises — a host that 500s only on POST won't be ejected if you only send GETs.
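A minimal sketch of the consecutive-5xx rule, assuming the documented behavior that the ejection time is the base time multiplied by the number of ejections so far, up to a cap. The OutlierDetector class is hypothetical and tracks a single host; it omits success-rate ejection, the decay of the ejection count after healthy intervals, and the max_ejection_percent limit.

```python
from typing import Optional


class OutlierDetector:
    """Passive health tracking for one upstream host."""

    def __init__(self, consecutive_5xx=5, base_ejection_s=30, max_ejection_s=300):
        self.consecutive_5xx = consecutive_5xx
        self.base_ejection_s = base_ejection_s
        self.max_ejection_s = max_ejection_s
        self.streak = 0
        self.times_ejected = 0

    def on_response(self, status) -> Optional[int]:
        """Record a response; return the ejection time in seconds, or None."""
        if not 500 <= status <= 599:
            self.streak = 0  # any non-5xx response resets the streak
            return None
        self.streak += 1
        if self.streak < self.consecutive_5xx:
            return None
        self.streak = 0
        self.times_ejected += 1
        return min(self.base_ejection_s * self.times_ejected, self.max_ejection_s)
```

Feeding the detector four 500s, a 200, and then five more 500s ejects the host once, for 30 seconds, which shows why a host that fails only intermittently can stay in rotation.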

Can I run Envoy as an edge gateway, or is it just a sidecar?

Both modes are first-class. Envoy at the edge is what powers projects like Contour, Emissary-ingress (Ambassador), and Tetrate Service Bridge. Edge config tends to use larger listener pools, more aggressive HTTP/3 enablement, integration with TLS certificate managers (cert-manager + SDS), and external rate-limiting services. Same binary; different control plane.

What happens to in-flight requests when the control plane goes down?

Envoy keeps serving with its last-known-good config indefinitely, and the xDS client reconnects with exponential back-off. In the meantime new endpoints are not learned, certificates eventually expire because no replacements arrive over SDS, and you cannot deploy new routes, but existing traffic keeps flowing. This is one of the key arguments for separating data and control planes: a control plane crash should not be a request-path outage.