Envoy Internals

Envoy is an L7 proxy that carries a large share of the east-west traffic in cloud-native fleets. It was developed at Lyft and open-sourced in 2016 to solve a single problem: every microservice in a large fleet was reimplementing the same connection pool, retry policy, timeout, and circuit breaker, usually badly. Envoy moves all of that out of application code and into a dedicated sidecar that speaks HTTP/1.1, HTTP/2, HTTP/3, gRPC, TCP, and a half-dozen other protocols. It is the data plane behind Istio, Consul Connect, AWS App Mesh, Google Cloud Service Mesh, and the gateways at Lyft, Stripe, Square, Pinterest, Reddit, and Apple.

Built in modern C++ around one single-threaded event loop per worker, Envoy scales close to linearly with cores and typically adds sub-millisecond latency per hop. Exact throughput depends on the filter chain, payload sizes, and TLS.

Envoy Architecture Overview

[Figure: Envoy architecture. A downstream client or browser connects to a listener on :443, whose filter chain runs the TLS inspector, the HTTP connection manager, a JWT auth filter, and the router filter. The cluster manager holds the clusters orders, users, and payments, with their load balancer, circuit breaker, and outlier detection settings. The EDS-discovered endpoints are 10.0.1.5:8080 through 10.0.1.10:8080, one of them marked ejected. The control plane pushes LDS / RDS / CDS / EDS config via xDS over gRPC.]

Request flow: downstream → listener → filters → router → cluster → load balance → endpoint

Key Numbers

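Threading: one worker thread per core
Default concurrency: equal to the CPU count
xDS transport: gRPC streams
HTTP/2 SETTINGS_MAX_CONCURRENT_STREAMS: 100 (a typical configured value; the built-in default depends on the Envoy version)
Initial stream window: 256 KiB (version- and configuration-dependent)
Initial connection window: 1 MiB (version- and configuration-dependent)
Outlier ejection default: 5 consecutive 5xx responses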

Why Envoy Exists

The Problem
Every service at a large microservice shop reimplemented retries, timeouts, circuit breakers, mTLS, observability — usually in a per-language library that drifted between Java, Go, Python, and Node. Bugs lived in N codebases at once. Upgrades took quarters. A single misconfigured retry could DDoS your own backend.
The Move
Pull all that logic out of the application and into a sidecar process that lives next to every service. Application code makes a plain HTTP call to localhost; Envoy handles the wire, the policy, and the metrics. The application no longer cares whether the upstream is on the same host, in another zone, or behind mTLS.
The Payoff
A single proxy binary, dynamically reconfigured by a control plane via xDS, is the substrate for every modern service mesh. The same binary runs at the edge as an L7 load balancer, in the middle as an east-west proxy, and on the host as a sidecar — without restarts, without dropped connections.

Listeners, Filter Chains, and the Request Path

A listener binds a socket, typically a single port such as :443 at the edge or :15001 for iptables-redirected sidecar traffic (in Istio, :15001 captures outbound traffic and :15006 inbound). When a TCP connection arrives, Envoy walks the listener's filter chain matcher using SNI, ALPN, source IP, transport protocol, and destination port to pick the right chain. TLS is terminated by the chain's transport socket. The chain itself is an ordered list of network filters, typically ending in http_connection_manager, which owns a sub-chain of HTTP filters: JWT auth, RBAC, ext_authz, rate limit, fault injection, Lua, WASM, and finally the terminal router filter that selects a cluster and dispatches the request upstream.

static_resources:
  listeners:
  - name: ingress
    address: {socket_address: {address: 0.0.0.0, port_value: 443}}
    filter_chains:
    - filter_chain_match: {server_names: ["api.example.com"]}
      transport_socket:                       # TLS termination
        name: envoy.transport_sockets.tls
      filters:
      - name: envoy.filters.network.http_connection_manager
        typed_config:
          stat_prefix: ingress_http
          http_filters:
          - name: envoy.filters.http.jwt_authn       # verify JWT
          - name: envoy.filters.http.ext_authz       # call OPA / authz svc
          - name: envoy.filters.http.local_ratelimit # token bucket per route
          - name: envoy.filters.http.router          # MUST be last
          route_config:
            virtual_hosts:
            - domains: ["*"]
              routes:
              - match: {prefix: "/orders"}
                route: {cluster: orders, timeout: 2s, retry_policy: {retry_on: 5xx, num_retries: 2}}

Filters are driven by typed callbacks. Network filters see onNewConnection and onData; HTTP filters see decodeHeaders, decodeData, and decodeTrailers on the request path and encodeHeaders, encodeData, and encodeTrailers on the response path. Each callback can return StopIteration to pause the chain (for example while the JWT filter awaits a JWKS fetch) or Continue to pass control to the next filter. A filter that replies on its own (a 401 from JWT auth) short-circuits everything after it; the router never runs and no upstream connection is opened.
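The short-circuit behaviour can be illustrated with a toy model. This is a minimal Python sketch, not Envoy's C++ filter API: the Request class, the filter functions, and the hard-coded token are all hypothetical, and only the Continue / StopIteration idea is taken from the text.

```python
from enum import Enum, auto


class FilterStatus(Enum):
    CONTINUE = auto()        # pass the request to the next filter
    STOP_ITERATION = auto()  # pause here; later filters do not run yet


class Request:
    def __init__(self, headers):
        self.headers = headers
        self.local_reply = None   # set when a filter answers by itself
        self.routed_to = None     # set by the router filter

    def send_local_reply(self, status_code):
        self.local_reply = status_code


def jwt_filter(request):
    # Hypothetical check; a real filter verifies a signature against a JWKS.
    if request.headers.get("authorization") != "Bearer valid-token":
        request.send_local_reply(401)
        return FilterStatus.STOP_ITERATION
    return FilterStatus.CONTINUE


def router_filter(request):
    # Terminal filter: pick a cluster (hard-coded here) and dispatch.
    request.routed_to = "orders"
    return FilterStatus.CONTINUE


def decode_headers(filters, request):
    """Run each filter in order until one stops iteration."""
    for http_filter in filters:
        if http_filter(request) is FilterStatus.STOP_ITERATION:
            break
    return request
```

Running the chain `[jwt_filter, router_filter]` on a request without a valid token leaves `routed_to` unset, which mirrors the point that the router never runs once an earlier filter has replied.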

Listeners reload without dropping connections. When LDS pushes a new configuration, Envoy creates the new listener, lets it take over the already-bound socket, and drains the old listener for the configured drain time (--drain-time-s, default 600 seconds, i.e. 10 minutes) before closing whatever connections remain. Requests in flight on the old listener finish on the old config, so an HTTP/2 stream never sees a configuration change mid-flight.

Clusters, Endpoints, and Load Balancing

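A cluster is the upstream side of the proxy: a logical group of hosts that serve a service, plus the policy for talking to them. Each cluster owns its set of endpoints, a load-balancing policy, circuit-breaker thresholds, and outlier-detection settings.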

Endpoints carry weights and priorities. Priority 0 is preferred; priority 1 receives traffic only when priority 0 has too few healthy hosts, as governed by overprovisioning_factor (default 1.4). With the default, priority 0 keeps all traffic until fewer than about 71% of its hosts are healthy (1 / 1.4), and below that point the shortfall spills to priority 1. Control planes use this for locality failover: same-zone endpoints get priority 0, cross-zone endpoints get priority 1, and traffic spills cross-zone only when the local zone's healthy fraction drops below that threshold.
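The spillover arithmetic can be worked through in a few lines. The sketch below follows the priority-level formula described in Envoy's load-balancing documentation as I understand it. The function name is made up for illustration, the factor is expressed as a percentage (140 for 1.4), and Envoy's integer rounding and panic-mode handling are not modelled.

```python
def priority_loads(levels, overprovisioning_factor=140):
    """Split traffic across priority levels.

    levels: list of (healthy_hosts, total_hosts); index 0 is priority 0.
    Returns the percentage of traffic each level receives.
    """
    # Effective health per level: healthy fraction scaled by the factor,
    # capped at 100%.
    health = [
        min(100.0, overprovisioning_factor * healthy / total) if total else 0.0
        for healthy, total in levels
    ]
    total_health = min(100.0, sum(health))
    if total_health == 0:
        return [0.0] * len(levels)  # nothing healthy anywhere

    loads, remaining = [], 100.0
    for level_health in health:
        # Each level takes what its health allows; the rest spills down.
        load = min(remaining, level_health * 100.0 / total_health)
        loads.append(load)
        remaining -= load
    return loads
```

With priority 0 at 5 of 10 hosts healthy and priority 1 fully healthy, the split is 70% / 30%. With 8 of 10 healthy, priority 0 still takes everything because 0.8 × 1.4 exceeds 1.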

xDS: The Control Plane Protocol

Envoy is a pure data plane. It does not watch Kubernetes, query Consul, or talk to your service registry. Instead it speaks xDS, a family of gRPC streaming APIs, to a control plane (Istio's istiod, formerly Pilot; Contour; go-control-plane; java-control-plane; the AWS App Mesh controller; or your own implementation). The major xDS services:

| API | Resource | What it carries |
| --- | --- | --- |
| LDS | Listener | Sockets to bind, filter chains, TLS contexts |
| RDS | RouteConfiguration | Virtual hosts, route matches, retry/timeout, cluster targets |
| CDS | Cluster | Upstreams, LB policy, circuit breakers, outlier detection |
| EDS | ClusterLoadAssignment | The actual endpoints (IP:port, weight, locality, priority) |
| SDS | Secret | TLS certificates and keys, rotated without restart |
| ADS | (aggregated) | Single ordered stream multiplexing all of the above |

The original wire mode is SOTW ("State of the World"): every push is a complete snapshot of all resources of that type. It is convenient and simple, but quadratic on large fleets: a 50 KB change to one cluster forces re-shipping every other cluster too. The modern mode is incremental (delta) xDS: only changed resources are sent, with explicit removed_resources tombstones. Envoy ACKs each push by echoing the version it accepted, or NACKs with an error detail, which the control plane can use to roll back. ADS (Aggregated Discovery Service) carries LDS, RDS, CDS, and EDS over a single stream so that the control plane can sequence them, for example sending a cluster before the EDS update that refers to it.
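The difference between the two modes is easy to see on a toy snapshot. The following Python sketch is an illustration only: resources are reduced to a name-to-version mapping, and the function names are hypothetical rather than part of any xDS library.

```python
def sotw_update(new):
    """State of the World: every resource is sent on every push."""
    return dict(new)


def delta_update(old, new):
    """Incremental xDS: send only what changed, plus removal tombstones.

    old, new: dicts mapping resource name -> version string.
    Returns (resources_to_send, removed_resources).
    """
    changed = {name: ver for name, ver in new.items() if old.get(name) != ver}
    removed = sorted(name for name in old if name not in new)
    return changed, removed
```

For a snapshot where one cluster changes, one is added, and one is removed, `sotw_update` returns the whole map, while `delta_update` returns just the two changed entries and the name of the removed one.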

Reload is in-memory and atomic per resource. A cluster update doesn't drop existing connections; Envoy creates a new cluster instance, swaps the pointer, and lets the old one drain. The same machinery is what lets Istio push configuration to large fleets without restarting proxies.

HTTP/2 Multiplexing and Connection Pooling

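Envoy speaks HTTP/2 to an upstream when the cluster is configured with http2_protocol_options. A single TCP+TLS connection then carries up to max_concurrent_streams (100 in a typical configuration) concurrent in-flight requests, framed as interleaved HEADERS and DATA frames on numbered streams. Two scarce resources govern throughput: the stream slots on each connection, capped by max_concurrent_streams, and the flow-control windows (per stream and per connection), which bound how much unacknowledged data can be in flight.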

The result: one Envoy worker can hold tens of thousands of concurrent streams across a few hundred upstream connections. This is why Envoy adds so little CPU per request — most of the cost of a request is amortized across the multiplexed connection rather than paid per-handshake.
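A back-of-the-envelope calculation makes these figures concrete. The helpers below are a hypothetical sketch, assuming a limit of 100 streams per connection and the standard rule that a flow-controlled stream can move at most one window of data per round trip. Neither function is an Envoy API.

```python
import math


def connections_needed(concurrent_requests, max_concurrent_streams=100):
    """Minimum number of HTTP/2 connections for the given in-flight requests."""
    return math.ceil(concurrent_requests / max_concurrent_streams)


def max_stream_throughput(window_bytes, rtt_seconds):
    """Upper bound on one stream's throughput in bytes per second:
    at most one flow-control window can be in flight per round trip."""
    return window_bytes / rtt_seconds
```

Under these assumptions, 20,000 concurrent requests fit on 200 connections, and a 256 KiB stream window over a 50 ms round trip caps a single stream at roughly 5 MiB/s regardless of link speed.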

For HTTP/1.1 upstreams Envoy falls back to a one-stream-per-connection pool. The pool is sized by max_connections, and idle connections are closed after idle_timeout (default 1 hour). Mixing HTTP/1.1 and HTTP/2 in the same cluster requires automatic upstream protocol selection (auto_config), in which Envoy negotiates the protocol via ALPN on each connection.

Retries, Timeouts, and Hedging

Retry behavior in Envoy is configured in two layers: the route and the cluster. The route-level retry policy controls when a request is retryable through retry_on, whose values include 5xx, retriable-4xx, connect-failure, refused-stream, retriable-status-codes, and the gRPC conditions cancelled, deadline-exceeded, and resource-exhausted. num_retries caps the count; per_try_timeout caps each attempt; the route-level timeout caps the entire user-visible request including all retries. The cluster layer bounds how many retries may be in flight at once.

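Two cluster-level mechanisms prevent retry storms: the max_retries circuit breaker, which caps the number of retries in flight to a cluster at once, and the retry budget (retry_budget), which limits concurrent retries to a percentage of the cluster's active requests.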
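One such mechanism is a retry budget, which ties the number of concurrent retries to the amount of live traffic. The Python sketch below shows only the arithmetic. The function is hypothetical, and the parameter names and defaults (20% and a floor of 3) are taken from my reading of Envoy's retry_budget settings, so check them against your version.

```python
def retry_allowed(active_requests, active_retries,
                  budget_percent=20.0, min_retry_concurrency=3):
    """Return True if one more concurrent retry fits inside the budget.

    The budget is a percentage of active requests, with a small floor so
    that low-traffic clusters can still retry at all.
    """
    budget = max(min_retry_concurrency,
                 int(active_requests * budget_percent / 100.0))
    return active_retries < budget
```

With 100 active requests the budget is 20 concurrent retries, so a failing backend can at most add a fifth to its own load instead of multiplying it by num_retries.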

Envoy also supports request hedging for tail-latency reduction: with hedge_on_per_try_timeout enabled, Envoy sends a second copy of the request once per_try_timeout elapses instead of abandoning the first, takes whichever response arrives first, and cancels the loser. This is useful for read traffic to replicated backends where a slow tail is the dominant latency contributor.
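The hedging pattern itself is independent of Envoy and can be sketched with asyncio. This is a minimal illustration under simplifying assumptions: only one hedge is sent, errors are not retried, and `make_attempt` is a hypothetical coroutine factory standing in for the upstream call.

```python
import asyncio


async def hedged_call(make_attempt, per_try_timeout):
    """Start one attempt; if it is still pending after per_try_timeout,
    start a second and return whichever finishes first."""
    first = asyncio.create_task(make_attempt(0))
    done, _ = await asyncio.wait({first}, timeout=per_try_timeout)
    if done:
        return first.result()  # fast path: no hedge needed

    # The first attempt keeps running while the hedge is sent.
    second = asyncio.create_task(make_attempt(1))
    done, pending = await asyncio.wait(
        {first, second}, return_when=asyncio.FIRST_COMPLETED
    )
    for task in pending:
        task.cancel()  # cancel the loser
    return done.pop().result()
```

Because the first attempt is never cancelled when the hedge fires, a request that was merely slow can still win the race. This is what separates hedging from a plain retry on timeout.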

mTLS, SDS, and Identity

In a service mesh deployment, every Envoy is both a TLS server (for inbound traffic from peer Envoys) and a TLS client (for outbound traffic to peer Envoys). The certificate it presents identifies the workload — typically a SPIFFE URI like spiffe://cluster.local/ns/orders/sa/orders-sa. Peer Envoys validate that URI against an authorization policy ("orders-sa is allowed to call payments-sa on POST /charge"), implemented by the RBAC HTTP filter or by ext_authz to an external policy engine like OPA.
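The identity check can be sketched as follows. This is not the RBAC filter's implementation: it assumes the ns/<namespace>/sa/<service-account> path layout shown in the example URI, and the policy table and function names are hypothetical.

```python
from urllib.parse import urlparse


def parse_spiffe_id(uri):
    """Split spiffe://<trust-domain>/ns/<namespace>/sa/<service-account>."""
    parsed = urlparse(uri)
    parts = parsed.path.strip("/").split("/")
    if (parsed.scheme != "spiffe" or len(parts) != 4
            or parts[0] != "ns" or parts[2] != "sa"):
        raise ValueError(f"unexpected SPIFFE ID layout: {uri}")
    return {
        "trust_domain": parsed.netloc,
        "namespace": parts[1],
        "service_account": parts[3],
    }


# Hypothetical policy: which caller may use which method and path.
ALLOWED = {("orders-sa", "POST", "/charge")}


def is_allowed(peer_uri, method, path):
    """Authorize a request based on the peer certificate's SPIFFE ID."""
    identity = parse_spiffe_id(peer_uri)
    return (identity["service_account"], method, path) in ALLOWED
```

The decision depends only on the identity in the verified certificate, never on the caller's IP address. That is why pods can be rescheduled freely without touching policy.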

Certificates are short-lived (typically 24 hours, often less) and rotated via SDS — Secret Discovery Service. The control plane streams new certificates to Envoy over a dedicated gRPC stream without restarting the proxy and without dropping connections. This is one of the load-bearing properties of a production mesh: rotating a million certificates a day across a fleet is a routine background task, not an outage.

Observability: Stats, Tracing, Logging

Envoy is observable to a degree that is sometimes embarrassing — every cluster, every upstream host, every listener, every filter, every HTTP method emits its own counter, gauge, and histogram. The defaults expose roughly 200-400 stats per cluster and per listener; a moderately complex sidecar emits tens of thousands of distinct time series.

Extensibility: Lua, WASM, and ext_proc

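Envoy supports four extensibility models, in roughly ascending order of decoupling: native C++ filters compiled into the binary, Lua scripts, WASM modules, and out-of-process services called through ext_authz or ext_proc.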

The general guidance: ext_authz for authorization, WASM for header rewriting and edge logic, ext_proc when you need to mutate the body in a separate language, and C++ only for high-throughput edge cases like custom protocol parsers.

Envoy vs Other Proxies

| | Envoy | NGINX | HAProxy | Linkerd2-proxy |
| --- | --- | --- | --- | --- |
| Language | C++ | C | C | Rust |
| HTTP/2 upstream | Native, multiplexed | Available, less battle-tested | Native | Native |
| HTTP/3 / QUIC | Yes (BoringSSL/quiche) | Experimental | Yes | No |
| Hot reload model | xDS streaming, no restart | SIGHUP + worker handoff | SIGUSR2 + binary swap | Destination API streaming |
| Config substrate | Protobuf / xDS | nginx.conf DSL | cfg DSL | destination + policy CRDs |
| Service mesh use | Istio, AWS App Mesh, Consul, Kuma | NGINX Service Mesh (deprecated) | | Linkerd |
| Best at | L7 mesh data plane, edge | Static + reverse proxy, files | Pure load balancer, TCP | Lightweight mesh, Kubernetes-native |
| Memory per worker | ~30-100 MB baseline | ~5-20 MB | ~5-15 MB | ~10-20 MB |

Tradeoffs and Honest Weaknesses

Frequently Asked Questions

Why is Envoy single-threaded per worker instead of using a thread pool?
Each worker runs an independent libevent loop and shares no data-path state with the others. Connections are pinned to a worker via SO_REUSEPORT (the kernel hashes the 4-tuple). This eliminates locks on the data path; the slow path (config updates) goes through a publish/subscribe mechanism called TLS slots, where TLS means thread-local storage rather than transport-layer security. The result is near-linear scaling with cores: doubling cores roughly doubles throughput. A traditional thread-pool model with shared connection pools would hit lock contention on every request.
How does Envoy do graceful drain on a binary upgrade?
Hot restart is driven by a small "hot restarter" parent process that launches a new Envoy process alongside the old one; the two coordinate over a Unix domain socket. The new process takes over the listen sockets (via SO_REUSEPORT or socket FD passing), the old process stops accepting new connections and drains, and it is shut down after --parent-shutdown-time-s (default 900 seconds, i.e. 15 minutes). In-flight HTTP/2 streams complete on the old binary while new connections land on the new one, so an upgrade causes no downtime. Only connections still open when the shutdown time expires are cut.
What's the difference between SOTW and incremental xDS?
SOTW (State of the World) means every push contains every resource of that type. It is simple to implement, but bandwidth scales with fleet size: a 1-cluster change pushes 10,000 clusters in a 10k-cluster fleet. Incremental xDS sends only added, changed, or removed resources, identified by name. The two modes use different messages: SOTW exchanges DiscoveryRequest/DiscoveryResponse, while incremental xDS exchanges DeltaDiscoveryRequest/DeltaDiscoveryResponse. Recent Istio releases default to delta xDS.
Does Envoy support gRPC natively, or just HTTP/2?
Both. gRPC rides on HTTP/2, so the connection-management primitives are shared. On top of that Envoy understands gRPC trailers, the grpc-status code, gRPC retry semantics (DEADLINE_EXCEEDED, RESOURCE_EXHAUSTED, and UNAVAILABLE map to the retry_on values deadline-exceeded, resource-exhausted, and unavailable), gRPC-specific health checks, gRPC-Web translation for browsers, and gRPC reflection proxying. The router filter has gRPC-aware code paths for header-based routing on :path = /svc.Service/Method.
How does outlier detection differ from active health checks?
Active health checks send a synthetic probe (HTTP, gRPC, TCP) on a schedule and mark a host unhealthy on N consecutive failures. They cost real connections per host per interval. Outlier detection is passive: it observes real traffic and ejects hosts that trip consecutive_5xx or consecutive_gateway_failure, or whose success rate falls more than the configured number of standard deviations (success_rate_stdev_factor) below the cluster mean. Outlier detection is free (it's just counting existing responses) but only catches symptoms that real traffic exercises: a host that 500s only on POST won't be ejected if you only send GETs.
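The success-rate check can be sketched in a few lines. This is an approximation under stated assumptions: the threshold is the mean minus stdev × factor / 1000 with a factor of 1900 (my understanding of Envoy's default), population standard deviation is assumed, and the minimum-host and minimum-request-volume conditions are not modelled. The function name is hypothetical.

```python
from statistics import mean, pstdev


def success_rate_outliers(success_rates, stdev_factor=1900):
    """Return hosts whose success rate is below the ejection threshold.

    success_rates: dict mapping host -> success rate in percent (0-100).
    The factor is divided by 1000, so 1900 means 1.9 standard deviations.
    """
    rates = list(success_rates.values())
    threshold = mean(rates) - pstdev(rates) * stdev_factor / 1000.0
    return sorted(host for host, rate in success_rates.items()
                  if rate < threshold)
```

With nine hosts at 100% and one at 50%, the mean is 95 and the standard deviation 15, so the threshold is 66.5 and only the 50% host is ejected. If every host degrades equally, none is an outlier and nothing is ejected.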
Can I run Envoy as an edge gateway, or is it just a sidecar?
Both modes are first-class. Envoy at the edge is what powers projects like Contour, Emissary-ingress (Ambassador), and Tetrate Service Bridge. Edge config tends to use larger listener pools, more aggressive HTTP/3 enablement, integration with TLS certificate managers (cert-manager + SDS), and external rate-limiting services. Same binary; different control plane.
What happens to in-flight requests when the control plane goes down?
Envoy keeps serving with its last-known-good config indefinitely. The xDS connection reconnects with exponential back-off. New endpoints don't get learned, certificates eventually expire (since SDS can no longer push new ones), and you lose the ability to deploy new routes, but existing traffic keeps flowing. This is one of the key arguments for separating data and control planes: a control-plane crash should not be a request-path outage.
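Jittered exponential back-off of the kind mentioned above can be sketched as follows. This is a generic illustration, not Envoy's implementation: the base delay, cap, and "full jitter" strategy are assumptions chosen for the example.

```python
import random


def backoff_delays(base=0.5, cap=30.0, attempts=8, rng=random.random):
    """Yield a jittered delay in seconds for each reconnect attempt.

    The ceiling doubles on every attempt up to `cap`; the actual delay is
    drawn uniformly below the ceiling so that a fleet of proxies does not
    reconnect to a recovering control plane in lockstep.
    """
    for attempt in range(attempts):
        ceiling = min(cap, base * 2 ** attempt)
        yield rng() * ceiling
```

The jitter matters as much as the doubling. Without it, thousands of sidecars that lost the control plane at the same moment would all retry at the same instants.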