gRPC Internals

gRPC is the RPC framework that finally made HTTP/2 useful for backend services. It pairs Protocol Buffers (a compact, schema-driven binary wire format) with HTTP/2's multiplexed streams, then layers on deadlines, cancellation, retries, name resolution, and load balancing. The whole thing is designed for what REST never solved well: high-throughput, low-latency, strongly-typed service-to-service communication, with first-class bidirectional streaming and language-agnostic code generation. It was open-sourced as the successor to Google's internal Stubby and now powers internal RPC at Netflix, Square, Dropbox, Lyft, Cisco, and much of the Kubernetes ecosystem (etcd, containerd, CRI, and CSI all speak gRPC, and kube-apiserver talks to etcd over it).

The reference implementation is gRPC Core, a C/C++ library often called C-core, which backs the C++, Python, Ruby, PHP, and Objective-C libraries (and the original C# one), with separate native implementations in Go (grpc-go) and Java (grpc-java).

gRPC Architecture Overview

[Figure: gRPC architecture. Client stack: generated stub, interceptors (auth, log), channel (state machine), name resolver + LB, HTTP/2 transport, TLS / TCP. Server stack: generated handler, interceptors, service registry, worker pool, HTTP/2 transport, TLS / TCP. HTTP/2 stream between them: HEADERS (:method=POST, :path=/svc.Service/Method, te=trailers); DATA (5-byte prefix + protobuf message, END_STREAM=0); DATA (message N, END_STREAM=1); HEADERS as trailers (grpc-status=0, grpc-message=). Caption: one HTTP/2 stream per RPC, multiple concurrent streams per connection, binary protobuf messages framed in DATA frames.]

Key Numbers

Wire encoding: Protobuf (binary)
Transport: HTTP/2 only
Streaming modes: 4
Message frame prefix: 5 bytes
Max msg (default): 4 MiB
Status code domain: 17 codes
Default keepalive ping: 2 hours

Why gRPC Exists

REST Was Bloated
A typical JSON REST call can carry several times more bytes on the wire than the same data in protobuf, because every field name appears as a string in every payload. Backend-to-backend traffic at Google scale was wasting compute on JSON parsing and bandwidth on field names.
HTTP/1.1 Couldn't Multiplex
A single HTTP/1.1 connection serves one request at a time. Every additional concurrent call needed a new connection — and head-of-line blocking on the connection meant a slow response stalled everything behind it. HTTP/2's stream multiplexing was the unlock.
Schemas Force Discipline
A .proto file is the single source of truth for both client and server in any language. Field numbers, not names, identify fields on the wire, meaning you can rename a field without changing the encoding, deprecate without breaking, and add fields on either side without coordination, because parsers skip fields they don't recognize. Type mismatches are caught at compile time by the generated code.

Protocol Buffers: The Wire Format

A protobuf message on the wire is a sequence of (tag, value) pairs. The tag encodes the field number plus a 3-bit wire type:

tag = (field_number << 3) | wire_type
wire types: 0 = varint, 1 = 64-bit, 2 = length-delimited, 5 = 32-bit
// (3 and 4 were start/end group; deprecated, and not available in proto3)

A varint is the workhorse encoding: 7 bits of payload per byte, with the MSB set as a continuation marker. Values 0-127 take one byte; 128-16383 take two. The number 300 (binary 100101100) becomes AC 02 on the wire. Negative int32 values are sign-extended to 64 bits before encoding, costing 10 bytes, which is why protobuf has a separate sint32 with zigzag encoding (n becomes 2n if non-negative, -2n-1 if negative) so small negative numbers stay small.
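The varint and zigzag rules above are small enough to check by hand. The following is a minimal sketch in Python, not the protobuf library's code; the function names are made up for illustration.

```python
def encode_varint(value: int) -> bytes:
    """Encode a non-negative integer as a varint: 7 payload bits per byte."""
    if value < 0:
        raise ValueError("varints carry unsigned values; convert negatives first")
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)  # MSB set: more bytes follow
        else:
            out.append(byte)         # MSB clear: last byte
            return bytes(out)


def encode_int32(value: int) -> bytes:
    """int32 field: a negative value is sent as its 64-bit two's complement."""
    return encode_varint(value & 0xFFFFFFFFFFFFFFFF)


def zigzag32(value: int) -> int:
    """sint32 mapping: 0, -1, 1, -2, ... become 0, 1, 2, 3, ..."""
    return ((value << 1) ^ (value >> 31)) & 0xFFFFFFFF


def encode_sint32(value: int) -> bytes:
    return encode_varint(zigzag32(value))


print(encode_varint(300).hex())   # ac02
print(len(encode_int32(-1)))      # 10
print(encode_sint32(-1).hex())    # 01
```

The last two lines show the trade-off directly: -1 costs ten bytes as an int32 and one byte as a sint32.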

A length-delimited field (wire type 2) is a varint length followed by that many bytes of payload — used for string, bytes, embedded messages, and packed repeated scalars. Embedded messages aren't framed any differently from byte strings; the parser knows the type from the schema.

// proto3 schema
message User {
  int64 id = 1;
  string name = 2;
  repeated string emails = 3;
}

// User{id=300, name="Ada", emails=["a@x", "b@y"]} on the wire:
// 08 AC 02            field 1 (varint), value 300
// 12 03 41 64 61      field 2 (length=3), bytes "Ada"
// 1A 03 61 40 78      field 3 (length=3), bytes "a@x"
// 1A 03 62 40 79      field 3 (length=3), bytes "b@y"
//
// Total: 18 bytes. The same JSON is ~50 bytes.
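The byte dump above can be reproduced with a few lines of hand-rolled encoding. This is a sketch for checking the arithmetic, not how generated protobuf code works; the helper names are illustrative, and unlike a real proto3 encoder it does not omit fields that hold default values.

```python
import json

VARINT, LENGTH_DELIMITED = 0, 2  # wire types used by the User message


def encode_varint(value: int) -> bytes:
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)
        else:
            out.append(byte)
            return bytes(out)


def tag(field_number: int, wire_type: int) -> bytes:
    return encode_varint((field_number << 3) | wire_type)


def varint_field(field_number: int, value: int) -> bytes:
    return tag(field_number, VARINT) + encode_varint(value)


def bytes_field(field_number: int, payload: bytes) -> bytes:
    return tag(field_number, LENGTH_DELIMITED) + encode_varint(len(payload)) + payload


def encode_user(user_id: int, name: str, emails: list) -> bytes:
    out = varint_field(1, user_id) + bytes_field(2, name.encode("utf-8"))
    for email in emails:  # non-packed repeated field: one entry per element
        out += bytes_field(3, email.encode("utf-8"))
    return out


wire = encode_user(300, "Ada", ["a@x", "b@y"])
print(wire.hex())  # 08ac0212034164611a036140781a03624079
print(len(wire))   # 18

as_json = json.dumps({"id": 300, "name": "Ada", "emails": ["a@x", "b@y"]},
                     separators=(",", ":"))
print(len(as_json))  # size of the most compact JSON form, for comparison
```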

A few wire-format consequences worth knowing:

HTTP/2 Framing: How an RPC Rides the Wire

Every gRPC call is exactly one HTTP/2 stream. The mapping is:

The "trailers" part is why gRPC requires HTTP/2: HTTP/1.1 trailers exist but are poorly supported, and you need a way to deliver a final status after the body — because streamed responses don't know they'll fail until late. The te: trailers header on the request is mandatory and signals that the client understands trailers.

One TCP+TLS connection can carry up to SETTINGS_MAX_CONCURRENT_STREAMS simultaneous calls (typically 100). Beyond that, the channel either queues new RPCs or opens a second connection, depending on the implementation. HTTP/2 flow control means the receiver controls how fast bytes flow on each stream and across the connection, independently of TCP-level backpressure.
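The 5-byte message prefix listed under Key Numbers is a 1-byte compressed flag followed by a 4-byte big-endian length. A minimal sketch of that framing follows; it assumes the whole stream body is already in memory, which a real transport would not, and the function names are illustrative.

```python
import struct

# 1-byte compressed flag + 4-byte big-endian message length = 5 bytes.
PREFIX = struct.Struct(">BI")


def frame_message(payload: bytes, compressed: bool = False) -> bytes:
    """Prepend the gRPC length prefix to one serialized message."""
    return PREFIX.pack(1 if compressed else 0, len(payload)) + payload


def iter_messages(body: bytes):
    """Yield (compressed, payload) for each length-prefixed message in body."""
    offset = 0
    while offset < len(body):
        if len(body) - offset < PREFIX.size:
            raise ValueError("truncated length prefix")
        flag, length = PREFIX.unpack_from(body, offset)
        offset += PREFIX.size
        end = offset + length
        if end > len(body):
            raise ValueError("truncated message")
        yield bool(flag), body[offset:end]
        offset = end


body = frame_message(b"first") + frame_message(b"second")
print(frame_message(b"hi").hex())  # 00000000026869
print(list(iter_messages(body)))   # [(False, b'first'), (False, b'second')]
```

Because the prefix carries its own length, message boundaries do not have to line up with HTTP/2 DATA frame boundaries.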

The Four Streaming Modes

Defined by which side of the call sends a stream of messages vs a single message:

service Chat {
  rpc Send(Message) returns (Ack);                      // unary
  rpc Watch(Query) returns (stream Update);             // server streaming
  rpc Upload(stream Chunk) returns (UploadResult);      // client streaming
  rpc Sync(stream Event) returns (stream Event);        // bidirectional streaming
}

All four use the same on-the-wire framing: a HEADERS, then DATA frames, then trailing HEADERS. The mode is a code-generation distinction, not a protocol distinction.

Deadlines and Cancellation Propagation

gRPC has no timeout in the HTTP sense; it has deadlines. A deadline is an absolute point in time by which the call must complete. The client sets it (e.g., 2s from now), the runtime serializes the time remaining as a relative duration in the grpc-timeout header (so clock skew between hosts does not matter), and the server reconstructs a local deadline and enforces it on its side. When the deadline expires, each side's runtime fails the call with DEADLINE_EXCEEDED and cancels the stream.

The critical property is propagation. If service A calls B with a 2s deadline, and B internally calls C, B must forward the remaining deadline to C. Most language SDKs make this automatic: the request context carries the deadline, and a derived context for any sub-call inherits the remaining time. This is how an entire fan-out RPC tree dies cleanly when the user-visible deadline expires — instead of orphaning hundreds of background calls that nobody is waiting for.
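To make the propagation arithmetic concrete: on the wire, grpc-timeout carries a relative duration (a number of at most 8 digits plus a unit letter), so each hop converts its absolute deadline into the time still remaining before calling downstream. The sketch below assumes a monotonic clock expressed in seconds; the function names are illustrative, and rounding up to the next unit is a choice made here, not something every runtime does.

```python
from datetime import timedelta

# Microseconds per grpc-timeout unit, smallest first (nanoseconds omitted).
_UNITS = (("u", 1), ("m", 1_000), ("S", 1_000_000),
          ("M", 60_000_000), ("H", 3_600_000_000))


def encode_grpc_timeout(remaining: timedelta) -> str:
    """Render a duration using the finest unit whose value fits in 8 digits."""
    micros = max(remaining // timedelta(microseconds=1), 0)
    for unit, size in _UNITS:
        value = -(-micros // size)  # ceiling division
        if value < 100_000_000:
            return f"{value}{unit}"
    raise ValueError("timeout too large to encode")


def remaining_budget(deadline: float, now: float) -> timedelta:
    """Time left before an absolute deadline; never negative."""
    return timedelta(seconds=max(deadline - now, 0.0))


# Service A starts at t=100.0 with a 2s budget, so the deadline is t=102.0.
deadline = 100.0 + 2.0
print(encode_grpc_timeout(remaining_budget(deadline, now=100.0)))  # 2000000u

# Service B has used 0.5s by the time it calls C, so C gets what is left.
print(encode_grpc_timeout(remaining_budget(deadline, now=100.5)))  # 1500000u
```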

Cancellation is the deadline's companion. A client that closes the call (user closed the browser tab, parent context cancelled) sends a RST_STREAM frame with error code CANCEL. The server's handler context fires immediately; the handler is expected to release resources and return. Long-running server work that doesn't honor cancellation is the most common gRPC anti-pattern — it leaks goroutines / threads / file descriptors on every cancelled call.
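What honoring cancellation looks like in a handler can be sketched without any gRPC library: the handler checks a cancellation signal between units of work and stops as soon as it fires. Here a threading.Event stands in for the handler context, and the handler and generator names are hypothetical.

```python
import threading


def process_items(source, cancelled: threading.Event) -> list:
    """Hypothetical long-running handler that checks for cancellation per item."""
    processed = []
    for item in source:
        if cancelled.is_set():
            # Stop promptly instead of finishing work nobody is waiting for.
            break
        processed.append(item.upper())
    return processed


cancelled = threading.Event()


def client_items():
    yield "a"
    yield "b"
    cancelled.set()  # simulates the client cancelling mid-call
    yield "c"


print(process_items(client_items(), cancelled))  # ['A', 'B']
```

The important part is the check inside the loop: a handler that only looks at the signal once, at the start, behaves like one that ignores it.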

Channel State, Name Resolution, Load Balancing

A gRPC channel is the client-side abstraction over (potentially many) connections to (potentially many) backends. It runs a state machine:
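The five connectivity states are IDLE, CONNECTING, READY, TRANSIENT_FAILURE, and SHUTDOWN. The sketch below models them with a simplified transition table; the table is an approximation for illustration, and individual implementations and load-balancing policies differ in the details.

```python
from enum import Enum


class ChannelState(Enum):
    IDLE = "IDLE"                            # no connection, no pending RPCs
    CONNECTING = "CONNECTING"                # resolving names / handshaking
    READY = "READY"                          # connected, RPCs can flow
    TRANSIENT_FAILURE = "TRANSIENT_FAILURE"  # attempt failed, backing off
    SHUTDOWN = "SHUTDOWN"                    # closed by the application


S = ChannelState

# Simplified transition table (an assumption for illustration).
ALLOWED = {
    S.IDLE: {S.CONNECTING, S.SHUTDOWN},
    S.CONNECTING: {S.READY, S.TRANSIENT_FAILURE, S.IDLE, S.SHUTDOWN},
    S.READY: {S.IDLE, S.TRANSIENT_FAILURE, S.SHUTDOWN},
    S.TRANSIENT_FAILURE: {S.CONNECTING, S.SHUTDOWN},
    S.SHUTDOWN: set(),
}


def transition(current: ChannelState, target: ChannelState) -> ChannelState:
    if target not in ALLOWED[current]:
        raise ValueError(f"illegal transition {current.name} -> {target.name}")
    return target


state = S.IDLE
for nxt in (S.CONNECTING, S.TRANSIENT_FAILURE, S.CONNECTING, S.READY):
    state = transition(state, nxt)
print(state.name)  # READY
```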

Behind the channel is the name resolver. Schemes like dns:///orders.svc.cluster.local:50051, xds:///orders, unix:///var/run/svc.sock, or custom schemes registered by the application. The resolver returns one or more endpoints (addresses and per-endpoint configuration) and a service config — JSON describing load balancing policy, retry policy, method-level overrides.

The load balancer picks one endpoint per RPC. The standard policies:
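Two of those policies, pick_first (the default) and round_robin, are simple enough to sketch. The classes below are toy models with made-up names: they ignore connection state entirely, whereas a real picker only hands out endpoints whose connections are ready.

```python
import itertools


class PickFirst:
    """Toy pick_first: every RPC goes to the first endpoint in the list."""

    def __init__(self, endpoints):
        self._endpoint = list(endpoints)[0]

    def pick(self):
        return self._endpoint


class RoundRobin:
    """Toy round_robin: RPCs rotate across all endpoints in order."""

    def __init__(self, endpoints):
        self._cycle = itertools.cycle(list(endpoints))

    def pick(self):
        return next(self._cycle)


endpoints = ["10.0.0.1:50051", "10.0.0.2:50051", "10.0.0.3:50051"]
rr = RoundRobin(endpoints)
print([rr.pick() for _ in range(4)])
# ['10.0.0.1:50051', '10.0.0.2:50051', '10.0.0.3:50051', '10.0.0.1:50051']
pf = PickFirst(endpoints)
print({pf.pick() for _ in range(4)})  # {'10.0.0.1:50051'}
```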

Service config is the JSON glue. A typical entry:

{
  "loadBalancingConfig": [{"round_robin": {}}],
  "methodConfig": [{
    "name": [{"service": "orders.OrderService"}],
    "timeout": "2s",
    "retryPolicy": {
      "maxAttempts": 4,
      "initialBackoff": "0.1s",
      "maxBackoff": "1s",
      "backoffMultiplier": 2,
      "retryableStatusCodes": ["UNAVAILABLE", "DEADLINE_EXCEEDED"]
    }
  }]
}
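To see what the retryPolicy numbers imply, the backoff ceiling before each retry grows geometrically from initialBackoff by backoffMultiplier and is capped at maxBackoff. The sketch below computes that schedule for the values in the config above. The jitter rule shown (uniform between zero and the ceiling) is an assumption; implementations differ on how they randomize the delay.

```python
import random


def backoff_ceilings(initial_backoff: float, max_backoff: float,
                     multiplier: float, max_attempts: int) -> list:
    """Upper bound, in seconds, on the delay before each retry.

    max_attempts counts the original attempt, so there are
    max_attempts - 1 retries.
    """
    return [min(initial_backoff * multiplier ** n, max_backoff)
            for n in range(max_attempts - 1)]


def jittered_delay(ceiling: float, rng=random) -> float:
    """Assumed jitter rule: pick uniformly between 0 and the ceiling."""
    return rng.uniform(0.0, ceiling)


ceilings = backoff_ceilings(initial_backoff=0.1, max_backoff=1.0,
                            multiplier=2, max_attempts=4)
print(ceilings)  # [0.1, 0.2, 0.4]
print([round(jittered_delay(c), 3) for c in ceilings])  # varies per run
```

With these values the three retries add at most 0.7s of waiting, which fits inside the 2s method timeout that bounds the whole call.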

Retries, Hedging, and gRFC A6

gRPC's retry policy is specified in gRFC A6 (Client Retries), the design document for client-side retries. Configured retries are off by default: there is no implicit retry the way HTTP libraries often retry on connection refused. You opt in via service config or programmatically.

Once enabled, the client retries on the configured status codes. UNAVAILABLE is the usual safe default: it signals a transient condition, most often that the request never reached a handler, though that is not guaranteed. For other codes, retrying may cause duplicate side effects, so the server is expected to make its handlers idempotent or accept an idempotency key in the request.

Two important gates:

Hedging is the alternative for tail-latency reduction: send up to N copies of the same request (initial + N-1 hedges) staggered by hedgingDelay; take the first response; cancel the rest. Hedging is mutually exclusive with retries on the same call. It's particularly useful for read calls where one slow replica would otherwise stall the whole RPC.
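The staggered-send, first-response-wins behaviour can be sketched with asyncio. This is an illustration of the idea rather than gRPC's implementation: the replica function is a stand-in for a real RPC, and the sketch treats any completed attempt as the winner, whereas a real client also has to decide which error statuses end the call.

```python
import asyncio


async def hedged_call(attempts, hedging_delay: float):
    """Start attempts hedging_delay apart; return the first result, cancel the rest."""
    tasks = []
    try:
        for index, attempt in enumerate(attempts):
            tasks.append(asyncio.ensure_future(attempt()))
            is_last = index == len(attempts) - 1
            done, _pending = await asyncio.wait(
                tasks,
                timeout=None if is_last else hedging_delay,
                return_when=asyncio.FIRST_COMPLETED,
            )
            if done:
                return done.pop().result()
    finally:
        for task in tasks:
            task.cancel()  # no effect on attempts that already finished
        await asyncio.gather(*tasks, return_exceptions=True)


cancelled = []


async def replica(name: str, latency: float) -> str:
    """Stand-in for an RPC to one backend."""
    try:
        await asyncio.sleep(latency)
        return name
    except asyncio.CancelledError:
        cancelled.append(name)
        raise


def run_demo() -> str:
    cancelled.clear()
    attempts = [lambda: replica("slow", 0.5), lambda: replica("fast", 0.01)]
    return asyncio.run(hedged_call(attempts, hedging_delay=0.05))


print(run_demo())  # fast
print(cancelled)   # ['slow']
```

The slow replica would have taken 0.5s; the hedge sent after 0.05s answers first, and the original attempt is cancelled rather than left running.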

Interceptors: Cross-Cutting Concerns

Interceptors are gRPC's middleware mechanism. A unary interceptor wraps a handler: it receives the incoming request and the next handler, and can do any of: log, authenticate, inject metadata, mutate context, time the call, short-circuit. Streaming interceptors wrap the stream object so they can observe each message.

Standard patterns:

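Interceptors compose like functional middleware: each wraps the next, building a chain from outside in. Order matters: an outer interceptor sees everything the inner ones return or reject, so put logging outermost if you want rejections logged too, then auth before rate limiting so unauthenticated callers are turned away before they consume quota, and the handler last.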
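The wrapping can be shown with plain functions. The sketch below is not a gRPC interceptor API; the interceptor names, the dict-shaped request, and the use of PermissionError for rejection are all illustrative. It shows that each wrapper sees whatever the wrappers inside it return or raise.

```python
def chain(interceptors, handler):
    """Wrap handler so that interceptors[0] is outermost and runs first."""
    for interceptor in reversed(interceptors):
        handler = interceptor(handler)
    return handler


def logging_interceptor(log):
    def wrap(next_handler):
        def call(request):
            try:
                response = next_handler(request)
                log.append(("ok", request["method"]))
                return response
            except PermissionError as exc:
                log.append(("rejected", str(exc)))
                raise
        return call
    return wrap


def auth_interceptor(next_handler):
    def call(request):
        if request.get("token") != "secret":
            raise PermissionError("UNAUTHENTICATED")  # short-circuit
        return next_handler(request)
    return call


def ping_handler(request):
    return "pong"


log = []
serve = chain([logging_interceptor(log), auth_interceptor], ping_handler)

print(serve({"method": "Ping", "token": "secret"}))  # pong
try:
    serve({"method": "Ping"})
except PermissionError:
    pass
print(log)  # [('ok', 'Ping'), ('rejected', 'UNAUTHENTICATED')]
```

Swapping the two interceptors would let auth reject the call before the logging wrapper ever runs, so the rejection would not be recorded.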

Keepalive and Connection Health

A gRPC connection is persistent; an idle connection still costs file descriptors and kernel state, and intermediate NATs/load balancers may silently drop a TCP connection after some inactivity timeout (typically 60-300s on cloud LBs). gRPC keepalive sends periodic HTTP/2 PING frames to detect dead connections and to keep middleboxes from reaping the connection.

Knobs that matter:

An aggressive client + permissive server detects a dead connection within seconds rather than minutes and reconnects automatically via the channel state machine. The server's GOAWAY frame is the graceful equivalent: "finish your in-flight streams up to ID X, then this connection is closing". It is used during deploys and load shedding.

gRPC-Web and Browser Limitations

Browsers cannot speak HTTP/2 frames directly from JavaScript — the Fetch API and XMLHttpRequest don't expose stream-level control or trailers. gRPC-Web is the workaround: a slightly different wire format that lets browsers call gRPC services via a proxy.

Two gRPC-Web modes:

Both modes are translated to native gRPC by a server-side proxy (Envoy's grpc_web filter is the canonical implementation). gRPC-Web supports unary and server-streaming RPCs. Client streaming and bidirectional streaming are not supported in any browser today — a fundamental limitation of Fetch's request body semantics.

gRPC vs Other RPC Styles

Aspect | gRPC | REST + JSON | GraphQL | Apache Thrift
Wire format | Protobuf (binary) | JSON (text) | JSON over HTTP | Binary (Compact / Binary protocol)
Transport | HTTP/2 only | HTTP/1.1, /2, /3 | HTTP, WebSocket subscriptions | TCP, HTTP, custom
Schema | .proto, code-generated | OpenAPI optional | SDL, code-generated typed clients | .thrift, code-generated
Streaming | 4 modes built-in | SSE, long-poll, WebSocket bolt-ons | Subscriptions (WebSocket) | Limited
Browser | gRPC-Web with proxy | Native | Native | Limited
Best at | Internal microservices, polyglot | Public APIs, simple integrations | Front-end driven aggregation | Facebook-era polyglot (legacy)
Schema evolution | Strong; field numbers | Conventional | Strong but explicit | Strong; field IDs
Tooling | protoc, buf, grpcurl, ghz, evans | curl, Postman, OpenAPI generators | GraphiQL, Apollo | thrift compiler

Tradeoffs and Honest Weaknesses

Frequently Asked Questions

Why does HTTP status always return 200 even for errors?
Because gRPC does not map errors to HTTP status codes. The transport layer (HTTP/2) succeeds even when the call logically fails. The actual outcome is in the grpc-status trailer: 0 = OK, 1 = CANCELLED, 2 = UNKNOWN, ..., 14 = UNAVAILABLE, 16 = UNAUTHENTICATED. This separation is necessary because streamed responses produce bytes successfully right up until the moment they fail; you cannot send the HTTP status header twice. Trailers are how gRPC reports outcome after the body.
Should I use unary calls or server streaming for fetching a list?
Unary if the result fits in one message comfortably (sub-megabyte, low-thousands of items). Server streaming when the list is large enough that materializing it on the server is expensive, or when results arrive incrementally (search, log tail). Streaming has higher per-message overhead due to framing and backpressure machinery; for small results it's slower than unary.
What's the difference between Channel and ClientConn (in Go) and Stub?
The Channel/ClientConn is the connection-management layer — it owns the underlying HTTP/2 connections, name resolver, load balancer, and channel state machine. A single channel is meant to be created once per (target, credential) pair and shared across the application. The Stub is the type-safe wrapper generated from the .proto: it offers method calls like orderClient.GetOrder(ctx, req) and serializes them onto the channel. Stubs are cheap; channels are expensive.
What does grpc-encoding do, and should I enable gzip?
grpc-encoding in headers selects a per-message compression codec (gzip, deflate, snappy, identity). Each message's 5-byte prefix starts with a compressed-flag byte that is set when that message is compressed. Default is identity (no compression). Enable gzip for verbose payloads (large lists, long strings) where the CPU cost is paid back by network savings. For small chatty RPCs, gzip is net-negative. TLS-layer compression is disabled in modern stacks (because of CRIME), so per-message compression is the only option; many setups leave it off unless messages are large enough to warrant the codec.
How do I version a gRPC API without breaking existing clients?
Two layers. Within a service: never reuse a field number, never change a field's type, mark deprecated fields as [deprecated = true], add new fields with new numbers. Across major versions: bump the package name (orders.v1 → orders.v2) so old and new live side by side. Servers can implement both, clients pick one. The HTTP/2 path naturally encodes the version: /orders.v1.OrderService/Get vs /orders.v2.OrderService/Get route to different handlers.
Why does my gRPC server occasionally return UNAVAILABLE on the first request after idle?
The usual cause is a connection that sat idle long enough for an intermediate NAT or load balancer to drop it. The drop is silent, so the client only notices when the next RPC hits the dead connection and surfaces UNAVAILABLE. The fix is either to lower keepalive_time below the middlebox idle threshold (set permit_without_stream=true and keepalive_time=30s or so) or to rely on a retry policy with UNAVAILABLE as a retryable code, so the call is retried on a fresh connection.
Is xDS for gRPC the same xDS Envoy uses?
Yes, that's the whole point. gRPC's xDS resolver speaks the same LDS / RDS / CDS / EDS protocol as Envoy, against the same control plane (Istio, Cilium Service Mesh, Google Cloud Service Mesh). This means a gRPC client without a sidecar can do much of what Envoy can, within the subset of xDS features gRPC implements: weighted clusters, locality-aware routing, traffic shifting, percentage canaries, fault injection. It's "proxyless service mesh": gRPC clients become first-class mesh participants, eliminating the sidecar's RAM cost and one network hop per request.