gRPC Internals
gRPC is the RPC framework that finally made HTTP/2 useful for backend services. It pairs Protocol Buffers, a compact, schema-driven binary wire format, with HTTP/2's multiplexed streams, then layers on deadlines, cancellation, retries, name resolution, and load balancing. The whole thing is designed for what REST never solved well: high-throughput, low-latency, strongly-typed service-to-service communication, with first-class bidirectional streaming and language-agnostic code generation. It grew out of Stubby, Google's internal RPC system, and is used for internal RPC at companies such as Netflix, Square, Dropbox, Lyft, and Cisco. It is also pervasive in the Kubernetes ecosystem: etcd v3, containerd, the Container Runtime Interface (CRI), and the Container Storage Interface (CSI) all speak gRPC, and kube-apiserver reaches etcd through it.
The reference implementation is the gRPC C core. C++, Python, Ruby, PHP, Objective-C, and the older C# Grpc.Core package all wrap it. Go (grpc-go), Java (grpc-java), and .NET (grpc-dotnet) have separate native implementations.
Why gRPC Exists
The .proto file is the single source of truth for both client and server in any language. Field numbers, not names, identify fields on the binary wire. That means you can rename fields freely (as long as nothing relies on the JSON mapping, which does use names), deprecate without breaking, and add fields on either side without coordination. Because both sides are generated from the same schema, type mismatches are caught by the compiler rather than surfacing as runtime parse errors.
Protocol Buffers: The Wire Format
A protobuf message on the wire is a sequence of (tag, value) pairs. The tag encodes the field number plus a 3-bit wire type:
tag = (field_number << 3) | wire_type
wire types: 0 = varint, 1 = 64-bit, 2 = length-delimited, 5 = 32-bit
// (3 and 4 are the deprecated start/end group types; proto3 does not support groups)
A varint is the workhorse encoding: 7 bits of payload per byte, least-significant group first, with the MSB set as a continuation marker. Values 0-127 take one byte; 128-16383 take two. The number 300 (binary 100101100) becomes AC 02 on the wire. Negative int32 values are sign-extended to 64 bits before encoding, so they always cost 10 bytes. That is why protobuf has a separate sint32 with zigzag encoding (n becomes 2n if n is non-negative and -2n-1 if n is negative), which keeps small negative numbers small.
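To make the byte-level rules concrete, here is a minimal Python sketch of three of them. It covers varint encoding, the 64-bit sign extension that makes a negative int32 cost 10 bytes, and the zigzag mapping used by sint32. The helper names are illustrative, not a protobuf library API.

```python
def encode_varint(value: int) -> bytes:
    """Base-128 varint: 7 payload bits per byte, least-significant group first."""
    if value < 0:
        raise ValueError("varints are unsigned; map negative values first")
    out = bytearray()
    while True:
        low7 = value & 0x7F
        value >>= 7
        if value:
            out.append(low7 | 0x80)  # MSB set: another byte follows
        else:
            out.append(low7)         # MSB clear: final byte
            return bytes(out)


def int32_wire_value(n: int) -> int:
    """int32/int64 fields: negatives are sign-extended to 64-bit two's complement."""
    return n & 0xFFFF_FFFF_FFFF_FFFF


def zigzag32(n: int) -> int:
    """sint32 fields: 0 -> 0, -1 -> 1, 1 -> 2, -2 -> 3, ... (n assumed in int32 range)."""
    return (n << 1) ^ (n >> 31)


print(encode_varint(300).hex())                  # ac02
print(len(encode_varint(int32_wire_value(-1))))  # 10
print(encode_varint(zigzag32(-1)).hex())         # 01
```

Zigzag interleaves negative and positive values, so -1 costs one byte as a sint32 instead of ten as an int32.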
A length-delimited field (wire type 2) is a varint length followed by
that many bytes of payload — used for string, bytes, embedded
messages, and packed repeated scalars. Embedded messages aren't framed any differently
from byte strings; the parser knows the type from the schema.
// proto3 schema
message User {
  int64 id = 1;
  string name = 2;
  repeated string emails = 3;
}
// User{id=300, name="Ada", emails=["a@x", "b@y"]} on the wire:
// 08 AC 02          field 1 (varint), value 300
// 12 03 41 64 61    field 2 (length=3), bytes "Ada"
// 1A 03 61 40 78    field 3 (length=3), bytes "a@x"
// 1A 03 62 40 79    field 3 (length=3), bytes "b@y"
//
// Total: 18 bytes. The same data as compact JSON is 46 bytes.
A few wire-format consequences worth knowing:
- Field numbers are forever. Reuse a number with a different type and old readers silently misinterpret the bytes. protoc only rejects duplicate numbers within a single message definition. Preventing reuse across versions is the schema owner's job, and the reserved keyword and buf breaking help with it.
- Unknown fields are preserved in proto3 (since protobuf 3.5). A field added by a newer producer passes untouched through an intermediate service built against the old schema and arrives intact at a newer consumer. Services in the middle of a call chain can therefore upgrade last.
- Default values aren't transmitted. A proto3 int field set to 0 emits zero bytes, so "field absent" and "field set to default" are indistinguishable on the wire. Use optional when presence matters; it was re-added to proto3 in protobuf 3.12 and is enabled by default since 3.15.
- Repeated numeric scalar fields are packed by default in proto3. A repeated int with a million elements is one tag, one length prefix, then the concatenated varints, instead of a million tags. Massive savings.
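The rules above can be checked end to end. The following Python sketch is a hand-rolled encoder for the User message from the example, not generated protobuf code, and encode_user and its helpers are hypothetical names. It reproduces the 18 bytes shown earlier and emits nothing for default values. It also shows what packing does for a hypothetical repeated int field numbered 4.

```python
from typing import List

VARINT, LEN = 0, 2  # wire types used below


def encode_varint(value: int) -> bytes:
    if value < 0:
        raise ValueError("sketch handles non-negative values only")
    out = bytearray()
    while True:
        low7, value = value & 0x7F, value >> 7
        out.append(low7 | 0x80 if value else low7)  # continuation bit if more follows
        if not value:
            return bytes(out)


def key(field_number: int, wire_type: int) -> bytes:
    # tag = (field_number << 3) | wire_type, itself varint-encoded
    return encode_varint((field_number << 3) | wire_type)


def len_field(field_number: int, payload: bytes) -> bytes:
    return key(field_number, LEN) + encode_varint(len(payload)) + payload


def encode_user(user_id: int, name: str, emails: List[str]) -> bytes:
    out = b""
    if user_id != 0:      # proto3 implicit presence: defaults are not transmitted
        out += key(1, VARINT) + encode_varint(user_id)
    if name:
        out += len_field(2, name.encode("utf-8"))
    for email in emails:  # repeated string is never packed: one tag per element
        out += len_field(3, email.encode("utf-8"))
    return out


def packed_ints(field_number: int, values: List[int]) -> bytes:
    # packed repeated scalar: one tag, one length, then the concatenated varints
    body = b"".join(encode_varint(v) for v in values)
    return len_field(field_number, body) if values else b""


print(encode_user(300, "Ada", ["a@x", "b@y"]).hex(" "))
# 08 ac 02 12 03 41 64 61 1a 03 61 40 78 1a 03 62 40 79
```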
HTTP/2 Framing: How an RPC Rides the Wire
Every gRPC call is exactly one HTTP/2 stream. The mapping is:
- Initial HEADERS frame (request), carrying :method=POST, :scheme=https, :path=/package.Service/Method, :authority=host:port, content-type=application/grpc+proto, te=trailers, optional grpc-timeout=5S, optional grpc-encoding=gzip, plus user metadata (custom headers).
- One or more DATA frames: the request body. Each gRPC message in the body is prefixed with a 5-byte header: a 1-byte compression flag (0 or 1) plus a 4-byte big-endian message length. Multiple messages can be packed into one DATA frame, or a single message can be split across multiple DATA frames.
- END_STREAM flag on the last DATA frame: closes the request half of the bidirectional stream.
- HEADERS frame (response), carrying :status=200 and content-type. (Note: the HTTP status is 200 even on logical errors. The actual gRPC status comes in the trailers.)
- DATA frames with the response messages, using the same 5-byte prefix.
- Trailing HEADERS frame, carrying grpc-status (the integer status code, 0 = OK), optional grpc-message (UTF-8 description), and optional grpc-status-details-bin (base64-encoded google.rpc.Status with structured details).
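Here is a minimal Python sketch of that 5-byte message framing, written independently of any gRPC library. frame_message and MessageDeframer are hypothetical names. The deframer buffers bytes because DATA frame boundaries carry no meaning: a message may arrive split across frames, or several messages may share one.

```python
import struct
from typing import List, Tuple

PREFIX = struct.Struct(">BI")  # 1-byte compressed flag + 4-byte big-endian length


def frame_message(payload: bytes, compressed: bool = False) -> bytes:
    return PREFIX.pack(1 if compressed else 0, len(payload)) + payload


class MessageDeframer:
    """Reassembles length-prefixed messages from arbitrarily split DATA payloads."""

    def __init__(self) -> None:
        self._buf = bytearray()

    def feed(self, data_frame_payload: bytes) -> List[Tuple[bool, bytes]]:
        self._buf += data_frame_payload
        messages = []
        while len(self._buf) >= PREFIX.size:
            flag, length = PREFIX.unpack_from(self._buf)
            end = PREFIX.size + length
            if len(self._buf) < end:
                break  # message incomplete: wait for more DATA frames
            messages.append((flag == 1, bytes(self._buf[PREFIX.size:end])))
            del self._buf[:end]
        return messages


body = frame_message(b"abc") + frame_message(b"hello")
deframer = MessageDeframer()
# Split the body at arbitrary points, as DATA frames might.
for chunk in (body[:4], body[4:9], body[9:]):
    print(deframer.feed(chunk))
```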
The "trailers" part is why gRPC requires HTTP/2. A streamed response may not know it will fail until late, so the final status has to travel after the body. HTTP/1.1 trailers exist but are poorly supported by clients and intermediaries. The te: trailers header on the request is mandatory and signals that the client understands trailers.
One TCP+TLS connection can carry up to SETTINGS_MAX_CONCURRENT_STREAMS simultaneous calls. The peer advertises this value, and it varies by implementation: 100 is a common setting, and some servers advertise no practical limit. Beyond that limit, the channel either queues new RPCs or opens a second connection, depending on the implementation. HTTP/2 flow control lets the receiver control how fast bytes flow on each stream and across the connection, independently of TCP-level backpressure.
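To illustrate the two levels of flow control, here is a simplified Python model of the sender side; the class and method names are hypothetical. It assumes the HTTP/2 default initial window of 65,535 bytes for both the connection and each stream, and treats stream ID 0 as the connection, as WINDOW_UPDATE frames do. A DATA frame can only go out if both windows have room.

```python
from typing import Dict

DEFAULT_WINDOW = 65_535  # HTTP/2 initial flow-control window, in bytes


class FlowControlledSender:
    def __init__(self) -> None:
        self.connection_window = DEFAULT_WINDOW
        self.stream_windows: Dict[int, int] = {}

    def open_stream(self, stream_id: int) -> None:
        self.stream_windows[stream_id] = DEFAULT_WINDOW

    def send(self, stream_id: int, wanted: int) -> int:
        """Send as many DATA bytes as both windows allow; return the count sent."""
        n = min(wanted, self.connection_window, self.stream_windows[stream_id])
        self.connection_window -= n
        self.stream_windows[stream_id] -= n
        return n

    def window_update(self, stream_id: int, increment: int) -> None:
        """Apply a WINDOW_UPDATE from the receiver (stream 0 = whole connection)."""
        if stream_id == 0:
            self.connection_window += increment
        else:
            self.stream_windows[stream_id] += increment


sender = FlowControlledSender()
sender.open_stream(1)
sender.open_stream(3)
print(sender.send(1, 50_000))  # 50000: stream 1 uses most of the shared connection window
print(sender.send(3, 50_000))  # 15535: stream 3 is limited by the connection window
sender.window_update(0, 30_000)
print(sender.send(3, 50_000))  # 30000
```

A single busy stream can starve its siblings of connection window. Receivers therefore typically replenish the connection window generously, and throttle per stream.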
The Four Streaming Modes
Defined by which side of the call sends a stream of messages vs a single message:
service Chat {
  rpc Send(Message) returns (Ack);                  // unary
  rpc Watch(Query) returns (stream Update);         // server streaming
  rpc Upload(stream Chunk) returns (UploadResult);  // client streaming
  rpc Sync(stream Event) returns (stream Event);    // bidirectional streaming
}
- Unary: the REST analog. Client sends one message, server replies with one message. The client half-closes right after its message, and the server ends the stream with trailers right after its reply.
- Server streaming: client sends one request, server sends N responses on the same stream and closes. Used for change feeds, progress updates, log tails. Cancellation propagates: if the client cancels the call (an RST_STREAM, not the normal half-close that follows its request), the server's context is cancelled. In Go, for example, context.Done() fires.
- Client streaming: client streams N messages, then closes its half; server replies with one final message. Used for chunked uploads, batch insertion, accumulating state.
- Bidirectional streaming: both sides send independent streams of messages on the same HTTP/2 stream. Ordering is preserved only within each direction; the protocol imposes no ordering between the two. Used for chat-like protocols, two-way subscriptions, and synchronization.
All four use the same on-the-wire framing: a HEADERS frame, then DATA frames, then trailing HEADERS. The mode is a code-generation distinction, not a protocol distinction.
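Because the mode only changes the generated signatures, the four shapes can be sketched as plain Python functions. The sketch mirrors, without importing, the convention grpcio uses for servicer methods: a request or a request_iterator comes in, and a return value or yielded responses go out. The dict messages and handler names are hypothetical stand-ins for generated types.

```python
from typing import Any, Dict, Iterable, Iterator

Msg = Dict[str, Any]


def send(request: Msg, context=None) -> Msg:
    """Unary: one message in, one message out."""
    return {"ack": request["text"]}


def watch(request: Msg, context=None) -> Iterator[Msg]:
    """Server streaming: one message in, N messages out."""
    for i in range(request["count"]):
        yield {"update": i}


def upload(request_iterator: Iterable[Msg], context=None) -> Msg:
    """Client streaming: N messages in, one summary out after the client half-closes."""
    total = sum(len(chunk["data"]) for chunk in request_iterator)
    return {"bytes_received": total}


def sync(request_iterator: Iterable[Msg], context=None) -> Iterator[Msg]:
    """Bidirectional: read and write independently; here each event is simply echoed."""
    for event in request_iterator:
        yield {"echo": event["id"]}


print(send({"text": "hi"}))
print(list(watch({"count": 3})))
print(upload(iter([{"data": b"abc"}, {"data": b"de"}])))
print(list(sync(iter([{"id": 1}, {"id": 2}]))))
```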
Deadlines and Cancellation Propagation
gRPC has no timeout in the HTTP sense; it has deadlines. A deadline is an absolute point in time by which the call must complete. The client sets it (e.g., 2s from now), and the runtime serializes the time remaining as the grpc-timeout header. Because that header is a relative duration, client and server clocks need not agree. The server converts it back into a local deadline and enforces it on its side. When the deadline expires, the runtime synthesizes a DEADLINE_EXCEEDED status and cancels the stream on both sides.
The critical property is propagation. If service A calls B with a 2s deadline, and B internally calls C, B must forward the remaining deadline to C. Most language SDKs make this easy or automatic: the request context carries the deadline, and a derived context for any sub-call inherits the remaining time. In Go, for example, passing the handler's incoming ctx to the outgoing call is enough. This is how an entire fan-out RPC tree dies cleanly when the user-visible deadline expires, instead of orphaning hundreds of background calls that nobody is waiting for.
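The following Python sketch shows both halves of the mechanism under a few stated assumptions:

- Deadlines are kept as absolute instants on a local monotonic clock.
- The wire carries only the remaining time, in grpc-timeout format: at most 8 digits plus a unit (H, M, S, m, u, or n).
- A sub-call's deadline is the earlier of the parent's deadline and the sub-call's own budget.

Deadline, child, format_grpc_timeout, and parse_grpc_timeout are hypothetical names; real SDKs do this inside their context objects.

```python
import time
from typing import Optional

# grpc-timeout units, finest first, with their size in nanoseconds.
UNITS = (("n", 1), ("u", 10**3), ("m", 10**6),
         ("S", 10**9), ("M", 60 * 10**9), ("H", 3600 * 10**9))
MAX_VALUE = 99_999_999  # the numeric part may have at most 8 digits


def format_grpc_timeout(remaining_ns: int) -> str:
    """Pick the finest unit whose value fits in 8 digits, rounding up."""
    if remaining_ns <= 0:
        raise ValueError("deadline already expired")
    for unit, ns in UNITS:
        value = -(-remaining_ns // ns)  # ceiling division
        if value <= MAX_VALUE:
            return f"{value}{unit}"
    raise ValueError("timeout too large to encode")


def parse_grpc_timeout(header: str) -> int:
    value, unit = header[:-1], header[-1]
    if not value.isdigit() or len(value) > 8:
        raise ValueError(f"bad grpc-timeout: {header!r}")
    return int(value) * dict(UNITS)[unit]


def now_ns() -> int:
    return time.monotonic_ns()


class Deadline:
    def __init__(self, at_ns: int) -> None:
        self.at_ns = at_ns  # absolute instant on this process's monotonic clock

    @classmethod
    def from_header(cls, header: str, received_at_ns: Optional[int] = None) -> "Deadline":
        """Server side: turn the relative grpc-timeout back into a local absolute deadline."""
        start = now_ns() if received_at_ns is None else received_at_ns
        return cls(start + parse_grpc_timeout(header))

    def remaining_ns(self, at: Optional[int] = None) -> int:
        return self.at_ns - (now_ns() if at is None else at)

    def child(self, budget_ns: int, at: Optional[int] = None) -> "Deadline":
        """Sub-call deadline: its own budget, but never later than the parent's."""
        start = now_ns() if at is None else at
        return Deadline(min(self.at_ns, start + budget_ns))


# B receives a call with 2 s left at t=0, then calls C at t=0.5 s with a 5 s budget.
incoming = Deadline.from_header("2000000u", received_at_ns=0)
outgoing = incoming.child(5 * 10**9, at=500_000_000)
print(format_grpc_timeout(outgoing.remaining_ns(at=500_000_000)))  # 1500000u
```

C is told 1.5 s, not 5 s: the parent's deadline wins, which is exactly what lets the whole tree give up together.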
Cancellation is the deadline's companion. A client that cancels the call (the user closed the browser tab, or a parent context was cancelled) sends an RST_STREAM frame with error code CANCEL. The server's handler context is cancelled immediately, and the handler is expected to release resources and return. Long-running server work that doesn't honor cancellation is one of the most common gRPC anti-patterns: it leaks goroutines, threads, or file descriptors on every cancelled call.
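Here is a minimal, library-free Python sketch of a handler that honors cancellation. A threading.Event stands in for the cancelled call context; in Go that signal is ctx.Done(), and grpcio handlers can poll context.is_active(). export_rows is a hypothetical long-running handler that checks the event between units of work, so it can stop and free resources promptly.

```python
import threading
from typing import Iterable, List, Tuple


def export_rows(rows: Iterable[str], cancelled: threading.Event, sink: List[str]) -> str:
    for row in rows:
        if cancelled.is_set():  # client sent RST_STREAM, or the deadline passed
            return "CANCELLED"  # stop early; real code would also close files, etc.
        sink.append(row)        # one unit of (simulated) work
    return "OK"


def run_with_cancel_after(n_rows: int, cancel_at: int) -> Tuple[str, List[str]]:
    cancelled = threading.Event()

    def rows():
        for i in range(n_rows):
            if i == cancel_at:
                cancelled.set()  # simulate the cancellation arriving mid-call
            yield f"row{i}"

    sink: List[str] = []
    return export_rows(rows(), cancelled, sink), sink


print(run_with_cancel_after(10, cancel_at=3))  # ('CANCELLED', ['row0', 'row1', 'row2'])
```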
Channel State, Name Resolution, Load Balancing
A gRPC channel is the client-side abstraction over (potentially many) connections to (potentially many) backends. It runs a state machine:
- IDLE — channel exists but no active connection. New RPCs trigger a transition to CONNECTING.
- CONNECTING — opening a TCP+TLS connection.
- READY — at least one connection is up; RPCs flow.
- TRANSIENT_FAILURE — last attempt failed; back off and retry.
- SHUTDOWN — channel closed by the application; reject new RPCs.
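As a sketch, the states above can be written as a transition table in Python. This is a simplified subset for illustration; real implementations track more detail, including per-connection (subchannel) states. The Channel class is hypothetical. SHUTDOWN is reachable from any state and is terminal.

```python
from enum import Enum, auto


class State(Enum):
    IDLE = auto()
    CONNECTING = auto()
    READY = auto()
    TRANSIENT_FAILURE = auto()
    SHUTDOWN = auto()


ALLOWED = {
    State.IDLE: {State.CONNECTING},                            # a new RPC arrives
    State.CONNECTING: {State.READY, State.TRANSIENT_FAILURE},  # handshake result
    State.READY: {State.IDLE, State.TRANSIENT_FAILURE},        # idle/GOAWAY, or failure
    State.TRANSIENT_FAILURE: {State.CONNECTING},               # backoff expired
    State.SHUTDOWN: set(),
}


class Channel:
    def __init__(self) -> None:
        self.state = State.IDLE
        self.history = [State.IDLE]

    def transition(self, new: State) -> None:
        if self.state is State.SHUTDOWN:
            raise RuntimeError("channel is shut down; new RPCs are rejected")
        if new is not State.SHUTDOWN and new not in ALLOWED[self.state]:
            raise ValueError(f"illegal transition {self.state.name} -> {new.name}")
        self.state = new
        self.history.append(new)


ch = Channel()
for s in (State.CONNECTING, State.TRANSIENT_FAILURE, State.CONNECTING,
          State.READY, State.SHUTDOWN):
    ch.transition(s)
print([s.name for s in ch.history])
```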
Behind the channel is the name resolver, selected by the target's URI scheme. Examples include dns:///orders.svc.cluster.local:50051, xds:///orders, unix:///var/run/svc.sock, and custom schemes registered by the application. The resolver returns one or more endpoints (addresses and per-endpoint configuration) and a service config. The service config is JSON describing the load-balancing policy, retry policy, and method-level overrides.
The load balancer picks one endpoint per RPC. The standard policies:
- pick_first: try the resolver's addresses in order, connect to the first that succeeds, and send every call over that connection. It is the default policy when none is configured. If the connection fails, it moves on to the next address.
- round_robin: open a connection to every endpoint and rotate calls across the ready ones. Each backend sees roughly equal load.
- xDS: not a single policy but Envoy-style discovery delivered by a control plane (via the xds:/// resolver), which configures a tree of LB policies. Supports weighted clusters, locality awareness, percentage-based traffic splitting (canary, A/B), priority-based failover, and outlier detection.
- grpclb (legacy): talk to a separate "look-aside" LB service that returns a list of backends. Deprecated in favor of xDS.
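The per-RPC decision the two simplest policies make can be sketched in a few lines of Python. This models only the picker logic and assumes a static address list; pick_first and RoundRobinPicker are hypothetical names. Real policies also manage connections and react to resolver updates.

```python
import itertools
from typing import Callable, List, Sequence


def pick_first(addresses: Sequence[str], can_connect: Callable[[str], bool]) -> str:
    """Try addresses in resolver order; every RPC then uses the first that connects."""
    for address in addresses:
        if can_connect(address):
            return address
    raise ConnectionError("UNAVAILABLE: no address could be reached")


class RoundRobinPicker:
    """Rotate RPCs across all READY endpoints."""

    def __init__(self, ready: List[str]) -> None:
        if not ready:
            raise ValueError("no READY endpoints")
        self._cycle = itertools.cycle(ready)

    def pick(self) -> str:
        return next(self._cycle)


backends = ["10.0.0.1:50051", "10.0.0.2:50051", "10.0.0.3:50051"]
print(pick_first(backends, lambda a: a != "10.0.0.1:50051"))  # falls over to the second
rr = RoundRobinPicker(backends)
print([rr.pick() for _ in range(4)])
```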
Service config is the JSON glue. A typical entry:
{
  "loadBalancingConfig": [{"round_robin": {}}],
  "methodConfig": [{
    "name": [{"service": "orders.OrderService"}],
    "timeout": "2s",
    "retryPolicy": {
      "maxAttempts": 4,
      "initialBackoff": "0.1s",
      "maxBackoff": "1s",
      "backoffMultiplier": 2,
      "retryableStatusCodes": ["UNAVAILABLE", "DEADLINE_EXCEEDED"]
    }
  }]
}
Retries, Hedging, and gRFC A6
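gRPC retry policy is governed by gRFC A6, the gRPC design proposal for client-side retries. Configurable retries are off by default. There is no implicit retry of the kind HTTP libraries often perform on connection refused. The one exception is transparent retry of RPCs that never left the client or never reached the server application. You opt in via service config or programmatically.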
Once enabled, the client retries on the configured status codes. UNAVAILABLE is the conventional choice because it usually signals a transient condition such as a dropped connection. It does not guarantee that the server did no work, though, and the gRPC status-code documentation warns that retrying non-idempotent operations is not always safe. For other codes the risk of duplicate side effects is higher. Handlers should therefore be idempotent, or the request should carry an idempotency key.
Two important gates:
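- Per-call retry budget: capped by maxAttempts, which counts the original attempt; values above 5 are treated as 5.
- Retry throttling and pushback: the server can return grpc-retry-pushback-ms in its trailing metadata to tell the client how many milliseconds to wait before retrying. A negative or malformed value tells the client not to retry at all. Separately, the service config can enable a client-side token bucket (retryThrottling, with maxTokens and tokenRatio). Each failed RPC costs one token, and each success refunds tokenRatio. Retries stop whenever the count is at or below half of maxTokens, which protects a struggling backend from a retry storm.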
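The backoff schedule and the throttling bucket described above can be sketched in Python. The sketch follows our reading of gRFC A6: retry n waits a random time in [0, min(initialBackoff · backoffMultiplier^(n-1), maxBackoff)]. backoff_delays and RetryThrottle are hypothetical names, not a gRPC API. The usage line plugs in the values from the service config example above.

```python
import random
from typing import Callable, List


def backoff_delays(initial: float, maximum: float, multiplier: float,
                   max_attempts: int, rng: Callable[[], float] = random.random) -> List[float]:
    """Delay in seconds before each retry; attempt 1 is the original call."""
    max_attempts = min(max_attempts, 5)       # values above 5 are treated as 5
    delays = []
    for n in range(1, max_attempts):          # retries 1 .. max_attempts-1
        cap = min(initial * multiplier ** (n - 1), maximum)
        delays.append(rng() * cap)            # uniformly random in [0, cap)
    return delays


class RetryThrottle:
    """Client-side token bucket modeled on the retryThrottling service-config setting."""

    def __init__(self, max_tokens: float, token_ratio: float) -> None:
        self.max_tokens = max_tokens
        self.token_ratio = token_ratio
        self.tokens = float(max_tokens)

    def record_failure(self) -> None:
        self.tokens = max(0.0, self.tokens - 1)

    def record_success(self) -> None:
        self.tokens = min(self.max_tokens, self.tokens + self.token_ratio)

    def retries_allowed(self) -> bool:
        return self.tokens > self.max_tokens / 2


# Upper bounds of the delays for the config above (0.1s, 1s, x2, maxAttempts 4):
print(backoff_delays(0.1, 1.0, 2, 4, rng=lambda: 1.0))  # [0.1, 0.2, 0.4]
```

The full jitter (a random delay anywhere below the cap) spreads retries from many clients apart instead of synchronizing them.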
Hedging is the alternative for tail-latency reduction: send up to N
copies of the same request (initial + N-1 hedges) staggered by hedgingDelay;
take the first response; cancel the rest. Hedging is mutually exclusive with retries on
the same call. It's particularly useful for read calls where one slow replica would
otherwise stall the whole RPC.
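Here is a minimal asyncio sketch of hedging; hedged_call and the fake backend are hypothetical. It starts one attempt and adds another every hedging_delay seconds until one finishes. It then returns the first result and cancels the stragglers. Real gRPC hedging also consults nonFatalStatusCodes; this sketch simply returns or raises whatever the first finished attempt produced.

```python
import asyncio
from typing import Awaitable, Callable, List, TypeVar

T = TypeVar("T")


async def hedged_call(send: Callable[[int], Awaitable[T]],
                      hedging_delay: float, max_attempts: int) -> T:
    tasks: List[asyncio.Future] = []
    try:
        for attempt in range(max_attempts):
            tasks.append(asyncio.ensure_future(send(attempt)))
            done, _ = await asyncio.wait(tasks, timeout=hedging_delay,
                                         return_when=asyncio.FIRST_COMPLETED)
            if done:
                return done.pop().result()
        # All hedges launched; wait for whichever finishes first.
        done, _ = await asyncio.wait(tasks, return_when=asyncio.FIRST_COMPLETED)
        return done.pop().result()
    finally:
        for task in tasks:
            task.cancel()  # no-op for finished tasks; cancels the losers


async def fake_backend(attempt: int) -> str:
    # Attempt 0 hits a slow replica; later attempts hit fast ones.
    await asyncio.sleep(0.5 if attempt == 0 else 0.02)
    return f"reply from attempt {attempt}"


print(asyncio.run(hedged_call(fake_backend, hedging_delay=0.05, max_attempts=3)))
```

With a 50 ms hedging delay, the call completes in about 70 ms instead of waiting 500 ms on the slow replica.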
Interceptors: Cross-Cutting Concerns
Interceptors are gRPC's middleware mechanism. A unary interceptor wraps a handler: it receives the incoming request and the next handler, and can do any of: log, authenticate, inject metadata, mutate context, time the call, short-circuit. Streaming interceptors wrap the stream object so they can observe each message.
Standard patterns:
- Auth interceptor: extract the bearer token from the authorization metadata, verify it, and attach claims to the context. Reject with UNAUTHENTICATED on failure.
- Logging / tracing interceptor: start a span, record method name and duration, propagate trace context (W3C traceparent or B3 headers in metadata).
- Rate-limit interceptor: token bucket per method or per caller.
- Validation interceptor: invoke protoc-gen-validate-generated validators on the request before the handler runs.
- Recovery interceptor (server-side): convert panics/uncaught exceptions into an INTERNAL status so the worker doesn't die.
Interceptors compose like functional middleware: each wraps the next, building a chain from the outside in. Order matters. Put logging/tracing outermost, so rejected calls are recorded too. Then comes auth, then rate limiting (which can key on the authenticated caller), then the handler.
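The composition rule can be sketched with plain Python functions. chain, the interceptors, RpcError, and the status strings are hypothetical stand-ins, not grpcio's interceptor API. interceptors[0] is outermost, so with logging first, rejections raised by the inner auth and rate-limit interceptors still reach the log.

```python
import functools
from typing import Callable, Dict, List

Metadata = Dict[str, str]
Handler = Callable[[dict, Metadata], str]

LOG: List[str] = []  # stand-in for a real logger


class RpcError(Exception):
    """Hypothetical stand-in for a gRPC status error."""

    def __init__(self, code: str) -> None:
        super().__init__(code)
        self.code = code


def chain(interceptors: List[Callable], handler: Handler) -> Handler:
    """interceptors[0] becomes the outermost wrapper."""
    for interceptor in reversed(interceptors):
        handler = functools.partial(interceptor, next_handler=handler)
    return handler


def logging_interceptor(request: dict, metadata: Metadata, next_handler: Handler) -> str:
    try:
        response = next_handler(request, metadata)
    except RpcError as err:
        LOG.append(err.code)  # rejections from inner interceptors are recorded too
        raise
    LOG.append("OK")
    return response


def auth_interceptor(request: dict, metadata: Metadata, next_handler: Handler) -> str:
    if metadata.get("authorization") != "Bearer secret":  # hypothetical token check
        raise RpcError("UNAUTHENTICATED")
    return next_handler(request, metadata)


def make_rate_limiter(limit: int) -> Callable:
    calls = 0

    def rate_limit_interceptor(request: dict, metadata: Metadata, next_handler: Handler) -> str:
        nonlocal calls
        calls += 1
        if calls > limit:
            raise RpcError("RESOURCE_EXHAUSTED")
        return next_handler(request, metadata)

    return rate_limit_interceptor


def say_hello(request: dict, metadata: Metadata) -> str:
    return f"hello {request['name']}"


server = chain([logging_interceptor, auth_interceptor, make_rate_limiter(1)], say_hello)
print(server({"name": "ada"}, {"authorization": "Bearer secret"}))  # hello ada
```

Because auth sits outside the rate limiter, unauthenticated calls never consume rate-limit budget.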
Keepalive and Connection Health
A gRPC connection is persistent; an idle connection still costs file descriptors and kernel state, and intermediate NATs/load balancers may silently drop a TCP connection after some inactivity timeout (typically 60-300s on cloud LBs). gRPC keepalive sends periodic HTTP/2 PING frames to detect dead connections and to keep middleboxes from reaping the connection.
Knobs that matter:
- keepalive_time: interval between PINGs. Servers commonly default to 2 hours, while client-side keepalive is usually off until configured. It is often tuned down to 30-60s.
- keepalive_timeout: how long to wait for the PING ACK before considering the connection dead (default 20s).
- permit_without_stream: whether to PING when no RPCs are active. Without this, middleboxes reap idle connections; with it, connections stay warm at the cost of a tiny background heartbeat.
- min_ping_interval_without_stream (server-side): defends against clients that PING aggressively. Default 5 minutes; clients that keep pinging faster get GOAWAY'd.
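An aggressive client plus a permissive server detects a dead connection within roughly keepalive_time + keepalive_timeout, and the channel state machine reconnects automatically. The server's GOAWAY frame is the graceful equivalent. It carries the last stream ID the server will process: streams up to that ID finish, higher-numbered ones were never processed and can be retried, and new RPCs go to a fresh connection. Servers use it during deploys and load shedding.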
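Here is a simplified Python model of the client-side keepalive decision, with times in seconds. keepalive_action and its parameters are hypothetical, loosely named after the knobs above. The model also makes the detection bound explicit. If the peer vanishes right after the last frame, the PING goes out after keepalive_time, and the connection is declared dead keepalive_timeout later.

```python
from typing import Optional


def keepalive_action(now: float, last_frame_at: float, ping_sent_at: Optional[float],
                     active_streams: int, keepalive_time: float,
                     keepalive_timeout: float, permit_without_stream: bool) -> str:
    """Return 'CLOSE', 'PING', 'DORMANT', or 'WAIT' for the current instant."""
    if ping_sent_at is not None:  # a PING is outstanding and no ACK has arrived yet
        return "CLOSE" if now - ping_sent_at >= keepalive_timeout else "WAIT"
    if active_streams == 0 and not permit_without_stream:
        return "DORMANT"          # no pings while idle; middleboxes may reap the connection
    if now - last_frame_at >= keepalive_time:
        return "PING"
    return "WAIT"


cfg = dict(keepalive_time=30.0, keepalive_timeout=20.0, permit_without_stream=True)
# The peer dies silently right after a frame at t=0.
print(keepalive_action(29.0, 0.0, None, 0, **cfg))  # WAIT
print(keepalive_action(30.0, 0.0, None, 0, **cfg))  # PING
print(keepalive_action(50.0, 0.0, 30.0, 0, **cfg))  # CLOSE: detected at 30 + 20 s
```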
gRPC-Web and Browser Limitations
Browsers cannot speak HTTP/2 frames directly from JavaScript — the Fetch API and XMLHttpRequest don't expose stream-level control or trailers. gRPC-Web is the workaround: a slightly different wire format that lets browsers call gRPC services via a proxy.
Two gRPC-Web modes:
- grpc-web-text (application/grpc-web-text): the body is base64-encoded, with trailers appended to the body behind a special flag byte. Works with HTTP/1.1.
- grpc-web (application/grpc-web or application/grpc-web+proto): the same protobuf body as native gRPC, with the trailers concatenated to the body. The browser-side library de-frames it and surfaces the trailer status.
Both modes are translated to native gRPC by a server-side proxy (Envoy's grpc_web filter is the canonical implementation). gRPC-Web supports unary and server-streaming RPCs. The gRPC-Web protocol does not support client streaming or bidirectional streaming, largely because browsers' Fetch API does not broadly support streaming request bodies.
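To show how trailers ride inside the body, here is a Python sketch of decoding a gRPC-Web response body. It assumes the framing described in the gRPC-Web protocol document as we understand it. Frames use the same 5-byte prefix as native gRPC. Setting the flag byte's most significant bit (0x80) marks a trailer frame, whose payload is HTTP/1-style "key: value" lines separated by CRLF. decode_grpc_web_body is a hypothetical name, and the text-mode branch assumes the whole body is a single base64 chunk.

```python
import base64
import struct
from typing import Dict, List, Tuple

PREFIX = struct.Struct(">BI")  # flag byte + 4-byte big-endian length
TRAILER_FLAG = 0x80


def decode_grpc_web_body(body: bytes, text_mode: bool = False) -> Tuple[List[bytes], Dict[str, str]]:
    if text_mode:
        body = base64.b64decode(body)
    messages: List[bytes] = []
    trailers: Dict[str, str] = {}
    pos = 0
    while pos < len(body):
        if len(body) - pos < PREFIX.size:
            raise ValueError("truncated frame header")
        flags, length = PREFIX.unpack_from(body, pos)
        start, end = pos + PREFIX.size, pos + PREFIX.size + length
        if end > len(body):
            raise ValueError("truncated frame payload")
        payload = body[start:end]
        pos = end
        if flags & TRAILER_FLAG:  # trailer frame: HTTP/1-style header block
            for line in payload.decode("ascii").split("\r\n"):
                if line:
                    name, _, value = line.partition(":")
                    trailers[name.strip().lower()] = value.strip()
        else:                     # ordinary message frame
            messages.append(payload)
    return messages, trailers


trailer_block = b"grpc-status:0\r\ngrpc-message:OK\r\n"
body = (PREFIX.pack(0, 3) + b"abc"
        + PREFIX.pack(TRAILER_FLAG, len(trailer_block)) + trailer_block)
print(decode_grpc_web_body(body))
print(decode_grpc_web_body(base64.b64encode(body), text_mode=True))
```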
gRPC vs Other RPC Styles
| | gRPC | REST + JSON | GraphQL | Apache Thrift |
|---|---|---|---|---|
| Wire format | Protobuf (binary) | JSON (text) | JSON over HTTP | Binary (Compact / Binary protocol) |
| Transport | HTTP/2 only | HTTP/1.1, /2, /3 | HTTP, WebSocket subscriptions | TCP, HTTP, custom |
| Schema | .proto, code-generated | OpenAPI optional | SDL, code-generated typed clients | .thrift, code-generated |
| Streaming | 4 modes built-in | SSE, long-poll, WebSocket bolt-ons | Subscriptions (WebSocket) | Limited |
| Browser | gRPC-Web with proxy | Native | Native | Limited |
| Best at | Internal microservices, polyglot | Public APIs, simple integrations | Front-end driven aggregation | Facebook-era polyglot (legacy) |
| Schema evolution | Strong; field numbers | Conventional | Strong but explicit | Strong; field IDs |
| Tooling | protoc, buf, grpcurl, ghz, evans | curl, Postman, OpenAPI generators | GraphiQL, Apollo | thrift compiler |
Tradeoffs and Honest Weaknesses
- Browser story is awkward: gRPC-Web requires a translating proxy and doesn't support client/bidi streaming. For consumer-facing APIs, REST+JSON or GraphQL is still the right answer.
- Debugging is harder than HTTP: protobuf isn't readable in tcpdump or Wireshark without a .proto-aware decoder. grpcurl, ghz, and evans exist precisely because curl doesn't speak gRPC.
- HTTP/2 dependency: works fine in modern infrastructure but doesn't compose with HTTP/1-only intermediaries (some legacy load balancers, ancient proxies). HTTP/3 support is improving but uneven across language SDKs.
- Proto3 default-value erasure: "field absent" and "field set to default" look the same on the wire. The optional keyword fixes it but requires schema discipline.
- Default 4 MiB receive limit: calls start failing with RESOURCE_EXHAUSTED once payloads outgrow it. Tunable, but the default catches everyone exactly once.
- Status code domain is small: 17 codes for everything. Real applications need richer error info. That means either grpc-status-details-bin (the structured google.rpc.Status proto) or stuffing details into the message. There's no analog of REST's "embed JSON error in 400 body" pattern that everyone already groks.
- Streaming is harder than it looks: backpressure across language boundaries, half-close semantics, mid-stream cancellation, and resource leakage on dropped streams are subtle. Production gRPC streaming requires careful testing.
Frequently Asked Questions
Why does HTTP status always return 200 even for errors?
The HTTP status only reports that the HTTP exchange itself worked. The RPC outcome travels in the grpc-status trailer: 0 = OK, 1 = CANCELLED, 2 = UNKNOWN, ..., 14 = UNAVAILABLE, 16 = UNAUTHENTICATED. This separation is necessary because streamed responses produce bytes successfully right up until the moment they fail, and you cannot send the HTTP status header twice. Trailers are how gRPC reports the outcome after the body.
Should I use unary calls or server streaming for fetching a list?
What's the difference between Channel and ClientConn (in Go) and Stub?
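A channel (ClientConn in Go) owns the connections, name resolution, and load balancing. A stub is a thin generated wrapper that turns method calls like orderClient.GetOrder(ctx, req) into RPCs and serializes them onto the channel. Stubs are cheap; channels are expensive.
What does grpc-encoding do, and should I enable gzip?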
The grpc-encoding header selects a per-message compression codec (gzip, deflate, snappy, identity). Each message's 5-byte prefix has a compression-flag byte that is set when that message is compressed. Default is identity (no compression). Enable gzip for verbose payloads (large lists, long strings) where the network savings pay back the CPU cost. For small chatty RPCs, gzip is net-negative. TLS-level compression is no substitute, since it is disabled in practice because of the CRIME attack. Many setups therefore leave per-message compression off and enable it only where messages are large enough to warrant the codec.
How do I version a gRPC API without breaking existing clients?
Within a major version, never renumber or reuse fields: mark retired fields [deprecated = true] and add new fields with new numbers. Across major versions, bump the package name (orders.v1 → orders.v2) so old and new live side by side. Servers can implement both, and clients pick one. The HTTP/2 path naturally encodes the version: /orders.v1.OrderService/Get and /orders.v2.OrderService/Get route to different handlers.
Why does my gRPC server occasionally return UNAVAILABLE on the first request after idle?
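Usually a NAT or load balancer silently dropped the idle connection, and the first RPC is what discovers it. There are two fixes. One is to enable keepalive (permit_without_stream=true and keepalive_time=30s or so, with a server enforcement policy that allows that ping rate). The other is to rely on a retry policy with UNAVAILABLE as a retryable code, which automatically reconnects.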