API Gateway — Routing, Auth, Rate Limiting at the Edge

Architecture

Capacity Estimation

For a SaaS at 500 K RPS peak across 120 microservices:

Metric	Value	Notes
Peak ingress	500 K req/s	3× typical day
p50 latency overhead	2–5 ms	auth + rate limit + route
p99 latency overhead	15–30 ms	cold token cache, JWKS fetch
Connections inbound	~5 M concurrent	HTTP/1.1 + HTTP/2
Memory per gateway pod	2–8 GB	route table + token cache
Routes managed	~10 K	per service per version
JWT verifications/s	500 K	cached JWKS keys

Request Routing

Routing is the gateway's defining job: take an incoming HTTP request and pick the upstream cluster. Three styles:

Path-prefix — /api/orders/* → orders service. Simplest, default in Kong/Tyk. Requires service teams to coordinate path namespaces.
Hostname — orders.api.example.com. Cleaner separation, but DNS+cert sprawl.
Header / weighted — route on X-Tenant-Id or canary weight. Powers blue-green and dark-launch deploys.

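Internally, many gateways compile routes into a prefix tree (a radix tree or trie), so match cost grows with the length of the path, O(k) for a path of k characters, rather than with the number of routes. Others evaluate an ordered rule list, so check how your gateway matches before assuming either. Order matters when patterns overlap: the more specific pattern wins, and ties are broken by definition order. Build the new route table off to the side and swap it in atomically on config change; never edit it in place under concurrent reads.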
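The matching and reload rules above can be sketched as follows. This is a minimal illustration, not how any particular gateway implements it: it uses a trie keyed on whole path segments (a simpler cousin of a compressed radix tree), and the names `build_table`, `match`, and `Router` are hypothetical.

```python
from typing import Dict, Iterable, Optional, Tuple


class _Node:
    __slots__ = ("children", "upstream")

    def __init__(self) -> None:
        self.children: Dict[str, "_Node"] = {}
        self.upstream: Optional[str] = None


def build_table(routes: Iterable[Tuple[str, str]]) -> _Node:
    """Compile (path_prefix, upstream) pairs into a segment trie."""
    root = _Node()
    for prefix, upstream in routes:
        node = root
        for segment in filter(None, prefix.split("/")):
            node = node.children.setdefault(segment, _Node())
        if node.upstream is None:  # identical prefixes: first definition wins
            node.upstream = upstream
    return root


def match(table: _Node, path: str) -> Optional[str]:
    """Return the upstream of the longest matching prefix, or None."""
    node, best = table, table.upstream
    for segment in filter(None, path.split("/")):
        node = node.children.get(segment)
        if node is None:
            break
        if node.upstream is not None:
            best = node.upstream  # deeper match is more specific
    return best


class Router:
    def __init__(self, routes: Iterable[Tuple[str, str]]) -> None:
        self._table = build_table(routes)

    def reload(self, routes: Iterable[Tuple[str, str]]) -> None:
        new_table = build_table(routes)  # build fully off to the side
        self._table = new_table          # then swap one reference

    def route(self, path: str) -> Optional[str]:
        return match(self._table, path)  # reads the reference once
```

In-flight lookups keep walking the old table while new requests see the new one, which is the point of swapping rather than mutating. The sketch assumes CPython, where rebinding an attribute is a single atomic step.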

Authentication and Authorization at the Edge

Two patterns dominate:

Token introspection — the gateway calls the auth service for every request. Strong consistency on revocation, but adds a hop and a hard dependency.
Self-contained tokens (JWT) — the gateway verifies the signature locally against the IdP's JWKS. Fast and cacheable, but revocation is delayed until the token TTL expires (mitigate with a short TTL, 5–15 min, plus a Redis revocation list checked on each request).
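The local-verification path (signature, expiry, then revocation list) might look like the sketch below. Two simplifications are assumptions made to keep it standard-library only: HMAC-SHA256 (HS256) stands in for the asymmetric signature check a gateway would do against JWKS keys, and a plain Python set stands in for the Redis revocation list. The function names are hypothetical.

```python
import base64
import hashlib
import hmac
import json
import time
from typing import Optional, Set


def _b64url_decode(part: str) -> bytes:
    return base64.urlsafe_b64decode(part + "=" * (-len(part) % 4))


def _b64url_encode(raw: bytes) -> str:
    return base64.urlsafe_b64encode(raw).rstrip(b"=").decode()


def sign_hs256(claims: dict, key: bytes) -> str:
    """Helper for the example: mint an HS256 token."""
    header = _b64url_encode(json.dumps({"alg": "HS256", "typ": "JWT"}).encode())
    payload = _b64url_encode(json.dumps(claims).encode())
    sig = hmac.new(key, f"{header}.{payload}".encode(), hashlib.sha256).digest()
    return f"{header}.{payload}.{_b64url_encode(sig)}"


def verify(token: str, key: bytes, revoked_jti: Set[str],
           now: Optional[float] = None) -> Optional[dict]:
    """Return the claims if authentic, unexpired and not revoked; else None."""
    now = time.time() if now is None else now
    try:
        header_b64, payload_b64, sig_b64 = token.split(".")
        header = json.loads(_b64url_decode(header_b64))
        # Pin the algorithm; never let the token header choose it.
        if not isinstance(header, dict) or header.get("alg") != "HS256":
            return None
        expected = hmac.new(key, f"{header_b64}.{payload_b64}".encode(),
                            hashlib.sha256).digest()
        if not hmac.compare_digest(expected, _b64url_decode(sig_b64)):
            return None
        claims = json.loads(_b64url_decode(payload_b64))
    except (ValueError, TypeError):
        return None  # malformed token
    if not isinstance(claims, dict):
        return None
    exp = claims.get("exp")
    if not isinstance(exp, (int, float)) or exp <= now:
        return None  # missing or expired TTL
    if claims.get("jti") in revoked_jti:
        return None  # revoked before natural expiry
    return claims
```

Revocation entries only need to live as long as the token they block, so with a 5–15 min TTL the revocation list stays small.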

For partner / B2B APIs, prefer mTLS + signed request body (AWS SigV4 style). For browser clients, OAuth2 + PKCE with the gateway as the resource server. See JWT and OAuth 2.0 for the full token lifecycle.

Authorization at the gateway is coarse-grained: can this consumer reach this route at all? Fine-grained ("can this user delete this order?") belongs in the service. Crossing that line turns the gateway into a policy engine that nobody can change without a release.

Rate Limiting per Consumer

Three levels typically stack: global (a 5 M req/s ceiling protects the cluster), per-route (search is expensive, so cap it at 50 K/s), and per-consumer (free tier 100 req/min, paid 10 K/min). Implement with token buckets in Redis using atomic Lua scripts; key = (consumer_id, route, window). The decision must be sub-millisecond: keep it to one in-memory check or one same-zone Redis round trip, and do not call out to a separate rate-limit service on each request. See rate limiter for algorithm choice.

Return 429 Too Many Requests with Retry-After and X-RateLimit-Remaining headers. Without those headers, clients have to guess at backoff, and their uncoordinated retries make the overload worse.
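A per-consumer token bucket that also produces the 429 headers could be sketched like this. It is an in-process illustration under stated assumptions: the bucket state lives in a Python dict rather than Redis, the key is just (consumer_id, route), and `TokenBucketLimiter` and its parameters are hypothetical names. A Redis Lua script would run the same refill-then-take arithmetic atomically on the server.

```python
import math
import time
from dataclasses import dataclass
from typing import Callable, Dict, Tuple


@dataclass
class Bucket:
    tokens: float
    updated: float


class TokenBucketLimiter:
    def __init__(self, capacity: int, refill_per_s: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.capacity = capacity
        self.refill_per_s = refill_per_s
        self.clock = clock
        self.buckets: Dict[Tuple[str, str], Bucket] = {}

    def check(self, consumer_id: str, route: str) -> Tuple[int, Dict[str, str]]:
        """Return (HTTP status, rate-limit headers) for one request."""
        now = self.clock()
        b = self.buckets.setdefault((consumer_id, route),
                                    Bucket(float(self.capacity), now))
        # Refill for the elapsed time, capped at the bucket capacity.
        b.tokens = min(self.capacity,
                       b.tokens + (now - b.updated) * self.refill_per_s)
        b.updated = now
        if b.tokens >= 1:
            b.tokens -= 1
            return 200, {"X-RateLimit-Remaining": str(int(b.tokens))}
        # Seconds until one whole token is available again.
        wait = (1 - b.tokens) / self.refill_per_s
        return 429, {"Retry-After": str(math.ceil(wait)),
                     "X-RateLimit-Remaining": "0"}
```

Under these assumptions the free tier from the text (100 req/min) would be `TokenBucketLimiter(capacity=100, refill_per_s=100 / 60)`. The class is not thread-safe as written.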

Response Caching

The gateway is a natural cache layer for idempotent GETs: /products/42, /users/me/profile. Honor Cache-Control from the upstream; expose stale-while-revalidate for read-heavy hot keys. Vary on auth so user A does not get user B's personalized response. For unauthenticated catalog data, a 60-s TTL can shed 90% of upstream load.

Caching at the gateway is cheap to add and expensive to remove: once teams rely on it, taking it away can knock over an upstream that was never sized for the uncached load. Make TTLs explicit per route, not implicit.
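The two rules that matter most here, vary the cache key on auth and make the TTL an explicit per-route setting, can be sketched as below. This is a simplified in-memory illustration: it does not parse Cache-Control or implement stale-while-revalidate, and `ResponseCache`, `cache_key`, and the `route_ttl_s` table are hypothetical names.

```python
import hashlib
import time
from typing import Callable, Dict, Optional, Tuple


def cache_key(method: str, path: str, authorization: Optional[str]) -> str:
    # Hash the credential so user A and user B never share an entry
    # and raw tokens are not stored as keys.
    auth_part = hashlib.sha256((authorization or "").encode()).hexdigest()
    return f"{method}:{path}:{auth_part}"


class ResponseCache:
    def __init__(self, route_ttl_s: Dict[str, float],
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._route_ttl_s = route_ttl_s  # routes not listed are never cached
        self._clock = clock
        self._store: Dict[str, Tuple[float, bytes]] = {}

    def fetch(self, method: str, path: str, authorization: Optional[str],
              route: str, upstream: Callable[[], bytes]) -> bytes:
        ttl = self._route_ttl_s.get(route)
        if method != "GET" or ttl is None:
            return upstream()  # only idempotent GETs on opted-in routes
        key = cache_key(method, path, authorization)
        now = self._clock()
        hit = self._store.get(key)
        if hit is not None and hit[0] > now:
            return hit[1]  # fresh entry
        body = upstream()
        self._store[key] = (now + ttl, body)
        return body
```

Because a route with no TTL entry is simply passed through, removing caching for one route is a config change that is visible in review rather than a hidden default.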

Request and Response Transformation

The gateway can rewrite headers, strip internal-only fields, version-shim old clients, and translate protocols (REST ↔ gRPC). This is the most addictive feature: every business-logic gap "can be fixed quickly at the gateway." Resist it. Transformations should be:

Stateless — no DB lookup mid-flight.
Declarative — YAML/JSON config, not Lua/JavaScript code, for the 80% case.
Limited — header rewrites, field-strip on response, compression. Anything more belongs in a service.
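As an illustration of the three constraints above, a transformation layer can be a pair of pure functions driven by config. The config shape and the names `TRANSFORM`, `transform_request`, and `transform_response` are hypothetical; a real gateway would load the equivalent from its own YAML or JSON schema.

```python
from typing import Any, Dict

# Hypothetical per-route config, as it might look after loading YAML/JSON.
TRANSFORM: Dict[str, Any] = {
    "request": {
        "set_headers": {"X-Forwarded-Proto": "https"},
        "remove_headers": ["X-Debug"],
    },
    "response": {"strip_fields": ["internal_cost", "shard_id"]},
}


def transform_request(headers: Dict[str, str], cfg: Dict[str, Any]) -> Dict[str, str]:
    """Drop and set headers; header names are compared case-insensitively."""
    rules = cfg.get("request", {})
    drop = {name.lower() for name in rules.get("remove_headers", [])}
    out = {k: v for k, v in headers.items() if k.lower() not in drop}
    out.update(rules.get("set_headers", {}))
    return out


def transform_response(body: Dict[str, Any], cfg: Dict[str, Any]) -> Dict[str, Any]:
    """Strip internal-only top-level fields from a JSON object body."""
    strip = set(cfg.get("response", {}).get("strip_fields", []))
    return {k: v for k, v in body.items() if k not in strip}
```

Both functions depend only on their inputs and the config, so they stay stateless and can be tested without a running gateway.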

The BFF Pattern (Backend-for-Frontend)

A single API gateway optimized for one consumer (web, iOS, Android) is a BFF. Each frontend gets a tailored API that aggregates 5 microservices into the one screen the user sees, returning exactly the fields and shapes that frontend needs. The BFF is owned by the frontend team, deploys with the frontend release cadence, and has a tighter feedback loop than a generic gateway.

BFF vs generic gateway: BFFs do composition (fan-out + assemble); generic gateways do routing (one-in, one-out). Most large systems run both: a generic gateway at the edge and per-channel BFFs behind it. GraphQL is one popular implementation of the BFF idea.
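The fan-out-and-assemble shape of a BFF endpoint can be sketched with asyncio. The three fetchers, the `order_screen` function, the field names, and the 0.5 s timeout are all hypothetical; the point is only that the calls run concurrently and the response carries just the fields one screen needs.

```python
import asyncio
from typing import Any, Awaitable, Callable, Dict

Fetch = Callable[[str], Awaitable[Dict[str, Any]]]


async def order_screen(order_id: str, fetch_order: Fetch, fetch_shipping: Fetch,
                       fetch_payment: Fetch, timeout_s: float = 0.5) -> Dict[str, Any]:
    """Fan out to three services concurrently and assemble one screen payload."""
    order, shipping, payment = await asyncio.gather(
        asyncio.wait_for(fetch_order(order_id), timeout_s),
        asyncio.wait_for(fetch_shipping(order_id), timeout_s),
        asyncio.wait_for(fetch_payment(order_id), timeout_s),
        return_exceptions=True,  # collect failures instead of aborting the fan-out
    )
    if isinstance(order, Exception):
        raise order  # the core resource is required
    return {
        "id": order["id"],
        "total": order["total"],
        # Optional sections degrade to None if their service failed or timed out.
        "eta": None if isinstance(shipping, Exception) else shipping["eta"],
        "paid": None if isinstance(payment, Exception)
                else payment["status"] == "captured",
    }
```

Deciding which sections are required and which may degrade is a frontend concern, which is one reason the BFF sits with the frontend team.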

Kong, Tyk, AWS API Gateway, Cloudflare Workers, Envoy

Kong — OpenResty/NGINX-based, Lua plugins, hybrid mode (control plane + data plane). Strong plugin ecosystem; the open-source version covers 80% of needs.
Tyk — Go-based, lighter, native multi-tenant. Good when you need analytics out of the box.
AWS API Gateway — managed REST/HTTP/WebSocket. REST tier has rich features (request validation, AWS_IAM auth, caching); HTTP tier is cheaper and lower-latency. Tight Lambda integration is the killer feature; the lock-in is the cost.
Cloudflare Workers — not a gateway in the traditional sense, but the V8-isolate model gives you a programmable edge layer whose cold starts take milliseconds rather than the hundreds of milliseconds typical of container-based functions. Best for global anycast and edge auth, weaker as a replacement for a traditional service mesh.
Envoy — the building block under many meshes and gateways (Istio, AWS App Mesh, Solo Gloo). Choose Envoy directly if you want the substrate without the opinionated wrapping.

Failure Modes

Gateway becomes the SPOF — deploy at least 3 instances behind a TCP load balancer; health-check to drain unhealthy nodes within 5 s.
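JWKS fetch storm on key rotation — cache JWKS aggressively (24 h). When a token arrives with an unknown key ID, trigger one rate-limited refresh and negative-cache that key ID so repeated misses do not each hit the IdP. Refresh asynchronously, never blocking a request.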
Slow upstream blocks all consumers — per-upstream connection pools and circuit breakers; fail-fast on the slow one rather than queue all consumers behind it.
Logs as the bottleneck — gateway access logs at 500 K RPS produce terabytes per day. Sample them, emit structured records, and ship them asynchronously to Elasticsearch or S3 + Athena.
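The circuit-breaker mitigation listed above for slow upstreams can be sketched per upstream as follows. It is a minimal single-threaded illustration with hypothetical names and thresholds; production breakers usually track a rolling error rate and need locking.

```python
import time
from typing import Any, Callable, Optional


class CircuitOpen(Exception):
    """Raised immediately while the upstream is considered unhealthy."""


class CircuitBreaker:
    def __init__(self, failure_threshold: int = 5, reset_after_s: float = 10.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._threshold = failure_threshold
        self._reset_after_s = reset_after_s
        self._clock = clock
        self._failures = 0
        self._opened_at: Optional[float] = None

    def call(self, fn: Callable[[], Any]) -> Any:
        if self._opened_at is not None:
            if self._clock() - self._opened_at < self._reset_after_s:
                raise CircuitOpen("upstream circuit open")  # fail fast, no queueing
            # Otherwise half-open: let this one trial request through.
        try:
            result = fn()
        except Exception:
            self._failures += 1
            # Open on reaching the threshold, or re-open after a failed trial.
            if self._failures >= self._threshold or self._opened_at is not None:
                self._opened_at = self._clock()
            raise
        self._failures = 0
        self._opened_at = None  # success closes the circuit
        return result
```

One breaker per upstream, paired with a per-upstream connection pool, keeps a single slow service from consuming the capacity that other consumers need.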

FAQ

Gateway vs service mesh?

Gateway = north-south (clients to services). Service mesh = east-west (service to service). They overlap on auth and rate limiting; you usually want both, integrated through a shared identity (SPIFFE) and observability stack.

Should auth happen at the gateway or in the service?

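Authentication (who) at the gateway, along with the coarse check of whether this consumer may call this route at all. Fine-grained authorization (what they can do to which resource) in the service, with the user identity passed through as a verified header.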

Single gateway or one-per-team?

Single tier of generic gateways at the edge for shared concerns (TLS, DDoS, identity). BFFs per channel/team for composition. Avoid one gateway per microservice — you reinvent the network.

How do you do canary on an API gateway?

Header-based routing with weighted traffic split: 1% → new version, 99% → stable. Promote on success metric (error rate, p99). Roll back on threshold breach.
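One way to implement the weighted split is to hash a stable consumer identifier into a bucket, so a given consumer sees the same version on every request. The sketch below assumes that approach; `pick_version`, the 10,000-bucket granularity, and the override header value are hypothetical choices.

```python
import hashlib
from typing import Optional


def pick_version(consumer_id: str, canary_percent: float,
                 force_header: Optional[str] = None) -> str:
    """Return "canary" or "stable" for this consumer.

    force_header models an explicit override header (for example from a
    tester); any other value falls through to the weighted split.
    """
    if force_header in ("canary", "stable"):
        return force_header
    digest = hashlib.sha256(consumer_id.encode()).digest()
    bucket = int.from_bytes(digest[:8], "big") % 10_000  # 0..9999
    # canary_percent=1 sends buckets 0..99 to the canary.
    return "canary" if bucket < canary_percent * 100 else "stable"
```

Because the bucket for a consumer never changes, raising the percentage from 1 to 5 only adds consumers to the canary; nobody already on the new version is flipped back during promotion.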