Architecture

[Diagram: Web / mobile clients → API gateway / BFF (auth, A/B, throttle) → services (Catalog, Search, Cart, Inventory, Recommend, Order, Payment, Fulfillment, Shipping, Notification) → event bus (Kafka): order, inventory, shipment]

Capacity Estimation

Metric                  Value     Notes
Catalog products        ~500 M    Amazon-scale
Searches/s (peak)       ~500 K    Cyber Monday
Orders/s (peak)         ~5 K      10× daily peak
Cart updates/s          ~50 K     read-heavy
Inventory checks/s      ~100 K    cart + checkout
Catalog change rate     ~10 K/s   price + stock updates
Order DB size / yr      ~10 TB    1 KB/order × 10 B/yr

Catalog

Products live in the catalog: SKU, title, description, images, price, attributes (size, color), category. Storage:

  • Source of truth: relational DB (Postgres / MySQL) per merchant or per product family. Strong consistency for price and inventory lookups.
  • Read-heavy projection: denormalized JSON in DynamoDB with a Redis cache in front, keyed by SKU. Single-digit-millisecond reads from DynamoDB and sub-millisecond reads from Redis on the product page.
  • Search index: Elasticsearch rebuilt from the source via change-data-capture; eventually consistent, fine for search.

Price changes propagate on a fast path, so the price in the cart, on the order, and on the invoice can disagree if the cart keeps displaying the live catalog price after the user has added the item. Best practice: snapshot the price into the cart at add-time, and show a "price changed" alert if it shifts before checkout.
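
A minimal sketch of the snapshot-and-alert idea, assuming an in-memory cart keyed by SKU. The names CartLine, add_to_cart, and price_changes are hypothetical, not part of any particular platform:

```python
from dataclasses import dataclass
from decimal import Decimal


@dataclass
class CartLine:
    sku: str
    qty: int
    snapshot_price: Decimal  # price captured when the item was first added


def add_to_cart(cart: dict[str, CartLine], sku: str, qty: int,
                catalog_price: Decimal) -> None:
    """Snapshot the current catalog price into the cart line at add-time."""
    if sku in cart:
        cart[sku].qty += qty  # existing line keeps its original snapshot
    else:
        cart[sku] = CartLine(sku, qty, catalog_price)


def price_changes(cart: dict[str, CartLine],
                  current_prices: dict[str, Decimal]
                  ) -> dict[str, tuple[Decimal, Decimal]]:
    """Return {sku: (snapshot, current)} for every line whose price has shifted."""
    changed = {}
    for sku, line in cart.items():
        current = current_prices.get(sku)
        if current is not None and current != line.snapshot_price:
            changed[sku] = (line.snapshot_price, current)
    return changed
```

A non-empty result from price_changes before checkout is the cue to show the "price changed" alert; whether the customer then pays the old or the new price is a business decision this sketch does not make.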

Search

Full-text search on titles + descriptions, faceted filters (brand, price range, category), sorted by relevance + business signal (popularity, sponsored). Elasticsearch is the canonical implementation; modern shops are exploring vector search for "semantically similar products."

  • Index pipeline: catalog DB → CDC → Kafka → ES indexer. Latency target: < 30 s from price change to search-visible.
  • Ranking: BM25 base + learning-to-rank model on top using user signals (click, add-to-cart, purchase).
  • Multi-tenant: separate ES indices per merchant or per locale; reduces cross-tenant noise.
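
As an illustration of full-text search with faceted filters, the request body sent to Elasticsearch might be built roughly like this. The field names (title, description, brand, price, category) and the helper build_search_query are assumptions about the index mapping, not something the pipeline above prescribes:

```python
from typing import Optional


def build_search_query(text: str, brand: Optional[str] = None,
                       max_price: Optional[float] = None) -> dict:
    """Build an Elasticsearch Query DSL body: scored full-text plus unscored filters."""
    filters = []
    if brand is not None:
        filters.append({"term": {"brand": brand}})
    if max_price is not None:
        filters.append({"range": {"price": {"lte": max_price}}})
    return {
        "query": {
            "bool": {
                # multi_match is scored (BM25 is the default similarity);
                # the ^2 boost weights title matches twice as heavily
                "must": [{"multi_match": {"query": text,
                                          "fields": ["title^2", "description"]}}],
                # filter clauses narrow the result set without affecting the score
                "filter": filters,
            }
        },
        # terms aggregations supply the facet counts shown beside the results
        "aggs": {
            "brands": {"terms": {"field": "brand"}},
            "categories": {"terms": {"field": "category"}},
        },
    }
```

Keeping facets in filter rather than must lets Elasticsearch cache them and keeps relevance scoring driven by the text match alone.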

Recommendations

Three layers:

  • Co-purchase — "users who bought X also bought Y". Computed offline via Spark; served from a key-value cache.
  • Personalized — collaborative filtering on the user's history; matrix factorization or two-tower neural net.
  • Real-time — "recently viewed" / "back in stock"; session-scoped, pulled from a Redis stream.

The product page composes them: 70% relevance score, 30% business signal (margin, inventory pressure). Test relentlessly; the rec engine drives ~30% of revenue.
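
One way to read the 70/30 split is as a weighted sum of two normalized scores. The sketch below assumes both inputs are already scaled to [0, 1]; how relevance and business signal are actually normalized is not specified here, and the function names are hypothetical:

```python
RELEVANCE_WEIGHT = 0.7
BUSINESS_WEIGHT = 0.3


def blended_score(relevance: float, business: float) -> float:
    """Weighted blend; both inputs are assumed pre-normalized to [0, 1]."""
    return RELEVANCE_WEIGHT * relevance + BUSINESS_WEIGHT * business


def rank(candidates: list[tuple[str, float, float]]) -> list[str]:
    """Order (sku, relevance, business) candidates by blended score, best first."""
    ordered = sorted(candidates,
                     key=lambda c: blended_score(c[1], c[2]),
                     reverse=True)
    return [sku for sku, _, _ in ordered]
```

Keeping the weights as named constants makes them easy to vary per experiment arm when A/B testing the blend.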

Cart

Cart is per-user state: (user_id, sku, qty, added_at, snapshot_price). Storage:

  • Authenticated user: persistent in DynamoDB, keyed by user_id. Survives device switch.
  • Anonymous user: cookie-bound cart_id; merge into the user's cart on login.
  • Long-tail abandoned cart: TTL out at 30 days; trigger reminder email (notification system).
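
The merge-on-login step could look like the following sketch. The policy shown (sum quantities per SKU, capped at a per-SKU maximum) is one assumption among several reasonable ones; the function name and the cap of 99 are hypothetical:

```python
def merge_carts(user_cart: dict[str, int], anon_cart: dict[str, int],
                max_qty_per_sku: int = 99) -> dict[str, int]:
    """Merge an anonymous cart into the user's cart, summing quantities per SKU."""
    merged = dict(user_cart)  # copy so the stored cart is not mutated in place
    for sku, qty in anon_cart.items():
        merged[sku] = min(merged.get(sku, 0) + qty, max_qty_per_sku)
    return merged
```

After a successful merge the anonymous cart_id should be retired so that a repeated login does not add the same items twice.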

The cart is also where business rules apply: minimum order, gift wrap fees, promo codes. Prefer a separate "cart engine" service over scattering rules across UI; testing is harder when business logic is in the React app.

Inventory and Reservations

Overselling is the cardinal sin. Two strategies:

  • Optimistic: read available quantity at checkout; transactionally decrement on order placement. Cheap, but allows brief over-promise during a flash sale (multiple checkout flows succeed before one decrements).
  • Reservation/hold: at add-to-cart or at checkout-start, decrement available and create a hold with TTL (10–30 min). On order success, convert hold to commitment. On TTL expiry, release the hold back to available.

Reservation prevents overselling but requires reliable TTL handling (a lost release ties up inventory forever). Implement TTL via a database expiration column plus a sweeper job, not a Redis TTL alone: the sweeper works from durable rows and can be retried, whereas Redis expiry is best-effort.
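
The hold lifecycle (reserve, commit, sweep) can be sketched as below. This is an in-memory stand-in: in a real system each method would be a database transaction over an inventory row and a holds table with an expires_at column. The class and method names are hypothetical, and the clock is injectable purely to make the behavior easy to test:

```python
import time
import uuid
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Inventory:
    available: dict[str, int]
    # hold_id -> (sku, qty, expires_at); stands in for a durable holds table
    holds: dict[str, tuple[str, int, float]] = field(default_factory=dict)

    def reserve(self, sku: str, qty: int, ttl_s: float = 900.0,
                now: Optional[float] = None) -> Optional[str]:
        """Decrement available stock and record a hold; None if stock is short."""
        now = time.time() if now is None else now
        if self.available.get(sku, 0) < qty:
            return None
        self.available[sku] -= qty
        hold_id = uuid.uuid4().hex
        self.holds[hold_id] = (sku, qty, now + ttl_s)
        return hold_id

    def commit(self, hold_id: str) -> bool:
        """Convert a hold to a commitment: stock stays decremented, hold is removed."""
        return self.holds.pop(hold_id, None) is not None

    def sweep(self, now: Optional[float] = None) -> int:
        """Release every expired hold back to available; returns how many were released."""
        now = time.time() if now is None else now
        expired = [h for h, (_, _, exp) in self.holds.items() if exp <= now]
        for hold_id in expired:
            sku, qty, _ = self.holds.pop(hold_id)
            self.available[sku] += qty
        return len(expired)
```

Because commit returns False for a hold that has already been swept, the checkout flow can detect a timed-out reservation and re-reserve before charging.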

Checkout and Payment

The checkout endpoint is a saga:

  1. Verify cart, prices, addresses.
  2. Reserve inventory (if not already).
  3. Create draft order in DB with status pending_payment.
  4. Charge payment via the payment system with an idempotency key derived from order_id.
  5. On payment success, transition order to confirmed and convert inventory holds to commitments.
  6. Emit OrderConfirmed event to fulfillment.
  7. On any step failure: compensate (release inventory, refund payment, notify customer).

Run via Temporal / Step Functions for durability across crashes and retries. Idempotency keys at every external boundary are non-negotiable.
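
To make the compensation logic concrete, here is an in-process sketch of a saga runner. It is not how Temporal or Step Functions are used and provides none of their durability; run_saga, the Step tuple, and the key format in payment_idempotency_key are all hypothetical:

```python
from typing import Callable

# (name, action, compensation); the compensation undoes a completed action
Step = tuple[str, Callable[[], None], Callable[[], None]]


def run_saga(steps: list[Step]) -> tuple[bool, list[str]]:
    """Run steps in order; on failure, compensate completed steps in reverse order."""
    log: list[str] = []
    done: list[Step] = []
    for step in steps:
        name, action, _ = step
        try:
            action()
        except Exception:
            log.append(f"failed:{name}")
            for prev_name, _, compensate in reversed(done):
                compensate()
                log.append(f"compensated:{prev_name}")
            return False, log
        done.append(step)
        log.append(f"done:{name}")
    return True, log


def payment_idempotency_key(order_id: str) -> str:
    """Derive a stable key so a retried charge for the same order is deduplicated."""
    return f"charge:{order_id}"
```

Only completed steps are compensated, so a failed charge releases the inventory hold and cancels the draft order but never issues a refund for money that was not taken.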

Order Management

The order is the source-of-truth ledger of a transaction. Schema: (order_id, user_id, status, line_items[], shipping_address, billing_address, payment_ref, created_at). Status state machine: pending_payment → confirmed → fulfilling → shipped → delivered, with the off-path states cancelled, refunded, and returned.
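
The status state machine can be enforced with a small transition table. The happy path below comes from the text; which states may move to cancelled, refunded, or returned is an assumption made for illustration:

```python
# Allowed next states per current state. The off-path edges are assumptions.
ALLOWED: dict[str, set[str]] = {
    "pending_payment": {"confirmed", "cancelled"},
    "confirmed": {"fulfilling", "cancelled", "refunded"},
    "fulfilling": {"shipped", "cancelled"},
    "shipped": {"delivered", "returned"},
    "delivered": {"returned"},
    "returned": {"refunded"},
    "cancelled": set(),
    "refunded": set(),
}


def transition(current: str, new: str) -> str:
    """Return the new status, or raise if the move is not in the table."""
    if new not in ALLOWED.get(current, set()):
        raise ValueError(f"illegal transition {current} -> {new}")
    return new
```

Routing every status write through a single function like this keeps out-of-order carrier webhooks or retried saga steps from moving an order backwards.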

Storage: per-shard Postgres (sharded by user_id) for OLTP; CDC to a data warehouse (BigQuery / Snowflake) for analytics. Order history view is the user-facing read.

Shipping Integration

Carriers (FedEx, UPS, DHL) expose APIs for label generation, rate quotes, tracking. Patterns:

  • Multi-carrier abstraction — ShipEngine, Shippo, EasyPost, or in-house. Shop-around price comparison.
  • Async tracking — webhook from carrier on status change; project into the order's status.
  • Failure handling — lost packages, returns, address corrections. Each is an order state transition with its own ledger entries.
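
A multi-carrier abstraction with price comparison might be shaped like this sketch. The Carrier protocol, the Quote fields, and the FlatRateCarrier stub are hypothetical; a real adapter would call the carrier's or aggregator's rate API instead of computing a flat rate:

```python
from dataclasses import dataclass
from decimal import Decimal
from typing import Optional, Protocol


@dataclass(frozen=True)
class Quote:
    carrier: str
    price: Decimal
    days: int  # estimated transit time


class Carrier(Protocol):
    def quote(self, weight_kg: float, dest_postcode: str) -> Quote: ...


class FlatRateCarrier:
    """Stub adapter used for illustration only."""

    def __init__(self, name: str, base: Decimal, per_kg: Decimal, days: int):
        self.name, self.base, self.per_kg, self.days = name, base, per_kg, days

    def quote(self, weight_kg: float, dest_postcode: str) -> Quote:
        price = self.base + self.per_kg * Decimal(str(weight_kg))
        return Quote(self.name, price, self.days)


def cheapest(carriers: list[Carrier], weight_kg: float, dest_postcode: str,
             max_days: int) -> Optional[Quote]:
    """Pick the lowest-priced quote that meets the delivery promise, if any."""
    quotes = [c.quote(weight_kg, dest_postcode) for c in carriers]
    eligible = [q for q in quotes if q.days <= max_days]
    return min(eligible, key=lambda q: q.price) if eligible else None
```

Because callers depend only on the Carrier protocol, swapping an in-house adapter for an aggregator does not touch the checkout code.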

Failure Modes

  • Inventory drift — physical warehouse stock diverges from the system. Daily reconciliation job; alert on > 1% delta.
  • Payment-charged-no-order — payment succeeds but the order DB write (the transition to confirmed) fails. The saga must retry the write or compensate with a refund. Without this, the customer pays for nothing.
  • Hot SKU — flash sale on one product; one DB row gets all the inventory writes. Pre-shard inventory across N counters; aggregate at query time.
  • Catalog-search divergence — search shows a product that the catalog has retired. Mark search results that 404 in catalog as stale; periodically reconcile.
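
The hot-SKU mitigation (pre-sharding inventory across N counters) can be sketched as follows. This is an in-memory illustration in which each list element stands in for a separate database row; the class name and the random-start probing order are assumptions:

```python
import random


class ShardedStock:
    """Stock for one hot SKU split across N counters to spread write contention."""

    def __init__(self, total: int, shards: int):
        base, extra = divmod(total, shards)
        # distribute the remainder so the counters sum exactly to total
        self.counters = [base + (1 if i < extra else 0) for i in range(shards)]

    def available(self) -> int:
        """Aggregate at query time by summing every shard."""
        return sum(self.counters)

    def decrement(self, qty: int = 1) -> bool:
        """Start at a random shard, then probe the rest, so writes spread out."""
        n = len(self.counters)
        start = random.randrange(n)
        for offset in range(n):
            i = (start + offset) % n
            if self.counters[i] >= qty:
                self.counters[i] -= qty
                return True
        return False
```

One limitation of this simple version: a request for more units than any single shard holds fails even when the total would cover it, so multi-unit orders would need to draw from several shards in one transaction.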

FAQ

Monolith or microservices?

Most successful e-commerce platforms started monolithic and split as scale demanded. The natural service boundaries are catalog/search/cart/order/payment; do not split further than that without a reason.

How do you handle multi-currency?

Per-locale price tables; FX rates updated daily. Show the user's local currency consistently from catalog through invoice. Settle in the merchant's currency at checkout.

What about flash sales / drops?

Pre-warm cache, queue requests, randomly distribute spots to prevent dogpile. Use a virtual-queue pattern (Shopify's "checkout throttle") to admit users to the cart in batches.
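
A rough sketch of the virtual-queue idea: everyone who arrives before the drop is shuffled into a line, then admitted in fixed-size batches. This is not Shopify's implementation; the VirtualQueue class and its methods are hypothetical, and a production version would keep the line in a shared store rather than in process memory:

```python
import random
from collections import deque
from typing import Optional


class VirtualQueue:
    def __init__(self, batch_size: int, seed: Optional[int] = None):
        self.batch_size = batch_size
        self._waiting: list[str] = []      # arrivals before the drop opens
        self._line: deque[str] = deque()   # randomized admission order
        self._rng = random.Random(seed)

    def join(self, user_id: str) -> None:
        self._waiting.append(user_id)

    def open(self) -> None:
        """Shuffle early arrivals so arrival order gives no advantage."""
        self._rng.shuffle(self._waiting)
        self._line.extend(self._waiting)
        self._waiting.clear()

    def admit_batch(self) -> list[str]:
        """Admit up to batch_size users to the cart; call once per admission tick."""
        n = min(self.batch_size, len(self._line))
        return [self._line.popleft() for _ in range(n)]
```

The batch size and tick interval together cap the request rate that reaches cart and inventory, which is what protects the hot SKU's rows.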

How do reviews + Q&A fit?

Separate microservice with its own DB. Tightly coupled to the product page (for display) and search (for facet filters: "4+ stars only").