Design an E-commerce Platform (Amazon-style)
An e-commerce platform is not one system but a confederation: catalog, search, recommendations, cart, checkout, inventory, order management, payment, shipping. Each is a microservice with its own scale profile, failure model, and consistency guarantee. The interesting interview discussions are which boundaries to draw between them, how inventory holds protect against overselling, and how the order saga survives partial failure across payment + warehouse + shipping.
Architecture
Capacity Estimation
| Metric | Value | Notes |
|---|---|---|
| Catalog products | ~500 M | Amazon-scale |
| Searches/s peak | ~500 K | Cyber Monday |
| Orders/s peak | ~5 K | 10× daily peak |
| Cart updates/s | ~50 K | read-heavy |
| Inventory checks/s | ~100 K | cart + checkout |
| Catalog change rate | ~10 K/s | price + stock updates |
| Order DB size / yr | ~10 TB | 1 KB/order × ~10 B orders/yr |
Catalog
Products live in the catalog: SKU, title, description, images, price, attributes (size, color), category. Storage:
- Source of truth: relational DB (Postgres / MySQL) per merchant or per product family. Strong consistency for price and inventory lookups.
- Read-heavy projection: denormalized JSON in DynamoDB / Redis cache, keyed by SKU. Sub-millisecond reads on the product page.
- Search index: Elasticsearch rebuilt from the source via change-data-capture; eventually consistent, fine for search.
Price changes are fast-path, which creates a consistency problem: the price in the cart, in the order, and on the invoice can disagree if you keep displaying the live catalog price after the user has added the item to the cart. Best practice: snapshot the price into the cart at add-time; show a "price changed" alert if it shifts before checkout.
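The snapshot-at-add-time rule can be sketched as follows. This is a minimal in-memory illustration, not a production design: `CartLine`, `add_to_cart`, and `price_changes` are hypothetical names, and the catalog is assumed to be a plain dict from SKU to current price.

```python
from dataclasses import dataclass
from decimal import Decimal


@dataclass
class CartLine:
    sku: str
    qty: int
    snapshot_price: Decimal  # price captured when the item was added


def add_to_cart(cart: dict, catalog: dict, sku: str, qty: int) -> None:
    """Snapshot the current catalog price into the cart line at add-time."""
    if sku in cart:
        cart[sku].qty += qty  # keep the original snapshot for an existing line
    else:
        cart[sku] = CartLine(sku, qty, catalog[sku])


def price_changes(cart: dict, catalog: dict) -> dict:
    """Return {sku: (snapshot_price, current_price)} for lines whose price moved."""
    return {
        sku: (line.snapshot_price, catalog[sku])
        for sku, line in cart.items()
        if catalog[sku] != line.snapshot_price
    }


catalog = {"SKU-1": Decimal("19.99"), "SKU-2": Decimal("5.00")}
cart = {}
add_to_cart(cart, catalog, "SKU-1", 2)
add_to_cart(cart, catalog, "SKU-2", 1)
catalog["SKU-1"] = Decimal("21.49")  # price shifts before checkout
changed = price_changes(cart, catalog)  # drives the "price changed" alert
```

Whether checkout then honors the snapshot or the new price is a business decision; the sketch only detects the difference.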
Search
Full-text search on titles + descriptions, faceted filters (brand, price range, category), sorted by relevance + business signal (popularity, sponsored). Elasticsearch is the canonical implementation; modern shops are exploring vector search for "semantically similar products."
- Index pipeline: catalog DB → CDC → Kafka → ES indexer. Latency target: < 30 s from price change to search-visible.
- Ranking: BM25 base + learned-to-rank model on top using user signal (click, add-to-cart, purchase).
- Multi-tenant: separate ES indices per merchant or per locale; reduces cross-tenant noise.
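A full-text query with faceted filters might look like the request body built below. It uses standard Elasticsearch Query DSL constructs (`bool`, `multi_match`, `term`, `range`, `terms` aggregations), but the field names (`title`, `description`, `brand`, `category`, `price`) and the title boost are assumptions about the index mapping, and the learned-to-rank stage is left out.

```python
def build_search_query(text, brand=None, min_price=None, max_price=None, size=20):
    """Build an Elasticsearch request body: scored text match plus unscored filters."""
    filters = []
    if brand is not None:
        # Assumes `brand` is mapped as a keyword field.
        filters.append({"term": {"brand": brand}})
    if min_price is not None or max_price is not None:
        price_range = {}
        if min_price is not None:
            price_range["gte"] = min_price
        if max_price is not None:
            price_range["lte"] = max_price
        filters.append({"range": {"price": price_range}})
    return {
        "size": size,
        "query": {
            "bool": {
                # `must` contributes to the relevance score (BM25 by default).
                "must": [{"multi_match": {"query": text,
                                          "fields": ["title^2", "description"]}}],
                # `filter` narrows results without affecting the score.
                "filter": filters,
            }
        },
        # Facet counts for the filter sidebar.
        "aggs": {
            "brands": {"terms": {"field": "brand"}},
            "categories": {"terms": {"field": "category"}},
        },
    }


query = build_search_query("running shoes", brand="Acme", max_price=100)
```

The resulting dict would be passed as the body of a search request against the per-merchant or per-locale index.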
Recommendations
Three layers:
- Co-purchase — "users who bought X also bought Y". Computed offline via Spark; served from a key-value cache.
- Personalized — collaborative filtering on the user's history; matrix factorization or two-tower neural net.
- Real-time — "recently viewed" / "back in stock"; session-scoped, pulled from a Redis stream.
The product page composes them: 70% relevance score, 30% business signal (margin, inventory pressure). Test relentlessly; the rec engine drives ~30% of revenue.
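One way to read the 70/30 composition is as a weighted sum of two normalized scores. The sketch below assumes both signals are already scaled to [0, 1]; `blended_score` and `rank` are illustrative names, and a real system would likely tune the weight through experiments.

```python
def blended_score(relevance: float, business: float, relevance_weight: float = 0.7) -> float:
    """Weighted sum: 70% relevance, 30% business signal by default."""
    return relevance_weight * relevance + (1.0 - relevance_weight) * business


def rank(candidates):
    """candidates: iterable of (sku, relevance, business), each signal in [0, 1]."""
    return sorted(candidates, key=lambda c: blended_score(c[1], c[2]), reverse=True)


candidates = [
    ("SKU-A", 0.90, 0.10),  # 0.63 + 0.03 = 0.66
    ("SKU-B", 0.70, 0.90),  # 0.49 + 0.27 = 0.76
    ("SKU-C", 0.40, 0.50),  # 0.28 + 0.15 = 0.43
]
ranked = [sku for sku, _, _ in rank(candidates)]
```

Here the business signal lifts SKU-B above the more relevant SKU-A, which is exactly the kind of trade-off that needs testing.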
Cart
Cart is per-user state: (user_id, sku, qty, added_at, snapshot_price). Storage:
- Authenticated user: persistent in DynamoDB, keyed by user_id. Survives device switch.
- Anonymous user: cookie-bound cart_id; merge into the user's cart on login.
- Long-tail abandoned cart: TTL out at 30 days; trigger reminder email (notification system).
The cart is also where business rules apply: minimum order, gift wrap fees, promo codes. Prefer a separate "cart engine" service over scattering rules across UI; testing is harder when business logic is in the React app.
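The anonymous-to-authenticated merge on login could be sketched like this. The merge policy is an assumption, not something the design above prescribes: quantities are summed, and when both carts hold the same SKU the user's existing line keeps its snapshot price and added_at.

```python
def merge_carts(user_cart: dict, anon_cart: dict) -> dict:
    """Merge an anonymous cart into the user's cart.

    Both carts map sku -> {"qty", "snapshot_price", "added_at"}.
    Assumed policy: sum quantities; keep the user's existing line metadata.
    """
    merged = {sku: dict(line) for sku, line in user_cart.items()}  # copy, do not mutate
    for sku, line in anon_cart.items():
        if sku in merged:
            merged[sku]["qty"] += line["qty"]
        else:
            merged[sku] = dict(line)
    return merged


user_cart = {"SKU-1": {"qty": 1, "snapshot_price": 1999, "added_at": 100}}
anon_cart = {
    "SKU-1": {"qty": 2, "snapshot_price": 2149, "added_at": 200},
    "SKU-2": {"qty": 1, "snapshot_price": 500, "added_at": 210},
}
merged = merge_carts(user_cart, anon_cart)
```

Prices are in minor units (cents) here purely to keep the example short.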
Inventory and Reservations
Overselling is the cardinal sin. Two strategies:
- Optimistic: read available quantity at checkout; transactionally decrement on order placement. Cheap, but allows brief over-promise during a flash sale (multiple checkout flows succeed before one decrements).
- Reservation/hold: at add-to-cart or at checkout-start, decrement available and create a hold with TTL (10–30 min). On order success, convert hold to commitment. On TTL expiry, release the hold back to available.
Reservation prevents overselling but requires reliable TTL handling (a lost release ties up inventory forever). Implement TTL via a database expiration column + sweeper, not a Redis TTL alone — sweeper is durable, Redis is best-effort.
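A minimal sketch of the reservation approach with an expiration column and a sweeper, using an in-memory SQLite database as a stand-in for the real inventory store. The table layout, function names, and the 15-minute TTL are assumptions for illustration; time is passed in explicitly so the behavior is easy to follow.

```python
import sqlite3
import uuid

HOLD_TTL_SECONDS = 15 * 60  # assumed value inside the 10-30 min range

db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE inventory (sku TEXT PRIMARY KEY, available INTEGER NOT NULL);
    CREATE TABLE holds (hold_id TEXT PRIMARY KEY, sku TEXT NOT NULL,
                        qty INTEGER NOT NULL, expires_at REAL NOT NULL);
""")


def reserve(sku, qty, now):
    """Decrement available and record a hold in one transaction; None if short."""
    with db:  # commits on success, rolls back on exception
        cur = db.execute(
            "UPDATE inventory SET available = available - ? "
            "WHERE sku = ? AND available >= ?", (qty, sku, qty))
        if cur.rowcount == 0:
            return None  # not enough stock, nothing was changed
        hold_id = str(uuid.uuid4())
        db.execute("INSERT INTO holds VALUES (?, ?, ?, ?)",
                   (hold_id, sku, qty, now + HOLD_TTL_SECONDS))
        return hold_id


def commit_hold(hold_id):
    """Order succeeded: stock stays decremented, the hold row is removed."""
    with db:
        return db.execute("DELETE FROM holds WHERE hold_id = ?", (hold_id,)).rowcount == 1


def sweep_expired(now):
    """Release every expired hold back to available; returns how many were released."""
    with db:
        expired = db.execute(
            "SELECT hold_id, sku, qty FROM holds WHERE expires_at <= ?", (now,)).fetchall()
        for hold_id, sku, qty in expired:
            db.execute("UPDATE inventory SET available = available + ? WHERE sku = ?", (qty, sku))
            db.execute("DELETE FROM holds WHERE hold_id = ?", (hold_id,))
        return len(expired)


def available(sku):
    return db.execute("SELECT available FROM inventory WHERE sku = ?", (sku,)).fetchone()[0]


db.execute("INSERT INTO inventory VALUES ('SKU-1', 3)")
db.commit()
h1 = reserve("SKU-1", 2, now=0.0)    # available 3 -> 1
h2 = reserve("SKU-1", 2, now=0.0)    # only 1 left, so no hold is created
h3 = reserve("SKU-1", 1, now=100.0)  # available 1 -> 0
commit_hold(h1)                       # order placed; those 2 units stay sold
released = sweep_expired(now=100.0 + HOLD_TTL_SECONDS)  # h3 expired; available 0 -> 1
```

The conditional `UPDATE ... WHERE available >= ?` is what prevents overselling; the sweeper would run on a schedule in a real deployment.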
Checkout and Payment
The checkout endpoint is a saga:
- Verify cart, prices, addresses.
- Reserve inventory (if not already).
- Create draft order in DB with status pending_payment.
- Charge payment via the payment system with an idempotency key derived from order_id.
- On payment success, transition order to confirmed; convert inventory holds to commitments.
- Emit OrderConfirmed event to fulfillment.
- On any step failure: compensate (release inventory, refund payment, notify customer).
Run via Temporal / Step Functions for durability across crashes and retries. Idempotency keys at every external boundary are non-negotiable.
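The compensation logic of the saga can be sketched in a few lines. This is an in-process illustration only: it does not persist progress, which is the part a durable workflow engine provides. `run_saga`, `SagaFailed`, and the step functions are hypothetical, and the steps just append to a log instead of calling real services.

```python
class SagaFailed(Exception):
    pass


def run_saga(steps, ctx):
    """Run (name, action, compensation) steps in order.

    On failure, run the compensations of already completed steps in reverse.
    """
    completed = []
    for name, action, compensate in steps:
        try:
            action(ctx)
        except Exception as exc:
            for _done_name, undo in reversed(completed):
                if undo is not None:
                    undo(ctx)
            raise SagaFailed(f"step {name!r} failed: {exc}") from exc
        completed.append((name, compensate))


log = []
ctx = {"order_id": "ord-123"}


def charge(ctx):
    # Idempotency key derived from order_id, so a retry cannot double-charge.
    ctx["idempotency_key"] = f"charge:{ctx['order_id']}"
    log.append("charge")


def confirm(ctx):
    raise RuntimeError("order DB unavailable")  # simulated failure after payment


steps = [
    ("reserve_inventory", lambda c: log.append("reserve"), lambda c: log.append("release")),
    ("create_draft_order", lambda c: log.append("draft"), lambda c: log.append("cancel")),
    ("charge_payment", charge, lambda c: log.append("refund")),
    ("confirm_order", confirm, None),
]

try:
    run_saga(steps, ctx)
    outcome = "confirmed"
except SagaFailed:
    outcome = "compensated"
```

After the simulated failure the log shows the refund, the order cancellation, and the inventory release, in that order. Compensations must themselves be idempotent, since a crashed saga may replay them.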
Order Management
The order is the source-of-truth ledger of a transaction. Schema: (order_id, user_id, status, line_items[], shipping_address, billing_address, payment_ref, created_at). Status state machine: pending_payment → confirmed → fulfilling → shipped → delivered; alternatives cancelled, refunded, returned.
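The status state machine can be enforced with a transition table. The happy path below follows the sequence above; which states may branch to cancelled, refunded, or returned is an assumption made for the sketch and would depend on business policy.

```python
# Allowed transitions; the branches to cancelled/refunded/returned are assumed.
ALLOWED = {
    "pending_payment": {"confirmed", "cancelled"},
    "confirmed": {"fulfilling", "cancelled", "refunded"},
    "fulfilling": {"shipped", "cancelled"},
    "shipped": {"delivered", "returned"},
    "delivered": {"returned", "refunded"},
    "returned": {"refunded"},
    "cancelled": set(),  # terminal
    "refunded": set(),   # terminal
}


class InvalidTransition(ValueError):
    pass


def transition(current: str, new: str) -> str:
    """Return the new status if the move is allowed, otherwise raise."""
    if new not in ALLOWED.get(current, set()):
        raise InvalidTransition(f"{current} -> {new} is not allowed")
    return new


status = "pending_payment"
for nxt in ["confirmed", "fulfilling", "shipped", "delivered"]:
    status = transition(status, nxt)
final_status = status
```

Rejecting invalid moves at this layer keeps late or duplicated events (for example a repeated carrier webhook) from corrupting the ledger.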
Storage: per-shard Postgres (sharded by user_id) for OLTP; CDC to a data warehouse (BigQuery / Snowflake) for analytics. Order history view is the user-facing read.
Shipping Integration
Carriers (FedEx, UPS, DHL) expose APIs for label generation, rate quotes, tracking. Patterns:
- Multi-carrier abstraction — ShipEngine, Shippo, EasyPost, or in-house. Shop-around price comparison.
- Async tracking — webhook from carrier on status change; project into the order's status.
- Failure handling — lost packages, returns, address corrections. Each is an order state transition with its own ledger entries.
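The multi-carrier abstraction with shop-around pricing might be sketched as below. Everything here is hypothetical: `Carrier`, `Quote`, and `FlatRateCarrier` are illustrative types, the rates are made up, and a real adapter would call the carrier's rate-quote API instead of computing a flat formula.

```python
from dataclasses import dataclass
from decimal import Decimal
from typing import Protocol


@dataclass(frozen=True)
class Quote:
    carrier: str
    price: Decimal
    days: int


class Carrier(Protocol):
    def quote(self, weight_kg: float, dest_postcode: str) -> Quote: ...


class FlatRateCarrier:
    """Stand-in adapter with a base fee plus a per-kg charge."""

    def __init__(self, name: str, base: Decimal, per_kg: Decimal, days: int):
        self.name, self.base, self.per_kg, self.days = name, base, per_kg, days

    def quote(self, weight_kg: float, dest_postcode: str) -> Quote:
        price = self.base + self.per_kg * Decimal(str(weight_kg))
        return Quote(self.name, price, self.days)


def cheapest_quote(carriers, weight_kg, dest_postcode, max_days=None):
    """Ask every carrier for a quote and return the cheapest that meets the deadline."""
    quotes = [c.quote(weight_kg, dest_postcode) for c in carriers]
    if max_days is not None:
        quotes = [q for q in quotes if q.days <= max_days]
    return min(quotes, key=lambda q: q.price) if quotes else None


carriers = [
    FlatRateCarrier("carrier_a", Decimal("5.00"), Decimal("1.00"), days=5),
    FlatRateCarrier("carrier_b", Decimal("8.00"), Decimal("0.50"), days=2),
]
cheapest = cheapest_quote(carriers, 2.0, "94105")             # carrier_a at 7.00
fastest_ok = cheapest_quote(carriers, 2.0, "94105", max_days=3)  # carrier_b at 9.00
```

Keeping the carrier behind a small interface is what lets the platform swap an aggregator for an in-house integration later.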
Failure Modes
- Inventory drift — physical warehouse stock diverges from the system. Daily reconciliation job; alert on > 1% delta.
- Payment-charged-no-order — payment succeeds but order DB write fails. The saga's compensation must refund. Without this, customer pays for nothing.
- Hot SKU — flash sale on one product; one DB row gets all the inventory writes. Pre-shard inventory across N counters; aggregate at query time.
- Catalog-search divergence — search shows a product that the catalog has retired. Mark search results that 404 in catalog as stale; periodically reconcile.
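The hot-SKU mitigation, splitting one inventory counter into N, can be illustrated in memory as follows. `ShardedStock` is a hypothetical class; in a database each counter would be its own row and each decrement a conditional update, so concurrent buyers contend on different rows.

```python
import random


class ShardedStock:
    """Inventory for one SKU split across several counters."""

    def __init__(self, total: int, shards: int, rng=None):
        self.rng = rng or random.Random()
        base, extra = divmod(total, shards)
        # Spread the stock as evenly as possible across the shards.
        self.counters = [base + (1 if i < extra else 0) for i in range(shards)]

    def available(self) -> int:
        """Aggregate at query time by summing all shards."""
        return sum(self.counters)

    def take(self, qty: int = 1) -> bool:
        """Try a random shard first, then the others, so writes spread out."""
        n = len(self.counters)
        start = self.rng.randrange(n)
        for offset in range(n):
            i = (start + offset) % n
            if self.counters[i] >= qty:
                self.counters[i] -= qty
                return True
        return False  # no single shard can satisfy the request


stock = ShardedStock(total=10, shards=4, rng=random.Random(0))
sold = sum(stock.take() for _ in range(15))  # 15 attempts against 10 units
```

One trade-off of this sketch: a request for several units can fail when the remaining stock is scattered across shards, so multi-unit purchases may need to draw from more than one counter.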
FAQ
Monolith or microservices?
Most successful e-commerce platforms started monolithic and split as scale demanded. The natural service boundaries are catalog/search/cart/order/payment; do not split further than that without a reason.
How do you handle multi-currency?
Per-locale price tables; FX rates updated daily. Show the user's local currency consistently from catalog through invoice. Settle in the merchant's currency at checkout.
What about flash sales / drops?
Pre-warm cache, queue requests, randomly distribute spots to prevent dogpile. Use a virtual-queue pattern (Shopify's "checkout throttle") to admit users to the cart in batches.
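A virtual queue that admits users in random batches could be sketched like this. `VirtualQueue` is a hypothetical in-memory class; a real deployment would keep the waiting room in a shared store and issue admitted users a signed token.

```python
import random


class VirtualQueue:
    """Waiting room that admits users in batches, picking spots at random."""

    def __init__(self, rng=None):
        self.rng = rng or random.Random()
        self.waiting = []
        self.admitted = set()

    def join(self, user_id: str) -> None:
        # Ignore duplicates so refreshing the page does not add extra entries.
        if user_id not in self.admitted and user_id not in self.waiting:
            self.waiting.append(user_id)

    def admit_batch(self, batch_size: int) -> list:
        """Admit up to batch_size waiting users, chosen uniformly at random."""
        k = min(batch_size, len(self.waiting))
        chosen = self.rng.sample(self.waiting, k)
        chosen_set = set(chosen)
        self.waiting = [u for u in self.waiting if u not in chosen_set]
        self.admitted.update(chosen)
        return chosen


queue = VirtualQueue(rng=random.Random(42))
for n in range(10):
    queue.join(f"user-{n}")
queue.join("user-0")                 # duplicate join is ignored
first = queue.admit_batch(3)         # 3 users admitted, 7 still waiting
second = queue.admit_batch(20)       # admits the remaining 7
```

Random selection removes the incentive to hammer the site at the exact start time, since arriving a few milliseconds earlier gives no advantage.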
How do reviews + Q&A fit?
Separate microservice with its own DB. Tightly coupled to the product page (for display) and search (for facet filters: "4+ stars only").