DynamoDB Internals

DynamoDB is AWS's managed key-value and document store, the production-hardened descendant of the 2007 Dynamo paper. Underneath the simple API is a system that partitions data across thousands of storage nodes, replicates synchronously across three availability zones, scales capacity in seconds, and exposes a deceptively narrow interface — get, put, query, scan. The trick to running it well is understanding what those four operations cost in IOPS, partitions, and dollars, and how the partition key model forces you to design data layouts that look nothing like a normalized relational schema.

Reflects DynamoDB behavior as of 2025: on-demand mode, adaptive capacity, transactional writes, DAX caching, and the standard global tables with multi-region replication.

Why DynamoDB Exists

The Gap
Amazon's 2004 holiday season hit a wall: relational databases could not scale writes for shopping carts and session stores without painful sharding, eventual-consistency hacks, and on-call agony. The 2007 Dynamo paper described the answer; DynamoDB (2012) productized it.
The Insight
Partitioning by a hash of a chosen key, replicating synchronously across three zones, and exposing only the operations the partition layout supports cheaply gives you predictable single-digit-millisecond latency at any scale. Trade joins and ad-hoc queries for guaranteed performance.
The Result
A serverless database that peaked at 146 million requests per second during Prime Day 2024, with sub-10 ms p99 reads, no DBA, no version upgrades, no shard reconfiguration. You provision a table, design a key, and ship. The cost is a permanently constrained query model.

Partition Architecture

Figure: partition architecture. The request router computes hash(partition_key) and forwards each request to the partition that owns that hash range.

Partition | Hash range     | Leader | Replicas
----------+----------------+--------+------------
A         | 0x0000-0x5555  | AZ-A   | AZ-B, AZ-C
B         | 0x5556-0xAAAA  | AZ-B   | AZ-A, AZ-C
C         | 0xAAAB-0xFFFF  | AZ-C   | AZ-A, AZ-B

Each partition: leader + 2 replicas across AZs, a B+ tree on disk per storage node, Paxos quorum writes, and a cap of 1000 WCU / 3000 RCU.

Key Numbers

Metric              | Value
--------------------+------------------
Max item size       | 400 KB
Per-partition write | 1000 WCU/s
Per-partition read  | 3000 RCU/s
Txn write items     | 100 items / 4 MB
Replication AZs     | 3
p99 read            | ~10 ms
DAX cache hit       | ~1 ms

The Partition Key + Sort Key Model

Every DynamoDB item is identified by a partition key (PK), with an optional sort key (SK). Together they form the primary key. The PK is hashed to determine which partition stores the item; items that share a PK are stored together, ordered by SK. This is the entire data model. There are no joins and no foreign keys, and any access path other than the primary key has to be declared explicitly as a secondary index.

Table: orders
PK = customer_id (string)
SK = order_id    (string)

customer_id     | order_id           | total | status     | items
----------------+--------------------+-------+------------+-------
"cust-42"       | "2025-05-01#001"   | 99.50 | "shipped"  | [...]
"cust-42"       | "2025-05-01#002"   | 12.00 | "pending"  | [...]
"cust-42"       | "2025-05-02#001"   | 45.00 | "shipped"  | [...]
"cust-9"        | "2025-04-30#001"   | 220.0 | "shipped"  | [...]

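The supported access patterns drop directly out of this layout: fetch one order by (customer_id, order_id), list every order for a customer, and list a customer's orders within a sort-key range or prefix, such as all orders placed on 2025-05-01.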

What you cannot do natively: "find all orders with total > 100" (needs a scan), "join orders to customers" (no joins), "search orders by status" (needs a GSI). Every such pattern requires either an index or a different table layout. Choose the partition key by predicting the single most important access pattern, because that is the one that will be free.
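
To make the layout concrete, here is a minimal in-memory sketch of the model, not DynamoDB's implementation. The class name ToyTable, the three-partition count, and the use of SHA-256 as the hash are all assumptions made for illustration; DynamoDB's real hash function is internal.

```python
import bisect
import hashlib


class ToyTable:
    """Toy PK/SK table: hash the PK to pick a partition, keep SKs sorted."""

    def __init__(self, num_partitions=3):
        # Each partition maps pk -> list of (sk, item) pairs kept in sk order.
        self.partitions = [dict() for _ in range(num_partitions)]

    def _partition_for(self, pk):
        # Assumption: SHA-256 stands in for DynamoDB's internal hash.
        digest = hashlib.sha256(pk.encode("utf-8")).digest()
        index = int.from_bytes(digest[:2], "big") % len(self.partitions)
        return self.partitions[index]

    def put_item(self, pk, sk, item):
        rows = self._partition_for(pk).setdefault(pk, [])
        keys = [k for k, _ in rows]
        i = bisect.bisect_left(keys, sk)
        if i < len(rows) and rows[i][0] == sk:
            rows[i] = (sk, item)        # same primary key: overwrite
        else:
            rows.insert(i, (sk, item))  # keep sort-key order

    def get_item(self, pk, sk):
        rows = self._partition_for(pk).get(pk, [])
        keys = [k for k, _ in rows]
        i = bisect.bisect_left(keys, sk)
        return rows[i][1] if i < len(rows) and rows[i][0] == sk else None

    def query(self, pk, sk_from="", sk_to="\uffff"):
        # One partition, one contiguous sort-key range: the cheap access path.
        rows = self._partition_for(pk).get(pk, [])
        keys = [k for k, _ in rows]
        lo = bisect.bisect_left(keys, sk_from)
        hi = bisect.bisect_right(keys, sk_to)
        return [item for _, item in rows[lo:hi]]


table = ToyTable()
table.put_item("cust-42", "2025-05-01#001", {"total": 99.50})
table.put_item("cust-42", "2025-05-01#002", {"total": 12.00})
table.put_item("cust-42", "2025-05-02#001", {"total": 45.00})
table.put_item("cust-9", "2025-04-30#001", {"total": 220.0})

one_order = table.get_item("cust-42", "2025-05-01#002")
may_first = table.query("cust-42", "2025-05-01#", "2025-05-01#\uffff")
```

Every method touches exactly one partition and one PK, which is why "total > 100" has no cheap equivalent here: answering it would mean walking every partition.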

The Hot Partition Problem and Adaptive Capacity

DynamoDB partitions cap at 1000 WCU and 3000 RCU each. If your traffic skews to a single partition key, you hit those limits regardless of your table's provisioned throughput. The classic anti-pattern: PK = "current_day", every write goes to one partition, the table sits at 1% utilization while throttling 99% of writes.

Adaptive capacity (introduced 2017, made instantaneous in 2019) mitigates this. The router monitors per-partition heat and reallocates throughput from cold partitions to hot ones in real time. There is also partition splitting on heat: a partition consistently over its limit will be split into two new partitions, and items that share a PK can be divided between them by sort key range. Both mechanisms are transparent (no API call, no downtime), but they have limits. A single item cannot exceed 1000 WCU/3000 RCU even with adaptive capacity, because an item lives on exactly one partition, and a PK whose items cannot be split apart (for example, when the table has a local secondary index) stays bound to one partition's limits.

// Anti-pattern: every write hits one partition
PK = "2025-05-03"   // today
SK = randomUUID

// Fix: write sharding
const SHARDS = 16;
PK = "2025-05-03#" + (hash(item_id) % SHARDS);   // 16 partition key values instead of 1
SK = randomUUID

// Reads now require a fan-out across N shards, but writes scale linearly

Write sharding is the standard remedy for known-hot keys (e.g. timestamp-based event streams, a single popular product's inventory). The number of shards is a tradeoff: more shards means more parallelism on writes but more parallel queries on reads.
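
The write and read sides of sharding can be sketched together. This is an illustration under assumptions: the helper names (sharded_pk, fan_out_pks, read_day) are hypothetical, crc32 is just one stable hash choice, and a dict stands in for the table.

```python
import zlib

SHARDS = 16  # assumption: same shard count as the example above


def sharded_pk(day, item_id, shards=SHARDS):
    # crc32 is stable across processes, unlike Python's built-in hash() on str.
    return f"{day}#{zlib.crc32(item_id.encode('utf-8')) % shards}"


def fan_out_pks(day, shards=SHARDS):
    # A reader has to query every shard key and merge the results.
    return [f"{day}#{n}" for n in range(shards)]


def read_day(query_fn, day, shards=SHARDS):
    # query_fn is a hypothetical callable: pk -> list of items.
    items = []
    for pk in fan_out_pks(day, shards):
        items.extend(query_fn(pk))
    return items


# Demo against an in-memory stand-in for the table.
store = {}
for n in range(1000):
    item_id = f"evt-{n}"
    store.setdefault(sharded_pk("2025-05-03", item_id), []).append(item_id)

all_items = read_day(lambda pk: store.get(pk, []), "2025-05-03")
```

The read cost is visible in fan_out_pks: one logical read becomes SHARDS queries, so the shard count should be no larger than the write rate actually requires.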

On-Demand vs Provisioned Capacity

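DynamoDB exposes two billing models. Provisioned mode requires you to declare WCUs and RCUs in advance; throttling kicks in past your provisioned limit. On-demand mode auto-scales and bills per request. The mental model: provisioned suits steady, forecastable traffic where reserved capacity is cheaper per request, while on-demand suits spiky or unknown traffic where paying per request beats paying for idle capacity.

A common gotcha: on-demand keeps warm capacity at roughly double the table's previous peak. A new on-demand table can immediately serve 4000 WCU and 12,000 RCU. To safely exceed those, traffic must ramp gradually so that each new peak raises the ceiling; a sudden jump past double the previous peak can throttle until the table warms.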
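
A rough way to reason about the ramp is to count doublings. The sketch below assumes the 4000-unit warm floor quoted above and a strict "previous peak doubled" rule; the function name is hypothetical, and AWS guidance that doublings be spread over roughly 30 minutes each should be treated as an assumption to verify for your account.

```python
def doublings_to_reach(target, warm=4000):
    """Count how many times the previous peak must double before target fits."""
    steps = 0
    capacity = warm
    while capacity < target:
        capacity *= 2
        steps += 1
    return steps


# 4000 -> 8000 -> 16000 -> 32000 -> 64000
steps = doublings_to_reach(50_000)
```

If a launch needs 50,000 writes per second on day one, four doublings is the signal to pre-warm the table or ramp load ahead of time rather than discover the ceiling in production.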

Global and Local Secondary Indexes

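Indexes give you alternative access patterns at the cost of extra storage and write amplification. A global secondary index (GSI) has its own partition key and optional sort key, its own capacity, and eventually consistent reads; it can be added to an existing table. A local secondary index (LSI) keeps the table's partition key but swaps in a different sort key, shares the table's capacity, supports strongly consistent reads, and must be defined when the table is created.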

// Base table: orders
PK = customer_id, SK = order_id

// GSI on status: "all PENDING orders across all customers"
PK = status,      SK = order_date

// GSI for inverted relationship: "find customer of an order_id"
PK = order_id,    SK = customer_id

GSIs are the real workhorse and the source of most surprise costs. Every write to the base table is billed again for each GSI it touches: write units for the base item, plus write units for every GSI whose key or projected attributes changed. Sparse indexes (where most items lack the indexed attribute) are a useful trick: only items that carry the index key attribute are written to the GSI, dramatically reducing its size and cost.
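
The write amplification can be estimated with simple arithmetic. This sketch assumes one write unit per 1 KB (rounded up), that every GSI projects the whole item, and that the item is new so no old index entry has to be removed. The function names and the unshipped_marker attribute are illustrative.

```python
import math


def write_units(item_bytes):
    # One write unit covers a standard write of up to 1 KB.
    return max(1, math.ceil(item_bytes / 1024))


def put_cost(item, item_bytes, gsi_key_attrs):
    """Estimate write units for inserting a new item into a table with GSIs."""
    base = write_units(item_bytes)
    # A sparse GSI only receives items that carry its key attribute.
    indexed = [attr for attr in gsi_key_attrs if item.get(attr) is not None]
    return base + len(indexed) * write_units(item_bytes)


shipped = {"PK": "USER#42", "SK": "ORDER#1", "status": "shipped"}
pending = {"PK": "USER#42", "SK": "ORDER#2", "status": "pending",
           "unshipped_marker": "1"}
gsis = ["status", "unshipped_marker"]

shipped_cost = put_cost(shipped, 800, gsis)   # base + status GSI
pending_cost = put_cost(pending, 800, gsis)   # base + both GSIs
```

The shipped order skips the sparse index entirely, which is the saving the paragraph above describes; an update that changes an index key would cost more than this estimate, since the old index entry also has to go.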

Single-Table Design

The Rick Houlihan philosophy: pack heterogeneous entity types into one table, using carefully chosen PK/SK conventions so you can read related data with a single Query. This minimizes round-trips, leverages DynamoDB's strengths (single-partition queries are cheap), and avoids the cross-table consistency problems of multi-table designs.

Table: app_data

PK              | SK                       | Type     | Attributes...
----------------+--------------------------+----------+-------------------
"USER#42"       | "PROFILE"                | profile  | email, name, plan
"USER#42"       | "ORDER#2025-05-01#001"   | order    | total, status
"USER#42"       | "ORDER#2025-05-01#002"   | order    | total, status
"USER#42"       | "ADDR#shipping"          | address  | street, city
"PROD#sku-7"    | "DETAIL"                 | product  | name, price
"PROD#sku-7"    | "REVIEW#2025-04-19#u9"   | review   | stars, body

// Query: all data for user 42
Query  PK = "USER#42"

// Query: just orders for user 42 in May
Query  PK = "USER#42" AND begins_with(SK, "ORDER#2025-05-")

Using a generic PK/SK name (rather than entity-specific) lets you define GSIs that span entity types. A GSI on (GSI1PK, GSI1SK) can index orders by status while also indexing reviews by date. The cost of single-table design is steep cognitive overhead — readers must decode the prefix-pattern to understand the schema, and refactoring access patterns later is painful. Use it when access patterns are well-understood; reach for multiple tables when they aren't.
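
Part of that cognitive overhead can be pushed into small helpers that build and decode keys. The sketch below is one possible convention, not a standard library: the helper names and the sample attribute values (city, plan) are hypothetical, and the list stands in for the result of Query PK = "USER#42".

```python
from collections import defaultdict


def user_pk(user_id):
    return f"USER#{user_id}"


def order_sk(date, seq):
    return f"ORDER#{date}#{seq:03d}"


def entity_type(sk):
    # The prefix before the first '#' names the entity; a bare key such as
    # "PROFILE" names itself.
    return sk.split("#", 1)[0]


def group_by_entity(items):
    grouped = defaultdict(list)
    for item in items:
        grouped[entity_type(item["SK"])].append(item)
    return dict(grouped)


# Stand-in for the items returned by: Query PK = "USER#42"
result = [
    {"PK": user_pk(42), "SK": "ADDR#shipping", "city": "Lisbon"},
    {"PK": user_pk(42), "SK": order_sk("2025-05-01", 1), "total": 99.50},
    {"PK": user_pk(42), "SK": order_sk("2025-05-01", 2), "total": 12.00},
    {"PK": user_pk(42), "SK": "PROFILE", "plan": "pro"},
]
grouped = group_by_entity(result)
```

Keeping key construction in one module means a later change to the prefix scheme is a code change in one place instead of a hunt through every call site.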

Conditional Writes and Transactions

DynamoDB supports optimistic concurrency through ConditionExpression:

// Increment a counter only if its current value matches expected
UpdateItem  Key = { PK: "USER#42", SK: "COUNTER" }
            UpdateExpression = "SET v = v + :inc"
            ConditionExpression = "v = :expected"
            ExpressionAttributeValues = { :inc: 1, :expected: 41 }

// Returns ConditionalCheckFailedException if v != 41
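
A client normally wraps that conditional write in a read-check-retry loop. The following is a minimal single-process sketch of the pattern, with an in-memory object standing in for the item; the class and function names are hypothetical, and the before_write hook exists only to simulate a competing writer.

```python
class ConditionalCheckFailed(Exception):
    """Stand-in for DynamoDB's ConditionalCheckFailedException."""


class ToyItem:
    def __init__(self, v):
        self.v = v

    def update_if(self, expected, new_value):
        # Mirrors ConditionExpression "v = :expected": check and write together.
        if self.v != expected:
            raise ConditionalCheckFailed(f"v is {self.v}, expected {expected}")
        self.v = new_value


def increment(item, max_attempts=5, before_write=None):
    for _ in range(max_attempts):
        seen = item.v                       # read the current value
        if before_write:
            before_write(item)              # test hook: a concurrent writer
        try:
            item.update_if(seen, seen + 1)  # write only if nobody changed it
            return item.v
        except ConditionalCheckFailed:
            continue                        # lost the race: re-read and retry
    raise RuntimeError("gave up after repeated conflicts")


counter = ToyItem(41)
interference = iter([True])                 # interfere on the first attempt only


def rival(item):
    if next(interference, False):
        item.v += 1                         # another client got there first


final = increment(counter, before_write=rival)
```

The first attempt fails because the rival moved the value from 41 to 42; the retry re-reads 42 and succeeds. For a plain counter an unconditional ADD is simpler, so the conditional form earns its keep when the new value depends on application logic.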

For multi-item atomicity, TransactWriteItems wraps up to 100 writes (subject to a 4 MB total size) into a serializable transaction across multiple items, multiple tables, even multiple partitions. TransactGetItems does the same for reads. Transactions cost double the WCU/RCU of equivalent non-transactional operations because of the two-phase commit protocol underneath.

TransactWriteItems
  Put     orders     PK="USER#42" SK="ORDER#1"   Condition: "attribute_not_exists(PK)"
  Update  inventory  PK="PROD#7"  SK="STOCK"     SET qty = qty - :n  Condition: "qty >= :n"
  Update  ledger     PK="USER#42" SK="BAL"       SET bal = bal - :total

Transactions are how you build a real e-commerce checkout in DynamoDB. The 100-item cap is the practical limit of how much you can stitch together; beyond it, you fall back to event-driven reconciliation through DynamoDB Streams.
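
The all-or-nothing behavior can be modeled in a few lines. This is a single-threaded sketch of the semantics only: it checks every condition before applying any write, and says nothing about how DynamoDB isolates concurrent transactions. The function names and the tuple layout are hypothetical.

```python
class TransactionCanceled(Exception):
    """Stand-in for TransactionCanceledException."""


def transact_write(tables, operations):
    """Apply every operation or none.

    operations: list of (table_name, key, condition, mutate) tuples, where
    condition(item_or_None) -> bool and mutate(item_or_None) -> new item.
    """
    # Phase 1: evaluate every condition against current state, writing nothing.
    for table_name, key, condition, _ in operations:
        if not condition(tables[table_name].get(key)):
            raise TransactionCanceled(f"condition failed on {table_name} {key}")
    # Phase 2: all conditions held, so apply every write.
    for table_name, key, _, mutate in operations:
        tables[table_name][key] = mutate(tables[table_name].get(key))


tables = {
    "orders": {},
    "inventory": {("PROD#7", "STOCK"): {"qty": 1}},
    "ledger": {("USER#42", "BAL"): {"bal": 100}},
}


def checkout(n, total):
    return [
        ("orders", ("USER#42", "ORDER#1"),
         lambda item: item is None, lambda item: {"total": total}),
        ("inventory", ("PROD#7", "STOCK"),
         lambda item: item["qty"] >= n, lambda item: {"qty": item["qty"] - n}),
        ("ledger", ("USER#42", "BAL"),
         lambda item: True, lambda item: {"bal": item["bal"] - total}),
    ]


try:
    transact_write(tables, checkout(n=2, total=30))  # only 1 in stock
    outcome = "committed"
except TransactionCanceled:
    outcome = "canceled"
```

Because the stock check fails, the order is never created and the balance is untouched, which is exactly the property a checkout needs.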

DynamoDB Streams

Streams are an ordered, change-data-capture log of every modification to a table item, retained for 24 hours. Each stream record contains the keys, the previous image, the new image, or both (configurable). They are partition-ordered: changes to the same item arrive in order, but changes to different items can interleave.

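// Stream record example (NEW_AND_OLD_IMAGES). Values are shown unmarshalled; real records wrap each value in a type descriptor such as { "S": "pending" }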
{
  "eventID":   "abc123",
  "eventName": "MODIFY",
  "dynamodb": {
    "Keys":         { "PK": "USER#42", "SK": "ORDER#001" },
    "OldImage":     { "PK": "USER#42", "SK": "ORDER#001", "status": "pending"  },
    "NewImage":     { "PK": "USER#42", "SK": "ORDER#001", "status": "shipped"  },
    "SequenceNumber": "1234567890",
    "ApproximateCreationDateTime": 1714723200
  }
}

The standard pattern: a Lambda function subscribes to the stream and reacts to changes by writing to a search index, denormalizing into another DynamoDB table, publishing to SNS, or maintaining materialized views. Streams turn DynamoDB into the event source of an event-driven architecture without bolting on Kafka. Failure handling matters: by default Lambda retries a failed batch until it succeeds or the records expire, so a single poison record blocks its shard until you cap retries and route failed batches to an on-failure destination such as an SQS queue.
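
One way to keep a bad record from forcing the whole batch to replay is Lambda's partial batch response. The handler below is a sketch that assumes the event source mapping has ReportBatchItemFailures enabled; the process function and its "missing NewImage" failure are hypothetical stand-ins for real business logic and a poison record.

```python
def process(record):
    # Hypothetical business logic: reject records that lack a NewImage.
    if "NewImage" not in record["dynamodb"]:
        raise ValueError("no NewImage to project")


def handler(event, context=None):
    """Process stream records in order; report the first failure and stop.

    Lambda then retries from the reported sequence number instead of
    replaying records that already succeeded.
    """
    for record in event["Records"]:
        try:
            process(record)
        except Exception:
            seq = record["dynamodb"]["SequenceNumber"]
            return {"batchItemFailures": [{"itemIdentifier": seq}]}
    return {"batchItemFailures": []}


event = {"Records": [
    {"eventName": "MODIFY",
     "dynamodb": {"SequenceNumber": "100", "NewImage": {"status": "shipped"}}},
    {"eventName": "REMOVE",
     "dynamodb": {"SequenceNumber": "101"}},
    {"eventName": "MODIFY",
     "dynamodb": {"SequenceNumber": "102", "NewImage": {"status": "pending"}}},
]}
response = handler(event)
```

Stopping at the first failure preserves per-item ordering: record 102 is not processed ahead of the retry for 101. Pair this with a retry cap, otherwise the poison record still blocks the shard until it expires.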

DAX — DynamoDB Accelerator

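DAX is an in-memory, write-through caching layer for DynamoDB, API-compatible with the standard SDK. It runs as a multi-node cluster in your VPC and intercepts reads. Cache hits return in under a millisecond. Two caches live inside each node: an item cache, which holds results of GetItem and BatchGetItem keyed by primary key, and a query cache, which holds result sets of Query and Scan keyed by their parameters.

Both default to a 5-minute TTL, configurable. Writes go through DAX to DynamoDB synchronously, and on success the new value is written into the item cache. The query cache is not updated by writes, so cached result sets can stay stale until their TTL expires. The right use case is read-heavy workloads with cacheable hot keys; the wrong use case is anything that depends on strongly consistent reads (DAX passes those through to the table uncached, and its cached reads are eventually consistent against the underlying table).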
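
The staleness window is easiest to see in a toy cache. This sketch models only an item cache with a TTL and an injectable clock; it assumes a successful write refreshes the cached item, which is how AWS documents write-through behavior. The class name and the dict standing in for the table are hypothetical.

```python
import time


class WriteThroughCache:
    """Toy item cache: write-through on put, TTL-bounded staleness on get."""

    def __init__(self, table, ttl_seconds=300, clock=time.monotonic):
        self.table = table        # dict standing in for the DynamoDB table
        self.ttl = ttl_seconds    # 5 minutes, matching the default TTL
        self.clock = clock        # injectable so the demo can fake time
        self.cache = {}           # key -> (value, expires_at)

    def get(self, key):
        hit = self.cache.get(key)
        if hit and hit[1] > self.clock():
            return hit[0]                       # cache hit, possibly stale
        value = self.table.get(key)             # miss: read through to the table
        self.cache[key] = (value, self.clock() + self.ttl)
        return value

    def put(self, key, value):
        self.table[key] = value                 # table first, synchronously
        self.cache[key] = (value, self.clock() + self.ttl)


now = [0.0]
table = {"USER#42": "v1"}
dax = WriteThroughCache(table, clock=lambda: now[0])

first = dax.get("USER#42")   # miss, fills the cache
table["USER#42"] = "v2"      # a writer that bypasses the cache
stale = dax.get("USER#42")   # still served from the cache
now[0] = 301.0               # advance past the TTL
fresh = dax.get("USER#42")   # entry expired, re-read from the table
```

Any writer that talks to the table directly leaves readers on the old value for up to one TTL, so either route all writes through the cache or pick a TTL the application can tolerate.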

The 400 KB Limit and What It Forces

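No DynamoDB item may exceed 400 KB total, counting keys, attribute names, and values. This is not a default that can be raised. Three patterns cope: store the large payload in S3 and keep only a pointer in the item, compress large attribute values before writing them, or split one logical record into several items that share a partition key.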

Combined with the 1 MB Query response cap, the 400 KB limit pushes designs toward many small items rather than few large ones. The pagination flow is built into the API — every Query and Scan response includes a LastEvaluatedKey the client passes back to continue.
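
The pagination loop looks the same for Query and Scan. Below is a sketch in which query_page is any callable that accepts an ExclusiveStartKey and returns a dict with Items and an optional LastEvaluatedKey; fake_query is a hypothetical stand-in that pages two items at a time, and its index-based key is a simplification of the real key-attribute map.

```python
def query_all(query_page, **params):
    """Follow LastEvaluatedKey until the result set is exhausted."""
    items = []
    start_key = None
    while True:
        if start_key is not None:
            params["ExclusiveStartKey"] = start_key
        page = query_page(**params)
        items.extend(page["Items"])
        start_key = page.get("LastEvaluatedKey")
        if start_key is None:
            return items


# Hypothetical stand-in for a client's query call, paging 2 items at a time.
DATA = [{"SK": f"ORDER#{n:03d}"} for n in range(5)]


def fake_query(ExclusiveStartKey=None, **_):
    start = 0 if ExclusiveStartKey is None else ExclusiveStartKey["index"] + 1
    page = {"Items": DATA[start:start + 2]}
    if start + 2 < len(DATA):
        page["LastEvaluatedKey"] = {"index": start + 1}
    return page


orders = query_all(fake_query, TableName="orders")
```

Collecting everything into one list is fine for small result sets; for large ones a generator that yields page by page keeps memory flat.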

Backups, PITR, and Global Tables

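Three durability features layer on top of the core engine: on-demand backups (full snapshots you trigger or schedule), point-in-time recovery (PITR, continuous backups that can restore to any second within up to the last 35 days), and global tables (multi-region, multi-active replication of the same table).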

Global tables are how you build a globally low-latency app on DynamoDB. The cost is real (each region's writes count separately) and the consistency model is eventual across regions — conflicts on the same key need application-level reconciliation if last-writer-wins is wrong.

Tradeoffs and When Not to Use It

DynamoDB vs Cassandra vs Spanner

               | DynamoDB                        | Cassandra                      | Spanner
---------------+---------------------------------+--------------------------------+--------------------------------
Operator       | Managed (AWS only)              | Self-hosted or DataStax        | Managed (Google Cloud only)
Data model     | Key-value + sort key            | Wide-column (CQL)              | Relational (SQL)
Consistency    | Eventual default; strong opt-in | Tunable per query              | External / serializable always
Transactions   | Multi-item up to 100 / 4 MB     | LWT (single partition)         | Cross-partition serializable
Replication    | 3 AZs sync, regions async       | Sync within DC, async cross-DC | Paxos sync across regions
Schema changes | None (schemaless)               | Online ALTER                   | Online ALTER, no locks
Best for       | Predictable KV at any scale     | Time-series, OSS multi-cloud   | Strong consistency at scale

Frequently Asked Questions

How do I do a count(*) of items?

You can't, cheaply. Scan with Select=COUNT reads every item to compute it; on a large table this is expensive and slow. The right pattern is to maintain a counter item with an atomic UpdateItem ... ADD item_count :inc on every insert/delete (count itself is a DynamoDB reserved word, so the attribute needs another name or an expression attribute name alias), or to build the counter from DynamoDB Streams.

What's the difference between Query and Scan?

Query operates on a single partition key (and an optional sort key range). It costs RCUs proportional to the data read for that key condition and is fast, typically single-digit milliseconds. Scan reads every partition of the table, paginated by 1 MB chunks. It costs RCUs proportional to the entire table, even if a filter discards most rows; in both operations filters are applied after the read. Use Query for everything in production. Scan is for batch jobs, exports, and tables with < 100K items.
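
The cost gap is easy to put numbers on. This back-of-envelope sketch assumes 1 KB items, eventually consistent reads at half the strongly consistent rate, and item sizes summed and rounded up to the next 4 KB; it ignores per-page rounding, and the table size and function name are illustrative.

```python
import math


def read_units(total_bytes, strongly_consistent=False):
    """Read capacity for a Query or Scan: sizes are summed, then rounded up to 4 KB."""
    units = math.ceil(total_bytes / 4096)
    return units if strongly_consistent else units / 2


ITEM_BYTES = 1024                                # assumption: 1 KB items
query_cost = read_units(50 * ITEM_BYTES)         # one customer's 50 orders
scan_cost = read_units(1_000_000 * ITEM_BYTES)   # every item in the table
```

Under these assumptions the Query costs 6.5 read units and the Scan costs 125,000, and the Scan pays that amount no matter how few items survive its filter.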

Why am I getting throttled at 4,000 RCU when my table is provisioned for 10,000?

Almost certainly a hot partition. Your traffic is concentrating on a single partition key (or a small subset) and exceeding the 3000 RCU per-partition cap, even though the table-level provisioned capacity is much higher. CloudWatch's throttle metrics (such as ReadThrottleEvents) confirm the throttling, and CloudWatch Contributor Insights for DynamoDB surfaces the culprit keys. Fix: better key design or write sharding.

How is DynamoDB billed?

Three lines: storage (~$0.25/GB-month), read/write capacity (per RCU/WCU-hour in provisioned, per million requests in on-demand), and data transfer out. Add DynamoDB Streams ($0.02/100K reads), GSI capacity (separate from base), DAX (per-node-hour), backups ($0.10/GB-month), and cross-region replication for global tables.

Can I use SQL with DynamoDB?

PartiQL gives you a SQL-like syntax for SELECT/INSERT/UPDATE/DELETE, but it's a thin facade over the same API: every SELECT runs as a GetItem, Query, or Scan, with the same partition-key access requirements, and a WHERE clause that omits the partition key becomes a full Scan. There are no joins, no aggregates, no subqueries. PartiQL is convenient, not a query engine.

How do I migrate from DynamoDB to a relational database?

The right escape hatch is DynamoDB Streams + Lambda + an external sink. Live-tail every change to the new database while a one-time backfill copies the existing table (parallel scan or S3 export). Once caught up, cut writes over to the new system. Reverse migrations (RDBMS → DynamoDB) are harder because relational schemas typically support access patterns that DynamoDB models can't replicate without a redesign.

What's a sparse index and when should I use one?

A sparse GSI is an index where most table items don't have the indexed attribute, so they don't appear in the index. Useful for "find items in state X" — set a status attribute only when an item is in that state, GSI-index that attribute. The index is small and cheap to query. Pattern example: SK = "ORDER#..." and a GSI on "unshipped_marker" that's only present while the order is unshipped.