Ethereum Internals
Ethereum is the world computer that almost works. It is a replicated state machine with ~1 million validators executing the same transactions and reaching consensus on the result every 12 seconds. Underneath the user-visible "send some ETH" or "swap on Uniswap" sits a stack of carefully chosen primitives: an account model that's neither pure key-value nor pure UTXO, a Merkle Patricia Trie that turns global state into a cryptographic commitment, the EVM (a 256-bit stack machine with metered execution), an EIP-1559 fee market that burns base fees instead of paying them to miners, and — since The Merge in 2022 — a clean split between the execution layer (what the EVM runs) and the consensus layer (Casper FFG plus LMD GHOST proof-of-stake).
The protocol is governed by Ethereum Improvement Proposals (EIPs). Hard forks ship roughly every 6-12 months: London (1559), Paris (Merge), Shanghai (withdrawals), Dencun (4844 blobs), Pectra (account abstraction primitives).
Ethereum Architecture Overview
Key Numbers
Why Ethereum Exists
The Account Model: EOA vs Contract
Ethereum has two account types, both identified by a 20-byte address:
- Externally Owned Account (EOA): controlled by a private key (secp256k1). Has a balance and a nonce. Cannot run code; can only sign and send transactions. The address is the last 20 bytes of keccak256(pubkey).
- Contract account: controlled by code. Has a balance, a nonce, deployed bytecode, and a 2^256-slot storage trie. Cannot initiate transactions; can only respond to calls. The address at deploy time is keccak256(rlp([sender, nonce]))[12:], or with CREATE2, deterministic from the deployer + salt + init-code hash.
An account state is a 4-tuple: (nonce, balance, storageRoot, codeHash).
For an EOA, storageRoot is the empty-trie root and codeHash is
keccak256(""). For a contract, both are populated. The collection of
all accounts forms the state trie, and its root hash is committed in
every block header.
A transaction from an EOA carries: chainId (EIP-155 replay protection across chains), nonce (must equal the sender's current nonce, which orders transactions and prevents replay on the same chain), maxFeePerGas and maxPriorityFeePerGas (post-EIP-1559), gasLimit, to, value, data, plus an ECDSA signature (v, r, s). Leaving to empty (null, not the zero address 0x0) makes it a contract creation; the data field is then the init code, which runs the constructor and returns the runtime bytecode.
The Merkle Patricia Trie
Ethereum's state isn't stored as a flat key-value table; it's a Merkle Patricia Trie (MPT) — a radix trie where every node is hashed and every parent commits to its children's hashes. The root hash uniquely identifies the entire global state. A single contract storage slot's value can be verified against that root with a logarithmic-size proof.
An MPT mixes three node types to balance depth and branching:
- Branch node — 17 entries. Indexed 0-15 (one per nibble of the path) plus an optional value at index 16 for a key that terminates exactly here.
- Extension node — a (shared-path-prefix, child-hash) pair. Compresses a long single-child chain into one node.
- Leaf node — a (remaining-path, value) pair at the end of a path.
Keys are nibbleized (split into 4-bit chunks) before insertion, giving 16-way branching. Ethereum uses the MPT for four kinds of trie:
- State trie: keyed by keccak256(address), value is the RLP-encoded account state.
- Storage trie: one per contract, keyed by keccak256(slot), value is the RLP-encoded slot value.
- Transaction trie: keyed by transaction index in the block.
- Receipt trie: keyed by transaction index, value is the receipt (status, logs, gas used).
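The nibble split described above can be sketched in a few lines. This is only an illustration: the function names are hypothetical and do not come from any client, and the two-byte keys stand in for real 32-byte hashed keys.

```python
def to_nibbles(key: bytes) -> list[int]:
    """Split each byte into its high and low 4-bit halves, in order."""
    nibbles = []
    for byte in key:
        nibbles.append(byte >> 4)    # high nibble first
        nibbles.append(byte & 0x0F)  # then low nibble
    return nibbles


def shared_prefix_len(a: list[int], b: list[int]) -> int:
    """Count the leading nibbles two paths share (what an extension node compresses)."""
    n = 0
    while n < min(len(a), len(b)) and a[n] == b[n]:
        n += 1
    return n


path_a = to_nibbles(bytes.fromhex("a7f1"))
path_b = to_nibbles(bytes.fromhex("a7d3"))
print(path_a)                             # [10, 7, 15, 1]
print(shared_prefix_len(path_a, path_b))  # 2
```

A 32-byte keccak256 key yields a 64-nibble path, and each nibble selects one of the 16 child slots of a branch node.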
The MPT is widely acknowledged as one of Ethereum's worst design choices. Updates are I/O-amplified: changing one storage slot rewrites the leaf, every ancestor up to the storage root, the contract's account state in the state trie, and every state-trie ancestor up to the global root. A single tx that touches 10 contracts triggers ~50 MPT writes. The replacement, Verkle Tries (using vector commitments with much wider branching), is on the medium-term roadmap.
The EVM: Stack Machine, Gas, Opcodes
The Ethereum Virtual Machine is a 256-bit stack machine with three memory regions:
- Stack: at most 1024 256-bit words. Almost all opcodes consume and produce stack values. The stack is per-call-frame; each CALL creates a fresh stack.
- Memory: a byte-addressable, zero-initialized array, also per-call-frame. Grows on demand; the expansion cost has a quadratic term that dominates beyond a few tens of KiB, which discourages gigantic allocations.
- Storage: persistent 256-bit-key, 256-bit-value mapping per contract. Survives across calls and transactions. Writes are by far the most expensive operation in the EVM.
Every opcode has a gas cost. Cheap ones (ADD, PUSH) cost 3 gas. Memory costs 3n + n²/512 gas
for n 32-byte words, charged incrementally as memory expands. SLOAD from cold
storage is 2100 gas; warm SLOAD is 100 gas (EIP-2929 introduced the
cold/warm distinction). SSTORE ranges from 100 (rewriting the same value to a warm slot)
to 22100 (zero → non-zero on a cold slot). Clearing a slot earns a 4800-gas refund since
EIP-3529, and total refunds are capped at 20% of gas used. CALL costs 100 for a warm
target or 2600 for a cold one, plus 9000 if it transfers value, plus the gas forwarded
to the callee. CREATE2 is 32000 plus a per-word charge for hashing the init code, the
cost of executing the constructor, and 200 gas per byte of deployed code.
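As a rough sketch of the memory pricing, the cost function has a linear term of 3 gas per 32-byte word and a quadratic term of words²/512. The function names below are illustrative, and the numbers assume the current gas schedule.

```python
def memory_cost(size_bytes: int) -> int:
    """Total gas attributed to a memory region of the given size."""
    words = (size_bytes + 31) // 32           # memory is priced in 32-byte words
    return 3 * words + words * words // 512   # linear term + quadratic term


def expansion_cost(old_size: int, new_size: int) -> int:
    """Gas charged when an opcode grows memory from old_size to new_size bytes."""
    return max(0, memory_cost(new_size) - memory_cost(old_size))


print(memory_cost(32))           # 3
print(memory_cost(32 * 1024))    # 5120
print(memory_cost(1024 * 1024))  # 2195456
```

At 32 KiB the quadratic term (2048 gas) is still smaller than the linear term (3072 gas); at 1 MiB it accounts for almost the entire cost.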
Gas serves three purposes simultaneously: it bounds execution time (no infinite loops),
it prices network resources (storage is far more expensive than CPU), and it creates
the fee market. A transaction provides a gasLimit; if execution exceeds it,
everything reverts but the gas is still charged. If it succeeds with leftover gas, the
remainder is refunded.
A precompile is a contract at a fixed low address implemented
natively in the client for cryptography that would be prohibitively expensive in EVM
bytecode. Addresses 0x01-0x0a cover ECRECOVER, SHA256, RIPEMD-160,
identity, modexp, BN254 curve ops, BLAKE2f, and KZG point evaluation; Pectra added
BLS12-381 operations at 0x0b-0x11. Each has a hand-tuned
gas formula; getting it wrong is consensus-critical.
EIP-1559: Base Fee, Priority Fee, Burn
Pre-1559, transactions had a single gasPrice field, and miners chose
transactions purely by descending price. Block space was a first-price auction, which
led to overbidding, gas wars, and unpredictable fees.
EIP-1559 (London, 2021) replaced that with a two-component fee:
- Base fee: set by the protocol per block. It moves by up to ±12.5% per block, depending on whether the previous block used more or less gas than the target (half the gas limit; 15M gas at London). Burned (deducted from the sender and credited to no one), not paid to the proposer.
- Priority fee (tip): extra ETH per gas the sender offers above the base fee, paid to the block proposer.
Senders set maxFeePerGas (an upper bound, not a real bid) and
maxPriorityFeePerGas (the tip). The actual fee is
min(maxFeePerGas, baseFee + maxPriorityFeePerGas) per gas. If the base fee
ever exceeds maxFeePerGas, the transaction sits in the mempool until the
base fee drops or the tx is replaced.
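The two rules above (the base fee update and the price a sender actually pays) can be sketched as follows. The constants follow the EIP-1559 specification; the function names and the gwei figures in the example are illustrative assumptions.

```python
ELASTICITY_MULTIPLIER = 2            # gas limit = 2 x target
BASE_FEE_MAX_CHANGE_DENOMINATOR = 8  # 1/8 = 12.5% maximum change per block


def next_base_fee(parent_base_fee: int, parent_gas_used: int, parent_gas_limit: int) -> int:
    """Base fee of the next block, derived only from the parent block."""
    target = parent_gas_limit // ELASTICITY_MULTIPLIER
    if parent_gas_used == target:
        return parent_base_fee
    if parent_gas_used > target:
        delta = (parent_base_fee * (parent_gas_used - target)
                 // target // BASE_FEE_MAX_CHANGE_DENOMINATOR)
        return parent_base_fee + max(delta, 1)  # always rises by at least 1 wei
    delta = (parent_base_fee * (target - parent_gas_used)
             // target // BASE_FEE_MAX_CHANGE_DENOMINATOR)
    return parent_base_fee - delta


def effective_gas_price(base_fee: int, max_fee: int, max_priority_fee: int) -> int:
    """Price per gas the sender pays; the part above base_fee goes to the proposer."""
    if max_fee < base_fee:
        raise ValueError("not includable: maxFeePerGas is below the base fee")
    tip = min(max_priority_fee, max_fee - base_fee)
    return base_fee + tip


GWEI = 10**9
print(next_base_fee(100 * GWEI, 30_000_000, 30_000_000) // 10**6)  # 112500 (112.5 gwei)
print(next_base_fee(100 * GWEI, 0, 30_000_000) // 10**6)           # 87500 (87.5 gwei)
print(effective_gas_price(20, 30, 2))                              # 22
print(effective_gas_price(29, 30, 2))                              # 30 (tip squeezed to 1)
```

Note that base_fee + tip equals min(maxFeePerGas, baseFee + maxPriorityFeePerGas), so the two formulations agree.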
Three properties matter:
- Predictability — base fee is observable and deterministic from the previous block. Wallets can quote a fee that's almost certain to land.
- Burn — base fees are removed from circulation. At sustained high activity, ETH can become net deflationary. Roughly 4M ETH have been burned since 1559 launched.
- No more gas wars — competitive bidding is now bounded; you outbid by tip, which is much smaller than the total fee.
Consensus: Casper FFG and LMD GHOST
Ethereum's proof-of-stake consensus is a hybrid of two algorithms operating on different timescales:
LMD GHOST (Latest Message Driven Greedy Heaviest Observed Subtree) is the fork-choice rule. Every slot, a randomly-selected validator (the proposer) builds and broadcasts a block. Every other validator in the slot's committee casts an attestation — a signed vote for the head of the chain they consider canonical. LMD GHOST picks the chain whose subtree has the most attestations, weighted by validator effective balance. Each validator's most recent attestation overwrites their previous one (the "latest message" part).
Casper FFG (Friendly Finality Gadget) sits on top and provides finality. Every attestation also carries a checkpoint vote: a source-to-target link between epoch boundaries (an epoch is 32 slots = 6.4 min). A checkpoint is justified once links from at least 2/3 of stake point to it, and it becomes finalized, meaning economically irreversible, when the checkpoint of the next epoch is justified on top of it. Reverting it would require at least 1/3 of total stake to be slashed (roughly 10M ETH at current staking levels).
The two layers serve different needs. LMD GHOST resolves single-block forks fast (within a slot or two) but offers only probabilistic safety. Casper FFG provides economic finality, but a block typically takes about two epochs to finalize. Together they give "fast head, slow finality": wallets can show pending blocks immediately, while exchanges wait ~13 minutes for finality before crediting deposits.
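A toy version of the LMD GHOST fork choice makes the "latest message" and "heaviest subtree" ideas concrete. This is a simplified sketch, not the consensus-spec algorithm: the block and validator names are made up, ties are broken by block name, and details such as proposer boost and checkpoint filtering are omitted.

```python
from collections import defaultdict
from typing import Optional


def lmd_ghost_head(parents: dict[str, Optional[str]],
                   latest_votes: dict[str, str],
                   balances: dict[str, int],
                   root: str) -> str:
    """Walk from root, always stepping into the child whose subtree has the most stake."""
    children = defaultdict(list)
    for block, parent in parents.items():
        if parent is not None:
            children[parent].append(block)

    # A vote for a block also counts for every ancestor of that block.
    weight = defaultdict(int)
    for validator, voted_block in latest_votes.items():
        node = voted_block
        while node is not None:
            weight[node] += balances[validator]
            node = parents[node]

    head = root
    while children[head]:
        head = max(children[head], key=lambda b: (weight[b], b))
    return head


# G has two children, A and B; C extends A.
parents = {"G": None, "A": "G", "B": "G", "C": "A"}
balances = {"v1": 32, "v2": 32, "v3": 32}
votes = {"v1": "C", "v2": "B", "v3": "A"}
print(lmd_ghost_head(parents, votes, balances, "G"))  # C

votes["v3"] = "B"  # v3's newer attestation replaces its earlier one
print(lmd_ghost_head(parents, votes, balances, "G"))  # B
```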
Validator Economics and Slashing
A validator is created by depositing at least 32 ETH to the deposit contract (since Pectra's EIP-7251, a single validator's effective balance can grow up to 2048 ETH). The validator then has three duties:
- Propose: produce a block when randomly selected. With ~1M validators and 7200 slots per day, that is roughly once every four to five months per validator. The proposer gets the priority fees plus MEV.
- Attest: vote on the head every epoch. Get a small reward for correct, timely votes.
- Sync committee: every 256 epochs (~27 hours), 512 validators are selected to sign sync messages used by light clients. Extra reward for participation.
Total APR sits around 3-5% for a solo validator, depending on total staked ETH. (Per-validator issuance is proportional to 1/sqrt(total stake), so the yield declines as more ETH is staked.)
Slashing punishes equivocation — provably cheating the protocol. Two slashable offenses:
- Double proposal — signing two different blocks at the same slot.
- Surround / double attestation — casting two attestations whose vote ranges overlap in a way only possible if you're trying to fork the chain.
A slashed validator immediately loses a fraction of its effective balance (1/32, or 1 ETH on a 32 ETH validator, from the Merge until Pectra, which reduced the initial penalty to 1/4096), is forcibly exited and held for roughly 36 days before it can withdraw, and loses an additional correlation penalty proportional to how many other validators were slashed in the same window. A coordinated attack that gets all attackers slashed can wipe out their entire stake. The inactivity leak is the other major economic disincentive: if finality stalls for more than 4 epochs, validators that fail to attest have their balances slowly drained until the validators still attesting again control more than 2/3 of the stake and finality resumes.
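The correlation penalty can be approximated with a short calculation. This sketch assumes the proportional slashing multiplier of 3 used by the consensus spec since the Merge and ignores the spec's rounding to whole-ETH increments; the function name and the stake figures are illustrative.

```python
PROPORTIONAL_SLASHING_MULTIPLIER = 3  # assumption: post-Merge spec value


def correlation_penalty(effective_balance_gwei: int,
                        slashed_stake_eth: int,
                        total_stake_eth: int) -> int:
    """Extra penalty (in gwei) that scales with the stake slashed in the same window."""
    adjusted = min(slashed_stake_eth * PROPORTIONAL_SLASHING_MULTIPLIER, total_stake_eth)
    return effective_balance_gwei * adjusted // total_stake_eth


GWEI_PER_ETH = 10**9
balance = 32 * GWEI_PER_ETH
total = 30_000_000  # illustrative total stake in ETH

# An isolated incident: 1% of stake slashed costs about 3% of the balance.
print(correlation_penalty(balance, 300_000, total) / GWEI_PER_ETH)     # 0.96
# A coordinated attack: one third of stake slashed costs the whole balance.
print(correlation_penalty(balance, 10_000_000, total) / GWEI_PER_ETH)  # 32.0
```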
Execution Layer / Consensus Layer Split
Pre-Merge, a single client (geth, parity) did everything: networking, mempool, EVM execution, PoW mining, fork choice. Post-Merge, the two halves of an Ethereum node are typically separate processes communicating over the Engine API (a JWT-authenticated JSON-RPC interface):
- Execution Layer (EL) clients — geth, nethermind, besu, erigon, reth. Handle the EVM, the state trie, the transaction pool, and the eth_* JSON-RPC API.
- Consensus Layer (CL) clients — prysm, lighthouse, teku, nimbus, lodestar. Handle the beacon chain, fork choice, attestation aggregation, validator duties, and slashing detection.
The Engine API has three core methods: engine_newPayload (the CL hands a
block to the EL for execution validation), engine_forkchoiceUpdated (the
CL tells the EL which head and finalized block to track, optionally requesting payload
build), and engine_getPayload (when this validator is the proposer, the
CL fetches the latest built block from the EL to broadcast).
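To make the message shape concrete, here is a sketch of the JSON-RPC body a CL client might send for a fork-choice update. It only builds the request; nothing is sent. The V3 method version and the field names follow the public Engine API specification as of Dencun, while the helper name and the placeholder hashes are assumptions made for illustration.

```python
import json


def forkchoice_updated_request(head: str, safe: str, finalized: str,
                               request_id: int = 1) -> str:
    """Build an engine_forkchoiceUpdatedV3 request without payload attributes."""
    request = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "engine_forkchoiceUpdatedV3",
        "params": [
            {
                "headBlockHash": head,
                "safeBlockHash": safe,
                "finalizedBlockHash": finalized,
            },
            None,  # payloadAttributes: null means "do not start building a block"
        ],
    }
    return json.dumps(request)


# Placeholder 32-byte hashes, not real blocks.
body = forkchoice_updated_request("0x" + "11" * 32, "0x" + "22" * 32, "0x" + "33" * 32)
print(json.loads(body)["method"])  # engine_forkchoiceUpdatedV3
```

When the validator is about to propose, the second parameter would carry payload attributes instead of null, which is how the CL asks the EL to start building a block.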
The split has two big benefits: client diversity (no single bug can take down the network if you mix-and-match EL+CL pairs) and cleaner concerns (consensus changes don't churn execution code and vice versa). It also enabled MEV-Boost: an out-of-protocol marketplace where third-party builders construct blocks (often optimized for MEV extraction), bid for the right to have their block proposed, and the validator just signs the winning bid. ~90% of mainnet blocks are MEV-Boost blocks today.
Blob Transactions: EIP-4844 / Proto-Danksharding
Layer-2 rollups (Optimism, Arbitrum, Base, zkSync) execute transactions off-chain and
post compressed transaction data back to L1 for data availability. Pre-4844, that data
sat in calldata, costing 16 gas per non-zero byte (4 per zero byte) and consuming a meaningful
fraction of every block's gas. Rollup fees were dominated by L1 calldata costs.
EIP-4844 (Dencun, 2024) introduced a new transaction type: blob-carrying
transactions. A blob is a 128 KiB chunk of opaque data attached to a transaction
but not stored in the EVM state. The EVM never sees the blob contents, only a
versioned hash of each blob's KZG commitment (read with the BLOBHASH opcode and
checked against openings with the POINT_EVALUATION precompile). The blobs themselves
are propagated through the consensus layer's gossip network and pruned after ~18 days.
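The blob figures above follow from a few protocol constants, and the value that BLOBHASH exposes is a versioned hash derived from the KZG commitment. The sketch below uses constant names from EIP-4844 and the consensus spec; the all-zero commitment is a placeholder, not a valid curve point.

```python
import hashlib

FIELD_ELEMENTS_PER_BLOB = 4096
BYTES_PER_FIELD_ELEMENT = 32
VERSIONED_HASH_VERSION_KZG = b"\x01"
MIN_EPOCHS_FOR_BLOB_SIDECARS_REQUESTS = 4096  # retention window in epochs
SLOTS_PER_EPOCH = 32
SECONDS_PER_SLOT = 12


def kzg_to_versioned_hash(commitment: bytes) -> bytes:
    """Replace the first byte of sha256(commitment) with a version tag."""
    if len(commitment) != 48:
        raise ValueError("a KZG commitment is a 48-byte compressed G1 point")
    return VERSIONED_HASH_VERSION_KZG + hashlib.sha256(commitment).digest()[1:]


blob_size = FIELD_ELEMENTS_PER_BLOB * BYTES_PER_FIELD_ELEMENT
retention_days = (MIN_EPOCHS_FOR_BLOB_SIDECARS_REQUESTS * SLOTS_PER_EPOCH
                  * SECONDS_PER_SLOT) / 86_400

print(blob_size // 1024)         # 128 (KiB)
print(round(retention_days, 1))  # 18.2

placeholder_commitment = bytes(48)  # illustrative only
versioned_hash = kzg_to_versioned_hash(placeholder_commitment)
print(len(versioned_hash), versioned_hash[0])  # 32 1
```

The version byte leaves room to swap the commitment scheme later without changing the 32-byte hash format that contracts store.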
Three properties:
- Cheap — blob gas has its own EIP-1559-style market separate from execution gas. Blob fees during normal load are ~10-100x cheaper than equivalent calldata. Rollup transaction fees dropped 90%+ overnight.
- Verifiable but ephemeral — the rollup's contract on L1 stores the blob hash, can verify proofs against it, but the blob bytes themselves don't burden state.
- Forward-compatible — KZG commitments are the building block for full Danksharding (the long-term plan to split blob storage across the validator set, scaling data availability ~10x further).
Light Clients and State Sync
A full node holds the entire state trie (~200 GB and growing); a light client wants
consensus security without the disk burden. The post-Merge sync committee (512
validators rotating every 256 epochs, signing every slot's header) gives light clients
a compact, BLS-aggregated proof of which beacon block is canonical. Combined with the
MPT proof for any specific account or storage slot, a light client can verify
eth_getBalance or eth_call results against state roots without
re-executing the chain's transactions itself.
For full-node sync, modern clients use snap sync: download the latest finalized state directly from peers as flat key-value pairs, verify each chunk against the published state root, then process only the most recent ~64 blocks of history. A fresh node can sync in hours instead of weeks, at the cost of not having full historical receipts (which can be filled in via separate "history" sync). EIP-4444 (eventually) makes pruning history default, shrinking the per-node footprint dramatically.
Ethereum vs Other L1s
| | Ethereum | Solana | Bitcoin | Cosmos (per chain) |
|---|---|---|---|---|
| State model | Account-based, MPT | Account-based, flat | UTXO | Account-based, IAVL+ tree |
| Block time | 12s | ~400ms | ~10 min | ~6s (Tendermint) |
| Finality | ~12.8 min (Casper FFG) | ~12s (TowerBFT) | Probabilistic, ~60 min | Instant (Tendermint BFT) |
| Consensus | PoS · Casper FFG + LMD GHOST | PoS · TowerBFT + PoH | PoW · longest chain | PoS · Tendermint BFT |
| Execution model | EVM (256-bit stack) | BPF (Sealevel parallel) | Stack-based Script (limited) | App-specific (Cosmos SDK) |
| Validator set | ~1M (permissionless, low-stake) | ~1500 (permissionless, high-stake) | Open mining | ~150 per chain (top stakeholders) |
| Throughput target | ~15-30 TPS L1, scaled by L2 | ~3000-5000 TPS | ~7 TPS | varies; ~1000 TPS Tendermint |
| Smart contracts | Solidity, Vyper, Yul | Rust, C, Solana SDK | Limited | CosmWasm (Rust), Go modules |
Tradeoffs and Honest Weaknesses
- L1 throughput is low — 15-30 TPS on a chain that targets every laptop being able to verify. The strategy is to push throughput onto rollups (L2) and use L1 purely as a settlement and data-availability layer.
- State growth is unbounded — every contract that's ever been deployed is part of the state forever (no automatic pruning). The state trie is ~200 GB and grows roughly linearly. Verkle Tries + statelessness are the medium-term fix; in the meantime, large nodes need fast SSDs and lots of RAM.
- MEV centralization risk — most blocks are now built by a handful of professional builders (Beaverbuild, Flashbots, Titan). Validators rarely build their own. If those builders coordinate or are censored, the chain's neutrality erodes. The "PBS" (Proposer-Builder Separation) roadmap aims to enshrine this in-protocol with stronger guarantees.
- Validator economics favor pools — running a solo validator requires 32 ETH (~$80-100k), uptime, slashing risk, and ops effort. Most retail stakers go through Lido, Coinbase, RocketPool, or Kraken. Lido alone holds ~30% of staked ETH, raising centralization concerns.
- Reorg risk on the head — single-slot reorgs happen routinely (a few times per day) when proposers see the previous block too late. Multi-slot reorgs are rare but possible during high latency. Wallets and exchanges wait for finality, not just inclusion, before considering transactions safe.
- Quantum risk on signatures — secp256k1 ECDSA falls to a sufficiently large quantum computer. There is a long-term plan to migrate to STARK-based or hash-based signatures, but it's a ~5-10 year horizon and would require migrating every existing account.
- Account abstraction is half-finished: ERC-4337 added smart-contract-account flows without changing the protocol, and EIP-7702 lets EOAs delegate to code, but the seed-phrase UX problem isn't fully solved at the protocol level.
Frequently Asked Questions
How is finality different from confirmation?
What happens to a transaction with insufficient gas?
Why does eth_call cost no gas?
What's the difference between value and gas in a transaction?
value is ETH transferred from the sender to the recipient as part of the call. gas is the maximum compute the transaction is willing to consume, paid for separately by the sender at gasPrice (or at most maxFeePerGas post-1559). Value moves between parties; the gas payment is either destroyed (base fee burn) or paid to the proposer (priority fee). A transaction that sends 0 ETH (a pure contract call) still costs gas.
How does CREATE2 produce a deterministic contract address?
The address is keccak256(0xff ++ deployer ++ salt ++ keccak256(initCode))[12:], where ++ is byte concatenation. Because all four inputs are known before deployment, anyone can compute the future address. This is the foundation for "counterfactual" contract patterns: a wallet can sign a transaction sending ETH to an address that doesn't exist yet, knowing the contract will be deployed there later. Only the deployer address baked into the hash can perform the deployment, although a factory contract may let anyone trigger it.
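The CREATE2 formula can be checked with nothing but the Python standard library. Python's hashlib has no Keccak-256 (its sha3_256 uses a different padding byte), so this sketch carries a compact pure-Python Keccak written for illustration; a real project would use a vetted library. The helper names and the example inputs are assumptions.

```python
import hashlib

MASK64 = (1 << 64) - 1


def _rol64(value: int, shift: int) -> int:
    shift %= 64
    return ((value << shift) | (value >> (64 - shift))) & MASK64


def _keccak_f1600(lanes):
    """Keccak-f[1600] permutation on a 5x5 matrix of 64-bit lanes, lanes[x][y]."""
    lfsr = 1
    for _ in range(24):
        # theta
        c = [lanes[x][0] ^ lanes[x][1] ^ lanes[x][2] ^ lanes[x][3] ^ lanes[x][4]
             for x in range(5)]
        d = [c[(x + 4) % 5] ^ _rol64(c[(x + 1) % 5], 1) for x in range(5)]
        lanes = [[lanes[x][y] ^ d[x] for y in range(5)] for x in range(5)]
        # rho and pi
        x, y = 1, 0
        current = lanes[x][y]
        for t in range(24):
            x, y = y, (2 * x + 3 * y) % 5
            current, lanes[x][y] = lanes[x][y], _rol64(current, (t + 1) * (t + 2) // 2)
        # chi
        for y in range(5):
            row = [lanes[x][y] for x in range(5)]
            for x in range(5):
                lanes[x][y] = row[x] ^ (~row[(x + 1) % 5] & row[(x + 2) % 5])
        # iota (round constants generated by an LFSR)
        for j in range(7):
            lfsr = ((lfsr << 1) ^ ((lfsr >> 7) * 0x71)) % 256
            if lfsr & 2:
                lanes[0][0] ^= 1 << ((1 << j) - 1)
    return lanes


def keccak256(data: bytes, suffix: int = 0x01) -> bytes:
    """Keccak-256 as used by Ethereum; suffix=0x06 gives NIST SHA3-256 instead."""
    rate = 136  # bytes absorbed per permutation for a 256-bit digest
    padded = bytearray(data)
    padded.append(suffix)
    padded.extend(b"\x00" * (-len(padded) % rate))
    padded[-1] |= 0x80
    lanes = [[0] * 5 for _ in range(5)]
    for start in range(0, len(padded), rate):
        for i in range(rate // 8):
            word = int.from_bytes(padded[start + 8 * i:start + 8 * i + 8], "little")
            lanes[i % 5][i // 5] ^= word
        lanes = _keccak_f1600(lanes)
    return b"".join(lanes[i][0].to_bytes(8, "little") for i in range(4))


def create2_address(deployer: bytes, salt: bytes, init_code: bytes) -> bytes:
    """Last 20 bytes of keccak256(0xff ++ deployer ++ salt ++ keccak256(init_code))."""
    if len(deployer) != 20 or len(salt) != 32:
        raise ValueError("deployer must be 20 bytes and salt 32 bytes")
    return keccak256(b"\xff" + deployer + salt + keccak256(init_code))[12:]


# Sanity check: with the SHA-3 padding byte the sponge must match hashlib.
assert keccak256(b"abc", suffix=0x06) == hashlib.sha3_256(b"abc").digest()

# Hypothetical inputs: a zero deployer address, a zero salt, one byte of init code.
address = create2_address(bytes(20), bytes(32), b"\x00")
print("0x" + address.hex())
```

Changing any one of the deployer, the salt, or the init code yields an unrelated address, which is why a counterfactual wallet must fix all three in advance.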