Mutations in ClickHouse
ClickHouse is an immutable column store, so "UPDATE" and "DELETE" don't mutate rows in place — they rewrite
entire parts. ALTER TABLE … UPDATE is async, expensive, and observable through system.mutations.
DELETE FROM (the "lightweight" variant) marks rows with a hidden _row_exists mask and lets compaction
clean up later. Both are valid for occasional GDPR-style operations and dangerous as a regular workload. The
right tool for "mutable" data is usually ReplacingMergeTree or CollapsingMergeTree, where mutations
are encoded as new INSERTs and resolved by the merge process.
What an ALTER UPDATE actually does
A mutation that changes one row in one part rewrites a whole 100 GB part. Use partition-key WHERE clauses to limit the blast radius.
Key Numbers
mutations_syncALTER UPDATE / ALTER DELETE
Submitted to a queue, applied asynchronously by background worker threads. The query returns immediately.
ALTER TABLE events
UPDATE state = 'archived'
WHERE ts < '2024-01-01';
ALTER TABLE events
DELETE WHERE user_id = 12345; To wait synchronously:
SET mutations_sync = 1; -- wait for current replica
SET mutations_sync = 2; -- wait for ALL replicas
ALTER TABLE events DELETE WHERE …; Lightweight DELETE FROM
Introduced in 22.x; the standard SQL form. Instead of rewriting parts, it writes a _row_exists mask. Reads
filter by the mask. Background merges drop masked rows when they next rewrite the part anyway.
DELETE FROM events WHERE user_id = 12345; | ALTER DELETE | DELETE FROM | |
|---|---|---|
| Style | Mutation (rewrite) | Mask (lightweight) |
| Submission | Async by default | Sync, reflected in next read |
| Disk write | Whole parts | Mask file only |
| Reads after | See deleted gone | See _row_exists=0 filtered |
| Best for | Bulk archival | Targeted GDPR-style |
system.mutations
SELECT
database,
table,
mutation_id,
command,
create_time,
is_done,
parts_to_do,
latest_failed_part,
latest_fail_reason
FROM system.mutations
WHERE NOT is_done
ORDER BY create_time DESC; Cancel a stuck mutation with KILL MUTATION:
KILL MUTATION WHERE database = 'default' AND mutation_id = '0000000123'; Why mutations are expensive
A mutation is implemented by enumerating every part of the table, evaluating the WHERE on each part's primary index, and rewriting any part that could contain a matching row. That means:
- One row's UPDATE may rewrite a 100 GB part.
- Replicated mutations run on every replica independently — N× the IO.
- Mutations bypass the merge scheduler's part-size budgets, so they can stall regular merges.
- If the WHERE includes the partition key, only matching partitions are touched. Always include partition predicates.
-- Good: only one partition is rewritten
ALTER TABLE events
UPDATE state = 'archived'
WHERE toYYYYMM(ts) = 202401 AND state = 'pending';
-- Bad: every partition is scanned
ALTER TABLE events
UPDATE state = 'archived'
WHERE state = 'pending'; The right way: mutation-by-INSERT
For high-rate "updates" use a MergeTree variant that resolves duplicates during merging:
ReplacingMergeTree(version)— last-write-wins by sorting key.CollapsingMergeTree(sign)— +1/−1 rows cancel during merges.VersionedCollapsingMergeTree(sign, version)— same, but tolerates out-of-order inserts.
-- Conceptual "UPDATE" via INSERT
INSERT INTO users_replacing
VALUES (42, 'new_name', now()); -- version > previous → wins on merge
-- Read-time dedup if not yet merged:
SELECT * FROM users_replacing FINAL WHERE id = 42; These engines push the cost from "mutation time" (rewrite parts) to "merge time" (already happening) and "read time" (FINAL, optional). For workloads with many small updates this is orders of magnitude cheaper.
Tuning knobs
| Setting | Default | Effect |
|---|---|---|
mutations_sync | 0 | 0=async, 1=wait for this replica, 2=wait for all |
background_pool_size | 16 | Worker threads for merges + mutations |
number_of_free_entries_in_pool_to_execute_mutation | 10 | Reserves merge capacity |
max_part_loading_threads | auto | Parallelism when scanning parts to mutate |
lightweight_deletes_sync | 2 | Wait policy for DELETE FROM |
apply_mutations_on_fly | 0 | If 1, queries see mutation effects before parts rewritten |
apply_mutations_on_fly = 1 (24+) is the modern story for "I want my UPDATE visible immediately." It
applies the mutation expression at read time until the background rewrite catches up. Good for low-rate
mutations; expensive for high-rate ones because every read pays the filter cost.
Mutation IO budget
Mutations and merges share the background_pool_size thread pool. A long mutation can starve regular
merges, leading to "too many parts" errors. Production rules:
- Run heavy mutations during off-peak hours.
- Always include partition predicates to bound work.
- Watch
system.merge_tree_settings.background_pool_sizeand increase if mutations + merges saturate. - Prefer lightweight DELETE for "delete N rows now"; reserve ALTER UPDATE for bulk archival.
Tradeoffs
- + Standard SQL semantics; familiar to PostgreSQL/MySQL users.
- + Lightweight DELETE is fast and visible immediately.
- + KILL MUTATION lets you back out if you set the WHERE wrong.
- − ALTER UPDATE/DELETE are not a transactional row update.
- − One bad mutation can rewrite the entire table.
- − Replication multiplies the IO cost by N replicas.
- − Mutations queue serially per table; back-pressure can be invisible.
FAQ
How do I tell if a mutation is "almost done"?
parts_to_do in system.mutations. It decrements as parts are rewritten. If it stalls, check latest_fail_reason.
Can I run two mutations in parallel?
The mutation queue per table is serial. If you submit two, they run in order. Mutations on different tables run in parallel up to the worker pool size.
What is _row_exists?
A virtual UInt8 column added by lightweight DELETE. WHERE _row_exists filters out logically-deleted rows automatically; you rarely need to reference it directly.
Will a SELECT see the result of an ALTER UPDATE immediately?
No. Mutations are async by default; reads still see old data until the relevant part is rewritten. Use SET mutations_sync=1 for "wait for me" semantics, or use lightweight DELETE/ReplacingMergeTree.
Should I run ALTER UPDATE during ingestion?
Avoid it. Mutations compete with merges for IO; running them during peak ingestion exacerbates the "too many parts" risk. Schedule for off-peak or use ReplacingMergeTree.