What an ALTER UPDATE actually does

ALTER UPDATE state = 'X' WHERE id = 42; part 1 part 2 part 3 part 4 ↓ background mutation worker rewrites every part touching id=42 part 1' (rewritten) part 2 (skipped) part 3' (rewritten) part 4 (skipped) A part containing the WHERE row is fully decompressed, rewritten, recompressed. Parts whose minmax index proves no match are skipped entirely.

A mutation that changes one row in one part rewrites a whole 100 GB part. Use partition-key WHERE clauses to limit the blast radius.

Key Numbers

async
default mutation submission mode
whole
part is rewritten — even for 1 row
_row_exists
virtual mask for lightweight DELETE
2
mutation kinds: ALTER UPDATE / DELETE
0–2
values for mutations_sync
disk write ≈ size of touched parts
part-id
granularity of mutation skipping

ALTER UPDATE / ALTER DELETE

Submitted to a queue, applied asynchronously by background worker threads. The query returns immediately.

ALTER TABLE events
UPDATE state = 'archived'
WHERE ts < '2024-01-01';

ALTER TABLE events
DELETE WHERE user_id = 12345;

To wait synchronously:

SET mutations_sync = 1;  -- wait for current replica
SET mutations_sync = 2;  -- wait for ALL replicas
ALTER TABLE events DELETE WHERE …;

Lightweight DELETE FROM

Introduced in 22.x; the standard SQL form. Instead of rewriting parts, it writes a _row_exists mask. Reads filter by the mask. Background merges drop masked rows when they next rewrite the part anyway.

DELETE FROM events WHERE user_id = 12345;
ALTER DELETEDELETE FROM
StyleMutation (rewrite)Mask (lightweight)
SubmissionAsync by defaultSync, reflected in next read
Disk writeWhole partsMask file only
Reads afterSee deleted goneSee _row_exists=0 filtered
Best forBulk archivalTargeted GDPR-style

system.mutations

SELECT
    database,
    table,
    mutation_id,
    command,
    create_time,
    is_done,
    parts_to_do,
    latest_failed_part,
    latest_fail_reason
FROM system.mutations
WHERE NOT is_done
ORDER BY create_time DESC;

Cancel a stuck mutation with KILL MUTATION:

KILL MUTATION WHERE database = 'default' AND mutation_id = '0000000123';

Why mutations are expensive

A mutation is implemented by enumerating every part of the table, evaluating the WHERE on each part's primary index, and rewriting any part that could contain a matching row. That means:

  • One row's UPDATE may rewrite a 100 GB part.
  • Replicated mutations run on every replica independently — N× the IO.
  • Mutations bypass the merge scheduler's part-size budgets, so they can stall regular merges.
  • If the WHERE includes the partition key, only matching partitions are touched. Always include partition predicates.
-- Good: only one partition is rewritten
ALTER TABLE events
UPDATE state = 'archived'
WHERE toYYYYMM(ts) = 202401 AND state = 'pending';

-- Bad: every partition is scanned
ALTER TABLE events
UPDATE state = 'archived'
WHERE state = 'pending';

The right way: mutation-by-INSERT

For high-rate "updates" use a MergeTree variant that resolves duplicates during merging:

  • ReplacingMergeTree(version) — last-write-wins by sorting key.
  • CollapsingMergeTree(sign) — +1/−1 rows cancel during merges.
  • VersionedCollapsingMergeTree(sign, version) — same, but tolerates out-of-order inserts.
-- Conceptual "UPDATE" via INSERT
INSERT INTO users_replacing
VALUES (42, 'new_name', now());  -- version > previous → wins on merge

-- Read-time dedup if not yet merged:
SELECT * FROM users_replacing FINAL WHERE id = 42;

These engines push the cost from "mutation time" (rewrite parts) to "merge time" (already happening) and "read time" (FINAL, optional). For workloads with many small updates this is orders of magnitude cheaper.

Tuning knobs

SettingDefaultEffect
mutations_sync00=async, 1=wait for this replica, 2=wait for all
background_pool_size16Worker threads for merges + mutations
number_of_free_entries_in_pool_to_execute_mutation10Reserves merge capacity
max_part_loading_threadsautoParallelism when scanning parts to mutate
lightweight_deletes_sync2Wait policy for DELETE FROM
apply_mutations_on_fly0If 1, queries see mutation effects before parts rewritten

apply_mutations_on_fly = 1 (24+) is the modern story for "I want my UPDATE visible immediately." It applies the mutation expression at read time until the background rewrite catches up. Good for low-rate mutations; expensive for high-rate ones because every read pays the filter cost.

Mutation IO budget

Mutations and merges share the background_pool_size thread pool. A long mutation can starve regular merges, leading to "too many parts" errors. Production rules:

  • Run heavy mutations during off-peak hours.
  • Always include partition predicates to bound work.
  • Watch system.merge_tree_settings.background_pool_size and increase if mutations + merges saturate.
  • Prefer lightweight DELETE for "delete N rows now"; reserve ALTER UPDATE for bulk archival.

Tradeoffs

  • + Standard SQL semantics; familiar to PostgreSQL/MySQL users.
  • + Lightweight DELETE is fast and visible immediately.
  • + KILL MUTATION lets you back out if you set the WHERE wrong.
  • ALTER UPDATE/DELETE are not a transactional row update.
  • One bad mutation can rewrite the entire table.
  • Replication multiplies the IO cost by N replicas.
  • Mutations queue serially per table; back-pressure can be invisible.

FAQ

How do I tell if a mutation is "almost done"?

parts_to_do in system.mutations. It decrements as parts are rewritten. If it stalls, check latest_fail_reason.

Can I run two mutations in parallel?

The mutation queue per table is serial. If you submit two, they run in order. Mutations on different tables run in parallel up to the worker pool size.

What is _row_exists?

A virtual UInt8 column added by lightweight DELETE. WHERE _row_exists filters out logically-deleted rows automatically; you rarely need to reference it directly.

Will a SELECT see the result of an ALTER UPDATE immediately?

No. Mutations are async by default; reads still see old data until the relevant part is rewritten. Use SET mutations_sync=1 for "wait for me" semantics, or use lightweight DELETE/ReplacingMergeTree.

Should I run ALTER UPDATE during ingestion?

Avoid it. Mutations compete with merges for IO; running them during peak ingestion exacerbates the "too many parts" risk. Schedule for off-peak or use ReplacingMergeTree.