Materialized Views in ClickHouse
ClickHouse materialized views are INSERT triggers in disguise. They are not refreshed; they are
fed. Every time a block lands in the source table, the view's SELECT runs against just that block, and
the result is written into a target table — usually a MergeTree-family engine. Pair an MV with
AggregatingMergeTree + sumState/uniqState and you get continuous, real-time rollups that compress
billions of events into tiny pre-aggregated tables. Since 24.x, the new refreshable MV finally adds the
scheduled-recompute model that other databases call "materialized view".
Insert-triggered flow
The MV fires once per inserted block and only sees that block — never the whole source table. This is what makes it cheap and what makes the SELECT subject to surprising rules.
Key Numbers
The classic INSERT-triggered MV
CREATE TABLE events (
ts DateTime,
user_id UInt64,
event LowCardinality(String),
revenue Decimal(18, 4)
)
ENGINE = MergeTree
ORDER BY (event, ts);
CREATE TABLE events_5m_agg (
window DateTime,
event LowCardinality(String),
hits AggregateFunction(sum, UInt64),
uniq_users AggregateFunction(uniq, UInt64),
revenue AggregateFunction(sum, Decimal(18, 4))
)
ENGINE = AggregatingMergeTree
ORDER BY (window, event);
CREATE MATERIALIZED VIEW events_5m_mv TO events_5m_agg AS
SELECT
toStartOfFiveMinutes(ts) AS window,
event,
sumState(toUInt64(1)) AS hits,
uniqState(user_id) AS uniq_users,
sumState(revenue) AS revenue
FROM events
GROUP BY window, event;
The view writes partial aggregate states (the binary intermediate form) into events_5m_agg.
Background merges combine states for the same (window, event). Reads call the matching *Merge functions:
SELECT
window,
event,
sumMerge(hits) AS hits,
uniqMerge(uniq_users) AS users,
sumMerge(revenue) AS revenue
FROM events_5m_agg
WHERE window >= now() - INTERVAL 1 DAY
GROUP BY window, event
ORDER BY window; Source-table semantics — read this carefully
The FROM events in the MV is a lie. The view never re-reads events; it only ever sees the block being
inserted. That has three consequences:
- JOINs in the MV's SELECT see only the inserted block on the left side. The right side is read normally — every block.
- Window functions and ORDER BY are scoped to the block, so global ordering is impossible inside an MV.
- Inserts that bypass the source table (e.g.
INSERT INTO events_5m_aggdirectly) skip the MV entirely.
This is also why MVs are cheap: the SELECT is incremental. A 10M-row INSERT triggers an MV SELECT over 10M rows, not over the entire source.
Backfilling with ATTACH MATERIALIZED VIEW
Creating the MV does not populate it from existing rows. Two options:
-- Option A: POPULATE (atomic, but blocks all writes during creation)
CREATE MATERIALIZED VIEW events_5m_mv TO events_5m_agg
POPULATE AS SELECT … FROM events GROUP BY …;
-- Option B (preferred): create empty, then backfill in batches
CREATE MATERIALIZED VIEW events_5m_mv TO events_5m_agg
AS SELECT … FROM events GROUP BY …;
INSERT INTO events_5m_agg
SELECT
toStartOfFiveMinutes(ts), event,
sumState(toUInt64(1)), uniqState(user_id), sumState(revenue)
FROM events
WHERE ts BETWEEN '2024-01-01' AND '2024-01-08'
GROUP BY 1, 2; POPULATE seems convenient but inserts during the populate run are silently dropped from the MV. For
production tables, always backfill manually so you control the cutover.
Refreshable materialized views (24+)
For workloads where INSERT-triggered semantics don't fit (cross-source joins, dimension-table refresh, "give me the top 100 users every 10 minutes"), 24.x added a scheduler:
CREATE MATERIALIZED VIEW top_users_mv
REFRESH EVERY 10 MINUTE
ENGINE = MergeTree ORDER BY user_id
AS
SELECT user_id, sum(revenue) AS revenue
FROM events
WHERE ts >= now() - INTERVAL 1 DAY
GROUP BY user_id
ORDER BY revenue DESC LIMIT 100;
Refreshable MVs replace the table contents atomically (rename the new partition into place). Behavior is closer
to PostgreSQL's REFRESH MATERIALIZED VIEW — at the cost of latency between refreshes.
SELECT view, status, last_refresh_time, next_refresh_time
FROM system.view_refreshes; Inspecting MVs
-- Show all MVs and their target tables
SELECT name, engine, as_select
FROM system.tables
WHERE engine LIKE 'Materialized%';
-- Last MV runs (rows in / rows out / time)
SELECT view_name, view_duration_ms, read_rows, written_rows, exception
FROM system.query_views_log
WHERE event_time > now() - INTERVAL 1 HOUR
ORDER BY event_time DESC; Chained MVs & multi-target fan-out
A single source table can feed N materialized views. Each runs against the same INSERT block in parallel. This is how production ClickHouse deployments build a tiered rollup pyramid:
events (raw, retention 14 days)
├── events_5m_mv → events_5m_agg (retention 90 days)
├── events_1h_mv → events_1h_agg (retention 1 year)
└── events_1d_mv → events_1d_agg (retention 5 years)
-- Each tier reads from the previous via a SECOND chained MV:
CREATE MATERIALIZED VIEW events_1h_mv TO events_1h_agg AS
SELECT
toStartOfHour(window) AS window,
event,
sumMergeState(hits) AS hits,
uniqMergeState(uniq_users) AS uniq_users
FROM events_5m_agg
GROUP BY window, event; sumMergeState takes existing partial states and produces a new partial state — the trick that lets
you cascade aggregations without ever decomposing back to raw rows. Disk usage drops by the rollup ratio at
every tier.
Tuning knobs
| Setting | Default | Effect |
|---|---|---|
materialized_views_ignore_errors | 0 | If 1, MV errors don't fail the source INSERT |
parallel_view_processing | 0 | If 1, multiple MVs on same source run in parallel |
deduplicate_blocks_in_dependent_materialized_views | 0 | Apply INSERT dedup to MV writes too |
insert_deduplication_token | — | Force dedup key for cross-table idempotency |
min_insert_block_size_rows | 1048576 | Affects how big each MV-triggered SELECT block is |
Tradeoffs
- + Real-time rollups with no scheduled jobs.
- + AggregateFunction states merge across partitions for free.
- + Multiple MVs can fan out from a single INSERT.
- − SELECT semantics differ from a normal view (block scope).
- − Errors in the MV fail the INSERT into the source table.
- − No automatic backfill on schema changes; manual rebuild needed.
FAQ
If an MV's SELECT throws, does the INSERT fail?
Yes by default. Set materialized_views_ignore_errors=1 at the session level to skip a failing MV — at the cost of silent data loss in the rollup.
Can multiple MVs feed the same target?
Yes. Common pattern: one MV per upstream source, all writing into a single AggregatingMergeTree target.
Why is my count off by one block?
Because GROUP BY in the MV groups within a single insert block. If two inserts land for the same minute they create two state-rows; merges combine them later. Always read with sumMerge/uniqMerge.
Should I use POPULATE in production?
No. INSERTs during POPULATE are dropped from the MV. Create empty, backfill in batches, verify counts.
What's the difference between TO target and inline ENGINE?
TO target writes to a pre-existing table you control; the MV is just the trigger. Inline ENGINE = … creates a hidden .inner. table that's harder to inspect and harder to alter. Always use TO.