ClickHouse vs Elasticsearch

Elasticsearch is a search engine; ClickHouse is an analytics engine. They overlap in one big territory, log and event data, and that's where the comparison matters. Elasticsearch indexes every term in every document and gives you fast full-text search and faceted filtering. ClickHouse stores data column by column and gives you scans, aggregations, and SQL at often 10–100× the throughput on analytical queries. Pick the wrong one for your workload and you'll either burn money on disk or wait minutes for a "count by service over the last 24h" query.

Different storage shapes

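Elasticsearch builds an inverted index: each term maps to the list of documents that contain it. ClickHouse stores each column contiguously on disk, compressed with a codec, and answers queries by scanning only the columns they touch. Different physics; different operations are cheap.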

Side by side

	ClickHouse	Elasticsearch
Storage layout	Columnar parts	Lucene inverted index + doc values
Compression	3–30× typical	~1.5–3× typical
Full-text search	LIKE plus ngrambf_v1 / tokenbf_v1 skip indexes	Native, ranked
Aggregation	Vectorized, up to billions of rows/sec	Via doc values, slower
Schema	Strict types; changing a column type rewrites data	Dynamic mapping
Query language	SQL	Query DSL (JSON) or ES SQL
Write throughput	100s of MB/s per node	10s of MB/s per node
Resource use	Low memory, high CPU	High RAM (heap + filesystem cache)
Update model	Mutations / ReplacingMergeTree	Document version, lazy delete

Workloads where ES wins

Free-text search with ranking — relevance scoring, BM25, multi-language analysis. ClickHouse can match on substrings but cannot rank.
Faceted search UIs that interleave many selective term filters with small aggregations. The inverted index keeps "filter by term, then aggregate" cheap.
High-cardinality keyword fields with WHERE field = 'foo' lookups across billions of docs.

Workloads where ClickHouse wins

Aggregations over time ranges — "errors per service per minute" — 10–100× faster, far cheaper.
Long-retention log storage — codec compression cuts disk by an order of magnitude.
Analytical SQL — joins, window functions, CTEs.
High write throughput at modest hardware — no per-document indexing cost.
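
As a concrete instance of the first item, here is a minimal sketch of "errors per service per minute". It assumes a logs table with ts, level, and service columns (like the one defined under Schema and ingestion) and that error rows carry the level value 'error'; that value is an assumption and depends on what your log shippers emit.

```sql
-- Errors per service per minute over the last 24 hours.
-- Assumes: table `logs` with columns ts, level, service.
-- The level value 'error' is illustrative, not something the article fixes.
SELECT
    service,
    toStartOfMinute(ts) AS minute,
    count() AS errors
FROM logs
WHERE level = 'error'
  AND ts > now() - INTERVAL 1 DAY
GROUP BY service, minute
ORDER BY service, minute;
```

The query reads only the ts, level, and service columns; the much larger message column is never touched, which is where most of the columnar advantage comes from.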

The substring problem

ClickHouse's solution to "log message contains 'OOMKilled'" is an ngrambf_v1 data-skipping index:

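-- Skip index: 3-grams, 256-byte Bloom filter, 2 hash functions, seed 0.
ALTER TABLE logs
ADD INDEX msg_ngram message TYPE ngrambf_v1(3, 256, 2, 0) GRANULARITY 4;

-- ADD INDEX only covers newly written parts; build it for existing data too.
ALTER TABLE logs MATERIALIZE INDEX msg_ngram;

SELECT count() FROM logs
WHERE message LIKE '%OOMKilled%' AND ts > now() - INTERVAL 1 DAY;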

Granules whose Bloom filter rules out any trigram of the pattern are skipped. It's not as fast as Lucene's posting lists, but it's fast enough for most log workloads, and the index adds little storage. For "find me docs about kernel panics ranked by relevance," it's not the right tool.
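
To check whether the skip index is actually pruning granules, EXPLAIN with indexes = 1 is one option. The sketch below also shows a token-based variant for cases where you match whole tokens rather than arbitrary substrings; the index name msg_token and the filter size 4096 are illustrative assumptions, not tuned values.

```sql
-- Show which indexes the query uses and how many granules survive each one.
EXPLAIN indexes = 1
SELECT count() FROM logs
WHERE message LIKE '%OOMKilled%' AND ts > now() - INTERVAL 1 DAY;

-- Alternative for whole-token matching:
-- tokenbf_v1(bloom filter size in bytes, hash functions, seed).
-- msg_token and 4096 are illustrative choices.
ALTER TABLE logs
ADD INDEX msg_token message TYPE tokenbf_v1(4096, 3, 0) GRANULARITY 4;
ALTER TABLE logs MATERIALIZE INDEX msg_token;

-- hasToken matches a complete token delimited by non-alphanumeric characters.
SELECT count() FROM logs
WHERE hasToken(message, 'OOMKilled') AND ts > now() - INTERVAL 1 DAY;
```

If the EXPLAIN output shows the same granule count before and after the skip index, the Bloom filter is too small or too coarse for the data and is worth resizing.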

Schema and ingestion

ES has dynamic mapping: send a JSON document, ES infers types and indexes everything. Convenient until field explosion (dynamic mapping creates a field per unique key — a leaky log payload can produce thousands).

ClickHouse has strict schemas. The JSON and Dynamic types (both introduced in the 24.x releases) approximate ES's dynamism but with explicit constraints. The bigger story is LowCardinality + ZSTD: a typical structured log table compresses to under 0.1× raw bytes.

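CREATE TABLE logs (
    ts        DateTime CODEC(DoubleDelta, ZSTD),   -- ts is sorted within each service, so deltas stay small
    level     LowCardinality(String),              -- dictionary-encoded: a handful of distinct values
    service   LowCardinality(String),
    trace_id  UUID,
    message   String CODEC(ZSTD(3)),               -- free text; a higher ZSTD level trades CPU for disk
    fields    Map(LowCardinality(String), String) CODEC(ZSTD)  -- unstructured key/value tail
)
ENGINE = MergeTree
PARTITION BY toYYYYMMDD(ts)   -- daily partitions make dropping old data cheap
ORDER BY (service, ts);       -- sorting key doubles as the sparse primary index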
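
Rather than taking the compression figure on faith, you can measure it per column from system.columns. The sketch below assumes the logs table above lives in the current database; the Map key 'user_id' in the second query is a hypothetical key used only for illustration.

```sql
-- Per-column compression for the logs table in the current database.
SELECT
    name,
    formatReadableSize(data_compressed_bytes)   AS compressed,
    formatReadableSize(data_uncompressed_bytes) AS uncompressed,
    round(data_uncompressed_bytes / data_compressed_bytes, 1) AS ratio
FROM system.columns
WHERE database = currentDatabase() AND table = 'logs'
ORDER BY data_compressed_bytes DESC;

-- Reading one key out of the Map column; 'user_id' is a hypothetical key.
SELECT fields['user_id'] AS user_id, count() AS hits
FROM logs
WHERE ts > now() - INTERVAL 1 HOUR
GROUP BY user_id
ORDER BY hits DESC
LIMIT 10;
```

Note that the uncompressed size reported here is ClickHouse's own column representation, not the size of the original JSON, so the ratio against raw log bytes is usually somewhat higher.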

Cost

At 1 TB/day of structured logs over a 30-day retention:

Elasticsearch: ~30 TB raw, ~60–90 TB on disk after replication; needs RAM ~5% of disk for hot data; typically 5–20× compute vs ClickHouse.
ClickHouse: ~3 TB on disk after compression; replicas double that; RAM is whatever fits the working set.
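
The ClickHouse side of that estimate is simple arithmetic, sketched here as a query so the inputs are easy to change. The compression ratio of 10 and the two copies restate the figures above ("~3 TB" from ~30 TB raw, "replicas double that"); treat them as assumptions to replace with your own measurements.

```sql
-- Back-of-envelope disk estimate for 1 TB/day over 30 days.
-- compression_ratio and copies are assumptions taken from the text above.
WITH
    1  AS tb_per_day,
    30 AS retention_days,
    10 AS compression_ratio,
    2  AS copies
SELECT
    tb_per_day * retention_days        AS raw_tb,                    -- 30
    raw_tb / compression_ratio         AS clickhouse_single_copy_tb, -- 3
    clickhouse_single_copy_tb * copies AS clickhouse_replicated_tb;  -- 6
```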

Teams that switch from ES to ClickHouse for logs commonly report a 5–20× cost reduction. The penalty: full-text search degrades to substring matching with ngram skip indexes.

Operational model

Elasticsearch shards live on data nodes, master-eligible nodes coordinate metadata, and coordinating nodes fan out queries. Heap pressure is the perpetual operational concern: index merges, fielddata, and query buffers all compete for JVM heap.

ClickHouse runs natively without a JVM. Memory use is mostly the OS page cache plus a few per-query budgets (max_memory_usage, max_bytes_before_external_group_by). Replication is peer-to-peer, coordinated through Keeper; there is no "master" to fail.
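
A minimal sketch of those per-query budgets, set at session level. The byte values are illustrative assumptions; setting the spill threshold to roughly half of the memory cap is a common starting point.

```sql
-- Per-query memory budget for the current session (values are illustrative).
SET max_memory_usage = 10000000000;                   -- about 10 GB cap per query
SET max_bytes_before_external_group_by = 5000000000;  -- spill GROUP BY state to disk past about 5 GB

-- A high-cardinality GROUP BY that could otherwise hit the cap.
SELECT trace_id, count() AS log_lines
FROM logs
WHERE ts > now() - INTERVAL 1 DAY
GROUP BY trace_id
ORDER BY log_lines DESC
LIMIT 20;
```

With the spill threshold in place, the query slows down once the aggregation state outgrows memory instead of failing with a memory-limit error.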

For high-cardinality keyword filtering at very low latency (e.g. "find the row with this exact UUID"), Elasticsearch's posting list is hard to beat. ClickHouse's primary-key + bloom filter combo gets close on well-designed schemas but cannot match Lucene's per-term lookup constants.
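
One way to build the "primary-key + bloom filter combo" for exact UUID lookups on the logs table is sketched below. The index name trace_bf, the false-positive rate 0.01, GRANULARITY 1, and the UUID literal are all illustrative assumptions.

```sql
-- Bloom-filter skip index on trace_id; 0.01 is the target false-positive rate.
-- trace_bf and GRANULARITY 1 are illustrative choices.
ALTER TABLE logs
ADD INDEX trace_bf trace_id TYPE bloom_filter(0.01) GRANULARITY 1;
ALTER TABLE logs MATERIALIZE INDEX trace_bf;

-- Point lookup; the UUID is a placeholder value.
SELECT ts, service, level, message
FROM logs
WHERE trace_id = toUUID('00000000-0000-0000-0000-000000000001')
  AND ts > now() - INTERVAL 1 DAY
ORDER BY ts;
```

Keeping a time bound in the WHERE clause lets partition pruning narrow the search before the Bloom filter is consulted, which is usually what makes these lookups acceptably fast.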

Tradeoffs

+ ClickHouse: faster aggregation, lower disk, full SQL, predictable resource use.
+ ES: full-text search with ranking, dynamic schema, mature ecosystem (Kibana, Logstash).
− ClickHouse: substring search via ngram is good, not great.
− ES: high cost at scale, heap pressure, slow aggregations.

Migrating ES log workloads to ClickHouse

The most common ES → ClickHouse migration is a logging stack. The recipe:

Replace the @timestamp field with a DateTime64(3) column.
Map structured fields to typed columns; wrap low-cardinality string fields (level, service, host) in LowCardinality.
Stash the unstructured tail in a Map(LowCardinality(String), String) or the new JSON type.
Add an ngrambf_v1 index on message for substring search.
Replace Logstash with the Vector ClickHouse sink, or consume from Kafka with the ClickHouse Kafka table engine.
Replace Kibana with Grafana, OpenObserve, or SigNoz.

Expect a 5–20× cost reduction for log-only workloads, and a quality tradeoff in full-text search ergonomics.
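
The schema and ingestion steps of the recipe can be sketched as follows, using the Kafka route. Everything named here is an assumption for illustration: the table names logs_from_es, logs_queue, and logs_queue_mv, the host column, the broker address, topic, and consumer group, the 30-day TTL, and the shape of the incoming JSON.

```sql
-- Target table following the recipe (all names are illustrative).
CREATE TABLE logs_from_es (
    ts        DateTime64(3) CODEC(Delta, ZSTD),   -- was @timestamp in ES
    level     LowCardinality(String),
    service   LowCardinality(String),
    host      LowCardinality(String),
    trace_id  UUID,
    message   String CODEC(ZSTD(3)),
    fields    Map(LowCardinality(String), String) CODEC(ZSTD),  -- unstructured tail
    INDEX msg_ngram message TYPE ngrambf_v1(3, 256, 2, 0) GRANULARITY 4
)
ENGINE = MergeTree
PARTITION BY toYYYYMMDD(ts)
ORDER BY (service, ts)
TTL toDateTime(ts) + INTERVAL 30 DAY;   -- assumed 30-day retention

-- Kafka consumer table; broker, topic, and group are placeholders.
CREATE TABLE logs_queue (
    `@timestamp` String,
    level        String,
    service      String,
    host         String,
    trace_id     String,
    message      String
)
ENGINE = Kafka
SETTINGS kafka_broker_list = 'kafka:9092',
         kafka_topic_list = 'logs',
         kafka_group_name = 'clickhouse-logs',
         kafka_format = 'JSONEachRow';

-- Materialized view that converts types and writes into the target table.
-- Columns not selected here (fields) take their default, an empty map.
CREATE MATERIALIZED VIEW logs_queue_mv TO logs_from_es AS
SELECT
    parseDateTime64BestEffort(`@timestamp`, 3) AS ts,
    level,
    service,
    host,
    toUUIDOrZero(trace_id) AS trace_id,
    message
FROM logs_queue;
```

Doing the type conversion in the materialized view keeps the Kafka table tolerant of loosely typed JSON while the stored table stays strictly typed.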

FAQ

Can ClickHouse fully replace Elasticsearch?

For analytical/log workloads, yes — and many teams have done so. For search UIs that need relevance ranking and language analysis, no.

What about hybrid: ES for search + CH for aggs?

Common pattern. Index the searchable fields in ES with short retention, ship the same data to ClickHouse for analytics with long retention.

Does ClickHouse have something like Kibana?

Grafana works well. Tabix and play.clickhouse.com are SQL-first. For log UI specifically, OpenObserve and SigNoz target the ES-replacement use case.

How does the OpenSearch / Elasticsearch fork affect this?

OpenSearch is a fork of Elasticsearch, and both are built on the same underlying Lucene model, so the comparison is identical. License differences matter for vendor choice, not architecture.

What about full-text search in ClickHouse 24+?

The new full_text index (experimental) brings tokenized search closer to ES territory, but ranking and language analysis are still primitive.