ClickHouse vs BigQuery

BigQuery is Google's serverless data warehouse: you write SQL, Google figures out the cluster. ClickHouse is a self-hosted columnar OLAP engine: you provision the boxes and tune the merges. Both are extremely fast on analytical scans, but they live in different operational universes — pricing, latency, real-time ingestion, and ecosystem are all opposites. The choice is rarely about raw speed; it is about which kind of operational debt your team would rather take on.

Architecture at a glance

ClickHouse couples compute to storage; BigQuery uses Google's Jupiter petabit network to keep them apart.

Side by side

	ClickHouse	BigQuery
Hosting	Self-hosted or ClickHouse Cloud	Google-managed only
Storage	Local disk / S3 (24+ separation)	Colossus, separated by design
Compute	Long-lived servers	Slots leased per query
Pricing	$/instance-hour	$/TB scanned (or $/slot-hour)
P99 latency	10ms–1s for hot data	1–10s baseline
Real-time INSERT	10s–100s of MB/s/node	Streaming insert API, $/MB
Joins	Hash / merge / direct	Broadcast / shuffle
UDFs	SQL, executable, WebAssembly	JavaScript, SQL
Ecosystem	Open source, Apache 2	GCP-native

Pricing model: where the surprise lives

BigQuery's on-demand pricing is per terabyte scanned. A query that touches 10 TB costs $50, regardless of how long it took. The on-demand model rewards narrow queries (column selection, partition predicates) and punishes SELECT *. The "flat-rate" alternative (slots) charges per compute capacity reservation — predictable, but you pay even when idle.

ClickHouse pricing is just the box. A 3-node 32-core cluster on AWS runs ~$3k/month and serves whatever you throw at it. There is no per-query cost. This is amazing if you have steady high-volume analytics; expensive if you query rarely.

-- BigQuery: this is a $50 query if events is 10 TB, even with WHERE
SELECT user_id, count(*) FROM events GROUP BY user_id;

-- BigQuery: this is a $5 query (only events of 2024-03 scanned)
SELECT user_id, count(*) FROM events
WHERE _PARTITIONDATE BETWEEN '2024-03-01' AND '2024-03-31'
GROUP BY user_id;

Query latency

ClickHouse reaches sub-100ms for primary-key-aligned queries on small data. Cold large-aggregation queries depend on disk and CPU; 10s for a 100-billion-row scan is normal.

BigQuery's baseline overhead is 1–3s before any work happens (slot acquisition + planning). Queries scale almost arbitrarily — Google can throw 2000 slots at one query — so very large aggregations finish faster than on a finite ClickHouse cluster, but interactive dashboards on small data feel sluggish.

Real-time ingestion

ClickHouse ingests 100s of MB/s per node by design. INSERTs land as parts in seconds, queryable immediately. Materialized views fan out to pre-aggregated tables in real time.

BigQuery streaming insert costs ~$0.01/MB with a per-row size cap and an eventual consistency window before rows appear in queries. The newer Storage Write API is cheaper but still bandwidth-billed. Most BigQuery users batch load via GCS for cost reasons, which means "real-time" becomes "every 5 minutes."

Ecosystem and lock-in

BigQuery integrates with the rest of GCP: Dataflow, Pub/Sub, Looker, IAM. Moving data out is a known ETL job; moving the workload off BigQuery means rewriting GoogleSQL → standard SQL and re-engineering streaming ingestion.

ClickHouse is open source. The same engine runs on a laptop, on EC2, on bare metal, on Kubernetes, on ClickHouse Cloud. Dialects are stable across deployments. The cost of leaving is "spin up the same engine somewhere else."

When each wins

Pick ClickHouse for: real-time event ingestion, sub-second dashboards, predictable monthly cost, on-prem or multi-cloud, log/metrics/observability backends.
Pick BigQuery for: bursty ad-hoc analytics over petabytes, organizations already deep in GCP, ML pipelines via BigQuery ML, audit-grade SQL with no ops staff.

Concurrency & dashboards

ClickHouse is built to serve thousands of concurrent dashboard queries off the same nodes. The thread-per-query plus shared filesystem cache makes each marginal user nearly free until you saturate CPU.

BigQuery's slot model means every concurrent query takes some of your slot budget. On-demand pricing has no fixed concurrency limit but each query competes for shared Google capacity, producing variable latency. Flat-rate slot reservations give predictable concurrency at the cost of paying for idle slots.

The classic pattern: BigQuery for the data lake / nightly aggregations, ClickHouse downstream for the user-facing dashboards. ETL into BQ once, replicate aggregated tiers into CH for serving.

Tradeoffs

+ ClickHouse: lower latency, real-time inserts, predictable cost.
+ BigQuery: zero ops, infinite burst capacity, Google ML and IAM stack.
− ClickHouse: you operate Keeper, replicas, mergers, retention.
− BigQuery: per-TB scanning surprises, GCP lock-in, slow dashboards.

Migration patterns

Most BigQuery → ClickHouse migrations fall into one of three shapes:

Lift-and-shift. Recreate schemas in ClickHouse, dump as Parquet, ingest via the s3() table function. Hard parts: GoogleSQL-specific syntax (STRUCT, ARRAY_AGG) and BigQuery UDFs (JavaScript) that need rewriting in ClickHouse SQL.
Tiered. Keep BigQuery as the source of truth for ad-hoc/ETL; replicate hot recent data into ClickHouse for sub-second user-facing queries. Common when BigQuery is the canonical warehouse and CH is the serving layer.
Greenfield. Write new pipelines directly to ClickHouse (Kafka engine, native HTTP), keeping BigQuery only for whatever specific GCP integrations require it.

FAQ

Can I use BigQuery as a backend for a low-latency app?

Not directly. The 1–3s baseline kills interactive UX. Use BI Engine (in-memory cache) or replicate into a faster store. ClickHouse is engineered for this directly.

Does ClickHouse support compute-storage separation?

Yes since 24.x — S3 disk + zero-copy replication. Not as automatic as BigQuery slots, but it eliminates the "must scale storage with compute" problem.

Which has better SQL?

BigQuery's GoogleSQL is closer to ANSI; ClickHouse has dialect quirks but more analytical functions and the entire *State / *Merge aggregate-state model.

What about cost at petabyte scale?

For steady workloads, ClickHouse on dedicated hardware is typically 3–10× cheaper than BigQuery flat-rate. For sporadic queries, BigQuery on-demand often wins.

How do I migrate?

Most schemas port directly. The hard parts: GoogleSQL specifics (STRUCT, ARRAY_AGG), streaming ingestion rewrite, and BigQuery-specific UDFs.