Architecture at a glance

ClickHouse SQL parser + planner vectorized executor MergeTree storage local SSD / EBS compute and storage co-located BigQuery Dremel query engine Borg / slot scheduler ↕ Jupiter network Capacitor (columnar) Colossus distributed FS compute and storage decoupled

ClickHouse couples compute to storage; BigQuery uses Google's Jupiter petabit network to keep them apart.

Side by side

ClickHouseBigQuery
HostingSelf-hosted or ClickHouse CloudGoogle-managed only
StorageLocal disk / S3 (24+ separation)Colossus, separated by design
ComputeLong-lived serversSlots leased per query
Pricing$/instance-hour$/TB scanned (or $/slot-hour)
P99 latency10ms–1s for hot data1–10s baseline
Real-time INSERT10s–100s of MB/s/nodeStreaming insert API, $/MB
JoinsHash / merge / directBroadcast / shuffle
UDFsSQL, executable, WebAssemblyJavaScript, SQL
EcosystemOpen source, Apache 2GCP-native

Pricing model: where the surprise lives

BigQuery's on-demand pricing is per terabyte scanned. A query that touches 10 TB costs $50, regardless of how long it took. The on-demand model rewards narrow queries (column selection, partition predicates) and punishes SELECT *. The "flat-rate" alternative (slots) charges per compute capacity reservation — predictable, but you pay even when idle.

ClickHouse pricing is just the box. A 3-node 32-core cluster on AWS runs ~$3k/month and serves whatever you throw at it. There is no per-query cost. This is amazing if you have steady high-volume analytics; expensive if you query rarely.

-- BigQuery: this is a $50 query if events is 10 TB, even with WHERE
SELECT user_id, count(*) FROM events GROUP BY user_id;

-- BigQuery: this is a $5 query (only events of 2024-03 scanned)
SELECT user_id, count(*) FROM events
WHERE _PARTITIONDATE BETWEEN '2024-03-01' AND '2024-03-31'
GROUP BY user_id;

Query latency

ClickHouse reaches sub-100ms for primary-key-aligned queries on small data. Cold large-aggregation queries depend on disk and CPU; 10s for a 100-billion-row scan is normal.

BigQuery's baseline overhead is 1–3s before any work happens (slot acquisition + planning). Queries scale almost arbitrarily — Google can throw 2000 slots at one query — so very large aggregations finish faster than on a finite ClickHouse cluster, but interactive dashboards on small data feel sluggish.

Real-time ingestion

ClickHouse ingests 100s of MB/s per node by design. INSERTs land as parts in seconds, queryable immediately. Materialized views fan out to pre-aggregated tables in real time.

BigQuery streaming insert costs ~$0.01/MB with a per-row size cap and an eventual consistency window before rows appear in queries. The newer Storage Write API is cheaper but still bandwidth-billed. Most BigQuery users batch load via GCS for cost reasons, which means "real-time" becomes "every 5 minutes."

Ecosystem and lock-in

BigQuery integrates with the rest of GCP: Dataflow, Pub/Sub, Looker, IAM. Moving data out is a known ETL job; moving the workload off BigQuery means rewriting GoogleSQL → standard SQL and re-engineering streaming ingestion.

ClickHouse is open source. The same engine runs on a laptop, on EC2, on bare metal, on Kubernetes, on ClickHouse Cloud. Dialects are stable across deployments. The cost of leaving is "spin up the same engine somewhere else."

When each wins

  • Pick ClickHouse for: real-time event ingestion, sub-second dashboards, predictable monthly cost, on-prem or multi-cloud, log/metrics/observability backends.
  • Pick BigQuery for: bursty ad-hoc analytics over petabytes, organizations already deep in GCP, ML pipelines via BigQuery ML, audit-grade SQL with no ops staff.

Concurrency & dashboards

ClickHouse is built to serve thousands of concurrent dashboard queries off the same nodes. The thread-per-query plus shared filesystem cache makes each marginal user nearly free until you saturate CPU.

BigQuery's slot model means every concurrent query takes some of your slot budget. On-demand pricing has no fixed concurrency limit but each query competes for shared Google capacity, producing variable latency. Flat-rate slot reservations give predictable concurrency at the cost of paying for idle slots.

The classic pattern: BigQuery for the data lake / nightly aggregations, ClickHouse downstream for the user-facing dashboards. ETL into BQ once, replicate aggregated tiers into CH for serving.

Tradeoffs

  • + ClickHouse: lower latency, real-time inserts, predictable cost.
  • + BigQuery: zero ops, infinite burst capacity, Google ML and IAM stack.
  • ClickHouse: you operate Keeper, replicas, mergers, retention.
  • BigQuery: per-TB scanning surprises, GCP lock-in, slow dashboards.

Migration patterns

Most BigQuery → ClickHouse migrations fall into one of three shapes:

  1. Lift-and-shift. Recreate schemas in ClickHouse, dump as Parquet, ingest via the s3() table function. Hard parts: GoogleSQL-specific syntax (STRUCT, ARRAY_AGG) and BigQuery UDFs (JavaScript) that need rewriting in ClickHouse SQL.
  2. Tiered. Keep BigQuery as the source of truth for ad-hoc/ETL; replicate hot recent data into ClickHouse for sub-second user-facing queries. Common when BigQuery is the canonical warehouse and CH is the serving layer.
  3. Greenfield. Write new pipelines directly to ClickHouse (Kafka engine, native HTTP), keeping BigQuery only for whatever specific GCP integrations require it.

FAQ

Can I use BigQuery as a backend for a low-latency app?

Not directly. The 1–3s baseline kills interactive UX. Use BI Engine (in-memory cache) or replicate into a faster store. ClickHouse is engineered for this directly.

Does ClickHouse support compute-storage separation?

Yes since 24.x — S3 disk + zero-copy replication. Not as automatic as BigQuery slots, but it eliminates the "must scale storage with compute" problem.

Which has better SQL?

BigQuery's GoogleSQL is closer to ANSI; ClickHouse has dialect quirks but more analytical functions and the entire *State / *Merge aggregate-state model.

What about cost at petabyte scale?

For steady workloads, ClickHouse on dedicated hardware is typically 3–10× cheaper than BigQuery flat-rate. For sporadic queries, BigQuery on-demand often wins.

How do I migrate?

Most schemas port directly. The hard parts: GoogleSQL specifics (STRUCT, ARRAY_AGG), streaming ingestion rewrite, and BigQuery-specific UDFs.