Spark vs Flink
Batch-First vs Streaming-First, and the Tradeoffs That Follow
Spark and Flink are both top-tier distributed processing engines: both written in Scala/Java, both running on the JVM, both supporting batch and streaming. They look interchangeable from a brochure. They are not: the underlying execution model is fundamentally different, and that difference cascades into latency, state management, exactly-once semantics, and ecosystem fit.
Spark is batch-first: an RDD/DataFrame engine where streaming is incremental batches. Flink is streaming-first: a continuous dataflow engine where batch is bounded streams. For analytical SQL on a data lake, Spark wins on tooling and optimizer. For low-latency stateful streaming, Flink wins on pretty much every dimension.
Architecture Comparison
Streaming Models
| Aspect | Spark Structured Streaming | Flink DataStream |
|---|---|---|
| Execution | Micro-batch (continuous trigger experimental) | True per-record streaming |
| Latency | ~100 ms to seconds | Single-digit milliseconds typical |
| Backpressure | Implicit via batch timing | Explicit, credit-based, well-instrumented |
| Event-time | Watermark per source partition | Watermark per parallel subtask, idleness-aware |
| Out-of-order | Watermark bounds lateness; later rows dropped | allowedLateness + side outputs for late events |
| Window types | Tumbling, sliding, session (fixed gap) | + Session with custom gap, custom assigners |
| State APIs | flatMapGroupsWithState (typed but limited) | Rich KeyedState API: ValueState, ListState, MapState |
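To make those rows concrete, here is a minimal PySpark Structured Streaming sketch of event-time windowing under micro-batching; the broker address, topic name, and trigger interval are illustrative placeholders, not recommendations.

```python
# Hedged sketch: tumbling event-time window with a watermark in Spark.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("windowed-counts").getOrCreate()

events = (spark.readStream
          .format("kafka")
          .option("kafka.bootstrap.servers", "broker:9092")  # placeholder
          .option("subscribe", "events")                     # placeholder topic
          .load()
          .selectExpr("CAST(value AS STRING) AS body", "timestamp"))

counts = (events
          .withWatermark("timestamp", "10 minutes")      # bound event-time lateness
          .groupBy(F.window("timestamp", "5 minutes"))   # tumbling window
          .count())

(counts.writeStream
       .outputMode("update")
       .format("console")
       .trigger(processingTime="30 seconds")  # micro-batch interval = latency floor
       .start())
```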
Exactly-Once Approaches
# Spark: micro-batch atomicity
# Each micro-batch:
# 1. Read offsets from source (checkpoint)
# 2. Process the batch
# 3. Write outputs (transactional sinks for exactly-once)
# 4. Commit offsets to checkpoint
# Crash mid-batch: replay from last checkpoint, idempotent reprocess
# Limitation: latency floor = batch interval
# Sink support: Kafka, Delta, Iceberg, file sink, foreachBatch
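The foreachBatch route deserves a sketch, since it is how most sinks without native transactions get exactly-once in practice. This is a hedged example, not the only pattern: paths are placeholders, and the txnAppId/txnVersion options are Delta Lake's idempotent-write mechanism.

```python
# Hedged sketch: exactly-once via idempotent per-batch writes in foreachBatch.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("eos-sink").getOrCreate()
stream = spark.readStream.format("rate").load()  # toy source for illustration

def write_batch(batch_df, batch_id: int):
    # A failed micro-batch is replayed with the same batch_id and the same
    # rows, so exactly-once reduces to making this write idempotent per id.
    (batch_df.write
        .format("delta")                    # requires the Delta Lake package
        .mode("append")
        .option("txnAppId", "events_out")   # stable writer identity
        .option("txnVersion", batch_id)     # a replayed batch_id becomes a no-op
        .save("/lake/events_out"))          # placeholder path

(stream.writeStream
    .foreachBatch(write_batch)
    .option("checkpointLocation", "/chk/events_out")  # steps 1 and 4: offsets + commits
    .start())
```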
# Flink: async distributed snapshots
# Periodically:
# 1. JobManager injects checkpoint barrier into every input
# 2. Each operator snapshots its state when barrier passes
# 3. Sinks pre-commit (2PC); commit on checkpoint complete
# 4. JobManager records the global checkpoint as durable
# Crash: restore all operators from latest checkpoint, sinks roll back uncommitted
# Latency floor: per-record (snapshots are async)
# Sink support: Kafka, JDBC w/ XA, files, any 2PC sink
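In code, the recipe above is mostly configuration. A minimal PyFlink sketch, assuming the Kafka connector jar is on the classpath; broker, topic, and interval are placeholders.

```python
# Hedged sketch: exactly-once checkpointing plus a 2PC (transactional) Kafka sink.
from pyflink.common.serialization import SimpleStringSchema
from pyflink.datastream import StreamExecutionEnvironment, CheckpointingMode
from pyflink.datastream.connectors.base import DeliveryGuarantee
from pyflink.datastream.connectors.kafka import (KafkaRecordSerializationSchema,
                                                 KafkaSink)

env = StreamExecutionEnvironment.get_execution_environment()
# Step 1: barriers injected every 10 s; operators snapshot as barriers pass.
env.enable_checkpointing(10_000, CheckpointingMode.EXACTLY_ONCE)

# Steps 3-4: the sink pre-commits a Kafka transaction per checkpoint and
# commits it once the JobManager records the checkpoint as complete.
sink = (KafkaSink.builder()
        .set_bootstrap_servers("broker:9092")   # placeholder
        .set_record_serializer(
            KafkaRecordSerializationSchema.builder()
            .set_topic("out")                   # placeholder
            .set_value_serialization_schema(SimpleStringSchema())
            .build())
        .set_delivery_guarantee(DeliveryGuarantee.EXACTLY_ONCE)
        .set_transactional_id_prefix("out-")    # required for EXACTLY_ONCE
        .build())
```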
Batch Workloads
| Aspect | Spark | Flink |
|---|---|---|
| SQL optimizer | Catalyst (~150 rules) + AQE runtime | Apache Calcite-based, fewer specialized rules |
| Code generation | Tungsten whole-stage codegen | Some codegen, less aggressive than Tungsten |
| Lakehouse ecosystem | Delta Lake, Iceberg, Hudi all native | Iceberg connector solid; Delta/Hudi via 3rd party |
| ML | MLlib mature (RDD API in maintenance; DataFrame-based Spark ML pipelines active) | FlinkML in maintenance mode |
| Adaptive Query Execution | Default-on, dynamic skew + coalesce + join switch | Less mature runtime adaptation |
| Cloud support | EMR, Dataproc, Databricks, Synapse, Glue | Amazon Managed Service for Apache Flink (ex-Kinesis Analytics), Ververica Cloud; fewer managed options |
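The AQE row translates directly into configuration. These are real Spark 3.x settings (AQE is on by default since 3.2); they are spelled out here only to show what "runtime adaptation" means concretely.

```python
# Sketch: the three headline AQE behaviors as explicit configs.
from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .config("spark.sql.adaptive.enabled", "true")                    # re-plan at runtime
         .config("spark.sql.adaptive.coalescePartitions.enabled", "true") # merge tiny shuffle partitions
         .config("spark.sql.adaptive.skewJoin.enabled", "true")           # split skewed join partitions
         .getOrCreate())
```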
State Management Details
# Spark Structured Streaming state stores
spark.sql.streaming.stateStore.providerClass = org.apache.spark.sql.execution.streaming.state.RocksDBStateStoreProvider
# State at each stateful operator partition, checkpointed each batch.
# Recovery: load latest snapshot + replay deltas to current batch.
# Realistic state size: 100s of GB per executor.
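For context, a small sketch of a stateful operator that actually exercises that store: streaming deduplication, where the watermark bounds how long keys are retained. The event_id column is a placeholder derived from a toy source.

```python
# Sketch: watermark-bounded streaming dedup (state lives in the store above).
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("dedup").getOrCreate()
events = (spark.readStream.format("rate").load()     # toy source: timestamp, value
          .withColumnRenamed("value", "event_id"))   # stand-in for a real id

deduped = (events
           .withWatermark("timestamp", "1 hour")     # state eviction horizon
           .dropDuplicates(["event_id", "timestamp"]))
```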
# Flink state backends
state.backend = rocksdb
state.backend.incremental = true
state.backend.changelog.enabled = true # Flink 1.16+
# Async snapshots, only RocksDB SST diffs uploaded.
# Recovery restores operators from the latest snapshot; no per-batch delta replay.
# Realistic state size: 10s of TB per operator.
# Why the gap?
# - Flink built state-first; checkpoints are core, not retrofitted
# - Flink supports incremental checkpoints since 1.3 (2017)
# - Spark only got a RocksDB state store in 3.2 (2021); full incremental checkpoints in 3.5
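To show what this machinery serves, a hedged PyFlink sketch of the keyed-state API from the streaming table: a per-key running count in ValueState, backed by RocksDB when configured as above. Input data here is a toy collection.

```python
# Sketch: per-key count held in keyed ValueState.
from pyflink.common import Types
from pyflink.datastream import StreamExecutionEnvironment
from pyflink.datastream.functions import KeyedProcessFunction, RuntimeContext
from pyflink.datastream.state import ValueStateDescriptor

class CountPerKey(KeyedProcessFunction):
    def open(self, runtime_context: RuntimeContext):
        # Registered state lives in the configured backend (RocksDB here).
        self.count = runtime_context.get_state(
            ValueStateDescriptor("count", Types.LONG()))

    def process_element(self, value, ctx):
        n = (self.count.value() or 0) + 1
        self.count.update(n)
        yield (ctx.get_current_key(), n)

env = StreamExecutionEnvironment.get_execution_environment()
counts = (env.from_collection([("a",), ("b",), ("a",)])  # toy bounded input
             .key_by(lambda e: e[0])
             .process(CountPerKey()))
```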
When Each Wins
Spark wins for
- Analytical batch SQL on data lakes (Delta/Iceberg)
- ETL pipelines with Catalyst-friendly SQL
- ML workflows (MLlib + integrations)
- Streaming where seconds latency is fine and code reuse with batch matters
- Teams already on Databricks / Spark cloud services
Flink wins for
- Sub-100ms latency stream processing
- Complex event-time semantics, custom windows
- Multi-TB state, fast incremental checkpoints
- Many non-Kafka sources/sinks (Kinesis, Pulsar, JDBC XA)
- CEP (complex event processing) and stateful pattern matching
Ecosystem Notes
# Spark
# Cloud-managed: Databricks, EMR, Dataproc, Synapse, Glue, Fabric
# Lakehouse: Delta Lake (Linux Foundation), Iceberg (Apache), Hudi (Apache)
# Notebook integration: Jupyter, Databricks, Zeppelin
# Languages: Scala, Java, Python (PySpark), R (SparkR), SQL
# Key vendors: Databricks, Cloudera, AWS (EMR/Glue)
# Flink
# Cloud-managed: Amazon Managed Service for Apache Flink (ex-Kinesis Data Analytics), Ververica Cloud, Alibaba Cloud
# Companion: Flink CDC (Change Data Capture), Flink ML
# Languages: Java, Scala, Python (PyFlink), SQL
# Key vendors: Ververica (formerly data Artisans), Alibaba, Cloudera
# Beam (portability layer)
# Lets you write once, run on Spark, Flink, Dataflow, Samza
# Tradeoff: feature ceiling = lowest common denominator + translation overhead
# Used by Google Dataflow customers; less common elsewhere
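A minimal Beam wordcount makes the portability claim concrete: the pipeline is runner-agnostic, and the engine is chosen at submit time. File paths here are placeholders.

```python
# Sketch: one Beam pipeline, runnable on Spark, Flink, or Dataflow runners.
import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions

# e.g. PipelineOptions(["--runner=FlinkRunner"]) or ["--runner=SparkRunner"]
opts = PipelineOptions()

with beam.Pipeline(options=opts) as p:
    (p
     | "Read" >> beam.io.ReadFromText("input.txt")       # placeholder path
     | "Split" >> beam.FlatMap(str.split)
     | "Pair" >> beam.Map(lambda w: (w, 1))
     | "Count" >> beam.CombinePerKey(sum)
     | "Format" >> beam.MapTuple(lambda w, c: f"{w}\t{c}")
     | "Write" >> beam.io.WriteToText("counts"))         # placeholder path
```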
Frequently Asked Questions
Why is Spark called 'batch-first' if it has Structured Streaming?
Spark started as a batch RDD engine; streaming was retrofitted on top as micro-batches. The execution model is fundamentally 'run a small batch every interval'. Flink, in contrast, was designed as a streaming dataflow runtime from day one — every operator processes records continuously, and batch is implemented as a special case of bounded streams. This shows up in latency (Flink is ~10x lower) and in features (Flink's event-time, watermarks, and savepoints are deeper).
Which has better state management?
Flink, by a wide margin. Flink's state backends (RocksDB with incremental checkpoints, ChangelogStateBackend) routinely handle multi-TB state per operator with sub-minute checkpoints. Spark Structured Streaming's state stores are more limited — RocksDB support arrived in 3.2 and is still maturing. For state-heavy applications (stream joins with weeks of retention, sessionization at scale), Flink is the standard choice.
Which is better for batch?
Spark, decisively. Catalyst + Tungsten + AQE form the most polished batch-SQL stack in the open-source ecosystem, and the surrounding tooling (Delta Lake, Iceberg, Hudi, Spark UI, broad cloud support) makes Spark the default for analytical batch processing. Flink supports batch via the same DataStream/Table APIs, but the ecosystem and tooling are thinner, and the query optimizer is less mature than Catalyst.
Can the same code run on Spark and Flink?
Largely no. Spark uses Datasets/DataFrames; Flink uses the DataStream and Table APIs (the legacy DataSet API is deprecated in favor of bounded DataStreams). The Apache Beam project provides a portable API, and your job can run on either Spark or Flink as a runner, but Beam adds a translation layer with its own performance characteristics, and most production users pick a runtime and stick with it.
Which has better Kubernetes support?
Both are mature on K8s. Flink has a native operator (Flink Kubernetes Operator) and was designed with long-running jobs in mind, so its K8s integration handles streaming-specific concerns like checkpoint storage and HA out of the box. Spark on K8s is also production-ready (3.1+) but is more oriented toward batch-style job submission. For long-running streaming jobs on K8s, Flink's lifecycle is slightly cleaner; for ad-hoc batch jobs, Spark's pattern is simpler.