
WarpStream

Kafka is dead, long live Kafka

WarpStream is an Apache Kafka protocol-compatible data streaming platform built directly on top of S3. It's delivered as a single, stateless Go binary, so there are no local disks to manage, no brokers to rebalance, and no ZooKeeper to operate. WarpStream is 5-10x cheaper than Kafka in the cloud because data streams directly to and from S3 instead of using inter-zone networking, which can be over 80% of the infrastructure cost of a Kafka deployment at scale.
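The cost claim rests on how much data a conventional Kafka cluster moves between zones. Below is a back-of-the-envelope model of that traffic. It is a sketch under our own assumptions, not WarpStream's: three zones with one replica of each partition per zone (replication factor 3), producers and consumers spread evenly so a client reaches a leader in another zone two-thirds of the time, consumers reading only from leaders, and a placeholder price of $0.02 per GB crossing a zone boundary. The function names and the price are illustrative.

```python
# Back-of-the-envelope model of cross-zone bytes in a classic Kafka cluster.
# Assumptions (illustrative, not measured): one replica per zone, clients spread
# evenly across zones, consumers fetch only from partition leaders.


def cross_zone_gb_per_gb_produced(zones: int = 3, replication_factor: int = 3,
                                  consumer_groups: int = 1) -> float:
    """GB crossing zone boundaries for each 1 GB written by producers."""
    # A producer lands on a leader in a different zone (zones - 1) / zones of the time.
    produce = (zones - 1) / zones
    # The leader ships a copy to each follower; with one replica per zone,
    # every follower copy crosses a zone boundary.
    replicate = replication_factor - 1
    # Each consumer group reads the data once, from a leader that is in
    # another zone (zones - 1) / zones of the time.
    consume = consumer_groups * (zones - 1) / zones
    return produce + replicate + consume


def monthly_cross_zone_cost(gb_per_day: float, price_per_gb: float = 0.02,
                            **topology) -> float:
    """Estimated monthly inter-zone transfer bill in dollars (30-day month).

    price_per_gb is a placeholder assumption, not a quoted provider rate.
    """
    return gb_per_day * 30 * cross_zone_gb_per_gb_produced(**topology) * price_per_gb


if __name__ == "__main__":
    ratio = cross_zone_gb_per_gb_produced()
    print(f"cross-zone GB per GB produced: {ratio:.2f}")  # 3.33
    # Hypothetical workload: 10,000 GB produced per day, one consumer group.
    print(f"monthly transfer cost: ${monthly_cross_zone_cost(10_000):,.0f}")  # $20,000
```

In this model the broker-to-broker replication term alone accounts for 2 of the roughly 3.33 GB that cross zones per GB produced, which is the traffic a design that writes straight to S3 avoids. The real share of a bill depends on the workload and the provider's actual rates.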

WarpStream Tableflow

It reads from Kafka, builds Iceberg tables, and keeps them compacted

Tableflow automates all of the annoying parts of generating and maintaining Iceberg tables:

  1. It auto-scales.
  2. It integrates with schema registries or lets you declare the schemas inline.
  3. It has a DLQ.
  4. It handles upserts.
  5. It enforces retention policies.
  6. It can perform stateless transformations as records are ingested.
  7. It keeps the table compacted, and it does so continuously and incrementally without having to run a giant major compaction at regular intervals.
  8. It cleans up old snapshots automatically.
  9. It detects and cleans up orphaned files that were created as part of failed inserts or compactions.
  10. It can ingest data at massive rates (GiB/s) while also maintaining strict (and configurable) freshness guarantees.
  11. It speaks multiple table formats (yes, Delta Lake too).
  12. It works exactly the same in every cloud.
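Inline schemas and a dead-letter queue (DLQ) work together: records that do not match the declared schema are set aside instead of failing the whole ingestion. The sketch below illustrates that routing under our own assumptions; the dict-based schema format, the JSON encoding, and the `route` function are hypothetical and are not Tableflow's configuration or API.

```python
import json

# Hypothetical inline schema: field name -> required Python type.
SCHEMA = {"user_id": str, "amount_cents": int}


def route(records: list[tuple[int, bytes]], schema: dict[str, type]):
    """Split (offset, value) pairs into valid table rows and DLQ entries."""
    rows: list[dict] = []
    dlq: list[dict] = []
    for offset, raw in records:
        try:
            rec = json.loads(raw)
        except ValueError as exc:  # JSONDecodeError and UnicodeDecodeError both subclass ValueError
            dlq.append({"offset": offset, "reason": f"undecodable: {exc}"})
            continue
        if not isinstance(rec, dict):
            dlq.append({"offset": offset, "reason": "not a JSON object"})
            continue
        problem = None
        for name, expected in schema.items():
            if name not in rec:
                problem = f"missing field {name!r}"
                break
            # Exact type check so that JSON true/false is not accepted as an int.
            if type(rec[name]) is not expected:
                problem = f"field {name!r} is not {expected.__name__}"
                break
        if problem:
            dlq.append({"offset": offset, "reason": problem})
        else:
            # Project onto the schema's fields; extra fields are dropped.
            rows.append({name: rec[name] for name in schema})
    return rows, dlq


if __name__ == "__main__":
    batch = [
        (0, b'{"user_id": "u1", "amount_cents": 499}'),
        (1, b'{"user_id": "u2"}'),
        (2, b"not json"),
        (3, b'{"user_id": "u3", "amount_cents": true}'),
    ]
    rows, dlq = route(batch, SCHEMA)
    print(rows)                          # [{'user_id': 'u1', 'amount_cents': 499}]
    print([d["offset"] for d in dlq])    # [1, 2, 3]
```

Recording the source offset and a reason with each DLQ entry is what makes rejected records traceable back to the topic they came from.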
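Upserts turn a stream of keyed records into a table with one row per key. The sketch below assumes one common set of semantics, which may differ from Tableflow's: the record with the highest offset wins for each key, and a null value (a Kafka tombstone) deletes the row. It keeps the table in an in-memory dict; a real Iceberg or Delta table would express the same outcome with delete files or file rewrites. `Record` and `apply_upserts` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Record:
    key: str
    offset: int              # Kafka offset within a single partition
    value: Optional[dict]    # None models a tombstone (delete)


def apply_upserts(table: dict[str, dict], batch: list[Record]) -> dict[str, dict]:
    """Return a new table state with the batch applied as upserts.

    Assumes all records for a given key live in one partition (true for Kafka's
    default key-hash partitioning while the partition count is unchanged), so
    comparing offsets gives the write order for that key.
    """
    # Keep only the newest record per key, regardless of arrival order.
    latest: dict[str, Record] = {}
    for rec in batch:
        cur = latest.get(rec.key)
        if cur is None or rec.offset > cur.offset:
            latest[rec.key] = rec

    result = dict(table)
    for key, rec in latest.items():
        if rec.value is None:
            result.pop(key, None)    # tombstone: delete the row if present
        else:
            result[key] = rec.value  # insert or overwrite the row
    return result


if __name__ == "__main__":
    table = {"u1": {"plan": "free"}}
    batch = [
        Record("u1", 10, {"plan": "pro"}),
        Record("u2", 11, {"plan": "free"}),
        Record("u1", 9, {"plan": "team"}),   # older offset, loses to offset 10
        Record("u2", 12, None),              # tombstone deletes u2
    ]
    print(apply_upserts(table, batch))  # {'u1': {'plan': 'pro'}}
```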
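Continuous, incremental compaction can be thought of as running a small, bounded planning step over and over instead of one large rewrite. The planner below is a hypothetical illustration of that idea, not Tableflow's algorithm: it treats files under 75% of a 512 MiB target as "small", packs them per partition with first-fit decreasing bin packing, and caps the number of rewrite groups per round. The thresholds and names are assumptions.

```python
from collections import defaultdict
from dataclasses import dataclass

MiB = 1024 * 1024


@dataclass(frozen=True)
class DataFile:
    path: str
    partition: str
    size_bytes: int


def plan_compaction(files: list[DataFile],
                    target_bytes: int = 512 * MiB,
                    small_fraction: float = 0.75,
                    max_groups: int = 8) -> list[list[DataFile]]:
    """Plan one bounded round of compaction.

    Only files smaller than small_fraction * target_bytes are candidates. They
    are packed per partition, largest first, into groups whose total size stays
    at or under target_bytes. At most max_groups groups are returned, so each
    round does a bounded amount of work instead of rewriting the whole table.
    """
    small_limit = small_fraction * target_bytes
    by_partition: dict[str, list[DataFile]] = defaultdict(list)
    for f in files:
        if f.size_bytes < small_limit:
            by_partition[f.partition].append(f)

    groups: list[list[DataFile]] = []
    for partition in sorted(by_partition):
        # First-fit decreasing: place each file into the first group with room.
        bins: list[list[DataFile]] = []
        totals: list[int] = []
        for f in sorted(by_partition[partition], key=lambda d: d.size_bytes, reverse=True):
            for i, total in enumerate(totals):
                if total + f.size_bytes <= target_bytes:
                    bins[i].append(f)
                    totals[i] += f.size_bytes
                    break
            else:
                bins.append([f])
                totals.append(f.size_bytes)
        # A group of one file would just rewrite it unchanged, so skip those.
        groups.extend(b for b in bins if len(b) > 1)

    return groups[:max_groups]
```

Running a planner like this every few seconds or minutes keeps the small-file count low at all times, which is the contrast the list draws with periodic major compaction.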
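Sustaining high throughput while meeting a freshness target usually comes down to when buffered data is committed: waiting longer produces bigger, cheaper files, while committing sooner keeps the table fresh. The `CommitBuffer` class below is a hypothetical sketch of a size-or-time trigger that captures this trade-off. It is not Tableflow's documented mechanism, and `max_bytes` and `max_latency_s` are illustrative parameters.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class CommitBuffer:
    """Decides when buffered records should be committed to the table.

    Commits when the buffer is large enough to make a well-sized file
    (throughput) or the oldest buffered record has waited max_latency_s
    (freshness), whichever comes first.
    """
    max_bytes: int
    max_latency_s: float
    _bytes: int = field(default=0, init=False)
    _oldest_ts: Optional[float] = field(default=None, init=False)

    def add(self, size_bytes: int, now: float) -> None:
        # Remember when the first record of this batch arrived.
        if self._oldest_ts is None:
            self._oldest_ts = now
        self._bytes += size_bytes

    def should_commit(self, now: float) -> bool:
        if self._oldest_ts is None:
            return False  # nothing buffered
        return (self._bytes >= self.max_bytes
                or now - self._oldest_ts >= self.max_latency_s)

    def reset(self) -> None:
        # Call after a successful commit to start a new batch.
        self._bytes = 0
        self._oldest_ts = None
```

At high ingest rates the size limit fires first and files come out near the target size; at low rates the latency limit bounds how long a record waits in the buffer, provided `should_commit` is checked frequently.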

The Case for an Iceberg-Native Database: Why Spark Jobs and Zero-Copy Kafka Won’t Cut It

Confluent acquired WarpStream for $220 million in a deal completed on September 9, 2024. The acquisition was reportedly paid in cash and stock, and it brings WarpStream's technology and talent to Confluent's data streaming platform.