WarpStream
Kafka is dead, long live Kafka
WarpStream is an Apache Kafka protocol-compatible data streaming platform built directly on top of S3. It's delivered as a single, stateless Go binary, so there are no local disks to manage, no brokers to rebalance, and no ZooKeeper to operate. WarpStream claims to be 5-10x cheaper than Kafka in the cloud because data streams directly to and from S3 instead of crossing availability zones; inter-zone networking can account for over 80% of the infrastructure cost of a Kafka deployment at scale.
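To make the inter-zone networking argument concrete, here is a rough back-of-the-envelope sketch. It is not WarpStream's own cost model. Everything in it is an assumption for illustration: the function names, a cluster spread evenly over 3 zones with replication factor 3, clients placed uniformly at random relative to partition leaders, each follower replica living in a different zone than its leader, and an effective cross-zone price of $0.02 per GiB (sender plus receiver side).

```python
# Back-of-the-envelope estimate of cross-zone traffic for a classic,
# replicated Kafka deployment. All defaults are illustrative assumptions.

def cross_zone_gib_per_gib_written(zones=3, replication_factor=3,
                                   consumer_groups=1, follower_fetching=False):
    """GiB that cross a zone boundary for every GiB produced."""
    # Chance that a randomly placed client is NOT in the leader's zone.
    remote = (zones - 1) / zones
    produce = remote
    # Assumption: every follower replica sits in a different zone than the leader.
    replicate = replication_factor - 1
    # With follower fetching, consumers can read from a replica in their own zone.
    consume = 0.0 if follower_fetching else consumer_groups * remote
    return produce + replicate + consume


def monthly_cross_zone_cost(write_mib_per_s, price_per_gib=0.02, **topology):
    """Estimated cross-zone networking bill for a 30-day month, in dollars."""
    seconds_per_month = 30 * 24 * 3600
    gib_written = write_mib_per_s * seconds_per_month / 1024
    return gib_written * cross_zone_gib_per_gib_written(**topology) * price_per_gib
```

Under these assumptions, `monthly_cross_zone_cost(100)` comes out to roughly $16,900 per month for a sustained 100 MiB/s of writes. The figure scales linearly with throughput, which is the cost that an architecture writing directly to object storage is meant to avoid.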
WarpStream Tableflow
It reads from Kafka, builds Iceberg tables, and keeps them compacted
Tableflow automates all of the annoying parts of generating and maintaining Iceberg tables:
- It auto-scales.
- It integrates with schema registries or lets you declare the schemas inline.
- It has a dead-letter queue (DLQ).
- It handles upserts.
- It enforces retention policies.
- It can perform stateless transformations as records are ingested.
- It keeps the table compacted, and it does so continuously and incrementally without having to run a giant major compaction at regular intervals.
- It cleans up old snapshots automatically.
- It detects and cleans up orphaned files that were created as part of failed inserts or compactions.
- It can ingest data at massive rates (GiB/s) while also maintaining strict (and configurable) freshness guarantees.
- It speaks multiple table formats (yes, Delta Lake too).
- It works exactly the same in every cloud.
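Three of the items above (stateless transformations, the DLQ, and upserts) can be illustrated with a toy in-memory sketch. This is not Tableflow's API or implementation. The names `ingest` and `to_row`, the JSON record format, and the use of a plain dict as the "table" are all hypothetical, chosen only to show the semantics.

```python
import json

# Toy illustration of ingest semantics: transform each record, route bad
# records to a dead-letter queue, and upsert good ones by key.
# Hypothetical names throughout; a dict stands in for the table.

def to_row(record):
    """Stateless transformation: depends only on the record itself."""
    return {"id": record["id"], "email": record["email"].lower()}


def ingest(raw_records, table, transform=to_row, key_field="id"):
    """Apply `transform` to each raw JSON record and upsert it into `table`.

    Returns the dead-letter queue: (raw_record, error) pairs for records
    that could not be parsed or transformed.
    """
    dlq = []
    for raw in raw_records:
        try:
            row = transform(json.loads(raw))
            key = row[key_field]
        except (ValueError, KeyError, TypeError) as err:
            # Malformed JSON or a missing field: set the record aside
            # instead of failing the whole batch.
            dlq.append((raw, repr(err)))
            continue
        table[key] = row  # upsert: insert new key or overwrite existing row
    return dlq
```

Feeding it two valid records that share an `id`, plus one malformed line, leaves a single row in `table` (the later record wins) and one entry in the returned DLQ.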
Links
Confluent acquired WarpStream for a reported $220 million in cash and stock, in a deal completed on September 9, 2024. The acquisition brought WarpStream's technology and team into Confluent's data streaming platform.
- WarpStream - An Apache Kafka Compatible Data Streaming Platform
- Intro to WarpStream in 5 Minutes - YouTube
- How WarpStream reinvented Kafka (and soared to a $220m exit in only 13 months) - YouTube
- What The Heck is WarpStream? | HackerNoon
- Confluent acquires WarpStream | Confluent
- WarpStream is Dead, Long Live WarpStream