Migration / Mirroring / Replication

20240425-EB-Migrating_From_Kafka_To_Confluent.pdf

Cross-Cluster Data Mirroring

  • Multicluster architecture
    • Hub-and-spoke architecture
    • Active-active architecture
    • Active-standby architecture
    • Stretch clusters
  • MirrorMaker 1 and MirrorMaker 2
  • Other cross-cluster mirroring solutions

Comparisons

Confluent Replicator vs MirrorMaker 2.0

| Feature | MirrorMaker 2.0 (MM2) | Confluent Replicator |
| --- | --- | --- |
| Core Engine | Kafka Connect (source / checkpoint / heartbeat connectors) | Kafka Connect (proprietary optimized engine) |
| Offset Management | Maps offsets via checkpoint topics; requires client-side logic to translate. | Automatic offset translation; consumers can switch clusters without manual mapping. |
| Topic Sync | Syncs data and some metadata; requires manual tuning for complex configs. | Dynamic topic sync; automatically mirrors partitions, ACLs, and retention policies. |
| Loop Prevention | Uses topic renaming (e.g., source.topic) or internal heartbeats. | Uses provenance headers; allows active-active without changing topic names. |
| Cost | Open source (Apache 2.0) | Commercial (requires Confluent Enterprise license) |
| Consumer Group Migration | Only syncs offsets for the group; doesn't manage group state well. | Automated group failover; syncs both offsets and group metadata. |
| Schema Registry Integration | Requires manual setup of Schema Linking or a separate sync. | Built-in schema validation and automatic schema migration between registries. |
| Active-Active Topic Naming | Topic renaming required: topics must be prefixed (e.g., A.topic1, B.topic1). | Identical topic names: uses provenance headers to keep names identical across sites. |
| Security (ACLs) | Does not natively sync ACLs; requires external automation. | Native ACL synchronization: syncs access rights along with the data. |
| Resource Efficiency | Higher overhead; runs as a collection of multiple connectors. | High-efficiency engine: single process optimized for low-latency batching. |
| Header Manipulation | Basic support for header copying. | Advanced header handling: preserves or modifies metadata for routing logic. |

Architectural Differences

MirrorMaker 2.0

MM2 is built as a set of three Kafka Connect connectors (source, checkpoint, and heartbeat). It relies on a "remote topic" naming convention (e.g., us-east.orders) to prevent circular replication loops. While powerful, it often requires additional RemoteClusterUtils code in your applications to find the correct offsets when failing over.
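The failover-time lookup that RemoteClusterUtils performs can be sketched in plain Python. This is a toy model of checkpoint-based translation, not the real Connect API; the group name, offsets, and cluster alias are made up for illustration:

```python
# Sketch of MM2-style offset translation during failover (illustrative data,
# not the real Kafka Connect / RemoteClusterUtils API).

# MM2's checkpoint connector periodically emits mappings of
# (consumer group, source topic, partition) -> equivalent offset on the target.
checkpoints = {
    ("payments-app", "orders", 0): 1520,
    ("payments-app", "orders", 1): 980,
}

def translate_offsets(group, source_topic, source_alias):
    """Return the offsets a failed-over consumer should seek to on the
    target cluster, keyed by the renamed remote topic."""
    remote_topic = f"{source_alias}.{source_topic}"  # MM2 remote-topic convention
    return {
        (remote_topic, partition): offset
        for (g, topic, partition), offset in checkpoints.items()
        if g == group and topic == source_topic
    }

print(translate_offsets("payments-app", "orders", "us-east"))
```

The point of the sketch is that the application, not the platform, has to perform this translation and then seek each consumer to the resulting offsets.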

Confluent Replicator

Replicator is designed to make the destination cluster look exactly like the source. It preserves original topic names, making it easier for applications to switch clusters during a DR event. It is particularly strong in its ability to handle consumer group migration automatically.

Data Routing & Loop Prevention (Active-Active)

In a global setup where Cluster A and Cluster B both produce and consume data, preventing "infinite loops" (where A replicates to B, then B replicates that same data back to A) is critical.

MirrorMaker 2.0

It uses a renaming strategy. Data from Cluster A is written to A.orders on Cluster B. Because the name is different, MM2 knows not to pull A.orders back to Cluster A. However, this forces applications to be "cluster-aware" and consume from multiple topics.
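The burden this places on applications is easy to see in a small sketch; the cluster aliases and topic names here are hypothetical:

```python
# With MM2's renaming strategy, an application on one cluster that wants the
# complete global stream must subscribe to its local topic plus one renamed
# remote topic per peer cluster. Aliases here are hypothetical.

def full_subscription(topic, local_alias, all_aliases):
    """Topics a consumer on `local_alias` must read to see every record."""
    return [topic] + [f"{alias}.{topic}" for alias in all_aliases
                      if alias != local_alias]

# A consumer on cluster B, in a three-cluster mesh, must read three topics:
print(full_subscription("orders", "B", ["A", "B", "C"]))
# ['orders', 'A.orders', 'C.orders']
```

Every new peer cluster adds another topic to every consumer's subscription list, which is exactly the "cluster-aware" coupling the renaming strategy forces.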

Confluent Replicator

It uses Provenance Headers. A small piece of metadata is attached to the message identifying its origin. Replicator checks this header and simply ignores any message that originated from the target cluster. This allows for identical topic names across the globe, making client configuration much simpler.
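The filtering logic amounts to a one-line check per record. This is a simplified model; Replicator's actual header names and formats differ:

```python
# Sketch of provenance-header loop prevention (simplified; the real
# Replicator header names and encodings are different).

def should_replicate(record_headers, target_cluster):
    """Skip records that originated on the cluster we are replicating to.
    This is what breaks A -> B -> A loops without renaming topics."""
    return record_headers.get("origin-cluster") != target_cluster

# A record produced on cluster B and mirrored to A must not flow back to B:
rec = {"origin-cluster": "B"}
print(should_replicate(rec, target_cluster="B"))  # False: dropped, loop avoided
print(should_replicate(rec, target_cluster="C"))  # True: safe to replicate
```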

Exactly-Once Semantics (EOS)

MirrorMaker 2.0

Supports "at-least-once" delivery by default. Achieving exactly-once requires specific Connect configurations and can be complex to tune.

Confluent Replicator

Specifically optimized to work with Kafka’s transactional API, ensuring that even if a replication task fails and restarts, the data is not duplicated on the destination cluster.
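Why restart behavior matters can be shown with a toy model: an at-least-once mirror that resumes from its last *committed* offset after a crash re-delivers everything it had read but not yet committed. The record values and crash point below are illustrative:

```python
# Toy model of at-least-once mirroring: progress is committed only at the end
# of a clean run, so a crash mid-batch causes re-delivery on restart.

def mirror(source, committed_offset, crash_after=None):
    """Copy records starting at committed_offset; optionally crash mid-batch."""
    out = []
    for i, rec in enumerate(source[committed_offset:]):
        out.append(rec)
        if crash_after is not None and i + 1 == crash_after:
            return out, committed_offset      # crash before committing progress
    return out, len(source)                   # commit only on clean completion

source = ["r0", "r1", "r2", "r3"]
first, committed = mirror(source, 0, crash_after=3)  # crash: 3 sent, offset still 0
second, _ = mirror(source, committed)                # restart re-sends r0..r2
print(first + second)  # ['r0', 'r1', 'r2', 'r0', 'r1', 'r2', 'r3']
```

Transactional (exactly-once) replication closes this gap by committing the copied records and the source offset atomically, so a restart never replays records that were already delivered.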

Multi-Region Clusters vs Confluent Replicator

Confluent offers two approaches to multi-datacenter replication: Multi-Region Clusters and Confluent Replicator. The main difference between these two approaches is that a Multi-Region Cluster is a single cluster spanning multiple datacenters, whereas Replicator operates on two separate clusters. Multi-Region Clusters are built into Confluent Server, so they can greatly simplify setup and management of replication flows, particularly in failover/failback scenarios. For a Multi-Region Cluster to operate reliably, the datacenters must be physically close to each other, with a stable, low-latency, and high-bandwidth connection.

Confluent provides a proven, feature-rich solution for multi-cluster scenarios with Confluent Replicator, a Kafka Connect connector that provides a high-performance and resilient way to copy topic data between clusters. You can deploy Replicator into existing Connect clusters, launch it as a standalone process, or deploy it on Kubernetes using Confluent Operator. Ultimately, the best approach depends on your individual use cases and resilience requirements. For more information about using Multi-Region Clusters with Confluent Server, see David Arthur's excellent post Multi-Region Clusters with Confluent Platform 5.4.

Confluent Replicator vs Cluster Linking

The most important difference: Cluster Linking is a broker-level, real-time, byte-for-byte replication solution that is simpler and more efficient for supported scenarios, while Confluent Replicator is a Kafka Connect-based tool offering more flexibility, compatibility, and transformation options.

Key Differences

| Feature/Aspect | Cluster Linking | Confluent Replicator |
| --- | --- | --- |
| Mechanism | Broker-level, native Kafka protocol | Kafka Connect source connector |
| Setup | No Connect cluster needed; built into brokers | Requires Kafka Connect cluster |
| Data Fidelity | Byte-for-byte, globally consistent offsets | Topic-level; may not preserve all metadata |
| Latency | Extremely low, near real-time | Higher; depends on Connect and network |
| Topic Access | Mirror topics are read-only during replication | Full read/write access on destination topics |
| Schema Replication | No (use Schema Linking separately) | Yes; can migrate schemas |
| Consumer Offsets | Preserved, globally consistent | Can replicate, but not always exact |
| Transformations | Not supported | Supports filtering, renaming, transformations |
| Source Compatibility | Confluent Platform 6.0+ only | Any Kafka (self-managed, MSK, etc.) |
| Use with Private Networking | Limited | Supported (can run in same VPC) |
| Managed Service Availability | Yes (Confluent Cloud) | No (must self-manage Connect) |
| Active-Active Support | Yes (with caveats on ordering) | Yes (bi-directional setup) |
| ACL/Config Replication | Yes (Cloud-to-Cloud only, not CP/MSK) | No |
| Monitoring | Basic (metrics API) | Advanced (via Connect/Control Center) |

Limitations and Considerations

  • Cluster Linking does not support all message formats (e.g., v0/v1), and will fail for those topics. Replicator can be used in such cases.
  • Replicator does not copy ACLs or RBAC settings.
  • Schema replication with Cluster Linking requires Schema Linking as a separate step.
  • Replicator may replicate aborted transactions unless configured with isolation.level=read_committed, but even then, transactional markers are not preserved.
  • Cluster Linking is generally easier to manage and monitor, but Replicator offers more advanced monitoring and transformation options.

Migration

Tools

GitHub - confluentinc/kafka-metrics-extractor

kafka-metrics-extractor is a tool designed to pull raw usage data from managed Kafka providers such as MSK, OSK, and others (currently it supports MSK clusters only). The MSK extraction script uses MSK permissions only to list and describe the clusters, then collects the usage data from CloudWatch and Cost Explorer, so it causes no disruption to the cluster itself.

GitHub - confluentinc/kcp (Kafka Copy Paste)

  • Simplify and streamline your Kafka migration journey to Confluent Cloud!
  • kcp helps you migrate your Kafka setups to Confluent Cloud by providing tools to:
    • Scan and identify resources in existing Kafka deployments.
    • Create reports for migration planning and cost analysis.
    • Generate migration assets and infrastructure configurations.
kcp discover --region ap-south-1

kcp scan clusters --state-file state.json --credentials-file credentials.yaml

kcp create-asset migration-infrastructure

Scaling

Is there anyway to activate auto scaling or some form of auto scaling with Strimzi? · strimzi · Discussion #6635 · GitHub

Auto-scaling Kafka is complicated; it usually cannot be done based on CPU utilization alone.

  • If you want to scale consumers, you need to understand their consumer group membership and which topics they are consuming, because the maximum useful number of replicas is limited by, for example, the number of partitions they consume from. To autoscale them you need tools such as KEDA, which have the additional logic to take these things into account.
  • If you want to auto-scale components such as Connect, connectors, or the Bridge, Strimzi exposes the scale subresource so you can plug them into the Kubernetes HPA or tools like KEDA. These are basically consumers and producers in a special packaging, so the same rules described above apply to them.
  • For Kafka brokers, auto-scaling is complicated by their architecture. Adding or removing brokers is simple, but directing load to them is not, because brokers are in effect a form of data storage, and moving whole partitions between brokers is expensive. Partitions often contain huge amounts of data that must be shifted from one broker to another; that takes time, imposes a performance penalty on other traffic, and can even cost real money in data-transfer fees. And it still might not help: if your bottleneck is, for example, a topic with 5 partitions, it might not matter whether you have 5 or 10 brokers. So from my experience, auto-scaling Kafka brokers only rarely makes sense.
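The partition-count ceiling on consumer scaling can be written down directly; an autoscaler such as KEDA's Kafka scaler effectively applies this clamp (topic names and counts below are illustrative):

```python
# A consumer group gains nothing from having more members than the total
# number of partitions it reads: the extra members sit idle. Any consumer
# autoscaler must clamp its target accordingly. Numbers are illustrative.

def effective_replicas(desired, partitions_per_topic):
    """Clamp the desired replica count to the total partitions consumed."""
    return min(desired, sum(partitions_per_topic.values()))

# Scaling to 12 consumers over topics with 5 + 3 partitions wastes 4 of them:
print(effective_replicas(12, {"orders": 5, "payments": 3}))  # 8
```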

Others