Confluent Platform Documentation
Manage Self-Balancing Clusters
Self-Balancing Clusters is a Confluent Platform feature designed to simplify management of an Apache Kafka® cluster. With this feature enabled, a cluster automatically rebalances partitions across brokers when new brokers are added or existing brokers are removed. This keeps data evenly distributed across the cluster, which can improve performance and reduce the risk of data loss in the event of a broker failure.
Self-Balancing offers:
- Fully automated load balancing.
- Dynamic enablement: you can turn it on or off while the cluster is running, and configure it to rebalance when brokers are added or removed, or for any uneven load (anytime).
- Automatic monitoring of clusters for imbalances based on a large set of parameters, configurations, and runtime variables.
- Continuous metrics aggregation and rebalancing plans, generated near-instantaneously in most cases and executed automatically.
- Automatic triggering of rebalance operations based on simple configurations you set in Control Center for Confluent Platform or in Kafka server.properties files. You can choose to auto-balance only when brokers are added, or anytime, which rebalances for any uneven load.
- At-a-glance visibility into the state of your clusters, and the strategy and progress of auto-balancing, through a few key metrics.
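As a sketch of the configuration described above, enabling Self-Balancing and choosing its rebalance trigger comes down to two broker settings in server.properties (the values shown are illustrative, not the only valid choices):

```properties
# Enable Self-Balancing Clusters on this broker.
confluent.balancer.enable=true

# When to rebalance: EMPTY_BROKER rebalances only when brokers are
# added or removed; ANY_UNEVEN_LOAD also rebalances for any uneven load.
confluent.balancer.heal.uneven.load.trigger=ANY_UNEVEN_LOAD
```

The same two choices can be made from Control Center without a broker restart.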
Multi-Region Clusters on Confluent Platform
Overview
Confluent Server is often run across availability zones or nearby datacenters. If the network between brokers across those availability zones or datacenters is dissimilar in terms of reliability, latency, bandwidth, or cost, the result can be higher latency, lower throughput, and increased cost to produce and consume messages.
To mitigate this, three distinct pieces of functionality were added to Confluent Server:
- Follower-Fetching - Before the introduction of this feature, all consume and produce operations took place on the leader. With Multi-Region Clusters, clients are allowed to consume from followers. This dramatically reduces the amount of cross-datacenter traffic between clients and brokers.
- Observers - Historically, there were two types of replicas: leaders and followers. Multi-Region Clusters introduces a third type of replica, observers. By default, observers do not join the in-sync replicas (ISR) but try to keep up with the leader just like a follower. With follower-fetching, clients can also consume from observers.
- Automatic observer promotion - Automatic observer promotion is the process whereby an observer is promoted into the in-sync replicas list (ISR). This can be advantageous in certain degraded scenarios. For instance, if enough brokers have failed that a given partition falls below its minimum in-sync replicas constraint, that partition would normally go offline. With automatic observer promotion, one or more observers can take the place of followers in the ISR, keeping the partition online until the followers are restored. Once the followers are restored (caught up and rejoined the ISR), the observers are automatically demoted from the ISR.
- Replica Placement - Replica placement defines how to assign replicas to the partitions in a topic. This feature relies on the broker.rack property configured for each broker. For example, you can create a topic that uses observers with the --replica-placement flag on kafka-topics, which configures the internal property confluent.placement.constraints.
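As an illustrative sketch of the flag described above (the topic name, rack names, partition count, and broker address are hypothetical), a placement constraint file and the kafka-topics invocation might look like:

```shell
# Hypothetical placement: 2 sync replicas in rack "east",
# 2 observers in rack "west" (racks come from each broker's broker.rack).
cat > placement.json <<'EOF'
{
  "version": 1,
  "replicas": [
    { "count": 2, "constraints": { "rack": "east" } }
  ],
  "observers": [
    { "count": 2, "constraints": { "rack": "west" } }
  ]
}
EOF

kafka-topics --create --topic payments \
  --partitions 6 \
  --bootstrap-server east-broker-1:9092 \
  --replica-placement placement.json
```

The JSON is what ends up stored as confluent.placement.constraints on the topic.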
Tutorial: Multi-Region Clusters on Confluent Platform
Multi-Region Clusters allow customers to run a single Apache Kafka® cluster across multiple datacenters. Often referred to as a stretch cluster, a Multi-Region Cluster replicates data between datacenters across regional availability zones. You can choose, on a per-topic basis, whether to replicate data synchronously or asynchronously. This provides strong durability guarantees and makes disaster recovery (DR) much easier.
Benefits:
- Supports multi-site deployments of synchronous and asynchronous replication between datacenters
- Consumers can leverage data locality for reading Kafka data, which means better performance and lower cost
- Ordering of Kafka messages is preserved across datacenters
- Consumer offsets are preserved
- In event of a disaster in a datacenter, new leaders are automatically elected in the other datacenter for the topics configured for synchronous replication, and applications proceed without interruption, achieving very low RTOs and RPO=0 for those topics.
Kafka Stretch Cluster
A Kafka stretch cluster is a single logical cluster where brokers and consensus nodes (ZooKeeper or KRaft) are physically distributed across multiple data centers or availability zones.
Unlike multi-cluster setups that use asynchronous mirroring, a stretch cluster relies on synchronous replication to ensure data is written to multiple locations before acknowledging success to the producer.
Core Architecture & Mechanics
- Logical Unity: It is managed as one cluster, meaning clients (producers/consumers) are generally unaware of the physical distribution.
- Rack Awareness: Kafka uses the broker.rack configuration to identify each broker's physical location (e.g., DC1, DC2, DC3), ensuring partition replicas are spread across different sites.
- Quorum Requirement: To maintain stability and prevent "split-brain" scenarios, a stretch cluster typically requires three data centers (or two plus a lightweight "tie-breaker" site) so a majority of consensus nodes can always form a quorum if one site fails.
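The quorum requirement above is just majority arithmetic: after losing every consensus node in one site, a strict majority must survive. A minimal sketch (the per-DC controller counts are hypothetical):

```python
def quorum_survives(controllers_per_dc: dict[str, int], failed_dc: str) -> bool:
    """Return True if a strict majority of consensus nodes remains
    after losing every node in failed_dc."""
    total = sum(controllers_per_dc.values())
    surviving = total - controllers_per_dc[failed_dc]
    return surviving > total // 2

# Three DCs, one controller each: any single-site failure leaves 2 of 3.
three_dc = {"dc1": 1, "dc2": 1, "dc3": 1}
print(all(quorum_survives(three_dc, dc) for dc in three_dc))  # True

# Two DCs with 2 and 1 controllers: losing the larger site loses quorum.
two_dc = {"dc1": 2, "dc2": 1}
print(quorum_survives(two_dc, "dc1"))  # False
print(quorum_survives(two_dc, "dc2"))  # True
```

This is why the "tie-breaker" third site matters: with only two sites, some single-site failure always takes the majority with it.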
Critical Configurations
To achieve the primary goal of zero data loss (RPO=0), specific settings are required:
- acks=all: Producers must wait for all in-sync replicas to acknowledge the write.
- min.insync.replicas: Usually set to 2 in a 3-replica setup, ensuring that if one DC fails, the remaining two can still accept writes.
- Latency Limits: Because of synchronous writes, the network round-trip time (RTT) between sites directly impacts throughput. It is generally recommended to keep RTT under 50-100 ms.
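As a minimal sketch of the settings above (values illustrative): acks is a producer-side setting, while min.insync.replicas is set on the broker or per topic:

```properties
# producer.properties: do not acknowledge a send until every
# in-sync replica has the record.
acks=all

# broker or topic config: with replication.factor=3, at least two
# replicas must be in sync for a write to succeed, so writes continue
# after a single-DC failure but never land in only one site.
min.insync.replicas=2
```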
Pros and Cons
| Feature | Benefit / Trade-off |
|---|---|
| Data Durability | RPO=0: No data loss during a single DC failure. |
| Availability | Near-zero RTO: Automatic failover with no manual intervention. |
| Complexity | High: Extremely difficult to monitor and tune for network flakiness. |
| Performance | Lower Throughput: Write latency is tethered to the slowest network link. |
| Cost | High: Requires expensive, low-latency inter-DC connectivity and redundant capacity. |
Kafka Stretch Clusters. Introduction | by Sonam Vermani | Medium
Multi-Data Center Architectures on Confluent Platform
- Multi-data center
- Multi-Cluster replication
- 2-Cluster Active-Passive
- A 2-cluster active-passive architecture involves two fully operational data centers, each running a separate Confluent cluster, where one cluster is the "active" cluster serving all produce and consume requests and the second "passive" cluster is a copy of the "active" cluster with no applications running against it. When the "active" data center fails, applications fail over to the "passive" data center. The key difference from the 2-cluster active-active architecture is that in active-passive, applications run in only one cluster during normal operating conditions.
- 2+-Cluster Active-Active
- A 2+-cluster active-active architecture involves two or more fully independent data centers, each running a separate Confluent cluster, where each cluster is a copy of the others. When one of the data centers fails, applications fail over to another cluster.
- Stretched Cluster 3-Data Center
- A stretched 3-data center cluster architecture in KRaft mode involves three data centers connected by a low-latency (sub-100 ms) and stable (very tight p99s) network, usually a "dark fiber" network that is owned or leased privately by the company. Confluent Server nodes are configured with appropriate process.roles (either broker,controller for combined roles, or dedicated broker and controller roles) and are spread evenly across the three data centers to form a single, stretched KRaft cluster. The KRaft controller quorum manages the cluster metadata and handles leader elections for all Kafka brokers in this single, stretched cluster.
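To make the stretched-KRaft description concrete, a combined-role node in DC1 might be configured along these lines (node IDs and hostnames are hypothetical; one voter per data center is shown for brevity):

```properties
# One node in DC1 running both roles.
process.roles=broker,controller
node.id=1
broker.rack=dc1

# Controller quorum voters spread evenly across the three data centers,
# so a majority survives any single-DC failure.
controller.quorum.voters=1@dc1-node1:9093,4@dc2-node1:9093,7@dc3-node1:9093

listeners=PLAINTEXT://:9092,CONTROLLER://:9093
controller.listener.names=CONTROLLER
```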
- Stretched Cluster 2.5 Data Center
- A stretched 2.5-data center architecture involves two fully operational data centers and one light (0.5) data center running a single, stretched cluster. The fully operational data centers run an equal number of Confluent Server nodes configured as brokers and controllers, whereas the light data center runs a subset of KRaft controller nodes to maintain the controller quorum (the equivalent of running a single ZooKeeper server in legacy deployments). When any single data center fails, the KRaft controller quorum remains available. When a fully operational data center fails, applications fail over to the other data center.