Installing Apache Kafka: A Step-by-Step Guide
Apache Kafka is a widely used distributed streaming platform that can handle real-time data feeds. In this guide, we will walk through the installation process for Apache Kafka on a CentOS system, including setting up Java, creating the necessary user, downloading Kafka, and configuring system services.
Official Quickstart: https://kafka.apache.org/quickstart
Install Java
Apache Kafka requires Java to run. We will use OpenJDK 11 for this installation.
sudo yum install java-11-openjdk-devel
Verify the Java installation:
java -version
[root@localhost ~]# java -version
openjdk version "11.0.20.1" 2023-08-24 LTS
OpenJDK Runtime Environment (Red_Hat-11.0.20.1.1-2) (build 11.0.20.1+1-LTS)
OpenJDK 64-Bit Server VM (Red_Hat-11.0.20.1.1-2) (build 11.0.20.1+1-LTS, mixed mode, sharing)
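If you want to script this check instead of reading the banner by eye, the major version can be pulled out of the first line of the output. The following is a minimal sketch; the function name java_major_version is our own, and the "11 or newer" threshold in the usage comment simply mirrors the version chosen in this guide.

```shell
# Print the major version from `java -version` output passed as $1.
# Handles the modern scheme ("11.0.20.1" -> 11) and the legacy
# "1.x" scheme used up to Java 8 ("1.8.0_392" -> 8).
java_major_version() {
  ver=$(printf '%s\n' "$1" | sed -n 's/.*version "\([^"]*\)".*/\1/p' | head -n 1)
  case "$ver" in
    1.*) printf '%s\n' "$ver" | cut -d. -f2 ;;
    *)   printf '%s\n' "$ver" | cut -d. -f1 ;;
  esac
}

# Usage (java prints its version banner on stderr, hence 2>&1):
#   major=$(java_major_version "$(java -version 2>&1)")
#   [ "$major" -ge 11 ] || echo "expected Java 11 or newer, found: $major" >&2
```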
Set the default Java version (if multiple versions are installed):
sudo update-alternatives --config java
[root@localhost ~]# update-alternatives --config java
There is 1 program that provides 'java'.
Selection Command
-----------------------------------------------
*+ 1 java-11-openjdk.x86_64 (/usr/lib/jvm/java-11-openjdk-11.0.20.1.1-2.el9.x86_64/bin/java)
Enter to keep the current selection[+], or type selection number:
Create Necessary Users
Create a dedicated user for Kafka. The following command creates a system user named kafka with a home directory and sets its shell to /bin/false to prevent interactive login.
sudo useradd -r -m -s /bin/false kafka
Verify the user:
cat /etc/passwd | awk -F ":" '{print $1}' | grep "kafka"
This account exists only to run the Kafka and ZooKeeper services, so it does not need interactive login access.
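To confirm that the account really cannot log in, you can inspect the shell field of its passwd entry. This is a small sketch; the helper names passwd_shell and is_nologin_shell are our own, and the list of "no login" shells is an assumption covering the common locations of false and nologin.

```shell
# Print the login shell (7th colon-separated field) of a passwd-format line.
passwd_shell() {
  printf '%s\n' "$1" | cut -d: -f7
}

# Succeed only if the given shell refuses interactive logins.
is_nologin_shell() {
  case "$1" in
    /bin/false|/usr/bin/false|/sbin/nologin|/usr/sbin/nologin) return 0 ;;
    *) return 1 ;;
  esac
}

# Usage:
#   entry=$(getent passwd kafka) || echo "user kafka not found" >&2
#   is_nologin_shell "$(passwd_shell "$entry")" && echo "kafka has no login shell"
```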
Download and Extract Kafka
Download the Kafka package. This guide uses version 3.8.0; other versions are listed at https://kafka.apache.org/downloads
wget https://dlcdn.apache.org/kafka/3.8.0/kafka_2.13-3.8.0.tgz -P /home/centosadmin/Downloads
Alternatively, use curl:
curl "https://dlcdn.apache.org/kafka/3.8.0/kafka_2.13-3.8.0.tgz" -o /home/centosadmin/Downloads/kafka_2.13-3.8.0.tgz
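Before extracting the archive, it is worth checking it against the SHA-512 checksum that Apache publishes for each release. The sketch below makes two assumptions: that the checksum file has been saved next to the archive with a .sha512 suffix, and that it is written either as "FILENAME: AAAA BBBB ..." (possibly wrapped over several lines) or as plain sha512sum output. The function names are our own.

```shell
# Reduce a published checksum file (read on stdin) to a bare lowercase
# 128-character SHA-512 hex digest, whichever of the two assumed
# layouts it uses.
normalize_sha512() {
  tr -d ' \t\r\n' | tr 'A-F' 'a-f' | grep -oE '[0-9a-f]{128}' | head -n 1
}

# Succeed if the SHA-512 of FILE ($1) matches the digest in CHECKSUM_FILE ($2).
verify_sha512() {
  expected=$(normalize_sha512 < "$2")
  actual=$(sha512sum "$1" | cut -d' ' -f1)
  [ -n "$expected" ] && [ "$expected" = "$actual" ]
}

# Usage (the checksum URL is an assumption, formed by appending .sha512
# to the archive name on downloads.apache.org):
#   cd /home/centosadmin/Downloads
#   curl -O https://downloads.apache.org/kafka/3.8.0/kafka_2.13-3.8.0.tgz.sha512
#   verify_sha512 kafka_2.13-3.8.0.tgz kafka_2.13-3.8.0.tgz.sha512 && echo "checksum OK"
```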
Extract Kafka and set ownership:
cd /home/centosadmin/Downloads
sudo mkdir -p /opt/kafka
sudo tar -xvf kafka_2.13-3.8.0.tgz -C /opt/kafka
sudo chown -R kafka:kafka /opt/kafka
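A quick sanity check after extraction is to confirm that the scripts and config files used later in this guide are where we expect them. This is only a sketch; check_kafka_layout is a hypothetical helper name, and the file list is limited to the four files this guide touches.

```shell
# Succeed if DIR ($1) looks like an extracted Kafka distribution, i.e. it
# contains the start scripts and config files this guide relies on.
check_kafka_layout() {
  for f in bin/kafka-server-start.sh bin/zookeeper-server-start.sh \
           config/server.properties config/zookeeper.properties; do
    if [ ! -f "$1/$f" ]; then
      echo "missing: $1/$f" >&2
      return 1
    fi
  done
}

# Usage:
#   check_kafka_layout /opt/kafka/kafka_2.13-3.8.0 && echo "layout OK"
```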
Configure Kafka and ZooKeeper
Create necessary directories and set permissions:
sudo mkdir -p /var/lib/kafka-logs
sudo chown -R kafka:kafka /var/lib/kafka-logs
sudo mkdir -p /var/lib/zookeeper
sudo chown -R kafka:kafka /var/lib/zookeeper
Configure ZooKeeper:
sudo vi /opt/kafka/kafka_2.13-3.8.0/config/zookeeper.properties
dataDir=/var/lib/zookeeper
clientPort=2181
maxClientCnxns=60
admin.enableServer=false
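If you prefer not to edit the file by hand (for example in a provisioning script), the same settings can be applied non-interactively. The helper below is a sketch with a name of our own choosing; it assumes simple key=value lines without spaces around the equals sign and values without backslashes, and it must be run by a user who can write to the file.

```shell
# Set KEY=VALUE in a properties file: rewrite an existing "KEY=" line in
# place, or append the pair if the key is not present yet.
set_prop() {
  file=$1 key=$2 value=$3
  tmp=$(mktemp) || return 1
  awk -v k="$key" -v v="$value" '
    BEGIN { FS = "=" }
    $1 == k { print k "=" v; found = 1; next }   # replace matching line
    { print }                                    # keep everything else
    END { if (!found) print k "=" v }            # append if key was absent
  ' "$file" > "$tmp" && cat "$tmp" > "$file"     # cat keeps owner and mode
  status=$?
  rm -f "$tmp"
  return $status
}

# Usage:
#   conf=/opt/kafka/kafka_2.13-3.8.0/config/zookeeper.properties
#   set_prop "$conf" dataDir /var/lib/zookeeper
#   set_prop "$conf" admin.enableServer false
```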
Configure Kafka:
sudo vi /opt/kafka/kafka_2.13-3.8.0/config/server.properties
broker.id=0
log.dirs=/var/lib/kafka-logs
zookeeper.connect=localhost:2181
This tells Kafka to connect to ZooKeeper at localhost on port 2181. Make sure your ZooKeeper service is running and reachable at this address. If you are using multiple ZooKeeper nodes, use a comma-separated list:
zookeeper.connect=zk1:2181,zk2:2181,zk3:2181
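To double-check what the broker will actually read, you can print a single setting back out of the file. This is a rough sketch with a helper name of our own; it skips comment lines and, if a key appears more than once, prints the last occurrence.

```shell
# Print the value of KEY ($2) from the properties file FILE ($1).
get_prop() {
  awk -v k="$2" '
    /^[ \t]*[#!]/ { next }                 # skip comment lines
    {
      i = index($0, "=")
      if (i == 0) next                     # not a key=value line
      key = substr($0, 1, i - 1)
      val = substr($0, i + 1)
      gsub(/^[ \t]+|[ \t]+$/, "", key)     # trim surrounding blanks
      gsub(/^[ \t]+|[ \t]+$/, "", val)
      if (key == k) found = val
    }
    END { if (found != "") print found }
  ' "$1"
}

# Usage: list each ZooKeeper address on its own line
#   get_prop /opt/kafka/kafka_2.13-3.8.0/config/server.properties zookeeper.connect | tr ',' '\n'
```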
Create Systemd Service Files
Create a systemd service file for ZooKeeper:
ZooKeeper acts as a central authority for Kafka, helping it keep track of cluster metadata, manage leader election, and ensure all brokers are synchronized and operating correctly. This coordination and management are crucial for Kafka’s reliability and performance.
sudo vi /etc/systemd/system/zookeeper.service
[Unit]
Description=Apache ZooKeeper Server
Documentation=https://zookeeper.apache.org/documentation.html
Requires=network.target
After=network.target
[Service]
Type=simple
User=kafka
Group=kafka
ExecStart=/opt/kafka/kafka_2.13-3.8.0/bin/zookeeper-server-start.sh /opt/kafka/kafka_2.13-3.8.0/config/zookeeper.properties
ExecStop=/opt/kafka/kafka_2.13-3.8.0/bin/zookeeper-server-stop.sh
Restart=on-abnormal
[Install]
WantedBy=multi-user.target
Reload systemd and start ZooKeeper:
sudo systemctl daemon-reload
sudo systemctl start zookeeper
sudo systemctl enable zookeeper
[root@localhost ~]# systemctl status zookeeper
● zookeeper.service - Apache ZooKeeper Server
Loaded: loaded (/etc/systemd/system/zookeeper.service; enabled; preset: disabled)
Active: active (running) since Sun 2024-09-08 17:23:47 WIB; 1h 52min ago
Docs: https://zookeeper.apache.org/documentation.html
Main PID: 6355 (java)
Tasks: 39 (limit: 48564)
Memory: 87.0M
CPU: 26.098s
CGroup: /system.slice/zookeeper.service
└─6355 java -Xmx512M -Xms512M -server -XX:+UseG1GC -XX:MaxGCPauseMillis=20 -XX:InitiatingHeapOccupancyPercent=35 -XX:+ExplicitGCInvokesConcurrent -XX:MaxInlineLevel=15 -Djava.awt.headless=true "-Xlog:gc*:file=/opt/kafka/kaf>
Sep 08 17:23:50 localhost.localdomain zookeeper-server-start.sh[6355]: [2024-09-08 17:23:50,601] INFO Reading snapshot /var/lib/zookeeper/version-2/snapshot.0 (org.apache.zookeeper.server.persistence.FileSnap)
Sep 08 17:23:50 localhost.localdomain zookeeper-server-start.sh[6355]: [2024-09-08 17:23:50,609] INFO The digest value is empty in snapshot (org.apache.zookeeper.server.DataTree)
Sep 08 17:23:50 localhost.localdomain zookeeper-server-start.sh[6355]: [2024-09-08 17:23:50,617] INFO Snapshot loaded in 26 ms, highest zxid is 0x0, digest is 1371985504 (org.apache.zookeeper.server.ZKDatabase)
Sep 08 17:23:50 localhost.localdomain zookeeper-server-start.sh[6355]: [2024-09-08 17:23:50,622] INFO Snapshotting: 0x0 to /var/lib/zookeeper/version-2/snapshot.0 (org.apache.zookeeper.server.persistence.FileTxnSnapLog)
Sep 08 17:23:50 localhost.localdomain zookeeper-server-start.sh[6355]: [2024-09-08 17:23:50,627] INFO Snapshot taken in 5 ms (org.apache.zookeeper.server.ZooKeeperServer)
Sep 08 17:23:50 localhost.localdomain zookeeper-server-start.sh[6355]: [2024-09-08 17:23:50,666] INFO zookeeper.request_throttler.shutdownTimeout = 10000 ms (org.apache.zookeeper.server.RequestThrottler)
Sep 08 17:23:50 localhost.localdomain zookeeper-server-start.sh[6355]: [2024-09-08 17:23:50,666] INFO PrepRequestProcessor (sid:0) started, reconfigEnabled=false (org.apache.zookeeper.server.PrepRequestProcessor)
Sep 08 17:23:50 localhost.localdomain zookeeper-server-start.sh[6355]: [2024-09-08 17:23:50,722] INFO Using checkIntervalMs=60000 maxPerMinute=10000 maxNeverUsedIntervalMs=0 (org.apache.zookeeper.server.ContainerManager)
Sep 08 17:23:50 localhost.localdomain zookeeper-server-start.sh[6355]: [2024-09-08 17:23:50,724] INFO ZooKeeper audit is disabled. (org.apache.zookeeper.audit.ZKAuditProvider)
Sep 08 18:44:35 localhost.localdomain zookeeper-server-start.sh[6355]: [2024-09-08 18:44:35,554] INFO Creating new log file: log.1 (org.apache.zookeeper.server.persistence.FileTxnLog)
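In a script there is nobody to read the status output, so it helps to wait until systemd reports the service as active. Here is a minimal sketch of a generic retry helper (wait_until is our own name). Note that for a Type=simple unit, "active" only means the process has started, not that ZooKeeper is already accepting connections on port 2181.

```shell
# Re-run a command once per second until it succeeds, or give up after
# TIMEOUT ($1) seconds. The remaining arguments are the command to run.
wait_until() {
  timeout=$1
  shift
  elapsed=0
  until "$@"; do
    [ "$elapsed" -ge "$timeout" ] && return 1
    sleep 1
    elapsed=$((elapsed + 1))
  done
  return 0
}

# Usage:
#   wait_until 30 systemctl is-active --quiet zookeeper \
#     || echo "zookeeper did not become active within 30s" >&2
```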
Create a systemd service file for Kafka:
Kafka manages data streams and handles high-throughput, fault-tolerant messaging, while ZooKeeper supports Kafka by managing cluster coordination and metadata.
sudo vi /etc/systemd/system/kafka.service
[Unit]
Description=Apache Kafka Server
Documentation=http://kafka.apache.org/documentation.html
Requires=zookeeper.service
After=zookeeper.service
[Service]
Type=simple
User=kafka
Group=kafka
ExecStart=/opt/kafka/kafka_2.13-3.8.0/bin/kafka-server-start.sh /opt/kafka/kafka_2.13-3.8.0/config/server.properties
ExecStop=/opt/kafka/kafka_2.13-3.8.0/bin/kafka-server-stop.sh
Restart=on-failure
RestartSec=10
[Install]
WantedBy=multi-user.target
Reload systemd and start Kafka:
sudo systemctl daemon-reload
sudo systemctl start kafka
sudo systemctl enable kafka
[root@localhost ~]# systemctl status kafka
● kafka.service - Apache Kafka Server
Loaded: loaded (/etc/systemd/system/kafka.service; enabled; preset: disabled)
Active: active (running) since Sun 2024-09-08 18:44:29 WIB; 32min ago
Docs: http://kafka.apache.org/documentation.html
Main PID: 7043 (java)
Tasks: 77 (limit: 48564)
Memory: 402.2M
CPU: 1min 15.198s
CGroup: /system.slice/kafka.service
└─7043 java -Xmx1G -Xms1G -server -XX:+UseG1GC -XX:MaxGCPauseMillis=20 -XX:InitiatingHeapOccupancyPercent=35 -XX:+ExplicitGCInvokesConcurrent -XX:MaxInlineLevel=15 -Djava.awt.headless=true "-Xlog:gc*:file=/opt/kafka/kafka_2>
Sep 08 18:54:37 localhost.localdomain kafka-server-start.sh[7043]: [2024-09-08 18:54:37,425] INFO [GroupMetadataManager brokerId=0] Finished loading offsets and group metadata from __consumer_offsets-36 in 36 milliseconds for epoch 0, o>
Sep 08 18:54:37 localhost.localdomain kafka-server-start.sh[7043]: [2024-09-08 18:54:37,425] INFO [GroupMetadataManager brokerId=0] Finished loading offsets and group metadata from __consumer_offsets-6 in 36 milliseconds for epoch 0, of>
Sep 08 18:54:37 localhost.localdomain kafka-server-start.sh[7043]: [2024-09-08 18:54:37,426] INFO [GroupMetadataManager brokerId=0] Finished loading offsets and group metadata from __consumer_offsets-43 in 37 milliseconds for epoch 0, o>
Sep 08 18:54:37 localhost.localdomain kafka-server-start.sh[7043]: [2024-09-08 18:54:37,426] INFO [GroupMetadataManager brokerId=0] Finished loading offsets and group metadata from __consumer_offsets-13 in 36 milliseconds for epoch 0, o>
Sep 08 18:54:37 localhost.localdomain kafka-server-start.sh[7043]: [2024-09-08 18:54:37,426] INFO [GroupMetadataManager brokerId=0] Finished loading offsets and group metadata from __consumer_offsets-28 in 36 milliseconds for epoch 0, o>
Sep 08 18:54:37 localhost.localdomain kafka-server-start.sh[7043]: [2024-09-08 18:54:37,745] INFO [GroupCoordinator 0]: Dynamic member with unknown member id joins group console-consumer-60356 in Empty state. Created a new member id con>
Sep 08 18:54:37 localhost.localdomain kafka-server-start.sh[7043]: [2024-09-08 18:54:37,787] INFO [GroupCoordinator 0]: Preparing to rebalance group console-consumer-60356 in state PreparingRebalance with old generation 0 (__consumer_of>
Sep 08 18:54:37 localhost.localdomain kafka-server-start.sh[7043]: [2024-09-08 18:54:37,819] INFO [GroupCoordinator 0]: Stabilized group console-consumer-60356 generation 1 (__consumer_offsets-11) with 1 members (kafka.coordinator.group>
Sep 08 18:54:37 localhost.localdomain kafka-server-start.sh[7043]: [2024-09-08 18:54:37,871] INFO [GroupCoordinator 0]: Assignment received from leader console-consumer-5c445c12-e6d9-4f4f-afd3-3cbf6deac5df for group console-consumer-603>
Sep 08 19:04:20 localhost.localdomain kafka-server-start.sh[7043]: [2024-09-08 19:04:20,788] INFO [NodeToControllerChannelManager id=0 name=forwarding] Node 0 disconnected. (org.apache.kafka.clients.NetworkClient)
Update PATH
Add Kafka binaries to your PATH for convenience:
vi ~/.bash_profile
# Kafka
export PATH=$PATH:/opt/kafka/kafka_2.13-3.8.0/bin
source ~/.bash_profile
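Sourcing the profile more than once appends the same directory to PATH each time. If that bothers you, a small guard avoids the duplicates. This is an optional sketch; path_append is a helper name of our own.

```shell
# Append DIR ($1) to PATH only if it is not already present.
path_append() {
  case ":$PATH:" in
    *":$1:"*) ;;                # already on PATH, nothing to do
    *) PATH="$PATH:$1" ;;
  esac
}

# Usage in ~/.bash_profile:
#   path_append /opt/kafka/kafka_2.13-3.8.0/bin
#   export PATH
```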
Verify Kafka installation:
kafka-topics.sh --version
[root@localhost ~]# kafka-topics.sh --version
3.8.0
Basic Kafka Commands
Let's verify that Kafka is running properly on our server.
Create a Kafka topic:
kafka-topics.sh --create --topic test-topic --bootstrap-server localhost:9092 --partitions 1 --replication-factor 1
List and describe Kafka topics:
kafka-topics.sh --list --bootstrap-server localhost:9092
kafka-topics.sh --describe --bootstrap-server localhost:9092 --topic test-topic
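Running the create command a second time fails because the topic already exists. For scripts that may run repeatedly, kafka-topics.sh accepts --if-not-exists, which turns the second run into a no-op. The wrapper below is a sketch: the function name and the KAFKA_TOPICS and BOOTSTRAP variables are our own, introduced so the command and broker address can be overridden.

```shell
# Command and broker address, overridable from the environment.
KAFKA_TOPICS=${KAFKA_TOPICS:-kafka-topics.sh}
BOOTSTRAP=${BOOTSTRAP:-localhost:9092}

# Create topic NAME ($1) unless it already exists.
# Optional: partitions ($2, default 1) and replication factor ($3, default 1).
ensure_topic() {
  "$KAFKA_TOPICS" --create --if-not-exists \
    --topic "$1" \
    --bootstrap-server "$BOOTSTRAP" \
    --partitions "${2:-1}" \
    --replication-factor "${3:-1}"
}

# Usage:
#   ensure_topic test-topic
#   ensure_topic test-topic    # safe to repeat
```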
Produce messages to a topic:
We will write some messages to the topic from the console producer. Type one message per line and press Ctrl+C to exit.
kafka-console-producer.sh --bootstrap-server localhost:9092 --topic test-topic
Consume messages from a topic:
We will read all messages from the topic, starting from the beginning.
kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic test-topic --from-beginning
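Both console tools are interactive by default, but they can also be driven from a script for a quick smoke test: the producer reads messages from standard input, and the consumer can be told to stop after a number of messages or after an idle timeout. The sketch below uses function and variable names of our own; the 10-second timeout is an arbitrary choice.

```shell
# Commands and broker address, overridable from the environment.
KAFKA_PRODUCER=${KAFKA_PRODUCER:-kafka-console-producer.sh}
KAFKA_CONSUMER=${KAFKA_CONSUMER:-kafka-console-consumer.sh}
BOOTSTRAP=${BOOTSTRAP:-localhost:9092}

# Send a single message ($2) to topic ($1).
produce_one() {
  printf '%s\n' "$2" | "$KAFKA_PRODUCER" --bootstrap-server "$BOOTSTRAP" --topic "$1"
}

# Read up to COUNT ($2) messages from the start of topic ($1), then exit.
# Gives up if no message arrives within 10 seconds.
consume_n() {
  "$KAFKA_CONSUMER" --bootstrap-server "$BOOTSTRAP" --topic "$1" \
    --from-beginning --max-messages "$2" --timeout-ms 10000
}

# Usage:
#   produce_one test-topic "hello kafka"
#   consume_n test-topic 1
```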
To learn Kafka in more depth, I recommend this series on Medium: https://medium.com/apache-kafka-from-zero-to-hero