Installing Apache Kafka: A Step-by-Step Guide

Danang Priabada
6 min read · Sep 8, 2024

Apache Kafka is a widely used distributed streaming platform that can handle real-time data feeds. In this guide, we will walk through installing Apache Kafka on a CentOS system: setting up Java, creating the required user, downloading Kafka, and configuring the system services.

Official quickstart: https://kafka.apache.org/quickstart

Install Java

Apache Kafka requires Java to run. We will use OpenJDK 11 for this installation.

sudo yum install java-11-openjdk-devel

Verify the Java installation:

java -version
[root@localhost ~]# java -version
openjdk version "11.0.20.1" 2023-08-24 LTS
OpenJDK Runtime Environment (Red_Hat-11.0.20.1.1-2) (build 11.0.20.1+1-LTS)
OpenJDK 64-Bit Server VM (Red_Hat-11.0.20.1.1-2) (build 11.0.20.1+1-LTS, mixed mode, sharing)
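
For scripted installs, the version check can be automated. The sketch below is one possible approach, not part of Kafka or the official quickstart: java_major_version is a hypothetical helper that pulls the major version out of java -version style text, and the minimum of 11 simply mirrors the OpenJDK 11 choice made in this guide.

```shell
# Hypothetical helper (not shipped with Java or Kafka): read `java -version`
# style text on stdin and print the major version number.
java_major_version() {
    # Capture the first two numeric fields of the quoted version string,
    # e.g. "11.0.20.1" -> "11 0", legacy "1.8.0_382" -> "1 8".
    sed -n 's/.*version "\([0-9][0-9]*\)\.\{0,1\}\([0-9]*\).*/\1 \2/p' |
        head -n 1 | {
            read -r first second || true
            # Legacy releases report 1.x, where x is the major version.
            if [ "$first" = "1" ]; then echo "$second"; else echo "$first"; fi
        }
}

# `java -version` writes to stderr, so redirect it into the pipe.
if command -v java >/dev/null 2>&1; then
    major=$(java -version 2>&1 | java_major_version)
    if [ "${major:-0}" -ge 11 ]; then
        echo "Java $major detected"
    else
        echo "Java 11 or newer expected by this guide, found: ${major:-unknown}" >&2
    fi
fi
```

The helper can also be fed saved output, which makes it easy to try without changing the installed Java.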

Set the default Java version (if multiple versions are installed):

sudo update-alternatives --config java
[root@localhost ~]# update-alternatives --config java
There is 1 program that provides 'java'.

Selection Command
-----------------------------------------------
*+ 1 java-11-openjdk.x86_64 (/usr/lib/jvm/java-11-openjdk-11.0.20.1.1-2.el9.x86_64/bin/java)

Enter to keep the current selection[+], or type selection number:

Create Necessary Users

Create a dedicated kafka user to run the services. The following command creates a system account named kafka (-r) with a home directory (-m) and sets its shell to /bin/false (-s) to prevent interactive login.

sudo useradd -r -m -s /bin/false kafka

Verify the user:

cat /etc/passwd | awk -F ":" '{print $1}' | grep "kafka"

This account exists only to run services, so it does not need interactive login access.
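
Note that filtering with grep "kafka" also matches any account whose name merely contains that string. An exact match on the user-name field is more precise and can report the login shell at the same time. The following is a small sketch of that idea; login_shell_of is a hypothetical helper name, and it reads /etc/passwd unless another file is given.

```shell
# Hypothetical helper: print the login shell (field 7) of exactly one user.
# Usage: login_shell_of USER [PASSWD_FILE]
login_shell_of() {
    awk -F: -v user="$1" '$1 == user { print $7 }' "${2:-/etc/passwd}"
}

shell=$(login_shell_of kafka)
if [ -z "$shell" ]; then
    echo "user kafka not found"
else
    echo "kafka login shell: $shell"
fi
```

On systems where accounts come from a directory service rather than the local file, getent passwd kafka covers those sources as well.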

Download and Extract Kafka

Download the Kafka package. This guide uses version 3.8.0; other versions are listed on the Apache Kafka downloads page:

wget https://dlcdn.apache.org/kafka/3.8.0/kafka_2.13-3.8.0.tgz -P /home/centosadmin/Downloads

Alternatively, use curl:

curl "https://dlcdn.apache.org/kafka/3.8.0/kafka_2.13-3.8.0.tgz" -o /home/centosadmin/Downloads/kafka_2.13-3.8.0.tgz
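
The archive name encodes two versions: kafka_2.13-3.8.0.tgz is Kafka 3.8.0 built for Scala 2.13. If you expect to install other releases, deriving the URL from those two values avoids editing it in several places. This is only a sketch; kafka_download_url is a hypothetical helper, and it assumes the release is hosted under the same dlcdn.apache.org layout used above.

```shell
# Hypothetical helper: build the download URL from a Kafka and a Scala version.
# Usage: kafka_download_url KAFKA_VERSION SCALA_VERSION
kafka_download_url() {
    kafka_version=$1
    scala_version=$2
    printf 'https://dlcdn.apache.org/kafka/%s/kafka_%s-%s.tgz\n' \
        "$kafka_version" "$scala_version" "$kafka_version"
}

url=$(kafka_download_url 3.8.0 2.13)
echo "$url"
# Print the download command instead of running it, so the sketch has no side effects.
echo "wget $url -P /home/centosadmin/Downloads"
```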

Extract Kafka and set ownership:

cd /home/centosadmin/Downloads
sudo mkdir -p /opt/kafka
sudo tar -xvf kafka_2.13-3.8.0.tgz -C /opt/kafka
sudo chown -R kafka:kafka /opt/kafka

Configure Kafka and ZooKeeper

Create necessary directories and set permissions:

sudo mkdir -p /var/lib/kafka-logs
sudo chown -R kafka:kafka /var/lib/kafka-logs

sudo mkdir -p /var/lib/zookeeper
sudo chown -R kafka:kafka /var/lib/zookeeper

Configure ZooKeeper:

sudo vi /opt/kafka/kafka_2.13-3.8.0/config/zookeeper.properties
dataDir=/var/lib/zookeeper
clientPort=2181
maxClientCnxns=60
admin.enableServer=false
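
If you prefer not to edit the file interactively, the same keys can be set from a script. The sketch below assumes a simple key=value properties file; set_property is a hypothetical helper, not a Kafka tool, and the demo runs against a scratch file rather than the real zookeeper.properties.

```shell
# Hypothetical helper: set KEY=VALUE in a properties file, replacing an
# existing uncommented entry or appending a new one.
# Usage: set_property FILE KEY VALUE
# Limitations: VALUE must not contain '|', '&' or '\'; dots in KEY are
# treated as regex wildcards, which is harmless for typical keys.
set_property() {
    file=$1 key=$2 value=$3
    if grep -q "^${key}=" "$file"; then
        # Write to a temporary file and move it back, which avoids relying
        # on the non-portable `sed -i`.
        sed "s|^${key}=.*|${key}=${value}|" "$file" > "$file.tmp" &&
            mv "$file.tmp" "$file"
    else
        printf '%s=%s\n' "$key" "$value" >> "$file"
    fi
}

# Demo on a scratch file.
demo=$(mktemp)
printf 'dataDir=/tmp/zookeeper\nclientPort=2181\n' > "$demo"
set_property "$demo" dataDir /var/lib/zookeeper
set_property "$demo" maxClientCnxns 60
cat "$demo"
```

Because mv replaces the file, it is worth re-checking ownership and permissions after using this on a real configuration file.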

Configure Kafka:

sudo vi /opt/kafka/kafka_2.13-3.8.0/config/server.properties
broker.id=0
log.dirs=/var/lib/kafka-logs
zookeeper.connect=localhost:2181

This setting tells Kafka to connect to ZooKeeper at localhost on port 2181. Make sure your ZooKeeper service is running and reachable at this address. If you are using multiple ZooKeeper nodes, provide a comma-separated list:

zookeeper.connect=zk1:2181,zk2:2181,zk3:2181
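
When the connection string lists several nodes, it can help to split it into individual host and port pairs, for example to check each one for reachability. A minimal sketch follows; zk_nodes is a hypothetical helper, and it assumes the host:port,host:port form shown above, optionally followed by a chroot path such as /kafka.

```shell
# Hypothetical helper: print one "host port" line per ZooKeeper node.
# Usage: zk_nodes CONNECT_STRING
zk_nodes() {
    # ${1%%/*} drops an optional chroot suffix (everything from the first '/').
    printf '%s\n' "${1%%/*}" | tr ',' '\n' |
        while IFS=: read -r host port; do
            printf '%s %s\n' "$host" "$port"
        done
}

zk_nodes "zk1:2181,zk2:2181,zk3:2181"
```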

Create Systemd Service Files

Create a systemd service file for ZooKeeper:

ZooKeeper acts as a central authority for Kafka, helping it keep track of cluster metadata, manage leader election, and ensure all brokers are synchronized and operating correctly. This coordination and management are crucial for Kafka’s reliability and performance.

sudo vi /etc/systemd/system/zookeeper.service
[Unit]
Description=Apache ZooKeeper Server
Documentation=https://zookeeper.apache.org/documentation.html
Requires=network.target
After=network.target

[Service]
Type=simple
User=kafka
Group=kafka
ExecStart=/opt/kafka/kafka_2.13-3.8.0/bin/zookeeper-server-start.sh /opt/kafka/kafka_2.13-3.8.0/config/zookeeper.properties
ExecStop=/opt/kafka/kafka_2.13-3.8.0/bin/zookeeper-server-stop.sh
Restart=on-abnormal

[Install]
WantedBy=multi-user.target

Reload systemd and start ZooKeeper:

sudo systemctl daemon-reload
sudo systemctl start zookeeper
sudo systemctl enable zookeeper
[root@localhost ~]# systemctl status zookeeper
● zookeeper.service - Apache ZooKeeper Server
Loaded: loaded (/etc/systemd/system/zookeeper.service; enabled; preset: disabled)
Active: active (running) since Sun 2024-09-08 17:23:47 WIB; 1h 52min ago
Docs: https://zookeeper.apache.org/documentation.html
Main PID: 6355 (java)
Tasks: 39 (limit: 48564)
Memory: 87.0M
CPU: 26.098s
CGroup: /system.slice/zookeeper.service
└─6355 java -Xmx512M -Xms512M -server -XX:+UseG1GC -XX:MaxGCPauseMillis=20 -XX:InitiatingHeapOccupancyPercent=35 -XX:+ExplicitGCInvokesConcurrent -XX:MaxInlineLevel=15 -Djava.awt.headless=true "-Xlog:gc*:file=/opt/kafka/kaf>

Sep 08 17:23:50 localhost.localdomain zookeeper-server-start.sh[6355]: [2024-09-08 17:23:50,601] INFO Reading snapshot /var/lib/zookeeper/version-2/snapshot.0 (org.apache.zookeeper.server.persistence.FileSnap)
Sep 08 17:23:50 localhost.localdomain zookeeper-server-start.sh[6355]: [2024-09-08 17:23:50,609] INFO The digest value is empty in snapshot (org.apache.zookeeper.server.DataTree)
Sep 08 17:23:50 localhost.localdomain zookeeper-server-start.sh[6355]: [2024-09-08 17:23:50,617] INFO Snapshot loaded in 26 ms, highest zxid is 0x0, digest is 1371985504 (org.apache.zookeeper.server.ZKDatabase)
Sep 08 17:23:50 localhost.localdomain zookeeper-server-start.sh[6355]: [2024-09-08 17:23:50,622] INFO Snapshotting: 0x0 to /var/lib/zookeeper/version-2/snapshot.0 (org.apache.zookeeper.server.persistence.FileTxnSnapLog)
Sep 08 17:23:50 localhost.localdomain zookeeper-server-start.sh[6355]: [2024-09-08 17:23:50,627] INFO Snapshot taken in 5 ms (org.apache.zookeeper.server.ZooKeeperServer)
Sep 08 17:23:50 localhost.localdomain zookeeper-server-start.sh[6355]: [2024-09-08 17:23:50,666] INFO zookeeper.request_throttler.shutdownTimeout = 10000 ms (org.apache.zookeeper.server.RequestThrottler)
Sep 08 17:23:50 localhost.localdomain zookeeper-server-start.sh[6355]: [2024-09-08 17:23:50,666] INFO PrepRequestProcessor (sid:0) started, reconfigEnabled=false (org.apache.zookeeper.server.PrepRequestProcessor)
Sep 08 17:23:50 localhost.localdomain zookeeper-server-start.sh[6355]: [2024-09-08 17:23:50,722] INFO Using checkIntervalMs=60000 maxPerMinute=10000 maxNeverUsedIntervalMs=0 (org.apache.zookeeper.server.ContainerManager)
Sep 08 17:23:50 localhost.localdomain zookeeper-server-start.sh[6355]: [2024-09-08 17:23:50,724] INFO ZooKeeper audit is disabled. (org.apache.zookeeper.audit.ZKAuditProvider)
Sep 08 18:44:35 localhost.localdomain zookeeper-server-start.sh[6355]: [2024-09-08 18:44:35,554] INFO Creating new log file: log.1 (org.apache.zookeeper.server.persistence.FileTxnLog)

Create a systemd service file for Kafka:

Kafka manages data streams and handles high-throughput, fault-tolerant messaging, while ZooKeeper supports Kafka by managing cluster coordination and metadata.

sudo vi /etc/systemd/system/kafka.service
[Unit]
Description=Apache Kafka Server
Documentation=http://kafka.apache.org/documentation.html
Requires=zookeeper.service
After=zookeeper.service

[Service]
Type=simple
User=kafka
Group=kafka
ExecStart=/opt/kafka/kafka_2.13-3.8.0/bin/kafka-server-start.sh /opt/kafka/kafka_2.13-3.8.0/config/server.properties
ExecStop=/opt/kafka/kafka_2.13-3.8.0/bin/kafka-server-stop.sh
Restart=on-failure
RestartSec=10

[Install]
WantedBy=multi-user.target
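
A mistyped path in ExecStart or ExecStop only surfaces when the service fails to start. As a pre-flight check, the sketch below extracts those paths from a unit file and reports whether each one is executable. The helper names unit_exec_paths and check_unit_paths are hypothetical, and the parsing assumes plain Exec lines like the ones above, without systemd's special prefixes such as - or @.

```shell
# Hypothetical helper: print the program path of each ExecStart=/ExecStop= line.
unit_exec_paths() {
    awk -F= '/^Exec(Start|Stop)=/ { split($2, words, " "); print words[1] }' "$1"
}

# Hypothetical helper: report each path; return non-zero if any is not executable.
check_unit_paths() {
    rc=0
    for prog in $(unit_exec_paths "$1"); do
        if [ -x "$prog" ]; then
            echo "ok       $prog"
        else
            echo "missing  $prog"
            rc=1
        fi
    done
    return "$rc"
}

unit=/etc/systemd/system/kafka.service
if [ -f "$unit" ]; then
    check_unit_paths "$unit" || echo "fix the paths above before starting the service"
fi
```

The -x test is evaluated for the user running the script, so a path that is executable for root may still be inaccessible to the kafka user.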

Reload systemd and start Kafka:

sudo systemctl daemon-reload
sudo systemctl start kafka
sudo systemctl enable kafka
[root@localhost ~]# systemctl status kafka
● kafka.service - Apache Kafka Server
Loaded: loaded (/etc/systemd/system/kafka.service; enabled; preset: disabled)
Active: active (running) since Sun 2024-09-08 18:44:29 WIB; 32min ago
Docs: http://kafka.apache.org/documentation.html
Main PID: 7043 (java)
Tasks: 77 (limit: 48564)
Memory: 402.2M
CPU: 1min 15.198s
CGroup: /system.slice/kafka.service
└─7043 java -Xmx1G -Xms1G -server -XX:+UseG1GC -XX:MaxGCPauseMillis=20 -XX:InitiatingHeapOccupancyPercent=35 -XX:+ExplicitGCInvokesConcurrent -XX:MaxInlineLevel=15 -Djava.awt.headless=true "-Xlog:gc*:file=/opt/kafka/kafka_2>

Sep 08 18:54:37 localhost.localdomain kafka-server-start.sh[7043]: [2024-09-08 18:54:37,425] INFO [GroupMetadataManager brokerId=0] Finished loading offsets and group metadata from __consumer_offsets-36 in 36 milliseconds for epoch 0, o>
Sep 08 18:54:37 localhost.localdomain kafka-server-start.sh[7043]: [2024-09-08 18:54:37,425] INFO [GroupMetadataManager brokerId=0] Finished loading offsets and group metadata from __consumer_offsets-6 in 36 milliseconds for epoch 0, of>
Sep 08 18:54:37 localhost.localdomain kafka-server-start.sh[7043]: [2024-09-08 18:54:37,426] INFO [GroupMetadataManager brokerId=0] Finished loading offsets and group metadata from __consumer_offsets-43 in 37 milliseconds for epoch 0, o>
Sep 08 18:54:37 localhost.localdomain kafka-server-start.sh[7043]: [2024-09-08 18:54:37,426] INFO [GroupMetadataManager brokerId=0] Finished loading offsets and group metadata from __consumer_offsets-13 in 36 milliseconds for epoch 0, o>
Sep 08 18:54:37 localhost.localdomain kafka-server-start.sh[7043]: [2024-09-08 18:54:37,426] INFO [GroupMetadataManager brokerId=0] Finished loading offsets and group metadata from __consumer_offsets-28 in 36 milliseconds for epoch 0, o>
Sep 08 18:54:37 localhost.localdomain kafka-server-start.sh[7043]: [2024-09-08 18:54:37,745] INFO [GroupCoordinator 0]: Dynamic member with unknown member id joins group console-consumer-60356 in Empty state. Created a new member id con>
Sep 08 18:54:37 localhost.localdomain kafka-server-start.sh[7043]: [2024-09-08 18:54:37,787] INFO [GroupCoordinator 0]: Preparing to rebalance group console-consumer-60356 in state PreparingRebalance with old generation 0 (__consumer_of>
Sep 08 18:54:37 localhost.localdomain kafka-server-start.sh[7043]: [2024-09-08 18:54:37,819] INFO [GroupCoordinator 0]: Stabilized group console-consumer-60356 generation 1 (__consumer_offsets-11) with 1 members (kafka.coordinator.group>
Sep 08 18:54:37 localhost.localdomain kafka-server-start.sh[7043]: [2024-09-08 18:54:37,871] INFO [GroupCoordinator 0]: Assignment received from leader console-consumer-5c445c12-e6d9-4f4f-afd3-3cbf6deac5df for group console-consumer-603>
Sep 08 19:04:20 localhost.localdomain kafka-server-start.sh[7043]: [2024-09-08 19:04:20,788] INFO [NodeToControllerChannelManager id=0 name=forwarding] Node 0 disconnected. (org.apache.kafka.clients.NetworkClient)

Update PATH

Add Kafka binaries to your PATH for convenience:

vi ~/.bash_profile
# Kafka
export PATH=$PATH:/opt/kafka/kafka_2.13-3.8.0/bin
source ~/.bash_profile
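
If ~/.bash_profile is sourced more than once in the same session, a plain export appends the same directory again each time. One way to avoid that, shown here as a sketch with a hypothetical helper named path_append, is to add the directory only when it is not already present.

```shell
# Hypothetical helper: append DIR to PATH unless it is already listed.
# Usage: path_append DIR
path_append() {
    case ":$PATH:" in
        *":$1:"*) ;;                 # already present, nothing to do
        *) PATH="$PATH:$1" ;;
    esac
    export PATH
}

# Kafka
path_append /opt/kafka/kafka_2.13-3.8.0/bin
path_append /opt/kafka/kafka_2.13-3.8.0/bin   # second call is a no-op
echo "$PATH"
```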

Verify Kafka installation:

kafka-topics.sh --version
[root@localhost ~]# kafka-topics.sh --version
3.8.0

Basic Kafka Commands

Let's verify that Kafka is running properly on our server.

Create a Kafka topic:

kafka-topics.sh --create --topic test-topic --bootstrap-server localhost:9092 --partitions 1 --replication-factor 1

List and describe Kafka topics:

kafka-topics.sh --list --bootstrap-server localhost:9092
kafka-topics.sh --describe --bootstrap-server localhost:9092 --topic test-topic

Produce messages to a topic:

We will write some messages to the topic from the producer console. Each line you type is sent as one message.

kafka-console-producer.sh --bootstrap-server localhost:9092 --topic test-topic

Consume messages from a topic:

We will read all messages from the topic, starting from the beginning.

kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic test-topic --from-beginning
Figure: producer console (left) and consumer console (right).
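
The interactive producer and consumer can also be combined into a scripted round trip. The following sketch is an illustration rather than an official tool: kafka_smoke_test is a hypothetical function, the 10-second timeout is an arbitrary choice, and it expects the topic created above to exist and the Kafka scripts to be on PATH.

```shell
# Hypothetical helper: send one unique message and confirm it can be read back.
# Usage: kafka_smoke_test [TOPIC] [BOOTSTRAP_SERVER]
# Returns 0 on success, 1 on failure, 2 if the Kafka scripts are not on PATH.
kafka_smoke_test() {
    topic=${1:-test-topic}
    server=${2:-localhost:9092}
    if ! command -v kafka-console-producer.sh >/dev/null 2>&1 ||
       ! command -v kafka-console-consumer.sh >/dev/null 2>&1; then
        echo "Kafka command-line scripts not found on PATH" >&2
        return 2
    fi
    message="smoke-test-$(date +%s)"
    # The console producer reads messages from stdin, one per line.
    printf '%s\n' "$message" |
        kafka-console-producer.sh --bootstrap-server "$server" --topic "$topic" ||
        return 1
    # --timeout-ms makes the consumer exit once no message arrives for that long.
    kafka-console-consumer.sh --bootstrap-server "$server" --topic "$topic" \
        --from-beginning --timeout-ms 10000 2>/dev/null |
        grep -qx "$message"
}

kafka_smoke_test && echo "round trip ok" || echo "smoke test skipped or failed"
```

Because the consumer reads from the beginning of the topic, the check gets slower as the topic grows; it is meant for a freshly created test topic.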

To learn more about Kafka in depth, I recommend this series on Medium: https://medium.com/apache-kafka-from-zero-to-hero

Written by Danang Priabada

Red Hat and IBM Product Specialist | JPN : プリアバダ ダナン | CHN : 逹男 | linktr.ee/danangpriabada
