How to Install Apache Kafka on CentOS 7

In this tutorial, we will show you how to install Apache Kafka on CentOS 7.

Apache Kafka is an open-source distributed streaming platform that is also widely used as a messaging system. It is designed to be scalable, fault-tolerant, and fast when handling real-time data feeds, which makes it a good fit for real-time analytics and stream processing. Its client APIs (Producer, Consumer, Streams, and Connect) let developers integrate Kafka into their applications and adapt it to their needs.

Let’s begin with the installation.

Prerequisites:

Apache Kafka has the following requirements:

  • Java 8 or higher installed on the server
  • ZooKeeper running on the server (ZooKeeper is bundled with the Kafka download, and we set it up as a service in Step 4)
  • A server or VPS with at least 4 GB of RAM
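To check the memory requirement from the shell, a sketch along these lines reads MemTotal from /proc/meminfo. The helper names mem_total_mb and check_ram are hypothetical. Because MemTotal excludes memory reserved by the kernel, a 4 GB server reports somewhat less than 4096 MiB, so the 3500 MiB threshold used here is an assumed allowance, not an official figure.

```shell
# Hypothetical helper: print total usable RAM in MiB from a meminfo-style
# file (default /proc/meminfo, where MemTotal is reported in kB).
mem_total_mb() {
    awk '/^MemTotal:/ {print int($2 / 1024)}' "${1:-/proc/meminfo}"
}

# Hypothetical check against the 4 GB guideline. MemTotal excludes memory
# reserved by the kernel, so 3500 MiB is an assumed allowance.
check_ram() {
    mb=$(mem_total_mb "$@")
    if [ "$mb" -ge 3500 ]; then
        echo "RAM OK: ${mb} MiB"
    else
        echo "Only ${mb} MiB of RAM; at least 4 GB is recommended"
        return 1
    fi
}

# Usage: check_ram
```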

Step 1: Connect to the Server

Log in to the server via SSH as the root user:

ssh root@IP_ADDRESS -p PORT_NUMBER

Replace “IP_ADDRESS” and “PORT_NUMBER” with your server’s IP address and SSH port number.

Step 2: Update OS Packages

Once logged in, make sure that your server OS packages are up-to-date by running the following commands:

yum clean all
yum update

Step 3: Install Java

Apache Kafka runs on the Java Virtual Machine, so we need to install Java first. To check whether Java is already installed on the server, run:

which java

If the command does not print a path such as /usr/bin/java, Java is not installed yet. Install OpenJDK 8 from the CentOS repositories:

yum install java-1.8.0-openjdk.x86_64

We can check the Java version installed on the server by running the following command:

java -version

The output should be similar to this:

openjdk version "1.8.0_191"
OpenJDK Runtime Environment (build 1.8.0_191-b12)
OpenJDK 64-Bit Server VM (build 25.191-b12, mixed mode)
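To script the Java 8+ check, for example in a provisioning script, you could parse the version string that java -version prints (to stderr). This is a sketch: java_major_version and check_java are hypothetical names, and it assumes the version appears in double quotes on the first line containing "version", as in the OpenJDK output above.

```shell
# Hypothetical helper: convert a Java version string into its major version.
# "1.8.0_191" -> 8 (legacy 1.x scheme), "11.0.2" -> 11 (newer scheme).
java_major_version() {
    version="$1"
    case "$version" in
        1.*) echo "$version" | cut -d. -f2 ;;
        *)   echo "$version" | cut -d. -f1 ;;
    esac
}

# Hypothetical check: is java on PATH, and is it Java 8 or newer?
check_java() {
    if ! command -v java >/dev/null 2>&1; then
        echo "java not found"
        return 1
    fi
    # java -version writes to stderr; take the quoted value from its first line
    v=$(java -version 2>&1 | awk -F'"' '/version/ {print $2; exit}')
    major=$(java_major_version "$v")
    if [ "$major" -ge 8 ]; then
        echo "Java $v OK (major version $major)"
    else
        echo "Java $v is too old; Kafka needs Java 8 or higher"
        return 1
    fi
}

# Usage: check_java
```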

Add the “JAVA_HOME” and “JRE_HOME” environment variables to the end of the /etc/bashrc file:

sudo vi /etc/bashrc

Append the following lines to the file:

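# Java locations on CentOS 7 with OpenJDK 8
export JRE_HOME=/usr/lib/jvm/jre
export JAVA_HOME=/usr/lib/jvm/jre-1.8.0-openjdk
# Add the Java bin directories (not the home directories) to the search path
export PATH=$PATH:$JRE_HOME/bin:$JAVA_HOME/bin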

Open the ~/.bashrc file and make sure that the following lines exist:

if [ -f /etc/bashrc ] ; then
  . /etc/bashrc
fi

Run the following command to activate the path settings immediately:

source /etc/bashrc
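After sourcing the file, you can confirm that JAVA_HOME points at a usable Java installation. Kafka's start scripts run $JAVA_HOME/bin/java when JAVA_HOME is set (and plain java from PATH otherwise), so that is the file this sketch checks; check_java_home is a hypothetical name. Keep in mind that services started by systemd do not read /etc/bashrc, so these variables mainly affect interactive shells.

```shell
# Hypothetical helper: verify that JAVA_HOME is set and contains an
# executable bin/java, the binary Kafka's scripts use when JAVA_HOME is set.
check_java_home() {
    if [ -z "${JAVA_HOME:-}" ]; then
        echo "JAVA_HOME is not set"
        return 1
    fi
    if [ ! -x "$JAVA_HOME/bin/java" ]; then
        echo "No executable java found in $JAVA_HOME/bin"
        return 1
    fi
    echo "JAVA_HOME=$JAVA_HOME looks valid"
}

# Usage: check_java_home && "$JAVA_HOME/bin/java" -version
```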

Step 4: Install Apache Kafka

Create a dedicated user for the Kafka service, with a home directory (-m):

useradd kafka -m

Set a password for the newly created user:

passwd kafka

Use a strong password and enter it twice. Then add the kafka user to the wheel group so that it can run commands with sudo:

sudo usermod -aG wheel kafka
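Group changes only apply to new login sessions, so it is worth confirming the membership before relying on sudo in the next steps. The sketch below wraps id -nG in a hypothetical user_in_group helper.

```shell
# Hypothetical helper: succeed if user $1 belongs to group $2.
# id -nG prints the user's group names separated by spaces.
user_in_group() {
    id -nG "$1" 2>/dev/null | tr ' ' '\n' | grep -qx "$2"
}

if user_in_group kafka wheel; then
    echo "kafka is a member of the wheel group"
else
    echo "kafka is not in the wheel group"
fi
```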

Log in as the newly created user with:

su kafka

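Download Apache Kafka and extract it in the home directory of the kafka user account. This tutorial uses Kafka 2.1.0 built for Scala 2.12. The current releases are listed at https://kafka.apache.org/downloads; older releases such as 2.1.0 are no longer on the regular mirrors, but they remain available in the Apache archive:

cd ~
wget https://archive.apache.org/dist/kafka/2.1.0/kafka_2.12-2.1.0.tgz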
tar -xvzf kafka_2.12-2.1.0.tgz
mv kafka_2.12-2.1.0/* .
rmdir /home/kafka/kafka_2.12-2.1.0
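If you install a different release, the version numbers appear in several places. A sketch that builds the download URL from the Scala and Kafka versions, and unpacks with tar's --strip-components option instead of the separate mv and rmdir steps, might look like this. The function names are hypothetical, and the URL assumes the Apache archive layout https://archive.apache.org/dist/kafka/<version>/.

```shell
# Hypothetical helper: print the Apache archive URL for a Kafka release
# built against a given Scala version, e.g. kafka_2.12-2.1.0.tgz.
kafka_download_url() {
    scala_version="$1"
    kafka_version="$2"
    echo "https://archive.apache.org/dist/kafka/${kafka_version}/kafka_${scala_version}-${kafka_version}.tgz"
}

# Hypothetical helper: download a release and unpack it into the current
# directory. --strip-components=1 drops the top-level kafka_<scala>-<kafka>/
# folder, so bin/, config/ and libs/ land directly in the current directory.
install_kafka_here() {
    url=$(kafka_download_url "$1" "$2") || return 1
    wget -O kafka.tgz "$url" || return 1
    tar -xzf kafka.tgz --strip-components=1 || return 1
    rm -f kafka.tgz
}

# Usage, as the kafka user: cd ~ && install_kafka_here 2.12 2.1.0
```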

Apache Kafka uses ZooKeeper to store persistent cluster metadata, so ZooKeeper must be running before the Kafka broker starts. There is nothing extra to download: the ZooKeeper scripts and configuration files are included with Apache Kafka. ZooKeeper listens on port 2181 by default and needs little maintenance. It handles configuration management, leader election, synchronization, and similar coordination tasks.

Create a ZooKeeper systemd unit file so that we can run ZooKeeper as a service:

sudo vi /lib/systemd/system/zookeeper.service

Add the following lines:

[Unit]
Requires=network.target remote-fs.target
After=network.target remote-fs.target

[Service]
Type=simple
User=kafka
ExecStart=/home/kafka/bin/zookeeper-server-start.sh /home/kafka/config/zookeeper.properties
ExecStop=/home/kafka/bin/zookeeper-server-stop.sh
Restart=on-abnormal

[Install]
WantedBy=multi-user.target

Create a systemd unit file for Apache Kafka:

sudo vi /etc/systemd/system/kafka.service

Add the following lines:

[Unit]
Requires=network.target remote-fs.target zookeeper.service
After=network.target remote-fs.target zookeeper.service

[Service]
Type=simple
User=kafka
ExecStart=/home/kafka/bin/kafka-server-start.sh /home/kafka/config/server.properties
ExecStop=/home/kafka/bin/kafka-server-stop.sh
Restart=on-abnormal

[Install]
WantedBy=multi-user.target

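Edit the server.properties file and add or modify the following settings. The listeners setting defines the address and port on which the broker accepts client connections, and log.dirs sets the directory where Kafka stores its message data (the topic partition logs). The bundled file points log.dirs at /tmp/kafka-logs, which is a poor place for data you want to keep.

vi /home/kafka/config/server.properties

listeners=PLAINTEXT://:9092
log.dirs=/var/log/kafka-logs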
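If you prefer to make these changes non-interactively (for example from a provisioning script), a sed-based sketch like the one below rewrites an existing or commented-out key and appends the key otherwise. set_property is a hypothetical helper; it assumes GNU sed (as shipped with CentOS) and values without the | or & characters, and if a key appears more than once, every occurrence is rewritten.

```shell
# Hypothetical helper: set key=value in a Java .properties file.
# An existing line for the key, commented out or not, is rewritten in place;
# if there is none, the setting is appended. Requires GNU sed (-i, -E).
set_property() {
    file="$1"
    key="$2"
    value="$3"
    # Escape dots so that "log.dirs" only matches a literal "log.dirs".
    key_re=$(printf '%s' "$key" | sed 's/\./\\./g')
    if grep -qE "^#?${key_re}=" "$file"; then
        sed -i -E "s|^#?${key_re}=.*|${key}=${value}|" "$file"
    else
        printf '%s=%s\n' "$key" "$value" >> "$file"
    fi
}

# Usage, as the kafka user:
#   set_property /home/kafka/config/server.properties listeners PLAINTEXT://:9092
#   set_property /home/kafka/config/server.properties log.dirs /var/log/kafka-logs
```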

After creating or changing a unit file, run systemctl daemon-reload so that systemd picks up the changes:

sudo systemctl daemon-reload

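Create the directory configured in log.dirs and make the kafka user its owner:

sudo mkdir -p /var/log/kafka-logs
sudo chown -R kafka:kafka /var/log/kafka-logs

Despite its location under /var/log, this directory holds Kafka's message data rather than its application logs. When troubleshooting, look at the log files Kafka writes to /home/kafka/logs by default (for example server.log). With the directory in place, start the ZooKeeper and Apache Kafka services: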

sudo systemctl start zookeeper.service
sudo systemctl start kafka.service

Enable the ZooKeeper and Apache Kafka services to start automatically at boot:

sudo systemctl enable zookeeper.service
sudo systemctl enable kafka.service

To check whether the ZooKeeper and Kafka services are up and running, run the following commands on the VPS:

systemctl status zookeeper.service

The output should be similar to this:

zookeeper.service
   Loaded: loaded (/usr/lib/systemd/system/zookeeper.service; disabled; vendor preset: disabled)
   Active: active (running) since Fri 2019-01-25 12:42:42 CST; 16s ago
 Main PID: 11682 (java)
   CGroup: /system.slice/zookeeper.service
           └─11682 java -Xmx512M -Xms512M -server -XX:+UseG1GC -XX:MaxGCPauseMillis=20 -XX:InitiatingHeapOccupancyPercent=35 -XX:+ExplicitGCInvokesConcurrent -Djava.awt.h...

Next, check the Kafka service:

systemctl status kafka.service

The output of this command should be similar to this one:

kafka.service
   Loaded: loaded (/etc/systemd/system/kafka.service; disabled; vendor preset: disabled)
   Active: active (running) since Fri 2019-01-25 12:42:50 CST; 42s ago
 Main PID: 11991 (java)
   CGroup: /system.slice/kafka.service
           └─11991 java -Xmx1G -Xms1G -server -XX:+UseG1GC -XX:MaxGCPauseMillis=20 -XX:InitiatingHeapOccupancyPercent=35 -XX:+ExplicitGCInvokesConcurrent -Djava.awt.headl...

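We can also use the netstat command to check that Kafka and ZooKeeper are listening on ports 9092 and 2181, respectively. On a minimal CentOS 7 installation netstat may be missing; it is provided by the net-tools package (sudo yum install net-tools).

sudo netstat -tunlp | grep -e \:9092 -e \:2181

The output should be similar to this:
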
tcp6       0      0 :::9092                 :::*                    LISTEN      11991/java
tcp6       0      0 :::2181                 :::*                    LISTEN      11682/java
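netstat shows what is listening at this moment, but right after systemctl start the broker may need a few seconds before it accepts connections. The sketch below polls a port using bash's /dev/tcp pseudo-device and the coreutils timeout command, so it also works without net-tools; port_open and wait_for_port are hypothetical helpers.

```shell
# Hypothetical helper: succeed if something accepts TCP connections on
# host $1, port $2. Relies on bash's /dev/tcp pseudo-device; timeout(1)
# from coreutils caps each attempt at one second.
port_open() {
    timeout 1 bash -c "exec 3<>/dev/tcp/$1/$2" 2>/dev/null
}

# Hypothetical helper: poll host $1, port $2 once per second, giving up
# after $3 attempts (default 30).
wait_for_port() {
    host="$1"
    port="$2"
    tries="${3:-30}"
    i=0
    while [ "$i" -lt "$tries" ]; do
        port_open "$host" "$port" && return 0
        i=$((i + 1))
        sleep 1
    done
    return 1
}

# Usage, after starting the services:
#   wait_for_port localhost 2181 && echo "ZooKeeper is listening"
#   wait_for_port localhost 9092 && echo "Kafka is listening"
```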

That’s it. Apache Kafka is now installed and running on your CentOS 7 server.
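As an optional end-to-end check, you can create a topic, write one message to it, and read it back with the command-line tools in /home/kafka/bin. This sketch targets Kafka 2.1.0: its kafka-topics.sh only accepts --zookeeper, while 2.2 and later also accept --bootstrap-server (and 3.0 dropped --zookeeper), so a hypothetical topics_connect_flag helper picks the option from the version. The topic name smoke-test is also an assumption.

```shell
# Hypothetical helper: print the connection option kafka-topics.sh expects.
# Releases before 2.2 only accept --zookeeper; 2.2 added --bootstrap-server
# and 3.0 removed --zookeeper. Uses GNU sort -V for version ordering.
topics_connect_flag() {
    oldest=$(printf '%s\n%s\n' "$1" 2.2 | sort -V | head -n 1)
    if [ "$oldest" = "$1" ] && [ "$1" != "2.2" ]; then
        printf '%s\n' "--zookeeper localhost:2181"
    else
        printf '%s\n' "--bootstrap-server localhost:9092"
    fi
}

KAFKA_HOME=/home/kafka
TOPIC=smoke-test   # hypothetical topic name used only for this check

# Create the topic, produce one message, then consume it from the beginning.
# The console producer option (--broker-list) matches Kafka 2.1.0.
smoke_test() {
    # Left unquoted on purpose: the option and its value must be two words.
    "$KAFKA_HOME/bin/kafka-topics.sh" --create $(topics_connect_flag "$1") \
        --replication-factor 1 --partitions 1 --topic "$TOPIC" || return 1
    echo "hello kafka" | "$KAFKA_HOME/bin/kafka-console-producer.sh" \
        --broker-list localhost:9092 --topic "$TOPIC" || return 1
    "$KAFKA_HOME/bin/kafka-console-consumer.sh" --bootstrap-server localhost:9092 \
        --topic "$TOPIC" --from-beginning --max-messages 1
}

# Usage, as the kafka user: smoke_test 2.1.0
```

If everything works, the consumer should print the "hello kafka" message and then exit.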

