In this tutorial, we will show you how to install Apache Kafka on CentOS 7.
Apache Kafka is an open source messaging system and distributed streaming platform. It’s designed to be scalable, responsive, and provide an excellent experience when dealing with real-time data feeds. It’s great at providing real time analytics and processing of data – and thanks to its rich API support, developers can easily implement Apache Kafka and mold it to their exact needs.
Let’s begin with the installation.
Prerequisites:
Apache Kafka has the following requirements:
- Java 8 or higher installed on the server
- ZooKeeper installed and running on the server
- A server/VPS with a minimum of 4GB RAM.
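The memory requirement above can be checked from the shell before going any further. The following is a minimal sketch, assuming a Linux system that exposes /proc/meminfo; the function names mem_total_mb and has_min_ram are illustrative and not part of Kafka or CentOS.

```shell
# Print total memory in MB, read from a meminfo-style file.
# The optional argument (default /proc/meminfo) only exists so the
# function can be tried against a sample file.
mem_total_mb() {
    awk '/^MemTotal:/ { printf "%d\n", $2 / 1024 }' "${1:-/proc/meminfo}"
}

# Succeed when total memory is at least $1 MB.
# $2 is the optional meminfo-style file passed on to mem_total_mb.
has_min_ram() {
    [ "$(mem_total_mb "${2:-/proc/meminfo}")" -ge "$1" ]
}
```

For example, `has_min_ram 3500 || echo "Less RAM than recommended"`. A machine sold as 4GB typically reports a little less than 4096 MB, so the 3500 threshold here is an assumed tolerance rather than an official figure.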
Step 1: Connect to the Server
Log in to the server via SSH as user root using the following command:
ssh root@IP_ADDRESS -p PORT_NUMBER
Replace “IP_ADDRESS” and “PORT_NUMBER” with your actual server IP address and SSH port number.
Step 2: Update OS Packages
Once logged in, make sure that your server OS packages are up-to-date by running the following commands:
yum clean all
yum update
Step 3: Install Java
Apache Kafka requires Java, so in order to run it on your server, we need to install Java first. We can check if Java is already installed on the server using this command:
which java
If there is no output, Java is not installed on the server yet. We can install Java from an RPM package:
yum install java-1.8.0-openjdk.x86_64
We can check the Java version installed on the server by running the following command:
java -version
The output should be similar to this:
openjdk version "1.8.0_191"
OpenJDK Runtime Environment (build 1.8.0_191-b12)
OpenJDK 64-Bit Server VM (build 25.191-b12, mixed mode)
Add the “JAVA_HOME” and “JRE_HOME” environment variables at the end of the /etc/bashrc file:
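Because Kafka needs Java 8 or higher, it can help to turn the version string into a plain number that a script can compare. This is a minimal sketch; the function names java_major and installed_java_version are made up for illustration, and it assumes version strings in either the legacy form ("1.8.0_191") or the newer form ("11.0.2").

```shell
# Convert a Java version string to its major version number.
# Legacy scheme: "1.8.0_191" gives 8; newer scheme: "11.0.2" gives 11.
java_major() {
    echo "$1" | awk -F'[._]' '{ if ($1 == 1) print $2; else print $1 }'
}

# Extract the quoted version string from `java -version`.
# Note that java writes this information to stderr, hence 2>&1.
installed_java_version() {
    java -version 2>&1 | awk -F'"' '/version/ { print $2; exit }'
}
```

A check could then look like `[ "$(java_major "$(installed_java_version)")" -ge 8 ] && echo "Java is recent enough"`.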
sudo vi /etc/bashrc
Append the following lines to the original content of the file:
export JRE_HOME=/usr/lib/jvm/jre
export JAVA_HOME=/usr/lib/jvm/jre-1.8.0-openjdk
PATH=$PATH:$JRE_HOME/bin:$JAVA_HOME/bin
Open the ~/.bashrc file and make sure that the following lines exist:
if [ -f /etc/bashrc ] ; then
. /etc/bashrc
fi
Run the following command to activate the path settings immediately:
source /etc/bashrc
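After sourcing the file, the variables can be sanity-checked. The sketch below assumes that a usable Java home is a directory containing an executable bin/java; the function name is_java_home is hypothetical.

```shell
# Succeed when the given directory looks like a Java home,
# i.e. it is non-empty and contains an executable bin/java.
is_java_home() {
    [ -n "$1" ] && [ -x "$1/bin/java" ]
}
```

For example, `is_java_home "$JAVA_HOME" && echo "JAVA_HOME looks fine"`. If the check fails, compare the paths in /etc/bashrc with what is actually present under /usr/lib/jvm.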
Step 4: Install Apache Kafka
Create a new user dedicated to the Kafka service, with its own home directory, using the following command:
useradd kafka -m
Set a password for the newly created user:
passwd kafka
Use a strong password and enter it twice. Then add the kafka user to the wheel group so that it can run commands with sudo:
sudo usermod -aG wheel kafka
Log in as the newly created user with:
su kafka
Download the latest version of Apache Kafka available at https://kafka.apache.org/downloads and extract it in the home directory of the kafka user account:
cd ~
wget http://apache.osuosl.org/kafka/2.1.0/kafka_2.12-2.1.0.tgz
tar -xvzf kafka_2.12-2.1.0.tgz
mv kafka_2.12-2.1.0/* .
rmdir /home/kafka/kafka_2.12-2.1.0
Apache Kafka uses ZooKeeper to store persistent cluster metadata, so ZooKeeper must be running before Kafka starts. The ZooKeeper files are included with Apache Kafka, so no separate download is needed. ZooKeeper listens on port 2181 by default and doesn’t require much maintenance. The ZooKeeper service is responsible for configuration management, leader election, synchronization, etc.
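The download commands above hard-code one release, and mirrors tend to drop older release directories over time. A minimal sketch that builds the archive name and URL from the Scala and Kafka versions is shown here. The mirror base URL is the one used in this tutorial; the function names kafka_archive and kafka_url are illustrative, and whether a given version is still present on the mirror is not guaranteed.

```shell
# Build the archive name for a Kafka release, e.g. kafka_2.12-2.1.0.tgz.
# $1 = Scala version, $2 = Kafka version
kafka_archive() {
    echo "kafka_$1-$2.tgz"
}

# Build the download URL on the mirror used in this tutorial.
# $1 = Scala version, $2 = Kafka version
kafka_url() {
    echo "http://apache.osuosl.org/kafka/$2/$(kafka_archive "$1" "$2")"
}
```

Usage could look like `wget "$(kafka_url 2.12 2.1.0)"`. With GNU tar, `tar -xzf "$(kafka_archive 2.12 2.1.0)" --strip-components=1` extracts the files directly into the current directory, which would replace the separate mv and rmdir steps.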
Create a ZooKeeper systemd unit file so that we can run ZooKeeper as a service:
sudo vi /lib/systemd/system/zookeeper.service
[Unit]
Requires=network.target remote-fs.target
After=network.target remote-fs.target
[Service]
Type=simple
User=kafka
ExecStart=/home/kafka/bin/zookeeper-server-start.sh /home/kafka/config/zookeeper.properties
ExecStop=/home/kafka/bin/zookeeper-server-stop.sh
Restart=on-abnormal
[Install]
WantedBy=multi-user.target
Create a systemd unit file for Apache Kafka:
sudo vi /etc/systemd/system/kafka.service
Add the following lines:
[Unit]
Requires=network.target remote-fs.target zookeeper.service
After=network.target remote-fs.target zookeeper.service
[Service]
Type=simple
User=kafka
ExecStart=/home/kafka/bin/kafka-server-start.sh /home/kafka/config/server.properties
ExecStop=/home/kafka/bin/kafka-server-stop.sh
Restart=on-abnormal
[Install]
WantedBy=multi-user.target
Edit the server.properties file and add/modify the following settings:
vi /home/kafka/config/server.properties
listeners=PLAINTEXT://:9092
log.dirs=/var/log/kafka-logs
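The "add/modify" step can also be scripted instead of done by hand in an editor. The following is a rough sketch of a helper that replaces an existing key=value line or appends one if the key is missing. The name set_property is hypothetical, and the helper assumes simple single-line properties without backslashes in the value. Commented-out lines are left alone.

```shell
# Set key=value in a Java properties file.
# Every active line starting with "key=" is replaced; if there is none,
# a new line is appended. Lines starting with # are not touched.
# $1 = file, $2 = key, $3 = value
set_property() {
    file=$1; key=$2; value=$3
    tmp=$(mktemp)
    awk -v k="$key" -v v="$value" '
        index($0, k "=") == 1 { print k "=" v; found = 1; next }
        { print }
        END { if (!found) print k "=" v }
    ' "$file" > "$tmp" && cat "$tmp" > "$file"   # cat keeps owner and mode
    rm -f "$tmp"
}
```

For the two settings above, that would be `set_property /home/kafka/config/server.properties listeners PLAINTEXT://:9092` and `set_property /home/kafka/config/server.properties log.dirs /var/log/kafka-logs`.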
After creating or changing a unit file, we should run the ‘systemctl daemon-reload’ command for the changes to take effect:
sudo systemctl daemon-reload
Create a new directory ‘kafka-logs’ in the ‘/var/log/’ directory on your server and make the kafka user its owner:
sudo mkdir -p /var/log/kafka-logs
sudo chown kafka:kafka -R /var/log/kafka-logs
Kafka stores its topic data (the commit log) in this directory, as set by log.dirs in server.properties. Once that’s done, start the ZooKeeper and Apache Kafka services:
sudo systemctl start zookeeper.service
sudo systemctl start kafka.service
Enable the ZooKeeper and Apache Kafka services to automatically start on server boot:
sudo systemctl enable zookeeper.service
sudo systemctl enable kafka.service
In order to check if ZooKeeper and Kafka services are up and running, run the following commands on the VPS:
systemctl status zookeeper.service
We should receive an output similar to this:
zookeeper.service
   Loaded: loaded (/usr/lib/systemd/system/zookeeper.service; disabled; vendor preset: disabled)
   Active: active (running) since Fri 2019-01-25 12:42:42 CST; 16s ago
 Main PID: 11682 (java)
   CGroup: /system.slice/zookeeper.service
           └─11682 java -Xmx512M -Xms512M -server -XX:+UseG1GC -XX:MaxGCPauseMillis=20 -XX:InitiatingHeapOccupancyPercent=35 -XX:+ExplicitGCInvokesConcurrent -Djava.awt.h...
systemctl status kafka.service
The output of this command should be similar to this one:
kafka.service
   Loaded: loaded (/etc/systemd/system/kafka.service; disabled; vendor preset: disabled)
   Active: active (running) since Fri 2019-01-25 12:42:50 CST; 42s ago
 Main PID: 11991 (java)
   CGroup: /system.slice/kafka.service
           └─11991 java -Xmx1G -Xms1G -server -XX:+UseG1GC -XX:MaxGCPauseMillis=20 -XX:InitiatingHeapOccupancyPercent=35 -XX:+ExplicitGCInvokesConcurrent -Djava.awt.headl...
We can also use the netstat command to check if Kafka and ZooKeeper services are listening on ports 9092 and 2181 respectively:
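A service can take a few seconds to come up after it is started, so a script that checks the status right away may see a false failure. One way to handle this, sketched under the assumption that polling a few times is acceptable, is a small retry helper. The name retry and its argument order are invented for this example; `systemctl is-active --quiet` is the standard systemd way to test a unit's state by exit code.

```shell
# Run a command up to $1 times, sleeping $2 seconds between attempts.
# Succeeds as soon as the command succeeds; fails if every attempt fails.
# Usage: retry ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
retry() {
    attempts=$1; delay=$2; shift 2
    i=1
    while [ "$i" -le "$attempts" ]; do
        "$@" && return 0
        # Only sleep when another attempt is still to come.
        [ "$i" -lt "$attempts" ] && sleep "$delay"
        i=$((i + 1))
    done
    return 1
}
```

For example, `retry 10 2 systemctl is-active --quiet kafka.service && echo "Kafka is up"` polls for up to roughly 20 seconds before giving up.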
sudo netstat -tunlp | grep -e \:9092 -e \:2181
tcp6 0 0 :::9092 :::* LISTEN 11991/java
tcp6 0 0 :::2181 :::* LISTEN 11682/java
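The same check can be turned into a yes/no test for use in scripts. This is a small sketch, assuming netstat output in the column layout shown above (local address in the fourth column, state in the sixth); the function name is_listening is illustrative.

```shell
# Read `netstat -tunlp` style output on stdin and succeed when a socket
# in the LISTEN state is bound to the given port.
# $1 = port number
is_listening() {
    awk -v port="$1" '
        # The port is the last colon-separated part of the local address.
        $6 == "LISTEN" { n = split($4, a, ":"); if (a[n] == port) ok = 1 }
        END { exit !ok }
    '
}
```

For example, `sudo netstat -tunlp | is_listening 9092 && echo "Kafka is listening"`, and likewise with 2181 for ZooKeeper.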
That is it. We successfully installed Apache Kafka.
Yes, you are right. When it comes to data migration tools, you will find Apache NiFi and Apache Kafka at the top of the ladder. Since both tools are at the top, comparisons are bound to happen. Although Apache NiFi and Kafka overlap in terms of usability, NiFi might have an edge over Kafka. NiFi and Kafka have different sets of functions, use cases, architectures, and benefits. To answer when one should use Apache NiFi as opposed to Kafka, we will unravel the functions and limitations of both!
wget http://apache.osuosl.org/kafka/2.1.0/kafka_2.12-2.1.0.tgz
Kafka not available in this path
The 2.1.0 directory has been removed. You can try http://apache.osuosl.org/kafka/2.8.2/ instead; that directory lists what is available for Kafka version 2.
wget http://apache.osuosl.org/kafka/3.3.1/kafka_2.12-3.3.1.tgz
Use this one