
How To Install Apache Spark on AlmaLinux 9

In this tutorial, we are going to explain, step by step, how to install Apache Spark on AlmaLinux 9.

Apache Spark is an open-source unified analytics engine used for processing large amounts of data. It stores data in memory and performs operations very quickly, and data can be queried using SQL. Apache Spark is written in Scala and is compatible with multiple operating systems such as macOS, Windows, and of course Linux.

Installing Apache Spark and running it as a service is a straightforward process and may take up to 10 minutes. Let’s get things done.

Prerequisites

  • A fresh installation of AlmaLinux 9
  • User privileges: root, or a non-root user with sudo privileges

Step 1. Update the System

Before we install Apache Spark, we need to update the system packages to the latest versions available.

sudo dnf update -y

Step 2. Install Java

Apache Spark runs on the Java Virtual Machine, so we need to install Java with the following command:

sudo dnf install java-11-openjdk -y

After the installation you can check the installed Java version with the following command:

java -version

You should get the following output:

[root@vps ~]# java -version
openjdk version "11.0.18" 2023-01-17 LTS
OpenJDK Runtime Environment (Red_Hat-11.0.18.0.10-2.el9_1) (build 11.0.18+10-LTS)
OpenJDK 64-Bit Server VM (Red_Hat-11.0.18.0.10-2.el9_1) (build 11.0.18+10-LTS, mixed mode, sharing)
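
Spark picks up Java from your PATH, but you can optionally set JAVA_HOME system-wide so that Spark and other tools find this JDK explicitly. This is just a sketch: the JVM path below is an assumption, so check the real path first and adjust the line to match:

readlink -f /usr/bin/java

# adjust the path below to match the output of the command above
echo 'export JAVA_HOME=/usr/lib/jvm/jre-11-openjdk' | sudo tee /etc/profile.d/java.sh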

Step 3. Install Apache Spark

We cannot install Apache Spark with the package manager since it is not included in the default repositories of AlmaLinux 9. We need to download it manually and extract it into a directory on the server. To do that, execute the following commands:

cd /home

curl -O https://archive.apache.org/dist/spark/spark-3.3.2/spark-3.3.2-bin-hadoop3.tgz

gunzip -c spark-3.3.2-bin-hadoop3.tgz | tar xvf -

rm spark-3.3.2-bin-hadoop3.tgz

mv spark-3.3.2-bin-hadoop3 spark
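
You can quickly confirm that everything was extracted correctly by listing the new directory; you should see the bin, sbin, and conf subdirectories, among others:

ls /home/spark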

Step 4. Create System User

We need to create a system user for Apache Spark. To do that execute the following command:

sudo useradd sparkuser

Once created, you need to set the correct ownership of the Apache Spark installation directory:

sudo chown -R sparkuser:sparkuser /home/spark
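
You can verify the new ownership with the following command:

ls -ld /home/spark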

Step 5. Create Service Files

Now, we need to create two separate systemd services for Apache Spark: the master service and the slave (worker) service, both of which are required for Apache Spark to work. We will start with the master service.

Create the master service file first:

sudo touch /etc/systemd/system/spark-master.service

Open the file with your favorite editor and paste the following lines:

[Unit]
Description=Apache Spark Master
After=network.target

[Service]
Type=forking
User=sparkuser
Group=sparkuser
ExecStart=/home/spark/sbin/start-master.sh
ExecStop=/home/spark/sbin/stop-master.sh

[Install]
WantedBy=multi-user.target
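
Optionally, you can let systemd sanity-check the unit file for syntax problems before using it:

systemd-analyze verify /etc/systemd/system/spark-master.service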

Now, create the slave service file:

sudo touch /etc/systemd/system/spark-slave.service

Open the file with your favorite editor and paste the following lines:

[Unit]
Description=Apache Spark Slave
After=network.target

[Service]
Type=forking
User=sparkuser
Group=sparkuser
ExecStart=/home/spark/sbin/start-slave.sh spark://127.0.0.1:7077
ExecStop=/home/spark/sbin/stop-slave.sh

[Install]
WantedBy=multi-user.target
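
Note that Spark 3.x renamed its "slave" scripts to "worker": start-slave.sh still ships as a deprecated wrapper, so the unit above works as-is, but if you prefer the current naming, only the Exec lines need to change, for example:

ExecStart=/home/spark/sbin/start-worker.sh spark://127.0.0.1:7077
ExecStop=/home/spark/sbin/stop-worker.sh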

Once the files are created, reload the systemd daemon and start the services:

sudo systemctl daemon-reload

sudo systemctl start spark-master

sudo systemctl start spark-slave
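
To make both services start automatically on boot, enable them as well:

sudo systemctl enable spark-master spark-slave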

To check the status of the master service execute the following command:

sudo systemctl status spark-master

You should receive the following output:

[root@host spark]# sudo systemctl status spark-master
● spark-master.service - Apache Spark Master
     Loaded: loaded (/etc/systemd/system/spark-master.service; disabled; vendor preset: disabled)
     Active: active (running) since Sat 2023-02-25 05:45:47 CST; 48s ago
    Process: 1274 ExecStart=/home/spark/sbin/start-master.sh (code=exited, status=0/SUCCESS)
   Main PID: 1286 (java)
      Tasks: 34 (limit: 24796)
     Memory: 242.9M
        CPU: 11.958s
     CGroup: /system.slice/spark-master.service
             └─1286 /usr/lib/jvm/java-11-openjdk-11.0.18.0.10-2.el9_1.x86_64/bin/java -cp "/home/spark/conf/:/home/spark/jars/*" -Xmx1g org.apache.spark.deploy.master.

To check the status of the slave service execute the following command:

sudo systemctl status spark-slave

You should receive the following output:

[root@host spark]# sudo systemctl status spark-slave
● spark-slave.service - Apache Spark Slave
     Loaded: loaded (/etc/systemd/system/spark-slave.service; disabled; vendor preset: disabled)
     Active: active (running) since Sat 2023-02-25 05:45:54 CST; 2min 31s ago
    Process: 1350 ExecStart=/home/spark/sbin/start-slave.sh spark://127.0.0.1:7077 (code=exited, status=0/SUCCESS)
   Main PID: 1363 (java)
      Tasks: 33 (limit: 24796)
     Memory: 178.2M
        CPU: 11.086s
     CGroup: /system.slice/spark-slave.service
             └─1363 /usr/lib/jvm/java-11-openjdk-11.0.18.0.10-2.el9_1.x86_64/bin/java -cp "/home/spark/conf/:/home/spark/jars/*" -Xmx1g org.apache.spark.deploy.worker.
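
With both services running, you can optionally submit the SparkPi example that ships with Spark to confirm the master accepts jobs; run it as the sparkuser account:

sudo -u sparkuser /home/spark/bin/run-example --master spark://127.0.0.1:7077 SparkPi 10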

That’s it. You can now access the Apache Spark master web interface at the following URL: http://YourServerIPAddress:8080.
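
If firewalld is active on your server, you may also need to open port 8080 before the web interface is reachable from outside:

sudo firewall-cmd --permanent --add-port=8080/tcp
sudo firewall-cmd --reload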

Congratulations! You have successfully installed and configured the Apache Spark service on AlmaLinux 9. If you encounter any difficulties installing this software and service, you can contact our support. All you need to do is sign up for one of our NVMe VPS plans and submit a support ticket. We are available 24/7.
