
A Step-by-Step Guide to GlusterFS Replication on Ubuntu for High Availability
In today’s data-driven world, ensuring your information is both accessible and resilient is paramount. A single server failure can lead to costly downtime and data loss. This is where a distributed file system like GlusterFS shines, offering a powerful and scalable solution to build robust storage infrastructure.
GlusterFS is an open-source, scalable network file system designed to handle petabytes of data. One of its most valuable features is the ability to create replicated volumes. A replicated volume ensures high availability by keeping an exact copy (a mirror) of your data on two or more servers; the unit of storage each server contributes is called a “brick.” If one server goes offline, your data remains instantly accessible from the other server in the cluster, eliminating single points of failure.
This guide will walk you through the entire process of setting up a two-node replicated GlusterFS volume on Ubuntu, providing a highly available storage solution for your applications.
Prerequisites: What You’ll Need
Before diving in, ensure you have the following setup:
- Two or more servers running a fresh installation of Ubuntu (this guide uses Ubuntu 20.04/22.04 LTS).
- Root or sudo access on all servers.
- A dedicated storage device or partition on each server for the GlusterFS brick (e.g., /dev/sdb). Using the root partition is not recommended for production.
- Network connectivity between the servers. It’s best practice to have hostnames configured in the /etc/hosts file on each server for easy communication.
For this tutorial, we will use two servers with the hostnames server1 and server2.
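If you are not using DNS, each server’s /etc/hosts can map the peer hostnames to their addresses. A minimal sketch, assuming the placeholder private IPs 192.168.10.11 and 192.168.10.12:

# Add to /etc/hosts on both servers (example addresses)
192.168.10.11 server1
192.168.10.12 server2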
Step 1: Install GlusterFS Server Software
The first step is to install the GlusterFS server package on all servers that will be part of the storage cluster.
Open a terminal on each server and run the following commands to update your package lists and install the software:
sudo apt update
sudo apt install glusterfs-server -y
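If you want to confirm the package installed correctly, you can print the GlusterFS version (the exact version string depends on your Ubuntu release):

glusterfs --version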
Once the installation is complete, ensure the GlusterFS daemon is running and enabled to start on boot:
sudo systemctl start glusterd
sudo systemctl enable glusterd
You can verify its status with sudo systemctl status glusterd; the output should report “Active: active (running)”.
Step 2: Configure Firewall Rules
For the servers to communicate, you must open the necessary ports in the firewall. GlusterFS requires TCP port 24007 for the management daemon, plus one additional TCP port per brick (starting at 49152 on modern releases).
On both servers, run the following ufw (Uncomplicated Firewall) command to allow traffic from the other server. Replace <server1_ip_address> and <server2_ip_address> with the actual IP addresses of the nodes.
On server1:
sudo ufw allow from <server2_ip_address> to any port 24007
On server2:
sudo ufw allow from <server1_ip_address> to any port 24007
Important Note: For a basic setup, you can allow all GlusterFS traffic with sudo ufw allow 24007/tcp. However, specifying the peer IP is a more secure practice.
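The brick ports must also be reachable, or peers and clients will see the volume but fail to read or write data. A minimal sketch, assuming the default brick port range starting at 49152; the size of the range is an arbitrary choice here, so adjust it to your brick count:

# On server1 (mirror the rule on server2, using server1's IP)
sudo ufw allow from <server2_ip_address> to any port 49152:49251 proto tcp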
Step 3: Create the Trusted Storage Pool
A Trusted Storage Pool is a cluster of servers that trust each other and can contribute storage to a volume. You only need to initiate this process from one of your servers.
From server1, “probe” server2 to add it to the pool:
sudo gluster peer probe server2
You should see a success message (peer probe: success). Now, you can verify the status of the pool from either server:
sudo gluster peer status
The output will show the number of peers in the pool and confirm their connection state.
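On server1, the output should resemble the following (the UUID will differ on your systems):

Number of Peers: 1

Hostname: server2
Uuid: <uuid>
State: Peer in Cluster (Connected)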
Step 4: Prepare the Storage Bricks
A brick is the fundamental unit of storage in GlusterFS, typically a directory on a dedicated partition. You must create this directory on both servers.
First, format and mount your dedicated storage device. For this example, we assume a device at /dev/sdb on each server.
# On both servers
sudo mkfs.xfs /dev/sdb
sudo mkdir -p /data/gluster-brick
sudo mount /dev/sdb /data/gluster-brick
To make this mount permanent across reboots, add it to your /etc/fstab file.
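A sketch of such an entry, assuming the device path used above (referencing the filesystem UUID reported by blkid is more robust if device names can change):

# Add to /etc/fstab on both servers
/dev/sdb /data/gluster-brick xfs defaults 0 0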
Now, create the subdirectory inside the mount point that will serve as the actual brick. Using a subdirectory rather than the mount point itself means that if the mount ever fails, the brick path simply won’t exist and GlusterFS will refuse to start the brick instead of silently writing to the root disk:
# On both servers
sudo mkdir -p /data/gluster-brick/gv0
Step 5: Create and Start the Replicated Volume
With the pool and bricks ready, you can now create the replicated volume. This command should only be run from one server.
The command syntax defines the volume name, the type (replica), the number of replicas, and the location of the bricks.
sudo gluster volume create gv_replicated replica 2 server1:/data/gluster-brick/gv0 server2:/data/gluster-brick/gv0 force
Let’s break down this command:
- gv_replicated: The name of our new volume.
- replica 2: Specifies that we want a replicated volume with two copies of the data.
- server1:/data/gluster-brick/gv0 server2:/data/gluster-brick/gv0: The paths to the bricks on each server.
- force: Overrides GlusterFS’s safety checks, which otherwise reject brick paths that appear to sit on the root filesystem.

Note that recent GlusterFS releases warn that two-brick replicated volumes are susceptible to split-brain; for production, consider three-way replication (replica 3) or adding an arbiter brick.
Once the volume is created successfully, you need to start it:
sudo gluster volume start gv_replicated
You can verify the volume’s status and see detailed information with:
sudo gluster volume info
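The output should resemble the following (the volume ID will differ, and your version may append an Options Reconfigured section):

Volume Name: gv_replicated
Type: Replicate
Volume ID: <uuid>
Status: Started
Number of Bricks: 1 x 2 = 2
Transport-type: tcp
Bricks:
Brick1: server1:/data/gluster-brick/gv0
Brick2: server2:/data/gluster-brick/gv0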
Step 6: Mount the GlusterFS Volume on a Client
Your replicated storage is now active, but to use it, you need to mount it on a client machine. This could be one of the GlusterFS servers themselves or a separate application server.
First, install the GlusterFS client package:
sudo apt update
sudo apt install glusterfs-client -y
Next, create a directory where you will mount the shared volume:
sudo mkdir -p /mnt/gluster-storage
Finally, mount the volume using the hostname of any server in the storage pool. The server named in the mount command is only used to fetch the volume configuration; the client then connects to all bricks directly.
sudo mount -t glusterfs server1:/gv_replicated /mnt/gluster-storage
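You can confirm the mount succeeded with df; the volume should appear with the fuse.glusterfs filesystem type:

df -hT /mnt/gluster-storage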
To test the setup, create a file in the new mount point:
sudo touch /mnt/gluster-storage/test_file.txt
ls -l /mnt/gluster-storage
You can verify that the file exists by checking the actual brick directories on both server1 and server2. You will find test_file.txt in both /data/gluster-brick/gv0/ locations, confirming that replication is working.
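For example:

# On both servers
ls -l /data/gluster-brick/gv0/

If you want to see the high availability in action, one simple (if blunt) check is to take one node offline and confirm the client still sees the file through the surviving node:

# On server2 (test only; bring the node back up afterwards)
sudo poweroff

# On the client, the mount should keep working via server1
ls -l /mnt/gluster-storage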
To make the mount persistent across reboots, add the following line to your client’s /etc/fstab file:
server1:/gv_replicated /mnt/gluster-storage glusterfs defaults,_netdev 0 0
The _netdev option ensures the system waits for the network to be available before attempting to mount the volume.
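One caveat: the entry above names only server1, so the client cannot fetch the volume configuration at boot if server1 happens to be down at that moment. The mount helper accepts a backup-volfile-servers option for this; a sketch, assuming your glusterfs-client version supports it (check mount.glusterfs(8), where older releases spell it backupvolfile-server):

server1:/gv_replicated /mnt/gluster-storage glusterfs defaults,_netdev,backup-volfile-servers=server2 0 0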
Security and Best Practices
- Dedicated Network: For production environments, it is highly recommended to use a dedicated, private network interface for all GlusterFS traffic to improve performance and security.
- Firewall Precision: Always restrict firewall rules to allow access only from other trusted servers in the pool. Avoid opening ports to the entire internet.
- Monitoring: Regularly check the status of your volume and peers using gluster volume status and gluster peer status to catch any issues early; a sample heal check follows this list.
- Replication is Not a Backup: While replication provides high availability against server failure, it does not protect against data corruption or accidental deletion. Always maintain a separate backup strategy for your critical data.
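For replicated volumes, the self-heal daemon’s view of pending work is also worth watching; any files listed in this output indicate replicas that are out of sync:

sudo gluster volume heal gv_replicated info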
By following these steps, you have successfully deployed a robust, highly available storage solution with GlusterFS on Ubuntu, protecting your data from hardware failure and ensuring continuous access for your applications.
Source: https://kifarunix.com/setup-replicated-glusterfs-volume-on-ubuntu/