Pacemaker with DRBD and MariaDB: Build a Rock‑Solid High-Availability Cluster
If your business can’t afford a single point of failure in its database, you need a setup that keeps the data alive even when a node goes down. This guide shows how to combine DRBD’s block‑level replication with Pacemaker’s resource manager so MariaDB stays online without any manual juggling.
Prerequisites
- Two identical CentOS 8/9 or Ubuntu 22.04 nodes on the same LAN
- Root access (or sudo privileges) on both servers
- A working network that can route heartbeat traffic between them
- Basic knowledge of systemctl, crm and shell scripting
I once set this up after a firmware change accidentally disabled one node’s network interface; the cluster kept running because Pacemaker kicked it out, but the database still stayed available on the surviving side. That’s why you need fencing – we’ll get to that.
Installing DRBD and Pacemaker
# CentOS / RHEL (drbd90-utils and the kmod-drbd90 kernel module ship in the ELRepo repository)
yum install -y drbd90-utils kmod-drbd90 pacemaker corosync pcs

# Ubuntu
apt-get install -y drbd-utils pacemaker corosync pcs
Why this matters: pcs is the command‑line tool that talks to Corosync (the messaging layer) and Pacemaker (the orchestrator). Without it you’re stuck with raw configuration files and a headache.
Start and enable the services:
systemctl start pcsd corosync pacemaker
systemctl enable pcsd corosync pacemaker
On fresh installs you’ll still need to set a password for the hacluster user and authenticate the nodes to each other with pcs. It’s a one‑time thing, but skip it and you’ll be staring at “Authentication failed” forever.
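A minimal sketch of that one‑time setup – the node names are placeholders for your own hostnames, and older pcs 0.9.x uses `pcs cluster auth` instead of `pcs host auth`:

```shell
# On both nodes: give the hacluster user (created by the pcs package) a password
passwd hacluster

# On one node: authenticate the cluster members against each other
pcs host auth node1 node2 -u hacluster
```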
Configuring the Storage Replication
Create a DRBD resource file on both nodes:
cat > /etc/drbd.d/mariadb.res <<EOF
resource mariadb {
  protocol C;
  on node1 {
    device    /dev/drbd0;
    disk      /dev/sdb1;           # change sdb1 to your data disk
    address   192.168.1.11:7789;   # replication IP:port of node1 – adjust
    meta-disk internal;
  }
  on node2 {
    device    /dev/drbd0;
    disk      /dev/sdb1;
    address   192.168.1.12:7789;   # replication IP:port of node2 – adjust
    meta-disk internal;
  }
}
EOF
protocol C gives you synchronous replication – the write is acknowledged only after both sides confirm it. That’s the safety net we need for a database.
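As a toy illustration of what that acknowledgment rule means (plain Python, not DRBD code – only the protocol names are borrowed from DRBD):

```python
# Toy model of DRBD replication protocols.
# Protocol C: the write is acknowledged only once BOTH replicas hold the block.
# Protocol A: it is acknowledged as soon as the local disk has it.

def write_block(block, local_disk, peer_disk, protocol="C"):
    local_disk.append(block)          # local write always happens first
    if protocol == "C":
        peer_disk.append(block)       # synchronous: wait for the peer, too
        return "acked-by-both"
    return "acked-locally"            # protocol A: peer catches up later

local, peer = [], []
status = write_block("txn-42", local, peer, protocol="C")
# With protocol C, a crash right after the ack can never lose the block:
# it is already on both disks.
```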
Start DRBD and bring one side online:
# On both nodes
drbdadm create-md mariadb
drbdadm up mariadb
# On node1 only: promote it to primary to start the initial sync
drbdadm primary --force mariadb
The create-md command writes the metadata header. Forgetting it will leave you with a blank device that looks like an empty partition.
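Before putting a filesystem on the device it’s worth watching the initial sync finish; with drbd-utils 9 the status command shows the role and per‑peer replication state:

```shell
drbdadm status mariadb   # wait for "peer-disk:UpToDate" before relying on it
# On DRBD 8.x the same information lives in /proc/drbd:
cat /proc/drbd
```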
Create a filesystem and mount point:
# On the primary (node1)
mkfs.xfs /dev/drbd0
mkdir -p /var/lib/mysql
mount /dev/drbd0 /var/lib/mysql
Don’t add it to /etc/fstab. In a cluster the mount has to follow the DRBD primary role, so Pacemaker will manage it through a Filesystem resource instead; an fstab entry would try to mount the device on both nodes at boot.
Setting Up the MariaDB Service in Pacemaker
Install MariaDB on both nodes:
yum install -y mariadb-server   # or: apt-get install -y mariadb-server
systemctl disable mariadb
Note the disable rather than enable: Pacemaker decides where MariaDB runs, so the unit (mariadb, not mysqld, on current distributions) must not start on its own at boot.
Now tell Pacemaker to manage the service and the DRBD resource:
pcs cluster setup mcluster node1 node2   # pcs 0.9.x: pcs cluster setup --name mcluster node1 node2
pcs cluster start --all
pcs property set stonith-enabled=true

# DRBD must run as a promotable clone and needs the drbd_resource parameter
pcs resource create drbd-mariadb ocf:linbit:drbd drbd_resource=mariadb \
    op monitor interval=30s timeout=90s promotable

# Filesystem first, then the database, grouped so they stay together
pcs resource create fs-mariadb ocf:heartbeat:Filesystem \
    device=/dev/drbd0 directory=/var/lib/mysql fstype=xfs
pcs resource create mariadb-server systemd:mariadb
pcs resource group add mysqlgrp fs-mariadb mariadb-server

# The group may only run where DRBD is primary, and only after promotion
pcs constraint colocation add mysqlgrp with master drbd-mariadb-clone INFINITY
pcs constraint order promote drbd-mariadb-clone then start mysqlgrp
Why the group? Pacemaker keeps MariaDB and its filesystem together and starts them in order, so it won’t try to bring up the database before its storage is available.
Adding a Fence Device
Pacemaker needs a way to forcibly power off a misbehaving node – that’s fencing (STONITH). There is no safe “noop” fence: a fence agent has to actually cut power or I/O. Common options are the IPMI/BMC interface of each server (fence_ipmilan), a network‑controlled power switch (fence_apc), or SBD backed by a small shared disk.
# Example with IPMI – the BMC addresses and credentials below are placeholders, adjust them
pcs stonith create fence-node1 fence_ipmilan pcmk_host_list=node1 \
    ip=192.168.1.101 username=admin password=secret lanplus=1
pcs stonith create fence-node2 fence_ipmilan pcmk_host_list=node2 \
    ip=192.168.1.102 username=admin password=secret lanplus=1
# A node must never be responsible for fencing itself
pcs constraint location fence-node1 avoids node1
pcs constraint location fence-node2 avoids node2
If you skip fencing and a node loses network connectivity, the other node thinks it’s still alive. The result? Two databases running on the same storage, corrupting data faster than you can say “oops”.
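To make that failure mode concrete, here’s a toy model (plain Python, nothing cluster‑specific) of why an unfenced two‑node cluster ends up with two primaries after a network partition:

```python
# Each node promotes itself when it loses sight of its peer -- unless a
# fencing callback can confirm the peer was actually powered off first.

def decide_role(peer_visible, fence_peer=None):
    """Role this node takes after a heartbeat timeout."""
    if peer_visible:
        return "secondary"            # healthy cluster: stay where we are
    if fence_peer is not None and fence_peer():
        return "primary"              # peer confirmed dead: safe to take over
    return "primary-UNSAFE"           # no fencing: promote and hope

# Network partition, no fencing: both nodes promote -> split brain.
roles = [decide_role(peer_visible=False), decide_role(peer_visible=False)]

# Same partition with fencing: only the node whose fence call succeeds promotes.
safe_role = decide_role(peer_visible=False, fence_peer=lambda: True)
```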
Testing Failover
1. Move the services off the primary by putting it into standby:
pcs node standby node1
Pacemaker should promptly move the DRBD primary role, the filesystem, and MariaDB to the secondary node. (Simply running systemctl stop on the database wouldn’t test failover – Pacemaker’s monitor would just restart it in place.)
2. Now simulate a hard crash on the active node. A clean shutdown -h now stops everything gracefully and won’t exercise fencing, so force an immediate reboot instead:
echo b > /proc/sysrq-trigger
The fencing mechanism kicks in: the surviving node confirms its peer is down, takes over the resources, and stays up.
3. Bring the first node back online and rejoin it:
pcs cluster start node1
pcs node unstandby node1
DRBD resynchronizes the device in the background, and Pacemaker places the resources according to the constraints. (pcs cluster sync only redistributes the corosync configuration – it plays no part in recovery.)
Common Pitfalls
- Wrong disk size – If /dev/sdb1 is smaller than the MariaDB data directory, you’ll get “no space left on device” even though DRBD says it’s healthy. Make sure both backing disks are the same size, or use an LVM thin pool.
- Corosync ports blocked – Corosync uses UDP ports 5404–5405, and pcsd listens on TCP 2224. If your firewall blocks them, the nodes won’t talk. Open the ports, or disable the firewall only for a quick test.
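Opening the ports looks roughly like this – firewalld ships a ready‑made high-availability service, and the ufw lines assume the default corosync/pcsd ports:

```shell
# CentOS / RHEL with firewalld
firewall-cmd --permanent --add-service=high-availability
firewall-cmd --reload

# Ubuntu with ufw
ufw allow 5404:5405/udp   # corosync
ufw allow 2224/tcp        # pcsd
```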
- Unsynchronized clocks – NTP is a must; otherwise “timestamp skew” errors will appear in the logs.
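A quick way to get the clocks in line with chrony (the default on recent CentOS; ntpd works just as well):

```shell
yum install -y chrony          # or: apt-get install -y chrony
systemctl enable --now chronyd
chronyc tracking               # the "System time" offset should be milliseconds
```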
I’ve seen this happen after a bad driver update: the network card stopped sending ARP replies, and the cluster thought the node was still up. Fencing fixed it, but the lesson? Keep your NIC drivers current or lock them to known‑good versions.