Pacemaker with DRBD and MariaDB: Build a Rock‑Solid High-Availability Cluster
If your business can’t afford a single point of failure in its database, you need a setup that keeps the data alive even when a node goes down. This guide shows how to combine DRBD’s block‑level replication with Pacemaker’s resource manager so MariaDB stays online without any manual juggling.
Prerequisites
- Two identical CentOS 8/9 or Ubuntu 22.04 nodes on the same LAN
- Root access (or sudo privileges) on both servers
- A working network that can route heartbeat traffic between them
- Basic knowledge of systemctl, crm and shell scripting
I once set this up after a firmware change accidentally disabled one node’s network interface; the cluster kept running because Pacemaker kicked it out, but the database still stayed available on the surviving side. That’s why you need fencing – we’ll get to that.
Installing DRBD and Pacemaker
# CentOS / RHEL (drbd90-utils and the kmod-drbd90 kernel module ship in the ELRepo repository)
yum install -y drbd90-utils kmod-drbd90 pacemaker corosync pcs

# Ubuntu
apt-get install -y drbd-utils pacemaker corosync pcs
Why this matters: pcs is the command‑line tool that talks to Corosync (the messaging layer) and Pacemaker (the orchestrator). Without it you’re stuck with raw configuration files and a headache.
Start and enable the services:
systemctl start pcsd corosync pacemaker
systemctl enable pcsd corosync pacemaker
On fresh installs you’ll still need to set a password for the hacluster user and authenticate the nodes to each other with pcs. It’s a one‑time thing, but skip it and you’ll be staring at “Authentication failed” forever.
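A minimal sketch of that one‑time setup – the node names are placeholders for your own hostnames, and older pcs 0.9.x uses `pcs cluster auth` instead of `pcs host auth`:

```shell
# On both nodes: give the hacluster user (created by the pcs package) a password
passwd hacluster

# On one node: authenticate the cluster members against each other
pcs host auth node1 node2 -u hacluster
```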
Configuring the Storage Replication
Create a DRBD resource file on both nodes:
cat > /etc/drbd.d/mariadb.res <<EOF
resource mariadb {
  protocol C;
  on node1 {
    device    /dev/drbd0;
    disk      /dev/sdb1;           # change sdb1 to your data disk
    address   192.168.1.11:7789;   # replication IP:port of node1 – adjust
    meta-disk internal;
  }
  on node2 {
    device    /dev/drbd0;
    disk      /dev/sdb1;
    address   192.168.1.12:7789;   # replication IP:port of node2 – adjust
    meta-disk internal;
  }
}
EOF
protocol C gives you synchronous replication – the write is acknowledged only after both sides confirm it. That’s the safety net we need for a database.
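As a toy illustration of what that acknowledgment rule means (plain Python, not DRBD code – only the protocol names are borrowed from DRBD):

```python
# Toy model of DRBD replication protocols.
# Protocol C: the write is acknowledged only once BOTH replicas hold the block.
# Protocol A: it is acknowledged as soon as the local disk has it.

def write_block(block, local_disk, peer_disk, protocol="C"):
    local_disk.append(block)          # local write always happens first
    if protocol == "C":
        peer_disk.append(block)       # synchronous: wait for the peer, too
        return "acked-by-both"
    return "acked-locally"            # protocol A: peer catches up later

local, peer = [], []
status = write_block("txn-42", local, peer, protocol="C")
# With protocol C, a crash right after the ack can never lose the block:
# it is already on both disks.
```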
Start DRBD and bring one side online:
# On both nodes
drbdadm create-md mariadb
drbdadm up mariadb
# On node1 only: promote it to primary to start the initial sync
drbdadm primary --force mariadb
The create-md command writes the metadata header. Forgetting it will leave you with a blank device that looks like an empty partition.
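Before putting a filesystem on the device it’s worth watching the initial sync finish; with drbd-utils 9 the status command shows the role and per‑peer replication state:

```shell
drbdadm status mariadb   # wait for "peer-disk:UpToDate" before relying on it
# On DRBD 8.x the same information lives in /proc/drbd:
cat /proc/drbd
```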
Create a filesystem and mount point:
# On the primary (node1)
mkfs.xfs /dev/drbd0
mkdir -p /var/lib/mysql
mount /dev/drbd0 /var/lib/mysql
Don’t add it to /etc/fstab. In a cluster the mount has to follow the DRBD primary role, so Pacemaker will manage it through a Filesystem resource instead; an fstab entry would try to mount the device on both nodes at boot.
Setting Up the MariaDB Service in Pacemaker
Install MariaDB on both nodes:
yum install -y mariadb-server   # or: apt-get install -y mariadb-server
systemctl disable mariadb
Note the disable rather than enable: Pacemaker decides where MariaDB runs, so the unit (mariadb, not mysqld, on current distributions) must not start on its own at boot.
Now tell Pacemaker to manage the service and the DRBD resource:
pcs cluster setup mcluster node1 node2   # pcs 0.9.x: pcs cluster setup --name mcluster node1 node2
pcs cluster start --all
pcs property set stonith-enabled=true

# DRBD must run as a promotable clone and needs the drbd_resource parameter
pcs resource create drbd-mariadb ocf:linbit:drbd drbd_resource=mariadb \
    op monitor interval=30s timeout=90s promotable

# Filesystem first, then the database, grouped so they stay together
pcs resource create fs-mariadb ocf:heartbeat:Filesystem \
    device=/dev/drbd0 directory=/var/lib/mysql fstype=xfs
pcs resource create mariadb-server systemd:mariadb
pcs resource group add mysqlgrp fs-mariadb mariadb-server

# The group may only run where DRBD is primary, and only after promotion
pcs constraint colocation add mysqlgrp with master drbd-mariadb-clone INFINITY
pcs constraint order promote drbd-mariadb-clone then start mysqlgrp
Why the group? Pacemaker keeps MariaDB and its filesystem together and starts them in order, so it won’t try to bring up the database before its storage is available.
Adding a Fence Device
Pacemaker needs a way to forcibly power off a misbehaving node – that’s fencing (STONITH). There is no safe “noop” fence: a fence agent has to actually cut power or I/O. Common options are the IPMI/BMC interface of each server (fence_ipmilan), a network‑controlled power switch (fence_apc), or SBD backed by a small shared disk.
# Example with IPMI – the BMC addresses and credentials below are placeholders, adjust them
pcs stonith create fence-node1 fence_ipmilan pcmk_host_list=node1 \
    ip=192.168.1.101 username=admin password=secret lanplus=1
pcs stonith create fence-node2 fence_ipmilan pcmk_host_list=node2 \
    ip=192.168.1.102 username=admin password=secret lanplus=1
# A node must never be responsible for fencing itself
pcs constraint location fence-node1 avoids node1
pcs constraint location fence-node2 avoids node2
If you skip fencing and a node loses network connectivity, the other node thinks it’s still alive. The result? Two databases running on the same storage, corrupting data faster than you can say “oops”.
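To make that failure mode concrete, here’s a toy model (plain Python, nothing cluster‑specific) of why an unfenced two‑node cluster ends up with two primaries after a network partition:

```python
# Each node promotes itself when it loses sight of its peer -- unless a
# fencing callback can confirm the peer was actually powered off first.

def decide_role(peer_visible, fence_peer=None):
    """Role this node takes after a heartbeat timeout."""
    if peer_visible:
        return "secondary"            # healthy cluster: stay where we are
    if fence_peer is not None and fence_peer():
        return "primary"              # peer confirmed dead: safe to take over
    return "primary-UNSAFE"           # no fencing: promote and hope

# Network partition, no fencing: both nodes promote -> split brain.
roles = [decide_role(peer_visible=False), decide_role(peer_visible=False)]

# Same partition with fencing: only the node whose fence call succeeds promotes.
safe_role = decide_role(peer_visible=False, fence_peer=lambda: True)
```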
Testing Failover
1. Move the services off the primary by putting it into standby:
pcs node standby node1
Pacemaker should promptly move the DRBD primary role, the filesystem, and MariaDB to the secondary node. (Simply running systemctl stop on the database wouldn’t test failover – Pacemaker’s monitor would just restart it in place.)
2. Now simulate a hard crash on the active node. A clean shutdown -h now stops everything gracefully and won’t exercise fencing, so force an immediate reboot instead:
echo b > /proc/sysrq-trigger
The fencing mechanism kicks in: the surviving node confirms its peer is down, takes over the resources, and stays up.
3. Bring the first node back online and rejoin it:
pcs cluster start node1
pcs node unstandby node1
DRBD resynchronizes the device in the background, and Pacemaker places the resources according to the constraints. (pcs cluster sync only redistributes the corosync configuration – it plays no part in recovery.)
Common Pitfalls
- Wrong disk size – If /dev/sdb1 is smaller than the MariaDB data directory, you’ll get “no space left on device” even though DRBD says it’s healthy. Make sure both backing disks are the same size, or use an LVM thin pool.
- Corosync ports blocked – Corosync uses UDP ports 5404–5405, and pcsd listens on TCP 2224. If your firewall blocks them, the nodes won’t talk. Open the ports, or disable the firewall only for a quick test.
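Opening the ports looks roughly like this – firewalld ships a ready‑made high-availability service, and the ufw lines assume the default corosync/pcsd ports:

```shell
# CentOS / RHEL with firewalld
firewall-cmd --permanent --add-service=high-availability
firewall-cmd --reload

# Ubuntu with ufw
ufw allow 5404:5405/udp   # corosync
ufw allow 2224/tcp        # pcsd
```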
- Unsynchronized clocks – NTP is a must; otherwise “timestamp skew” errors will appear in the logs.
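A quick way to get the clocks in line with chrony (the default on recent CentOS; ntpd works just as well):

```shell
yum install -y chrony          # or: apt-get install -y chrony
systemctl enable --now chronyd
chronyc tracking               # the "System time" offset should be milliseconds
```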
I’ve seen this happen after a bad driver update: the network card stopped sending ARP replies, and the cluster thought the node was still up. Fencing fixed it, but the lesson? Keep your NIC drivers current or lock them to known‑good versions.