Master‑Master Replication with MariaDB: A Quick‑Start Guide
If your production database keeps flipping between “up” and “down,” or you need a failover that doesn’t involve any downtime, master‑master replication can be the fix. In this post I’ll walk through how to set it up on two Linux servers, point out where things usually go wrong, and show you the real‑world payoff of having both nodes writeable.
Why you might need a true master‑master pair
A single‑source (master‑slave) setup is fine for read‑heavy workloads. But when your application writes constantly, any hiccup on the sole master can halt everything until it’s back online. With two masters that both accept writes, if one node fails you just keep using the other – no manual failover scripts required, and as long as replication was caught up, no data loss either. I’ve seen small SaaS startups lose hours of traffic because their backup never caught up; switching to a true dual‑master setup saved them days of recovery work.
Prerequisites and quick sanity checks
- Two servers running the same MariaDB version (10.3 or newer).
- A distinct gtid_domain_id in my.cnf on each machine – MariaDB’s GTIDs are always on (there is no MySQL-style gtid_mode flag), and separate domains keep each node’s transaction stream cleanly identifiable.
- A firewall that allows TCP port 3306 between them.
- Sufficient disk space for binlogs, because with two masters each will keep a copy of the other’s events.
If any of those fail, you’ll get cryptic “cannot read event” errors later. So start there.
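The version requirement is the easiest one to check mechanically. Here’s a minimal shell sketch – the version strings are placeholders for whatever `mariadb --version` reports on each host:

```shell
ver_a="10.6.16"   # placeholder: version reported by `mariadb --version` on server A
ver_b="10.6.18"   # placeholder: version reported by `mariadb --version` on server B

# compare the major.minor series only; patch-level drift within a series is fine
if [ "${ver_a%.*}" = "${ver_b%.*}" ]; then
    echo "versions compatible: ${ver_a%.*} series on both nodes"
else
    echo "version mismatch: $ver_a vs $ver_b"
fi
```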
Step 1: Prepare MariaDB configs
On both servers edit /etc/mysql/mariadb.conf.d/50-server.cnf (the Debian/Ubuntu path; adjust for your distro):
```ini
[mysqld]
server-id                   = 1    # change to 2 on the second server
log-bin                     = /var/log/mysql/mysql-bin.log
binlog_format               = ROW
log_slave_updates           = 1    # replicated events also land in this node's binlog
gtid_domain_id              = 1    # change to 2 on the second server
gtid_strict_mode            = 1    # refuse out-of-order GTIDs instead of silently diverging
auto_increment_increment    = 2    # both nodes count AUTO_INCREMENT in steps of 2...
auto_increment_offset       = 1    # ...starting from 1 here and from 2 on the second server
replicate_wild_ignore_table = mysql.%
```
Why these lines?
- server-id differentiates the nodes for replication.
- Row‑based binlog gives precise changes, which is safer when two masters write simultaneously.
- GTIDs tag every transaction with a unique ID, so each node knows exactly which of the peer’s events it has already applied. They do not resolve write conflicts, though – giving the two nodes distinct auto‑increment offsets (1 and 2, with an increment of 2) is what keeps simultaneous inserts from colliding.
After editing reload MariaDB with systemctl restart mariadb.
Step 2: Create a replication user
Pick a username you’ll use on both machines, e.g., replica_user. On both servers run:
```sql
CREATE USER 'replica_user'@'%' IDENTIFIED BY 'StrongPass!23';
GRANT REPLICATION SLAVE ON *.* TO 'replica_user'@'%';
```
The % host spec lets the other node connect from any IP – convenient in a lab, though in production you’d restrict it to the peer’s address. In my own labs, I once forgot to grant REPLICATION SLAVE and ended up with “Access denied for replication” errors that took an hour to debug.
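If you’d rather not invent a password by hand, here’s a small sketch that generates one with openssl (assumed installed). Base64 output contains no quote characters, so it’s safe to paste into IDENTIFIED BY:

```shell
# 18 random bytes base64-encode to exactly 24 characters, with no '=' padding
repl_pass=$(openssl rand -base64 18)
echo "CREATE USER 'replica_user'@'%' IDENTIFIED BY '$repl_pass';"
```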
Step 3: Capture each server’s current GTID set
On both servers:
SELECT @@gtid_current_pos;
Copy the output. It looks like 0-1-12345,1-2-67890 – each triplet is domain-server_id-sequence. This string tells MariaDB which transactions have already run on this node.
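To make the format concrete, here’s a small POSIX-shell sketch that splits one GTID triplet into its parts (the value itself is illustrative):

```shell
# a MariaDB GTID is domain-server_id-sequence; this value is illustrative
gtid="0-1-12345"
domain=${gtid%%-*}      # replication domain (first field)
rest=${gtid#*-}
server_id=${rest%%-*}   # server-id of the master that wrote the transaction
seq=${rest#*-}          # sequence number within the domain
echo "domain=$domain server_id=$server_id sequence=$seq"
```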
Step 4: Configure the opposite master
On Server A, point replication at Server B, using the GTID position you captured there:

```sql
-- seed the starting point with Server B's position from step 3 (value is illustrative)
SET GLOBAL gtid_slave_pos = '0-2-67890';
CHANGE MASTER TO
  MASTER_HOST = 'server-b-ip',
  MASTER_USER = 'replica_user',
  MASTER_PASSWORD = 'StrongPass!23',
  MASTER_USE_GTID = slave_pos;
START SLAVE;
```

MASTER_USE_GTID=slave_pos tells MariaDB to replicate by GTID rather than a binlog file/position pair, which is essential for master‑master. (MySQL’s MASTER_AUTO_POSITION=1 does not exist in MariaDB.) Repeat the same on Server B, swapping server-b-ip for server-a-ip and seeding gtid_slave_pos with Server A’s position.
Now each server is simultaneously a master and a slave of the other. The first START SLAVE begins pulling binlogs from the peer.
Step 5: Verify the link
Run:
SHOW SLAVE STATUS\G
Look for Slave_IO_Running: Yes and Slave_SQL_Running: Yes. Also check Seconds_Behind_Master. It should be 0 or a very small number. If you see “Connection to master lost” or “Fatal error in slave relay log”, double‑check the firewall and user privileges.
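If you check this often, a grep filter keeps only the fields that matter. The heredoc below is sample output standing in for the real command – in practice you’d pipe `mysql -e "SHOW SLAVE STATUS\G"` into the same grep:

```shell
# sample SHOW SLAVE STATUS output; replace the heredoc with the real mysql pipe
status=$(grep -E 'Slave_IO_Running|Slave_SQL_Running|Seconds_Behind_Master' <<'EOF'
                Slave_IO_State: Waiting for master to send event
              Slave_IO_Running: Yes
             Slave_SQL_Running: Yes
         Seconds_Behind_Master: 0
EOF
)
echo "$status"
```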
Common pitfalls and how I avoided them
- Conflicting GTIDs: If the two nodes already had diverging transactions before you linked them, the slave threads will stop with a GTID error (the slave requests a position that isn’t in the master’s binlog). In that case dump all data from one node, import it into the other, then restart replication from a clean, matching GTID position.
- Different versions: Replicating across major releases is only supported from an older master to a newer slave – and in a master‑master pair both directions run at once, so mixing, say, 10.2 and 10.3 invites “cannot read event” errors. Keep both servers on the same major release.
- Unbounded binlog growth: With two masters each generating logs, you can fill up /var/lib/mysql quickly. Set expire_logs_days=7 or give binlogs a dedicated partition.
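For the binlog pitfall, a minimal my.cnf fragment – the seven-day window is just a starting point, so size it to your write volume and backup cadence:

```ini
[mysqld]
expire_logs_days = 7      # purge binary logs older than a week
max_binlog_size  = 100M   # smaller files make purging more granular
```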
Real‑world benefit in action
After setting this up on my own small ecommerce site, one of the nodes went offline for a routine update. The traffic just kept flowing to the other master without any manual intervention. We never lost an order and there was zero downtime—just another example that true dual‑master replication isn’t a fancy feature; it’s a safety net.