Using Nagios to monitor Barman

Introduction

First, a word about Barman. As its website https://www.pgbarman.org/ puts it:

Allows your company to implement disaster recovery solutions for PostgreSQL databases with high requirements of business continuity.

Taking an online hot backup of PostgreSQL is now as easy as ordering a good espresso coffee.

In short, Barman makes it easy to manage PostgreSQL backups, incremental backups, point-in-time recovery and many other nice things. With that much riding on it, it is prudent to monitor its health.

Happily, the barman check command has a built-in --nagios flag that prints its result in a Nagios-friendly format.

The usual output of barman check looks like this:

$ barman check all
Server localhost:
PostgreSQL: OK
superuser or standard user with backup privileges: OK
PostgreSQL streaming: OK
wal_level: OK
replication slot: OK
directories: OK
retention policy settings: OK
backup maximum age: OK (no last_backup_maximum_age provided)
compression settings: OK
failed backups: OK (there are 0 failed backups)
minimum redundancy requirements: OK (have 20 backups, expected at least 0)
ssh: OK (PostgreSQL server)
systemid coherence: OK
pg_receivexlog: OK
pg_receivexlog compatible: OK
receive-wal running: OK
archive_mode: OK
archive_command: OK
continuous archiving: OK
archiver errors: OK

Now with the --nagios flag:

$ barman check all --nagios
BARMAN OK - Ready to serve the Espresso backup for localhost

Right, that will be just fine for Nagios.

Prerequisites

On the Barman server, find the path to the barman executable:

$ which barman
/usr/bin/barman

Also take note of the users that run Nagios and/or NRPE (usually both run as the same nagios user).

For the Nagios checks to work, the nagios user must be allowed to execute barman as the barman user. So add this line to the /etc/sudoers file (preferably by editing it with visudo, which checks the syntax before saving):

nagios ALL=(barman) NOPASSWD: /usr/bin/barman
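If you prefer a drop-in file under /etc/sudoers.d instead of editing /etc/sudoers itself, the rule can be generated from the user and the barman path found above. The function below, barman_sudoers_rule, is a hypothetical helper (not part of Barman or sudo), so treat it as a sketch. It can optionally append exact arguments: when a sudoers command lists arguments, sudo only allows that exact command line, which would stop the nagios user from running other barman subcommands as barman. If you use that variant, every command in the Nagios/NRPE configuration must match it exactly (here: check all --nagios).

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the sudoers rule that lets the monitoring user
# run barman as the barman user.
#   $1    monitoring user (e.g. nagios)
#   $2    absolute path to barman (e.g. the output of `which barman`)
#   $3... optional exact arguments to allow (e.g. check all --nagios)
barman_sudoers_rule() {
  local monitor_user="$1" barman_path="$2"
  shift 2
  local rule="${monitor_user} ALL=(barman) NOPASSWD: ${barman_path}"
  # With arguments listed, sudo permits only that exact command line.
  if [ "$#" -gt 0 ]; then
    rule="${rule} $*"
  fi
  printf '%s\n' "$rule"
}

# Example: validate in a temporary file first, then install it, so that a
# typo cannot break sudo for everyone.
#   tmp=$(mktemp)
#   barman_sudoers_rule nagios "$(command -v barman)" check all --nagios > "$tmp"
#   sudo visudo -cf "$tmp"
#   sudo install -m 0440 -o root -g root "$tmp" /etc/sudoers.d/nagios-barman
```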

To test this setup, first switch to the nagios user (as root):

su -s /bin/bash nagios

Then, as the nagios user, run barman check:

sudo -u barman /usr/bin/barman check all --nagios

You should get something like:

BARMAN OK - Ready to serve the Espresso backup for localhost

or some similar Barman message.
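Nagios decides the service state from the plugin's exit code (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN), not only from the text, so while you are still the nagios user it is worth checking the exit code too. As far as I can tell, Barman exits with 2 when the Nagios check reports CRITICAL, but verify that on your version. The function nagios_state is a hypothetical helper that simply translates an exit code into the Nagios state name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: translate a plugin exit code into the Nagios state
# name, following the standard Nagios plugin return code convention.
nagios_state() {
  case "$1" in
    0) echo "OK" ;;
    1) echo "WARNING" ;;
    2) echo "CRITICAL" ;;
    3) echo "UNKNOWN" ;;
    *) echo "OUT OF RANGE ($1)" ;;  # outside the 0-3 plugin convention
  esac
}

# Example, run as the nagios user right after the check:
#   sudo -u barman /usr/bin/barman check all --nagios
#   nagios_state "$?"
```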

Monitor Barman on local server

This setup assumes that both Nagios and Barman are installed on the same server.

To set up Nagios, first define the command in the commands.cfg file (located in the /usr/local/nagios/etc/objects/ folder in my case):

define command {
    command_name    check_barman
    command_line    sudo -u barman /usr/bin/barman check all --nagios
}

Barman won’t play ball if started by the nagios user, which typically has no access to the backup directories and SSH keys owned by the barman user. Hence we switch from nagios to barman when executing barman check (sudo -u barman).

With that sorted, you can define a Nagios service to check Barman:

define service {
    use                     local-service   ; Name of service template to use
    host_name               localhost
    service_description     Barman
    check_command           check_barman
}

Monitor Barman on remote server

In this setup, Nagios runs on one server and Barman on another. The difference is that barman check is called via NRPE, so define the command like so (on the Nagios server):

define command {
    command_name    check_barman
    command_line    /usr/lib/nagios/plugins/check_nrpe -H $HOSTADDRESS$ -c check_barman
}

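The service is defined the same way; just change host_name (and, optionally, the description):

define service {
    use                     local-service   ; Name of service template to use
    host_name               remote_host
    service_description     Check Barman
    check_command           check_barman
}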

In this case you also need to define the command on the client (the server with Barman installed). Presumably NRPE is already installed on the Barman server, so you can define the command by editing /etc/nagios/nrpe.cfg:

command[check_barman]=sudo -u barman /usr/bin/barman check all --nagios

Restart NRPE afterwards so it picks up the new command (on Ubuntu the service is called nagios-nrpe-server).

Testing

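Check the Nagios configuration (for a source install with the paths used above: /usr/local/nagios/bin/nagios -v /usr/local/nagios/etc/nagios.cfg) and reload Nagios for the changes to take effect. Then open the Nagios web UI to check that the new Barman service is working.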

Here’s an example of the errors reported when, for example, PostgreSQL is stopped:

$ service postgresql stop
$ barman check all
Server localhost:
PostgreSQL: FAILED
directories: OK
retention policy settings: OK
backup maximum age: OK (no last_backup_maximum_age provided)
compression settings: OK
failed backups: OK (there are 0 failed backups)
minimum redundancy requirements: OK (have 20 backups, expected at least 0)
ssh: OK (PostgreSQL server)
systemid coherence: OK (no system Id available)
pg_receivexlog: OK
pg_receivexlog compatible: FAILED (PostgreSQL version: None, pg_receivexlog version: 12.5)
receive-wal running: FAILED (See the Barman log file for more details)
archiver errors: OK

or, in the Nagios version:

$ barman check all --nagios
BARMAN CRITICAL - server localhost has issues * localhost FAILED: PostgreSQL, pg_receivexlog compatible, receive-wal running
localhost.PostgreSQL: FAILED
localhost.pg_receivexlog compatible: FAILED (PostgreSQL version: None, pg_receivexlog version: 12.5)
localhost.receive-wal running: FAILED (See the Barman log file for more details)

Only the first line (BARMAN CRITICAL - ...) is shown as the service status in Nagios, but it sums up the other lines, so it should be enough to get you started on troubleshooting.
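When digging into such an alert by hand, the detail lines are the interesting ones. As a rough sketch, the hypothetical function barman_failed_checks reads barman check --nagios output on standard input and prints the name of each failed check for one server. It assumes the "server.check name: FAILED" line format shown above, which may differ between Barman versions.

```shell
#!/usr/bin/env bash
# Hypothetical helper: list failed check names for one Barman server from
# `barman check ... --nagios` output read on stdin.
# Assumes detail lines of the form "<server>.<check name>: FAILED ...".
barman_failed_checks() {
  local server="$1" line check
  # The "|| [ -n ... ]" part keeps a last line that has no trailing newline.
  while IFS= read -r line || [ -n "$line" ]; do
    case "$line" in
      "${server}."*": FAILED"*)
        check="${line#"${server}."}"   # drop the "<server>." prefix
        check="${check%%: FAILED*}"    # drop ": FAILED" and any details
        printf '%s\n' "$check"
        ;;
    esac
  done
}

# Example:
#   sudo -u barman /usr/bin/barman check all --nagios | barman_failed_checks localhost
```

With the output above, this would print PostgreSQL, pg_receivexlog compatible and receive-wal running, one per line.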

Notes

Tested on Ubuntu 20.04, Nagios Core 4.4.6 and Barman 2.12.