Using Nagios to monitor Barman

Introduction

A word about Barman first. As stated on its website, https://www.pgbarman.org/ :

Allows your company to implement disaster recovery solutions for PostgreSQL databases with high requirements of business continuity.

Taking an online hot backup of PostgreSQL is now as easy as ordering a good espresso coffee.

In short, Barman makes it easy to manage PostgreSQL backups, incremental backups, point-in-time recovery and many other nice things. It is prudent to monitor its health.

Happily, the barman check command has a built-in --nagios flag that prints the result in a Nagios-friendly format.

The usual barman check output looks like this:

$ barman check all
Server localhost:
PostgreSQL: OK
superuser or standard user with backup privileges: OK
PostgreSQL streaming: OK
wal_level: OK
replication slot: OK
directories: OK
retention policy settings: OK
backup maximum age: OK (no last_backup_maximum_age provided)
compression settings: OK
failed backups: OK (there are 0 failed backups)
minimum redundancy requirements: OK (have 20 backups, expected at least 0)
ssh: OK (PostgreSQL server)
systemid coherence: OK
pg_receivexlog: OK
pg_receivexlog compatible: OK
receive-wal running: OK
archive_mode: OK
archive_command: OK
continuous archiving: OK
archiver errors: OK

Now with the --nagios flag:

$ barman check all --nagios
BARMAN OK - Ready to serve the Espresso backup for localhost

Right, that will be just fine for Nagios.
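Nagios takes the service state from the plugin's exit code (0 = OK, 1 = WARNING, 2 = CRITICAL, 3 = UNKNOWN) and shows the first line of output as the status text. The setup in this post doesn't need a wrapper, but if you ever want to post-process the result yourself, here is a minimal sketch of how the first line could be mapped to those codes. The helper name nagios_code_for is made up for illustration and is not part of Barman.

```shell
#!/bin/sh
# nagios_code_for LINE
# Hypothetical helper: map the first line of `barman check --nagios`
# output to a standard Nagios plugin exit code.
nagios_code_for() {
    case "$1" in
        "BARMAN OK"*)       echo 0 ;;   # OK
        "BARMAN CRITICAL"*) echo 2 ;;   # CRITICAL
        *)                  echo 3 ;;   # anything unexpected is UNKNOWN
    esac
}

# Sample line copied from the output above.
nagios_code_for "BARMAN OK - Ready to serve the Espresso backup for localhost"
```

Treating anything unrecognised as UNKNOWN (3) rather than OK is the safer default: an empty or garbled line should never show up green.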

Prerequisites

On the Barman server, find the path to barman:

$ which barman
/usr/bin/barman

Also take note of the nagios and/or nrpe users (usually both run as the same nagios user).

For the Nagios checks to work, the nagios user must be allowed to execute barman as the barman user. To allow that, add this line to the /etc/sudoers file (ideally through visudo, which checks the syntax before saving):

nagios ALL=(barman) NOPASSWD: /usr/bin/barman
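If you'd rather not touch /etc/sudoers itself, the same rule could live in a drop-in file instead. The sketch below only prints the rule; the helper name sudoers_rule and the file name nagios-barman are my own choices, and it assumes your /etc/sudoers includes the /etc/sudoers.d directory (worth checking on your system).

```shell
#!/bin/sh
# sudoers_rule USER RUNAS COMMAND
# Hypothetical helper: print a sudoers line that lets USER run COMMAND
# as RUNAS without a password.
sudoers_rule() {
    printf '%s ALL=(%s) NOPASSWD: %s\n' "$1" "$2" "$3"
}

# Print the rule used in this post; nothing under /etc is modified here.
sudoers_rule nagios barman /usr/bin/barman

# As root, the rule could then be validated and installed, for example:
#   sudoers_rule nagios barman /usr/bin/barman > /tmp/nagios-barman
#   visudo -cf /tmp/nagios-barman && \
#       install -m 0440 -o root -g root /tmp/nagios-barman /etc/sudoers.d/nagios-barman
```

Running visudo -cf on the file before installing it matters, because a syntax error in any sudoers file can lock you out of sudo entirely.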

To test this setup, first switch to the nagios user (as root):

su -s /bin/bash nagios

Then, as the nagios user, run the Barman check:

sudo -u barman /usr/bin/barman check all --nagios

You should get something like:

BARMAN OK - Ready to serve the Espresso backup for localhost

or some similar Barman message.

Monitor Barman on local server

This setup assumes that both Nagios and Barman are installed on the same server.

To set up Nagios, first define a command in the commands.cfg file (located in /usr/local/nagios/etc/objects/ in my case):

define command {
command_name check_barman
command_line sudo -u barman /usr/bin/barman check all --nagios
}

Barman won’t play ball if started by the nagios user, hence we switch from nagios to barman when executing the check (sudo -u barman).

With that sorted, you can define a Nagios service to check Barman:

define service {
use local-service ; Name of service template to use
host_name localhost
service_description Barman
check_command check_barman
}

Monitor Barman on remote server

In this setup you have Nagios on one server and Barman on another. The difference is that barman check is called via NRPE, so the command should be defined like this (on the Nagios server):

define command {
command_name check_barman
command_line /usr/lib/nagios/plugins/check_nrpe -H $HOSTADDRESS$ -c check_barman
}

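The service is much the same, just change host_name (and add a use line for your own service template, as in the local example, if your setup needs one):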

define service {
host_name remote_host
service_description Check Barman
check_command check_barman
}

In this case you also need to define the command on the client server (the one with Barman installed). Presumably you already have NRPE installed on the Barman server, so you can define the command by editing /etc/nagios/nrpe.cfg (restart the NRPE daemon afterwards so it picks up the change):

command[check_barman]=sudo -u barman /usr/bin/barman check all --nagios

Testing

Check the Nagios configuration and reload Nagios for the changes to take effect. Then open the Nagios web UI to confirm that the new service is working.
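The check-then-reload step can be scripted so that a broken configuration never gets loaded. This is only a sketch: validate_then_reload is a hypothetical helper, and the real commands shown in the comment assume a source install under /usr/local/nagios (matching the commands.cfg path above) and a systemd unit named nagios, so adjust both to your system.

```shell
#!/bin/sh
# validate_then_reload VALIDATE_CMD RELOAD_CMD
# Hypothetical helper: run RELOAD_CMD only if VALIDATE_CMD succeeds.
validate_then_reload() {
    if sh -c "$1" >/dev/null 2>&1; then
        sh -c "$2"
    else
        echo "configuration check failed; not reloading" >&2
        return 1
    fi
}

# Dry run with stand-in commands, so the logic can be tried safely.
validate_then_reload "true" "echo 'would reload Nagios now'"

# A real invocation (run as root) might look like this:
#   validate_then_reload \
#       "/usr/local/nagios/bin/nagios -v /usr/local/nagios/etc/nagios.cfg" \
#       "systemctl reload nagios"
```

When validation fails, run the nagios -v command by hand to see the actual error messages, since the helper hides them.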

Here’s an example of the error reported when PostgreSQL is stopped:

$ service postgresql stop
$ barman check all
Server localhost:
PostgreSQL: FAILED
directories: OK
retention policy settings: OK
backup maximum age: OK (no last_backup_maximum_age provided)
compression settings: OK
failed backups: OK (there are 0 failed backups)
minimum redundancy requirements: OK (have 20 backups, expected at least 0)
ssh: OK (PostgreSQL server)
systemid coherence: OK (no system Id available)
pg_receivexlog: OK
pg_receivexlog compatible: FAILED (PostgreSQL version: None, pg_receivexlog version: 12.5)
receive-wal running: FAILED (See the Barman log file for more details)
archiver errors: OK

or, in the Nagios version:

$ barman check all --nagios
BARMAN CRITICAL - server localhost has issues * localhost FAILED: PostgreSQL, pg_receivexlog compatible, receive-wal running
localhost.PostgreSQL: FAILED
localhost.pg_receivexlog compatible: FAILED (PostgreSQL version: None, pg_receivexlog version: 12.5)
localhost.receive-wal running: FAILED (See the Barman log file for more details)

Only the first line will appear in Nagios, but it kinda sums up the other lines, so it should be enough to get you started on troubleshooting.
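When you do start troubleshooting on the Barman server, it can help to boil the full barman check output down to just the failing checks. Here is a small sketch using sed; the helper name failed_checks is made up, and it assumes the "name: FAILED" line format shown above.

```shell
#!/bin/sh
# failed_checks
# Hypothetical helper: read `barman check` style output on stdin and
# print the name of every check that is reported as FAILED.
failed_checks() {
    sed -n 's/^[[:space:]]*\(.*\): FAILED.*$/\1/p'
}

# Sample lines copied from the output above.
failed_checks <<'EOF'
Server localhost:
PostgreSQL: FAILED
directories: OK
pg_receivexlog compatible: FAILED (PostgreSQL version: None, pg_receivexlog version: 12.5)
receive-wal running: FAILED (See the Barman log file for more details)
archiver errors: OK
EOF
```

On a live system the same helper could be fed directly, e.g. sudo -u barman /usr/bin/barman check all | failed_checks.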

Notes

Tested on Ubuntu 20.04, Nagios Core 4.4.6 and Barman 2.12.

Setting up Motion to send images to email

Motion is a program that monitors the video signal from one or more cameras and is able to detect if a significant part of the picture has changed; in other words, it can detect motion.

Source: http://www.lavrsen.dk/foswiki/bin/view/Motion

Installation is quite simple and straightforward; once you get your webcam drivers up and running, just follow these steps: http://www.lavrsen.dk/foswiki/bin/view/Motion/MotionGuideInstallation.

All in all it works well out of the box: you can have a live stream from your webcams, and the software detects motion and lets you execute a piece of code or an external command when it happens. It supports multiple cameras and has a bunch of options that can be set in the config file, usually located at /etc/motion/motion.conf.

So, you have your motion daemon working in the background and analysing frames as they come from the webcam. If the difference between two consecutive frames (after noise filtering etc.) is above a defined threshold, this is considered motion, or an event, and it is given its own ID. At this point Motion starts taking snapshots, creates a video from them, and stores both in a predefined location for as long as something is moving. As soon as the motion stops, the event is finished and no more snapshots are taken. For each snapshot it takes, the software calls whatever command is defined in on_picture_save (in the motion.conf file).

Now, let’s assume that Motion is up and running and you’d like to get an email with attached images when something moves in the camera’s viewport.

One solution would be to hook some code to the on_picture_save event (which fires when a picture is saved), e.g.:


on_picture_save sendEmail -f <email_from> -t <email_to> -u 'Subject...' -a %f

However, this is triggered for each snapshot taken, resulting in a mountain of emails that is hard to review or make sense of. A different approach would be to send all the snapshots once the event is finished. To do that we can hook the on_movie_end event.

on_movie_end is triggered when the motion stops and the video ends, so this is a good moment to gather all the snapshots that belong to that event and send them to your email address.

As mentioned before, snapshots reside in a predefined location and their filenames follow a simple logic, by default: %v-%Y%m%d%H%M%S-snapshot, meaning that the event ID (%v) and the timestamp (%Y%m%d%H%M%S) are part of the filename.

Helpfully, Motion exposes several conversion specifiers that can be used to identify the current event, timestamp, noise level and all other kinds of data. Here’s the list from the default config file:


# External Commands, Warnings and Logging:
# You can use conversion specifiers for the on_xxxx commands
# %Y = year, %m = month, %d = date,
# %H = hour, %M = minute, %S = second,
# %v = event, %q = frame number, %t = thread (camera) number,
# %D = changed pixels, %N = noise level,
# %i and %J = width and height of motion area,
# %K and %L = X and Y coordinates of motion center
# %C = value defined by text_event
# %f = filename with full path
# %n = number indicating filetype
# Both %f and %n are only defined for on_picture_save,
# on_movie_start and on_movie_end
# Quotation marks round string are allowed.
############################################################

So the hook might end up like this:


# Command to be executed when a movie file (.mpg|.avi) is closed. (default: none)
# To give the filename as an argument to a command append it with %f
on_movie_end sendEmail -f <from: email> -t <to: email> -u 'Subject...' -a /tmp/motion/%v-*.jpg

This will send all snapshots matching /tmp/motion/<event_ID>-*.jpg as email attachments. However, this auto-incrementing ID resets every time the daemon is restarted, so sooner or later you’ll get every image that was ever captured with the same ID. The application itself is rather stable, but I still extended the event ID with a year-month-day stamp that lines up with the snapshot filenames:


on_movie_end sendEmail -f <from: email> -t <to: email> -u 'Subject...' -a /tmp/motion/%v-%Y%m%d*.jpg

Another approach might be to periodically move or delete old snapshots…
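As a rough sketch of that clean-up idea, the function below deletes .jpg files older than a given number of days from a directory. The name purge_old_snapshots and the 7-day cut-off are just examples, and the demo runs against a throwaway temp directory rather than /tmp/motion.

```shell
#!/bin/sh
# purge_old_snapshots DIR DAYS
# Hypothetical helper: delete *.jpg files directly inside DIR whose
# modification time is older than DAYS days (find's -mtime +DAYS).
purge_old_snapshots() {
    find "$1" -maxdepth 1 -type f -name '*.jpg' -mtime +"$2" -exec rm -f {} +
}

# Demo in a scratch directory: one very old file, one fresh file.
demo_dir=$(mktemp -d)
touch -t 200001010000 "$demo_dir/01-20000101000000-00.jpg"
touch "$demo_dir/02-recent.jpg"
purge_old_snapshots "$demo_dir" 7
ls "$demo_dir"        # only 02-recent.jpg is left
rm -rf "$demo_dir"
```

To run it periodically, the same find command could go into a daily cron job pointed at your snapshot directory (/tmp/motion in the examples here).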

It might be a bit easier to maintain everything if you keep your code in a separate file and have Motion call your script, e.g.:


on_movie_end /home/user/on_movie_end.sh %v %Y%m%d %H:%M:%S

/home/user/on_movie_end.sh:

#!/bin/sh
sendEmail -f <from: email> -t <to: email> -u 'Subject...' -m "Info:\nEventID: $1\nDatestamp: $2\nTimestamp: $3" -s <your_smtp_server>:25 -xu <your_smtp_user> -xp <your_smtp_password> -a /tmp/motion/"$1"-"$2"*.jpg

(don’t forget to make your script executable, e.g. chmod +x /home/user/on_movie_end.sh)

The code above uses custom SMTP settings to send the email (so you don’t need to configure your machine as a mail server if you don’t want to), so just adjust it to your settings…
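One thing the script doesn't handle is an event with no matching snapshots: the shell then passes the unexpanded pattern to sendEmail as a literal file name. Here is a possible guard, as a sketch only. The helper name event_snapshots is hypothetical, and it assumes the <event_ID>-<datestamp>*.jpg naming used above.

```shell
#!/bin/sh
# event_snapshots DIR EVENT_ID DATESTAMP
# Hypothetical helper: print every snapshot in DIR that matches
# EVENT_ID-DATESTAMP*.jpg, one per line. Returns non-zero if none exist.
event_snapshots() {
    found=1
    for f in "$1"/"$2"-"$3"*.jpg; do
        [ -e "$f" ] || continue     # an unmatched glob stays literal; skip it
        printf '%s\n' "$f"
        found=0
    done
    return "$found"
}

# Demo in a scratch directory with two snapshots for event 01.
demo_dir=$(mktemp -d)
touch "$demo_dir/01-20240101120000-00.jpg" "$demo_dir/01-20240101120001-01.jpg"
event_snapshots "$demo_dir" 01 20240101
event_snapshots "$demo_dir" 02 20240101 || echo "no snapshots for event 02"
rm -rf "$demo_dir"
```

In on_movie_end.sh the sendEmail call could then be wrapped in if event_snapshots /tmp/motion "$1" "$2" >/dev/null; then ... fi, so no mail is attempted when there is nothing to attach.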