As I only run a few machines with services that matter on them (notably, my home server and my web server), I don’t need a full-on monitoring service, so I rely instead on a system called monit.
Monit is an open source piece of software used to monitor (see, it’s easily named 😄) and, where possible, remediate any problems it finds.
I use this for watching whether particular services are running (and if not, restart them), for whether the ink in my printer is empty, and to monitor the free space and SMART status on my disks.
Today I found that a Docker container had stopped without me noticing. It wasn’t a big thing, but it gnawed at me, so I had a bit of a look around to see what I could find about this.
I found this blog post, titled “Monitoring Docker Containers with Monit”, from 2014, which suggested monitoring the result from
docker top… and would you believe it, that’s a valid trick 🙂
So, here’s what I’m doing! Each container has its own file called
/etc/monit/scripts/check_container_<container-name>.sh which has just this command in it:
#! /bin/bash
docker top "<container-name>"
exit $?
Note that you replace
<container-name> in both the filename and the script itself with the name of the container – for example, the container
hello-world would be monitored with the file
check_container_hello-world.sh, and the line in that file would say
docker top "hello-world".
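With more than a couple of containers, writing these files by hand gets repetitive. Here is a minimal sketch of how the script could be generated instead. The function name `make_check_script` and the `SCRIPT_DIR` variable are my own hypothetical names, not part of monit or Docker, and the `chmod +x` is there on the assumption that monit needs the check script to be executable.

```shell
#!/bin/bash
# Hypothetical helper: writes one check script per container name.
# SCRIPT_DIR defaults to the directory used in this post; override it to test.
SCRIPT_DIR="${SCRIPT_DIR:-/etc/monit/scripts}"

make_check_script() {
    local name="$1"
    local target="${SCRIPT_DIR}/check_container_${name}.sh"

    mkdir -p "${SCRIPT_DIR}"

    # Emit the same three lines as the hand-written script:
    # a shebang, the docker top call, and an exit with its status.
    printf '%s\n' \
        '#! /bin/bash' \
        "docker top \"${name}\"" \
        'exit $?' > "${target}"

    # Assumption: monit has to be able to execute the check script.
    chmod +x "${target}"
}
```

Calling `make_check_script hello-world` (as a user who can write to the target directory) should produce `check_container_hello-world.sh` with the `docker top "hello-world"` line in it.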
I then have a file in
/etc/monit/conf.d/check_container_<container-name> which has this content:
CHECK PROGRAM <container-name> WITH PATH /etc/monit/scripts/check_container_<container-name>.sh
  START PROGRAM = "/usr/bin/docker start <container-name>"
  STOP PROGRAM = "/usr/bin/docker stop <container-name>"
  IF status != 0 FOR 3 CYCLES THEN RESTART
  IF 2 RESTARTS WITHIN 5 CYCLES THEN UNMONITOR
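The monit check can be templated in the same way as the script. This is only a sketch: `make_monit_check` and `CONF_DIR` are hypothetical names of my own, and the default directory is assumed from the include line in monitrc. The stanza it writes is the one shown above, with the container name filled in.

```shell
#!/bin/bash
# Hypothetical helper: writes the monit check stanza for one container.
# CONF_DIR default is an assumption based on the include line in monitrc.
CONF_DIR="${CONF_DIR:-/etc/monit/conf.d}"

make_monit_check() {
    local name="$1"

    mkdir -p "${CONF_DIR}"

    # Unquoted heredoc delimiter, so ${name} is expanded in every line.
    cat > "${CONF_DIR}/check_container_${name}" <<EOF
CHECK PROGRAM ${name} WITH PATH /etc/monit/scripts/check_container_${name}.sh
  START PROGRAM = "/usr/bin/docker start ${name}"
  STOP PROGRAM = "/usr/bin/docker stop ${name}"
  IF status != 0 FOR 3 CYCLES THEN RESTART
  IF 2 RESTARTS WITHIN 5 CYCLES THEN UNMONITOR
EOF
}
```

For example, `make_monit_check hello-world` should write a `check_container_hello-world` file whose first line starts `CHECK PROGRAM hello-world WITH PATH`.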
I then ensure that in
/etc/monit/monitrc the line “
include /etc/monit/conf.d/*” is included and not commented out, and then restart monit with
systemctl restart monit.
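If you'd rather check for that include line than eyeball it, something like the following might do. It's a rough sketch: `has_conf_d_include` is a hypothetical helper name, and the pattern assumes the line is written in lowercase with the exact path used in this post.

```shell
#!/bin/bash
# Hypothetical helper: succeeds (exit 0) if the given monitrc contains an
# active, uncommented include line for /etc/monit/conf.d/*.
has_conf_d_include() {
    local monitrc="${1:-/etc/monit/monitrc}"

    # Anchoring at the start of the line (allowing leading whitespace)
    # means a line starting with "#" does not match.
    grep -Eq '^[[:space:]]*include[[:space:]]+/etc/monit/conf\.d/\*' "${monitrc}"
}
```

Running `has_conf_d_include && echo "include line is active"` before the restart is one way to use it. `monit -t` is also worth a go at that point, as it runs a syntax check over the control file.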