NRPE check_file_age is a filthy liar
Possibly just on Ubuntu. So here’s what I told the NRPE daemon my command was:
command[check_timer]=/usr/lib/nagios/plugins/check_file_age -w 7200 -c 14400 -f /usr/local/playfirst/run/var/TimerService.${HOSTNAME}.alive
and when reading it from nagios_local.cfg, the nrpe daemon reported it as:
/usr/lib/nagios/plugins/check_file_age -w 7200 -c 14400 -f /usr/local/playfirst/run/var/TimerService.${HOSTNAME}.alive$
Oops.
Because then it goes on to substitute the client hostname for ${HOSTNAME}, but the $ at the end of the filename persists, and no, there is no file of that name lying around. So that’s a curious artifact of trying to use environment variable substitution: the variable does get substituted, but you keep a bonus $ at the end. To get around it, I used a bit of sed trickery to push out N different config files, each with the hostname hardcoded.
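Roughly, the sed trickery looks like the following. This is a minimal sketch rather than my exact script: the @HOST@ placeholder, the render_nrpe_configs function and the output file names are all made up for illustration.

```shell
#!/bin/bash
# Sketch: render one NRPE config per host from a template, so that nrpe
# never has to expand ${HOSTNAME} itself.
# Assumptions: the @HOST@ placeholder, the function name and the
# nrpe_local.<host>.cfg naming are hypothetical.

render_nrpe_configs() {
    local template=$1 outdir=$2
    shift 2
    local host
    for host in "$@"; do
        # Replace every @HOST@ with the literal hostname.
        sed -e "s/@HOST@/${host}/g" "$template" > "${outdir}/nrpe_local.${host}.cfg"
    done
}

# Example (hypothetical file and host names):
#   render_nrpe_configs nrpe_local.cfg.tmpl ./out web01 web02
```

Each generated file then gets copied to its host by whatever you already use to push configs around.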
But it still didn’t work. Because it turns out that daemon runs as nagios, and that file was in a place where only user and group could see it. Rather than turn on the bits for everyone else, I decided to let nrpe run its commands with the sudo prefix option.
Meanwhile, on the server, all of these were flagged as file not found. I sure wish the second failure had been reported as ‘no permission to read file FOO’, and as for the first, I wish the substitution didn’t have that weird side effect.
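In the meantime, something like this would have saved me some head scratching. It is only a sketch: explain_file is a hypothetical helper of mine, not part of the Nagios plugins, and it only separates "can’t see it" from "can see it but can’t read it" for whichever user runs it.

```shell
#!/bin/bash
# Sketch: tell "missing" apart from "not readable" for the current user.
# Assumption: explain_file is a hypothetical helper, not a Nagios plugin.

explain_file() {
    local file=$1
    local user
    user=$(id -un)
    if [ ! -e "$file" ]; then
        # -e also fails when a parent directory is not searchable by this
        # user, which looks exactly like a missing file.
        echo "MISSING: $file does not exist or is not visible to $user"
        return 2    # 2 is the Nagios CRITICAL exit code
    fi
    if [ ! -r "$file" ]; then
        echo "UNREADABLE: $file exists but $user cannot read it"
        return 2
    fi
    echo "OK: $file is readable by $user"
}

# Example (hypothetical path):
#   explain_file /path/to/some.alive
```

The point is to run it as the same user the daemon runs as (nagios here, for example via sudo -u nagios), since running it as yourself or as root tells you nothing about what nrpe can see.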
Yeah, I could write a patch or patches for these and submit them upstream, but I’ve had bad luck providing patches to Debian and Ubuntu in the past. I have to assume it’s something about me at this point. But if anyone wants to patch something, these are the things which bugged me this last week and made my job harder.
Tags: nagios, nrpe, system monitoring

November 11th, 2010 at 7:11 pm
Yes, I got the same problem when using check_file_age on a remote server to check a daily backup file.
My configuration is as below:
Remote server:command[check_db_backup_file]=/usr/local/nagios/libexec/check_file_age -w 86400 -c 93600 -f /u01/backup/db_demo_$(date +%Y%m%d.dmp).gz
Monitoring server: ./check_nrpe -H IPofRemoteServer -c check_db_backup_file
Error:
FILE_AGE CRITICAL: File not found - /u01/backup/db_demo_20101112.dmp.gz$
It also has a bonus $ at the end…
But I can run the command directly on the remote server without error:
/usr/local/nagios/libexec/check_file_age -w 86400 -c 93600 -f /u01/backup/db_demo_$(date +%Y%m%d.dmp).gz
FILE_AGE OK: /u01/backup/db_demo_20101112.dmp.gz is 33861 seconds old and 685676145 bytes
Please let me know if you have any idea how to fix it. Thanks!
January 8th, 2011 at 2:19 pm
That seems consistent with the problem I had in using variable substitution with NRPE. One (probably dumb) thing to try might be not using a variable in your remote server nrpe configuration but instead having a task in crontab which rewrites the line to have the right filename on a daily basis. Something like
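#!/bin/bash
# update_nrpe
### WARNING, completely untested, just an idea, do not use in production environment without testing.
TODAY=$(date +%Y%m%d)
# Double quotes so ${TODAY} expands; -i edits in place (redirecting sed's output onto its own input file would truncate it).
# The pattern matches both the original $(date ...) form and a previously written date.
sed -i -e "s/db_demo_.*\.gz/db_demo_${TODAY}.dmp.gz/" /etc/nagios/nrpe_local.cfg
# nrpe reads its config at startup, so it needs a restart or reload afterwards.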
would remove the variable substitution from the nrpe processing.
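Another idea along the same lines, equally untested against NRPE: move the date arithmetic into a small wrapper script, so the config line contains no $ at all. A minimal sketch, assuming a hypothetical wrapper called check_db_backup_age and the paths and thresholds from your comment:

```shell
#!/bin/bash
# Sketch of a hypothetical wrapper (e.g. saved as check_db_backup_age):
# the shell builds today's filename here, so nrpe never sees a $.

backup_path_for_today() {
    # Same name the original command line produced: db_demo_YYYYMMDD.dmp.gz
    echo "/u01/backup/db_demo_$(date +%Y%m%d).dmp.gz"
}

# In the real wrapper, the last line would hand over to the plugin:
#   exec /usr/local/nagios/libexec/check_file_age -w 86400 -c 93600 -f "$(backup_path_for_today)"
```

The remote server’s config line would then just name the wrapper, something like command[check_db_backup_file]=/usr/local/nagios/libexec/check_db_backup_age, with no substitution left for nrpe to mangle.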