NRPE check_file_age is a filthy liar

Possibly just on Ubuntu.  So here’s what I told the NRPE daemon my command was:

command[check_timer]=/usr/lib/nagios/plugins/check_file_age -w 7200 -c 14400 -f /usr/local/playfirst/run/var/TimerService.${HOSTNAME}.alive

and when reading it from nagios_local.cfg, the nrpe daemon reported it as:

/usr/lib/nagios/plugins/check_file_age -w 7200 -c 14400 -f /usr/local/playfirst/run/var/TimerService.${HOSTNAME}.alive$

Oops.

It then goes on to substitute the client hostname for ${HOSTNAME}, but the $ at the end of the filename persists, and no, there is no file of that name lying around.  That’s a curious artifact of trying to use environment variable substitution: the variable does get substituted, but you keep a bonus $ at the end.  So I used a bit of sed trickery to push out N different config files, each with the hostname hardcoded.
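
A minimal sketch of that kind of sed trickery, assuming a config template that still contains the literal ${HOSTNAME}.  The function name render_nrpe_config, the template file name, and the host names in the usage comment are all hypothetical, made up for illustration.

```shell
#!/bin/bash
# render_nrpe_config: hypothetical helper, not part of NRPE.
# Reads a config template on stdin and writes it to stdout with every
# literal ${HOSTNAME} replaced by the host name passed as $1.
# Assumes the host name contains no "/" or "&", which sed would treat
# specially in the replacement text.
render_nrpe_config() {
    local host="$1"
    # Single quotes stop the shell from expanding ${HOSTNAME} itself;
    # the host name is spliced in between the quoted pieces.
    sed -e 's/\${HOSTNAME}/'"$host"'/g'
}

# Possible usage (file and host names are assumptions):
#   for host in web01 web02; do
#       render_nrpe_config "$host" < nrpe_local.cfg.template > "nrpe_local.cfg.$host"
#   done
```

With the hostname baked into each file, NRPE never sees a variable to substitute, so there is nothing for a stray $ to cling to.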

But it still didn’t work, because it turns out the daemon runs as nagios, and that file was in a place where only user and group could see it.  Rather than turn on the bits for everyone else, I decided to let nrpe run its commands with the sudo prefix option (command_prefix in nrpe.cfg).

Meanwhile, on the server, all of these were flagging file not found.  I sure wish the second error state would have been reported as ‘no permission to read file FOO’ and the first, I wish the substitution didn’t have that weird side effect.
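
Something along these lines is the kind of diagnosis I was wishing for.  This is only a sketch under assumptions: explain_missing is a hypothetical helper, not part of the Nagios plugins, and it has to run as the same user the daemon runs as (nagios here) for the answer to mean anything.  It returns 2 on failure, following the usual Nagios convention for CRITICAL.

```shell
#!/bin/bash
# explain_missing: hypothetical helper, not part of the Nagios plugins.
# Walks an absolute path one component at a time and prints why the
# current user cannot reach the file, instead of a bare "not found".
explain_missing() {
    local target="$1" current="" part
    local -a parts
    # Split the path on "/" without triggering glob expansion.
    IFS='/' read -r -a parts <<< "$target"
    for part in "${parts[@]}"; do
        [ -z "$part" ] && continue
        current="$current/$part"
        if [ ! -e "$current" ]; then
            echo "does not exist: $current"
            return 2
        fi
        # A directory without the search (x) bit hides everything below it.
        if [ -d "$current" ] && [ ! -x "$current" ]; then
            echo "no permission to search directory: $current"
            return 2
        fi
    done
    if [ ! -r "$target" ]; then
        echo "no permission to read file: $target"
        return 2
    fi
    echo "readable: $target"
}
```

The directory check comes before the descent into the next component, so a locked-down parent directory is reported as a permission problem rather than as a missing file.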

Yeah, I could write a patch or patches for these and submit them upstream, but I’ve had bad luck providing patches to Debian and Ubuntu in the past.  I have to assume it’s something about me at this point.  But if anyone wants to patch something, these are the things which bugged me this last week and made my job harder.

2 Responses to “NRPE check_file_age is a filthy liar”

  1. maxidea Says:

    Yes, I got the same problem when using check_file_age on a remote server to check a daily backup file.
    My configuration is as below:
    Remote server: command[check_db_backup_file]=/usr/local/nagios/libexec/check_file_age -w 86400 -c 93600 -f /u01/backup/db_demo_$(date +%Y%m%d.dmp).gz

    Monitoring server: ./check_nrpe -H IPofRemoteServer -c check_db_backup_file

    Error:
    FILE_AGE CRITICAL: File not found - /u01/backup/db_demo_20101112.dmp.gz$

    It also has a bonus $ at the end…

    But I can run the command directly on the remote server without error:
    /usr/local/nagios/libexec/check_file_age -w 86400 -c 93600 -f /u01/backup/db_demo_$(date +%Y%m%d.dmp).gz
    FILE_AGE OK: /u01/backup/db_demo_20101112.dmp.gz is 33861 seconds old and 685676145 bytes

    Please let me know if you have any idea how to fix it. Thanks!

  2. binder Says:

    That seems consistent with the problem I had in using variable substitution with NRPE. One (probably dumb) thing to try might be not using a variable in your remote server nrpe configuration but instead having a task in crontab which rewrites the line to have the right filename on a daily basis. Something like

    #!/bin/bash
    # update_nrpe

    TODAY=$(date +%Y%m%d)

    ### WARNING, completely untested, just an idea, do not use in production environment without testing.
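    # Double quotes so ${TODAY} expands; -i edits in place, since redirecting
    # sed's output onto its own input file would truncate it before it is read.
    # Assumes the config line already holds a hardcoded 8-digit date.
    sed -i -e "s/db_demo_[0-9]\{8\}/db_demo_${TODAY}/" /etc/nagios/nrpe_local.cfg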

    would remove the variable substitution from the nrpe processing.
