This procedure describes how to create a remote nrpe plugin for Nagios:
- Create an nrpe script/command that generates properly formatted output and exit codes.
- Install the command/script in /usr/lib64/nagios/plugins on the client.
- On the client server, declare the command in /etc/nagios/nrpe.cfg so that nrpe knows it exists.
- On the Nagios server:
  - Add the new command to commands.cfg. For remote execution, the command must be called with check_nrpe.
  - Add a new service to services.cfg.
  - Define a new hostgroup in hostgroups.cfg for the hosts that are to be monitored.
Problem
Monitoring a process on a client server requires running the check command locally on that client. For example, testing whether ldap is running requires the ldapsearch command, which is not installed on the Nagios server. Hence an executable plugin must be created that runs only on the client server.
Creating nrpe Script
An nrpe script or program must generate an exit code (return code) and a description. The exit codes are as follows:
nrpe Exit codes (return codes)
Numerical Value | Service Status | Status Description
0 | OK | The plugin was able to check the service and it appeared to be functioning properly
1 | Warning | The plugin was able to check the service, but it appeared to be above some "warning" threshold or did not appear to be working properly
2 | Critical | The plugin detected that either the service was not running or it was above some "critical" threshold
3 | Unknown | Invalid command line arguments were supplied to the plugin or low-level failures internal to the plugin (such as unable to fork, or open a tcp socket) that prevent it from performing the specified operation. Higher-level errors (such as name resolution errors, socket timeouts, etc) are outside of the control of plugins and should generally NOT be reported as UNKNOWN states.
- The EXIT CODE is generated by the execution of the script
- The DESCRIPTION is standard output text
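To see both parts of this contract for any plugin, it helps to run it by hand and look at the text it prints and the code it returns. The following is a minimal sketch; the function names status_name and show_plugin_result are made up for illustration and are not part of Nagios or nrpe.

```shell
#!/bin/bash
# Map a plugin return code to the status name from the table above.
status_name ()
{
    case "$1" in
        0) echo 'OK' ;;
        1) echo 'WARNING' ;;
        2) echo 'CRITICAL' ;;
        3) echo 'UNKNOWN' ;;
        *) echo 'out of range' ;;
    esac
}

# Run a plugin command line, then print its description (standard output)
# and its exit code side by side.
show_plugin_result ()
{
    local description code=0
    # Assign separately from "local" so the plugin's exit code is not masked.
    description=$("$@") || code=$?
    echo "description: ${description}"
    echo "exit code:   ${code} ($(status_name "${code}"))"
}
```

For example, show_plugin_result /usr/lib64/nagios/plugins/check_template would run the template plugin described below and report what Nagios would receive from it.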
Template
This is a template bash shell script. The script should be created on the client in the /usr/lib64/nagios/plugins directory, typically with no suffix.
#!/bin/bash
#==============================================================================
# Copyright: LogiQwest 2017
# Name: check_template
# OS: Linux
# Location: /usr/lib64/nagios/plugins
# Purpose: Check nrpe template
# License: This template is provided free of charge with no warranty or support.
#          Users are free to modify and distribute freely with no restrictions.
#------------------------------------------------------------------------------
# Change history:
# Version 1.00:05 Mar 2017 Created
#==============================================================================
VERSION='1.00'
# Associate the argument, if required
ARG=$1
# <insert shell script commands to verify an operation and create a description
# and a result, for example DESCRIPTION="something" and RESULTS="ok|warning|failed|unknown">
case "${RESULTS}" in
'ok')
echo "OK- ${DESCRIPTION}"
exit 0
;;
'warning')
echo "WARNING- ${DESCRIPTION}"
exit 1
;;
'failed')
echo "CRITICAL- ${DESCRIPTION}"
exit 2
;;
'unknown')
echo "UNKNOWN- ${DESCRIPTION}"
exit 3
;;
esac
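The template has no branch for an unset or unexpected RESULTS value; in that case no pattern matches and no status line is printed. A minimal sketch of a catch-all branch follows. The function name report_result is hypothetical, and it uses return where the template uses exit so that it can be tried in an interactive shell.

```shell
#!/bin/bash
# Sketch only: report_result is a hypothetical wrapper around the template's
# case statement, with a catch-all branch added.
report_result ()
{
    local results="$1"
    local description="$2"
    case "${results}" in
        'ok')      echo "OK- ${description}";       return 0 ;;
        'warning') echo "WARNING- ${description}";  return 1 ;;
        'failed')  echo "CRITICAL- ${description}"; return 2 ;;
        # Anything else, including an empty value, is reported as UNKNOWN.
        *)         echo "UNKNOWN- ${description:-no result was set}"; return 3 ;;
    esac
}
```

In a plugin, the last two lines would then be report_result "${RESULTS}" "${DESCRIPTION}" followed by exit $?.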
Example Script
#!/bin/bash
#==============================================================================
# Copyright: Logiqwest 2017
# Name: check_ldap_replication
# OS: Linux
# Location: /usr/lib64/nagios/plugins
# Purpose: Check ldap replication against master
# License: This script is provided free of charge with no warranty or support.
#          Users are free to modify and distribute freely with no restrictions.
#------------------------------------------------------------------------------
# Change history:
# Version 1.00:03 Mar 2017 Created by Michael Barto
#==============================================================================
VERSION='1.00'
MASTER_LDAP_SERVER=$1
SLAVE_LDAP_SERVER=`hostname`
RUN_LDAP_REPLICATION_CHECK ()
{
master_contextCSN=`ldapsearch -x -D "cn=Administrator,dc=freightgate,dc=com" -w '<PASSWORD>' -H ldaps://${MASTER_LDAP_SERVER}:636 -P 3 -s base -b "dc=freightgate,dc=com" contextCSN | grep contextCSN | awk '{print $NF}' | grep -v contextCSN`
TEST_RUN=`echo $?`
if [ $TEST_RUN -eq 0 ]; then
slave_contextCSN=`ldapsearch -x -D "cn=Administrator,dc=freightgate,dc=com" -w '<PASSWORD>' -H ldaps://${SLAVE_LDAP_SERVER}:636 -P 3 -s base -b "dc=freightgate,dc=com" contextCSN | grep contextCSN | awk '{print $NF}' | grep -v contextCSN`
TEST_RUN=$?
if [ "$TEST_RUN" -eq 0 ]; then
DESCRIPTION="${SLAVE_LDAP_SERVER} ${slave_contextCSN}"
if [[ "${slave_contextCSN}" != "${master_contextCSN}" ]]; then
RESULTS='failed'
else
RESULTS='ok'
fi
DESCRIPTION="${SLAVE_LDAP_SERVER} ${slave_contextCSN}"
else
RESULTS='unknown'
DESCRIPTION="${SLAVE_LDAP_SERVER} unknown"
fi
else
RESULTS='unknown'
DESCRIPTION="${SLAVE_LDAP_SERVER} unknown"
fi
}
OUTPUT_RESULTS ()
{
case "${RESULTS}" in
'ok')
echo "OK- ${DESCRIPTION} 0"
exit 0
;;
'warning')
echo "WARNING- ${DESCRIPTION} 1"
exit 1
;;
'failed')
echo "CRITICAL- ${DESCRIPTION} 2"
exit 2
;;
'unknown')
echo "UNKNOWN- ${DESCRIPTION} 3"
exit 3
;;
esac
}
# Main ----------------------------------------------------------
RUN_LDAP_REPLICATION_CHECK
OUTPUT_RESULTS
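The grep/awk/grep pipeline in the script keeps the last field of every line that mentions contextCSN and then drops lines whose last field is the word contextCSN itself, which is what a comment line in the ldapsearch output would produce. A minimal sketch of the same extraction in a single awk call is shown below; the helper name extract_csn is hypothetical, and the exact layout of the ldapsearch output is an assumption.

```shell
#!/bin/bash
# Sketch only: extract_csn is a hypothetical helper that reads ldapsearch
# output on stdin and prints the contextCSN value(s).
extract_csn ()
{
    # Keep attribute lines of the form "contextCSN: <value>" and ignore
    # everything else, including comment lines that mention contextCSN.
    awk '$1 == "contextCSN:" { print $NF }'
}
```

It would be used as the last stage of the query, for example ldapsearch ... contextCSN | extract_csn, with the ldapsearch options left as in the script above.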
Edit nrpe.cfg on the Nagios client
Add the command definition for the plugin script to /etc/nagios/nrpe.cfg. For example, the following was added at the end of the file:
....
# config file is set to '1'. This poses a potential security risk, so
# make sure you read the SECURITY file before doing this.
#command[check_users]=/usr/lib64/nagios/plugins/check_users -w $ARG1$ -c $ARG2$
#command[check_load]=/usr/lib64/nagios/plugins/check_load -w $ARG1$ -c $ARG2$
#command[check_disk]=/usr/lib64/nagios/plugins/check_disk -w $ARG1$ -c $ARG2$ -p $ARG3$
#command[check_procs]=/usr/lib64/nagios/plugins/check_procs -w $ARG1$ -c $ARG2$ -s $ARG3$
# To check if ldap replication is working
command[check_ldap_replication]=/usr/lib64/nagios/plugins/check_ldap_replication $ARG1$
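When the same definition has to be added on several clients, the edit can be scripted. The following is a minimal sketch under the assumption that appending to the end of the file is acceptable; add_nrpe_command is a hypothetical helper, not something shipped with nrpe.

```shell
#!/bin/bash
# Sketch only: add_nrpe_command is a hypothetical helper, not part of nrpe.
# Usage: add_nrpe_command <config file> <command name> <command line>
add_nrpe_command ()
{
    local cfg="$1" name="$2" line="$3"
    # Skip the append when an active (uncommented) definition already exists.
    if grep -q "^command\[${name}]=" "${cfg}"; then
        echo "command[${name}] already defined in ${cfg}"
        return 0
    fi
    printf 'command[%s]=%s\n' "${name}" "${line}" >> "${cfg}"
    echo "added command[${name}] to ${cfg}"
}
```

A call for the example above would be add_nrpe_command /etc/nagios/nrpe.cfg check_ldap_replication '/usr/lib64/nagios/plugins/check_ldap_replication $ARG1$'. The single quotes keep $ARG1$ from being expanded by the shell.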
Restart nrpe on the client.
[root@dbserv14v fg_root]# service xinetd stop
Stopping xinetd: [ OK ]
[root@dbserv14v fg_root]# service nrpe restart
Shutting down Nagios NRPE daemon (nrpe): [ OK ]
Starting Nagios NRPE daemon (nrpe): [ OK ]
[root@dbserv14v fg_root]# service xinetd start
Starting xinetd: [ OK ]
[root@dbserv14v fg_root]#
Checking Script
On the Nagios server, test the new script with check_nrpe from the plugin directory (/usr/lib64/nagios/plugins):
[root@nagios plugins]# /usr/lib64/nagios/plugins/check_nrpe -H ldap001v -c check_ldap_replication ldap101v.freightgate.com
OK- ldap001v.idc.freightgate.com 20170402011858Z#000000#00#000000 0
[root@nagios plugins]#
Enabling the New Command on the Nagios Server
On the Nagios Server:
- Edit commands.cfg
- Edit services.cfg
- Add a hostgroup by editing hostgroup.cfg
commands.cfg
Add the following command definition to commands.cfg. For remote execution, the command must be run through check_nrpe so that it executes on the remote server; otherwise it will be executed on the local Nagios server.
# 'check_replication_ldap' command definition
define command{
command_name check_replication_ldap
command_line $USER1$/check_nrpe -H $HOSTADDRESS$ -c check_ldap_replication $ARG1$
}
services.cfg
Add the following service definition to services.cfg. Note that the parameter value is passed by using "!" to separate the command from the parameter.
define service{
use generic-service
hostgroup_name ldap_slaves
service_description LDAP REPLICATION
check_command check_replication_ldap!ldap101v.logiqwest.com
}
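To make the "!" convention concrete, the sketch below imitates in bash how the check_command value is split and how its first parameter fills $ARG1$ in the command_line from commands.cfg. Nagios performs this substitution internally; expand_check_command is a hypothetical name used only for illustration, and the other macros ($USER1$, $HOSTADDRESS$) are deliberately left untouched.

```shell
#!/bin/bash
# Illustration only: mimic how "name!arg1!arg2" in check_command is split
# and how $ARG1$, $ARG2$, ... are filled into the command_line.
expand_check_command ()
{
    local check_command="$1" command_line="$2"
    local -a parts
    local i
    # parts[0] is the command name, parts[1..] are the parameters.
    IFS='!' read -r -a parts <<< "${check_command}"
    for (( i = 1; i < ${#parts[@]}; i++ )); do
        command_line="${command_line//\$ARG${i}\$/${parts[i]}}"
    done
    echo "${command_line}"
}
```

Calling it with 'check_replication_ldap!ldap101v.logiqwest.com' and the command_line shown in commands.cfg prints that line with $ARG1$ replaced by ldap101v.logiqwest.com.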
hostgroup.cfg
Add the following hostgroup definition to hostgroups.cfg. You may need to define another host in host.cfg.
define hostgroup {
hostgroup_name ldap_slaves
alias LDAP Slaves
members ldap001v,ldap102v
}
Restart Nagios
Check the updated configuration on the Nagios server with:
[root@nagios CHECK_PROGRAMS]# ./check_nagios_configuration.sh
Nagios Core 4.0.8
Copyright (c) 2009-present Nagios Core Development Team and Community Contributors
Copyright (c) 1999-2009 Ethan Galstad
Last Modified: 08-12-2014
License: GPL
Website: http://www.nagios.org
Reading configuration data...
Read main config file okay...
Read object config files okay...
Running pre-flight check on configuration data...
Checking objects...
Warning: Service 'Check if Amadeus service is up and operational.' on host 'www.amadeus.net'
Checked 430 services.
Checked 99 hosts.
Checked 30 host groups.
Checked 2 service groups.
Checked 2 contacts.
Checked 2 contact groups.
Checked 69 commands.
Checked 5 time periods.
Checked 0 host escalations.
Checked 0 service escalations.
Checking for circular paths...
Checked 99 hosts
Checked 0 service dependencies
Checked 0 host dependencies
Checked 5 timeperiods
Checking global event handlers...
Checking obsessive compulsive processor commands...
Checking misc settings...
Total Warnings: 1
Total Errors: 0
Things look okay - No serious problems were detected during the pre-flight check
[root@nagios CHECK_PROGRAMS]#
Correct any errors and then restart Nagios:
[root@nagios CHECK_PROGRAMS]# service nagios restart
Running configuration check...
Stopping nagios: done.
Starting nagios: done.
You have new mail in /var/spool/mail/root
[root@nagios CHECK_PROGRAMS]#
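The check and the restart can also be chained so that a restart is attempted only when the configuration verifies cleanly. The following is a minimal sketch; restart_if_valid is a hypothetical name, and the location of nagios.cfg and the presence of the nagios binary on the PATH are assumptions to adjust for the local installation.

```shell
#!/bin/bash
# Assumed defaults; adjust both to the local installation.
VERIFY_CMD=(nagios -v /etc/nagios/nagios.cfg)
RESTART_CMD=(service nagios restart)

# Sketch only: run the verification and restart only if it succeeds.
restart_if_valid ()
{
    if "${VERIFY_CMD[@]}" > /dev/null; then
        "${RESTART_CMD[@]}"
    else
        echo "configuration check failed; nagios was not restarted" >&2
        return 1
    fi
}
```

The verification output is discarded here to keep the sketch short; when the check fails, rerun the verification command by hand to see the reported errors.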
Verify with Nagios Web Interface
Verify that the new service is working in the Nagios web interface.
Additional Testing
Make one of the clients fail by editing the remote script to generate CRITICAL.
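One way to do this without changing the check logic is a temporary override that is switched on and off by a marker file. The following is a minimal sketch; the function name apply_test_override and the path /tmp/force_critical are hypothetical, and the variable names RESULTS and DESCRIPTION follow the example script.

```shell
#!/bin/bash
# Hypothetical marker file: its presence forces a CRITICAL result.
FORCE_FILE='/tmp/force_critical'

# Sketch only: overwrite the real result while the marker file exists.
apply_test_override ()
{
    if [[ -e "${FORCE_FILE}" ]]; then
        RESULTS='failed'
        DESCRIPTION="forced failure for testing"
    fi
}
```

In the example script the call would go between RUN_LDAP_REPLICATION_CHECK and OUTPUT_RESULTS. Creating the marker file on the client (touch /tmp/force_critical) makes the next check report CRITICAL; removing the file restores normal behaviour. The override should be taken out of the script once testing is finished.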