Monitoring Linux Hosts with Nagios

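NRPE is an add-on for Nagios that executes plugins on remote Linux/Unix hosts. With NRPE and the Nagios plugins installed on a remote server, the Nagios monitoring platform can collect that server's local metrics, such as CPU load, memory usage and disk usage. In this article the monitoring side is called the Nagios server, and the remote monitored host is called the Nagios client.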

Nagios can monitor remote hosts in several ways, including SNMP, NRPE, SSH and NSCA. This article covers monitoring a remote Linux host through NRPE.

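NRPE (Nagios Remote Plugin Executor) is a daemon that runs check commands on a remote server. It lets the Nagios server trigger checks on the remote host through an agent installed there and returns the results to the Nagios server. Its overhead is much lower than that of SSH-based checks, and because a check does not require a system account on the remote host, it is also more secure than the SSH-based approach.
NRPE consists of two parts:

check_nrpe plugin: resides on the monitoring host

nrpe daemon: runs on the remote host and acts as the agent on the monitored side

Note: the nrpe daemon depends on the nagios-plugins package; without it the daemon cannot perform any checks.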

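When Nagios needs to check a service or resource on a remote Linux host:

First: Nagios runs the check_nrpe plugin and tells it what to check;

Second: check_nrpe connects to the remote NRPE daemon over SSL;

Third: the NRPE daemon runs the corresponding Nagios plugin to perform the check;

Finally: the NRPE daemon returns the result to check_nrpe, which hands it to Nagios for processing.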
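The last two steps rely on the usual Nagios plugin convention: a plugin prints one line of status text and reports the result through its exit code (0 for OK, 1 for WARNING, 2 for CRITICAL, 3 for UNKNOWN). The sketch below is a hypothetical shell function, check_threshold, that mimics this behaviour for a single integer value. It is not part of nagios-plugins, and real plugins define their own threshold semantics.

```shell
# Hypothetical helper: compare one integer against warning/critical thresholds
# and report the result the way a Nagios plugin does (status line + exit code).
check_threshold() {
    # $1 label, $2 measured value, $3 warning threshold, $4 critical threshold
    if [ "$#" -ne 4 ]; then
        echo "UNKNOWN - usage: check_threshold LABEL VALUE WARN CRIT"
        return 3
    fi
    if [ "$2" -ge "$4" ]; then
        echo "$1 CRITICAL - value is $2"
        return 2
    elif [ "$2" -ge "$3" ]; then
        echo "$1 WARNING - value is $2"
        return 1
    fi
    echo "$1 OK - value is $2"
    return 0
}
```

Calling check_threshold USERS 3 5 10 prints an OK line and returns 0; whatever text and exit code a plugin produces is what the NRPE daemon relays back to check_nrpe.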

I. Install nagios-plugins and NRPE on the monitored host
1. Add the nagios user

useradd -s /sbin/nologin nagios

2. Install nagios-plugins, which NRPE depends on

yum -y install gcc gcc-c++ make openssl openssl-devel   
tar xf nagios-plugins-2.0.3.tar.gz    
cd nagios-plugins-2.0.3   
./configure  --with-nagios-user=nagios --with-nagios-group=nagios   
make all && make install   
    
# Note: to monitor MySQL, add --with-mysql to the ./configure options

3. Install NRPE

tar xf nrpe-2.15.tar.gz    
cd nrpe-2.15   
./configure --with-nrpe-user=nagios  --with-nrpe-group=nagios  --with-nagios-user=nagios --with-nagios-group=nagios  --enable-command-args --enable-ssl   
 make all   
make install-plugin   
make install-daemon   
make install-daemon-config

4. Configure NRPE

vim /usr/local/nagios/etc/nrpe.cfg
log_facility=daemon
pid_file=/var/run/nrpe.pid
# Port the daemon listens on
server_port=5666
nrpe_user=nagios
nrpe_group=nagios
# Addresses allowed to connect, normally the Nagios server
allowed_hosts=192.168.110.157
     
dont_blame_nrpe=0   
allow_bash_command_substitution=0   
debug=0   
command_timeout=60   
connection_timeout=300   
command[check_users]=/usr/local/nagios/libexec/check_users -w 5 -c 10   
command[check_load]=/usr/local/nagios/libexec/check_load -w 15,10,5 -c 30,25,20   
command[check_hda1]=/usr/local/nagios/libexec/check_disk -w 20% -c 10% -p /dev/hda1  
command[check_zombie_procs]=/usr/local/nagios/libexec/check_procs -w 5 -c 10 -s Z   
command[check_total_procs]=/usr/local/nagios/libexec/check_procs -w 150 -c 200
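Each command[...] entry points at a plugin under /usr/local/nagios/libexec, so a plugin that was never installed only shows up when the check runs. As a rough sketch, the hypothetical helper missing_plugins below prints every plugin path referenced in an nrpe.cfg-style file that is not an executable file on this host. The function name and the parsing rule (the first word after "=" is the plugin path) are assumptions made for illustration.

```shell
# Hypothetical helper: list plugin paths referenced by command[...] entries
# in the given config file that do not exist as executables on this host.
missing_plugins() {
    # $1 path to an nrpe.cfg-style file
    # Extract the first word after "command[name]=" on uncommented lines.
    sed -n 's/^[[:space:]]*command\[[^]]*]=\([^[:space:]]*\).*/\1/p' "$1" |
    while IFS= read -r plugin; do
        [ -x "$plugin" ] || printf '%s\n' "$plugin"
    done
}
```

For example, missing_plugins /usr/local/nagios/etc/nrpe.cfg prints nothing when every referenced plugin is in place.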

5. Start NRPE

# Start in daemon mode
/usr/local/nagios/bin/nrpe -c /usr/local/nagios/etc/nrpe.cfg -d
netstat -tulpn | grep nrpe
tcp        0      0 0.0.0.0:5666                0.0.0.0:*                   LISTEN      22597/nrpe            
tcp        0      0 :::5666                     :::*                        LISTEN      22597/nrpe

nrpe can run in one of two modes:

-i        # Run as a service under inetd or xinetd   
-d        # Run as a standalone daemon

You can write an init script for nrpe so that it runs as a standalone daemon:

vim /etc/init.d/nrped
#!/bin/bash
# chkconfig: 2345 88 12
# description: NRPE DAEMON

# Paths to the nrpe binary and its configuration file
NRPE=/usr/local/nagios/bin/nrpe
NRPECONF=/usr/local/nagios/etc/nrpe.cfg

case "$1" in
    start)
        echo -n "Starting NRPE daemon..."
        "$NRPE" -c "$NRPECONF" -d
        echo " done."
        ;;
    stop)
        echo -n "Stopping NRPE daemon..."
        # Kill nrpe processes owned by the nagios user
        pkill -u nagios nrpe
        echo " done."
        ;;
    restart)
        "$0" stop
        sleep 2
        "$0" start
        ;;
    *)
        echo "Usage: $0 start|stop|restart"
        ;;
esac
exit 0
# Make the script executable and register it with chkconfig
chmod +x /etc/init.d/nrped
chkconfig --add nrped
chkconfig nrped on
    
service nrped start   
Starting NRPE daemon... done.   
netstat -tnlp   
Active Internet connections (only servers)   
Proto Recv-Q Send-Q Local Address               Foreign Address             State       PID/Program name      
tcp        0      0 0.0.0.0:22                  0.0.0.0:*                   LISTEN      1031/sshd             
tcp        0      0 127.0.0.1:25                0.0.0.0:*                   LISTEN      1108/master           
tcp        0      0 0.0.0.0:5666                0.0.0.0:*                   LISTEN      22597/nrpe            
tcp        0      0 :::22                       :::*                        LISTEN      1031/sshd             
tcp        0      0 ::1:25                      :::*                        LISTEN      1108/master           
tcp        0      0 :::5666                     :::*                        LISTEN      22597/nrpe
Alternatively, append the start command to /etc/rc.local so that nrpe starts at boot.
# echo "/usr/local/nagios/bin/nrpe -c /usr/local/nagios/etc/nrpe.cfg -d" >> /etc/rc.local

II. Install NRPE on the monitoring host
1. Install NRPE

# tar xf nrpe-2.15.tar.gz    
# cd nrpe-2.15   
# ./configure    --with-nrpe-user=nagios --with-nrpe-group=nagios  --with-nagios-user=nagios  --with-nagios-group=nagios  --enable-command-args  --enable-ssl   
# make all   
# make install-plugin   
    
# After installation, the check_nrpe plugin appears under libexec in the Nagios installation directory
# cd /usr/local/nagios/libexec/   
# ll -d check_nrpe    
-rwxrwxr-x. 1 nagios nagios 76769 Sep 28 08:07 check_nrpe

2. Using check_nrpe
Monitoring a remote Linux host through NRPE is done with the check_nrpe plugin. Its syntax is:

check_nrpe -H <host> [-n] [-u] [-p <port>] [-t <timeout>] [-c <command>] [-a <arglist...>]   
    
# ./check_nrpe -H 192.168.0.81   
NRPE v2.15

3. Define the command

# cd /usr/local/nagios/etc/objects/   
# vim commands.cfg    
# Append to the end of the file
define command{   
        command_name    check_nrpe   
        command_line    $USER1$/check_nrpe -H "$HOSTADDRESS$"  -c "$ARG1$" 
}
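To see what Nagios ends up running, it can help to expand the macros by hand. In a check_command such as check_nrpe!check_users, the text after the "!" becomes $ARG1$, $HOSTADDRESS$ is the address from the host definition, and $USER1$ is conventionally the plugin directory set in resource.cfg (assumed here to be /usr/local/nagios/libexec). The hypothetical helper expand_command_line below is only an illustration of that substitution, not how Nagios implements it, and it handles a single argument.

```shell
# Hypothetical helper: expand $USER1$, $HOSTADDRESS$ and $ARG1$ in a
# command_line template. Values containing "|", "&" or "\" are not handled.
expand_command_line() {
    # $1 template, $2 value for $USER1$, $3 host address, $4 check_command
    case $4 in
        *'!'*) _arg1=${4#*!} ;;   # text after the first "!" is $ARG1$
        *)     _arg1= ;;
    esac
    printf '%s\n' "$1" | sed \
        -e "s|[$]USER1[$]|$2|g" \
        -e "s|[$]HOSTADDRESS[$]|$3|g" \
        -e "s|[$]ARG1[$]|$_arg1|g"
}
```

With the template from the definition above, the directory /usr/local/nagios/libexec, the address 192.168.110.154 and check_nrpe!check_users, the result is a check_nrpe call with -H "192.168.110.154" and -c "check_users".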

4. Define the services

# cp localhost.cfg linhost.cfg
# vim linhost.cfg
define host{   
    use     linux-server       
    host_name   linhost    
    alias       My Linux Server      
    address     192.168.110.154  
    }   
define service{   
    use         generic-service   
    host_name       linhost   
    service_description CHECK USER   
    check_command       check_nrpe!check_users   
    }   
define service{   
    use         generic-service   
    host_name       linhost   
    service_description Load   
    check_command       check_nrpe!check_load   
    }   
define service{   
    use         generic-service   
    host_name       linhost   
    service_description SDA1   
    check_command       check_nrpe!check_hda1   
    }   
define service{   
    use         generic-service   
    host_name       linhost   
    service_description Zombie   
    check_command       check_nrpe!check_zombie_procs   
    }   
define service{   
    use         generic-service   
    host_name       linhost   
    service_description Total procs   
    check_command       check_nrpe!check_total_procs   
    }
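The service blocks above differ only in their description and in the NRPE command name, so they can also be generated instead of typed. A minimal sketch, assuming a hypothetical helper generate_services that takes the host name followed by "description|command" pairs; the pair format is an invention for this example.

```shell
# Hypothetical helper: print one "define service" block per
# "description|nrpe_command" pair for the given host.
generate_services() {
    _host=$1
    shift
    for _pair in "$@"; do
        _desc=${_pair%%|*}   # text before the first "|"
        _cmd=${_pair#*|}     # text after the first "|"
        printf 'define service{\n'
        printf '    use                 generic-service\n'
        printf '    host_name           %s\n' "$_host"
        printf '    service_description %s\n' "$_desc"
        printf '    check_command       check_nrpe!%s\n' "$_cmd"
        printf '    }\n'
    done
}
```

For instance, generate_services linhost 'Load|check_load' 'Zombie|check_zombie_procs' prints two blocks that can be appended to linhost.cfg after review.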

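One point deserves emphasis: the command that a service definition on the Nagios server passes to check_nrpe must be one of the command[...] entries defined in nrpe.cfg on the monitored host, as shown in the figure below.
[Figure: check commands defined in nrpe.cfg on the monitored host]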
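To find the names that may follow check_nrpe!, the command[...] entries can be listed on the monitored host. A minimal sketch, assuming a hypothetical helper named list_nrpe_commands and the nrpe.cfg path used earlier in this article:

```shell
# Hypothetical helper: print the name inside command[...] for every
# uncommented command definition in the given nrpe.cfg-style file.
list_nrpe_commands() {
    # $1 path to the config file, e.g. /usr/local/nagios/etc/nrpe.cfg
    sed -n 's/^[[:space:]]*command\[\([^]]*\)]=.*/\1/p' "$1"
}
```

Running list_nrpe_commands /usr/local/nagios/etc/nrpe.cfg with the configuration from section I should list check_users, check_load, check_hda1, check_zombie_procs and check_total_procs, one per line.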

5. Enable the defined commands and services

# vim /usr/local/nagios/etc/nagios.cfg
# Add this line
cfg_file=/usr/local/nagios/etc/objects/linhost.cfg

6. Check the configuration syntax

/usr/local/nagios/bin/nagios -v /usr/local/nagios/etc/nagios.cfg
Total Warnings: 0   
Total Errors:   0   
    
Things look okay - No serious problems were detected during the pre-flight check

7. Restart the nagios service

# service nagios restart

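8. Open the Nagios web interface
1) First click [Hosts] and check that the monitored host's status is UP.
2) Then click [Services] and check that each monitored service's status is OK.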
