Sunday, April 16, 2017

Painless Remote Access to monit...NOT!


Installing monit Was Easy.  Configuring & Setting Up Remote Access To It Wasn't...


Help!  Help!  My Server is Down...Or Is It?

For the past while, an "interesting" problem has challenged one of my servers.  At random intervals, the machine appeared to crash.  There was no rhyme or reason to the crash - it just stopped being available.  Investigating, I discovered that it hadn't crashed; instead, the machine had become totally bogged down and was operating at a near-halt, with sky-high CPU utilization and fully allocated memory.  You could log in, but the process was agonizingly slow (5+ minutes).

Seeing as this machine was using the LAMP stack on a cloud-based Virtual Private Server (VPS), there were many potential fault-points. I looked into the situation carefully, but was never able to really pin down the reason for the failure.  So I just scheduled a restart of the suspected systems with crontab -e and a few scripts and moved on.  No longer.
 

A Need For Better Configuration Monitoring & Management

The potential causes of this problem were many.  Because of the way this server was provisioned, there were candidate problem areas throughout the solution stack.  As a frame of reference, there may have been problems with:
 
1) The Hardware Layer
2) The Virtualization Layer
3) The Operating System Layer
4) The Application Layer

Seeking a richer troubleshooting model than the above, I eventually decided to use the OSI model to really go into troubleshooting the problem.

Troubleshooting Via The OSI Reference Model

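The OSI model has been around for a long, long time.  It describes communication between computer systems in terms of the following seven layers:

1) The Physical Layer
2) The Data Link Layer
3) The Network Layer
4) The Transport Layer
5) The Session Layer
6) The Presentation Layer
7) The Application Layer

(https://en.wikipedia.org/wiki/OSI_model)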

To be able to eliminate as many potential sources of the "freezing" as possible, I needed a tool to help me gather as much information as possible on those layers.  It would also be nice if that tool could tell me if my server was heading towards trouble, and maybe even help to manage the trouble when it arrived.  For those needs, monit seemed to fit the bill.

monit:  A Server Monitoring & Troubleshooting System


According to its own man page, monit is:

       monit is a utility for managing and monitoring processes, files, direc-
       tories and devices on a Unix system. Monit conducts automatic mainte-
       nance and repair and can execute meaningful causal actions in error
       situations. E.g. monit can start a process if it does not run, restart
       a process if it does not respond and stop a process if it uses too much
       resources. You may use monit to monitor files, directories and devices
       for changes, such as timestamps changes, checksum changes or size
       changes.


Installing monit

Installing monit was pretty easy on CentOS 7.  I did it with the following command:

[root@vps2]# yum install monit

Configuring monit

The monit configuration file is located at /etc/monit.conf.  Working with the file was "OK" but I think the developers could explain the innards of the file a little better - a little bit of documentation (with examples) goes a long way in Linux, as the LDP (of which I was a charter member) proved.  Many of the supplied values in monit.conf (checksum, file paths, logging strategy) were wrong for CentOS 7, so I had to change them - turning what could have been a 20-minute exercise into a half-day exercise.

Anyway, after some fiddling around, I discovered that the following configuration worked for my use case, which was to monitor the overall machine, as well as the web server status, from a remote location.  My settings are the uncommented lines; settings you will need to customize for your implementation appear in <angle brackets>:

###############################################################################
## Monit control file
###############################################################################
##
## Comments begin with a '#' and extend through the end of the line. Keywords
## are case insensitive. All path's MUST BE FULLY QUALIFIED, starting with '/'.
##
## Below you will find examples of some frequently used statements. For
## information about the control file and a complete list of statements and
## options, please have a look in the Monit manual.
##
##
###############################################################################
## Global section
###############################################################################
##
## Start Monit in the background (run as a daemon):
#
set daemon  60              # check services at 1-minute intervals
#   with start delay 240    # optional: delay the first check by 4-minutes (by
#                           # default Monit check immediately after Monit start)
#
#
## Set syslog logging with the 'daemon' facility. If the FACILITY option is
## omitted, Monit will use 'user' facility by default. If you want to log to
## a standalone log file instead, specify the full path to the log file
#
# set logfile syslog facility log_daemon
set logfile /var/log/monit.log
#
#
## Set the location of the Monit id file which stores the unique id for the
## Monit instance. The id is generated and stored on first Monit start. By
## default the file is placed in $HOME/.monit.id.
#
set idfile /var/monit/id
#
## Set the location of the Monit state file which saves monitoring states
## on each cycle. By default the file is placed in $HOME/.monit.state. If
## the state file is stored on a persistent filesystem, Monit will recover
## the monitoring state across reboots. If it is on temporary filesystem, the
## state will be lost on reboot which may be convenient in some situations.
#
set statefile /var/monit/state
#
## Set the list of mail servers for alert delivery. Multiple servers may be
## specified using a comma separator. If the first mail server fails, Monit
## will use the second mail server in the list and so on. By default Monit uses
## port 25 - it is possible to override this with the PORT option.
#
set mailserver <mailserver.domain>,          # primary mailserver
#                backup.bar.baz port 10025,  # backup mailserver on port 10025
                 localhost                   # fallback relay
#
#
## By default Monit will drop alert events if no mail servers are available.
## If you want to keep the alerts for later delivery retry, you can use the
## EVENTQUEUE statement. The base directory where undelivered alerts will be
## stored is specified by the BASEDIR option. You can limit the maximal queue
## size using the SLOTS option (if omitted, the queue is limited by space
## available in the back end filesystem).
#
# set eventqueue
#     basedir /var/monit  # set the base directory where events will be stored
#     slots 100           # optionally limit the queue size
#
#
## Send status and events to M/Monit (for more informations about M/Monit
## see http://mmonit.com/). By default Monit registers credentials with
## M/Monit so M/Monit can smoothly communicate back to Monit and you don't
## have to register Monit credentials manually in M/Monit. It is possible to
## disable credential registration using the commented out option below.
## Though, if safety is a concern we recommend instead using https when
## communicating with M/Monit and send credentials encrypted.
#
# set mmonit http://monit:monit@192.168.1.10:8080/collector
#     # and register without credentials     # Don't register credentials
#
#
## Monit by default uses the following format for alerts if the the mail-format
## statement is missing::
## --8<--
## set mail-format {
##      from: monit@$HOST
##   subject: monit alert --  $EVENT $SERVICE
##   message: $EVENT Service $SERVICE
##                 Date:        $DATE
##                 Action:      $ACTION
##                 Host:        $HOST
##                 Description: $DESCRIPTION
##
##            Your faithful employee,
##            Monit
## }
## --8<--
##
## You can override this message format or parts of it, such as subject
## or sender using the MAIL-FORMAT statement. Macros such as $DATE, etc.
## are expanded at runtime. For example, to override the sender, use:
#
# set mail-format { from: monit@foo.bar }
#
#
## You can set alert recipients whom will receive alerts if/when a
## service defined in this file has errors. Alerts may be restricted on
## events by using a filter as in the second example below.
#
set alert <you@mailserver.domain>                # receive all alerts
# set alert manager@foo.bar only on { timeout }  # receive just service-
#                                                # timeout alert
#
#
## Monit has an embedded web server which can be used to view status of
## services monitored and manage services from a web interface. See the
## Monit Wiki if you want to enable SSL for the web server.
#
set httpd port 2812                  # bind internal webserver to specified port
# use address <URL>                  # bind webserver to specific IP or URL
                                     # (commented binds to all interfaces)
 allow 0.0.0.0/0.0.0.0               # allow any machine to connect to the server
 allow <user>:<pass>                 # require specified user/pass
#    allow @monit                    # allow users of group 'monit' to connect (rw)
#    allow @users readonly           # allow users of group 'users' to connect readonly

###############################################################################
## Services
###############################################################################
##
## Check general system resources such as load average, cpu and memory
## usage. Each test specifies a resource, conditions and the action to be
## performed should a test fail.
#
check system <IP or URL>
     if loadavg (1min) > 4 then alert
     if loadavg (5min) > 2 then alert
     if memory usage > 75% then alert
     if swap usage > 25% then alert
     if cpu usage (user) > 70% then alert
     if cpu usage (system) > 30% then alert
     if cpu usage (wait) > 20% then alert
#
#
## Check if a file exists, checksum, permissions, uid and gid. In addition
## to alert recipients in the global section, customized alert can be sent to
## additional recipients by specifying a local alert handler. The service may
## be grouped using the GROUP option. More than one group can be specified by
## repeating the 'group name' statement.
#
  check file apache_bin with path /usr/sbin/httpd
#    if failed checksum and expect the sum <checksum> then unmonitor
     if failed permission 755 then unmonitor
     if failed uid root then unmonitor
     if failed gid root then unmonitor
     alert graham.leach@yougrow.net on { checksum, permission, uid, gid } with the mail-format { subject: Alarm! }
     group server
#
#
## Check that a process is running, in this case Apache, and that it respond
## to HTTP and HTTPS requests. Check its resource usage such as cpu and memory,
## and number of children. If the process is not running, Monit will restart
## it by default. In case the service is restarted very often and the
## problem remains, it is possible to disable monitoring using the TIMEOUT
## statement. This service depends on another service (apache_bin) which
## is defined above.
#
  check process apache with pidfile /var/run/httpd/httpd.pid
    start program = "/etc/init.d/httpd start" with timeout 60 seconds
    stop program  = "/etc/init.d/httpd stop"
    if cpu > 60% for 2 cycles then alert
    if cpu > 80% for 5 cycles then restart
    if totalmem > 1800.0 MB for 5 cycles then restart
    if children > 250 then restart
    if loadavg(5min) greater than 10 for 8 cycles then restart
#    if failed host www.tildeslash.com port 80 protocol http and request "/somefile.html" then restart
#    if failed port 443 type tcpssl protocol http with timeout 15 seconds then restart
#    if 3 restarts within 5 cycles then timeout
    depends on apache_bin
    group server
#
#
## Check filesystem permissions, uid, gid, space and inode usage. Other services,
## such as databases, may depend on this resource and an automatically graceful
## stop may be cascaded to them before the filesystem will become full and data
## lost.
#
#  check filesystem datafs with path /dev/sdb1
#    start program  = "/bin/mount /data"
#    stop program  = "/bin/umount /data"
#    if failed permission 660 then unmonitor
#    if failed uid root then unmonitor
#    if failed gid disk then unmonitor
#    if space usage > 80% for 5 times within 15 cycles then alert
#    if space usage > 99% then stop
#    if inode usage > 30000 then alert
#    if inode usage > 99% then stop
#    group server
#
#
## Check a file's timestamp. In this example, we test if a file is older
## than 15 minutes and assume something is wrong if its not updated. Also,
## if the file size exceed a given limit, execute a script
#
#  check file database with path /data/mydatabase.db
#    if failed permission 700 then alert
#    if failed uid data then alert
#    if failed gid data then alert
#    if timestamp > 15 minutes then alert
#    if size > 100 MB then exec "/my/cleanup/script" as uid dba and gid dba
#
#
## Check directory permission, uid and gid.  An event is triggered if the
## directory does not belong to the user with uid 0 and gid 0.  In addition,
## the permissions have to match the octal description of 755 (see chmod(1)).
#
#  check directory bin with path /bin
#    if failed permission 755 then unmonitor
#    if failed uid 0 then unmonitor
#    if failed gid 0 then unmonitor
#
#
## Check a remote host availability by issuing a ping test and check the
## content of a response from a web server. Up to three pings are sent and
## connection to a port and an application level network check is performed.
#
#  check host myserver with address 192.168.1.1
#    if failed icmp type echo count 3 with timeout 3 seconds then alert
#    if failed port 3306 protocol mysql with timeout 15 seconds then alert
#    if failed url http://user:password@www.foo.bar:8080/?querystring
#       and content == 'action="j_security_check"'
#       then alert
#
#
###############################################################################
## Includes
###############################################################################
##
## It is possible to include additional configuration parts from other files or
## directories.
#
include /etc/monit.d/* 
# 
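With so many commented-out template lines, it can be hard to see which settings are actually in force.  The following is a minimal sketch, assuming a POSIX shell with grep; the helper name active_directives is hypothetical and not part of monit, and the sample lines are taken from the listing above:

```shell
#!/bin/sh
# Print only the active lines of a monit control file, i.e. drop blank
# lines and lines whose first non-space character is '#'.
# active_directives is a hypothetical helper name, not a monit command.
active_directives() {
    grep -Ev '^[[:space:]]*(#|$)' "$1"
}

# Demo on a throwaway sample, so the real /etc/monit.conf is not touched.
sample=$(mktemp)
cat > "$sample" <<'EOF'
## Start Monit in the background (run as a daemon):
#
set daemon  60              # check services at 1-minute intervals
# set logfile syslog facility log_daemon
set logfile /var/log/monit.log

set httpd port 2812
 allow 0.0.0.0/0.0.0.0
EOF
active_directives "$sample"
rm -f "$sample"
```

On a real system the call would be active_directives /etc/monit.conf.  Running monit -t afterwards asks monit itself to syntax-check the control file.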
 

Testing monit Locally

Seeing as monit runs its own web server, testing from the local machine was pretty straightforward.  I simply fired up links and pointed it at the loopback URL:

[root@vps2]# links 127.0.0.1:2812

And here's what I saw:


So far so good!


Testing monit Remotely

But the problems started when I tried to connect to monit remotely. Here's what I saw when I tried to access it from my PC:




Help! Help!  I Am Unable To Access monit Remotely

Perplexed, I decided to see if monit was listening on the right interface for public access, which would be its public IP address, not the loopback address. To check if it was indeed listening on the right interface (ethN), I used the netstat command with a few parameters thrown in.  Here's what I saw:


[root@vps2]# netstat -tldpn
Active Internet connections (only servers)
Proto Recv-Q Send-Q Local Address               Foreign Address             State       PID/Program name
tcp        0      0 0.0.0.0:22                  0.0.0.0:*                   LISTEN      1351/sshd
tcp        0      0 0.0.0.0:25                  0.0.0.0:*                   LISTEN      1586/master
tcp        0      0 0.0.0.0:2812                0.0.0.0:*                   LISTEN      24250/monit
tcp        0      0 :::80                       :::*                        LISTEN      16000/httpd
tcp        0      0 :::443                      :::*                        LISTEN      16000/httpd 


A Quick Rundown Of Popular Linux Ports/Servers:
Here's a quick rundown of what ports/servers were open on this machine:

Port 22:    Secure Shell (sshd, to enable remote log in)
Port 25:    Mail (postfix, to enable sending & receiving mail)
Port 2812:  Server Status (monit, to help me diagnose problems & maintain uptime)
Port 80:    HTTP (apache, main purpose of server)
Port 443:   HTTP/S (apache, main purpose of server)


The netstat output indicated that monit was listening on ALL interfaces, so I shouldn't have had a problem accessing it...but I was.  Something was getting in the way.  That something was iptables, which needed to be told about monit.
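The reading of the netstat output above can be sketched as a small filter.  This is an illustration only, assuming a POSIX shell with awk and netstat-style text on standard input; listen_addrs and describe_scope are hypothetical helper names:

```shell
#!/bin/sh
# Print the local address of every listening TCP socket on the given port,
# reading netstat-style text from standard input.
listen_addrs() {
    awk -v port="$1" '
        $1 ~ /^tcp/ && $6 == "LISTEN" {
            n = split($4, parts, ":")   # the port is the text after the last colon
            if (parts[n] == port) print $4
        }'
}

# Translate a local address into a plain-language scope.
describe_scope() {
    case $1 in
        0.0.0.0:*|:::*) echo "all interfaces" ;;
        127.*|::1:*)    echo "loopback only" ;;
        *)              echo "specific address" ;;
    esac
}

# Demo using the monit line from the netstat output above.
addr=$(printf '%s\n' \
    'tcp        0      0 0.0.0.0:2812    0.0.0.0:*    LISTEN      24250/monit' |
    listen_addrs 2812)
echo "$addr: $(describe_scope "$addr")"
```

On a live machine the input would come from netstat -tln piped into listen_addrs 2812.  A result of "loopback only" would point at the monit configuration, while "all interfaces" points at something between the client and the daemon, such as a firewall.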


What is iptables?


According to its own man page, iptables is:

       Iptables  is  used  to  set  up, maintain, and inspect the tables of IP
       packet filter rules in the Linux kernel.  Several different tables  may
       be  defined.   Each  table contains a number of built-in chains and may
       also contain user-defined chains.


Configuring iptables for Remote Access to monit

Opening a port in iptables is usually a pretty trivial affair.  I have done it many times.  So for monit, I just entered the following command at the CLI to tell iptables to allow remote access to the monit port specified in /etc/monit.conf:


[root@vps2]# iptables -A INPUT -p tcp -m tcp --dport 2812 -j ACCEPT


But I still got this on the PC:



As it turns out, the fix was more subtle than I originally thought.  iptables evaluates the rules in a chain from top to bottom and acts on the first one that matches, so a rule appended with -A lands after the catch-all REJECT rule and is never reached.  The rule for monit needed to appear earlier in the rule set, before that REJECT.  So I ended up manually editing the iptables configuration file, located at /etc/sysconfig/iptables, and adding the rule as early as possible.

# Generated by iptables-save v1.4.7 on Sat Apr 15 09:12:50 2017
*filter
:INPUT ACCEPT [0:0]
:FORWARD ACCEPT [0:0]
:OUTPUT ACCEPT [30125:21061390]
-A INPUT -p icmp -j ACCEPT
-A INPUT -i lo -j ACCEPT
-A INPUT -m state --state RELATED,ESTABLISHED -j ACCEPT
-A INPUT -p tcp -m tcp --dport 2812 -j ACCEPT
-A INPUT -p tcp -m state --state NEW -m tcp --dport 80 -j ACCEPT
-A INPUT -p tcp -m state --state NEW -m tcp --dport 22 -j ACCEPT
-A INPUT -p tcp -m tcp --dport 443 -j ACCEPT
-A INPUT -i eth0 -p tcp -m tcp --dport 25 -m state --state NEW,ESTABLISHED -j ACCEPT
-A INPUT -j REJECT --reject-with icmp-host-prohibited
-A FORWARD -j REJECT --reject-with icmp-host-prohibited
-A OUTPUT -o eth0 -p tcp -m tcp --sport 25 -m state --state ESTABLISHED -j ACCEPT
COMMIT
# Completed on Sat Apr 15 09:12:50 2017 
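Whether a port's ACCEPT rule sits ahead of the catch-all REJECT can be checked mechanically.  The following is a rough sketch, assuming a POSIX shell with awk and a rules file in iptables-save format; accept_before_reject is a hypothetical helper, and the demo rules are a cut-down sample rather than my real rule set:

```shell
#!/bin/sh
# Succeeds (exit 0) if an ACCEPT rule for TCP port $2 appears before the
# first catch-all REJECT in the INPUT chain of the iptables-save file $1.
# accept_before_reject is a hypothetical helper, not an iptables command.
accept_before_reject() {
    awk -v port="$2" '
        /^-A INPUT -j REJECT/ { exit }     # nothing after this line is reached
        /^-A INPUT / && /-j ACCEPT/ && $0 ~ ("--dport " port "( |$)") {
            found = 1; exit
        }
        END { exit (found ? 0 : 1) }
    ' "$1"
}

# Demo on a throwaway file that mimics a rule appended with -A.
rules=$(mktemp)
cat > "$rules" <<'EOF'
*filter
-A INPUT -i lo -j ACCEPT
-A INPUT -j REJECT --reject-with icmp-host-prohibited
-A INPUT -p tcp -m tcp --dport 2812 -j ACCEPT
COMMIT
EOF
if accept_before_reject "$rules" 2812; then
    echo "port 2812 is reachable"
else
    echo "port 2812 is shadowed by the REJECT rule"
fi
rm -f "$rules"
```

An alternative to hand-editing the file, which I did not use here, is to insert the rule at the top of the running chain with iptables -I INPUT 1 -p tcp -m tcp --dport 2812 -j ACCEPT and then persist it with service iptables save.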

Configuration completed, I restarted iptables:



[root@vps2]# service iptables restart
iptables: Setting chains to policy ACCEPT: filter          [  OK  ]
iptables: Flushing firewall rules:                         [  OK  ]
iptables: Unloading modules:                               [  OK  ]
iptables: Applying firewall rules:                         [  OK  ]
[root@vps2]#

I then double-checked the iptables rule set, and the monit rule appeared where it should.  The only wrinkle is that iptables -L translates port numbers into service names using /etc/services, where port 2812 is registered as atmtcp, a legacy protocol that almost nobody uses (or even knows about) any more.

[root@vps2]# iptables -L
Chain INPUT (policy ACCEPT)
target     prot opt source               destination
ACCEPT     tcp  --  anywhere             anywhere            tcp dpt:ndmp
ACCEPT     tcp  --  anywhere             anywhere            tcp dpt:atmtcp
ACCEPT     all  --  anywhere             anywhere            state RELATED,ESTABLISHED
ACCEPT     tcp  --  anywhere             anywhere            state NEW tcp dpt:http
ACCEPT     icmp --  anywhere             anywhere
ACCEPT     all  --  anywhere             anywhere
ACCEPT     tcp  --  anywhere             anywhere            state NEW tcp dpt:ssh
REJECT     all  --  anywhere             anywhere            reject-with icmp-host-prohibited
ACCEPT     tcp  --  anywhere             anywhere            tcp dpt:https
ACCEPT     tcp  --  anywhere             anywhere            tcp dpt:smtp state NEW,ESTABLISHED

Chain FORWARD (policy ACCEPT)
target     prot opt source               destination
REJECT     all  --  anywhere             anywhere            reject-with icmp-host-prohibited

Chain OUTPUT (policy ACCEPT)
target     prot opt source               destination
ACCEPT     tcp  --  anywhere             anywhere            tcp spt:smtp state ESTABLISHED

Sadly, the developers of monit chose a port that was already mapped to another service (atmtcp).  This can lead to terrible confusion.  If, like me, you really don't like your servers being mis-identified, you can either change the port assignment for monit in the /etc/monit.conf file, or change the port mapping in /etc/services.  I have no intention of ever implementing atmtcp on this machine, so I changed the mapping in /etc/services:


 

Now the iptables output makes a bit more sense:

[root@vps2]# iptables -L | grep monit
ACCEPT     tcp  --  anywhere             anywhere            tcp dpt:monit
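The /etc/services lookup and rename can be sketched as follows.  This is an illustration under stated assumptions: a POSIX shell with awk, a services file in the usual "name port/proto" layout, and two hypothetical helpers (service_name_for and rename_service) that work on a copy rather than on /etc/services itself:

```shell
#!/bin/sh
# Print the service name(s) registered for PORT/PROTO in a services file.
# Usage: service_name_for 2812 tcp [FILE]   (FILE defaults to /etc/services)
service_name_for() {
    awk -v key="$1/$2" '$1 !~ /^#/ && $2 == key { print $1 }' "${3:-/etc/services}"
}

# Print FILE with the service name for PORT (tcp and udp) replaced by NEWNAME.
# The input file is left untouched; redirect the output to keep the result.
rename_service() {
    awk -v port="$2" -v name="$3" '
        $1 !~ /^#/ && ($2 == (port "/tcp") || $2 == (port "/udp")) {
            sub(/^[^ \t]+/, name)       # swap only the leading name field
        }
        { print }' "$1"
}

# Demo on a throwaway sample instead of the real /etc/services.
svc=$(mktemp)
cat > "$svc" <<'EOF'
ssh             22/tcp
atmtcp          2812/tcp                # atmtcp
atmtcp          2812/udp                # atmtcp
EOF
service_name_for 2812 tcp "$svc"
rename_service "$svc" 2812 monit > "$svc.new"
service_name_for 2812 tcp "$svc.new"
rm -f "$svc" "$svc.new"
```

If the names are not important to you, iptables -L -n skips the lookup altogether and prints the numeric port.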

A Successful Remote Connection to monit

After reconfiguring iptables, I refreshed my browser (by pressing CTRL-F5), and here's what I saw:


Finally, it's working!  Now, at least the initial stages of troubleshooting this problem have been solved.  In an upcoming article, I will discuss the root cause(s) of the "freezing" problem, and how I solved that situation as well.  (HINT:  It's all about the logfile located at /var/log/monit.log.)  A healthy logfile should look like this:


[HKT Apr 15 08:10:34] info     : Starting monit daemon with http interface at [*:2812]
[HKT Apr 15 08:10:34] info     : Starting monit HTTP server at [*:2812]
[HKT Apr 15 08:10:34] info     : monit HTTP server started
[HKT Apr 15 08:10:34] info     : '<domain>' Monit started

not this:


[HKT Apr 14 20:39:36] error    : 'apache' total mem amount of 1027608kB matches resource limit [total mem amount>1024003kB]
[HKT Apr 14 20:40:36] error    : 'apache' total mem amount of 1027608kB matches resource limit [total mem amount>1024003kB]
[HKT Apr 14 20:41:36] error    : 'apache' total mem amount of 1128548kB matches resource limit [total mem amount>1024003kB]
[HKT Apr 14 20:42:36] error    : 'apache' total mem amount of 1135176kB matches resource limit [total mem amount>1024003kB]
[HKT Apr 14 20:43:36] error    : 'apache' total mem amount of 1237728kB matches resource limit [total mem amount>1024003kB]
[HKT Apr 14 20:43:41] info     : 'apache' trying to restart
[HKT Apr 14 20:43:41] info     : 'apache' stop: /etc/init.d/httpd
[HKT Apr 14 20:43:41] info     : 'apache' start: /etc/init.d/httpd
[HKT Apr 14 20:44:42] info     : 'apache' 'apache' total mem amount check succeeded [current total mem amount=178220kB]
[HKT Apr 14 21:48:47] error    : 'apache' total mem amount of 1231992kB matches resource limit [total mem amount>1024003kB]
[HKT Apr 14 21:48:52] info     : 'apache' trying to restart
[HKT Apr 14 21:48:52] info     : 'apache' stop: /etc/init.d/httpd
[HKT Apr 14 21:48:53] info     : 'apache' start: /etc/init.d/httpd
[HKT Apr 14 21:49:53] error    : 'apache' process is not running
[HKT Apr 14 21:49:58] info     : 'apache' trying to restart
[HKT Apr 14 21:49:58] info     : 'apache' start: /etc/init.d/httpd
[HKT Apr 14 21:50:58] error    : 'apache' failed to start
[HKT Apr 14 21:52:03] error    : 'apache' process is not running
[HKT Apr 14 21:52:03] info     : 'apache' trying to restart
[HKT Apr 14 21:52:03] info     : 'apache' start: /etc/init.d/httpd
[HKT Apr 14 21:52:56] info     : 'apache' started
[HKT Apr 14 21:54:01] error    : 'apache' service restarted 3 times within 3 cycles(s) - unmonitor
[HKT Apr 14 21:57:08] info     : Shutting down monit HTTP server
[HKT Apr 14 21:57:08] info     : monit HTTP server stopped
[HKT Apr 14 21:57:08] info     : monit daemon with pid [13452] killed
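A long logfile like the one above is easier to digest when summarized.  Here is a minimal sketch, assuming a POSIX shell with awk and grep and the log line layout shown above; errors_only and restart_counts are hypothetical helper names, and the demo uses three lines copied from the log:

```shell
#!/bin/sh
# Show only the lines monit logged at the 'error' level.
errors_only() {
    grep -E '\] error +:' "$1"
}

# Count restart attempts per service.  The service name is the text
# between the first pair of single quotes on each matching line.
restart_counts() {
    awk -F"'" '/trying to restart/ { n[$2]++ }
               END { for (s in n) print s, n[s] }' "$1"
}

# Demo on a throwaway file holding a few lines from the log above.
log=$(mktemp)
cat > "$log" <<'EOF'
[HKT Apr 14 20:43:36] error    : 'apache' total mem amount of 1237728kB matches resource limit [total mem amount>1024003kB]
[HKT Apr 14 20:43:41] info     : 'apache' trying to restart
[HKT Apr 14 21:48:52] info     : 'apache' trying to restart
EOF
errors_only "$log"
restart_counts "$log"
rm -f "$log"
```

On the server itself, both helpers would be pointed at /var/log/monit.log.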

finis (for now)

REFERENCES


https://askubuntu.com/questions/640150/custom-port-names-for-netstat

https://crm.vpscheap.net/knowledgebase.php?action=displayarticle&id=29

https://en.wikipedia.org/wiki/OSI_model

https://www.centos.org/forums/viewtopic.php?t=9059


