Bug #1071

monit process consumes a lot of cpu

Added by inquam about 7 years ago. Updated about 7 years ago.

Status: Assigned
Priority: Medium
Assignee:
Category: -
Target version:
Start date: 08/07/2013
Due date:
% Done: 0%


Description

The monit process on F19 Amahi 7 consumes a lot of CPU cycles, making the entire system sluggish.

Running top I frequently see it consuming >50% cpu.

PID   USER PR NI VIRT   RES  SHR  S %CPU %MEM TIME+     COMMAND
31514 root 20 0  131512 2152 1296 R 57.7 0.0  693:38.85 monit

Not sure what is causing this.

History

#1 Updated by inquam about 7 years ago

I might have stumbled onto the reason monit consumes so much CPU.
Servers created on the HDA seem to use monit for the watchdog functionality,
and monit is configured to look at /var/run/servername.pid. An example is:

Process Name          = sickbeard
Pid file = /var/run/sickbeard.pid
Monitoring mode = active
Start program = 'systemctl start sickbeard.service' timeout 30 second(s)
Stop program = 'systemctl stop sickbeard.service' timeout 30 second(s)
Existence = if does not exist 1 times within 1 cycle(s) then restart else if succeeded 1 times within 1 cycle(s) then alert
Pid = if changed 1 times within 1 cycle(s) then alert
Ppid = if changed 1 times within 1 cycle(s) then alert

But most apps run as the apache user, and that user doesn't have write access to /var/run. A lot of apps therefore store their pids under /var/run/servername/servername.pid, and the init script gives apache read/write access to /var/run/servername. Some apps even have this built into their "executable", which just takes a pid name and stores it in this location. Because of this, monit will never find the pid file for this application, which is stored in /var/run/sickbeard/sickbeard.pid. If monit is set to restart an application it "thinks" isn't running, it will try to restart these applications all the time although they are already running. Apps that allow multiple instances to run pose an even bigger problem, since they would start new processes all the time until the system eventually dies.
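The mismatch between the two pid file layouts can be checked by hand. The following is only a minimal sketch, assuming the two locations named in this comment (/var/run/NAME.pid and /var/run/NAME/NAME.pid). The function names and the run_dir parameter are hypothetical and are not part of monit or the HDA platform.

```python
import os
from pathlib import Path


def candidate_pid_files(name, run_dir="/var/run"):
    """Return the two pid file locations discussed in this thread."""
    base = Path(run_dir)
    return [base / f"{name}.pid", base / name / f"{name}.pid"]


def read_pid(path):
    """Return the positive pid stored in path, or None if missing or malformed."""
    try:
        pid = int(Path(path).read_text().strip())
    except (OSError, ValueError):
        return None
    return pid if pid > 0 else None


def is_running(pid):
    """Best-effort liveness check: signal 0 only probes, it sends nothing."""
    try:
        os.kill(pid, 0)
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # the process exists but belongs to another user
    return True


def locate(name, run_dir="/var/run"):
    """Return (path, pid, alive) for the first candidate holding a pid, else None."""
    for path in candidate_pid_files(name, run_dir):
        pid = read_pid(path)
        if pid is not None:
            return path, pid, is_running(pid)
    return None
```

Calling locate("sickbeard") on an affected box would be expected to report the file under /var/run/sickbeard/, which is the location the generated monit check never looks at.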

Turning off the watchdog functionality for each server that had its pid file stored in this fashion reduced the load from monit quite a bit. I'm not sure whether that was simply because I disabled a few servers or because these servers stored their pid files in this particular location. But it's worth investigating.

My recommendation is that all apps (if it is decided that they run as the apache user) should have their pids placed in /var/run/servername/servername.pid as a standard, so HDA/monit can be configured to assume this and avoid permission issues with the /var/run folder. Looking at /var/run on an F19 system, it is obvious that only applications run as root place their pid files directly in /var/run; all other applications have subfolders that the running user has access to.
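To make the proposed convention concrete, here is a rough sketch of how a monit check could be generated for the /var/run/servername/servername.pid layout. It is not the platform's actual generator: the function name is hypothetical, and the /usr/bin/systemctl path is an assumption about where systemctl lives on the target system. The emitted lines use ordinary monit control-file statements (check process ... with pidfile, start program, stop program, if does not exist then restart).

```python
def monit_stanza(name, run_dir="/var/run", systemctl="/usr/bin/systemctl"):
    """Render a monit check that reads the pid file from a per-server subfolder.

    The systemctl default is an assumed location, not taken from the platform.
    """
    pidfile = f"{run_dir}/{name}/{name}.pid"
    lines = [
        f"check process {name} with pidfile {pidfile}",
        f'  start program = "{systemctl} start {name}.service" with timeout 30 seconds',
        f'  stop program = "{systemctl} stop {name}.service" with timeout 30 seconds',
        "  if does not exist then restart",
    ]
    return "\n".join(lines) + "\n"
```

For example, monit_stanza("sickbeard") yields a check whose pidfile is /var/run/sickbeard/sickbeard.pid, matching where the app writes it.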

Also, I'm not sure how often monit checks the system; that interval could perhaps be lengthened a bit to reduce some of the load.

#2 Updated by cpg about 7 years ago

I think yours are valid points, however, it should be considered a serious bug in monit that it uses so much CPU to merely check on the presence of a PID.

for this, we may want to either:

- file a bug with redhat if it's a packaging issue in monit
- get the latest monit to see if it fixes it and we distribute it

in the mean time, we need to be able to specify the proper PID files for servers. I vaguely remember we had a notation for that!

#3 Updated by inquam about 7 years ago

cpg wrote:

I think yours are valid points, however, it should be considered a serious bug in monit that it uses so much CPU to merely check on the presence of a PID.

for this, we may want to either:

- file a bug with redhat if it's a packaging issue in monit
- get the latest monit to see if it fixes it and we distribute it

in the mean time, we need to be able to specify the proper PID files for servers. I vaguely remember we had a notation for that!

Yes, that is having them in /var/run/server.pid (where server is the Amahi name of the application server). But since apache doesn't have write access to that folder, we have a problem. Where the init script is responsible for writing and deleting the pid file, this is not an issue, since the init script is run by root. But for some ruby apps and similar, if I'm not mistaken, the app itself is responsible for creating the pid file, and since the application is executed as apache it cannot write to /var/run. Instead, most of these have an init script that creates a folder under /var/run owned by the user the application runs as (or at least readable and writable by that user). The best solution for now would probably be to configure monit to look in /var/run/server/server.pid and update the apps accordingly. The monit part is probably part of the platform, so that is what could take some time, not to implement but to push out. The apps (for F19) could probably be updated in an hour or so.

I THINK that the issue with monit is twofold. The fact that it checks for the pid is not the whole reason it's so CPU-intensive. The bigger factor is probably that it tries to restart the application in question each time it doesn't find the pid.

A workaround would be to modify /var/run to allow apache to write to it, but I don't think that is such a good idea.

#4 Updated by cpg about 7 years ago

We released a fix in hda-platform 7.1.0 for a bug that caused platform-generated monit conf files to have systemctl without a full path.

This in turn caused monit to spin its wheels because it could not find systemctl (why not have sensible paths?!).
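A quick way to spot this class of problem in generated conf files is to scan for start/stop commands whose executable is not an absolute path. The sketch below is illustrative only: it assumes the commands are written as quoted strings on start program / stop program lines, and the function name is hypothetical rather than anything shipped in hda-platform.

```python
import re

# Matches lines such as:  start program = "systemctl start foo.service"
PROGRAM_LINE = re.compile(
    r'^\s*(start|stop)\s+program\s*=?\s*(["\'])(?P<cmd>.*?)\2',
    re.IGNORECASE,
)


def relative_programs(conf_text):
    """Return (line_number, command) pairs whose executable lacks a full path."""
    hits = []
    for number, line in enumerate(conf_text.splitlines(), start=1):
        match = PROGRAM_LINE.match(line)
        if not match:
            continue
        command = match.group("cmd").strip()
        executable = command.split()[0] if command else ""
        if not executable.startswith("/"):
            hits.append((number, command))
    return hits
```

Running it over the text of each platform-generated conf file would list every line that still calls a bare systemctl.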

There are probably other issues with monit and CPU use.

Plus it's getting old and cranky enough for us to start planning to replace it.

There is a ruby-friendly system for this called God.

#5 Updated by cpg about 7 years ago

  • Status changed from New to Assigned
  • Assignee set to cpg
  • Priority changed from Normal to Medium
