Bug #454

Zombie (defunct) httpd

Added by gboudreau over 10 years ago. Updated over 10 years ago.

Status: New
Priority: Normal
Assignee:
Category: -
Target version: -
Start date: 02/28/2010
Due date:
% Done: 0%


Description

Twice in as many days now, my Apache (httpd) process became unresponsive.
Trying to load any page would result in the browser trying to load indefinitely, never timing out.

I'll reboot now, and see if that helps.

[gb@hda ~]$ service httpd status
httpd (pid  13930) is running...

[gb@hda ~]$ cat /etc/monit.d/httpd.conf 
# file automatically generated by amahi on Sat Jul 11 20:43:15 -0700 2009 - WARNING - any manual edits may be lost!
check process httpd with pidfile /var/run/httpd/httpd.pid
    start program = "/etc/init.d/httpd start" 
    stop  program = "/etc/init.d/httpd stop" 

[gb@hda ~]$ sudo cat /var/run/httpd/httpd.pid
13930

[gb@hda ~]$ ps aux | grep httpd | grep -v grep
root     13930  0.1  0.2 219864  3800 ?        Ss   Feb27   1:51 /usr/sbin/httpd
apache   24814  0.0  0.0      0     0 ?        Z    06:57   0:00 [httpd] <defunct>

[gb@hda ~]$ date
Sun Feb 28 19:09:17 EST 2010

[gb@hda ~]$ sudo tail -50 /var/log/httpd/error_log
[Sun Feb 28 06:56:01 2010] [notice] SIGHUP received.  Attempting to restart
[Sun Feb 28 06:56:02 2010] [notice] Digest: generating secret for digest authentication ...
[Sun Feb 28 06:56:02 2010] [notice] Digest: done
[Sun Feb 28 06:56:02 2010] [notice] Apache/2.2.14 (Unix) DAV/2 Phusion_Passenger/2.2.5 PHP/5.3.1 configured -- resuming normal operations
[Sun Feb 28 06:57:02 2010] [notice] SIGHUP received.  Attempting to restart
[Sun Feb 28 06:57:02 2010] [notice] Digest: generating secret for digest authentication ...
[Sun Feb 28 06:57:02 2010] [notice] Digest: done
[Sun Feb 28 06:57:02 2010] [notice] Apache/2.2.14 (Unix) DAV/2 Phusion_Passenger/2.2.5 PHP/5.3.1 configured -- resuming normal operations
*** glibc detected *** /usr/sbin/httpd: corrupted double-linked list: 0x00007f5bb7937fb0 ***
*** glibc detected *** /usr/sbin/httpd: corrupted double-linked list: 0x00007f5bb7937fb0 ***
*** glibc detected *** /usr/sbin/httpd: corrupted double-linked list: 0x00007f5bb7937fb0 ***
*** glibc detected *** /usr/sbin/httpd: corrupted double-linked list: 0x00007f5bb7937fb0 ***
*** glibc detected *** /usr/sbin/httpd: corrupted double-linked list: 0x00007f5bb7937fb0 ***
*** glibc detected *** /usr/sbin/httpd: corrupted double-linked list: 0x00007f5bb7937fb0 ***
*** glibc detected *** /usr/sbin/httpd: corrupted double-linked list: 0x00007f5bb7937fb0 ***
*** glibc detected *** /usr/sbin/httpd: corrupted double-linked list: 0x00007f5bb7937fb0 ***
[Sun Feb 28 06:58:04 2010] [warn] child process 24815 still did not exit, sending a SIGTERM
[Sun Feb 28 06:58:04 2010] [warn] child process 24816 still did not exit, sending a SIGTERM
[Sun Feb 28 06:58:04 2010] [warn] child process 24817 still did not exit, sending a SIGTERM
[Sun Feb 28 06:58:04 2010] [warn] child process 24818 still did not exit, sending a SIGTERM
[Sun Feb 28 06:58:04 2010] [warn] child process 24819 still did not exit, sending a SIGTERM
[Sun Feb 28 06:58:04 2010] [warn] child process 24820 still did not exit, sending a SIGTERM
[Sun Feb 28 06:58:04 2010] [warn] child process 24821 still did not exit, sending a SIGTERM
[Sun Feb 28 06:58:04 2010] [warn] child process 24822 still did not exit, sending a SIGTERM
[Sun Feb 28 06:58:06 2010] [warn] child process 24815 still did not exit, sending a SIGTERM
[Sun Feb 28 06:58:06 2010] [warn] child process 24816 still did not exit, sending a SIGTERM
[Sun Feb 28 06:58:06 2010] [warn] child process 24817 still did not exit, sending a SIGTERM
[Sun Feb 28 06:58:06 2010] [warn] child process 24818 still did not exit, sending a SIGTERM
[Sun Feb 28 06:58:06 2010] [warn] child process 24819 still did not exit, sending a SIGTERM
[Sun Feb 28 06:58:06 2010] [warn] child process 24820 still did not exit, sending a SIGTERM
[Sun Feb 28 06:58:06 2010] [warn] child process 24821 still did not exit, sending a SIGTERM
[Sun Feb 28 06:58:06 2010] [warn] child process 24822 still did not exit, sending a SIGTERM
[Sun Feb 28 06:58:08 2010] [warn] child process 24815 still did not exit, sending a SIGTERM
[Sun Feb 28 06:58:08 2010] [warn] child process 24816 still did not exit, sending a SIGTERM
[Sun Feb 28 06:58:08 2010] [warn] child process 24817 still did not exit, sending a SIGTERM
[Sun Feb 28 06:58:08 2010] [warn] child process 24818 still did not exit, sending a SIGTERM
[Sun Feb 28 06:58:08 2010] [warn] child process 24819 still did not exit, sending a SIGTERM
[Sun Feb 28 06:58:08 2010] [warn] child process 24820 still did not exit, sending a SIGTERM
[Sun Feb 28 06:58:08 2010] [warn] child process 24821 still did not exit, sending a SIGTERM
[Sun Feb 28 06:58:08 2010] [warn] child process 24822 still did not exit, sending a SIGTERM
[Sun Feb 28 06:58:10 2010] [error] child process 24815 still did not exit, sending a SIGKILL
[Sun Feb 28 06:58:10 2010] [error] child process 24816 still did not exit, sending a SIGKILL
[Sun Feb 28 06:58:10 2010] [error] child process 24817 still did not exit, sending a SIGKILL
[Sun Feb 28 06:58:10 2010] [error] child process 24818 still did not exit, sending a SIGKILL
[Sun Feb 28 06:58:10 2010] [error] child process 24819 still did not exit, sending a SIGKILL
[Sun Feb 28 06:58:10 2010] [error] child process 24820 still did not exit, sending a SIGKILL
[Sun Feb 28 06:58:10 2010] [error] child process 24821 still did not exit, sending a SIGKILL
[Sun Feb 28 06:58:10 2010] [error] child process 24822 still did not exit, sending a SIGKILL
[Sun Feb 28 06:58:11 2010] [notice] SIGHUP received.  Attempting to restart
*** glibc detected *** /usr/sbin/httpd: corrupted double-linked list: 0x00007f5bb7937fb0 ***

[gb@hda ~]$ tail /var/hda/web-apps/*/logs/access_log | grep "28/Feb/2010:" 
[gb@hda ~]$ tail /var/hda/web-apps/*/logs/error_log | grep "28/Feb/2010:" 
[gb@hda ~]$ 

[gb@hda ~]$ tail -20 /var/log/monit
[EST Feb 27 17:55:46] error    : 'named' process PID changed to 13619
[EST Feb 27 17:56:16] error    : 'named' process PID changed to 14287
[EST Feb 27 17:56:46] error    : 'named' process PID changed to 14524
[EST Feb 27 17:57:16] info     : 'named' process PID has not changed since last cycle
[EST Feb 28 08:04:21] info     : monit daemon with pid [18605] killed
[EST Feb 28 08:04:21] info     : 'hda.home.com' Monit stopped
[EST Feb 28 08:04:22] info     : 'hda.home.com' Monit started
[EST Feb 28 08:06:05] info     : monit daemon with pid [4493] killed
[EST Feb 28 08:06:05] info     : 'hda.home.com' Monit stopped
[EST Feb 28 08:06:05] info     : 'hda.home.com' Monit started
[EST Feb 28 08:06:05] error    : 'hdactl' process is not running
[EST Feb 28 08:06:05] info     : 'hdactl' trying to restart
[EST Feb 28 08:06:05] info     : 'hdactl' start: /etc/init.d/hdactl
[EST Feb 28 08:06:40] error    : 'smb' process PID changed to 6140
[EST Feb 28 08:06:40] error    : 'named' process PID changed to 6127
[EST Feb 28 08:06:40] info     : 'hdactl' process is running with pid 6037
[EST Feb 28 08:07:10] info     : 'smb' process PID has not changed since last cycle
[EST Feb 28 08:07:10] info     : 'named' process PID has not changed since last cycle

[gb@hda ~]$ uname -a
Linux hda.home.com 2.6.31.12-174.2.3.fc12.x86_64 #1 SMP Mon Jan 18 19:52:07 UTC 2010 x86_64 x86_64 x86_64 GNU/Linux

[gb@hda ~]$ rpm -q kernel
kernel-2.6.31.5-127.fc12.x86_64
kernel-2.6.31.12-174.2.3.fc12.x86_64
kernel-2.6.31.12-174.2.22.fc12.x86_64

[gb@hda ~]$ rpm -q glibc
glibc-2.11.1-1.x86_64

[gb@hda ~]$ rpm -q httpd
httpd-2.2.14-1.fc12.x86_64
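
The ps output above shows one defunct httpd child. A minimal sketch for listing only zombie processes together with their parent PID follows; the function name filter_zombies is hypothetical, and it assumes input in the column order produced by `ps -eo pid,ppid,stat,comm`.

```shell
# filter_zombies: read "PID PPID STAT COMMAND" lines on stdin and
# print only those whose state column starts with Z (zombie/defunct).
# The header line is dropped because "STAT" does not start with Z.
filter_zombies() {
    awk '$3 ~ /^Z/'
}

# Example (not run here):
#   ps -eo pid,ppid,stat,comm | filter_zombies
```

The PPID column shows which process has not yet reaped the zombie, which may help tell whether the main httpd process itself is stuck.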

History

#1 Updated by gboudreau over 10 years ago

I tried to grep in /var/log/* to see if something happened at 06:57 - 06:58, and found nothing.
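
A search like that could be sketched as below. The function name grep_window and the example pattern are assumptions; the pattern only covers logs that write the date as "Feb 28", and other timestamp formats would need a different expression.

```shell
# grep_window PATTERN DIR...: recursively search the given directories
# for lines matching the extended regex PATTERN, skipping binary files.
# Matching lines are printed prefixed with the file name.
grep_window() {
    pattern=$1
    shift
    grep -rIE -- "$pattern" "$@" 2>/dev/null
}

# Example (not run here): everything logged at 06:57 or 06:58 on Feb 28
#   sudo sh -c '. ./grep_window.sh; grep_window "Feb 28 06:5[78]" /var/log'
```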

#2 Updated by cpg over 10 years ago

Searching for the SIGTERM error, I see one person saying that having log files larger than 2 GB could be a problem: http://www.webhostingtalk.com/archive/index.php/t-522818.html

Also, "the error means that the child processes were not terminated, and a SIGTERM was sent.
This error can occur due to various reasons such as running buggy scripts, lack of space/memory, a rewrite loop, etc."

running something "edgy" lately?

#3 Updated by gboudreau over 10 years ago

Checked logs in /var/hda/web-apps/*/logs and /var/log/httpd/*; nothing bigger than 2MB.
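
For anyone repeating this check, a rough sketch using find is shown here. The function name files_over is hypothetical, and the byte threshold is simply 2 GB written out (2 * 1024^3 = 2147483648).

```shell
# files_over BYTES DIR...: print regular files under the given
# directories whose size is strictly greater than BYTES bytes.
files_over() {
    bytes=$1
    shift
    find "$@" -type f -size +"${bytes}c" 2>/dev/null
}

# Example (not run here): log files of 2 GB or more
#   files_over 2147483647 /var/log/httpd /var/hda/web-apps/*/logs
```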

Didn't do anything weird recently, I think.

I did set up auto-restart and email notification for when this happens next, so I might be able to see what was running on the httpd server that needed to be SIGTERMed to die...
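
One way such an auto-restart could be put together is sketched below. This is not necessarily the setup used here: the function name watchdog, the probe URL, the snapshot path and the use of the init script's restart action are all assumptions for illustration.

```shell
# watchdog PROBE RECOVER SNAPSHOT
#   PROBE    - shell command that succeeds when the server responds
#   RECOVER  - shell command to run when the probe fails
#   SNAPSHOT - file that receives the date and process list before recovery
# Returns 0 if the probe succeeded, 1 if recovery was triggered.
watchdog() {
    probe=$1
    recover=$2
    snapshot=$3
    if sh -c "$probe" >/dev/null 2>&1; then
        return 0
    fi
    # Record what was running before anything gets restarted.
    { date; ps aux; } >"$snapshot" 2>&1
    sh -c "$recover"
    return 1
}

# Example from cron (not run here); the probe gives up after 10 seconds:
#   watchdog "curl -s -o /dev/null --max-time 10 http://localhost/" \
#            "/etc/init.d/httpd restart" /tmp/httpd-hang.txt \
#     || mail -s "httpd restarted" root < /tmp/httpd-hang.txt
```

Taking the snapshot before the restart matters, since the restart would remove the processes one wants to inspect.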

#4 Updated by repat over 10 years ago

[root@hda ~]# tail -50 /var/log/httpd/error_log
[Mon Mar 01 04:00:03 2010] [error] avahi_entry_group_add_service_strlst("phpsysinfo") failed: Invalid host name
[Mon Mar 01 04:00:03 2010] [error] avahi_entry_group_add_service_strlst("ssh") failed: Invalid host name
[Mon Mar 01 04:00:03 2010] [error] avahi_entry_group_add_service_strlst("b4rt") failed: Invalid host name
[Mon Mar 01 04:00:03 2010] [error] avahi_entry_group_add_service_strlst("webmail") failed: Invalid host name
[Mon Mar 01 04:00:03 2010] [error] avahi_entry_group_add_service_strlst("webmail") failed: Invalid host name
[Mon Mar 01 04:00:03 2010] [error] avahi_entry_group_add_service_strlst("webmin") failed: Invalid host name
[Mon Mar 01 04:00:03 2010] [error] avahi_entry_group_add_service_strlst("calendar") failed: Invalid host name
[Mon Mar 01 04:00:03 2010] [error] avahi_entry_group_add_service_strlst("hda") failed: Invalid host name
*** glibc detected *** /usr/sbin/httpd: corrupted double-linked list: 0x00007f460177c2d0 ***
*** glibc detected *** /usr/sbin/httpd: corrupted double-linked list: 0x00007f460177c2d0 ***
*** glibc detected *** /usr/sbin/httpd: corrupted double-linked list: 0x00007f460177c2d0 ***
*** glibc detected *** /usr/sbin/httpd: corrupted double-linked list: 0x00007f460177c2d0 ***
*** glibc detected *** /usr/sbin/httpd: corrupted double-linked list: 0x00007f460177c2d0 ***
*** glibc detected *** /usr/sbin/httpd: corrupted double-linked list: 0x00007f460177c2d0 ***
*** glibc detected *** /usr/sbin/httpd: corrupted double-linked list: 0x00007f460177c2d0 ***
*** glibc detected *** /usr/sbin/httpd: corrupted double-linked list: 0x00007f460177c2d0 ***
[Mon Mar 01 04:01:05 2010] [warn] child process 12890 still did not exit, sending a SIGTERM
[Mon Mar 01 04:01:05 2010] [warn] child process 12891 still did not exit, sending a SIGTERM
[Mon Mar 01 04:01:05 2010] [warn] child process 12892 still did not exit, sending a SIGTERM
[Mon Mar 01 04:01:05 2010] [warn] child process 12893 still did not exit, sending a SIGTERM
[Mon Mar 01 04:01:05 2010] [warn] child process 12894 still did not exit, sending a SIGTERM
[Mon Mar 01 04:01:05 2010] [warn] child process 12895 still did not exit, sending a SIGTERM
[Mon Mar 01 04:01:05 2010] [warn] child process 12896 still did not exit, sending a SIGTERM
[Mon Mar 01 04:01:05 2010] [warn] child process 12898 still did not exit, sending a SIGTERM
[Mon Mar 01 04:01:07 2010] [warn] child process 12890 still did not exit, sending a SIGTERM
[Mon Mar 01 04:01:07 2010] [warn] child process 12891 still did not exit, sending a SIGTERM
[Mon Mar 01 04:01:07 2010] [warn] child process 12892 still did not exit, sending a SIGTERM
[Mon Mar 01 04:01:07 2010] [warn] child process 12893 still did not exit, sending a SIGTERM
[Mon Mar 01 04:01:07 2010] [warn] child process 12894 still did not exit, sending a SIGTERM
[Mon Mar 01 04:01:07 2010] [warn] child process 12895 still did not exit, sending a SIGTERM
[Mon Mar 01 04:01:07 2010] [warn] child process 12896 still did not exit, sending a SIGTERM
[Mon Mar 01 04:01:07 2010] [warn] child process 12898 still did not exit, sending a SIGTERM
[Mon Mar 01 04:01:09 2010] [warn] child process 12890 still did not exit, sending a SIGTERM
[Mon Mar 01 04:01:09 2010] [warn] child process 12891 still did not exit, sending a SIGTERM
[Mon Mar 01 04:01:09 2010] [warn] child process 12892 still did not exit, sending a SIGTERM
[Mon Mar 01 04:01:09 2010] [warn] child process 12893 still did not exit, sending a SIGTERM
[Mon Mar 01 04:01:09 2010] [warn] child process 12894 still did not exit, sending a SIGTERM
[Mon Mar 01 04:01:09 2010] [warn] child process 12895 still did not exit, sending a SIGTERM
[Mon Mar 01 04:01:09 2010] [warn] child process 12896 still did not exit, sending a SIGTERM
[Mon Mar 01 04:01:09 2010] [warn] child process 12898 still did not exit, sending a SIGTERM
[Mon Mar 01 04:01:11 2010] [error] child process 12890 still did not exit, sending a SIGKILL
[Mon Mar 01 04:01:11 2010] [error] child process 12891 still did not exit, sending a SIGKILL
[Mon Mar 01 04:01:11 2010] [error] child process 12892 still did not exit, sending a SIGKILL
[Mon Mar 01 04:01:11 2010] [error] child process 12893 still did not exit, sending a SIGKILL
[Mon Mar 01 04:01:11 2010] [error] child process 12894 still did not exit, sending a SIGKILL
[Mon Mar 01 04:01:11 2010] [error] child process 12895 still did not exit, sending a SIGKILL
[Mon Mar 01 04:01:11 2010] [error] child process 12896 still did not exit, sending a SIGKILL
[Mon Mar 01 04:01:11 2010] [error] child process 12898 still did not exit, sending a SIGKILL
[Mon Mar 01 04:01:12 2010] [notice] SIGHUP received. Attempting to restart
*** glibc detected *** /usr/sbin/httpd: corrupted double-linked list: 0x00007f46017c8e10 ***

#5 Updated by gboudreau over 10 years ago

[gb@hda ~]$ mysql -u root -phda -e "select name, identifier, version from apps" hda_production
+-----------------------+------------+------------+
| name                  | identifier | version    |
+-----------------------+------------+------------+
| Greyhole              | r5o2s731iy | 0.5.14     |
| RPM Fusion (Free)     | 14fp5m2ggl | 8.3        |
| RPM Fusion (Non-Free) | gn6lhxp71j | 8.3        |
| DLNA                  | ogiaus92x5 | 1.0.16.2   |
| Agedashi Theme        | woicmpgat7 | 1.2        |
| phpMyAdmin            | fqzjgts9lz | 3.2.5      |
| CrashPlan             | sp2uf8jdkx | 2009-11-05 |
| SABnzbd               | wsl3fc8t0d | 0.5.0      |
| Amahi Web-Apps Proxy  | hstmf5jpiy | 1.0        |
| Transmission          | csjnit42fq | 1.51       |
+-----------------------+------------+------------+

#6 Updated by gboudreau over 10 years ago

It seems it's more widespread than expected...
rampage357 had it on Feb 21, but didn't notice: http://paste.amahi.org/m104e7506
repat had it on Feb 26 for the first time: http://paste.amahi.org/f669a2dce
I had it multiple times, starting Feb 15: http://paste.amahi.org/f7f981e3b
cpg had it too

#7 Updated by gboudreau over 10 years ago

Applications list from repat:

[root@home ~]# mysql -u root -phda -e "select name, identifier, version from apps" hda_production
+-----------------------+------------+------------+
| name                  | identifier | version    |
+-----------------------+------------+------------+
| phpMan                | iam5uzogwq | 1.4.2      |
| Scientific Calculator | 5cxo0y5j4b | 1.3.2      |
| iHama Candy Theme     | l39kp2rguy | 1.5        |
| iHama Theme           | xi2w337034 | 1.8        |
| Agedashi Theme        | woicmpgat7 | 1.2        |
| Home Inventory        | 041zd3lxkn | 1.2.0      |
| phpRecipeBook         | wu3inl696n | 2.40       |
| phpMyAdmin            | fqzjgts9lz | 3.2.3      |
| Webmin                | uicm8zt4hf | 1.500      |
| phpSysInfo            | dek8819qts | 3.0        |
| phpMyBackup           | dj514kza2c | 2.1        |
| RPM Fusion (Free)     | 14fp5m2ggl | 8.3        |
| RPM Fusion (Non-Free) | gn6lhxp71j | 8.3        |
| php Address Book      | dw68cs3k77 | 5.4.6      |
| Munin                 | awlg02c32j | 1.4.3      |
| DLNA                  | ogiaus92x5 | 1.0.16.2   |
| WebChess              | mzokut93gf | 2.0.2      |
| Transmission          | csjnit42fq | 1.51       |
| Amahi Mail System     | e8uapwbzci | 1.75       |
| Clonezilla            | c5rjmqz3jc | 1.2.3-27   |
| OpenVPN ALS           | 2ub5eewxiz | 0.9.1      |
| Joomla                | ku3gjw3pds | 1.5.12     |
| CrashPlan             | sp2uf8jdkx | 2009-11-05 |
+-----------------------+------------+------------+
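
With app lists from more than one affected machine, the identifiers they share can be narrowed down mechanically. The following is only a sketch: the function name common_apps and the file names are hypothetical, and each input file is assumed to hold one identifier per line.

```shell
# common_apps FILE_A FILE_B: print identifiers that appear in both files.
# Each file is de-duplicated first, so any line that then occurs twice
# in the combined stream must be present in both inputs.
common_apps() {
    { sort -u "$1"; sort -u "$2"; } | sort | uniq -d
}

# Example (not run here): dump only the identifier column on each box
# (-N drops the column header, -B gives plain tab-separated output),
# then compare the two files:
#   mysql -u root -phda -N -B -e "select identifier from apps" hda_production > mine.txt
#   common_apps mine.txt theirs.txt
```

Comparing on identifier rather than name avoids mismatches from spelling, but it ignores version differences (phpMyAdmin, for example, is 3.2.5 in one list above and 3.2.3 in the other).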
