Bug #769

Millions of files in the spool dir cause memory issues - was Re: Greyhole in a loop

Added by JohnWhitmore about 10 years ago. Updated about 10 years ago.

Status: Closed
Priority: High
Assignee: -
Target version: -
Start date: 02/17/2011
Due date:
% Done: 0%

Description

When looking at the log by typing the following:

tail -f /var/log/greyhole.log

I see the following:

Feb 17 22:17:51 6 daemon: Optimizing MySQL tables...
Feb 17 22:17:52 6 daemon: Greyhole (version 0.9.0) daemon started.
Feb 17 22:17:52 7 daemon: Loading graveyard backup directories...
Feb 17 22:17:52 7 daemon:   Found 2 directories in the settings table.
Feb 17 22:18:22 6 daemon: Optimizing MySQL tables...
Feb 17 22:18:23 6 daemon: Greyhole (version 0.9.0) daemon started.
Feb 17 22:18:23 7 daemon: Loading graveyard backup directories...
Feb 17 22:18:23 7 daemon:   Found 2 directories in the settings table.
Feb 17 22:18:53 6 daemon: Optimizing MySQL tables...
Feb 17 22:18:54 6 daemon: Greyhole (version 0.9.0) daemon started.
Feb 17 22:18:54 7 daemon: Loading graveyard backup directories...
Feb 17 22:18:54 7 daemon:   Found 2 directories in the settings table.
Feb 17 22:19:24 6 daemon: Optimizing MySQL tables...
Feb 17 22:19:25 6 daemon: Greyhole (version 0.9.0) daemon started.
Feb 17 22:19:25 7 daemon: Loading graveyard backup directories...
Feb 17 22:19:25 7 daemon:   Found 2 directories in the settings table.
Feb 17 22:19:55 6 daemon: Optimizing MySQL tables...
Feb 17 22:19:56 6 daemon: Greyhole (version 0.9.0) daemon started.
Feb 17 22:19:56 7 daemon: Loading graveyard backup directories...
Feb 17 22:19:56 7 daemon:   Found 2 directories in the settings table.

This keeps repeating.

I have tried restarting the service by typing:

[root@server ~]# service greyhole stop
Shutting down Greyhole:                                    [  OK  ]
[root@server ~]# service greyhole start
Starting Greyhole ...                                      [  OK  ]
[root@server ~]# chkconfig greyhole on
[root@server ~]# PHP Fatal error:  Allowed memory size of 134217728 bytes exhausted (tried to allocate 71 bytes) in /usr/bin/greyhole on line 2563

When I look at the line by typing:

[root@server ~]# gedit /usr/bin/greyhole

These are the lines in question:

    $new_tasks = 0;
    $last_line = FALSE;
    $act = FALSE;
    while (TRUE) {
        if (count(glob("/var/spool/greyhole/*")) === 0) {    // <=== problem line
            break;
        }

Any idea of the cause?

History

#1 Updated by cpg about 10 years ago

Please include the output of:

free

df -h

and

fpaste --sysinfo

(make sure there is nothing private there you do not want to share)

#2 Updated by cpg about 10 years ago

Same with:

ls /var/spool/greyhole/ | wc -l

#3 Updated by JohnWhitmore about 10 years ago

[root@server ~]# free
             total       used       free     shared    buffers     cached
Mem:       3087192    1139528    1947664          0      92508     305140
-/+ buffers/cache:     741880    2345312
Swap:     10256380          0   10256380
[root@server ~]# df -h
Filesystem            Size  Used Avail Use% Mounted on
/dev/mapper/vg_server-lv_root
                       49G   11G   36G  23% /
tmpfs                 1.5G  100K  1.5G   1% /dev/shm
/dev/sda1             485M   49M  411M  11% /boot
/dev/mapper/vg_server-lv_home
                       39G  257M   37G   1% /home
/dev/sda3             362G  344G  4.0K 100% /var/hda/files
/dev/sdb1             917G  759G  112G  88% /var/hda/files/drives/drive1
/dev/sdc1             917G  761G  110G  88% /var/hda/files/drives/drive2
/dev/sdd1             1.8T  1.6T  112G  94% /var/hda/files/drives/drive3
/dev/sde1             1.8T  1.6T  112G  94% /var/hda/files/drives/drive4
/dev/sdf1             917G  760G  112G  88% /var/hda/files/drives/drive5
/dev/sdg1             917G  759G  112G  88% /var/hda/files/drives/drive6
/dev/sdh1             1.4T  1.2T  112G  92% /var/hda/files/drives/drive7
[root@server ~]#
[root@server ~]# uname -r; rpm -q samba hda-greyhole
2.6.35.11-83.fc14.x86_64
samba-3.5.6-71.fc14.x86_64
hda-greyhole-0.9.0-1.x86_64
[root@server ~]# yum -y install fpaste
Loaded plugins: fastestmirror, langpacks, presto, refresh-packagekit
Adding en_NZ to language list
Loading mirror speeds from cached hostfile
 * fedora: ucmirror.canterbury.ac.nz
 * updates: ucmirror.canterbury.ac.nz
Setting up Install Process
Package fpaste-0.3.5-1.fc14.noarch already installed and latest version
Nothing to do
[root@server ~]# fpaste /etc/samba/smb.conf
Uploading (6.8K)...
http://fpaste.org/gP8n/
[root@server ~]# fpaste /etc/greyhole.conf
Uploading (2.0K)...
http://fpaste.org/jkHo/
[root@server ~]# mount
/dev/mapper/vg_server-lv_root on / type ext4 (rw)
proc on /proc type proc (rw)
sysfs on /sys type sysfs (rw)
devpts on /dev/pts type devpts (rw,gid=5,mode=620)
tmpfs on /dev/shm type tmpfs (rw)
/dev/sda1 on /boot type ext4 (rw)
/dev/mapper/vg_server-lv_home on /home type ext4 (rw)
/dev/sda3 on /var/hda/files type ext4 (rw)
/dev/sdb1 on /var/hda/files/drives/drive1 type ext4 (rw)
/dev/sdc1 on /var/hda/files/drives/drive2 type ext4 (rw)
/dev/sdd1 on /var/hda/files/drives/drive3 type ext4 (rw)
/dev/sde1 on /var/hda/files/drives/drive4 type ext4 (rw)
/dev/sdf1 on /var/hda/files/drives/drive5 type ext4 (rw)
/dev/sdg1 on /var/hda/files/drives/drive6 type ext4 (rw)
/dev/sdh1 on /var/hda/files/drives/drive7 type ext4 (rw)
none on /proc/sys/fs/binfmt_misc type binfmt_misc (rw)
sunrpc on /var/lib/nfs/rpc_pipefs type rpc_pipefs (rw)
fusectl on /sys/fs/fuse/connections type fusectl (rw)
gvfs-fuse-daemon on /home/john/.gvfs type fuse.gvfs-fuse-daemon (rw,nosuid,nodev,user=john)
[root@server ~]#
[root@server ~]# fdisk -l

Disk /dev/sda: 500.1 GB, 500107862016 bytes
255 heads, 63 sectors/track, 60801 cylinders, total 976773168 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0x000a6212

   Device Boot      Start         End      Blocks   Id  System
/dev/sda1   *        2048     1026047      512000   83  Linux
/dev/sda2         1026048   205826047   102400000   8e  Linux LVM
/dev/sda3       205826048   976773119   385473536   83  Linux

Disk /dev/sdb: 1000.2 GB, 1000203804160 bytes
255 heads, 63 sectors/track, 121601 cylinders, total 1953523055 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0xc7379b8b

   Device Boot      Start         End      Blocks   Id  System
/dev/sdb1            2048  1953521663   976759808   83  Linux

Disk /dev/sdc: 1000.2 GB, 1000203804160 bytes
255 heads, 63 sectors/track, 121601 cylinders, total 1953523055 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0xdd91002a

   Device Boot      Start         End      Blocks   Id  System
/dev/sdc1            2048  1953521663   976759808   83  Linux

Disk /dev/sdd: 2000.4 GB, 2000398934016 bytes
255 heads, 63 sectors/track, 243201 cylinders, total 3907029168 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0x4ee74ee5

   Device Boot      Start         End      Blocks   Id  System
/dev/sdd1            2048  3907028991  1953513472   83  Linux

Disk /dev/sde: 2000.4 GB, 2000398934016 bytes
255 heads, 63 sectors/track, 243201 cylinders, total 3907029168 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0xa2097688

   Device Boot      Start         End      Blocks   Id  System
/dev/sde1            2048  3907028991  1953513472   83  Linux

Disk /dev/sdg: 1000.2 GB, 1000204886016 bytes
255 heads, 63 sectors/track, 121601 cylinders, total 1953525168 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0xc7379b85

   Device Boot      Start         End      Blocks   Id  System
/dev/sdg1            2048  1953523711   976760832   83  Linux

Disk /dev/sdf: 1000.2 GB, 1000204886016 bytes
255 heads, 63 sectors/track, 121601 cylinders, total 1953525168 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0xaf21820c

   Device Boot      Start         End      Blocks   Id  System
/dev/sdf1            2048  1953523711   976760832   83  Linux

Disk /dev/sdh: 1500.3 GB, 1500301910016 bytes
255 heads, 63 sectors/track, 182401 cylinders, total 2930277168 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0x5c51f64a

   Device Boot      Start         End      Blocks   Id  System
/dev/sdh1            2048  2930276351  1465137152   83  Linux

Disk /dev/dm-0: 52.4 GB, 52445577216 bytes
255 heads, 63 sectors/track, 6376 cylinders, total 102432768 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0x00000000

Disk /dev/dm-0 doesn't contain a valid partition table

Disk /dev/dm-1: 10.5 GB, 10502537216 bytes
255 heads, 63 sectors/track, 1276 cylinders, total 20512768 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0x00000000

Disk /dev/dm-1 doesn't contain a valid partition table

Disk /dev/dm-2: 41.9 GB, 41875931136 bytes
255 heads, 63 sectors/track, 5091 cylinders, total 81788928 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0x00000000

Disk /dev/dm-2 doesn't contain a valid partition table
[root@server ~]# df -h
Filesystem            Size  Used Avail Use% Mounted on
/dev/mapper/vg_server-lv_root
                       49G   11G   36G  23% /
tmpfs                 1.5G  272K  1.5G   1% /dev/shm
/dev/sda1             485M   49M  411M  11% /boot
/dev/mapper/vg_server-lv_home
                       39G  257M   37G   1% /home
/dev/sda3             362G  344G  4.0K 100% /var/hda/files
/dev/sdb1             917G  759G  112G  88% /var/hda/files/drives/drive1
/dev/sdc1             917G  761G  110G  88% /var/hda/files/drives/drive2
/dev/sdd1             1.8T  1.6T  112G  94% /var/hda/files/drives/drive3
/dev/sde1             1.8T  1.6T  112G  94% /var/hda/files/drives/drive4
/dev/sdf1             917G  760G  112G  88% /var/hda/files/drives/drive5
/dev/sdg1             917G  759G  112G  88% /var/hda/files/drives/drive6
/dev/sdh1             1.4T  1.2T  112G  92% /var/hda/files/drives/drive7
[root@server ~]# greyhole --stats

Greyhole Statistics
===================

Storage Pool
                                    Total -   Used =   Free +  Attic = Possible
  /var/hda/files/gh:                 362G -   343G =     0G +     0G =     0G
  /var/hda/files/drives/drive1/gh:   917G -   759G =   111G +     8G =   120G
  /var/hda/files/drives/drive2/gh:   917G -   761G =   110G +    20G =   129G
  /var/hda/files/drives/drive3/gh:  1834G -  1629G =   111G +     0G =   111G
  /var/hda/files/drives/drive4/gh:  1834G -  1629G =   111G +     3G =   115G
  /var/hda/files/drives/drive5/gh:   917G -   759G =   111G +     0G =   111G
  /var/hda/files/drives/drive6/gh:   917G -   759G =   111G +     0G =   111G
  /var/hda/files/drives/drive7/gh:  1375G -  1194G =   111G +     0G =   111G
  /home/gh:                        df: `/home/gh': No such file or directory
df: no file systems processed
    0G -     0G =   111G +     0G =   111G
                                   ==========================================
  Total:                            9072G -  7834G =   889G +    31G =   920G

[root@server ~]# mysql -u root -phda -e "select * from disk_pool_partitions" hda_production
+----+------------------------------+--------------+---------------------+---------------------+
| id | path                         | minimum_free | created_at          | updated_at          |
+----+------------------------------+--------------+---------------------+---------------------+
|  1 | /var/hda/files               |           10 | 2011-02-11 12:08:04 | 2011-02-11 12:08:04 |
| 12 | /var/hda/files/drives/drive1 |           10 | 2011-02-14 22:32:08 | 2011-02-14 22:32:08 |
| 13 | /var/hda/files/drives/drive2 |           10 | 2011-02-14 22:32:09 | 2011-02-14 22:32:09 |
| 14 | /var/hda/files/drives/drive3 |           10 | 2011-02-14 22:32:10 | 2011-02-14 22:32:10 |
| 15 | /var/hda/files/drives/drive4 |           10 | 2011-02-14 22:32:10 | 2011-02-14 22:32:10 |
| 16 | /var/hda/files/drives/drive5 |           10 | 2011-02-14 22:32:11 | 2011-02-14 22:32:11 |
| 17 | /var/hda/files/drives/drive6 |           10 | 2011-02-14 22:32:12 | 2011-02-14 22:32:12 |
| 18 | /var/hda/files/drives/drive7 |           10 | 2011-02-15 08:33:53 | 2011-02-15 08:33:53 |
| 20 | /home                        |           10 | 2011-02-17 09:29:06 | 2011-02-17 09:29:06 |
+----+------------------------------+--------------+---------------------+---------------------+
[root@server ~]# mysql -u root --phda -e "select concat(path, '/gh') from disk_pool_partitions" hda_production |grep -v 'concat(' |xargs ls -la | fpaste
mysql: unknown option '--phda'
Uploading (1.9K)...
http://fpaste.org/qaJs/
[root@server ~]# greyhole --view-queue

Greyhole Work Queue Statistics
==============================

This table gives you the number of pending operations queued for the Greyhole daemon, per share.

              Write   Delete  Rename
DVDs          0       0       0      
Files         0       0       0      
JohnsDropbox  0       0       0      
Movies        0       0       0      
Music         0       0       0      
Pictures      0       0       0      
Software      0       0       0      
TV Series     1322    0       0      
Videos        0       0       0      
Websites      0       0       0      
Wii           0       0       0      
Xbox          44337   44338   0      
============
Total         45659 + 44338 + 0     = 89997

The following is the number of pending operations that the Greyhole daemon still needs to parse.
Until it does, the nature of those operations is unknown.
Spooled operations that have been parsed will be listed above and disappear from the count below.

Spooled       1522536

[root@server ~]# 

#4 Updated by JohnWhitmore about 10 years ago

Corrected info after a mistype in the terminal:

[root@server ~]# ls /var/spool/greyhole/ | wc -l
1522536
[root@server ~]#

Note: I had rebooted a couple of times since I first noticed the issue.

#5 Updated by JohnWhitmore about 10 years ago

[root@server ~]# fpaste --sysinfo
Gathering system info.........................
Uploading (17.8K)...
http://fpaste.org/5zjX/

#6 Updated by gboudreau about 10 years ago

/home/gh: df: `/home/gh': No such file or directory

This is a problem.
Try to uncheck then re-check that partition in the Amahi Dashboard Storage Pool page.
That's what should create that directory.
If Greyhole --stats still gives an error, try to leave that partition unchecked, and see if the loop continues.

Also, try:
service greyhole stop
service greyhole start
tail -f /var/log/greyhole.log

Wait for Greyhole to give some kind of error, or restart, and paste the whole thing here (all the commands you typed, and the result up to the error).

Also, I guess the number of spooled tasks could be a problem (1,522,536)...
With Greyhole running in a loop, see if the numbers given by "greyhole --view-queue" change. Normally, the spooled number should decrease, and the numbers above that should increase.

#7 Updated by cpg about 10 years ago

For edge cases like this, would it be better to do something like:

ls /var/spool/greyhole | wc -l

instead of globbing ALL the files in a dir?

count(glob("/var/spool/greyhole/*"))

which hits the PHP memory limit.

Or, even better, use low-level routines to enumerate the directory and return as soon as the first entry is found, which is effectively all that check needs?

For cases like these (which will tend to become more frequent over time), globbing every file just to see whether there are any at all is quite inefficient and a bottleneck. :)
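
Something like this rough sketch is what I have in mind (the helper name is made up, not actual Greyhole code) -- it bails out at the first real entry instead of building an array of ~1.5 million path strings, which is roughly what it takes to blow through a 128MB memory_limit:

    // Hypothetical helper, for illustration only.
    function spool_dir_is_empty($dir = '/var/spool/greyhole') {
        $dh = @opendir($dir);
        if ($dh === FALSE) {
            return TRUE; // can't open the dir; nothing to work on
        }
        while (($entry = readdir($dh)) !== FALSE) {
            if ($entry != '.' && $entry != '..') {
                closedir($dh);
                return FALSE; // found at least one spooled task; stop right away
            }
        }
        closedir($dh);
        return TRUE;
    }

    // The problem line could then become:
    if (spool_dir_is_empty()) {
        break;
    }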

#8 Updated by gboudreau about 10 years ago

Yes, that would be better, but it would still take a while just to ls.
I think find | head -1 would be faster when the directory is full, since that line only checks whether there are no files in there.
But then, the code just below it does another check for that... so I'm not sure why we'd need two checks.
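
For the record, here is roughly what I mean by find | head -1 (just a sketch, not something I'd commit as-is; find prints entries as it finds them and head -1 stops the pipe after the first one, so the whole spool directory never gets enumerated):

    // Sketch only: shell out to find, stop at the first entry.
    $first = exec("find /var/spool/greyhole -mindepth 1 -maxdepth 1 2>/dev/null | head -1");
    if ($first == '') {
        break; // spool directory is empty
    }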

JohnWhitmore: Remove those 3 lines from /usr/bin/greyhole:

if (count(glob("/var/spool/greyhole/*")) === 0) {
break;
}

And restart the service. That should fix your problem.
Let us know.

#9 Updated by JohnWhitmore about 10 years ago

gboudreau wrote:

/home/gh: df: `/home/gh': No such file or directory

This is a problem.
Try to uncheck then re-check that partition in the Amahi Dashboard Storage Pool page.
That's what should create that directory.
If Greyhole --stats still gives an error, try to leave that partition unchecked, and see if the loop continues.

Also, try:
service greyhole stop
service greyhole start
tail -f /var/log/greyhole.log

Wait for Greyhole to give some kind of error, or restart, and paste the whole thing here (all the commands you typed, and the result up to the error).

Also, I guess the number of spooled tasks could be a problem (1,522,536)...
With Greyhole running in a loop, see if the numbers given by "greyhole --view-queue" change. Normally, the spooled number should decrease, and the numbers above that should increase.

1) Removed /home from the pool - I had added that at one stage to see if a change to the pool would clear things up, but it did not.
2) Tried restarting the service - no error after 10 minutes.
3) Over 10 minutes - no change in the results of greyhole --view-queue.

#10 Updated by JohnWhitmore about 10 years ago

gboudreau wrote:

Yes, that would be better, but it would still take a while just to ls.
I think find | head -1 would be faster when the directory is full, since that line only checks whether there are no files in there.
But then, the code just below it does another check for that... so I'm not sure why we'd need two checks.

JohnWhitmore: Remove those 3 lines from /usr/bin/greyhole:

if (count(glob("/var/spool/greyhole/*")) === 0) {
break;
}

And restart the service. That should fix your problem.
Let us know.

Made the above change and things appear to be moving now.

Is this a case of too many files in the landing zone causing Greyhole to get overloaded?

Once all the files are moved out to the various drives, should I insert the lines again? Or leave it as it is?

Thank you
John

#11 Updated by gboudreau about 10 years ago

  • Status changed from New to Resolved

This was caused by a lot of files in the spool directory.

Leave those lines out. Upcoming Greyhole versions will not have them. They were superfluous.

#12 Updated by JohnWhitmore about 10 years ago

gboudreau wrote:

This was caused by a lot of files in the spool directory.

Leave those lines out. Upcoming Greyhole versions will not have them. They were superfluous.

OK. Once again, thank you for the help, and glad I could help squash another bug. Just got 3 more drives of data to move and the server will be done.

A 14TB beast lol

Thank you

#13 Updated by cpg about 10 years ago

  • Subject changed from Greyhole in a loop to Millions of files in the spool dir cause memory issues - was Re: Greyhole in a loop
  • Status changed from Resolved to Closed
