
Linux temporarily thinks disk is full

I’m working on a Linux server running CentOS 6.5, with a NAS mounted over NFS on a QDR InfiniBand network. I am running a bash script that basically creates a directory, makes symlinks inside it, and uses cat to assemble one small file in each directory. It does this for a few hundred directories.
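For context, a minimal sketch of the kind of loop involved; every name and path below is made up for illustration:

#!/bin/bash
# Hypothetical reconstruction of the workload; the real names and paths differ.
SRC=/home3/swfl/source                 # assumed location of the input pieces
for i in $(seq 1 400); do              # "a few hundred directories"
    dir=/home3/swfl/run_$i
    mkdir -p "$dir"
    ln -s "$SRC/common.dat" "$dir/common.dat"                    # symlinks inside the new directory
    cat "$SRC/header.txt" "$SRC/body_$i.txt" > "$dir/input.txt"  # one small file per directory
done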

I noticed in the output log that one of the symlinks and the subsequent cat failed to run, claiming that the disk was full. It quite clearly was not. Running the same script over a few thousand directories, I began getting a very large number of these messages. I checked, and the disk did appear to be full, so I immediately killed my script, but after a few minutes the disk returned to normal.

Here are the sequential df commands I ran: the first while the script was running, the second just after killing it, and the third some seconds later. /home3 (the NAS) is the one I’m working on:

[swfl 07:40:56 JPM]$ df -h
Filesystem                       Size  Used Avail Use% Mounted on
/dev/mapper/vg_misisss6-lv_root  135G   25G  104G  19% /
tmpfs                             12G     0   12G   0% /dev/shm
/dev/sda1                        485M   69M  392M  15% /boot
misisss-nasib3:/home              26T   26T  1.0M 100% /home3
misisss-nas1:/shared              77G  437M   73G   1% /shared
misisss-nasib2:/home              15T   15T   95G 100% /home2
You have new mail in /var/spool/mail/swfl

[swfl 07:41:39 JPM]$ df -h
Filesystem                       Size  Used Avail Use% Mounted on
/dev/mapper/vg_misisss6-lv_root  135G   25G  104G  19% /
tmpfs                             12G     0   12G   0% /dev/shm
/dev/sda1                        485M   69M  392M  15% /boot
misisss-nasib3:/home              26T   26T  1.0M 100% /home3
misisss-nas1:/shared              77G  437M   73G   1% /shared
misisss-nasib2:/home              15T   15T   94G 100% /home2

[swfl 07:41:58 JPM]$ df -h
Filesystem                       Size  Used Avail Use% Mounted on
/dev/mapper/vg_misisss6-lv_root  135G   25G  104G  19% /
tmpfs                             12G     0   12G   0% /dev/shm
/dev/sda1                        485M   69M  392M  15% /boot
misisss-nasib3:/home              26T   21T  4.2T  84% /home3
misisss-nas1:/shared              77G  437M   73G   1% /shared
misisss-nasib2:/home              15T   15T   93G 100% /home2

At the time, there was relatively little CPU usage on most cores and low-to-moderate disk usage. I don’t have monitoring software running, so I can’t give IOps figures or anything of the sort, but I’ve done similar work at much higher intensity without issue.
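For a future run, something like the following could capture rough IOps and NFS client counters without dedicated monitoring software. iostat comes with the sysstat package and nfsstat with nfs-utils, both standard on CentOS 6; the log paths are illustrative:

# Extended per-device statistics every 5 seconds, logged in the background
iostat -x 5 >> /tmp/iostat.log 2>&1 &

# NFS client operation counters on the same interval
while sleep 5; do date; nfsstat -c; done >> /tmp/nfsstat.log 2>&1 &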

In short, it’d be very difficult to believe I was overwhelming any part of the system with the work being done. Any breadcrumbs as to where to search for the issue?

UPDATE 1: Running watch 'df -h; df -i' to keep track of inode and disk usage, I can see available disk space dropping precipitously (things are OK for ~5 seconds, then several TB disappear over 10-20 seconds) until I start getting the errors, but inode usage isn’t dropping nearly as much.

I can see that inode utilization is fairly high (30-70%), though. I have ~16 billion inodes and am creating ~40,000 files/directories. After I kill the process, the available space climbs slowly (a few GB) for 10-20 seconds, then jumps back up by a few TB to what it was originally.
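Since watch keeps no history, a timestamped log of the same two commands would make it easier to line the drop up with the script’s error messages afterwards; a simple sketch, with an illustrative log path:

while true; do
    date
    df -h /home3     # block usage on the NAS mount
    df -i /home3     # inode usage on the same mount
    sleep 5
done >> /tmp/df_home3.log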

