Wednesday, 4 September 2013

Solaris, du and df commands show different results

There are 4 reasons why du and df can show different answers:
  • Inconsistent filesystem requiring fsck(1m).
  • Process with open file which does not exist in filesystem.
  • Mount point directory contains data.
  • The du command is run as non-root and some directories restrict read permission.
The underlying reason du and df can show different answers is that they measure usage in different ways:
  • du "walks" the filesystem (like "find" command would) checking the size of each file in turn, and keeping track of the total.
  • df makes a system call to the filesystem itself and requests a number of details, one of which is the current disk space used. (It gets this information directly from the filesystem's superblock.)
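
The two measurements can be put side by side with a small shell sketch. The function names (du_used_kb, df_used_kb, compare_du_df) are made up for this illustration. It assumes a POSIX shell (ksh, bash or /usr/xpg4/bin/sh rather than the legacy Bourne shell) and the usual "df -k" layout, where "used" is the fourth column from the end:

```shell
#!/bin/sh
# Sketch only: print the space used on one filesystem as seen by du and by df.
# All function names here are hypothetical helpers, not standard commands.

du_used_kb() {
    # -s: one summary line, -k: kilobytes, -x: stay on the starting filesystem.
    # Errors (unreadable directories) are discarded here on purpose.
    du -skx "$1" 2>/dev/null | awk '{print $1}'
}

df_used_kb() {
    # Take the "used" column counted from the end of the last output line,
    # so a long device name wrapped onto its own line does not shift it.
    df -k "$1" | awk '{used = $(NF-3)} END {print used}'
}

compare_du_df() {
    echo "du: $(du_used_kb "$1") KB"
    echo "df: $(df_used_kb "$1") KB"
}
```

Calling, for example, "compare_du_df /var" as root prints both figures. A small difference is normal (filesystem metadata is not visible to du); a large one points to one of the four causes listed above.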

Inconsistent filesystem requiring fsck(1m).

If the filesystem becomes corrupt or inconsistent for some reason, it is quite likely that du and df will differ. What a process walking the filesystem (i.e. du) can see no longer matches the view the filesystem has of itself (i.e. what is returned to the querying df process). Corrupt or inconsistent filesystems should be repaired using fsck(1m).

Process with open file which does not exist in the filesystem directory structure.

This scenario commonly occurs when some process keeps writing to a file (usually a logfile) and a sysadmin deletes the file in panic to prevent the filesystem from filling up. But the offending process keeps running and the space is not freed (the process keeps the file open).
The disk blocks associated with a file are only freed and made available for reuse when the last "reference" to the file is removed. When a Unix process opens a file, the reference count on that file is incremented. If the file is subsequently removed from the filesystem, the data blocks remain in use until the process closes the file, either explicitly with close(2) or implicitly when the process dies.
Under these conditions, du is unable to "see" the file in the filesystem (it was rm'd from the directory structure) and therefore does not count its size, but df (which gets its answer from the filesystem itself) "knows" the file still exists.
When the process closes the file (explicitly, or implicitly when the process quits or is killed, or the machine is rebooted), the disk blocks return to the free list and du and df agree again. Unmounting and remounting the filesystem would have the same effect, but as long as some process has a file open on the filesystem it is impossible to unmount it (device busy).
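
The effect is easy to reproduce on a scratch directory. The following is a minimal sketch, assuming a POSIX shell and that mktemp(1) is available. The variable names and the 10 MB size are arbitrary choices for the demonstration, and the shell itself plays the part of the "offending process" by holding the file open on descriptor 3:

```shell
#!/bin/sh
# Sketch: a deleted file keeps its disk blocks while a process holds it open.
demo_dir=$(mktemp -d)                       # scratch directory (assumption)

# Create a 10 MB file (10240 blocks of 1024 bytes).
dd if=/dev/zero of="$demo_dir/big.log" bs=1024 count=10240 2>/dev/null

exec 3<"$demo_dir/big.log"                  # this shell now holds the file open
rm "$demo_dir/big.log"                      # directory entry gone, blocks still in use

# du walks the directory tree and no longer finds the file.
du_after_kb=$(du -sk "$demo_dir" | awk '{print $1}')
echo "du after rm: $du_after_kb KB"

# The data is still reachable through the open descriptor.
held_bytes=$(wc -c <&3 | tr -d ' ')
echo "bytes still held open on fd 3: $held_bytes"

exec 3<&-                                   # close the file: blocks are freed now
rmdir "$demo_dir"
```

Running "df -k" on the filesystem before the rm, after the rm and after the close shows the used figure dropping only at the last step, while du stops counting the file as soon as it is removed.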
NOTE: Before removing the file, you can use the fuser(1m) command to identify processes that have a specified file open.
NOTE: Alternatively, instead of deleting the log file, you can run "cat /dev/null > /path/to/logfile", which empties the file without deleting it. But the best way is generally to stop the offending process, delete the file, then restart the process.
NOTE: If you have already deleted the file and then discover the disk space has not been released, you can use one of the following procedures to attempt to locate the process that is holding on to the file:
The /proc filesystem gives access to all open files of all processes. Assuming a role with sufficient privileges, a user can find all open files with a link count of zero, e.g.:
    # find /proc/*/fd -type f -links 0 \! -size 0 -ls
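You can view the file (a log file, for example) with a command like the following, where <pid> is the process ID and <fd> the file descriptor number reported by the find command above:
    # tail -f /proc/<pid>/fd/<fd>
If you find that a process you can't kill is holding a huge file, you can truncate the file with:
    # cp /dev/null /proc/<pid>/fd/<fd>
Be sure you have the right process and file descriptor.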

Directory mount point containing data.

Filesystems are mounted on top of directories. If the mount point directory itself contains data, du is unable to see this data (it sees only the mounted filesystem), but the underlying filesystem still keeps track of it, so df reports the extra disk space as in use on the underlying filesystem.
Unmounting the filesystem will reveal the data. However, if the mounted filesystem is being used by running processes, it will not be possible to unmount it. Either identify and kill the processes (fuser(1m), etc.), or reboot (possibly into single-user mode) to check the mount point directory.

du command is being run as non-root

du cannot report on files in a directory that the user does not have permission to read. Running du as root eliminates this problem. To test for it, run du with the -r option, which generates messages about directories that cannot be read, files that cannot be opened and so forth, rather than being silent (the default).
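
A short sketch of the effect, assuming a POSIX shell and mktemp(1); the directory and variable names are invented for the example. When run as an ordinary user, the second du total is smaller because the private directory can no longer be read; when run as root, both totals are the same:

```shell
#!/bin/sh
# Sketch: du undercounts when a directory cannot be read by the calling user.
perm_demo=$(mktemp -d)                      # scratch directory (assumption)
mkdir "$perm_demo/private"
dd if=/dev/zero of="$perm_demo/private/data" bs=1024 count=1024 2>/dev/null

# Total while everything is readable.
full_kb=$(du -sk "$perm_demo" | awk '{print $1}')

# Remove all permissions from the subdirectory and measure again.
# Error messages about the unreadable directory are discarded here.
chmod 000 "$perm_demo/private"
restricted_kb=$(du -sk "$perm_demo" 2>/dev/null | awk '{print $1}')

echo "readable:   $full_kb KB"
echo "restricted: $restricted_kb KB"

# Restore permissions so the scratch directory can be cleaned up.
chmod 700 "$perm_demo/private"
rm -rf "$perm_demo"
```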
