Sizes of Large Directories with Gluster, SSHFS, NFS

This post covers a few things: checking the number of entries or rough sizes of a directory, a look at the behavior of Gluster, NFS, SSHFS, and SFTP for remote directory sizes, and some info on mounting remote file systems using NFS or SSHFS (Gluster is a topic for another day).

First, how to check the rough size of a directory. We know that checking the number of files in a large directory locally can be slow. Doing that check over a remote mount like Gluster can be much slower and even cause the Gluster mount to crash.
Generally, under Linux/Unix, you can get a rough estimate of the size of a directory by looking at the output of ls -l in the parent directory of that directory. For example, let's use a test directory to check behavior: ~/test. We've added two subdirectories dir_small with no files in it and dir_large with 5000 files in it. ls -l ~/test gives:
ls -l ~/test
drwxrwxr-x. 2 jm jm 143360 Apr 26 16:41 dir_large
drwxrwxr-x. 2 jm jm   4096 Apr 26 16:56 dir_small

Here dir_small has the smallest size possible and dir_large is larger due to the 5000 files in it. Remember that in Linux/Unix, a directory is just a special type of file that keeps information about the list of files in that directory. 

In order to count the files or just a guide as to which is the largest directory, the safest options are:
ls -l ../ # only a rough guide, check parent directory to see directory 'size'
find dir_large -type f -print | wc -l # lists files one by one
echo * | awk -F ' ' '{print NF}' # provides the list of files in one line
ls | wc -l # performs more ops than echo * (as checked by strace)
If you think you have a large directory, don't do ls -l as that will stat each file requiring far more ops and time.
I point all of this out as this was tried using Gluster with some issues. In case you don't know, Gluster or GlusterFS is a clustered file system. We've used it for stand alone clusters within our data centers (DC). It can do cross DC replication although we've found too many problems so don't use that feature.

We had an issue the other day with a system mounting a Gluster volume and we wanted to check on files in a large collection of directories.

Interestingly, Gluster's directory sizes don't show the usual file size info. For the same directories as above, it showed
drwxrwxr-x. 2 jm jm   4096 Apr 26 16:41 dir_large
drwxrwxr-x. 2 jm jm   4096 Apr 26 16:56 dir_small

Running other commands like ls and find can be painful with a large Gluster volume. Some have given it a reputation for being almost unusable for certain interactive operations like ls, find, du, etc. Ultimately, the solution was to use find and it's one file at a time checking; the alternative is to run this commands on the Gluster server rather than the client system.

Since Gluster has a different meaning for directory size, we thought we'd see whether that is Gluster or the FUSE system that it uses. Interesting, the Gluster brick, the file system being exported for remote mounting, shows the normal directory size. That size would be expected as it's a normal Linux file system. We also tried NFS which passes through the usual directory size:
drwxrwxr-x. 2 jm jm 143360 Apr 26 16:41 dir_large
drwxrwxr-x. 2 jm jm   4096 Apr 26 16:56 dir_small

NFS doesn't use FUSE though so this only confirms that normal directory sizes are passed through with other remote file systems. 

We also tried SSHFS which uses FUSE for mounting. Just like NFS, SSHFS showed the right directory sizes:
drwxrwxr-x. 2 jm jm 143360 Apr 26 16:41 dir_large
drwxrwxr-x. 2 jm jm   4096 Apr 26 16:56 dir_small

By the way, SFTP also showed the normal directory sizes, but you might assume that by now.

So, this looks like Gluster is the issue here and has chosen to re-interpret the meaning of a directory size.

Information on mounting NFS and SSHFS file systems
Instructions for SSHFS:
SSHFS isn't installed on all systems. On my test system (Fedora 29), I needed to install with
sudo dnf install fuse-sshfs
# make a mount point
mkdir ~/test/mount_point
# to mount, do:
sshfs remote_user@remote_host:/remote_directory ~/test/mount_point
# to unmount, do:
 fusermount -u ~/test/mount_point

Instructions for NFS:
#edit the export file and add your export directory
vi /etc/exports
# add  /home/jm/test localhost(ro) *.local.domain(ro)
#start nfs-server
sudo systemctl start nfs-server
sudo exportfs -a
# check the exported file system is available from the local or client system:
showmount -e test-nfs-server.local.domain
showmount -e localhost
# make a mount point
mkdir /tmp/mount_point
sudo mount -t nfs test-nfs-server.local.domain:/home/jm/test /tmp/mount_point
 #to unmount, use the usual umount:
sudo umount /tmp/mount_point
# on the nfs server, un-export the fs:
sudo exportfs -ua
#stop nfs if you want:
systemctl stop nfs-server


Popular Posts