The Linux operating system is well known for its powerful command line functionality. As a system administrator, I operate a few hundred Linux servers and most of them don’t even utilize a graphical user interface such as KDE or Gnome. Maintenance of our servers requires extensive knowledge of various helpful (yet complex) commands to keep them healthy.

Disk space is frequently a headache with systems that store and write a lot of data (some useful, and some not). While QDirStat, GdMap, xdiskusage and Gnome’s Disk Usage Analyzer are all useful graphical utilities for measuring disk space usage, I have a toolkit of my own commands which I use from the command line, and it’s safe to say they’re indispensable to me. Here are a few of my favorite examples.

Please note these examples were run on a Red Hat Enterprise Linux system, but they should work just as well on other distributions. You should also run them as root (or via sudo), since some directories aren’t readable by an ordinary user and the results would otherwise be incomplete.
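If you’re not logged in as root, you can simply prefix each command with sudo, for example:

sudo df -h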

Checking free space

1.df

This is the most basic command of all; df can display free disk space. Here’s what it will return when run:

[root@smatteso-vm1 ~]# df

Filesystem 1K-blocks Used Available Use% Mounted on

/dev/vda2 75987032 25075944 47044888 35% /

tmpfs 8162268 92 8162176 1% /dev/shm

/dev/vda1 245679 69859 162713 31% /boot

devnfs:/tools 611916000 162727328 418098496 29% /tools

This shows you the raw disk usage statistics in 1 KB blocks, but to make the output more “human readable” run df with the -h switch:

2.df -h

[root@smatteso-vm1 ~]# df -h

Filesystem Size Used Avail Use% Mounted on

/dev/vda2 73G 24G 45G 35% /

tmpfs 7.8G 92K 7.8G 1% /dev/shm

/dev/vda1 240M 69M 159M 31% /boot

devnfs:/tools 584G 156G 399G 28% /tools

That looks a little easier on the eyes. My root volume is 73 GB with 45 GB of free space.

3.df -Th

When figuring out disk space issues it can be important to know what type of file system you’re dealing with so you can plan accordingly. For instance, the ext4 file system in Linux supports volumes up to 1 EB in size and individual files up to 16 TB.

The “df -Th” command will display the same output as the previous command but also include file system types:

[root@smatteso-vm1 usr]# df -T -h

Filesystem Type Size Used Avail Use% Mounted on

/dev/vda2 ext4 73G 24G 45G 35% /

tmpfs tmpfs 7.8G 92K 7.8G 1% /dev/shm

/dev/vda1 ext4 240M 69M 159M 31% /boot

devnfs:/tools nfs 584G 156G 399G 29% /tools
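Since the output above confirms my root volume on /dev/vda2 is ext4, I can also pull more detail about it with tune2fs if I ever need it (a quick sketch; tune2fs requires root and only applies to ext2/3/4 file systems):

tune2fs -l /dev/vda2 | grep -E 'Block count|Block size|Filesystem features'

Multiplying the block count by the block size gives the file system’s exact size in bytes.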

Finding data sizes

Now that I’ve established how we can determine free space, how can we actually figure out what’s using space?

4.du -sh *

The du command is short for “disk usage.” In its most basic usage, if you simply run “du” at the root volume, it will show you the size of all the directories on your system. However, this may produce a glut of information difficult to parse.

Adding the switches -sh (“s” means “summarize” and, again, “h” means “human readable format”) and running du -sh * at the root volume will produce a cleaner output.

[root@smatteso-vm1 /]# du -sh *

0 BEVAP

12M bin

67M boot

0 CTF

16M db2dump

304K dev

65M etc

0 FEVAP

0 HHFE

2.1G home

0 LDW

329M lib

30M lib64

16K lost+found

4.0K media

0 misc

4.0K mnt

0 net

2.8G opt

0 proc

0 PUBAUTH

1.3M root

20K RTVIEW

16M sbin

4.0K SCURVY

4.0K selinux

4.0K srv

0 sys

4.3M tmp

7.1G tools

15G usr

3.9G var

You can easily see which directories contain the largest amount of data; the usr directory holds 15 GB, tools has 7.1 GB and var has 3.9 GB. Let’s focus on the /var directory.

I COULD go into /var and rerun du -sh * to take a look at the size of the directories underneath, but with lots of subdirectories of varying sizes that isn’t very helpful, since I’d have to rerun the same command over and over. Instead, I can run this:

5.du -a /var | sort -nr | head -n 10

Let’s parse this command string.

du is the disk usage command, of course.

-a is the “all” switch, which counts files as well as directories.

/var tells du to scan the /var directory.

The pipe (“|”) character pipes the results to the sort command.

sort -nr sorts the results numerically (-n) in reverse order (-r), so the largest items appear at the top of the list.

The pipe (“|”) character pipes the results to the head command.

head -n 10 will limit the number of results to the ten largest entries.

When I run this on my system I get the following output:

[root@smatteso-vm1 var]# du -a /var | sort -nr | head -n 10

4042044 /var

3473128 /var/cache

3470188 /var/cache/yum

572308 /var/cache/yum/x86_64

572304 /var/cache/yum/x86_64/6Server

375388 /var/log

373056 /var/cache/yum/rhel6-auto-20170111

370676 /var/cache/yum/rhel6-auto-20161228

370496 /var/cache/yum/rhel6-auto-20161214

370308 /var/cache/yum/rhel6-auto-20161130

This shows me that the largest items are the /var directory itself (obviously) followed by /var/cache and /var/cache/yum.

Those four directories at the bottom of the list represent patch sets rolled out every couple of weeks. It’s safe for me to delete the prior directories, so the command rm -rf /var/cache/yum/rhel6-auto-2016* could free up over 1 GB of space from those old patch set directories.
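Before running rm -rf with a wildcard like that, I like to preview exactly what the glob matches and how much space removing it will free; a quick sanity check, assuming the same directory layout shown above:

ls -d /var/cache/yum/rhel6-auto-2016*
du -csh /var/cache/yum/rhel6-auto-2016*

The -c switch adds a grand total line to the du output, so you know up front how much space the deletion will reclaim.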

Note: to run this against the entire file system, just change:

du -a /var | sort -nr | head -n 10
to
du -a / | sort -nr | head -n 10
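One caveat: scanning from / will also descend into pseudo-file systems such as /proc and /sys, as well as network mounts like the /tools NFS share shown earlier. Adding the -x switch keeps du on the root file system only, and if your sort supports human-readable ordering (GNU coreutils does, via -h), you can pair it with du -h; both of these are sketches rather than drop-in requirements:

du -ax / | sort -nr | head -n 10
du -axh / | sort -rh | head -n 10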

It’s possible to get even more technical using what’s known as a regex to filter the results more explicitly. Let’s take a closer look at the entire file system.

6.du -xh / | grep '^\S*[0-9\.]\+G' | sort -rn

Looks very complicated but it’s easy to break down:

du is the disk usage command

-x tells du to stay on this file system only, skipping other mounts (which also makes the command run faster), and the h after it presents the results in human readable format.

The pipe (“|”) character pipes the results to the grep command.

grep filters the results using the '^\S*[0-9\.]\+G' regex: ^\S* matches the run of non-whitespace characters at the start of each line (the size column in du’s output), [0-9\.]\+ matches the numeric part of the size, and the trailing G keeps only entries whose size is reported in gigabytes, in other words directories 1 GB or larger.

The pipe (“|”) character pipes the results to the sort command.

sort -rn will display the largest directories at the top of the list.

When I run this on my system I get the following output:

[root@smatteso-vm1 var]# du -xh / | grep '^\S*[0-9\.]\+G' | sort -rn

24G /

15G /usr

12G /usr/local

5.7G /usr/local/litle-db/phxinst1

5.7G /usr/local/litle-db

3.9G /var

3.4G /var/cache/yum

3.4G /var/cache

3.2G /usr/local/litle-build

2.8G /opt

2.7G /usr/local/litle-tools

2.1G /home/smatteso/bin

2.1G /home/smatteso

2.1G /home

1.7G /usr/share

1.6G /usr/local/litle-db/phxinst1/FEVAP

1.6G /usr/local/litle-db/phxinst1/BEVAP

1.5G /usr/local/litle-build/LETS

1.1G /opt/ibm/db2/V9.7

1.1G /opt/ibm/db2

1.1G /opt/ibm

This lets me know I might free up some space under my home directory, since I’m using 2.1 GB.
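If you also want to catch any terabyte-sized entries, the character class can be widened to match a trailing T as well, and GNU sort’s -h flag will then order mixed G and T sizes correctly; a sketch, assuming GNU grep and sort:

du -xh / | grep -E '^\S*[0-9.]+[GT]' | sort -rh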

Cleaning up directories is one way to reclaim space, but it can be tedious to navigate around hunting for big files. The “find” command can work wonders in pinpointing exactly which files are taking up space.

7.find / -printf '%s %p\n' | sort -nr | head -10

The above is one of my favorite commands since it will show me the ten biggest files on my system. Like before, let’s break it down.

find is the find command to search for files, obviously.

/ is the root volume

-printf '%s %p\n' will nicely format the results by showing file size in bytes (%s) and the file name (%p).

The pipe (“|”) character pipes the results to the sort command.

sort -nr will display the largest files at the top of the list.

The pipe (“|”) character pipes the results to the head command.

head -10 will limit the number of results to the ten largest files.

Running this on my system will yield the following:

[root@smatteso-vm1 var]# find / -printf '%s %p\n' | sort -nr | head -10

582359040 /tools/infra/DWI_Bundles/3.0/LETS/LETS.tar

582359040 /tools/infra/DWI_Bundles/2.0/LETS/LETS.tar

582348800 /usr/local/litle-build/LETS/LETS-1.3.2-RC1.tar

572129280 /tools/infra/DWI_Bundles/4.0/eclipse/eclipse.tar

572129280 /tools/infra/DWI_Bundles/3.0/eclipse/eclipse.tar

571985920 /tools/infra/DWI_Bundles/1.0/LETS/LETS.tar

491255472 /usr/local/litle-build/LETS/eclipse-lets-mars-2-linux.gtk.x86_64.tar.gz

489757504 /usr/local/litle-build/LETS/eclipse-lets-mars-2-linux.gtk.x86_64.rc1.tar.gz

407285760 /tools/infra/DWI_Bundles/4.0/accurev/accurev.tar

407285760 /tools/infra/DWI_Bundles/3.0/accurev/accurev.tar

I don’t need anything in the old “DWI_Bundles” directory, so these are juicy targets for deletion.
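Those byte counts take some squinting to read. If your version of coreutils includes the numfmt utility (a relatively recent addition, so treat this as a sketch), you can convert the first column to human-readable units:

find / -printf '%s %p\n' | sort -nr | head -10 | numfmt --field=1 --to=iec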

My last example gets even more granular:

8.find / -xdev -type f -size +100M -exec ls -la {} \; | sort -nk 5

find is the find command to search for files, obviously.

/ is the root volume.

-xdev – only search this file system.

-type f – look for files.

-size +100M – only show files over 100 MB in size.

-exec ls -la {} \; – runs ls -la on each file found, displaying it in long listing format.

The pipe (“|”) character pipes the results to the sort command.

sort -nk 5 sorts numerically (-n) on the fifth column (-k 5), which is the file size, so the largest files appear at the bottom of the list.

When I run this on my system it produces the following results:

[root@smatteso-vm1 var]# find / -xdev -type f -size +100M -exec ls -la {} \; | sort -nk 5

-rwxr-x--- 1 dev dev 113187613 Feb 27 2015 /usr/local/litle-tools/forgerock/OpenAM-12.0.0.war

-rwxr-x--- 1 dev dev 113231850 Aug 12 2015 /usr/local/litle-tools/forgerock/OpenAM-12.0.1.war

-rw------- 1 skybot skybot 131764520 Oct 27 15:11 /opt/skybot/server/webapps/skybot-scheduler/download/setupSkybotSchedulerAgent.exe

-rw-r--r--. 1 root root 142451712 Oct 26 21:26 /var/cache/yum/x86_64/6Server/mrepo-frozen/d0af4ba21f53f3b5446243c02ee431ca-primary.sqlite

-rw-r--r-- 1 root root 159047680 Jan 24 18:03 /var/cache/yum/mrepo-frozen/674a5eb91f391a4beeeae99c41673b01-primary.sqlite

-rw-r--r--. 1 root root 159047680 Oct 26 21:18 /var/cache/yum/DWI-Frozen/674a5eb91f391a4beeeae99c41673b01-primary.sqlite

-rw-r--r-- 1 root root 182671360 Oct 27 17:59 /var/cache/yum/x86_64/6Server/mrepo-frozen/b41a2f136ca45aa8150ec03f6f54dd0f-filelists.sqlite

-rw-r--r--. 1 root root 195055616 Oct 26 21:19 /var/cache/yum/DWI-Frozen/0757540d1eef7635c93e9295f1e31694-filelists.sqlite

-rw-r--r-- 1 root root 332309504 Nov 20 17:44 /var/cache/yum/rhel6-p202.1-20161027/primary.xml.gz.sqlite

-rw-r--r-- 1 root root 336006144 Dec 4 16:52 /var/cache/yum/rhel6-auto-20161116/primary.xml.gz.sqlite

-rw-r--r-- 1 root root 339330048 Dec 17 19:46 /var/cache/yum/rhel6-auto-20161130/primary.xml.gz.sqlite

-rw-r--r-- 1 root root 339509248 Jan 1 18:05 /var/cache/yum/rhel6-auto-20161214/primary.xml.gz.sqlite

-rw-r--r-- 1 root root 339685376 Jan 14 21:24 /var/cache/yum/rhel6-auto-20161228/primary.xml.gz.sqlite

-rw-r--r-- 1 root root 339730432 Jan 24 18:04 /var/cache/yum/rhel6-auto-20170111/primary.xml.gz.sqlite

-rw-r--r-- 1 dev dev 489757504 Nov 8 14:40 /usr/local/litle-build/LETS/eclipse-lets-mars-2-linux.gtk.x86_64.rc1.tar.gz

-rw-r--r-- 1 dev dev 491255472 Nov 16 14:43 /usr/local/litle-build/LETS/eclipse-lets-mars-2-linux.gtk.x86_64.tar.gz

-rw-rw-r-- 1 dev dev 582348800 Nov 22 11:24 /usr/local/litle-build/LETS/LETS-1.3.2-RC1.tar

Hey, what’s that setupSkybotSchedulerAgent.exe Windows executable doing on my Linux file system? That’s over 125 MB I can remove right there!
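A variation on this last command that I find handy swaps ls for du, which gives human-readable sizes and sorts cleanly with GNU sort’s -h flag (a sketch, assuming GNU find, du and sort):

find / -xdev -type f -size +100M -exec du -h {} + | sort -rh | head -20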

Deciding what to clean up

Finding what’s taking up space on your file system is the easy part. Figuring out what you can get rid of is trickier. I found some simple examples above, such as deleting old patch set directories and a stray Windows executable. However, blindly deleting files is an extremely risky and dangerous practice: if you found large database files and removed them, the consequences could be catastrophic.

All file systems and their contents will vary, so you may find .log files (which are relatively harmless to remove), .rpm files (Red Hat package files), .tar or .tar.gz files (file archives), .iso files (disc images), documents, multimedia files, and more. Check file ownership and consult with the individuals responsible for the files before deleting anything, and ensure you have a valid backup in place in case you need to undo your efforts!
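A couple of quick checks help with that due diligence: stat shows who owns a file and when it was last modified, and ls -lu shows when it was last accessed (the path below is just a placeholder):

stat -c '%U %G %y %n' /path/to/large-file
ls -lu /path/to/large-file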

Conclusion

These are only a few of the many powerful commands (and accompanying command switches) you can run in Linux. I advise you to experiment, read the man pages for df, du and find, and explore similar or even more robust options.

