How to Efficiently Manage Large Text Files in Linux

 

Linux is an operating system built around text files. Unlike Windows, Linux follows the philosophy that “everything is a file”. Sure, there are databases and binary formats, but there is nothing like the Windows “Registry”. Even devices, partitions, and sockets are represented by real or virtual files.

Given all of this, some text files can get pretty big. We’re not always talking about dozens of MB, but possibly hundreds of MB or, on rare occasions, even a few gigabytes. And all of it can be text! In fact, something as innocuous as a log file can keep growing if left unchecked. Let’s say you have a file recording every visit to your website, along with the date, IP, user-agent, etc. For even a medium-sized website, that file can grow pretty large if not dealt with.

In this article, we’ll look at how to view these large files without slowing down your system. Sometimes you might even need to edit them, which is far more difficult when the files are of this magnitude. Luckily, it’s almost unheard of for a configuration file to be large enough to be problematic to edit! Let’s begin.

Listing the Largest Files on your Linux System

Let’s say you need to free up some space on your hard drive and want a list of the biggest culprits eating up your disk space. To get a list of the 10 largest files, use this command:

find / -type f -exec du -sh {} + 2>/dev/null | sort -rh | head -n 10

This will give you output like the following. Note that it may take a while for the results to appear onscreen, because the command scans every file on the system:

[Screenshot: List of Large Text Files]

As you can see, most of these are binary files or databases that you can’t open directly. But often you’ll also find mail logs, which record even spam attempts to reach your server. If your IP address is hit often, that mail spool will keep growing, reaching hundreds of MB within a few months.
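
If scanning from / takes too long, one option is to limit the search to a single directory on a single filesystem. The following is a minimal sketch, assuming GNU find (for -printf and -xdev). The function name largest_files is made up for this example, and it reports each file's apparent size in bytes rather than the disk usage that du shows:

```shell
# largest_files DIR [N] (hypothetical helper, not a standard command):
# print the N largest regular files under DIR, biggest first,
# as "<size in bytes><TAB><path>".
largest_files() {
    dir=${1:-.}      # directory to scan, defaults to the current one
    count=${2:-10}   # how many entries to show, defaults to 10
    # -xdev keeps find on one filesystem, so mounted virtual
    # filesystems below DIR are not walked.
    find "$dir" -xdev -type f -printf '%s\t%p\n' 2>/dev/null |
        sort -rn |
        head -n "$count"
}
```

For example, largest_files /var/log 5 would list the five largest files under /var/log, assuming that directory exists and is readable.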

Trying to Open Large Files the Traditional Way

For this example, we’re going to create an example text file and bloat its size using the “truncate” command like this:

truncate --size=20M testfile

Here, “testfile” is the name of the sample file. This command sets its size to 20 MB and fills the extra space with null characters. We can get the size of all the files in the current directory in MB using this command:

ls -l --block-size=M

Here’s the output:

[Screenshot: Creating a 20MB Test File in the Directory]
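
A file extended with truncate is usually sparse: its apparent size is 20 MB, but the filesystem may allocate far fewer blocks for it. The sketch below compares the two numbers. It assumes GNU coreutils (truncate, stat -c, du) and uses a throwaway directory from mktemp, so the paths are illustrative only. The allocated figure depends on the filesystem:

```shell
# Work in a scratch directory so nothing in the current directory is touched.
workdir=$(mktemp -d)
testfile="$workdir/testfile"

# 20M here means 20 * 1024 * 1024 = 20971520 bytes.
truncate --size=20M "$testfile"

apparent_bytes=$(stat -c %s "$testfile")      # size as reported by ls -l
allocated_kb=$(du -k "$testfile" | cut -f1)   # space actually allocated on disk

echo "apparent size:     $apparent_bytes bytes"
echo "allocated on disk: $allocated_kb KiB"

rm -rf "$workdir"
```

Programs that read the file still see all 20 MB of null characters, so the test in this article is unaffected.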

We will use the traditional text editor called “vi” to open the file and close it as soon as it’s loaded. To know how long the entire operation lasted, we use the “time” command, like this:

time vi testfile

When we exit vi after “testfile” has displayed its contents, “time” reports the duration of the entire operation like this:

[Screenshot: Opening and Closing the File Using vi]

So in this case, it takes us almost 6 seconds for a file that’s just 20 MB! That’s an absurd amount of time for a powerful processor. What happens when we increase the size to 40 MB?

[Screenshot: The Execution Time Increases Non-Linearly]

It now takes us 23 seconds! Doubling the file size made the operation take roughly four times as long, so the time grows faster than linearly. The larger the file, the more disproportionately long it takes to open.

The “cat” command is even worse:

[Screenshot: The Cat Command Offers Similar Performance]

15 seconds just to print a 10 MB file to the terminal!
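
To get a feel for how much of that delay comes from reading the file and how much from drawing it on the screen, one option is to time the same read with the output discarded. This is a rough sketch, not a benchmark. It assumes GNU date (for the %N nanosecond format) and a scratch file created with mktemp, and the result will vary from system to system:

```shell
# Hypothetical scratch file; the 10M size simply mirrors the example above.
scratch=$(mktemp)
truncate --size=10M "$scratch"

start=$(date +%s%N)           # nanoseconds since the epoch (GNU date)
cat "$scratch" > /dev/null    # read every byte, but draw nothing on screen
end=$(date +%s%N)

elapsed_ms=$(( (end - start) / 1000000 ))
echo "cat to /dev/null took ${elapsed_ms} ms"

rm -f "$scratch"
```

Comparing this number with the time taken when the output goes to the terminal gives a rough idea of where the time is spent.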

Using “Less” is More Efficient

Luckily for us, we have a command called “less” which reads the contents of the file one block at a time, and thus spares us the problem of loading it all at once. The syntax is simple:

less [filename]

Here, [filename] is the name of the large file you want to open. You can see in the screenshot below that it opens a huge file in less than a second:

[Screenshot: The Less Command Opens Files Almost Instantaneously]

Most of that time is spent manually quitting the program, so opening the file is pretty much instantaneous! We can move forward to the next “block” in the file by pressing “f” and go back by pressing “b”. We can also use “j” and “k”, or the arrow keys, to move between lines.

So the next time you see a file of hundreds of MB and need to view it, use “less” instead of “vi” or “cat”. Your system resources will thank you for it, and you’ll save time as well!
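
Since “less” is interactive, the same idea of reading only the part you need can be applied in scripts with head, tail, and sed. The sketch below is one possible approach, assuming GNU coreutils and sed. The sample file and the line numbers are made up for illustration:

```shell
# Hypothetical sample file: one million numbered lines.
sample=$(mktemp)
seq 1 1000000 > "$sample"

# First and last three lines.
first_lines=$(head -n 3 "$sample")
last_lines=$(tail -n 3 "$sample")

# Lines 500000 to 500002 only; the trailing "q" makes sed quit
# at line 500002 instead of reading the rest of the file.
middle_lines=$(sed -n '500000,500002p;500002q' "$sample")

printf '%s\n' "$first_lines" "$middle_lines" "$last_lines"

rm -f "$sample"
```

On a real log file, replace the sample file with the path of the log and adjust the line numbers to the region you want to inspect.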



