Cat Command in Linux: Essential and Advanced Examples

Continuing the tour of lesser-known commands that started last week with the ls command, let’s examine the cat command today.

The name cat stands for catenate: the primary job of the command is to join several input files by sequentially writing their content to standard output:
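A minimal sketch of that behavior, assuming two small files whose names and content are made up for the illustration:

```shell
# Work in a scratch directory so nothing existing is overwritten.
cd "$(mktemp -d)"

# Two sample files (hypothetical names and content).
printf 'Hello\n' > first.txt
printf 'World\n' > second.txt

# cat writes the content of each file, in argument order, to standard output.
cat first.txt second.txt
```

The output is simply the lines of first.txt followed by the lines of second.txt.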

If you want to store the result of that concatenation in a file, you have to use a shell redirection:
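As a sketch, again with made-up file names, the redirection could look like this. Note that the output file should not be one of the input files:

```shell
cd "$(mktemp -d)"
printf 'Hello\n' > first.txt
printf 'World\n' > second.txt

# Store the concatenation in a new file instead of printing it.
cat first.txt second.txt > both.txt

# With a single argument, cat simply displays that file.
cat both.txt
```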

Even though its primary design goal is to catenate files, the cat utility is also often used with a single argument to display the content of that file on the screen, exactly as I did on the last line of the example above.

A. Using cat command with standard input

When used without any argument, the cat command reads data from its standard input and writes it to its standard output, which is mostly useless unless you use some option to transform the data. We will talk about a couple of interesting options later.

In addition to file paths, the cat command also understands the special file name - as an alias for the standard input. That way, you can insert the data read from the standard input between the files given on the command line:
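Here is a small sketch of that idea. The file names and the piped text are assumptions chosen for the illustration:

```shell
cd "$(mktemp -d)"
printf 'header\n' > head.txt
printf 'footer\n' > foot.txt

# "-" stands for standard input, fed here by echo through a pipe.
echo 'body from stdin' | cat head.txt - foot.txt > page.txt

# page.txt now holds: header, body from stdin, footer (in that order).
cat page.txt
```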

B. Using cat command with binary files

1. Joining split files

The cat command does not make any assumption about the file content, so it will happily work with binary data. That may be useful to rejoin files broken up by the split or csplit command, or to join partial downloads, as we will do now:

We now have two halves of an image. You can open the first half and see that it is “broken” using ImageMagick’s display, or gimp, or any other software able to read image files:

If you study the curl command I used, you will see the two parts are perfectly complementary. The first half runs from byte 0 to byte 50000, and the second half from byte 50001 to the end of the file. There should be no missing data between them. So we only have to catenate the two parts together (in the right order) to get back the full file:
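The download itself cannot be reproduced here, so the sketch below uses a locally generated file of random bytes as a stand-in for the image and cuts it at the same byte offsets. The file names and the 120000-byte size are arbitrary assumptions:

```shell
cd "$(mktemp -d)"

# Stand-in for the image: 120000 random bytes.
head -c 120000 /dev/urandom > original.bin

# First part: bytes 0 to 50000 inclusive, that is the first 50001 bytes.
head -c 50001 original.bin > part1.bin
# Second part: from byte offset 50001 to the end (tail -c +N counts from 1).
tail -c +50002 original.bin > part2.bin

# Rejoin the parts in the right order and compare with the original.
cat part1.bin part2.bin > rejoined.bin
cmp original.bin rejoined.bin && echo 'identical'
```

Swapping the two arguments of cat would still produce a file of the right size, but cmp would report a difference, which is why the order matters.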

2. Working with streamable file formats

Not only can you use the cat command to “rejoin” binary files that were split into several parts, but in some cases, you can also create new files that way. That works particularly well with “headerless” or “streamable” file formats like MPEG transport stream video files (.TS files):

We can now combine all those transport stream video files using the cat command, obtaining a perfectly valid TS file in the output:

Thanks to the TS file format, you can combine those files in the order you want, and you can even use the same file several times in the argument list to create loops or repetition in the output video. Obviously, this would be more fun if we were using animated images, but I will let you do that by yourself: many consumer-grade devices record TS files, and if they don’t, you can still use ffmpeg to convert almost any video file to a transport stream file. Don’t hesitate to share your creations using the comment section!

3. Hacking cpio archives

As the last example, let’s see how we can use the cat command to combine several cpio archives. This time, it will not be as straightforward since it requires a little bit of knowledge about the cpio archive file format.

A cpio archive stores each file’s metadata and content sequentially, which makes it suitable for file-level concatenation with the cat utility. Unfortunately, the cpio archive also contains a trailer used to mark the end of the archive:

The good news is that, with cpio binary archives, that trailer has a fixed length of 280 bytes. So, using the standard head command, we have an easy way to remove it:

C. Essential cat command options

After having played with various binary file formats, let’s go back to plain old text files by studying a couple of options specifically tailored to deal with those files. While not part of the POSIX standard, those options are portable across the BSD and GNU cat implementations. Please note I do not pretend to be exhaustive here, so check the man page to see the full list of options supported by cat on your system!

-n: number lines

With the -n option, the cat command will prefix each output line with its line number:
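A quick sketch, using a three-line file invented for the occasion:

```shell
cd "$(mktemp -d)"
printf 'alpha\nbeta\ngamma\n' > greek.txt

# Each output line is prefixed with its number (1, 2, 3) and a tab.
cat -n greek.txt
```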

The -n option numbers output lines. That means the counter is not reset when switching from one input file to the next, as you will see if you try the following command by yourself:
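Something along these lines should do, with two hypothetical two-line files. The continuous numbering described here is what GNU cat does; other implementations may restart the count for each file, so it is worth checking on your system:

```shell
cd "$(mktemp -d)"
printf 'one\ntwo\n' > a.txt
printf 'three\nfour\n' > b.txt

# With GNU cat, the lines coming from b.txt are numbered 3 and 4.
cat -n a.txt b.txt
```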

-s: suppress repeated empty output lines

With the -s option, the cat command will collapse multiple consecutive empty lines into a single one:
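As a minimal illustration, assuming a made-up file with three consecutive empty lines between two words:

```shell
cd "$(mktemp -d)"
printf 'top\n\n\n\nbottom\n' > gaps.txt

# Default output: five lines, three of them empty.
cat gaps.txt

# With -s: three lines, only one empty line is kept.
cat -s gaps.txt
```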

In the example above, you can see that, in the default output, lines 10 and 11 were empty. When adding the -s option, the second empty line was discarded.

-b: number only non-empty lines

Somewhat related to the two preceding options, the -b option will number lines, but ignoring empty ones:
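The sketch below uses a small made-up file, not the article's original sample. Lines 1 and 8 are empty, and line 6 deliberately contains a single carriage return, so it looks empty on screen without being empty:

```shell
cd "$(mktemp -d)"

# Hypothetical sample: 8 lines, the 6th holding only a carriage return (\r).
printf '\nrose\ntulip\nlily\niris\n\r\ndaisy\n\n' > flowers.txt

# First cat: number non-empty lines only. Second cat: number every line.
cat -b flowers.txt | cat -n
```

With this file, -b hands out the numbers 1 to 6, while -n counts all 8 lines.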

The example above uses two instances of the cat command with different options in a pipeline. The inner numbering comes from the -b option used with the first cat command. The outer numbering comes from the -n option used with the second cat.

As you can see, the first and last lines were not numbered by the -b option because they are empty. But what about the 6th line? Why is it still numbered with the -b option? Well, because it is a blank line, but not an empty one, as we will see in the next section.

-v, -e, -t: display non-printing characters

The three options -v, -e, and -t are used to display different sets of invisible characters. Even if the sets overlap, there is no “catch-all” option, so you will have to combine them if you want to display all invisible characters.

-v: view invisible characters

The -v option displays all non-printing characters with caret and meta notation, except the line feed and tabulation.

With that option, control characters will appear as a caret (^) followed by the appropriate ASCII character (e.g., the carriage return, byte 13, is displayed as ^M because M in ASCII is 64 + 13), and characters with the high-order bit set will appear in “meta” notation M- followed by the representation corresponding to the 7 lower bits (e.g., the byte 141 will be displayed as M-^M because 141 is 128 + 13).
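To check the two notations on the bytes used in that explanation, here is a small sketch. printf takes octal escapes, so byte 13 is written \015 and byte 141 is written \215; the surrounding letters A and B are just filler:

```shell
# Byte 13 (carriage return) shows up in caret notation.
printf 'A\015B\n' | cat -v    # prints A^MB

# Byte 141 (128 + 13) shows up in meta notation followed by caret notation.
printf 'A\215B\n' | cat -v    # prints AM-^MB
```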

While seemingly esoteric, that feature may be useful when working with binary files, like, for example, if you want to examine the raw information embedded in a JPEG file:

Another use case for the -v option is finding control characters that might have leaked into a text file. If you remember, we had that strange issue above with the -b option numbering the 6th input line, whereas it looked like it was empty. So let’s investigate that:
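The same kind of investigation can be sketched on a tiny made-up file whose second line contains nothing but a carriage return:

```shell
cd "$(mktemp -d)"

# Hypothetical file: the middle line looks empty but holds a \r character.
printf 'rose\n\r\ndaisy\n' > crlf.txt

# -v makes the carriage return visible as ^M.
cat -v crlf.txt
```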

Aha! Do you see those ^M marks? They stand for the otherwise invisible carriage return character. Where did it come from? Well, the dict protocol, like many other text-based Internet protocols, uses CRLF as the line terminator. So we downloaded those carriage returns as part of our sample files. You can learn more about line feeds and carriage returns in the fold and fmt article. But for now, it explains why cat considered the 6th line as not being empty.

-e: display invisible characters, including end-of-line

The -e option works like the -v option, except it will also add a dollar sign ($) before each line-feed character, thus explicitly showing the end of lines:
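A handy side effect is that trailing spaces become visible, as this small sketch with made-up input suggests:

```shell
# The second line ends with two spaces that are invisible without -e.
printf 'clean\npadded  \n' | cat -e
# prints:
# clean$
# padded  $
```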

-t: display invisible characters, including tabs

The -t option works like the -v option, except it will also display tabulations using the ^I caret notation (tab is stored as a byte holding the value 9, and I in ASCII is 64+9=73):
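For instance, with a single tab-separated line invented for the test:

```shell
# The tab between the two words is displayed as ^I.
printf 'name\tvalue\n' | cat -t    # prints name^Ivalue
```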

-et: show all hidden characters

As I briefly mentioned already, if you want to display all non-printing characters, including both tabulations and end-of-line markers, you will need to use both the -e and -t options:
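A short sketch combining both, on a made-up line that contains a tab and ends with CRLF:

```shell
# Tab appears as ^I, carriage return as ^M, and the line feed is marked by $.
printf 'key\tvalue\r\n' | cat -et    # prints key^Ivalue^M$
```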

Bonus: The useless use of cat command in Linux

No article about the cat command would be complete without a mention of the “Useless Use of Cat” anti-pattern.

It occurs when you use cat for the sole purpose of sending the content of a file to the standard input of another command. That use of the cat command is said to be “useless” since a simple redirection or filename parameter would have done the job, and would have done it better. But an example is worth a thousand words:

In the example above, I used a pipeline to display the content of the uuoc.txt file with the less pager:

So, the only purpose of the cat command was to feed the standard input of the less command with the content of the uuoc.txt file. I would have obtained the same behavior using a shell redirection:

As a matter of fact, the less command, like many commands, also accepts a filename as an argument. So I could have simply written that instead:
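The three forms can be compared side by side in a script. Because less is interactive, the sketch below uses sort as a non-interactive stand-in, and the content of uuoc.txt is made up; the same three spellings apply to less:

```shell
cd "$(mktemp -d)"
printf 'pear\napple\nfig\n' > uuoc.txt

cat uuoc.txt | sort   # useless use of cat: one extra process and one extra pipe
sort < uuoc.txt       # the shell connects the file to standard input
sort uuoc.txt         # the command opens the file by itself
```

All three lines print the same sorted output; only the way the data reaches the command differs.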

As you can see, there is no need for cat here. If I mention the “Useless Use of Cat” anti-pattern, it is because, if you use it publicly on a forum or elsewhere, someone will undoubtedly point it out to you, arguing that you create an “extra process for nothing.”

I must admit that for a long time I was quite dismissive of such comments. After all, on our modern hardware, spawning one extra process for a one-shot operation couldn’t cause that much overhead.

But while writing this article, I did a quick experiment, comparing the time required with and without UUOC by a test awk script to process 500MB of data coming from a slow medium.

To my surprise, the difference was far from negligible:

Using the cat command to pipe data into awk will lead to significant performance penalties

However, the reason is not the extra process being created, but the extra reads, writes, and context switches the UUOC incurs (as you can infer from the time spent executing system code). So indeed, when you work on large datasets, that extra cat command has a non-negligible cost. As for myself, I will try to be more vigilant about that now! And you? If you have examples of the Useless Use of Cat, don’t hesitate to share them with us!
