Forgotten Unix commands: xargs

In today’s post we will cover xargs - a very useful tool for performing rather complex (and repetitive) operations on the Linux shell.

xargs is particularly useful when it comes to piping ("|") and chaining commands and their arguments together.

xargs reads items from standard input or from pipes, delimited by blanks or newlines, and then executes the command one or more times with any initial arguments followed by the items read from standard input. Blank lines on the standard input are ignored.[1]

Let’s do some examples:

# echo Where is my tux? | xargs
Where is my tux?

That was easy, especially since echo is the default command anyway.
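A quick way to see how xargs groups its input: the -n option caps how many arguments go to each invocation of the command (a throwaway example):

```shell
# -n caps the number of arguments per invocation: six words
# become two echo runs of three arguments each.
echo a b c d e f | xargs -n 3 echo
# a b c
# d e f
```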

Now something more useful maybe:

# find /var/log -name 'secure-*' -type f -print | xargs /bin/ls -las
4 -rw------- 1 root root  593 Dec 28 20:02 /var/log/secure-20131229
8 -rw------- 1 root root 6734 Jan  4 14:38 /var/log/secure-20140105
4 -rw------- 1 root root 3793 Jan 10 15:33 /var/log/secure-20140112
4 -rw------- 1 root root 1182 Jan 16 08:40 /var/log/secure-20140119

You do not really need the -print here (it is the default action), and you can achieve the same with find alone as well:

# find /var/log -name 'secure-*' -type f -ls
11903950    4 -rw-------   1 root     root          593 Dec 28 20:02 /var/log/secure-20131229
11903981    8 -rw-------   1 root     root         6734 Jan  4 14:38 /var/log/secure-20140105
11903996    4 -rw-------   1 root     root         3793 Jan 10 15:33 /var/log/secure-20140112
11904057    4 -rw-------   1 root     root         1182 Jan 16 08:40 /var/log/secure-20140119

or:

# find /var/log -name 'secure-*' -type f -exec ls -las {} \;
4 -rw------- 1 root root 593 Dec 28 20:02 /var/log/secure-20131229
8 -rw------- 1 root root 6734 Jan  4 14:38 /var/log/secure-20140105
4 -rw------- 1 root root 3793 Jan 10 15:33 /var/log/secure-20140112
4 -rw------- 1 root root 1182 Jan 16 08:40 /var/log/secure-20140119

But maybe you forgot that syntax? It never hurts to have several approaches at hand!

Some more useful examples: the next one cleans things up a bit. Say you have a couple of files spread over several directories:

# ls -las dir1; ls -las dir2
total 8
4 drwxr-xr-x 2 root root 4096 Jan 24 09:45 .
4 drwxr-xr-x 5 root root 4096 Jan 24 09:42 ..
0 -rw-r--r-- 1 root root    0 Jan 24 09:42 1
0 -rw-r--r-- 1 root root    0 Jan 24 09:42 2
total 8
4 drwxr-xr-x 2 root root 4096 Jan 24 09:45 .
4 drwxr-xr-x 5 root root 4096 Jan 24 09:42 ..
0 -rw-r--r-- 1 root root    0 Jan 24 09:42 3
0 -rw-r--r-- 1 root root    0 Jan 24 09:42 4

For some reason, you want them all moved to a single, new directory, let’s call it dir3:

# ls -las dir3
total 8
4 drwxr-xr-x 2 root root 4096 Jan 24 09:45 .
4 drwxr-xr-x 5 root root 4096 Jan 24 09:42 ..

You can now embark on a mv orgy, or you can do it a bit faster:

# find ./ -type f -print0 | xargs -0 -I {} mv {} dir3
# ls -las dir*
dir1:
total 8
4 drwxr-xr-x 2 root root 4096 Jan 24 09:46 .
4 drwxr-xr-x 5 root root 4096 Jan 24 09:42 ..

dir2:
total 8
4 drwxr-xr-x 2 root root 4096 Jan 24 09:46 .
4 drwxr-xr-x 5 root root 4096 Jan 24 09:42 ..

dir3:
total 8
4 drwxr-xr-x 2 root root 4096 Jan 24 09:46 .
4 drwxr-xr-x 5 root root 4096 Jan 24 09:42 ..
0 -rw-r--r-- 1 root root    0 Jan 24 09:42 1
0 -rw-r--r-- 1 root root    0 Jan 24 09:42 2
0 -rw-r--r-- 1 root root    0 Jan 24 09:42 3
0 -rw-r--r-- 1 root root    0 Jan 24 09:42 4

Nice, isn’t it? {} acts as a placeholder for each input item. The -0 option makes xargs read null-delimited input, which safely handles special characters (including spaces) in filenames - that is also why we used -print0 in the find command - and -I replaces every occurrence of the specified string with the item read from standard input.
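To see what -print0 and -0 actually buy you, try a filename containing a blank; the scratch directory below is just for illustration:

```shell
# Scratch directory with a filename containing a blank.
tmpdir=$(mktemp -d)
touch "$tmpdir/my file.txt"

# Plain -print | xargs would split "my file.txt" at the blank and
# ls would fail on two bogus names; the null-delimited pair
# -print0 / -0 keeps the name intact.
find "$tmpdir" -type f -print0 | xargs -0 ls
rm -rf "$tmpdir"
```

Without the null-delimited pair, ls would be handed the two bogus names "my" and "file.txt".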

Now, let’s cp all these files into a single directory (useful, e.g., for making a quick “backup” of files to an external drive, etc.):

# find /tmp/xargs/ -type f -print0 | xargs -0 -r -I file cp -v -p file --target-directory=/tmp/xargs/dir3
`/tmp/xargs/dir2/3' -> `/tmp/xargs/dir3/3'
`/tmp/xargs/dir2/4' -> `/tmp/xargs/dir3/4'
`/tmp/xargs/dir1/1' -> `/tmp/xargs/dir3/1'
`/tmp/xargs/dir1/2' -> `/tmp/xargs/dir3/2'
# ls -las dir3
total 8
4 drwxr-xr-x 2 root root 4096 Jan 24 09:58 .
4 drwxr-xr-x 5 root root 4096 Jan 24 09:42 ..
0 -rw-r--r-- 1 root root    0 Jan 24 09:42 1
0 -rw-r--r-- 1 root root    0 Jan 24 09:42 2
0 -rw-r--r-- 1 root root    0 Jan 24 09:42 3
0 -rw-r--r-- 1 root root    0 Jan 24 09:42 4

Looks good, all files copied.
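As a side note, xargs can also batch and parallelize such a copy: -n limits how many files each cp receives, and -P (a GNU xargs extension, like cp’s --target-directory) runs several cp processes at once. A sketch on a scratch directory:

```shell
# Scratch layout standing in for the dir1/dir3 setup above.
base=$(mktemp -d)
mkdir "$base/src" "$base/dst"
touch "$base/src/1" "$base/src/2"

# -n 50 caps the number of files handed to each cp invocation;
# -P 4 runs up to four cp processes in parallel.
find "$base/src" -type f -print0 \
    | xargs -0 -r -n 50 -P 4 cp -p --target-directory="$base/dst"
ls "$base/dst"
rm -rf "$base"
```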

Last, but not least, and very nifty indeed: quickly creating an archive of files (and/or directories):

# find /tmp/xargs -type f | xargs tar rvf archive.tar
tar: Removing leading `/' from member names
/tmp/xargs/dir2/3
/tmp/xargs/dir2/4
/tmp/xargs/dir1/1
/tmp/xargs/dir1/2
# ls -las
total 32
 4 drwxr-xr-x  5 root root  4096 Jan 24 10:01 .
 4 drwxrwxrwt. 4 root root  4096 Jan 24 09:42 ..
12 -rw-r--r--  1 root root 10240 Jan 24 10:01 archive.tar
 4 drwxr-xr-x  2 root root  4096 Jan 24 09:52 dir1
 4 drwxr-xr-x  2 root root  4096 Jan 24 09:52 dir2
 4 drwxr-xr-x  2 root root  4096 Jan 24 10:01 dir3
# tar -tf archive.tar
tmp/xargs/dir2/3
tmp/xargs/dir2/4
tmp/xargs/dir1/1
tmp/xargs/dir1/2

As we can see, all files are in the archive we created!
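One detail worth spelling out: the example appends with tar r instead of creating with tar c. That matters because xargs may split a long file list and run tar several times; with c, each later run would overwrite the archive, while r keeps appending. You can force a split with -n to watch this happen (scratch files again; -T /dev/null is a GNU tar idiom for creating the empty starting archive):

```shell
# -n 2 forces xargs to invoke tar twice; "r" appends both times,
# so all four files end up in the archive.
base=$(mktemp -d); cd "$base"
touch a b c d
tar cf archive.tar -T /dev/null          # start from an empty archive
printf '%s\n' a b c d | xargs -n 2 tar rf archive.tar
tar -tf archive.tar                      # all four files are listed
```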

xargs isn’t something one learns on the fly, especially for the more complex operations it can handle - but that is exactly what makes it such a valuable tool for system administrators of both virtual private servers and dedicated servers.

Now it is time to wish you good luck and a lot of fun exploring the world of xargs!

[1] From the man page of the GNU version of xargs.


Forgotten Unix Commands: awk

In our weekly series of forgotten UNIX commands, today we will give a brief overview of awk. awk is extremely useful for manipulating structured files, and for displaying and working with the information contained in them, so it can come in very handy for any virtual private server admin.

Let’s get started!

Assume we have some sort of logfile of a webserver, with entries like the following:

119.63.193.131 - - [17/Jan/2014:07:01:10 +0000] "GET / HTTP/1.1" 302 211 "-" "Mozilla/4.0 (...)"
211.129.81.174 - - [17/Jan/2014:07:01:12 +0000] "GET /robots.txt HTTP/1.1" 200 40 "-" "siclab (...)"

awk works as you’d expect from the shell prompt: it reads stdin as input and writes to stdout by default. Now, we want to have a look at the IP addresses accessing our webserver:

root:> awk '{print $1}' logfile
119.63.193.131
211.129.81.174

Ok, that was easy, right? awk assigns each field of a line - separated by whitespace by default - to the variables $1, $2, and so on, up to the number of fields in the line. $0 is the entire line, and NF holds the “number of fields” count, so $NF prints the last field and $(NF-1), $(NF-2), etc. count backwards from it. To see how NF works, here is an example:

root:> awk '{print $(NF-12)}' logfile
119.63.193.131
211.129.81.174
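$(NF-12) only works here because every log line happens to have 13 fields; the more common idiom is $NF, which is always the last field no matter how many fields a line has (a throwaway example):

```shell
# $NF is the last field of each line, regardless of field count.
printf 'one two three\nalpha beta\n' | awk '{ print $NF }'
# three
# beta
```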

Another useful variable is NR, the current record (line) number. So let’s go a step further: display line numbers, IP addresses, and the status code, and format the output a bit:

root:> awk '{print NR " : " $1 " : " $9}' logfile
1 : 119.63.193.131 : 302
2 : 211.129.81.174 : 200

You could also add up fields, for example the total number of bytes transferred:

root:> awk '{ total += $10; print $10 " bytes in this line -> current total: " total}' logfile
211 bytes in this line -> current total: 211
40 bytes in this line -> current total: 251

You could also print the total just once, after the last line has been processed, using an END block:

root:> awk '{ total += $10; print $10 " bytes in this line." } END { print "final total: " total }' logfile
211 bytes in this line.
40 bytes in this line.
final total: 251
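NR is still available in the END block, which makes averages just as easy. Two stand-in records here, with the byte count in $2 instead of $10, to keep the sketch short:

```shell
# Sum the byte column, then divide by the record count NR in END.
printf 'a 211\nb 40\n' | awk '{ total += $2 } END { print "average: " total / NR }'
# average: 125.5
```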

There is more to it, of course. On most Linux systems, ps aux will display a nice process list of the underlying system, including memory and CPU time used, etc. Column 6 contains the resident set size (RSS), a very useful indicator. Let’s sum it up quickly:

root:> ps aux | awk '{ rss += $6 } END { print "total rss: " rss }'
total rss: 132256

Faster than using a calculator, right?
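awk’s associative arrays take this one step further: index the sum by the user column and you get per-user totals instead of one grand total (column positions as in ps aux output; the figures will obviously differ per system):

```shell
# Sum RSS (column 6) per user (column 1), skipping the header line.
ps aux | awk 'NR > 1 { rss[$1] += $6 }
    END { for (u in rss) print u, rss[u] }'
```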

A final one before I let you embark on your own awk explorations: assume you have a runaway/zombie/whatever httpd that you need to get rid of as fast as possible, and you want to kill all processes that have httpd in their command column:

root:> ps aux | grep httpd | awk '{ print "kill -9 " $2}'
kill -9 740
kill -9 4629
kill -9 9365
kill -9 9366
kill -9 9368
kill -9 10589
kill -9 19518
kill -9 19689
kill -9 20126
kill -9 21925
kill -9 23486
kill -9 24635

NB: this just prints the commands, but does not execute anything. To make it happen, you need to pipe that output through the shell, i.e. use the same command line as above, but add “ | sh ” at the end.
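A caveat with the grep stage: it matches its own process, so the list usually contains one PID that is already gone by the time you kill it. You can let awk do the matching instead - though note that awk’s own command line then contains “httpd” too, hence the extra guard in this sketch:

```shell
# Match inside awk itself; the !/awk/ guard keeps this pipeline's
# own process (whose argv contains "httpd") out of the hit list.
ps aux | awk '/httpd/ && !/awk/ { print "kill -9 " $2 }'
```

As before, append | sh to actually execute the kill commands.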

Have fun exploring, and possibly use Wikipedia to get started: http://en.wikipedia.org/wiki/AWK


Needle in a haystack, or grep revisited: tre-agrep

Probably everyone who uses a terminal knows the command grep, cf. this excerpt from its man page:

grep searches the named input FILEs (or standard input if no files are named, or if a single hyphen-minus (-) is given as file name) for lines containing a match to the given PATTERN. By default, grep prints the matching lines.

So grep is the best tool for searching a big file for a specific pattern, or for a specific process in the complete list of running processes. But it has its limitations: it searches for the exact string you give it, and sometimes an “approximate” or “fuzzy” search would be more useful.

For this purpose the program agrep was originally developed; Wikipedia gives some details about this software:

agrep (approximate grep) is a proprietary approximate string matching program, developed by Udi Manber and Sun Wu between 1988 and 1991, for use with the Unix operating system. It was later ported to OS/2, DOS, and Windows.

It selects the best-suited algorithm for the current query from a variety of the known fastest (built-in) string searching algorithms, including Manber and Wu’s bitap algorithm based on Levenshtein distances.

agrep is also the search engine in the indexer program GLIMPSE. agrep is free for private and non-commercial use only, and belongs to the University of Arizona.

So it’s closed source, but luckily there is an open source alternative: tre-agrep.

THE TRE LIBRARY

TRE is a lightweight, robust, and efficient POSIX compliant regexp matching library with some exciting features such as approximate (fuzzy) matching.

The matching algorithm used in TRE runs in worst-case time linear in the length of the text being searched, and worst-case time quadratic in the length of the regular expression. In other words, the time complexity of the algorithm is O(M^2N), where M is the length of the regular expression and N is the length of the text. The space used is also quadratic in the length of the regex, but does not depend on the searched string. This quadratic behaviour occurs only in pathological cases which are probably rare in practice.

APPROXIMATE MATCHING

Approximate pattern matching allows matches to be approximate, that is, allows the matches to be close to the searched pattern under some measure of closeness. TRE uses the edit-distance measure (also known as the Levenshtein distance) where characters can be inserted, deleted, or substituted in the searched text in order to get an exact match. Each insertion, deletion, or substitution adds the distance, or cost, of the match. TRE can report the matches which have a cost lower than some given threshold value. TRE can also be used to search for matches with the lowest cost.
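Keep in mind that tre-agrep matches the pattern against substrings of each line, so the costs it reports can be lower than the whole-string distance. As a sketch of the measure itself, here is the textbook dynamic-programming computation of the Levenshtein distance, written in awk (ASCII strings only, to keep character counting simple):

```shell
awk '
# d[i,j]: edit distance between the first i chars of s and the
# first j chars of t; insert, delete, and substitute each cost 1.
function lev(s, t,    m, n, i, j, cost, d) {
    m = length(s); n = length(t)
    for (i = 0; i <= m; i++) d[i, 0] = i
    for (j = 0; j <= n; j++) d[0, j] = j
    for (i = 1; i <= m; i++)
        for (j = 1; j <= n; j++) {
            cost = (substr(s, i, 1) == substr(t, j, 1)) ? 0 : 1
            d[i, j] = d[i-1, j] + 1                         # deletion
            if (d[i, j-1] + 1 < d[i, j])
                d[i, j] = d[i, j-1] + 1                     # insertion
            if (d[i-1, j-1] + cost < d[i, j])
                d[i, j] = d[i-1, j-1] + cost                # substitution
        }
    return d[m, n]
}
BEGIN {
    print lev("resume", "Resume")     # one substitution -> 1
    print lev("resume", "resumee")    # one extra character -> 1
}'
```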

INSTALLATION

tre-agrep is usually not installed by default on any distribution, but it is available in many repositories, so you can easily install it with your distribution’s package manager, e.g. for Debian/Ubuntu and Mint you can use the command:

apt-get install tre-agrep

BASIC USAGE

Its usage is best demonstrated with some simple examples of this powerful command. Given the file example.txt that contains:

Résumé
RÉSUMÉ
resume
Resümee
rèsümê
Resume
linuxaria

The following is the output of the command tre-agrep with different options:

mint-desktop tmp # tre-agrep resume example.txt
resume

mint-desktop tmp # tre-agrep -i resume example.txt
resume
Resume

mint-desktop tmp # tre-agrep -1 -i resume example.txt
resume
Resümee
Resume

mint-desktop tmp # tre-agrep -2 -i resume example.txt
Résumé
RÉSUMÉ
resume
Resümee
Resume

As you can see, without any options it returns the same result as a normal grep, and the -i option makes the match case-insensitive. The interesting options are -1 and -2: these set the maximum distance allowed in the search, so the larger the number, the more results you get, since a greater “distance” from the original pattern is allowed.

To see the distance of each match, use the -s option, which prints each match’s cost:

mint-desktop tmp # tre-agrep -5 -s -i resume example.txt
2:Résumé
2:RÉSUMÉ
0:resume
1:Resümee
3:rèsümê
0:Resume
5:linuxaria

So in this example the string Resume has a cost of 0, while linuxaria has a cost of 5.

Further interesting options are those that assign a cost for different operations:

-D NUM, --delete-cost=NUM - Set the cost of missing characters to NUM.
-I NUM, --insert-cost=NUM - Set the cost of extra characters to NUM.
-S NUM, --substitute-cost=NUM - Set the cost of incorrect characters to NUM. Note that a deletion (a missing character) and an insertion (an extra character) together constitute a substituted character, but the cost will be that of a deletion and an insertion added together.

CONCLUSIONS

The command tre-agrep is yet another small tool that can save the day if you work a lot with terminals and shell scripts.


This article was originally published on linuxaria. Castlegem has permission to republish. Thank you, linuxaria!