Forgotten Unix Commands: awk

In our weekly series of forgotten UNIX commands, today we give a brief overview of awk. awk is extremely useful for manipulating structured text files and for displaying and working with the information they contain, so it can come in very handy for any virtual private server admin.

Let’s get started!

Assume we have some sort of logfile of a webserver, with entries like the following:

119.63.193.131 - - [17/Jan/2014:07:01:10 +0000] "GET / HTTP/1.1" 302 211 "-" "Mozilla/4.0 (...)"
211.129.81.174 - - [17/Jan/2014:07:01:12 +0000] "GET /robots.txt HTTP/1.1" 200 40 "-" "siclab (...)"

awk behaves like any other filter at a shell prompt: it reads the files named on its command line, or stdin if none are given, and writes to stdout by default. Now, we want to have a look at the IP addresses accessing our webserver:

root:> awk '{print $1}' logfile
119.63.193.131
211.129.81.174
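
Since awk is a filter, the same program should give the same result whether the data comes from a file argument or from a pipe. Here is a minimal sketch of that, assuming a scratch file created with mktemp that holds the two sample lines (the variable names are only for illustration):

```shell
# Recreate the two sample log lines in a scratch file (hypothetical location).
logfile=$(mktemp)
cat > "$logfile" <<'EOF'
119.63.193.131 - - [17/Jan/2014:07:01:10 +0000] "GET / HTTP/1.1" 302 211 "-" "Mozilla/4.0 (...)"
211.129.81.174 - - [17/Jan/2014:07:01:12 +0000] "GET /robots.txt HTTP/1.1" 200 40 "-" "siclab (...)"
EOF

# File named as an argument, as in the example above.
from_file=$(awk '{print $1}' "$logfile")

# Same program, but the data arrives on stdin through a pipe.
from_stdin=$(cat "$logfile" | awk '{print $1}')

echo "$from_stdin"
rm -f "$logfile"
```

The pipe form is what you will use most often when chaining awk behind other commands.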

OK, that was easy, right? awk splits each line into fields, separated by whitespace by default, and assigns them to variables starting with $1 and going up to the number of fields in the line. $0 is the entire line, and NF holds the number of fields, so $NF is the last field and $(NF-n) counts backwards from it. To see how NF works, here is an example. Both sample lines have exactly 13 fields, so $(NF-12) is the same as $1 here (real logs with longer user agent strings will have more fields):

root:> awk '{print $(NF-12)}' logfile
119.63.193.131
211.129.81.174
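
To make the counting a bit more tangible, here is a small sketch that feeds the second sample line to awk and looks at NF, $NF and a field counted backwards from the end. The shell variable names are just for illustration:

```shell
# The second sample log line, kept in a shell variable for convenience.
line='211.129.81.174 - - [17/Jan/2014:07:01:12 +0000] "GET /robots.txt HTTP/1.1" 200 40 "-" "siclab (...)"'

# NF is the number of whitespace-separated fields on the line.
fields=$(printf '%s\n' "$line" | awk '{print NF}')

# $NF is the last field: the final word of the user agent string.
last=$(printf '%s\n' "$line" | awk '{print $NF}')

# $(NF-4) counts backwards: with 13 fields this is $9, the status code.
status=$(printf '%s\n' "$line" | awk '{print $(NF-4)}')

echo "fields=$fields last=$last status=$status"
```

Note that the quoted user agent is split on its spaces like everything else, which is why counting from the end is only reliable when every line has the same number of words.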

Another useful variable is NR, which is the current row number, so let’s go a step further: display row numbers, IP addresses, and the status code, and let’s also format the output a bit:

root:> awk '{print NR " : " $1 " : " $9}' logfile
1 : 119.63.193.131 : 302
2 : 211.129.81.174 : 200
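
If you want the columns to line up, awk also has a C-style printf. The following is a sketch of the same report with fixed column widths; the widths (3 and 15) are arbitrary choices for this example, and the log lines are recreated in a scratch file:

```shell
# Recreate the two sample log lines in a scratch file (hypothetical location).
logfile=$(mktemp)
cat > "$logfile" <<'EOF'
119.63.193.131 - - [17/Jan/2014:07:01:10 +0000] "GET / HTTP/1.1" 302 211 "-" "Mozilla/4.0 (...)"
211.129.81.174 - - [17/Jan/2014:07:01:12 +0000] "GET /robots.txt HTTP/1.1" 200 40 "-" "siclab (...)"
EOF

# %3d right-aligns the row number, %-15s left-aligns the IP in 15 columns.
# Unlike print, printf does not add a newline, so we supply \n ourselves.
out=$(awk '{ printf "%3d : %-15s : %s\n", NR, $1, $9 }' "$logfile")

echo "$out"
rm -f "$logfile"
```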

You could also add up fields, for example the total number of bytes transferred:

root:> awk '{ total += $10; print $10 " bytes in this line -> current total: " total}' logfile
211 bytes in this line -> current total: 211
40 bytes in this line -> current total: 251

You could also print the total just once, after the last line has been processed, by putting it in an END block:

root:> awk '{ total += $10; print $10 " bytes in this line." } END { print "final total: " total }' logfile
211 bytes in this line.
40 bytes in this line.
final total: 251
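
END is one kind of pattern in front of an action block; an ordinary condition works the same way and restricts the block to matching lines. As a sketch, assuming we only care about bytes sent with a 200 status (the wording of the message is made up for this example):

```shell
# Recreate the two sample log lines in a scratch file (hypothetical location).
logfile=$(mktemp)
cat > "$logfile" <<'EOF'
119.63.193.131 - - [17/Jan/2014:07:01:10 +0000] "GET / HTTP/1.1" 302 211 "-" "Mozilla/4.0 (...)"
211.129.81.174 - - [17/Jan/2014:07:01:12 +0000] "GET /robots.txt HTTP/1.1" 200 40 "-" "siclab (...)"
EOF

# The action runs only on lines whose 9th field (status code) equals 200.
# (total + 0) forces a numeric 0 if no line matched at all.
out=$(awk '$9 == 200 { total += $10 } END { print "bytes served with status 200: " (total + 0) }' "$logfile")

echo "$out"
rm -f "$logfile"
```

Only the second sample line has status 200, so only its 40 bytes are counted.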

There is more to it, of course. On most Linux systems, ps aux displays a process list of the system, including memory and CPU time used. Column 6 contains the resident set size (RSS) in kilobytes, a very useful indicator of how much physical memory a process occupies. Let's sum it up quickly (the header line's "RSS" is not a number, so awk counts it as 0):

root:> ps aux | awk '{ rss += $6 } END { print "total rss: " rss }'
total rss: 132256
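
A slightly more explicit variant skips the header line with NR > 1 and also prints the sum in MiB. The sketch below runs against canned, made-up ps output instead of the live process list so that the result is reproducible; on a real system you would pipe ps aux into the same awk program:

```shell
# Made-up sample rows in the usual Linux "ps aux" column layout.
psfile=$(mktemp)
cat > "$psfile" <<'EOF'
USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
root         1  0.0  0.1  19232  1024 ?        Ss   Jan16   0:01 /sbin/init
apache     740  0.2  1.3 212340  2048 ?        S    07:00   0:03 /usr/sbin/httpd
EOF

# NR > 1 skips the header; column 6 is RSS, assumed to be in KiB as on Linux.
out=$(awk 'NR > 1 { rss += $6 } END { printf "total rss: %d KiB (%.1f MiB)\n", rss, rss / 1024 }' "$psfile")

echo "$out"
rm -f "$psfile"
```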

Faster than using a calculator, right?

A final one before I let you embark on your awk explorations on your own: assume you have a runaway/zombie/whatever httpd that you need to get rid of as fast as possible, and you want to just kill all processes that have httpd in their command column:

root:> ps aux | grep httpd | awk '{ print "kill -9 " $2}'
kill -9 740
kill -9 4629
kill -9 9365
kill -9 9366
kill -9 9368
kill -9 10589
kill -9 19518
kill -9 19689
kill -9 20126
kill -9 21925
kill -9 23486
kill -9 24635

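NB: this just prints the commands, it does not run them. To make it happen, you need to pipe that output through the shell, i.e. use the same command line as above, but add " | sh " at the end. Check the list first, though: grep httpd also matches the grep process itself, and any other process whose command line happens to contain httpd.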
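
awk can do the matching itself, which saves the extra grep and keeps the grep process out of the list. The sketch below matches only on column 11, the first word of the command, which is an assumption about how strict you want to be; it runs on made-up ps rows and only prints the commands:

```shell
# Made-up sample rows in "ps aux" layout, including the grep process
# that the grep-based pipeline would also have matched.
psfile=$(mktemp)
cat > "$psfile" <<'EOF'
USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
root       740  0.0  0.5 212340  5120 ?        Ss   07:00   0:01 /usr/sbin/httpd
apache    4629  0.1  1.0 214000 10240 ?        S    07:00   0:02 /usr/sbin/httpd
root     24635  0.0  0.0   6384   700 pts/0    S+   07:05   0:00 grep httpd
EOF

# Skip the header, then test only the command name in column 11.
# "grep httpd" has "grep" in column 11, so it is not selected.
out=$(awk 'NR > 1 && $11 ~ /httpd/ { print "kill -9 " $2 }' "$psfile")

echo "$out"
rm -f "$psfile"
```

On a live system the equivalent would be ps aux piped into the same awk program, again with " | sh " appended only once you are happy with the printed list.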

Have fun exploring, and possibly use Wikipedia to get started: http://en.wikipedia.org/wiki/AWK