A very quick introduction to AWK

posted on 2018-05-25

This is a very fast and small intro to AWK, just because it's good to know it exists.

What it can do is:

  • Anything grep does
  • Anything wc does
  • Manipulate text tables of most types
  • Work in a pipe
  • Probably a lot of what sed can do, but just learn to use sed anyway.
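
As a rough illustration of the first two points, the sketch below mimics grep and wc -l on a small, made-up sample. The sample text and the variable name `sample` are assumptions for the example only; NR is AWK's built-in count of records read so far.

```shell
# Hypothetical sample input, only for illustration.
sample='error: disk full
info: all good
error: no route'

# Like grep 'error': a pattern without a block prints every matching record.
printf '%s\n' "$sample" | awk '/error/'

# Like wc -l: NR holds the number of records read so far.
printf '%s\n' "$sample" | awk 'END { print NR }'
```

Both commands read from a pipe, which also covers the "work in a pipe" point.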

Syntax

The input is split into records (lines by default), and every code block whose pattern matches a record is executed for that record.

Every code block has a pattern in front of it. BEGIN and END are special patterns: BEGIN matches before any input is read and END matches after the end of the input. Another common pattern is a regular expression between slashes.

None of these blocks is mandatory; the example below shows all three.

BEGIN {
  #code here, this is a comment
}
/^[134]/ {
  print
}
END {
}

Every record (line) that starts with a 1, 3 or 4 will be printed.

Records are collected from the input by splitting the input into chunks using the record separator stored in the RS variable.

Each record is split into fields using the field separator variable FS. So the input data is scanned as "field FS field FS field RS".

FS and RS can be set using the = operator in any of the blocks. They default to FS=" " (a single space, which splits on any run of spaces and tabs) and RS="\n".

Both have their counterparts for output variables used during print: OFS and ORS.

Once the record has been split up (using FS), its parts are available much like Bash argument variables: $0 is the whole record, $1 the first field, $2 the second, and so on.
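
A minimal sketch of how the separators and field variables fit together, assuming a made-up passwd-style line (the variable name `line` and its contents are hypothetical):

```shell
# Hypothetical passwd-style line, only for illustration.
line='alice:x:1000:1000:Alice:/home/alice:/bin/sh'

# FS splits the record on ":". The comma in print inserts OFS between
# the arguments, and ORS (a newline by default) ends the output record.
printf '%s\n' "$line" | awk 'BEGIN { FS = ":"; OFS = " -> " } { print $1, $6 }'
# prints: alice -> /home/alice
```

Note that OFS only appears where print arguments are separated by a comma. Strings placed next to each other without a comma are simply concatenated.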

Normal variables have no type or distinct starting characters. Statements in the blocks are separated by a semicolon (;) and/or newline.
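
To make this concrete, here is a small sketch with made-up input (three numbers) and made-up variable names (`sum`, `n`, `label`). None of the variables is declared, and the statements share a line by using semicolons:

```shell
# Variables come into existence on first use; an unset variable acts
# as 0 in arithmetic and as "" in string context.
printf '3\n4\n5\n' | awk '{ sum += $1; n++ } END { label = "sum"; print label "=" sum, "n=" n }'
# prints: sum=12 n=3
```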

Most common functions

Function                                        Description
print RecordToPrint                             Print the argument, or `$0` by default. Use `$n` for a given field, for example `print $3` to print the third field.
next                                            Jump to the next record instead of trying the remaining patterns.
system(commandToExecute)                        Execute commandToExecute and return its exit status.
rand()                                          Generate a random number that is zero or more and lower than 1.
gsub(regularExpression, replacement, haystack)  Substitute every match of regularExpression in haystack with replacement (haystack defaults to `$0`). Returns the number of substitutions.
length(value)                                   Returns the length of value.
tolower(value)                                  Returns the lowercase version of value.
toupper(value)                                  Returns the uppercase version of value.
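
A short sketch that combines a few of these functions on a made-up input line. The input text and the variable name `n` are assumptions for the example:

```shell
# Hypothetical input line, only for illustration.
printf 'Hello World\n' | awk '{
  n = gsub(/o/, "0", $0)      # replace every "o" in $0; n is the count
  print toupper($0), length($0), n
}'
# prints: HELL0 W0RLD 11 2
```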

Execution

Use one of the following (or look at the manual once):

awk -f program-file
awk -f program-file -- file names to process
awk -- 'program-text' file names to process
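
As a hedged sketch, the two main forms can be compared with the same tiny program, once stored in a file and once given inline. The temporary file comes from mktemp and the two-line input is made up. Single quotes around the program text keep the shell from expanding things like $0 before AWK sees them.

```shell
# Write a tiny program to a temporary file (mktemp picks the name).
prog=$(mktemp)
printf '%s\n' '{ print NR ": " $0 }' > "$prog"

# The same program, run from a file and given inline.
printf 'a\nb\n' | awk -f "$prog"
printf 'a\nb\n' | awk -- '{ print NR ": " $0 }'

rm -f "$prog"
```

Both invocations number the input lines, so each prints "1: a" followed by "2: b".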

Examples

This is a very simple conversion of the user list to very simple HTML in AWK. Execute it using awk -f thefile.awk -- /etc/passwd > temp.html

BEGIN{
  FS = ":"
  ORS = "<br />\n"
  print "<html><body>"
}
//{
  print "User <b>" $1 "</b>"
  print "lives at " $6
  usercount += 1
}
END{
  print "";
  print "A total of " usercount " users on the system."
  print "</body></html>"
}

Line for line, the following is stated above:

  • In the BEGINning...
    • Split fields using the colon character (":")
    • When outputting records, the separator should be <br />\n
    • Now print the start of the html page to the stdout
  • Close the block
  • For every line (all records will match an empty expression)
    • Print "User" followed by the first field of the record (the text before the first FS) in bold
    • Print lives at followed by the 6th field of the line.
    • Increment the usercount variable by one. It defaults to zero/empty.
  • Close the block
  • At the END of all input
    • Print an empty record (which puts "" + ORS on the output)
    • Print "A total of ", the user count and " users on the system." joined into one string (no OFS is inserted, because the parts are not separated by commas), followed by ORS
    • Print the closing statements for the html page, followed by ORS
  • Close the block

Here is a quick line to get all /var/log/messages type log lines from a raw disk image and show how many were not matched:

strings diskimage.img | \
awk -- '/^[A-Z][a-z][a-z] [0-9 ][0-9 ] [0-9][0-9]:[0-9][0-9]:[0-9][0-9]/{print; next}//{unmatched++}END{print unmatched+0 " lines unmatched."}'

For more examples, read the manual (man awk).

See also

Intrigued by the possibilities already? Then see also: