awk, Linux utilities

You must know the Linux utilities out there; they are really valuable!

I already knew awk, but never actually used it until recently. I had used sed, but not that much.

I didn’t see the real value of awk until I came across the following use case: a Word document to transform into CSV format! I said that it was impossible: you have to do it by hand because it is unstructured!

Use the awk, Luke!

Two steps: first save the document as plain text (it should be possible to work from RTF too), then write a simple awk script, like this one:

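#!/usr/bin/awk -f
BEGIN {
    FS  = "\n"            # fields are lines
    RS  = "Definition"    # records are separated by the word "Definition"
    ORS = " "             # print ends its output with a space, not a newline
}
{
    print cat, ",", $6, $7, ","      # category (cat is not assigned in this excerpt), 6th and 7th lines concatenated

    # Skip lines to the real content
    x = 1
    while ( x <= NF && $x != "Confirmation" ) x++
    x++                              # step past "Confirmation" itself

    # Print the content between Confirmation and Observations
    # if it doesn't begin with a number
    while ( x <= NF && $x != "Observations:" ) {
        if ( $x !~ /^[0-9]/ )  {
            print $x, " "
        }
        x++
    }
    print "\n"
}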

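Simple? Yes. The important part is in the BEGIN block: awk cuts its input into rows (records) and columns (fields). By default, it uses newlines to separate records and whitespace (spaces and tabs) to separate fields. BUT you can change this: I tell it that the Field Separator (FS) is the newline “\n”, that the Record Separator (RS) is the word “Definition”, and that the Output Record Separator (ORS) is just a space. That is why I have to print a newline myself at the end of each record. Note that a record separator longer than one character works in gawk, but POSIX awk only guarantees that the first character of RS is used.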
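To see the separators at work without a Word document, here is a minimal sketch on made-up input. The function name extract_pairs and the sample words are only illustrations, and I am assuming an awk that accepts a multi-character RS (gawk does). Each record starts at the word "Definition", each line of the record is a field, and the first field is empty because of the newline that follows "Definition".

```shell
#!/bin/sh
# extract_pairs is a hypothetical helper name, used only for this illustration.
extract_pairs() {
    awk 'BEGIN { FS = "\n"; RS = "Definition"; ORS = " " }
         # $1 is the empty string left after "Definition", so the data starts at $2.
         # Records with nothing in $2 (such as text before the first "Definition") are skipped.
         $2 != "" {
             print $2 "," $3    # ends with ORS, i.e. a space
             print "\n"         # the newline has to be printed by hand
         }'
}

# Made-up sample input: two records of two lines each.
printf 'Definition\nalpha\nbeta\nDefinition\ngamma\ndelta\n' | extract_pairs
```

Each record comes out on its own line, as alpha,beta and then gamma,delta, give or take the extra spaces that ORS adds after every print.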

My example may not be very explicit, but imagine that you can extract all the paragraphs after the title “Introduction”, or other specific parts of a document. It works great if your Word doc contains tables.
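As a rough sketch of the "Introduction" idea, assuming the plain-text export separates titles and paragraphs with blank lines (that depends on how the document was saved): setting RS to the empty string puts awk in paragraph mode, where each blank-line-separated block is one record. The function name after_title and the sample text are hypothetical.

```shell
#!/bin/sh
# after_title is a hypothetical helper: it prints the paragraph that
# directly follows a title standing alone in its own paragraph.
after_title() {
    awk -v title="$1" '
        BEGIN { RS = "" }            # paragraph mode: records are separated by blank lines
        found { print; exit }        # the record right after the title: print it and stop
        $0 == title { found = 1 }    # remember that the title has been seen
    '
}

# Made-up sample document.
printf 'Title page\n\nIntroduction\n\nFirst line.\nSecond line.\n\nMethods\n\nOther text.\n' |
    after_title Introduction
# prints:
# First line.
# Second line.
```

The two rules are ordered on purpose: the title record itself only sets the flag, so it is the next paragraph that gets printed.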

Here is a good reference: IBM's "Common threads: Awk by example" by Daniel Robbins. Check part 2, it is really useful.

