
Linux.com

Feature

CLI Magic: Learn to talk awk

By Keith Winston on January 16, 2006 (8:00:00 AM)


User level: Advanced

When it comes to slicing and dicing text, few tools are as powerful, or as underutilized, as awk. The name "awk" was coined from the initials of its authors, Aho, Weinberger, and Kernighan -- yes, the same Kernighan of the famous Kernighan and Ritchie "C Programming Language" book. In the Linux world, every distribution includes the GNU version, gawk (/bin/awk is usually a symbolic link to /bin/gawk). The GNU version has a few more features than the original. Let's play with some of the core features common among POSIX-compliant awks.

In this article, when I reference awk, I am really using gawk.

The awk utility is a small program that executes awk language scripts, which are often one-liners, but just as easily may be larger programs saved in a text file. For example, to execute an awk script saved in the file prg1.awk and have it process the file data1, you could use a command such as:

awk -f prg1.awk data1

The result is written to standard output, or it may be redirected to a file.

The -F parameter changes the field separator, which defaults to whitespace. The field separator can also be changed within an awk program. To tell awk how to split data into fields from a comma-separated value (CSV) file, you would use:

awk -F"," -f prg1.awk data1

You may also include more than one data file to process, and awk will keep running until it runs out of data:

awk -F"," -f prg1.awk data1 data2 data3 data4 data5

If you want to assign a value to a variable before execution of the program, use the -v option:

awk -v AMOUNT=100 -f prg1.awk data1
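As a sketch of how a pre-assigned variable can be used (the AMOUNT variable name is from the example above; the data file and fields are invented for illustration), an inline program can stand in for the script file:

```shell
# Hypothetical data: item, quantity, price per line.
printf 'widget 1 50\ngadget 2 150\n' > /tmp/data1

# AMOUNT is assigned before the program runs, so the
# pattern can compare against it on every record.
awk -v AMOUNT=100 '$3 > AMOUNT { print $1 }' /tmp/data1
# prints: gadget
```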

Behold the power

The power of awk comes from how much it does automatically for you when crunching text files, and from the simple elegance of the language. When you feed awk a text file, it does the following:

  • Opens and reads all input files listed on the command line
  • Handles memory management for all variables
  • Parses each line and splits it into fields using the field separator
  • Presents each line of text to your program as variable $0
  • Presents each field from each line in predefined variables, starting with $1, $2, ... $N
  • Maintains many internal variables for your use, such as (but not limited to):
    • RS = record separator
    • FS = field separator
    • NF = number of fields in the current record
    • NR = number of records processed so far
  • Automatically handles conversion between internal data types (string, floating point, array)
  • Executes the BEGIN block before processing any records (a good place to initialize variables)
  • Executes the END block after processing all records (a good place to calculate report totals)
  • Closes all input files listed on the command line
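A quick way to see several of these automatic behaviors at once is a one-liner like this (input invented for illustration):

```shell
# Each record is split on whitespace; NR and NF are maintained
# automatically, and $1 is the first field of the current record.
printf 'one two\nthree four five\n' | awk '{ print NR, NF, $1 }'
# prints:
# 1 2 one
# 2 3 three
```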

The awk language uses only three internal data types: strings, floating point numbers, and arrays. Variables do not have to be defined before they are used. Awk handles converting data from one type to another as necessary. If you add two strings together using the addition operator (+) and they contain numeric values, you get a numeric result. If a string is used in an arithmetic operation but can't be converted to a number, it is converted to zero. Usually, awk does what you want when handling data conversion.
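The conversion rules can be seen directly in a few BEGIN-only one-liners:

```shell
# Numeric-looking strings act as numbers with the + operator.
awk 'BEGIN { print "3" + "4" }'     # prints 7
# A string with no leading number converts to zero.
awk 'BEGIN { print "abc" + 1 }'     # prints 1
# Concatenation (juxtaposition) keeps strings as strings.
awk 'BEGIN { print "3" "4" }'       # prints 34
```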

Awk can open, read, and write to more files than those listed on the command line by using the getline function or redirecting output from within a program. It has access to a set of internal functions that include math, string manipulation, formatted printing (similar to the C language printf), and miscellaneous functions like pseudo-random numbers. You can also create your own functions or function libraries that can be used in several programs. All of this is packed into an executable usually about 500k in size.
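As one sketch of in-program output redirection (the file names here are invented for the example), a program can fan records out to multiple files:

```shell
# Route records to one of two files based on the first field.
# awk opens each output file once on first use and keeps it open.
printf 'a 1\nb 2\na 3\n' |
  awk '$1 == "a" { print > "/tmp/a_records.txt"; next }
                 { print > "/tmp/b_records.txt" }'
```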

Programmers can typically become proficient in awk within a day. Complete references are available in a single book. You don't need a "bookshelf" of dead trees and CDs to master awk.

Implementations or ports of awk are available on nearly every platform, making your scripts reasonably portable.

Awk in the real world

Here is a short example of a recent awk application I created to import a list of email addresses and names from Novell Groupwise to PHPList, a mailing list manager. The list was exported from Groupwise in vCard File format (VCF), a text-based format. Here is an example entry from the VCF file:

BEGIN:VCARD
VERSION:2.1
X-GWTYPE:USER
FN:Bar, Foo
ORG:;GREEN
EMAIL;WORK;PREF:foobar@yahoo.com
N:Bar;Foo
X-GWUSERID:foobar
X-GWADDRFMT:0
X-GWIDOMAIN:yahoo.com
X-GWTARGET:TO
END:VCARD

The target format was a CSV file that PHPList could import into an existing mailing list. I needed to extract the name from the record that starts with "FN" and the email address from the record that starts with "EMAIL."

I started construction of the script by setting up a custom record separator and a block of code to handle each record type. I saved the script in a text file called extract-emails.awk. Note that the .awk file extension is just convention; the file containing awk commands can be named anything. This was the beginning of the script:

BEGIN { FS = ":" }

/^FN/ {
# handle name here
}

/^EMAIL/ {
# handle email address here
}

The BEGIN block is run once before any records are read. It sets the field separator to a colon so awk will split the fields of the file when it encounters a colon.

The regular expressions /^FN/ and /^EMAIL/ tell awk to look for the characters "FN" or "EMAIL" at the start of a record, and if a match is found, run the associated block of code between the curly braces. This kind of regular expression match is common in awk but not required. A block of code with no match expression is run for every record processed by awk. I added a couple of comments (lines starting with "#") to document what each part of the script does.
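The difference between a guarded block and a bare block can be sketched with invented input modeled on the VCF records:

```shell
# The /^FN/ block runs only on matching records; the bare
# block runs on every record; END runs once after all input.
printf 'FN:Bar, Foo\nEMAIL:x@y.com\n' |
  awk -F":" '/^FN/ { print "name field:", $2 }
             { total++ }
             END { print total, "records" }'
# prints:
# name field: Bar, Foo
# 2 records
```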

Looking at the VCF data, I noticed that the "FN" record always precedes the "EMAIL" record, so I ordered the code blocks to process the records that way. Awk reads and executes a script in the order it appears. Many times, the order of the code will not matter, but in this case it does. The name is related to the email and I need to retain that relationship as the file is read, so I saved the name in an internal variable, then wrote both the email address and name to standard out while processing the email record.

Getting back to the task, let's complete the name section. The goal is to reformat the name from "lastname, firstname" into "firstname lastname," removing the comma. Here was my code:

/^FN/ {
# handle name here
fullname = tolower($2)
split(fullname, names, ",")
name = names[2] " " names[1]
}

Knowing that awk has split up the incoming records into fields using a colon as the field separator, the field variables for the example "FN" record contain the following:

$1 = "FN"
$2 = "Bar, Foo"

Working with the $2 variable, I used a built-in awk function, tolower(), to convert the names to lowercase and stored the result in a variable called "fullname." Next, I used the split function to break the name into first and last name parts, with the result stored in an array called "names." Finally, I glued the name back together in the desired order, without the comma, and stored that result in a variable called "name."
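The behavior of split() here is worth seeing in isolation; note that the second element keeps the leading space that followed the comma:

```shell
# split() returns the number of pieces and fills the array.
awk 'BEGIN {
  n = split("bar, foo", names, ",")
  print n                 # prints 2
  print "[" names[1] "]"  # prints [bar]
  print "[" names[2] "]"  # prints [ foo]  (leading space kept)
}'
```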

There is very little to do inside the email code block. Awk provides the email address to us in the $2 variable (note that $2 in the "EMAIL" record is different than $2 in the "FN" record). For consistency, I converted it to lowercase, then used the print function to write both the email address and name to standard out, with a comma separating the values. Here is the complete script:

BEGIN { FS = ":" }

/^FN/ {
# handle name here
fullname = tolower($2)
split(fullname, names, ",")
name = names[2] " " names[1]
}

/^EMAIL/ {
# handle email address here
mail = tolower($2)
print mail "," name
}

A sprinkle of shell glue

To pull it all together, we need a little shell glue. A small shell script allows us to call awk with the parameters we want and to easily redirect the output to a file. It is also handy to run a shell script when you are testing.

#!/bin/sh
# Extract e-mail addresses from VCF file for PHPList.
awk -f extract-emails.awk groupwise.vcf > phplist-emails.txt

Awk can be used as an intermediate step in a larger shell script where the output is fed into another utility, such as sort, grep, or another awk script.
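For instance (sample addresses invented), output like the extractor's could be sorted and de-duplicated in one pipeline:

```shell
# awk pulls out the first CSV field; sort -u orders the
# addresses and drops duplicates.
printf 'carol@x.com,carol\nalice@x.com,alice\ncarol@x.com,carol\n' |
  awk -F"," '{ print $1 }' | sort -u
# prints:
# alice@x.com
# carol@x.com
```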

Finally, here is a sample of the output:

foobar@yahoo.com, foo bar
barbaz@yahoo.com, bar baz

Where awk falls short

There are certain tasks that are beyond the capabilities of awk. For instance, if you need to do anything that communicates using network sockets, awk is not your best bet. The same is true if you need to process binary files. The latest version of GNU awk does have some rudimentary network capabilities, but Perl, PHP, and Ruby are much better equipped for those tasks.

Awk is an expert tool for text processing, and its influence on Perl's design is clear. It is powerful enough to handle almost any kind of text crunching or reporting, while being easy to learn and use.

There are many choices when it comes to scripting languages, but I find awk the best choice for many problems. Although awk is employed most often for smaller problems, it can be used for large applications. I have worked on a 12,000-line awk application used to adjudicate dental claims. This application was the core system for a successful million dollar business. If you take the time to learn awk, the rewards will last a lifetime.


Comments

on CLI Magic: Learn to talk awk

Note: Comments are owned by the poster. We are not responsible for their content.

AWK is old: use Python today!

Posted by: Anonymous Coward on January 16, 2006 07:02 PM
I have used AWK myself in the early nighties for
purposes comparable to those outlined in the
article.

But today there are better scripting languages
available! Python can do all, what you can do
with AWK (sometimes with a little more typing).

I have to maintain scripts and programs
written in AWK (some over a decade ago).

Believe me: the improved readability of programs
written in a real programming language like
Python will pay off.

Stay away from the dark side of mixed Shell, AWK and Perl programming:
http://www.python.org/doc/Humor.html#yoda

#

Old &amp; in use = reliable

Posted by: Anonymous Coward on January 17, 2006 04:30 AM

There's nothing wrong with using awk. Where it's appropriate.

I've found that awk is stunningly fast (faster to complete a task than some "high performance" proprietary reporting tools are to initialize), and its small size (85K for mawk, 793K for gawk) makes it a very useful tool which can be made available even in tight quarters. While it's got a relatively small feature set, it's a Turing-complete language which you can pretty much wrap your head around. The saw that Perl is a nice operating system, but what it lacks is a lightweight scripting language, has some truth to it.

If Python, Perl, Ruby, OCAML, or whatever, floats your boat and suits your needs, go for it, but no need to trash awk simply for its age.

#

Re:AWK is old: use Python today!

Posted by: Anonymous Coward on January 19, 2006 05:54 PM
AWK is NOT designed for complex programs with thousands and thousands of lines, and its handling of output files is clumsy. However, there are lots of tasks which require iterating over the lines of a file and only a few dozen or hundred lines of code, where the many things awk does automatically (like reading the next line and detecting the end of file) make it easier and faster to write Awk than Python.

Also, I lost a lot of my initial enthusiasm for Python when it got a case of memory diarrhea with Python 2. Consider that the relatively simple Redhat installer (when you compare it with Suse's or Mandrake's) will not work, or barely works, on 128 megs even in text mode, and it is the slowest Linux installer on the market.

#

Early "nighties"?

Posted by: Anonymous Coward on February 01, 2006 02:13 AM
I prefer pajamas to nighties, myself.

#

A very nice article

Posted by: Anonymous Coward on January 16, 2006 08:47 PM
This is a very nice article. I use awk every now and then (one-line commands) to perform simple text manipulation, but never learned the real language, because honestly I thought it would be hard and need time! This article demonstrated how simple the language is and encouraged me to go for it very soon.

#

Re:A very nice article

Posted by: Anonymous Coward on January 17, 2006 11:48 AM
I think there comes a point where awk really is a hard language. Hard to maintain but also hard in the sense that you have to do some pretty arcane stuff to get it to do certain things. I studied a book on awk, and you are right - it really would take time and effort to learn all of its features. Time and effort that may or may not pay off.

awk has a different logic to it that other procedural languages do not have (it's a bit like a cross between C and sed). The article mentions all the work that awk does for you. This is all related to the logic it uses. awk is designed to read a file which is somehow structured as records of delimited fields. awk then gives you the ability to conditionally execute different blocks of code on every record. Therefore awk is worth considering for any situation that fits that pattern. Very often awk will do the job nicely. Sometimes it just gets too `awk-ward'.

Where awk excels is simple stuff like `ls -l | awk '{print $5,$9}'`. awk is also really useful inside shell scripts where a half dozen lines of indented awk code can be mixed in with shell code and sed code to great effect. I would typically use awk many times a day from the command line or in simple scripts, and 90% of it is simple one-liners. So I guess what I am saying is if you use awk at all, you are probably getting excellent value from it right now. In my case, it took me a year before I really understood what awk was doing at all (new to Unix), and since that time I have gradually extended my use of it one tiny feature at a time.

grep, sed and awk are now my bread and butter. I am glad they are there.

#

Re:Get a list of your interface addresses

Posted by: Anonymous Coward on January 16, 2006 10:12 PM
awk -F'[ :]+' '$1 {iface=$1} iface {print iface, "=", $4;iface=""}'

#

Correction

Posted by: Anonymous Coward on January 17, 2006 12:35 PM
Your code didn't work on my machine.

This code did.. /sbin/ifconfig | awk -F'[ :]+' '$1 {iface=$1} /inet/ {print iface, "=", $4;iface=""}'

(The second test `iface' changed to `/inet/' )

#

Correction to Correction

Posted by: Anonymous Coward on February 01, 2006 02:03 AM
Your version of ifconfig is different, or else you are using awk instead of gawk.

But anyway if you are going to trigger off /inet/ you don't need to clear the temp variable (iface=""), so do this:

ifconfig|awk -F'[ :]+' '$1 {iface=$1};/inet/ {printf "%s:\t%s\n",iface,$4}'

see?

#

Re:Get a list of your interface addresses

Posted by: Anonymous Coward on January 17, 2006 12:01 AM
Well, it's pretty easy in Sed:

ifconfig | sed -n '/HWaddr/{N;s/^\([^ ]*\).*inet addr:\([^ ]*\).*/\1: \2/p}'

Or if you want your loopback too, you can use

ifconfig | sed -n '/Link encap/{N;s/^\([^ ]*\).*inet addr:\([^ ]*\).*/\1: \2/p}'

These can be shortened a bit as "Link encap" can be just "Link" (or even "Li") and you don't really need the whole "inet addr:" bit, when "addr:" or even "r:" would do.

Making a final sedlet:

ifconfig | sed -n '/HW/{N;s/^\([^ ]*\).*r:\([^ ]*\).*/\1: \2/p}'

or

ifconfig | sed -n '/Li/{N;s/^\([^ ]*\).*r:\([^ ]*\).*/\1: \2/p}'

Hurray for sed! :)

-gumnos

#

or this..

Posted by: Anonymous Coward on January 17, 2006 12:49 PM
/sbin/ifconfig | sed -n '/Link encap/{s/ .*//;h};/inet addr:/{s/.*addr://;s/ .*//;x;G;s/\n/ = /;p}'

formatted..

/sbin/ifconfig |
  sed -n '
    /Link encap/  {s/ .*//;h}
    /inet addr:/  {
      s/.*addr://
      s/ .*//
      x
      G
      s/\n/ = /
      p
    }
  '

tightly packed:

/sbin/ifconfig | sed -n '/^[^ ]/{s/ .*//;h};/dr:/{s/[^:]*://;s/ .*//;x;G;s/\n/ = /;p}'

#


Re:Get a list of your interface addresses

Posted by: Administrator on January 17, 2006 02:58 PM
Thanks - this is great !

#

Re:Get a list of your interface addresses

Posted by: Anonymous Coward on January 17, 2006 11:11 AM

I like this version:

In a shell script I would write it like this:

/sbin/ifconfig |
   awk  '
     /^[! ]/     {iface=$1}
     /inet addr/ {
        split($2,addr,":")
        print iface" = "addr[2]
      }
   '

As a one-liner it looks like this:

/sbin/ifconfig | awk '/^[! ]/{iface=$1} /inet addr/{split($2,addr,":");print iface" = "addr[2]}'

As a previous poster pointed out, don't forget `sed' when doing simple stuff. (Just don't expect to be able to maintain the code if it gets complicated)

#

Correction!

Posted by: Anonymous Coward on January 17, 2006 12:23 PM
change


  awk '/^[! ]/...'


  to


  awk '/^[^ ]/...'

#

Re:Get a list of your interface addresses

Posted by: Anonymous Coward on January 17, 2006 12:17 PM
Your example illustrates one of the `problems' with awk.

Because awk by design runs its entire program on every input line (more exactly, each record) separately, it can get `awk-ward' when you are trying to communicate information between input lines.

This is especially true when the `entity' you are interested in spans multiple input records. In these situations you are often faced with horrible warts where you need to initialise stuff in a BEGIN block, tidy up the last entity in the END block, use flags (like `change' in your example) to track state, and you also need to be very, very careful when planning the order that your blocks take, and the interaction between those blocks.

People who use ordinary structured procedural languages are used to initialising, tidying up and tracking state, but for me the beauty and power of awk is that most of the time NONE of that is neccessary. One-liners are 90% of what I would use awk for so I feel really uncomfortable when that one line suddenly bloats out with all this other stuff.

I am not saying that there is a better way for awk to handle this sort of situation, and nor am I saying that any other language handles it with more ease. Certainly I don't believe any language that handles it better than awk can compete with the simple awk one-liner. I just think that there is a point at which awk is easier, and another point at which it suddenly ceases to be easier.

#

record and field separators are redefinable

Posted by: Anonymous Coward on February 01, 2006 12:24 AM
Once you learn to use regexes for field and record separators, you can do many tasks with awk one-liners that would require far more processing power and programming skill using *any* other language.

As a (nearly always correct) rule of thumb: if there is only one input file and one output stream, use awk or sed. If there are multiple non-sequential inputs and/or outputs use something else (perl or python for rapid development, or a compiled language for fast efficient execution)

#

Python is old: use Ruby today!

Posted by: Anonymous Coward on January 17, 2006 01:00 PM
Just teasing. Python is certainly good enough. Ruby is just a little better thought-out :)

But for many tasks you really do want a *targeted* language, i.e., a DSL, instead of the most all-around-powerful one, because its idioms and abstractions are designed for what you actually want to do, and AWK is quite the slickness for text-munging.

#

Re:Get a list of your interface addresses

Posted by: Anonymous Coward on February 01, 2006 12:12 AM
Using a BEGIN rule to define a regex field separator

ifconfig | awk 'BEGIN{FS="[ :]+"};{if ($3=="addr"){printf "%s:\t%s\n",Sv,$4}else{Sv=$1}}'

Mad props to Arnold Robbins, GNU awk maintainer! gawk is quite a bit more useful than real awk, and recent versions include socket IO in a simple easy syntax.

The above is not necessarily the easiest or most concise way to solve the problem, it's off the top of my head right now.

--Charlie

#

shorter versions

Posted by: Anonymous Coward on February 01, 2006 02:11 AM
Slightly more concise if you specify the field separator as a command switch instead of with a BEGIN rule.

ifconfig | awk -F'[ :]+' '{if ($3=="addr"){printf "%s:\t%s\n",Sv,$4}else{Sv=$1}}'

instead of

ifconfig | awk 'BEGIN{FS="[ :]+"};{if ($3=="addr"){printf "%s:\t%s\n",Sv,$4}else{Sv=$1}}'

You could also lose the if, by using a next:

ifconfig | awk -F'[ :]+' '/inet/{printf "%s:\t%s\n",Sv,$4;next};{Sv=$1}'

If you crunch it down any further than that you'll just end up with something unreadable to all but the illuminati. Like the average perl program! ;)

#


Network in shell

Posted by: Anonymous Coward on February 23, 2006 07:34 PM
AWK doesn't have any network capabilities. And it doesn't have to have them.

Try with bash something like that:

$ awk "...whatever..." /dev/tcp/www.intel.com/80

I routinely use shell as dumb curl/wget replacement. It's true that Perl normally ends up being the receiving side, but awk/sed are often used there too.

Also, on systems lacking bash with all the bells'n'whistles, I find the netcat utility useful: it's the good ol' cat, but instead of files it works with sockets. With netcat you can turn any non-network-capable shell tool into quite a network-capable one.

Something like that:

$ ( echo -e 'GET / HTTP/1.0\r\n\r\n'; \

    while read &2 $AAA; done \

    ) | netcat www.intel.com 80

After all, it's Unix we are talking about ;-)

#

Re:Network in shell

Posted by: Anonymous Coward on February 23, 2006 07:37 PM
Oops. It seems people on linux.com do not know how to use <xmp> & <pre> tags. The &lt; aka < was eaten in the example above...

What a shame for a linux site to be *NOT* shell-proof :-/

#


Re:awk - still fastest for simple stuff

Posted by: Administrator on January 26, 2006 12:44 PM
I am presently working as a software testing consultant for a large financial services company. We use Perl for quite a bit of our financial analysis ad hoc tools, and even more of our production tools are written in Perl, though our main applications are written in C++ or Java. Nevertheless, when it comes to a quick extraction of a few fields from a large data feed with well defined fields, I have found Awk to be a useful tool, even though some other responders feel that Python, Ruby, or Perl are more modern tools.

One example of where I have used Awk to quickly produce results is when I want to get a list of securities from one of the market exchanges. Several of our vendors provide online Web sites, where we can download complete lists of information about securities. From those lists, I want to get just the market symbols and the exchanges where they are traded. From there, I want to create a four part subject in various internal formats.

Here's a modified version of the kind of stuff I do, modified to protect the identity of the institution's information:

awk -f symbolExtract.awk ExchangeList > fourPartSubject

where symbolExtract.awk is my short Awk script and ExchangeList is the text file extracted from the vendor's Web site.

Here's what such a script looks like:

FS=|
print {"vendor.record.$1.$3"}

That's it. The script can go through twenty or thirty thousand records within a second or two and create a nicely formatted subject file, in which I can then easily change fields one, two, or four before running it through various data feeds. I can change the fields with Awk, too, or I can change them in a Vi editor, the Sed stream editor, Emacs, or any other convenient tool. Speed, ease of change, and flexibility are all there, each of which is VERY IMPORTANT in our fast moving business.

#

Re:awk - still fastest for simple stuff

Posted by: Administrator on January 26, 2006 01:04 PM
The only problem with the script I posted, of course, is that it doesn't work as posted! The FS argument is wrong and the print statement is also syntactically wrong! The right script is two lines, though. I'll leave it as an exercise for the inquisitive to fix it...

#

Get a list of your interface addresses

Posted by: Administrator on January 16, 2006 07:23 PM
I played around, and got this so far:
/sbin/ifconfig | awk '/HWaddr/ { change = 1; interface = $1 } /inet /{ change = 2; iface = $2; split( iface, ifacedet, ":" ); ifaceadr = ifacedet[2];  } { if( change == 2 ) print interface" = "ifaceadr; change = 0; }'
Output (my machine):
eth0 = 10.203.62.82
eth0 = 127.0.0.1
vmnet1 = 192.168.34.1
vmnet8 = 192.168.216.1
This is 100% correct, but I'm wondering if there's a shorter way - using awk?

#

awk - still fastest for simple stuff

Posted by: Administrator on January 17, 2006 04:45 PM
As soon as I need something more than a few lines long, I use perl or ruby, but for simple processing of data files (e.g. columns of numbers), awk is great:

awk '{print 100.0*$2/$3 }' datafile

Or, how about a little script (summ) to sum the Nth column:

awk '{summ+=$N}{print summ}' N=$1 $2

which is run with e.g. summ 3 datafile.

#

CLI Magic: Learn to talk awk

Posted by: Anonymous [ip: 66.99.50.71] on September 14, 2007 05:22 PM
html rulz yay i just cuztomized my myspace profile awesum kewl lol brb wtf lol omg stfu

#
