awk with examples


awk is a widely used Unix utility for processing and manipulating text data. Common tasks where you can use awk include summing all values in column 3 of a CSV file or printing only the 1st, 3rd and 5th columns of a file.
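As a quick taste, here is a minimal sketch of both tasks on made-up, whitespace-separated input. Whitespace is awk's default field separator; for a CSV file you would add -F, as shown later.

```shell
# Print only the 1st, 3rd and 5th columns (fields are split on whitespace by default)
printf 'a b c d e\nf g h i j\n' | awk '{print $1, $3, $5}'
# a c e
# f h j

# Add up every value in column 3 and print the total once all lines are read
printf '1 2 3\n4 5 6\n' | awk '{sum += $3} END {print sum}'
# 9
```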

Syntax:

$ awk '/search pattern1/ {Actions}
     /search pattern2/ {Actions}' file

or

$ cat file | awk '/search pattern1/ {Actions} 
   /search pattern2/ {Actions}'

 

Example:

input file (abc.csv)
======================
id,name,city,salary
1,Mike,NYC,20000
2,John,Chicago,40000
3,Jerry,Orlando,50000
4,Jaime,San Francisco,100000
5,Bill,Seattle,70000
6,Sam,Chicago,80000
===========================

$ awk '/Mike/' abc.csv
1,Mike,NYC,20000

$ awk '/Mike/ 
 /Seattle/' abc.csv
1,Mike,NYC,20000
5,Bill,Seattle,70000

The above can easily be done with egrep, but as the following examples show, awk is a much more powerful utility.
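Where awk goes beyond grep is that each pattern can carry an action. As a sketch, the commands below recreate abc.csv and, for the lines matching /Chicago/, print only the name ($2) and salary ($4) fields, using -F, (explained further below) to split on commas:

```shell
# Recreate the sample file used throughout this post
cat > abc.csv <<'EOF'
id,name,city,salary
1,Mike,NYC,20000
2,John,Chicago,40000
3,Jerry,Orlando,50000
4,Jaime,San Francisco,100000
5,Bill,Seattle,70000
6,Sam,Chicago,80000
EOF

# The pattern selects the lines; the action prints just two fields of each
awk -F, '/Chicago/ {print $2, $4}' abc.csv
# John 40000
# Sam 80000
```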

$ awk 'BEGIN { Actions }    # runs once, before any input is read
       { Actions }          # runs for every line in the file
       END { Actions }' file

# starts a comment in awk; the comment runs to the end of the line.
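To see the order in which the three blocks run, here is a minimal sketch with two made-up input lines:

```shell
# BEGIN runs before any input is read, the middle block runs once per line,
# and END runs after the last line has been processed
printf 'first\nsecond\n' | awk '
  BEGIN { print "start" }
        { print "line " NR ": " $0 }
  END   { print "done, " NR " lines read" }'
# start
# line 1: first
# line 2: second
# done, 2 lines read
```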

Now, to find the average salary of the people in the file above and print it nicely, we can do the following:

$ awk -F, 'BEGIN {print "Salary"} NR>1 {sum=sum+$4} END {print sum/(NR-1)}' abc.csv

Salary
60000

NR>1 skips the header line, and dividing by NR-1 keeps the header out of the count as well.

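-F: sets the field separator (a comma here). Each field is then available as $1, $2, $3, …, and $0 represents the whole line.

BEGIN: block executed once, before the first line of abc.csv is read.

sum: a variable to which we add the salary ($4) as each line is processed. awk variables need no declaration and start out as 0 (or the empty string).

END: block executed after the last line has been processed.

NR: number of records (lines) read so far; inside END it is the total number of lines read.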

NF: Represents the number of fields in the line

$NF: Represents the last field in the line

$ cat abc.csv | awk -F, '{print NF "  " $NF}'
4  salary
4  20000
4  40000
4  50000
4  100000
4  70000
4  80000

In the above example, NF prints the number of fields in each line and $NF shows the value of the last field.
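$NF is simply the $ operator applied to the value of NF, so arithmetic works inside the parentheses. A minimal sketch on made-up input that prints the line number (NR) and the second-to-last field; the parentheses matter, because $NF-1 would mean "last field minus 1":

```shell
# $(NF-1) is the second-to-last field; NR is the current line number
printf 'a,b,c\nd,e,f,g\n' | awk -F, '{print NR ": " $(NF-1)}'
# 1: b
# 2: f
```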

 


How to define a variable?

$ echo |awk '{s=4;print s}'
4

In the above example, s is a variable; it does not need to be declared before it is assigned.
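A slightly larger sketch: variables can hold numbers or strings, strings are concatenated simply by placing them next to each other, and a variable that was never assigned (c here) behaves as 0 in arithmetic:

```shell
echo | awk '{a = 3; b = "x"; print a * 2, b "y", c + 1}'
# 6 xy 1
```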

 


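if statements:

Print lines where the salary is > 39000:

$ cat abc.csv | awk -F, 'NR>1 {if ($4>39000) {print $0}}'
2,John,Chicago,40000
3,Jerry,Orlando,50000
4,Jaime,San Francisco,100000
5,Bill,Seattle,70000
6,Sam,Chicago,80000

NR>1 skips the header line. Without it the header is printed too: "salary" is not a number, so awk compares it with 39000 as a string, and "salary" sorts after "39000".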
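An if statement can also take an else branch. A minimal sketch on two made-up name,salary lines (the 60000 threshold is arbitrary):

```shell
# Label each person as "high" or "low" depending on the salary in field 2
printf '%s\n' 'Mike,20000' 'Jaime,100000' |
  awk -F, '{ if ($2 >= 60000) { print $1, "high" } else { print $1, "low" } }'
# Mike low
# Jaime high
```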

 


How to pass shell variables / environment variables?

$ variable=test
$ echo | awk -v var="$variable" '{print var}'
test

Pass a variable with -v name=value. Remember that awk variables don't start with a $ sign; in awk, $ refers to a field.

How about multiple variables?

$ echo | awk -v var="$variable" -v var2="g" '{print var "  " var2}'
test  g
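Environment variables do not need -v at all: awk exposes them through the built-in ENVIRON array. A minimal sketch, where GREETING is just an example name:

```shell
# Exported variables are visible to awk as ENVIRON["NAME"]
export GREETING=hello
awk 'BEGIN { print ENVIRON["GREETING"] }'
# hello
```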

 


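Regex in your search patterns:

The pattern below matches lines that contain "ea", contain "Sam", or start with "2" (| means OR, and ^ anchors the match to the start of the line):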

$ cat abc.csv|awk '/ea|Sam|^2/'
2,John,Chicago,40000
5,Bill,Seattle,70000
6,Sam,Chicago,80000
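A regex can also be tested against a single field with the ~ operator (and !~ for "does not match"). A minimal sketch on three lines taken from abc.csv, printing the names of people whose city ($3) starts with "S":

```shell
# $3 ~ /^S/ is true when the third field begins with S
printf '%s\n' '4,Jaime,San Francisco,100000' '5,Bill,Seattle,70000' '6,Sam,Chicago,80000' |
  awk -F, '$3 ~ /^S/ {print $2}'
# Jaime
# Bill
```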

 


How to run an awk script?

$ awk -f script.awk input_file

# script.awk is the awk script
# input_file is the file with the data

 


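How to write an awk script in a file?

#!/usr/bin/awk -f
BEGIN { var2 = "this is a variable" }
{ print var2 }

How to run it? Make the file executable first (the path in the #! line must point to your awk binary):

$ chmod +x script.awk
$ echo | ./script.awk
this is a variable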
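Scripts run with -f can do everything the one-liners do. As a sketch, the hypothetical script avg.awk below averages the 4th column while skipping a header line; setting FS in BEGIN has the same effect as passing -F, on the command line. The input is the first three lines of abc.csv.

```shell
# Write the (hypothetical) script file avg.awk
cat > avg.awk <<'EOF'
BEGIN  { FS = "," }              # split fields on commas, like -F,
NR > 1 { sum += $4; n++ }        # skip the header, accumulate salaries
END    { if (n > 0) print sum / n }
EOF

printf '%s\n' 'id,name,city,salary' '1,Mike,NYC,20000' '2,John,Chicago,40000' |
  awk -f avg.awk
# 30000
```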

 


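Calculate the length (number of characters) of each line?

$ cat abc.csv | awk '{print length($0)}'
19
16
20
21
28
20
19

Use $0 (the whole line) rather than $1: without -F, awk splits fields on whitespace, so on the "San Francisco" line $1 is only "4,Jaime,San" (11 characters).

How about printing only the lines that are at least 20 characters long?

$ cat abc.csv | awk '{ if (length($0) >= 20) {print $0}}'
2,John,Chicago,40000
3,Jerry,Orlando,50000
4,Jaime,San Francisco,100000
5,Bill,Seattle,70000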
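Combined with -F, length() can also measure a single column; spaces inside a field are counted. A minimal sketch on two lines from abc.csv, printing each city and its length:

```shell
# length($3) counts the characters of the third comma-separated field
printf '%s\n' '1,Mike,NYC,20000' '4,Jaime,San Francisco,100000' |
  awk -F, '{print $3, length($3)}'
# NYC 3
# San Francisco 13
```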

 


Math operators in awk?

+   addition operator
-   subtraction operator
/   division operator
*   multiplication operator
%   modulus operator
^   exponentiation operator

Example (exponentiation operator):

$ echo "5" |awk '{print 2^$1}'
32
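Note that / always performs floating-point division in awk; use the built-in int() to truncate the result. A minimal sketch exercising a few of the operators above:

```shell
# modulus, floating-point division, truncated division, exponentiation
awk 'BEGIN { print 17 % 5, 17 / 5, int(17 / 5), 2 ^ 10 }'
# 2 3.4 3 1024
```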

 


Loops in awk?

for loop example:

$ echo |awk '{for (i=0; i<10; i++) print i}'
0
1
2
3
4
5
6
7
8
9

Similarly, you can write a while loop.
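A minimal sketch of a while loop, plus a do-while loop, whose body always runs at least once even when the condition is false from the start:

```shell
# while: the condition is checked before each iteration
awk 'BEGIN { i = 0; while (i < 3) { print i; i++ } }'
# 0
# 1
# 2

# do-while: the body runs once before the condition (j < 3) is checked
awk 'BEGIN { j = 5; do { print "ran once with j=" j; j++ } while (j < 3) }'
# ran once with j=5
```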

 


awk arrays (actually associative arrays)

$ awk 'BEGIN {for (i=0; i<10; i++)  x[i]=i*i
print x[4]
}'
16

You assign a value to an array element with arr[key]=value; keys can be numbers or strings. You cannot print a whole array in a single operation; instead, iterate over its keys with for (key in arr).

You can delete any element with delete arr[key].
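A minimal sketch that counts people per city (made-up name,city lines), deletes one key, iterates over the rest with for (key in arr) and checks membership with (key in arr), which does not create the element. When several keys remain, the iteration order is unspecified, so pipe the output through sort if you need a stable order.

```shell
printf '%s\n' 'Mike,NYC' 'John,Chicago' 'Sam,Chicago' |
  awk -F, '
    { count[$2]++ }                          # key = city, value = count
    END {
      delete count["NYC"]                    # remove one element
      for (city in count)                    # visit every remaining key
        print city, count[city]
      if (!("NYC" in count)) print "NYC deleted"
    }'
# Chicago 2
# NYC deleted
```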
