awk with examples
awk is a widely used Unix utility for manipulating text data. Common uses include summing all the values in column 3 of a CSV file, or printing only the 1st, 3rd and 5th columns of a CSV file.
Syntax:
$ awk '/search pattern1/ {Actions}
/search pattern2/ {Actions}' file
or
$ cat file | awk '/search pattern1/ {Actions}
/search pattern2/ {Actions}'
Example:
input file (abc.csv)
======================
id,name,city,salary
1,Mike,NYC,20000
2,John,Chicago,40000
3,Jerry,Orlando,50000
4,Jaime,San Francisco,100000
5,Bill,Seattle,70000
6,Sam,Chicago,80000
===========================
$ awk '/Mike/' abc.csv
1,Mike,NYC,20000
$ awk '/Mike/
/Seattle/' abc.csv
1,Mike,NYC,20000
5,Bill,Seattle,70000
The examples above could just as easily be done with egrep, but awk is a far more powerful utility. A full awk program can also have BEGIN and END blocks:
$ awk 'BEGIN {Actions}
{Actions} # Action for every line in the file
END {Actions}' file
# starts a comment in awk
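Now, to find the average salary of the people in the file above and print it with a heading, we can do the following. The NR>1 pattern skips the header line, and dividing by NR-1 keeps the header out of the count:
$ awk -F, 'BEGIN {print "Salary"} NR>1 {sum=sum+$4} END {print sum/(NR-1)}' abc.csv
Salary
60000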
-F: field separator (here a comma). Each field is available as $1, $2, $3…, and $0 represents the whole line
BEGIN: block executed once, before the first line of abc.csv is read
sum: a variable to which the salary is added as each line is processed
END: block executed once, after the last line has been processed
NR: number of records (lines) read so far; in the END block it is the total
NF: Represents the number of fields in the line
$NF: Represents the last field in the line
$ cat abc.csv | awk -F, '{print NF " " $NF}'
4 salary
4 20000
4 40000
4 50000
4 100000
4 70000
4 80000
In the example above, NF prints the number of fields in each line and $NF prints the value of the last field.
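The same field variables can be used to pick out individual columns. The following is a minimal sketch, assuming a shortened copy of the sample rows held in a shell variable (the names data and columns are only illustrative):

```shell
# Shortened copy of the abc.csv sample rows, kept in a variable so the block runs on its own.
data='id,name,city,salary
1,Mike,NYC,20000
2,John,Chicago,40000'

# Print only the 2nd and 4th fields; the comma in print inserts a space (the default OFS).
columns=$(printf '%s\n' "$data" | awk -F, '{print $2, $4}')
echo "$columns"
```

Against the real file, the equivalent command would be awk -F, '{print $2, $4}' abc.csv.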
How to define a variable?
$ echo |awk '{s=4;print s}'
4
In the example above, “s” is a variable. Unlike shell variables, awk variables are written without a $ sign in front of them.
if statements:
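Print lines where the salary is greater than 39000. The header is printed too: its fourth field, "salary", is not a number, so it is compared as a string and sorts after "39000".
$ cat abc.csv | awk -F, '{if ($4>39000) {print $0}}'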
id,name,city,salary
2,John,Chicago,40000
3,Jerry,Orlando,50000
4,Jaime,San Francisco,100000
5,Bill,Seattle,70000
6,Sam,Chicago,80000
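The same filter can also be written as a bare pattern, without an explicit if. The sketch below is one possible variant, not the only way to do it: it additionally skips the header with NR > 1, and it uses a shortened copy of the sample rows (the variable names data and high are illustrative).

```shell
# Shortened copy of the abc.csv sample rows, kept in a variable so the block runs on its own.
data='id,name,city,salary
1,Mike,NYC,20000
2,John,Chicago,40000
3,Jerry,Orlando,50000'

# A pattern with no action prints every line it matches.
# NR > 1 skips the header, so $4 is always compared as a number.
high=$(printf '%s\n' "$data" | awk -F, 'NR > 1 && $4 > 39000')
echo "$high"
```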
How to pass shell variables/ environment variables?
$ variable=test
$ echo | awk -v var="$variable" '{print var}'
test
Declare an awk variable with -v, and quote the shell variable so that values containing spaces are passed intact. Remember that awk variables don’t have a $ sign in front of them.
How about multiple variables?
$ echo | awk -v var="$variable" -v var2="g" '{print var " " var2}'
test g
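A variable passed with -v can also drive a filter. The sketch below assumes a hypothetical shell variable min_salary and a shortened copy of the sample rows. It also shows the built-in ENVIRON array as an alternative way to read an environment variable; AWK_DEMO_CITY is a made-up name used only for this example.

```shell
# Shortened copy of the abc.csv sample rows, kept in a variable so the block runs on its own.
data='id,name,city,salary
1,Mike,NYC,20000
4,Jaime,San Francisco,100000
5,Bill,Seattle,70000'

min_salary=60000   # hypothetical shell variable holding the threshold

# -v copies the shell value into the awk variable min before the program starts.
names=$(printf '%s\n' "$data" | awk -F, -v min="$min_salary" 'NR > 1 && $4 >= min {print $2}')
echo "$names"

# Environment variables can also be read through the built-in ENVIRON array.
city=$(AWK_DEMO_CITY=Chicago awk 'BEGIN {print ENVIRON["AWK_DEMO_CITY"]}')
echo "$city"
```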
Regex in your search patterns:
$ cat abc.csv | awk '/ea|Sam|^2/'
2,John,Chicago,40000
5,Bill,Seattle,70000
6,Sam,Chicago,80000
How to run awk script?
$ awk -f script.awk input_file
# script.awk is the awk script
# input_file is the file with the data
How to write an awk script in a file?
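#!/usr/bin/awk -f
BEGIN {var2="this is a variable"} # set the variable once, before any input is read
{print var2} # print it for every input line
run? (make the script executable first with chmod +x script.awk)
$ echo | ./script.awk
this is a variable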
Calculate the length (number of characters) of each line? Use $0, the whole line; $1 would stop at the first space.
$ cat abc.csv | awk '{print length($0)}'
19
16
20
21
28
20
19
How about printing only the lines that are at least 20 characters long?
$ cat abc.csv | awk '{ if (length($0) >= 20) {print $0}}'
2,John,Chicago,40000
3,Jerry,Orlando,50000
4,Jaime,San Francisco,100000
5,Bill,Seattle,70000
Math operators in awk?
+ addition operator
- subtraction operator
/ division operator
* multiplication operator
% modulus operator
^ exponentiation operator
Example (exponentiation operator):
$ echo "5" |awk '{print 2^$1}'
32
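The remaining operators work the same way. A minimal sketch, using two arbitrary input numbers (17 and 5) chosen only for illustration:

```shell
# Apply +, -, *, % and / to the two input fields and print the results on one line.
result=$(echo "17 5" | awk '{print $1 + $2, $1 - $2, $1 * $2, $1 % $2, $1 / $2}')
echo "$result"
```

Note that division is floating point, so 17 / 5 gives 3.4 rather than 3.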
Loops in awk?
for loop example:
$ echo |awk '{for (i=0; i<10; i++) print i}'
0
1
2
3
4
5
6
7
8
9
Similarly, you can have a while loop.
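As an illustration of a while loop, the sketch below walks over the fields of one line, using NF as the upper bound. The input is the header line of the sample file and the variable name fields is just an example.

```shell
# Print each field of the line together with its position.
fields=$(echo "id,name,city,salary" | awk -F, '{
    i = 1
    while (i <= NF) {        # NF is the number of fields on this line
        print i ": " $i
        i++
    }
}')
echo "$fields"
```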
awk arrays (actually associative arrays)
$ awk 'BEGIN {for (i=0; i<10; i++) x[i]=i*i
print x[4]
}'
16
You can assign a value to an array element with arr[key]=value. An array cannot be printed in a single operation; you need to iterate over its keys, for example with for (key in arr).
You can delete any element with delete arr[key].
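Putting these pieces together, the sketch below sums the salaries per city in an associative array, deletes one key, and then iterates over the remaining keys. It assumes a shortened copy of the sample rows; the names data, total and totals are illustrative. Because the order of for (key in arr) is unspecified, the output is piped through sort to make it predictable.

```shell
# Shortened copy of the abc.csv sample rows, kept in a variable so the block runs on its own.
data='id,name,city,salary
1,Mike,NYC,20000
2,John,Chicago,40000
5,Bill,Seattle,70000
6,Sam,Chicago,80000'

totals=$(printf '%s\n' "$data" | awk -F, '
NR > 1 { total[$3] += $4 }      # key: city, value: running salary sum
END {
    delete total["NYC"]         # remove one element by its key
    for (city in total)         # visit the remaining keys in unspecified order
        print city "," total[city]
}' | sort)
echo "$totals"
```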