awk with examples
awk is a widely used Unix utility for processing text data. Typical uses include summing all the values in column 3 of a CSV file, or printing only the 1st, 3rd, and 5th columns of a CSV file.
Syntax:
$ awk '/search pattern1/ {Actions} /search pattern2/ {Actions}' file
or
$ cat file | awk '/search pattern1/ {Actions} /search pattern2/ {Actions}'
Example:
Input file (abc.csv):
======================
id,name,city,salary
1,Mike,NYC,20000
2,John,Chicago,40000
3,Jerry,Orlando,50000
4,Jaime,San Francisco,100000
5,Bill,Seattle,70000
6,Sam,Chicago,80000
===========================
$ awk '/Mike/' abc.csv
1,Mike,NYC,20000
$ awk '/Mike/ || /Seattle/' abc.csv
1,Mike,NYC,20000
5,Bill,Seattle,70000
The above can easily be done with egrep (grep -E), but we will see that awk is a far more powerful utility.
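An awk program can also have BEGIN and END blocks:
$ awk 'BEGIN { Actions }   # runs once, before any input is read
{ Actions }                # runs for every line in the file
END { Actions }            # runs once, after the last line is read
' file
Everything after a # is a comment in awk.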
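Now, to find the average salary of the people in the file above and print it nicely, we can do the following:
$ awk -F, 'BEGIN {print "Salary"} NR>1 {sum=sum+$4} END {print sum/(NR-1)}' abc.csv
Salary
60000
NR>1 skips the header line, and dividing by NR-1 keeps the header out of the count. Dividing by NR instead would count 7 lines and give 51428.6.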
-F: Field delimiter. Each field is available as $1, $2, $3…, and $0 represents the whole line.
BEGIN: Block executed once, before the first line of abc.csv is read.
sum: A variable. The salary is added to it as each line is processed.
END: Block executed once, after the last line has been processed.
NR: Number of records (lines) read so far. In the END block it is the total.
NF: Number of fields in the current line.
$NF: The last field in the current line.
$ cat abc.csv | awk -F, '{print NF " " $NF}'
4 salary
4 20000
4 40000
4 50000
4 100000
4 70000
4 80000
In the above example, NF prints the number of fields in each line and $NF prints the value of the last field.
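Picking out specific columns works the same way. Here is a minimal sketch that prints the 1st and 3rd columns; the `data` variable is just a shortened stand-in for abc.csv so the snippet runs on its own, and `cols` is a name made up for this example.

```shell
# Shortened stand-in for abc.csv (assumption: first three lines only)
data='id,name,city,salary
1,Mike,NYC,20000
2,John,Chicago,40000'

# -F, splits input on commas; OFS is the separator print puts between its arguments
cols=$(printf '%s\n' "$data" | awk -F, -v OFS=, '{print $1, $3}')
echo "$cols"
```

With a comma between the print arguments, awk joins them with OFS, so the output stays comma-separated.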
How to define a variable?
$ echo | awk '{s=4; print s}'
4
In the above example, “s” is a variable. There is no $ sign before a variable name!
if statements:
Print the lines where the salary is greater than 39000:
$ cat abc.csv | awk -F, '{if ($4>39000) {print $0}}'
id,name,city,salary
2,John,Chicago,40000
3,Jerry,Orlando,50000
4,Jaime,San Francisco,100000
5,Bill,Seattle,70000
6,Sam,Chicago,80000
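The header line shows up in that output because its fourth field, "salary", is not a number, so awk falls back to a string comparison and "salary" sorts after "39000". One possible way to leave the header out is to add an NR > 1 condition. This is only a sketch: `data` is a shortened stand-in for the input file and `high` is a made-up variable name.

```shell
# Shortened stand-in for the input file (assumption: header plus three rows)
data='id,name,city,salary
1,Mike,NYC,20000
2,John,Chicago,40000
3,Jerry,Orlando,50000'

# NR > 1 skips the header; the condition is written as a pattern instead of an if
high=$(printf '%s\n' "$data" | awk -F, 'NR > 1 && $4 > 39000 {print $2}')
echo "$high"
```

A condition placed in front of the braces behaves like an if statement wrapped around the whole action.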
How to pass shell variables / environment variables?
$ variable=test
$ echo | awk -v var="$variable" '{print var}'
test
Declare an awk variable with -v. Quote the shell variable so that values containing spaces are passed intact. Remember, awk variables don’t have a $ sign at the beginning.
How about multiple variables?
$ echo | awk -v var="$variable" -v var2="g" '{print var " " var2}'
test g
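For exported environment variables, awk also has a built-in ENVIRON array, so -v is not the only option. A minimal sketch, assuming a made-up environment variable called CITY (`from_env` is likewise just an illustrative name):

```shell
# CITY is a hypothetical environment variable used only for this sketch
export CITY="Chicago"

# ENVIRON is indexed by the environment variable's name
from_env=$(echo | awk '{print ENVIRON["CITY"]}')
echo "$from_env"
```

Only exported variables are visible this way; a plain shell variable still has to be passed with -v.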
Regex in your search patterns:
$ cat abc.csv | awk '/ea|Sam|^2/'
2,John,Chicago,40000
5,Bill,Seattle,70000
6,Sam,Chicago,80000
This prints the lines that contain "ea" or "Sam", or that start with 2.
How to run an awk script?
$ awk -f script.awk input_file
# script.awk is the awk script
# input_file is the file with the data
How to write an awk script in a file?
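#!/usr/bin/awk -f
# The assignment goes in a BEGIN block so it runs once, before any input is read
BEGIN { var2 = "this is a variable" }
{ print var2 }
How to run it? Make the script executable first:
$ chmod +x script.awk
$ echo | ./script.awk
this is a variable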
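Calculate the length (number of characters) of each line?
$ cat abc.csv | awk '{print length($0)}'
19
16
20
21
28
20
19
$0 is used here rather than $1 because, without -F, awk splits fields on whitespace, so $1 would stop at the space in "San Francisco".
How about printing only the lines that are at least 20 characters long?
$ cat abc.csv | awk '{ if (length($0) >= 20) {print $0} }'
2,John,Chicago,40000
3,Jerry,Orlando,50000
4,Jaime,San Francisco,100000
5,Bill,Seattle,70000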
Math operators in awk?
+ addition operator
- subtraction operator
/ division operator
* multiplication operator
% modulus operator
^ exponentiation operator
Example (exponentiation operator):
$ echo "5" | awk '{print 2^$1}'
32
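The modulus operator is handy together with NR. As a rough sketch, this prints every second line of its input; the four letters are placeholder input and `even` is a made-up variable name.

```shell
# Placeholder input: four lines containing a, b, c and d
# NR % 2 == 0 is true for lines 2, 4, 6, ... and a pattern without an action prints the line
even=$(printf '%s\n' a b c d | awk 'NR % 2 == 0')
echo "$even"
```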
Loops in awk?
for loop example:
$ echo | awk '{for (i=0; i<10; i++) print i}'
0
1
2
3
4
5
6
7
8
9
Similarly, you can have a while loop.
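A minimal sketch of the while form, counting from 0 to 2 (`out` is just an illustrative variable name):

```shell
# i has to be initialized and incremented by hand in a while loop
out=$(echo | awk '{i = 0; while (i < 3) {print i; i++}}')
echo "$out"
```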
awk arrays (actually associative arrays)
$ awk 'BEGIN {for (i=0; i<10; i++) x[i]=i*i; print x[4] }'
16
You can assign values to an array with arr[key]=value. You cannot print a whole array in a single operation; you need to iterate over its keys.
You can delete any element with delete arr[key].
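Iterating over the keys is done with for (key in arr). The sketch below counts people per city, deletes one key, and prints the rest. The `data` variable is a shortened stand-in for the input file, and `n` and `counts` are made-up names. awk does not promise any particular order for for-in, so the result is piped through sort.

```shell
# Shortened stand-in for the input file (assumption: header plus four rows)
data='id,name,city,salary
1,Mike,NYC,20000
2,John,Chicago,40000
5,Bill,Seattle,70000
6,Sam,Chicago,80000'

# n[$3]++ counts rows per city; delete drops the NYC entry before printing
counts=$(printf '%s\n' "$data" | awk -F, '
NR > 1 { n[$3]++ }
END {
    delete n["NYC"]
    for (c in n) print c, n[c]
}' | LC_ALL=C sort)
echo "$counts"
```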