Basic shell utilities

21 Jun 2021

awk

awk treats text as rows and columns, which makes it well suited to numerical computation and to text extraction or manipulation. It is a good choice when you need to compute something from part of a text. Many of its built-in functions, such as printf, mirror their C counterparts.

awk 'BEGIN {commands} pattern {commands} END {commands}' file. BEGIN and END are optional. Their commands run before the first row is read and after the last row is processed, respectively.

Major variables

NR: number of the current row, counted across all input files
NF: number of fields in the current row; the default delimiter is whitespace
FNR: number of the current row within the file being processed
$0: content of the current row
$1: the first column of the current row
OFS: output field separator
-v: option to define a variable (very useful when the search pattern comes from the shell)
-F: option to specify the field delimiter
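As a small illustration of these variables, the sketch below prints the row number, the field count and the last field of each row, joined with a custom OFS. The sample file and its contents are made up for the example.

```shell
# Hypothetical sample data: two rows with a different number of fields
sample=$(mktemp)
printf '%s\n' 'a b c' 'd e' > "$sample"

# NR = row number, NF = number of fields, $NF = last field of the row.
# The commas in print are replaced by OFS in the output.
fields=$(awk -v OFS='|' '{ print NR, NF, $NF }' "$sample")

echo "$fields"
rm -f "$sample"
```

Without -v OFS='|' the three values would be separated by a single space, the default OFS.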

It works in 3 steps:

Begin with commands in BEGIN{commands}
For each row of the file that matches the pattern, execute the commands in pattern {commands}
End with commands in END{commands}
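The three steps can be seen together in a minimal sketch. The file layout (name,department,salary) and its values are assumptions made for the example.

```shell
# Hypothetical sample data: name,department,salary
data=$(mktemp)
printf '%s\n' 'Jeet,physics,100' 'Asha,chemistry,250' 'Ravi,physics,50' > "$data"

# BEGIN runs once before any input is read,
# the /physics/ block runs for every matching row,
# END runs once after the last row.
report=$(awk -F',' '
  BEGIN     { print "start" }
  /physics/ { total += $3; print NR ": " $1 }
  END       { print "total=" total }
' "$data")

echo "$report"
rm -f "$data"
```

Only rows 1 and 3 match the pattern, so the second row contributes nothing to the total.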

Examples

awk '/Jeet/ {print $3}' file.csv

If the line contains Jeet, print the third field
awk 'BEGIN {getline; print $0} {s+=$3} END {print s}' file.csv

Print the first line (the header) and skip it; then print the sum of column 3
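awk -F"," 'BEGIN{getline} NR == 2 || $3 + 0 < min {min = $3 + 0; minline = $0} END{print minline}' file.csv

Skip the first line; find the minimum of column 3 of a ‘,’ separated file and print that line. The NR == 2 test initializes min from the first data row.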
awk -v "pat=${pattern}" -F "," '$2 ~ pat{print $0}' file.csv

Return the rows of a “,” separated file (file.csv) whose 2nd column matches the pattern pat, which is set from the shell variable pattern
awk -v "pat=${pattern}" -F "," '$2 ~ pat{print $4 " " $5}' file.csv

Return the 4th and 5th columns of a “,” separated file (file.csv) for the rows whose 2nd column matches the pattern pat, which is set from the shell variable pattern

A lot more on awk here: Click Here! Example: Two-file processing
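Two-file processing typically relies on the difference between NR and FNR: while the first file is being read the two are equal, and FNR restarts at 1 for the second file. The sketch below is one common way to use this, a lookup join; the file names, columns and values are hypothetical.

```shell
# Hypothetical sample data: a price list and an order list
prices=$(mktemp)
orders=$(mktemp)
printf '%s\n' 'apple,2' 'pear,3' > "$prices"
printf '%s\n' 'apple,10' 'pear,4' 'kiwi,1' > "$orders"

# First block: NR == FNR only holds for the first file, so it just loads the table.
# Second block: runs on the second file and uses the table for known items.
joined=$(awk -F',' -v OFS=',' '
  NR == FNR   { price[$1] = $2; next }
  $1 in price { print $1, $2 * price[$1] }
' "$prices" "$orders")

echo "$joined"
rm -f "$prices" "$orders"
```

The kiwi row is dropped because it has no entry in the first file. Note that NR == FNR also holds for the second file if the first one is empty.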

sed

sed can print, delete and substitute text

sed [options] commands [file-to-edit]

The pattern of commands decides where and how the operations are to be performed.

commands:

p = print
d = delete
s = substitute
! = negate the address or range, so the command applies to the lines that do not match

By default, sed applies the commands described below and writes the result to standard output, leaving the original file unchanged. You can redirect the output to a new file. To change the original file itself, pass the -i option to sed.

print

By default sed echoes every input line to the output, whether or not it matched. -n suppresses this, so only the lines selected with p are printed.

sed -n '1p' filename: Print the first line
sed -n '15,20p' filename: Print lines 15 to 20
sed -n '15,+9p' filename: Print 10 lines starting from line 15 (GNU sed)
sed -n '15~7p' filename: Print every 7th line starting from line 15, i.e. lines 15, 22, 29, ... (GNU sed)
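The address forms can be checked quickly on a file whose lines are just their own line numbers. This sketch assumes GNU sed, since addr,+N and first~step are GNU extensions; the file is generated with seq for the example.

```shell
# Hypothetical input: 30 lines containing the numbers 1..30
lines=$(mktemp)
seq 1 30 > "$lines"

# paste -s joins the selected lines into one row so they are easy to read
range=$(sed -n '15,20p' "$lines" | paste -s -d ' ' -)
stepped=$(sed -n '15~7p' "$lines" | paste -s -d ' ' -)
count=$(sed -n '15,+9p' "$lines" | wc -l)

echo "15,20p -> $range"
echo "15~7p  -> $stepped"
echo "15,+9p -> $count lines"
rm -f "$lines"
```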

delete

sed '3d' filename: Delete the third line
sed '15,20d' filename: Delete lines 15 to 20
sed '/error/!d' filename: Delete the lines that do not contain error (the quotes keep the shell from interpreting !)
sed '/^$/d' filename: Delete blank lines
sed '/^abc/d' filename: Delete the lines starting with abc

substitute

sed 's/hello/This/g' filename: g, global; substitute every occurrence on each line
sed 's/hello/This/' filename: Substitute only the first occurrence on each line
sed 's/hello/This/2' filename: Substitute the second occurrence on each line
sed -n 's/hello/This/2p' filename: Print only the lines where a substitution was made
sed 's/hello/This/i' filename: i, case insensitive (GNU sed)
sed -e 's/hello/This/' -e 's/dear/That/' filename: Multiple sed commands in one call
sed -i '44,50s/\S\+/JEET/9' filename: From line 44 to line 50 (inclusive), substitute the 9th whitespace-separated column of the file with “JEET”.

The line numbers, column number and substituting value can be specified with variables externally defined.
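One way to pass such values in is to write the sed script in double quotes so the shell expands the variables first. This is a sketch with hypothetical variable names (first, last, col, value) and sample data; it assumes GNU sed for \S and \+, and a value that contains no / or & characters, which would need escaping.

```shell
# Hypothetical sample data: four rows, three whitespace-separated columns
table=$(mktemp)
printf '%s\n' 'a1 b1 c1' 'a2 b2 c2' 'a3 b3 c3' 'a4 b4 c4' > "$table"

# Hypothetical values; in practice these could come from script arguments
first=2
last=3
col=2
value="JEET"

# Double quotes let the shell expand the variables before sed sees the script.
# Without -i the file is untouched and the result goes to standard output.
edited=$(sed "${first},${last}s/\S\+/${value}/${col}" "$table")

echo "$edited"
rm -f "$table"
```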

cut

cut extracts selected columns from each line, so they can be used somewhere else

cut -d ',' -f1,6 filename: Get the first and the sixth columns
cut -d ':' -f5-7 filename: Get the fifth to the seventh columns with delimiter :
cut -d ',' -f2 --complement filename: Get all columns other than the 2nd (GNU cut)
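A short sketch of column selection on a made-up CSV file. The column names and values are hypothetical, and --complement assumes GNU cut.

```shell
# Hypothetical sample data with a header row
csv=$(mktemp)
printf '%s\n' 'id,name,city,age' '1,Jeet,Pune,30' > "$csv"

picked=$(cut -d ',' -f1,3 "$csv")              # columns 1 and 3
rest=$(cut -d ',' -f2 --complement "$csv")     # everything except column 2

echo "$picked"
echo "$rest"
rm -f "$csv"
```

Note that cut splits on every delimiter character, so it does not understand quoted CSV fields that contain commas.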

paste

paste merges files line by line, either in parallel (default) or serially (-s)

paste animal.dat sound.dat > animal_sound.dat: Joins the two files animal.dat and sound.dat side by side with a tab (default) separating them as delimiter
paste -d "|" animal.dat sound.dat > animal_sound.dat: Joins the two files animal.dat and sound.dat side by side with a “|” separating them as delimiter
paste -s animal.dat sound.dat > animal_sound.dat: Joins all lines of animal.dat into one row and all lines of sound.dat into a second row
paste -s -d ":" animal.dat sound.dat > animal_sound.dat: Same as above, with “:” as the delimiter between the elements within each row
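The difference between the parallel and serial modes is easiest to see on two tiny files. The contents of animal.dat and sound.dat below are made up for the example.

```shell
# Hypothetical contents for animal.dat and sound.dat
dir=$(mktemp -d)
printf '%s\n' 'cat' 'dog' > "$dir/animal.dat"
printf '%s\n' 'meow' 'woof' > "$dir/sound.dat"

default=$(paste "$dir/animal.dat" "$dir/sound.dat")           # tab between columns
piped=$(paste -d '|' "$dir/animal.dat" "$dir/sound.dat")      # one row per line pair
serial=$(paste -s -d ':' "$dir/animal.dat" "$dir/sound.dat")  # one row per file

echo "$piped"
echo "$serial"
rm -rf "$dir"
```

Here the parallel form gives cat|meow and dog|woof, while the serial form gives cat:dog and meow:woof.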

find

find command helps to find a file or folder from a specific location or on the entire system

find [where to start searching from] [expression determines what to find] [-options] [what to find]

find /sscu_gpfs/home/jeet/Install -name 'readline.h': Find the file named readline.h in the directory /sscu_gpfs/home/jeet/Install
find /sscu_gpfs/home/jeet/Install -empty: Search for empty files and directories inside the folder /sscu_gpfs/home/jeet/Install
find ./jeet -perm 777: Search for files whose permissions are exactly 777
find ./jeet -type f -name "*.txt" -exec grep 'Geek' {} \; : Print the lines containing ‘Geek’ from the files ending with .txt inside the directory ./jeet. ‘-type f’ restricts the search to regular files.
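To try these expressions without touching real data, a throwaway directory tree can be built first. The layout below is hypothetical; grep -l is used instead of plain grep so that the names of the matching files are printed rather than the matching lines.

```shell
# Hypothetical directory tree for the example
root=$(mktemp -d)
mkdir -p "$root/jeet/notes"
printf 'Geek inside\n' > "$root/jeet/notes/a.txt"
printf 'nothing here\n' > "$root/jeet/b.txt"
: > "$root/jeet/empty.log"     # an empty file

# Names of .txt files that contain Geek
matches=$(find "$root/jeet" -type f -name '*.txt' -exec grep -l 'Geek' {} \;)
# Empty files and directories
empties=$(find "$root/jeet" -empty)

echo "$matches"
echo "$empties"
rm -rf "$root"
```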

nohup

nohup keeps a command running after the terminal session ends by making it ignore the hangup signal. The trailing & is what puts the job in the background.

chmod +x your_script.sh makes your script executable.
nohup ./your_script.sh > /dev/null 2>&1 & runs the script. The > /dev/null 2>&1 redirects both standard output and standard error to /dev/null, so no nohup.out file is created.
ps aux | grep your_script.sh shows the process ID (PID) of the job.
kill <PID> or pkill -f your_script.sh kills the job.
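When the job is started from a script, the PID can also be captured directly instead of searching the ps output. The sketch below uses sleep 30 as a stand-in for your_script.sh; that substitution and the variable names are assumptions made for the example.

```shell
# sleep stands in for a long-running script
nohup sleep 30 > /dev/null 2>&1 &
pid=$!     # $! holds the PID of the most recent background job

# kill -0 sends no signal; it only checks that the process exists
if kill -0 "$pid" 2>/dev/null; then
  status="running"
else
  status="not running"
fi
echo "job $pid is $status"

# Stop the job and wait for it so that no zombie process is left behind
kill "$pid"
wait "$pid" 2>/dev/null || true
```

Saving $pid to a file right after launch is a common way to find the job again from a later session.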

A cheatsheet of bash commands can be found here. The site is a pretty good resource for many other cheatsheets, so do check it out!

Tricks

Show Only the n-th Line After the Match

We can use the -An and -Bn options to get n lines after and before the matched line, respectively, together with the matched line itself. However, if we want only the line that follows the matched line, we need to do:

grep 'Temp' --no-group-separator -A1 report.txt | grep -v 'Temp'

Here, the matching string is Temp, and we are interested only in the line that follows the matched line. Similarly, to get the line before the matched line, we would use -B1.

The second grep, with the option -v, removes the matched lines from the output of the first grep, since -v selects non-matching lines. The option --no-group-separator (GNU grep) suppresses the -- lines that grep would otherwise print between groups of context.
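The pipeline can be tried on a small made-up report. Since grep -v removes only the matched lines, -An with n greater than 1 would leave all n following lines, so the sketch also shows an awk variant for a general n. That variant is an alternative technique, not part of the grep trick above, and the file contents are hypothetical.

```shell
# Hypothetical report: each Temp line is followed by a value and an energy line
report=$(mktemp)
printf '%s\n' 'Temp step 1' '300.1' 'energy -5' \
              'Temp step 2' '301.7' 'energy -6' > "$report"

# The line right after each match (GNU grep)
next_lines=$(grep 'Temp' --no-group-separator -A1 "$report" | grep -v 'Temp')

# awk variant for any n: on a match, remember which row number to print later
n=2
nth_lines=$(awk -v n="$n" '/Temp/ { want[NR + n] = 1 } NR in want' "$report")

echo "$next_lines"
echo "$nth_lines"
rm -f "$report"
```

One caveat of the grep version: if the line after a match also contains Temp, the second grep removes it as well.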

Jeet Majumdar

jeet.Log