Basic shell utilities
21 Jun 2021
awk
awk processes text as rows and columns, which makes it well suited to complex numerical and text extraction or manipulation. It is a good choice when you have to compute something from part of a text. It also offers many C-like functions, such as printf, substr and length.
The general form is awk 'BEGIN {commands} pattern {commands} END {commands}' file. The BEGIN and END blocks are optional; they run before and after the input is processed, respectively.
Major variables
- NR: number of the current row (record), counted across all input files
- NF: number of fields in the current row; the default delimiter is whitespace
- FNR: number of records read so far from the file currently being processed
- $0: content of the current row
- $1: the first column of the current row
- OFS: Output Field Separator

Useful command-line options:
- -v: define a variable (very useful when you want to search for a pattern held in a shell variable)
- -F: specify the field delimiter
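As a minimal sketch of how these variables and options combine, the snippet below runs awk on a small throwaway file. The file contents and the variable name tag are made up for illustration.

```shell
# Hypothetical sample data: name,city,score
tmp=$(mktemp)
printf 'Jeet,Delhi,42\nAnna,Pune,17\n' > "$tmp"

# -F sets the input delimiter; -v sets awk variables (OFS and a hypothetical "tag").
# The commas in print are replaced by OFS on output.
vars_out=$(awk -F',' -v OFS='|' -v tag='row' '{print tag, NR, NF, $1}' "$tmp")
printf '%s\n' "$vars_out"
# row|1|3|Jeet
# row|2|3|Anna

rm -f "$tmp"
```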
It works in 3 steps:
- Begin with the commands in BEGIN {commands}
- Search file for text that matches pattern, and execute the commands in pattern {commands} on the matched text
- End with the commands in END {commands}
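The three steps can be seen in one small program. This is only a sketch; the sample file and its columns are invented for the example.

```shell
# Hypothetical sample data with a header row
tmp=$(mktemp)
printf 'id,name,amount\n1,Jeet,10\n2,Anna,5\n3,Jeet,7\n' > "$tmp"

steps_out=$(awk -F',' '
  BEGIN  { print "start" }            # step 1: runs once, before any input is read
  /Jeet/ { total += $3; print $1 }    # step 2: runs for every line matching the pattern
  END    { print "total=" total }     # step 3: runs once, after the last line
' "$tmp")
printf '%s\n' "$steps_out"
# start
# 1
# 3
# total=17

rm -f "$tmp"
```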
Examples
- awk '/Jeet/ {print $3}' file.csv: If the line contains Jeet, print the third field
- awk 'BEGIN {getline; print $0} {s+=$3} END {print s}' file.csv: Print the first line and exclude it from the calculation; then print the sum of column 3
- awk -F"," 'BEGIN{getline} min == "" || $3+0 < min {min = $3+0; minline = $0} END{print minline}' file.csv: Skip the first line, find the minimum of column 3 of a ","-separated file, and print that line. The min == "" test initialises min from the first data row.
- awk -v "pat=${pattern}" -F "," '$2 ~ pat {print $0}' file.csv: Return the rows of a ","-separated file (file.csv) whose 2nd column matches the pattern pat, which is set from the shell variable pattern
- awk -v "pat=${pattern}" -F "," '$2 ~ pat {print $4 " " $5}' file.csv: Return the 4th and 5th columns of a ","-separated file (file.csv) for every row whose 2nd column matches the pattern pat, which is set from the shell variable pattern
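To try the pattern-variable and minimum examples end to end, here is a self-contained sketch. The data, the column layout and the value of pattern are assumptions made for the example, and NR > 1 is used here as an alternative way to skip the header.

```shell
# Hypothetical sample data: id,name,amount,city,year
tmp=$(mktemp)
printf 'id,name,amount,city,year\n1,Jeet,10,Delhi,2020\n2,Anna,5,Pune,2021\n3,Jeetendra,7,Agra,2019\n' > "$tmp"

pattern='^Jeet'   # hypothetical shell variable holding a regular expression

# Pass the shell variable into awk as "pat" and match it against column 2.
match_out=$(awk -v "pat=${pattern}" -F',' '$2 ~ pat {print $4 " " $5}' "$tmp")
printf '%s\n' "$match_out"

# Skip the header (NR > 1) and keep the row with the smallest value in column 3.
min_out=$(awk -F',' 'NR > 1 && (min == "" || $3 + 0 < min) {min = $3 + 0; minline = $0} END {print minline}' "$tmp")
printf '%s\n' "$min_out"

rm -f "$tmp"
```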
A lot more on awk here: Click Here! Example: Two-file processing
sed
sed can print, delete and substitute text
sed [options] commands [file-to-edit]
The pattern of commands decides where and how the operations are to be performed.
commands:
- p = print
- d = delete
- s = substitute
- ! = to negate the range
By default, sed prints, deletes or substitutes according to the commands given below and writes the result to standard output. You can redirect the output to a new file. To make the changes in the original file itself, pass the -i option to sed.
sed also echoes every input line after processing it by default. -n suppresses this, so that only the lines selected with p are printed.
- sed -n '1p' filename: Print the first line
- sed -n '15,20p' filename: Print lines 15-20
- sed -n '15,+9p' filename: Print 10 lines starting from line 15
- sed -n '15~7p' filename: Print every 7th line starting from line 15 (lines 15, 22, 29, ...; GNU sed extension)
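The addressing forms can be checked quickly on a numbered file. This sketch assumes GNU sed (for the 15,+9 and 15~7 forms) and uses seq to create a hypothetical 30-line input.

```shell
# Hypothetical input: a file whose lines are the numbers 1..30
tmp=$(mktemp)
seq 1 30 > "$tmp"

first=$(sed -n '1p' "$tmp")
range=$(sed -n '15,20p' "$tmp" | paste -sd' ' -)   # join output lines with spaces
plus=$(sed -n '15,+9p' "$tmp" | paste -sd' ' -)    # GNU sed: line 15 plus the next 9
step=$(sed -n '15~7p' "$tmp" | paste -sd' ' -)     # GNU sed: every 7th line from 15

echo "first: $first"
echo "range: $range"
echo "plus:  $plus"
echo "step:  $step"

rm -f "$tmp"
```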
delete
- sed '3d' filename: Delete the third line
- sed '15,20d' filename: Delete lines 15-20
- sed '/error/!d' filename: Delete lines that do not contain error
- sed '/^$/d' filename: Delete blank lines
- sed '/^abc/d' filename: Delete lines starting with abc
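A short sketch of the pattern-based deletes on a made-up log file. The sed script is single-quoted so that the shell leaves ! and $ alone.

```shell
# Hypothetical input with a blank line and a line starting with abc
tmp=$(mktemp)
printf 'ok start\n\nerror: disk full\nabc header\nerror: timeout\n' > "$tmp"

only_errors=$(sed '/error/!d' "$tmp")   # keep only lines containing "error"
no_blank=$(sed '/^$/d' "$tmp")          # drop empty lines
no_abc=$(sed '/^abc/d' "$tmp")          # drop lines starting with abc

printf '%s\n' "$only_errors"

rm -f "$tmp"
```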
substitute
- sed 's/hello/This/g' filename: g (global), substitute every occurrence on each line
- sed 's/hello/This/' filename: Substitute only the first occurrence on each line
- sed 's/hello/This/2' filename: Substitute only the second occurrence on each line
- sed -n 's/hello/This/2p' filename: Print only the lines on which a substitution was made
- sed 's/hello/This/i' filename: i, case-insensitive match (GNU sed)
- sed -e 's/hello/This/' -e 's/dear/That/' filename: Apply multiple sed commands
- sed -i 44,50's/\S\+/'"JEET"'/9' filename: From line 44 to line 50 (inclusive), substitute the 9th column of the file with "JEET", editing the file in place. The line numbers, column number and substituting value can be supplied through externally defined variables.
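One way to drive that last command from shell variables is sketched below. The variable names start, end, col and value are hypothetical, \S\+ assumes GNU sed, and -i is left out so the sample file is not modified. The value must not contain / or &, which are special in a sed replacement.

```shell
# Hypothetical whitespace-separated input
tmp=$(mktemp)
printf 'a b c\nd e f\ng h i\nj k l\n' > "$tmp"

# Hypothetical, externally defined values
start=2; end=3; col=2; value="JEET"

# Double quotes let the shell expand the variables before sed sees the script.
# \S\+ matches one run of non-whitespace, so the trailing number selects the column.
sub_out=$(sed "${start},${end}s/\S\+/${value}/${col}" "$tmp")
printf '%s\n' "$sub_out"
# a b c
# d JEET f
# g JEET i
# j k l

rm -f "$tmp"
```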
cut
cut extracts selected columns from each line, for example to feed them into another command.
- cut -d ',' -f1,6 filename: Get the first and the sixth columns
- cut -d ':' -f5-7 filename: Get the fifth to the seventh columns, with : as the delimiter
- cut -d ',' -f2 --complement filename: Get all columns other than the 2nd (GNU cut)
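A quick sketch of the three forms on a made-up seven-column file. --complement assumes GNU cut.

```shell
# Hypothetical comma-separated input with seven columns
tmp=$(mktemp)
printf 'a,b,c,d,e,f,g\n1,2,3,4,5,6,7\n' > "$tmp"

cols_1_6=$(cut -d ',' -f1,6 "$tmp")                # columns 1 and 6
cols_5_7=$(cut -d ',' -f5-7 "$tmp")                # columns 5 through 7
all_but_2=$(cut -d ',' -f2 --complement "$tmp")    # everything except column 2 (GNU cut)

printf '%s\n' "$cols_1_6" "$cols_5_7" "$all_but_2"

rm -f "$tmp"
```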
paste
The paste command merges files in parallel (default) or sequentially.
- paste animal.dat sound.dat > animal_sound.dat: Joins the two files animal.dat and sound.dat side by side, separated by a tab (the default delimiter)
- paste -d "|" animal.dat sound.dat > animal_sound.dat: Joins the two files animal.dat and sound.dat side by side, with "|" as the delimiter
- paste -s animal.dat sound.dat > animal_sound.dat: Joins the two files animal.dat and sound.dat one after the other, each file becoming one row
- paste -s -d ":" animal.dat sound.dat > animal_sound.dat: Joins the two files animal.dat and sound.dat one after the other, each file becoming one row, with ":" as the delimiter between the elements within each row
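The difference between parallel and serial merging is easier to see on real output. The sketch below creates two small hypothetical files, named after the ones in the examples, in a temporary directory.

```shell
# Hypothetical contents for animal.dat and sound.dat
dir=$(mktemp -d)
printf 'cat\ndog\n' > "$dir/animal.dat"
printf 'meow\nwoof\n' > "$dir/sound.dat"

parallel=$(paste "$dir/animal.dat" "$dir/sound.dat")           # tab-separated columns
piped=$(paste -d '|' "$dir/animal.dat" "$dir/sound.dat")       # cat|meow, dog|woof
serial=$(paste -s -d ':' "$dir/animal.dat" "$dir/sound.dat")   # one row per file

printf '%s\n' "$parallel" "$piped" "$serial"

rm -rf "$dir"
```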
find
The find command locates files or folders under a specific location or on the entire system.
find [where to start searching from] [expression determines what to find] [-options] [what to find]
- find /sscu_gpfs/home/jeet/Install -name 'readline.h': Find the file named readline.h in the directory /sscu_gpfs/home/jeet/Install
- find /sscu_gpfs/home/jeet/Install -empty: Search for empty files and directories inside the folder /sscu_gpfs/home/jeet/Install
- find ./jeet -perm 777: Search for files whose permissions are exactly 777
- find ./jeet -type f -name "*.txt" -exec grep 'Geek' {} \;: Print the lines containing 'Geek' from the files ending with .txt inside the directory ./jeet; -type f restricts the search to regular files
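The same expressions can be tried safely in a scratch directory. The directory layout and file names below are invented for the example; -empty is an extension found in GNU and BSD find.

```shell
# Hypothetical directory tree
dir=$(mktemp -d)
mkdir "$dir/sub"
printf 'Geek line\nother\n' > "$dir/sub/a.txt"
printf 'Geek in log\n' > "$dir/sub/b.log"
: > "$dir/empty.txt"    # create an empty file

by_name=$(find "$dir" -type f -name 'a.txt')
empty=$(find "$dir" -type f -empty)
# grep runs once per matching file, so no file name prefix is printed.
grepped=$(find "$dir" -type f -name '*.txt' -exec grep 'Geek' {} \;)

printf '%s\n' "$by_name" "$empty" "$grepped"
```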
nohup
nohup, combined with a trailing &, runs a script in the background and keeps it running after the terminal session ends.
- chmod +x your_script.sh: make your script executable
- nohup ./your_script.sh > /dev/null 2>&1 &: run the script; > /dev/null 2>&1 redirects both standard output and standard error to /dev/null
- ps aux | grep your_script.sh: get the process ID (PID)
- kill <PID> or pkill -f your_script.sh: kill the job
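When the job is started from a script, the PID can be captured directly with $! instead of searching the ps output. The sketch below uses sleep 30 as a stand-in for your_script.sh, which is an assumption made only to keep the example self-contained.

```shell
# Stand-in for ./your_script.sh: a hypothetical long-running command
nohup sleep 30 > /dev/null 2>&1 &
pid=$!    # PID of the most recently started background job

# kill -0 sends no signal; it only checks that the process exists.
if kill -0 "$pid" 2>/dev/null; then running=yes; else running=no; fi
echo "running: $running"

kill "$pid"
wait "$pid" 2>/dev/null || true    # reap the job; its status is non-zero because it was killed
if kill -0 "$pid" 2>/dev/null; then stopped=no; else stopped=yes; fi
echo "stopped: $stopped"
```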
A cheatsheet of bash commands can be found here. This is a pretty good resource for many other cheatsheets, so do check out the site!
Tricks
Show Only the n-th Line After the Match
We can use the -An and -Bn options to get, together with the matched line, the n lines after it and the n lines before it, respectively.
However, if we want to get only the n-th line after the matched context, we need to do:
grep 'Temp' --no-group-separator -A1 report.txt | grep -v 'Temp'
Here, the matching string is Temp, and we are only interested in the line that follows each matched line.
Similarly, if we wanted the line before the matched line, we would use -B1.
The second grep in the pipeline, with the -v option, removes the matched lines themselves from the output of the first grep; -v selects non-matching lines. The --no-group-separator option (GNU grep) suppresses the -- lines that grep otherwise prints between groups of context.
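The grep pipeline returns the single line after each match when used with -A1. For n greater than 1 it would return all n following lines, so the sketch below also shows an awk alternative, which is my own substitution and not part of the original trick, for picking only the n-th line. The contents of report.txt are made up, and --no-group-separator assumes GNU grep.

```shell
# Hypothetical report: each Temp line is followed by a value and a note
tmp=$(mktemp)
printf 'Temp\n300 K\nnote\nTemp\n310 K\nnote\n' > "$tmp"

# Original trick: the line right after each match
next_line=$(grep 'Temp' --no-group-separator -A1 "$tmp" | grep -v 'Temp')
printf '%s\n' "$next_line"
# 300 K
# 310 K

# awk alternative: remember which row number to print, n rows after each match
n=2
nth_line=$(awk -v n="$n" '/Temp/ {target[NR + n] = 1} NR in target' "$tmp")
printf '%s\n' "$nth_line"
# note
# note

rm -f "$tmp"
```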