Introduction: This workshop introduces how to view, search, and find text files from the Linux command line. It is aimed at beginners with basic command-line experience of the Linux file system and focuses on hands-on exercises.
Course Goals
View the content of text-based files.
Search a file for a string.
Search for a file/folder by name.
Redirect output from commands and pipe commands together.
Practice commands and concepts through hands-on exercises.
01 View/Search a File
01.01 Setting Up
Question:
There is a folder called intro_to_linux within the /project/arccanetrain/ folder. How would you copy this folder into your home folder?
01.02 Setting Up: Answer(s)
Answer: There are a number of ways…
# Move to your home folder and copy into this location.
# The “.” marks the current working directory.
[arccanetrain]$ cd
[~]$ cp -r /project/arccanetrain/intro_to_linux/ .
cp: cannot open 'intro_to_linux/workshop_me.txt' for reading: Permission denied
[~]$ ls
Desktop Documents Downloads intro_to_linux

# Move into the /project/arccanetrain/ folder and copy from there into your home.
# The “~” is short for your home folder.
[~]$ cd /project/arccanetrain/
[arccanetrain]$ cp -r intro_to_linux/ ~
cp: cannot open 'intro_to_linux/workshop_me.txt' for reading: Permission denied
[arccanetrain]$ ls ~
Desktop Documents Downloads intro_to_linux

# Why do we see the cp-related "Permission denied"?
# -rw------- 1 arcc-t05 arccanetrain 23 Oct 5 07:20 workshop_me.txt
# Only the owner (arcc-t05) has read and write permission on this file.
# What happens to this file? It does not get copied.
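The permission string in the long listing explains the cp error. The sketch below recreates a file with the same mode in a throwaway directory and prints the first ten characters of its long listing. The temporary directory and the use of chmod 600 are assumptions made for illustration; they are not part of the workshop data.

```shell
# Work in a throwaway directory so no real files are touched.
workdir=$(mktemp -d)

# Create an empty file and give read/write access to the owner only.
touch "$workdir/workshop_me.txt"
chmod 600 "$workdir/workshop_me.txt"

# Keep only the file type and permission characters of the long listing.
perms=$(ls -l "$workdir/workshop_me.txt" | cut -c1-10)
echo "$perms"
```

The leading "-" marks a regular file, "rw-" is the owner's access, and the two "---" groups show that group members and other users have no access, which is why another user's cp cannot read the file.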
01.03a View the content of files.
Command | Description |
cat | Usage: cat [OPTION]... [FILE]... Concatenate FILE(s) to standard output. -n, --number number all output lines |
more | more [options] <file>... A file perusal filter for CRT viewing. more is a filter for paging through text one screenful at a time. |
head | Usage: head [OPTION]... [FILE]... Print the first 10 lines of each FILE to standard output. -n, --lines=[-]NUM print the first NUM lines instead of the first 10; with the leading '-', print all but the last NUM lines of each file |
01.03b View the content of files.
Command | Description |
tail | Usage: tail [OPTION]... [FILE]... Print the last 10 lines of each FILE to standard output. -f, --follow[={name|descriptor}] output appended data as the file grows; an absent option argument means 'descriptor' -n, --lines=[+]NUM output the last NUM lines, instead of the last 10; or use -n +NUM to output starting with line NUM |
01.04 Exercises
[]$ cd ~/intro_to_linux/
[intro_to_linux]$ cat software.csv
[intro_to_linux]$ cat -n software.csv
# Press spacebar to scroll through.
# Press ‘q’ to quit at any time.
[intro_to_linux]$ more software.csv
[intro_to_linux]$ head software.csv
[intro_to_linux]$ head -n 5 software.csv
[intro_to_linux]$ tail software.csv
[intro_to_linux]$ tail -n 5 software.csv
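To see how the -n variants of head and tail behave, it can help to try them on a file whose content is predictable. The following is a minimal sketch, assuming GNU coreutils (as found on most Linux systems); numbers.txt is a hypothetical practice file and is not part of the workshop folder.

```shell
# Work in a throwaway directory.
workdir=$(mktemp -d)

# Create a practice file containing the numbers 1 to 20, one per line.
seq 1 20 > "$workdir/numbers.txt"

# First three lines and last three lines.
first=$(head -n 3 "$workdir/numbers.txt")
last=$(tail -n 3 "$workdir/numbers.txt")

# With a leading '+', tail starts printing at the given line number.
from_18=$(tail -n +18 "$workdir/numbers.txt")

# With a leading '-', head prints everything except the last NUM lines.
all_but_last_17=$(head -n -17 "$workdir/numbers.txt")

echo "$first"
echo "$last"
```

Here tail -n +18 and tail -n 3 select the same lines because the file has exactly 20 lines; on a file of a different length they would differ.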
01.05 Search for a string within a text file (grep)
Command | Description |
grep | Usage: grep [OPTION]... PATTERN [FILE]... Search for PATTERN in each FILE. Example: grep -i 'hello world' menu.h main.c ... -i, --ignore-case ignore case distinctions ... -n, --line-number print line number with output lines ... -r, --recursive like --directories=recurse ... # grep is case-sensitive |
01.06 Examples: Search a file:
# Remember: grep is case-sensitive
[intro_to_linux]$ grep NVIDIA software.csv
libraries and toolkits,cuDNN,cudnn,beartooth,The NVIDIA CUDA Deep...
libraries and toolkits,TensorRT,,beartooth,"NVIDIA TensorRT, an...
# Nothing is returned.
[intro_to_linux]$ grep nvidia software.csv
[intro_to_linux]$
# Neither of the above picked up “Nvidia”.
# Ignore the case of the word to search for.
[intro_to_linux]$ grep -i NVidia software.csv
compiler,NVidia HPC SDK,nvhpc,"beartooth,teton"...
libraries and toolkits,cuDNN,cudnn,beartooth,The NVIDIA CUDA Deep...
libraries and toolkits,TensorRT,,beartooth,"NVIDIA TensorRT, an...
# Also print the line number of each match.
[intro_to_linux]$ grep -n -i NVidia software.csv
145:compiler,NVidia HPC SDK,nvhpc,"beartooth,teton"...
152:libraries and toolkits,cuDNN,cudnn,beartooth,The NVIDIA CUDA Deep...
166:libraries and toolkits,TensorRT,,beartooth,"NVIDIA TensorRT, an...
01.07 Examples: Search folders and files:
[intro_to_linux]$ cd clusters/
[clusters]$ grep -i nvidia *
beartooth.html: .../788758554/NVidia+HPC+SDK">NVidia HPC SDK</a></td>
teton.html: .../788758554/NVidia+HPC+SDK">NVidia HPC SDK</a></td>
[clusters]$ cd ..
[intro_to_linux]$ grep -i nvidia *
grep: clusters: Is a directory
software.csv:compiler,NVidia HPC SDK,nvhpc,"beartooth,teton"...
software.csv:libraries and toolkits,cuDNN,cudnn,beartooth,The NVIDIA CUDA Deep...
software.csv:libraries and toolkits,TensorRT,,beartooth,"NVIDIA TensorRT, an...
[intro_to_linux]$ grep -r -i nvidia *
clusters/teton.html: .../788758554/NVidia+HPC+SDK">NVidia HPC SDK</a></td>
clusters/beartooth.html: .../788758554/NVidia+HPC+SDK">NVidia HPC SDK</a></td>
software.csv:compiler,NVidia HPC SDK,nvhpc,"beartooth,teton"...
software.csv:libraries and toolkits,cuDNN,cudnn,beartooth,The NVIDIA CUDA Deep...
software.csv:libraries and toolkits,TensorRT,,beartooth,"NVIDIA TensorRT, an...
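The effect of the -i, -n and -r options can be reproduced on a few lines of made-up text. This is only a sketch: the file names a.txt and sub/b.txt and their contents are hypothetical, chosen so that the matches are easy to predict.

```shell
# Build a tiny practice tree: one file at the top, one in a subfolder.
workdir=$(mktemp -d)
mkdir "$workdir/sub"
printf 'NVIDIA toolkit\nNvidia driver\nintel compiler\n' > "$workdir/a.txt"
printf 'nvidia docs\n' > "$workdir/sub/b.txt"

# Case-sensitive search: only the line spelled exactly "NVIDIA" matches.
exact=$(grep NVIDIA "$workdir/a.txt")

# -i ignores case, -n prefixes each match with its line number.
numbered=$(grep -i -n nvidia "$workdir/a.txt")

# -r descends into subfolders; wc -l counts the matching lines.
recursive_count=$(grep -r -i nvidia "$workdir" | wc -l)

echo "$exact"
echo "$numbered"
echo "$recursive_count"
```

Passing a folder to grep without -r produces the "Is a directory" message seen above; with -r every file below that folder is searched.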
01.08 Exercises
# The software.csv file takes the form:
[intro_to_linux]$ head software.csv
Type,Name,Module,Cluster,Description
application,Alphafold,alphafold,"beartooth,teton",AlphaFold...
application,Astral,astral,wildiris,ASTRAL is a tool...
application,Augustus,augustus,beartooth,AUGUSTUS is a program...
application,Avizo,avizo,loren-pre202308,Avizo is a general-purpose...
application,ANGSD,angsd,"beartooth,teton",ANGSD: is a software...
application,ANSYS,ansys,teton,"ANSYS is a general-purpose software...
Questions:
Which named applications are related to the word “bayes”?
Which files contain a reference to IPA?
01.09 Answers
[intro_to_linux]$ grep -i bayes software.csv
application,Bayescan,bayescan,beartooth,"BayeScan aims...
application,Beast1,beast1,wildiris,BEAST is a cross-platform program for Bayesian...
application,Beast2,beast2,beartooth,"BEAST 2 is a cross-platform program for Bayesian...
application,Freebayes,freebayes,beartooth,"freebayes is a Bayesian genetic...
application,Jags,jags,"beartooth,teton",Just Another Gibbs Sampler. It is a program for analysis of Bayesian hierarchical...
application,RevBayes,revbayes,wildiris,Bayesian phylogenetic...
application,ROHan,rohan,teton,"ROHan is a Bayesian framework...
application,SourceTracker2,sourcetracker2,"beartooth,teton","SourceTracker, a Bayesian approach...
[intro_to_linux]$ grep -r IPA *
clusters/beartooth.html: .../pages/1893597185/IPA">IPA</a></td>
software.csv:application,IPA,ipa,beartooth,Improved Phased Assembler (IPA) is...
02 Search for a File
02.01a Searching for Files: find
Let us look at a folder with many subfolders and files.
[]$ cd ~/intro_to_linux
[intro_to_linux]$ ls
clusters data software.csv
[intro_to_linux]$ ls -R
.:
clusters data Intro_to_linux.pdf software.csv vegatables.txt workshop_all.txt workshop_me.txt

./clusters:
beartooth.html loren.html teton.html wildiris.html

./data:
2021 2022 2023 dd.tx

./data/2021:
Apr Nov Sep

./data/2021/Apr:
20210403.txt 20210427.txt 20210428.txt
02.01b Searching for Files: find
./data/2021/Nov:
20211114.txt 20211115.txt 20211116.txt hello.txt

./data/2021/Sep:
20210908.txt 20210921.txt

./data/2022:
Dec Feb Hello.csv Jul Jun

./data/2022/Dec:
20221207.txt 20221220.txt 20221230.txt 20221231.txt

./data/2022/Feb:
20220203.txt 20220223.txt

./data/2022/Jul:
20220720.txt 20220722.txt 20220723.TX
...
02.02 Searching for Files: find
Command | Description |
find | Usage: find [-H] [-L] [-P] [-Olevel] [-D help|tree|search|stat|rates|opt|exec] [path...] [expression] default path is the current directory; default expression is -print expression may consist of: operators, options, tests, and actions: ... EXPRESSION The part of the command line after the list of starting points is the expression. This is a kind of query specification describing how we match files and what we do ... TESTS ... -name pattern Base of file name (the path with the leading directories removed) matches shell pattern pattern. ... -iname pattern Like -name, but the match is case insensitive. ... |
02.03 Examples
[]$ cd ~/intro_to_linux/
[intro_to_linux]$ find . -name 20230121.txt
./data/2023/Jan/20230121.txt
# Check that this file is within the returned location.
[intro_to_linux]$ ls data/2023/Jan/
20230102.txt 20230108.txt 20230115.txt 20230121.txt
# Nothing returned: no file exists called “20230120.txt”
[intro_to_linux]$ find . -name 20230120.txt
[intro_to_linux]$
[intro_to_linux]$ find . -name README.txt
./data/2021/README.txt
# find is case-sensitive: use the -iname option to ignore case
[intro_to_linux]$ find . -iname README.txt
./data/2021/README.txt
./data/2022/readme.txt
./data/2023/ReadMe.txt
02.04 Examples
# Use wildcards to find all files with the postfix .csv:
[intro_to_linux]$ find . -name "*.csv"
./software.csv
./data/2022/Hello.csv
# Find any files/folders that contain the string “dec”
[intro_to_linux]$ find . -name "*dec*"
./data/2022/Dec/2022_dec_01.txt
[intro_to_linux]$ find . -iname "*dec*"
./data/2022/Dec
./data/2022/Dec/2022_dec_01.txt
# Find only folders.
[intro_to_linux]$ find . -type d -iname "*dec*"
./data/2022/Dec
# Find only files.
[intro_to_linux]$ find . -type f -iname "*dec*"
./data/2022/Dec/2022_dec_01.txt
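The same find tests can be tried on a small tree built from scratch, which makes it easier to predict what each command should return. This is a sketch only: the practice folder and the files inside it are hypothetical and merely imitate the layout of the workshop data.

```shell
# Build a small practice tree inside a throwaway directory.
workdir=$(mktemp -d)
root="$workdir/practice"
mkdir -p "$root/data/2021" "$root/data/2022/Dec"
touch "$root/data/2021/README.txt" \
      "$root/data/2022/readme.txt" \
      "$root/data/2022/Dec/2022_dec_01.txt"

# -name is case-sensitive: only the upper-case README.txt matches.
by_name=$(find "$root" -name "README.txt")

# -iname ignores case: both spellings match.
any_case=$(find "$root" -iname "readme.txt" | wc -l)

# Quote the wildcard pattern so that find, not the shell, expands it.
# -type d restricts the result to folders, -type f to regular files.
dirs=$(find "$root" -type d -iname "*dec*")
files=$(find "$root" -type f -iname "*dec*")

echo "$by_name"
echo "$dirs"
echo "$files"
```

Because -name and -iname test only the last component of each path, the folder Dec and the file 2022_dec_01.txt are matched independently of each other.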
02.05 Exercises
Questions:
1: What do we notice about some of the find command options?
2: Find any files that contain the string “hello”, regardless of case, within their filename.
3: Find any folders or files that contain the string “feb” regardless of case. Can you list only the folders?
4: Find any files that have the postfix “tx” (must be lowercase).
02.06 Answers
1: What do we notice about some of the find command options?
Some single-dash options (such as -name) are whole words rather than single letters, so they look like the long options of other commands.
2: Find any files that contain the string “hello”, regardless of case, within their filename.
[intro_to_linux]$ find . -name "hello"
[intro_to_linux]$ find . -name "hello.*"
./data/2021/Nov/hello.txt
[intro_to_linux]$ find . -iname "hello.*"
./data/2021/Nov/hello.txt
./data/2022/Hello.csv
./data/2023/Mar/HELLO.txt
02.07 Answers
3: Find any folders or files that contain the string “feb” regardless of case.
Can you list only the folders?
[intro_to_linux]$ find . -name feb
./data/2021/feb
[intro_to_linux]$ find . -iname feb
./data/2021/feb
./data/2023/Feb
[intro_to_linux]$ find . -iname "*feb*"
./data/2021/feb
./data/2021/feb/february_01_2021.tx
./data/2022/February
./data/2023/Feb
[intro_to_linux]$ find . -type d -iname "*feb*"
./data/2021/feb
./data/2022/February
./data/2023/Feb
02.08a Answers
4: Find any files that have the postfix “tx” (must be lowercase).
[intro_to_linux]$ find . -name "tx"
[intro_to_linux]$ find . -name "*tx*"
./data/2021/README.txt
./data/2021/Nov/20211115.txt
./data/2021/Nov/hello.txt
./data/2021/Nov/20211114.txt
…
[intro_to_linux]$ find . -name "*tx"
./data/dd.tx
./data/2021/feb/february_01_2021.tx
./data/2023/Jan/texttx
[intro_to_linux]$ find . -name "*.tx"
./data/dd.tx
./data/2021/feb/february_01_2021.tx
02.08b Answers
# dd.tx is actually a folder.
# Notice the ‘d’ at the start of each line in the long format list.
[intro_to_linux]$ ls -l data
total 4
drwxrwxr-x 6 arcc-t05 arcc-t05 2021
drwxrwxr-x 6 arcc-t05 arcc-t05 2022
drwxrwxr-x 5 arcc-t05 arcc-t05 2023
drwxrwxr-x 2 arcc-t05 arcc-t05 dd.tx
[intro_to_linux]$ find . -type f -name "*.tx"
./data/2021/feb/february_01_2021.tx
# -iname also matches the uppercase .TX file, but we explicitly want lowercase.
[intro_to_linux]$ find . -type f -iname "*.tx"
./data/2021/feb/february_01_2021.tx
./data/2022/20220723.TX
06 Output Redirection and Pipes
Redirection of output: > vs >>
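A redirect sends a channel of output to a file: > overwrites the file and >> appends to it.
You can redirect a file as input to a command using <, or inline text using << (a here-document); neither is covered in the exercises.
Using pipe “|”
A pipe passes the standard output of one command as the standard input to another command.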
Examples of the form:
View a text file and pipe to grep.
Cat a list and sort by line.
Sort and then find unique items.
View folder contents and look for a specific name.
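Input redirection is not part of the exercises, but a short comparison shows how it relates to a pipe. In the sketch below, list.txt is a hypothetical three-line file; both forms deliver its content to the standard input of wc.

```shell
# Create a small practice file in a throwaway directory.
workdir=$(mktemp -d)
printf 'pear\napple\nfig\n' > "$workdir/list.txt"

# '<' connects the file to the standard input of wc.
via_redirect=$(wc -l < "$workdir/list.txt")

# A pipe connects the standard output of cat to the standard input of wc.
via_pipe=$(cat "$workdir/list.txt" | wc -l)

# Pipes can be chained: sort the lines, then keep only the first one.
first_sorted=$(sort < "$workdir/list.txt" | head -n 1)

echo "$via_redirect $via_pipe $first_sorted"
```

When wc reads from standard input it prints only the count, whereas wc -l list.txt would print the count followed by the file name.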
06.01 Redirection of output: > vs >>
# Writes out to the command line.
[intro_to_linux]$ grep -i bayes software.csv
# Redirects the output to a file called apps.txt
[intro_to_linux]$ grep -i bayes software.csv > apps.txt
[intro_to_linux]$ ls
apps.txt clusters data software.csv
[intro_to_linux]$ cat apps.txt
# Overwrites any existing file called apps.txt
[intro_to_linux]$ grep -i IPA software.csv > apps.txt
[intro_to_linux]$ cat apps.txt
[intro_to_linux]$ rm apps.txt
# Creates apps.txt again, since it was just removed.
[intro_to_linux]$ grep -i bayes software.csv > apps.txt
# Appends to the existing file.
[intro_to_linux]$ grep -i IPA software.csv >> apps.txt
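The difference between > and >> is easiest to see with output that is known in advance. A minimal sketch, using echo instead of grep and a hypothetical apps.txt inside a throwaway directory:

```shell
# Use a throwaway directory so no existing apps.txt is overwritten.
workdir=$(mktemp -d)
out="$workdir/apps.txt"

echo "first"  > "$out"    # creates the file
echo "second" > "$out"    # overwrites it: "first" is gone
echo "third"  >> "$out"   # appends: the file now has two lines

# Count the lines and show the final content.
lines=$(wc -l < "$out")
cat "$out"
```

Since > silently replaces existing content, it is worth checking with ls whether a file of that name already exists before redirecting into it.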
06.02 Example: Using pipe “|” from a file.
[intro_to_linux]$ cat fruits.txt
Gooseberry
Apple
Apricot
Avocado
Strawberry
...
[intro_to_linux]$ cat fruits.txt | wc -l
97
06.03 Example continued:
# The order of items is the same as listed within the fruits.txt file.
[intro_to_linux]$ cat fruits.txt | grep berry
Gooseberry
Strawberry
Bilberry
Blackberry
Marionberry
Blueberry
Boysenberry
Gooseberry
Cloudberry
Elderberry
Goji berry
Honeyberry
Juniper berry
Cranberry
Cranberry
Marionberry
Gooseberry
Mulberry
Salmonberry
Huckleberry
Raspberry
Salal berry
06.04 Example continued:
# Notice the duplicates.
[intro_to_linux]$ cat fruits.txt | grep berry | sort
Bilberry
Blackberry
Blueberry
Boysenberry
Cloudberry
Cranberry
Cranberry
Elderberry
Goji berry
Gooseberry
Gooseberry
Gooseberry
Honeyberry
Huckleberry
Juniper berry
Marionberry
Marionberry
Mulberry
Raspberry
Salal berry
Salmonberry
Strawberry
06.05 Example continued:
# Duplicates have been removed leaving only the unique names.
[intro_to_linux]$ cat fruits.txt | grep berry | sort | uniq
Bilberry
Blackberry
Blueberry
Boysenberry
Cloudberry
Cranberry
Elderberry
Goji berry
Gooseberry
Honeyberry
Huckleberry
Juniper berry
Marionberry
Mulberry
Raspberry
Salal berry
Salmonberry
Strawberry
[intro_to_linux]$ cat fruits.txt | grep berry | sort | uniq | wc -l
18
06.06 Example continued
[intro_to_linux]$ cat fruits.txt | grep berry | sort | uniq > berries.txt
[intro_to_linux]$ cat berries.txt
Bilberry
Blackberry
Blueberry
Boysenberry
Cloudberry
Cranberry
Elderberry
Goji berry
Gooseberry
Honeyberry
Huckleberry
Juniper berry
Marionberry
Mulberry
Raspberry
Salal berry
Salmonberry
Strawberry
[intro_to_linux]$ cat berries.txt | wc -l
18
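The pipeline above sorts before calling uniq because uniq only removes duplicate lines that are next to each other. The sketch below makes this visible on a hypothetical five-line fruits.txt created in a throwaway directory (not the 97-line workshop file).

```shell
# A short practice list with duplicates that are not adjacent.
workdir=$(mktemp -d)
printf 'Cranberry\nApple\nGooseberry\nCranberry\nGooseberry\n' > "$workdir/fruits.txt"

# Without sort, no two identical lines are adjacent, so uniq removes nothing.
unsorted_count=$(grep berry "$workdir/fruits.txt" | uniq | wc -l)

# After sort, identical lines sit next to each other and uniq collapses them.
sorted_count=$(grep berry "$workdir/fruits.txt" | sort | uniq | wc -l)

echo "without sort: $unsorted_count, with sort: $sorted_count"
```

grep can read the file directly, so the leading cat used in the workshop examples is optional; it is kept there to show data flowing from left to right through the pipe.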
06.07 Example: Pipe from ls command
[intro_to_linux]$ ls -R
[intro_to_linux]$ ls -R | grep "Feb"
February
./data/2022/February:
Feb
./data/2023/Feb:
[intro_to_linux]$ ls -R | grep -i "Feb"
feb
./data/2021/feb:
february_01_2021.tx
February
./data/2022/February:
Feb
./data/2023/Feb:
06.08 Exercises
1: How does the wc command work? What are its options?
2: How does the sort command work? What are its options?
3: How does the uniq command work? What are its options?
4: How many unique varieties of beans are there in the vegetables.txt file?
06.09 Answers
4: How many unique varieties of beans are there in the vegetables.txt file?
How do you deal with “soy beans” vs “Soy Beans”?
What options does the uniq command provide?
[intro_to_linux]$ cat vegatables.txt | grep -i beans | sort | uniq -i | wc -l
12
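Whether “soy beans” and “Soy Beans” end up next to each other after a plain sort depends on the locale, and uniq -i can only merge adjacent lines. The following sketch therefore adds sort -f (fold case), which is an addition to the workshop answer rather than part of it. The file veg.txt and its five lines are hypothetical.

```shell
# A short practice list with mixed-case duplicates.
workdir=$(mktemp -d)
printf 'soy beans\nGreen Beans\nSoy Beans\ncarrot\ngreen beans\n' > "$workdir/veg.txt"

# grep -i keeps every line mentioning beans, whatever the case.
# sort -f ignores case when ordering, so the two spellings become adjacent.
# uniq -i ignores case when comparing adjacent lines.
bean_count=$(grep -i beans "$workdir/veg.txt" | sort -f | uniq -i | wc -l)

echo "$bean_count"
```

Running the workshop command with and without -f on the real file is a quick way to check whether the locale in use affects the count.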
07 More Intermediate Features, Next Steps, Suggestions and Summary
07.01a More Intermediate Features
Environment variables define the behavior of the environment. Try:
echo $HOME
echo $USER
echo $SHELL
echo $PATH
File searching/manipulation
sed: stream editor for filtering and transforming text
gawk: pattern scanning and processing language
Ability to update file permissions and ownership: chmod/chown
Use case: sharing files/folders.
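As a starting point for the environment variables listed above, the sketch below prints one variable, counts the folders on the PATH, and defines a new variable. WORKSHOP_DIR is a hypothetical name chosen for illustration; it is not a variable that Linux or the cluster defines.

```shell
# Print a single variable with a label.
printf 'HOME=%s\n' "$HOME"

# PATH is a colon-separated list of folders searched for commands.
# Replace each colon with a newline and count the entries.
path_entries=$(echo "$PATH" | tr ':' '\n' | wc -l)
echo "PATH has $path_entries entries"

# Define a variable and export it so that child processes can see it.
export WORKSHOP_DIR="$HOME/intro_to_linux"

# A child shell started from here inherits the exported variable.
child_view=$(sh -c 'echo "$WORKSHOP_DIR"')
echo "$child_view"
```

A variable set this way lasts only for the current session; adding the export line to .bashrc would define it at every login.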
07.01b More Intermediate Features
Aliases in .bashrc. Create shortcuts for popular/frequently used commands.
Text editors: vi/vim/nano
vimtutor
touch
Remote access with ssh.
07.02 Next Steps, Suggestions
Next Steps on using Linux:
Practice using Linux online.
Dual boot a Windows machine with Linux.
Run a container image.
UW Researcher? Create a project on the Beartooth cluster with your PI.
07.03 Further Trainings: UWYO LinkedIn
Introduction to Linux
Learning Linux Command Line
Linux: Files and Permissions
Linux: Overview and Installation
Learning Linux Shell Scripting
07.04 Request an Account with ARCC
Wiki: https://arccwiki.atlassian.net/wiki/spaces/DOCUMENTAT/overview
Portal: https://arccwiki.atlassian.net/servicedesk/customer/portals
07.05 Summary
In this workshop we have covered:
How to search for a string within a file.
How to find a file.
How to redirect the output of a command into a file.
How to use pipes to direct the output of one command as the input into another command.