Introduction: This workshop introduces how to view, search and find text files and their content. It is aimed at beginners with basic command-line experience of the Linux file system and focuses on hands-on exercises.

Course Goals

  • View the content of text-based files.

  • Search a file for a string. 

  • Search for a file/folder by name.

  • Redirect output from commands and pipe commands together. 

  • Provide plenty of exercises to practice the commands and concepts.



01 View/Search a File

01.01 Setting Up

Question:

  • There is a folder called intro_to_linux within the /project/arccanetrain/ folder.

  • How would you copy this folder into your home folder?


01.02 Setting Up: Answer(s)

Answer: There are a number of ways…

# Move to your home folder and copy into this location.
# The “.” marks the current working directory.
[arccanetrain]$ cd 
[~]$ cp -r /project/arccanetrain/intro_to_linux/ .
cp: cannot open 'intro_to_linux/workshop_me.txt' for reading: Permission denied
[~]$ ls
Desktop  Documents  Downloads  intro_to_linux

# Move into the /project/arccanetrain/ folder and copy from there into your home folder.
# The “~” is short for your home folder.
[~]$ cd /project/arccanetrain/
[arccanetrain]$ cp -r intro_to_linux/ ~
cp: cannot open 'intro_to_linux/workshop_me.txt' for reading: Permission denied
[arccanetrain]$ ls ~
Desktop  Documents  Downloads  intro_to_linux

# Why does cp report “Permission denied”? Look at the file's long listing:
# -rw-------  1 arcc-t05 arccanetrain      23 Oct  5 07:20  workshop_me.txt
# What happens to this file? It does not get copied: only its owner (arcc-t05) has read (r) permission.
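To see this kind of permission string for yourself, you can recreate it on a scratch file. This is a minimal sketch under stated assumptions: the file is a hypothetical stand-in created in a temporary folder (not the real workshop_me.txt), chmod 600 gives it the same -rw------- mode, and stat -c is the GNU coreutils form of the command.

```shell
# Scratch folder with a stand-in file (hypothetical; not the real workshop_me.txt).
tmpdir=$(mktemp -d)
touch "$tmpdir/workshop_me.txt"

# Same mode as the workshop file: read/write for the owner, nothing for group or others.
chmod 600 "$tmpdir/workshop_me.txt"

# GNU stat: %A = permission string, %U = owner, %G = group, %n = file name.
stat -c '%A %U %G %n' "$tmpdir/workshop_me.txt"
mode=$(stat -c '%A' "$tmpdir/workshop_me.txt")

# Characters 5-7 (group) and 8-10 (others) are all "-": no read permission for them.
group_and_other=$(printf '%s' "$mode" | cut -c5-10)

rm -rf "$tmpdir"
```

On the cluster, ls -l on the copied-from folder shows the same permission column for the real file.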

01.03a View the content of files.

Command

Description

cat

Usage: cat [OPTION]... [FILE]...
Concatenate FILE(s) to standard output.

 -n, --number
              number all output lines

more

more [options] <file>...
A file perusal filter for CRT viewing.
more  is a filter for paging through text one screenful at a time.

head

Usage: head [OPTION]... [FILE]...
Print the first 10 lines of each FILE to standard output.

 -n, --lines=[-]NUM       print the first NUM lines instead of the first 10;
                             with the leading '-', print all but the last
                             NUM lines of each file

01.03b View the content of files.

Command

Description

tail

Usage: tail [OPTION]... [FILE]...
Print the last 10 lines of each FILE to standard output.

 -f, --follow[={name|descriptor}]
                           output appended data as the file grows;
                             an absent option argument means 'descriptor'

 -n, --lines=[+]NUM       output the last NUM lines, instead of the last 10;
                             or use -n +NUM to output starting with line NUM

01.04 Exercises

[]$ cd ~/intro_to_linux/
[intro_to_linux]$ cat software.csv

[intro_to_linux]$ cat -n software.csv

# Press spacebar to scroll through.
# Press ‘q’ to quit at any time.
[intro_to_linux]$ more software.csv

[intro_to_linux]$ head software.csv

[intro_to_linux]$ head -n 5 software.csv

[intro_to_linux]$ tail software.csv

[intro_to_linux]$ tail -n 5 software.csv
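The head and tail options from the tables above can also be tried on a small stand-in for software.csv. This sketch assumes a scratch copy holding the header and three rows from the listing in 01.08; head -n -1 relies on GNU head's leading '-' form, and tail -n +2 (start at line 2) is a common way to drop a CSV header line.

```shell
# Stand-in for software.csv: header plus three rows copied from the 01.08 listing.
tmpdir=$(mktemp -d)
csv="$tmpdir/software.csv"
printf '%s\n' \
  'Type,Name,Module,Cluster,Description' \
  'application,Alphafold,alphafold,"beartooth,teton",AlphaFold...' \
  'application,Astral,astral,wildiris,ASTRAL is a tool...' \
  'application,Augustus,augustus,beartooth,AUGUSTUS is a program...' \
  > "$csv"

first_two=$(head -n 2 "$csv")      # header + first row
last_one=$(tail -n 1 "$csv")       # last row only
no_header=$(tail -n +2 "$csv")     # output starting at line 2, i.e. skip the header
all_but_last=$(head -n -1 "$csv")  # GNU head: everything except the last line

printf '%s\n' "$no_header"
rm -rf "$tmpdir"
```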

01.05 Search for a string within a text file (grep) 

Command

Description

grep

Usage: grep [OPTION]... PATTERN [FILE]...
Search for PATTERN in each FILE.
Example: grep -i 'hello world' menu.h main.c
...
 -i, --ignore-case         ignore case distinctions
...
 -n, --line-number         print line number with output lines
...
 -r, --recursive           like --directories=recurse
...

# grep is case-sensitive

01.06 Examples: Search a file:

# Remember: grep is case-sensitive
[intro_to_linux]$ grep NVIDIA software.csv
libraries and toolkits,cuDNN,cudnn,beartooth,The NVIDIA CUDA Deep...
libraries and toolkits,TensorRT,,beartooth,"NVIDIA TensorRT, an...

# Nothing is returned.
[intro_to_linux]$ grep nvidia software.csv
[intro_to_linux]$

# Neither of the above picked up “NVidia”.
[intro_to_linux]$ grep -i NVidia software.csv
compiler,NVidia HPC SDK,nvhpc,"beartooth,teton"...
libraries and toolkits,cuDNN,cudnn,beartooth,The NVIDIA CUDA Deep...
libraries and toolkits,TensorRT,,beartooth,"NVIDIA TensorRT, an...

# -i ignores case; -n adds the line number of each match.
[intro_to_linux]$ grep -n -i NVidia software.csv
145:compiler,NVidia HPC SDK,nvhpc,"beartooth,teton"...
152:libraries and toolkits,cuDNN,cudnn,beartooth,The NVIDIA CUDA Deep...
166:libraries and toolkits,TensorRT,,beartooth,"NVIDIA TensorRT, an...
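The case-sensitivity behaviour above can be reproduced on a scratch file. This is a sketch assuming a hypothetical four-line stand-in built from the header and the three matching lines shown above. It also records grep's exit status, which is 1 when nothing matches; that is how scripts tell "no match" apart from "match found".

```shell
# Stand-in file: the header plus the three NVIDIA-related lines from the output above.
tmpdir=$(mktemp -d)
csv="$tmpdir/software.csv"
printf '%s\n' \
  'Type,Name,Module,Cluster,Description' \
  'compiler,NVidia HPC SDK,nvhpc,"beartooth,teton"...' \
  'libraries and toolkits,cuDNN,cudnn,beartooth,The NVIDIA CUDA Deep...' \
  'libraries and toolkits,TensorRT,,beartooth,"NVIDIA TensorRT, an...' \
  > "$csv"

upper=$(grep NVIDIA "$csv")            # exact case only: 2 lines

lower_status=0
grep nvidia "$csv" || lower_status=$?  # no match: prints nothing, exit status 1

any_case=$(grep -i nvidia "$csv")      # ignore case: all 3 lines
numbered=$(grep -n -i nvidia "$csv")   # each match prefixed with "LINE:"

printf '%s\n' "$numbered"
rm -rf "$tmpdir"
```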

01.07 Examples: Search folders and files:

[intro_to_linux]$ cd clusters/
[clusters]$ grep -i nvidia *
beartooth.html:    .../788758554/NVidia+HPC+SDK">NVidia HPC SDK</a></td>
teton.html:    .../788758554/NVidia+HPC+SDK">NVidia HPC SDK</a></td>

[clusters]$ cd ..
# Without -r, grep does not search inside folders; it reports them instead.
[intro_to_linux]$ grep -i nvidia *
grep: clusters: Is a directory
software.csv:compiler,NVidia HPC SDK,nvhpc,"beartooth,teton"...
software.csv:libraries and toolkits,cuDNN,cudnn,beartooth,The NVIDIA CUDA Deep...
software.csv:libraries and toolkits,TensorRT,,beartooth,"NVIDIA TensorRT, an...

# With -r, grep also searches recursively inside clusters/.
[intro_to_linux]$ grep -r -i nvidia *
clusters/teton.html:    .../788758554/NVidia+HPC+SDK">NVidia HPC SDK</a></td>
clusters/beartooth.html:    .../788758554/NVidia+HPC+SDK">NVidia HPC SDK</a></td>
software.csv:compiler,NVidia HPC SDK,nvhpc,"beartooth,teton"...
software.csv:libraries and toolkits,cuDNN,cudnn,beartooth,The NVIDIA CUDA Deep...
software.csv:libraries and toolkits,TensorRT,,beartooth,"NVIDIA TensorRT, an...
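The difference between searching with and without -r can be checked on a miniature copy of the folder. This sketch assumes a hypothetical tree with two matching cluster pages, one non-matching page and a one-line CSV. It also uses grep's -l option, which prints each matching file name once instead of every matching line; that is handy when you only want to know which files mention a term.

```shell
# Hypothetical miniature of the workshop folder.
tmpdir=$(mktemp -d)
mkdir -p "$tmpdir/clusters"
echo '<td>NVidia HPC SDK</td>' > "$tmpdir/clusters/beartooth.html"
echo '<td>NVidia HPC SDK</td>' > "$tmpdir/clusters/teton.html"
echo '<td>Some other package</td>' > "$tmpdir/clusters/wildiris.html"
echo 'compiler,NVidia HPC SDK,nvhpc,"beartooth,teton"...' > "$tmpdir/software.csv"

# Without -r: clusters/ is not searched. The "Is a directory" message is hidden
# and "|| true" ignores grep's non-zero exit status for that folder.
top_level=$(cd "$tmpdir" && grep -i nvidia * 2>/dev/null || true)

# With -r: grep descends into clusters/ as well.
recursive=$(cd "$tmpdir" && grep -r -i nvidia .)

# -l lists each matching file once (sorted, since grep's traversal order is not fixed).
files=$(cd "$tmpdir" && grep -r -i -l nvidia . | LC_ALL=C sort)
printf '%s\n' "$files"

rm -rf "$tmpdir"
```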

01.08 Exercises

# The software.csv file takes the form:
[intro_to_linux]$ head software.csv
Type,Name,Module,Cluster,Description
application,Alphafold,alphafold,"beartooth,teton",AlphaFold...
application,Astral,astral,wildiris,ASTRAL is a tool...
application,Augustus,augustus,beartooth,AUGUSTUS is a program...
application,Avizo,avizo,loren-pre202308,Avizo is a general-purpose...
application,ANGSD,angsd,"beartooth,teton",ANGSD: is a software...
application,ANSYS,ansys,teton,"ANSYS is a general-purpose software...

Questions:

  1. Which named applications are related to the word “bayes”?

  2. Which files contain references to “IPA”?


01.09 Answers

[intro_to_linux]$ grep -i bayes software.csv
application,Bayescan,bayescan,beartooth,"BayeScan aims...
application,Beast1,beast1,wildiris,BEAST is a cross-platform program for Bayesian...
application,Beast2,beast2,beartooth,"BEAST 2 is a cross-platform program for Bayesian...
application,Freebayes,freebayes,beartooth,"freebayes is a Bayesian genetic...
application,Jags,jags,"beartooth,teton",Just Another Gibbs Sampler. It is a program for analysis of Bayesian hierarchical...
application,RevBayes,revbayes,wildiris,Bayesian phylogenetic...
application,ROHan,rohan,teton,"ROHan is a Bayesian framework...
application,SourceTracker2,sourcetracker2,"beartooth,teton","SourceTracker, a Bayesian approach...
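Question 1 asks only for the application names, so the matching lines could be trimmed to the Name column. This sketch pipes grep into cut, a standard coreutils command not covered in this workshop, on a hypothetical sample of rows. cut -d, -f2 splits on every comma and keeps field 2; it does not understand CSV quoting, so it works here only because Name comes before any quoted field that contains commas.

```shell
# Hypothetical sample rows in the software.csv layout.
tmpdir=$(mktemp -d)
csv="$tmpdir/software.csv"
printf '%s\n' \
  'Type,Name,Module,Cluster,Description' \
  'application,Bayescan,bayescan,beartooth,"BayeScan aims...' \
  'application,Beast1,beast1,wildiris,BEAST is a cross-platform program for Bayesian...' \
  'application,IPA,ipa,beartooth,Improved Phased Assembler (IPA) is...' \
  'application,RevBayes,revbayes,wildiris,Bayesian phylogenetic...' \
  > "$csv"

# -d, : use a comma as the field delimiter; -f2 : keep only the second field (Name).
names=$(grep -i bayes "$csv" | cut -d, -f2)
printf '%s\n' "$names"

rm -rf "$tmpdir"
```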


[intro_to_linux]$ grep -r IPA *
clusters/beartooth.html:    .../pages/1893597185/IPA">IPA</a></td>
software.csv:application,IPA,ipa,beartooth,Improved Phased Assembler (IPA) is...

02 Search for a File


02.01a Searching for Files: find

Let us look at a folder with many subfolders and files.

[]$ cd ~/intro_to_linux
[intro_to_linux]$ ls
clusters  data  software.csv

[intro_to_linux]$ ls -R
.:
clusters  data  Intro_to_linux.pdf  software.csv  vegatables.txt  workshop_all.txt  workshop_me.txt

./clusters:
beartooth.html  loren.html  teton.html  wildiris.html

./data:
2021  2022  2023  dd.tx

./data/2021:
Apr  Nov  Sep

./data/2021/Apr:
20210403.txt  20210427.txt  20210428.txt

02.01b Searching for Files: find

./data/2021/Nov:
20211114.txt  20211115.txt  20211116.txt  hello.txt

./data/2021/Sep:
20210908.txt  20210921.txt

./data/2022:
Dec  Feb  Hello.csv  Jul  Jun

./data/2022/Dec:
20221207.txt  20221220.txt  20221230.txt  20221231.txt

./data/2022/Feb:
20220203.txt  20220223.txt

./data/2022/Jul:
20220720.txt  20220722.txt  20220723.TX
...

02.02 Searching for Files: find

Command

Description

find

Usage: find [-H] [-L] [-P] [-Olevel] [-D help|tree|search|stat|rates|opt|exec] [path...] [expression]

default path is the current directory; default expression is -print
expression may consist of: operators, options, tests, and actions:
...
EXPRESSION
       The  part  of the command line after the list of starting points is the expression.  This is a
       kind of query specification describing how we match files and what we do
...
  TESTS
    ...
    -name pattern
      Base of file name (the path with the leading directories removed) matches shell pattern pattern.
    ...
    -iname pattern
       Like -name, but the match is case insensitive.
    ...

02.03 Examples

[]$ cd ~/intro_to_linux/
[intro_to_linux]$ find . -name 20230121.txt
./data/2023/Jan/20230121.txt

# Check that this file is within the returned location.
[intro_to_linux]$ ls data/2023/Jan/
20230102.txt  20230108.txt  20230115.txt  20230121.txt

# Nothing returned – no file exists called “20230120.txt”
[intro_to_linux]$ find . -name 20230120.txt
[intro_to_linux]$

[intro_to_linux]$ find . -name README.txt
./data/2021/README.txt

# find is case-sensitive: use the -iname option to ignore case.
[intro_to_linux]$ find . -iname README.txt
./data/2021/README.txt
./data/2022/readme.txt
./data/2023/ReadMe.txt
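The -name versus -iname behaviour can be reproduced on a scratch tree. This sketch assumes hypothetical empty files named like the three README variants above. Note that, unlike grep, find still exits with status 0 when nothing matches, so an empty result is the signal to check.

```shell
# Hypothetical copies of the three README variants from the example above.
tmpdir=$(mktemp -d)
mkdir -p "$tmpdir/data/2021" "$tmpdir/data/2022" "$tmpdir/data/2023"
touch "$tmpdir/data/2021/README.txt" "$tmpdir/data/2022/readme.txt" "$tmpdir/data/2023/ReadMe.txt"

# -name: exact, case-sensitive match on the file name (leading folders ignored).
exact=$(cd "$tmpdir" && find . -name README.txt)

# -iname: same test, ignoring case (sorted, because find's output order is not guaranteed).
any_case=$(cd "$tmpdir" && find . -iname README.txt | LC_ALL=C sort)

# No match: find prints nothing.
missing=$(cd "$tmpdir" && find . -name 20230120.txt)

printf '%s\n' "$any_case"
rm -rf "$tmpdir"
```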

02.04 Examples

# Use wildcards to find all files with the .csv extension:
[intro_to_linux]$ find . -name "*.csv"
./software.csv
./data/2022/Hello.csv

# Find any files/folders whose names contain the string “dec”.
[intro_to_linux]$ find . -name "*dec*"
./data/2022/Dec/2022_dec_01.txt

[intro_to_linux]$ find . -iname "*dec*"
./data/2022/Dec
./data/2022/Dec/2022_dec_01.txt

# Find only folders.
[intro_to_linux]$ find . -type d -iname "*dec*"
./data/2022/Dec

# Find only files.
[intro_to_linux]$ find . -type f -iname "*dec*"
./data/2022/Dec/2022_dec_01.txt
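The quotes around wildcard patterns matter. Without them the shell expands the pattern against the current folder before find runs, so find may receive a literal file name (or, with two or more matches, report an error about paths and expression). This sketch assumes a hypothetical scratch tree with one .csv file at the top and one further down, plus a Dec folder for the -type tests.

```shell
# Hypothetical miniature tree.
tmpdir=$(mktemp -d)
mkdir -p "$tmpdir/data/2022/Dec"
touch "$tmpdir/software.csv" "$tmpdir/data/2022/Hello.csv" "$tmpdir/data/2022/Dec/2022_dec_01.txt"

# Quoted: find receives the pattern *.csv and applies it at every level.
quoted=$(cd "$tmpdir" && find . -name "*.csv" | LC_ALL=C sort)

# Unquoted: the shell expands *.csv in the current folder first, so this
# actually runs "find . -name software.csv" and misses Hello.csv.
unquoted=$(cd "$tmpdir" && find . -name *.csv)

# -type d keeps only folders, -type f only regular files.
dirs=$(cd "$tmpdir" && find . -type d -iname "*dec*")
files=$(cd "$tmpdir" && find . -type f -iname "*dec*")

printf '%s\n' "$quoted"
rm -rf "$tmpdir"
```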

02.05 Exercises

Questions:

  1. What do we notice about some of the find command options?

  2. Find any files that contain the string “hello”, regardless of case, within their filename.

  3. Find any folders or files whose names contain the string “feb”, regardless of case.

    1. Can you list only the folders?

  4. Find any files whose names end in “tx” (lowercase only).


02.06 Answers

1: What do we notice about some of the find command options?

  • Some single-dash options are whole words (-name, -iname, -type) rather than single letters, unlike commands such as ls, where -la combines -l and -a.

2: Find any files that contain the string “hello”, regardless of case, within their filename.

# Nothing returned: -name must match the whole file name.
[intro_to_linux]$ find . -name "hello"

[intro_to_linux]$ find . -name "hello.*"
./data/2021/Nov/hello.txt

[intro_to_linux]$ find . -iname "hello.*"
./data/2021/Nov/hello.txt
./data/2022/Hello.csv
./data/2023/Mar/HELLO.txt

02.07 Answers

3: Find any folders or files that contain the string “feb” regardless of case.

  • Can you list only the folders?

[intro_to_linux]$ find . -name feb
./data/2021/feb

[intro_to_linux]$ find . -iname feb
./data/2021/feb
./data/2023/Feb

[intro_to_linux]$ find . -iname "*feb*"
./data/2021/feb
./data/2021/feb/february_01_2021.tx
./data/2022/February
./data/2023/Feb

[intro_to_linux]$ find . -type d -iname "*feb*"
./data/2021/feb
./data/2022/February
./data/2023/Feb

02.08a Answers

4: Find any files whose names end in “tx” (lowercase only).

[intro_to_linux]$ find . -name "tx"

[intro_to_linux]$ find . -name "*tx*"
./data/2021/README.txt
./data/2021/Nov/20211115.txt
./data/2021/Nov/hello.txt
./data/2021/Nov/20211114.txt
…

[intro_to_linux]$ find . -name "*tx"
./data/dd.tx
./data/2021/feb/february_01_2021.tx
./data/2023/Jan/texttx

[intro_to_linux]$ find . -name "*.tx"
./data/dd.tx
./data/2021/feb/february_01_2021.tx

02.08b Answers

# dd.tx is actually a folder.
# Notice the 'd' at the start of its long-format listing.
[intro_to_linux]$ ls -l data
total 4
drwxrwxr-x 6 arcc-t05 arcc-t05 2021
drwxrwxr-x 6 arcc-t05 arcc-t05 2022
drwxrwxr-x 5 arcc-t05 arcc-t05 2023
drwxrwxr-x 2 arcc-t05 arcc-t05 dd.tx


[intro_to_linux]$ find . -type f -name "*.tx"
./data/2021/feb/february_01_2021.tx

# We explicitly want lowercase, so -iname is the wrong choice: it also matches .TX.
[intro_to_linux]$ find . -type f -iname "*.tx"
./data/2021/feb/february_01_2021.tx
./data/2022/Jul/20220723.TX

03 Output Redirection and Pipes


  • Redirection of output: > vs >>

    • > sends a command's standard output to a file, replacing any existing content; >> appends to the end of the file instead.

    • You can also feed a file to a command's standard input with < (and supply inline input with a << here-document); these are not covered here.

  • Using a pipe “|”

    • A pipe passes the standard output of one command as the standard input to another command.

  • Examples of the form:

    • View a text file and pipe it to grep.

    • cat a list and sort its lines.

    • Sort and then find unique items.

    • List folder contents and look for a specific name.


03.01 Redirection of output: > vs >>

# Writes the output to the terminal (standard output).
[intro_to_linux]$ grep -i bayes software.csv

# Redirects the output to a file called apps.txt
[intro_to_linux]$ grep -i bayes software.csv > apps.txt

[intro_to_linux]$ ls 
apps.txt  clusters  data  software.csv
[intro_to_linux]$ cat apps.txt

# Overwrites any existing file called apps.txt
[intro_to_linux]$ grep -i IPA software.csv > apps.txt
[intro_to_linux]$ cat apps.txt

[intro_to_linux]$ rm apps.txt

# Creates a new apps.txt (> would overwrite an existing one).
[intro_to_linux]$ grep -i bayes software.csv > apps.txt
# Appends to the existing file.
[intro_to_linux]$ grep -i IPA software.csv >> apps.txt
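The overwrite-versus-append difference can be checked by counting lines. In this sketch printf stands in for the grep commands above, so the lines written are hypothetical placeholders rather than real grep output, and apps.txt lives in a scratch folder.

```shell
tmpdir=$(mktemp -d)
apps="$tmpdir/apps.txt"

# Stand-ins for the grep results (hypothetical lines).
printf '%s\n' Bayescan RevBayes > "$apps"   # > creates apps.txt
printf '%s\n' IPA > "$apps"                 # > again: previous content is replaced
after_overwrite=$(wc -l < "$apps")          # 1 line left

printf '%s\n' Bayescan RevBayes > "$apps"   # start again with two lines
printf '%s\n' IPA >> "$apps"                # >> adds to the end instead
after_append=$(wc -l < "$apps")             # 3 lines
last_line=$(tail -n 1 "$apps")

cat "$apps"
rm -rf "$tmpdir"
```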

03.02 Example: Using pipe “|” from a file.

[intro_to_linux]$ cat fruits.txt
Gooseberry
Apple
Apricot
Avocado
Strawberry
...

[intro_to_linux]$ cat fruits.txt | wc -l
97
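cat is not strictly needed to count lines: wc can open the file itself, or the shell can feed the file to wc's standard input with the < redirection mentioned earlier. This sketch compares the three forms on a hypothetical five-line stand-in for fruits.txt; the version that names the file also prints the file name after the count.

```shell
tmpdir=$(mktemp -d)
fruits="$tmpdir/fruits.txt"
printf '%s\n' Gooseberry Apple Apricot Avocado Strawberry > "$fruits"   # small stand-in list

via_pipe=$(cat "$fruits" | wc -l)   # as in the example above
via_redirect=$(wc -l < "$fruits")   # the shell connects the file to wc's standard input
via_argument=$(wc -l "$fruits")     # wc opens the file itself and prints "COUNT NAME"

echo "$via_pipe / $via_redirect / $via_argument"
rm -rf "$tmpdir"
```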

03.03 Example continued:

# The order of items is the same as listed within the fruits.txt file.
[intro_to_linux]$ cat fruits.txt | grep berry 
Gooseberry
Strawberry
Bilberry
Blackberry
Marionberry
Blueberry
Boysenberry
Gooseberry
Cloudberry
Elderberry
Goji berry
Honeyberry
Juniper berry
Cranberry
Cranberry
Marionberry
Gooseberry
Mulberry
Salmonberry
Huckleberry
Raspberry
Salal berry

03.04 Example continued:

# Notice the duplicates.
[intro_to_linux]$ cat fruits.txt | grep berry | sort
Bilberry
Blackberry
Blueberry
Boysenberry
Cloudberry
Cranberry
Cranberry
Elderberry
Goji berry
Gooseberry
Gooseberry
Gooseberry
Honeyberry
Huckleberry
Juniper berry
Marionberry
Marionberry
Mulberry
Raspberry
Salal berry
Salmonberry
Strawberry

03.05 Example continued:

# Duplicates have been removed leaving only the unique names.
[intro_to_linux]$ cat fruits.txt | grep berry | sort | uniq
Bilberry
Blackberry
Blueberry
Boysenberry
Cloudberry
Cranberry
Elderberry
Goji berry
Gooseberry
Honeyberry
Huckleberry
Juniper berry
Marionberry
Mulberry
Raspberry
Salal berry
Salmonberry
Strawberry

[intro_to_linux]$ cat fruits.txt | grep berry | sort | uniq | wc -l
18
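The pipeline sorts before calling uniq because uniq only collapses duplicate lines that are next to each other. This sketch shows that on a hypothetical five-item list, and also shows sort -u, which sorts and removes duplicates in one command, and uniq -c, which prefixes each unique line with its number of occurrences.

```shell
# One adjacent duplicate (Cranberry) and one non-adjacent duplicate (Gooseberry).
list=$(printf '%s\n' Gooseberry Strawberry Gooseberry Cranberry Cranberry)

# uniq alone: only the adjacent Cranberry lines merge, so 4 lines remain.
uniq_only=$(printf '%s\n' "$list" | uniq | wc -l)

# Sorting first makes all duplicates adjacent: 3 lines remain.
sort_uniq=$(printf '%s\n' "$list" | sort | uniq | wc -l)

# sort -u does both steps at once: also 3 lines.
sort_u=$(printf '%s\n' "$list" | sort -u | wc -l)

# uniq -c: each unique line with its count.
printf '%s\n' "$list" | sort | uniq -c
```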

03.06 Example continued 

[intro_to_linux]$ cat fruits.txt | grep berry | sort | uniq > berries.txt
[intro_to_linux]$ cat berries.txt
Bilberry
Blackberry
Blueberry
Boysenberry
Cloudberry
Cranberry
Elderberry
Goji berry
Gooseberry
Honeyberry
Huckleberry
Juniper berry
Marionberry
Mulberry
Raspberry
Salal berry
Salmonberry
Strawberry

[intro_to_linux]$ cat berries.txt | wc -l
18

03.07 Example: Pipe from ls command

[intro_to_linux]$ ls -R

[intro_to_linux]$ ls -R | grep "Feb"
February
./data/2022/February:
Feb
./data/2023/Feb:

[intro_to_linux]$ ls -R | grep -i "Feb"
feb
./data/2021/feb:
february_01_2021.tx
February
./data/2022/February:
Feb
./data/2023/Feb:

03.08 Exercises

  1. How does the wc command work? What are its options?

  2. How does the sort command work? What are its options?

  3. How does the uniq command work? What are its options?

  4. How many unique varieties of beans are there in the vegatables.txt file?


03.09 Answers

4: How many unique varieties of beans are there in the vegatables.txt file?

  • How do you deal with “soy beans” vs “Soy Beans”?

  • What options does the uniq command provide?

[intro_to_linux]$ cat vegatables.txt | grep -i beans | sort | uniq -i | wc -l
12
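Because uniq -i, like plain uniq, only compares neighbouring lines, “soy beans” and “Soy Beans” are merged only if sort places them next to each other, and that depends on the locale's collation rules. This sketch uses three hypothetical entries and forces the C locale, where uppercase sorts before lowercase, to show the effect; sort -f folds case while sorting, which keeps case variants adjacent regardless of locale.

```shell
# Three hypothetical entries; two differ only in case.
beans=$(printf '%s\n' 'soy beans' 'lima beans' 'Soy Beans')

# C locale order: "Soy Beans", "lima beans", "soy beans" -> soy lines not adjacent.
plain=$(printf '%s\n' "$beans" | LC_ALL=C sort | uniq -i | wc -l)

# sort -f compares as if lowercase were uppercase, so case variants end up adjacent.
folded=$(printf '%s\n' "$beans" | LC_ALL=C sort -f | uniq -i | wc -l)

echo "sort | uniq -i: $plain    sort -f | uniq -i: $folded"
```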

04 More Intermediate Features, Next Steps, Suggestions and Summary


04.01a More Intermediate Features

  • Environment variables define the behavior of your shell environment. Try:

    • echo $HOME

    • echo $USER

    • echo $SHELL

    • echo $PATH

  • File searching/manipulation

    • sed: stream editor for filtering and transforming text

    • gawk: pattern scanning and processing language

  • Updating file permissions and ownership: chmod/chown

    • Use case: sharing files/folders.
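A short sketch of the environment-variable item above: it prints some of the suggested variables and splits PATH, a colon-separated list of folders the shell searches for commands, into one folder per line with tr. WORKSHOP_DIR is a hypothetical variable name, used only to show that export makes a variable visible to child processes.

```shell
# Some of the variables suggested above (USER and SHELL may be unset in some shells).
printf 'HOME=%s\nUSER=%s\nSHELL=%s\n' "$HOME" "${USER:-unset}" "${SHELL:-unset}"

# PATH is a colon-separated list of folders; print one folder per line.
path_dirs=$(printf '%s\n' "$PATH" | tr ':' '\n')
printf '%s\n' "$path_dirs"

# A variable of your own (hypothetical name). export passes it to child processes.
export WORKSHOP_DIR="$HOME/intro_to_linux"
child_sees=$(sh -c 'printf "%s" "$WORKSHOP_DIR"')
echo "A child shell sees: $child_sees"
```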

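As a first taste of sed and awk, this sketch works on a hypothetical two-row stand-in for software.csv. It uses awk, which on many Linux systems is provided by gawk. Splitting on commas with -F, ignores CSV quoting, so the sample avoids rows whose Cluster field is a quoted list such as "beartooth,teton".

```shell
tmpdir=$(mktemp -d)
csv="$tmpdir/software.csv"
printf '%s\n' \
  'Type,Name,Module,Cluster,Description' \
  'application,Astral,astral,wildiris,ASTRAL is a tool...' \
  'application,Augustus,augustus,beartooth,AUGUSTUS is a program...' \
  > "$csv"

# sed: replace every "beartooth" with "BEARTOOTH" in the output; the file itself is unchanged.
renamed=$(sed 's/beartooth/BEARTOOTH/g' "$csv")

# awk: split each line on commas, skip the header (NR > 1) and print field 2 (Name).
names=$(awk -F, 'NR > 1 { print $2 }' "$csv")

# awk with a condition: Name and Module of rows whose Cluster field is exactly "wildiris".
wildiris=$(awk -F, '$4 == "wildiris" { print $2, $3 }' "$csv")

printf '%s\n' "$renamed" "$names" "$wildiris"
rm -rf "$tmpdir"
```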

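For the file-sharing use case, a typical first step is granting the file's group read access with chmod. This sketch uses a hypothetical results.txt in a scratch folder and GNU stat to show the resulting modes. Members of the group also need execute (x) permission on each folder along the path to reach the file, which is why the folder gets g+rx. Changing the owning group (chown :group or chgrp) is not shown, because it normally requires membership of the target group.

```shell
tmpdir=$(mktemp -d)                 # scratch folder, created with mode 700
shared="$tmpdir/results.txt"        # hypothetical file to share
touch "$shared"

chmod 600 "$shared"                 # start owner-only, like workshop_me.txt
chmod g+r "$shared"                 # add read permission for the file's group
chmod g+rx "$tmpdir"                # group can list (r) and enter (x) the folder

file_mode=$(stat -c '%A' "$shared")
dir_mode=$(stat -c '%A' "$tmpdir")
stat -c '%A %U %G %n' "$tmpdir" "$shared"

rm -rf "$tmpdir"
```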
04.01b More Intermediate Features

  • Aliases in .bashrc.

    • Create shortcuts for popular or frequently used commands.

  • Text editors: vi/vim/nano

    • vimtutor

    • touch: create an empty file (or update an existing file's timestamp)

  • Remote access with ssh.


04.02 Next Steps, Suggestions


04.03 Further Trainings: UWYO LinkedIn

  • Introduction to Linux

  • Learning Linux Command Line

  • Linux: Files and Permissions

  • Linux: Overview and Installation

  • Learning Linux Shell Scripting


04.04 Request an Account with ARCC

04.05 Summary

In this workshop we have covered:

  • How to search for a string within a file.

  • How to find a file.

  • How to redirect the output of a command into a file.

  • How to use pipes to direct the output of one command as the input into another command.
