We will be providing a quick tour covering high-level ideas for using Linux and HPC on our cluster, which should allow you to access and use our Beartooth cluster to perform the analysis associated with this workshop.

Goals:

  • Introduce ARCC and the types of services we provide, including “what is HPC?”

  • Define “what is a cluster?” and how it is made up of partitions and compute nodes.

  • How to access and start using ARCC’s Beartooth cluster using our SouthPass service.

  • How to start an interactive desktop and open a terminal in which to use Linux commands.

  • Introduce the basics of Linux, the command line, and how its file system is laid out on Beartooth.

  • Introduce Linux commands for navigation and file/folder manipulation.

  • Introduce Linux commands for searching and manipulating text files.

  • Introduce a command-line text editor and an alternative GUI-based application.

  • How to set up a Linux environment to use R (or Python) and start RStudio, by loading modules.

  • How to start an interactive session on a compute node, requesting appropriate resources for your computation.

  • How to put these elements together into a workflow that can be submitted as a job to the cluster and then monitored.

Demonstrating how to get help in CLI

  • man - short for manual. It displays the reference manual page for an application or command (press q to exit the page).

Code Block
[arcc-t10@blog2 ~]$ man pwd
NAME
       pwd - print name of current/working directory
SYNOPSIS
       pwd [OPTION]...
DESCRIPTION
       Print the full filename of the current working directory.
       -L, --logical
              use PWD from environment, even if it contains symlinks
       -P, --physical
              avoid all symlinks
       --help display this help and exit
       --version
              output version information and exit
       If no option is specified, -P is assumed.
       NOTE:  your  shell  may have its own version of pwd, which usually supersedes the version described here.  Please refer to your shell's documentation
       for details about the options it supports.
  • --help - an option that most commands accept. It prints a short summary of the command's usage and options to the terminal. (Commands built into the shell itself, such as cd, are documented by the shell's help command instead, e.g. help cd.)

Code Block
[arcc-t10@blog1 ~]$ cp --help
Usage: cp [OPTION]... [-T] SOURCE DEST
  or:  cp [OPTION]... SOURCE... DIRECTORY
  or:  cp [OPTION]... -t DIRECTORY SOURCE...
Copy SOURCE to DEST, or multiple SOURCE(s) to DIRECTORY.
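The NOTE at the end of the man page is worth acting on: when you type pwd, bash normally runs its own built-in version, not the program that man pwd describes. A short sketch (bash only) for checking which version runs, and for reading the built-in's own documentation:

```shell
#!/usr/bin/env bash
# List every "pwd" bash could run, in the order it checks them:
# the shell builtin first, then any pwd programs found on your PATH.
type -a pwd
# Print just the kind of command that will run: builtin, file, alias, function or keyword.
type -t pwd
# Shell builtins are documented by bash's own help command rather than by man.
help pwd
```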

Demonstrating file navigation in CLI

File Navigation demonstrating the use of:

  • pwd (Print Working Directory)

  • ls (List - lists the files and directories in the working directory)

  • ls flags

    • -l (long listing: shows the mode (permissions), number of links, owner, group, size and time of last modification of each file)

    • -a (lists all entries in the directory, including hidden entries whose names begin with a .)

  • cd (Change Directory)

  • cd .. (Change Directory - up one level)

Code Block
[arcc-t10@blog2 ~]$ pwd
/home/arcc-t10
[arcc-t10@blog2 ~]$ ls
Desktop  Documents  Downloads  ondemand  R
[arcc-t10@blog2 ~]$ cd /project/biocompworkshop
[arcc-t10@blog2 biocompworkshop]$ pwd
/project/biocompworkshop
[arcc-t10@blog2 biocompworkshop]$ cd arcc-t10
[arcc-t10@blog2 arcc-t10]$ ls -la
total 2.0K
drwxr-sr-x  2 arcc-t10 biocompworkshop 4.0K May 23 11:05 .
drwxrws--- 80 root     biocompworkshop 4.0K Jun  4 14:39 ..
[arcc-t10@blog2 arcc-t10]$ pwd
/project/biocompworkshop/arcc-t10
[arcc-t10@blog2 arcc-t10]$ cd ..
[arcc-t10@blog2 biocompworkshop]$ pwd
/project/biocompworkshop
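To practise these commands without touching your real folders, the sketch below builds a small throwaway directory tree with mktemp -d (the folder and file names are made up for illustration) and moves around it. It also shows why ls needs -a to reveal names that begin with a dot.

```shell
#!/usr/bin/env bash
set -euo pipefail
# A throwaway practice area: mktemp -d creates a new, empty directory under /tmp.
demo=$(mktemp -d)
mkdir -p "$demo/project/results"
touch "$demo/project/notes.txt" "$demo/project/.hidden_settings"

cd "$demo/project"
pwd                 # the full path of the current working directory
ls                  # notes.txt and results: the dot-file is hidden
ls -a               # also shows ., .. and .hidden_settings
ls -l               # long listing: permissions, links, owner, group, size, date

cd results          # relative path: a folder inside the current one
pwd
cd ..               # up one level, back to .../project
pwd
cd                  # cd with no argument returns to your home directory
pwd
```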

Demonstrating how to create and remove files and folders using CLI

Creating, moving and copying files and folders:

  • touch (creates an empty file; if the file already exists, it only updates the file's modification time)

  • mkdir (Make Directory - creates an empty directory)

  • mv (Move - moves a file or directory from one location to another; also used to rename)

  • cd .. (Change Directory - up one level; note the space between cd and ..)

  • cp (Copy - copies a file or directory from one location to another)

    • -r flag (recursive - required when copying a directory and its contents)

  • ~ (shorthand for your home directory, /home/<username>)

  • rm (Remove - removes a file; with -r, removes a directory and everything inside it. rm does not move files to a recycle bin, so check before you press Enter.)

Code Block
[arcc-t10@blog2 arcc-t10]$ touch testfile
[arcc-t10@blog2 arcc-t10]$ mkdir testdirectory
[arcc-t10@blog2 arcc-t10]$ ls
testdirectory  testfile
[arcc-t10@blog2 arcc-t10]$ mv testfile testdirectory
[arcc-t10@blog2 arcc-t10]$ cd testdirectory
[arcc-t10@blog2 testdirectory]$ ls
testfile
[arcc-t10@blog2 testdirectory]$ cd ..
[arcc-t10@blog2 arcc-t10]$ cp -r testdirectory ~
[arcc-t10@blog2 arcc-t10]$ cd ~
[arcc-t10@blog2 ~]$ ls
Desktop  Documents  Downloads  ondemand  R  testdirectory
[arcc-t10@blog2 ~]$ cd testdirectory
[arcc-t10@blog2 testdirectory]$ ls
testfile
[arcc-t10@blog2 testdirectory]$ rm testfile
[arcc-t10@blog2 testdirectory]$ ls
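The same sequence can be replayed safely in a throwaway directory. This sketch uses made-up names (backup, copy) and also shows that mv renames as well as moves, and that rm -r removes a directory together with its contents:

```shell
#!/usr/bin/env bash
set -euo pipefail
# Practise in a temporary directory instead of your real workshop folder.
work=$(mktemp -d)
cd "$work"

touch testfile                        # create an empty file
mkdir testdirectory                   # create an empty directory
mv testfile testdirectory             # move the file into the directory
mkdir backup
cp -r testdirectory backup            # -r is needed to copy a directory and its contents
mv backup/testdirectory backup/copy   # mv also renames: same folder, new name
rm backup/copy/testfile               # remove a single file
rm -r testdirectory                   # remove a directory and everything inside it

ls -R "$work"                         # recursive listing of what is left
```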

Text Editor Cheatsheets

Vi/Vim Cheatsheet: https://phoenixnap.com/kb/vim-commands-cheat-sheet

Nano Cheatsheet: https://geek-university.com/nano-text-editor/

Demonstrating vi/vim text editor

Vi/Vim is one of several text editors available on the Linux command line. Open a file with vi filename or vim filename.

  • i - start insert mode (allows you to enter text)

  • Esc key - leave insert mode

  • dd - when not in insert mode, delete the whole current line

  • :q - when not in insert mode, quit (use :q! to quit and discard unsaved changes)

  • :wq - when not in insert mode, write the contents to the file and then quit

cat - reads file(s) sequentially and prints their contents to the terminal

Code Block
[arcc-t10@blog2 arcc-t10]$ vi testfile

stuff and things
~                                                                                                                                
~                                                                                                                                
~                                                                                                                                
~                                                                                                                                
:wq            

[arcc-t10@blog2 arcc-t10]$ cat testfile
stuff and things
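Because cat reads files sequentially, it can also join files. A small sketch with two made-up files in a temporary directory; printf writes the text so no editor is needed, and > (redirection, covered in the search/parse section) saves the joined output:

```shell
#!/usr/bin/env bash
set -euo pipefail
dir=$(mktemp -d)
# Write two small, hypothetical text files without opening an editor.
printf 'stuff and things\n' > "$dir/part1.txt"
printf 'more stuff\n'       > "$dir/part2.txt"

# cat prints the files one after another, in the order given.
cat "$dir/part1.txt" "$dir/part2.txt"

# Redirecting that output with > joins them into a new file.
cat "$dir/part1.txt" "$dir/part2.txt" > "$dir/combined.txt"
cat "$dir/combined.txt"
```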

Try the vim tutor

Vim Tutor is a walkthrough for new users to get used to Vim.

Run vimtutor in the command line to begin learning interactively.

Code Block
[arcc-t10@blog2 ~]$ vimtutor
===============================================================================
=    W e l c o m e   t o   t h e   V I M   T u t o r    -    Version 1.7      =
=============================================================================== 
     Vim is a very powerful editor that has many commands, too many to 
     explain in a tutor such as this. This tutor is designed to describe 
     enough of the commands that you will be able to easily use Vim as 
     an all-purpose editor. 
     ...

04 Using Linux to Search/Parse Text Files

Goals:

  • Using the command-line, demonstrate how to search and parse text files.

  • Show how export can be used to set up environment variables, and how echo displays the values they store.

  • Linux Commands:

    • find

    • cat / head / tail / grep

    • sort / uniq

    • Pipe (|) the output of one command into the input of another, and redirect output to a file using > and >>.

Based on: Intro to Linux Command-Line: View Find and Search Files

Your Environment: Echo and Export

Code Block
# View the settings configured within your environment.
[]$ env
# View a particular environment variable
# PATH: where your environment looks for executables/commands.
[]$ echo $PATH
# Create an environment variable that points to the workshop data folder.
[]$ export WS_DATA=/project/biocompworkshop/Data_Vault
# Check it has been correctly set.
[]$ echo $WS_DATA
/project/biocompworkshop/Data_Vault

Use Our Environment Variable

Code Block
# Let's use it.
# Navigate to your home.
[]$ cd
# Navigate to the workshop data folder.
[~]$ cd $WS_DATA
[]$ pwd
/project/biocompworkshop/Data_Vault
# These variables are only available within this particular terminal/session.
# Once you close this terminal, they are gone.
# They are not available in other terminals.
# Advanced: to make them 'permanent', add the export line to your ~/.bashrc
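Why the variable disappears: export puts a variable into the environment of the current shell, and only programs started from that shell inherit it. A quick check, using a second, made-up variable WS_LOCAL that is set without export:

```shell
#!/usr/bin/env bash
# Set without export: a shell variable, visible only to this shell.
WS_LOCAL=/project/biocompworkshop/Data_Vault
# Set with export: an environment variable, copied into every child process.
export WS_DATA=/project/biocompworkshop/Data_Vault

echo "this shell: WS_LOCAL=[$WS_LOCAL] WS_DATA=[$WS_DATA]"
# 'bash -c' starts a child process; single quotes stop this shell expanding the names.
bash -c 'echo "child:      WS_LOCAL=[$WS_LOCAL] WS_DATA=[$WS_DATA]"'
```

A terminal you open separately is not started from this shell, which is why it cannot see WS_DATA.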

Search for a File

Based on: Search for a File

Code Block
[]$ cd /project/biocompworkshop/salexan5/test_data
# Find a file using its full name.
[]$ find . -name "epithelial_overrep_gene_list.tsv"
./scRNASeq_Results/epithelial_overrep_gene_list.tsv
# Remember, Linux is case sensitive
# Returned to command prompt with no output.
[]$ find . -name "Epithelial_overrep_gene_list.tsv"
[]$
# Use case-insensitive option:
[]$ find . -iname "Epithelial_overrep_gene_list.tsv"
./scRNASeq_Results/epithelial_overrep_gene_list.tsv

Use Wildcards *

Code Block
# Use Wildcards:
[]$ find . -name "epithelial*"
./scRNASeq_Results/epithelial_overrep_gene_list.tsv
./scRNASeq_Results/epithelial_de_gsea.tsv
[]$ find . -name "*.tsv"
./Grch38/Hisat2/exons.tsv
./Grch38/Hisat2/splicesites.tsv
./DE_Results/DE_sig_genes_DESeq2.tsv
./DE_Results/DE_all_genes_DESeq2.tsv
./scRNASeq_Results/epithelial_overrep_gene_list.tsv
./scRNASeq_Results/epithelial_de_gsea.tsv
./Pathway_Results/fc.go.cc.p.down.tsv
./Pathway_Results/fc.go.cc.p.up.tsv
./BatchCorrection_Results/DE_genes_uhr_vs_hbr_corrected.tsv
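The same searches can be tried on a small stand-in copy of the test_data layout (empty files created in a temporary directory; only the names match the real data). Quoting the pattern matters: without quotes, if the current folder happens to contain matching .tsv files, the shell replaces *.tsv with their names before find ever runs.

```shell
#!/usr/bin/env bash
set -euo pipefail
# Stand-in copy of part of the test_data layout (empty files, for practice only).
data=$(mktemp -d)
mkdir -p "$data/scRNASeq_Results" "$data/DE_Results"
touch "$data/scRNASeq_Results/epithelial_overrep_gene_list.tsv" \
      "$data/scRNASeq_Results/epithelial_de_gsea.tsv" \
      "$data/DE_Results/DE_sig_genes_DESeq2.tsv" \
      "$data/DE_Results/notes.txt"
cd "$data"

find . -name "Epithelial_overrep_gene_list.tsv"    # case-sensitive: no output
find . -iname "Epithelial_overrep_gene_list.tsv"   # case-insensitive: one match
# Quote the pattern, so that find (not the shell) expands the * wildcard.
find . -name "*.tsv" | sort
# -type f limits results to regular files (directories are skipped).
find . -type f -name "epithelial*"
```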

View the Contents of a File

Based on: View/Search a File

Code Block
[]$ cd /project/biocompworkshop/salexan5/test_data/scRNASeq_Results
# View the contents of a TEXT based file:
# Prints everything.
[]$ cat epithelial_overrep_gene_list.tsv
# View 'page-by-page'
# Press 'q' to exit and return to the command-line prompt.
[]$ more epithelial_overrep_gene_list.tsv

View the Start and End of a File

Code Block
# View the first 10 lines.
[]$ head epithelial_overrep_gene_list.tsv
# View the first 15 lines.
[]$ head -n 15 epithelial_overrep_gene_list.tsv
# View the last 10 lines.
[]$ tail epithelial_overrep_gene_list.tsv
# View the last 15 lines.
[]$ tail -n 15 epithelial_overrep_gene_list.tsv
# On a login node, remember you can use 'man head'
# or 'tail --help' to look up all the options for a command.
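head and tail work on lines, so they combine neatly in a pipe. A sketch on a generated stand-in file (a header line followed by the numbers 1 to 20), including tail -n +2 to skip a header row:

```shell
#!/usr/bin/env bash
set -euo pipefail
file=$(mktemp)
# Stand-in table: a header line followed by 20 data rows (1..20), 21 lines in total.
{ echo "gene"; seq 1 20; } > "$file"

head -n 3 "$file"                 # first 3 lines: gene, 1, 2
tail -n 3 "$file"                 # last 3 lines: 18, 19, 20
tail -n +2 "$file" | head -n 2    # start at line 2 (skip the header): 1, 2
head -n 10 "$file" | tail -n 5    # lines 6 to 10 of the file: 5, 6, 7, 8, 9
```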

Search the Contents of a Text File

Code Block
[]$ cd /project/biocompworkshop/salexan5/test_data/scRNASeq_Results
# Find rows containing "Zfp1"
# Remember: Linux is case-sensitive
# Searching for all lower case: zfp1
[]$ grep zfp1 epithelial_overrep_gene_list.tsv
[]$ 
# Searching with correct upper/lower case combination: Zfp1
# Returns all the lines that contain this piece of text.
[]$ grep Zfp1 epithelial_overrep_gene_list.tsv
Zfp106
Zfp146
Zfp185
Zfp1

Grep-ing with Case-Insensitive and Line Numbers

Code Block
# Grep ignoring case.
[]$ grep -i zfp1 epithelial_overrep_gene_list.tsv
Zfp106
Zfp146
Zfp185
Zfp1
# What line numbers are the elements on?
[]$ grep -n -i zfp1 epithelial_overrep_gene_list.tsv
696:Zfp106
1998:Zfp146
2041:Zfp185
2113:Zfp1
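Note that grep matches substrings: searching for Zfp1 also returned Zfp106, Zfp146 and Zfp185. A sketch on a small, made-up gene list showing how -w restricts the search to whole words and -c counts matching lines:

```shell
#!/usr/bin/env bash
set -euo pipefail
genes=$(mktemp)
# Stand-in for a one-gene-per-line list such as epithelial_overrep_gene_list.tsv.
printf '%s\n' Zfp106 Cd74 Zfp146 Zfp185 Krt8 Zfp1 > "$genes"

grep Zfp1 "$genes"             # substring match: Zfp106, Zfp146, Zfp185, Zfp1
grep -c Zfp1 "$genes"          # count matching lines instead of printing them: 4
grep -w Zfp1 "$genes"          # whole-word match only: Zfp1
grep -n -i -w zfp1 "$genes"    # ignore case and show the line number: 6:Zfp1
```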

Pipe: Count, Sort

Based on: Output Redirection and Pipes

Code Block
[]$ cd /project/biocompworkshop/salexan5/test_data/scRNASeq_Results
# Pipe: direct the output of one command to the input of another.
# Count how many lines/rows are in a file.
[]$ cat epithelial_overrep_gene_list.tsv | wc -l
2254
# Sort a file alphabetically:
[]$ sort epithelial_overrep_gene_list.tsv
...
Zswim4
Zyx
Zzz3
Zzz3
# Count lines after sorting.
[]$ sort epithelial_overrep_gene_list.tsv | wc -l
2254

Uniq

Code Block
# Find and list the unique elements within a file.
# uniq only removes duplicate lines that are next to each other, so sort first.
[]$ sort epithelial_overrep_gene_list.tsv | uniq
...
Zswim4
Zyx
Zzz3
# You can pipe multiple commands together.
# Find, list and count the unique elements within a file:
[]$ sort epithelial_overrep_gene_list.tsv | uniq | wc -l
2253
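Why sorting comes first: uniq only collapses duplicate lines that are next to each other. The sketch below uses a made-up five-line list with one repeated gene (like the repeated Zzz3 above) and also shows sort -u, uniq -d (print only duplicated lines) and uniq -c (prefix each line with its count):

```shell
#!/usr/bin/env bash
set -euo pipefail
list=$(mktemp)
# Stand-in list with one duplicated entry that is NOT on adjacent lines.
printf '%s\n' Zyx Zzz3 Cd74 Zzz3 Zswim4 > "$list"

uniq "$list" | wc -l                # 5: the two Zzz3 lines are not adjacent, so nothing is removed
sort "$list" | uniq | wc -l         # 4: after sorting, the duplicates are adjacent
sort -u "$list" | wc -l             # 4: sort -u does both steps at once
sort "$list" | uniq -d              # only the duplicated entries: Zzz3
sort "$list" | uniq -c | sort -rn | head -n 1   # the most frequent entry, with its count
```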

Redirect Output into a File

Code Block
# Redirect an output into a file.
# > overwrites a file; >> appends to a file.
[]$ sort epithelial_overrep_gene_list.tsv > sorted.tsv
# Unless you own this folder, this will fail:
-bash: sorted.tsv: Permission denied
# You do not have write permission within this folder.
[]$ cd ..
[]$ ls -al
drwxr-sr-x  2 salexan5 biocompworkshop 4096 May 31 13:50 scRNASeq_Results
# Redirect to a location where you do have write permission: your home folder.
[]$ cd scRNASeq_Results/
[]$ sort epithelial_overrep_gene_list.tsv > ~/sorted.tsv
[]$ ls ~
... sorted.tsv ...
[]$ head ~/sorted.tsv
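The difference between > and >> is easy to check with a temporary file:

```shell
#!/usr/bin/env bash
set -euo pipefail
out=$(mktemp)
echo "first run"  >  "$out"   # > creates the file, or empties an existing one first
echo "second run" >  "$out"   # overwritten: the file now holds only "second run"
echo "third run"  >> "$out"   # >> appends to the end of the file
cat "$out"                    # second run, then third run
```

One consequence: the shell empties the target of > before the command starts, so sorting a file into itself (sort file.tsv > file.tsv) leaves you with an empty file. Always write to a new file instead.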

05 Let's start using R(/Python) and RStudio

Goals:

  • Using a terminal (via an Interactive Desktop), demonstrate how to load modules to set up an environment that uses R/RStudio, and how to start the GUI.

  • Mention how the module system will be used in later workshops to load other software applications.

  • (Indicate how this relates to setting up environment variables behind the scenes.)

  • Further explain the difference between working on a login node, where you need salloc to access a compute node, and working in an interactive desktop, where you are already running on a compute node (with limited resources).

    • Confirm arguments for partition, gres/gpu and reservation.

    • Note that you can confirm a GPU device is available by running nvidia-smi -L from the command line.

  • Show how the resources in the Interactive Desktop configuration map to salloc options (including reservations and, where needed, partitions).

Based on: Intro to Accessing the Cluster and the Module System

Open a Terminal

You can access a Linux terminal from SouthPass by:

  • Opening up an Interactive Desktop (reservation is biocompworkshop) and opening a terminal.

    • This runs on a compute node; the command prompt looks like: [<username>@t402 ~]$

    • The reservation is only available for this workshop: StartTime=06.09-09:00:00 EndTime=06.17-17:00:00 Duration=8-08:00:00

    • Only select what you require:

      • How many hours? Your session will NOT run any longer than the number of hours you requested.

      • Some Desktop Configurations will NOT work with some GPU Types.

      • Do you actually need a GPU?

        • Unless your software/library/package has been developed to use a GPU, simply selecting one will NOT make any difference: it won’t make your code magically run faster.

  • Selecting Beartooth Shell Access, which opens a new browser tab.

    • This runs on a login node; the prompt looks like: [<username>@blog1 ~]$ or [<username>@blog2 ~]$

To run any GUI application, you must use SouthPass and an Interactive Desktop.

Setting Up a Session Environment

Across the week, you’ll be using a number of different environments.

  • Running specific software applications.

  • Programming with R and using various R libraries.

  • Programming with Python and using various Python packages.

  • Environments built with Miniconda, a package/environment manager.

Since the cluster has to cater for everyone, we cannot provide a single desktop environment that provides everything.

Instead, we provide modules that you load to configure your environment for your particular needs within a session.

Loading a module configures various environment variables within that session.
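A hypothetical sketch of what that means. The two functions below are not how the module system is implemented; they only mimic the kind of change that loading and unloading an R module makes, using the install prefix shown by which R later on this page. On the cluster itself, module show followed by a module name lists the real changes a module makes.

```shell
#!/usr/bin/env bash
set -euo pipefail
# Hypothetical stand-ins for "module load" / "module unload" of an R module.
# The prefix is copied from the 'which R' output on this page; a real module
# typically sets more variables than the two shown here.
prefix=/apps/u/spack/gcc/12.2.0/r/4.4.0-7i7afpk

demo_module_load_r() {
  export PATH="$prefix/bin:$PATH"   # R's bin folder is now searched first
  export R_HOME="$prefix/rlib/R"    # a variable R itself reads
}

demo_module_unload_r() {
  # Rebuild PATH without the entry that was added, then remove R_HOME.
  PATH=$(printf '%s\n' "$PATH" | tr ':' '\n' | grep -vxF "$prefix/bin" | paste -sd: -)
  export PATH
  unset R_HOME
}

echo "before load:   R_HOME='${R_HOME:-}'"
demo_module_load_r
echo "after load:    first PATH entry is ${PATH%%:*}"
echo "after load:    R_HOME=$R_HOME"
demo_module_unload_r
echo "after unload:  R_HOME='${R_HOME:-}'"
```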

What is Available?

We have environments available based on compilers, Singularity containers, Conda and Linux binaries.

Code Block
[]$ module avail
[]$ gcc --version
gcc (GCC) 8.5.0 20210514 (Red Hat 8.5.0-20)
[]$ which gcc
/usr/bin/gcc
[]$ echo $PATH
/home/salexan5/bin:/apps/s/projects/core_hour_usage/bin:/apps/s/arcc/1.0/bin:/apps/s/slurm/latest/bin:
/apps/s/turbovnc/turbovnc-2.2.6/bin:/home/salexan5/bin:/usr/local/bin:/usr/bin:/usr/local/sbin:
/usr/sbin:/home/salexan5/.local/bin:/home/salexan5/bin:/home/salexan5/.local/bin:/home/salexan5/bin
[]$ module spider rstudio
----------------------------------------------------------------------------
  rstudio: rstudio/2023.9.0

Is Python and/or R available?

Code Block
# An old version of Python is available on the system.
# The system is updated over time! Do NOT rely on it for your environment's versions/reproducibility.
[]$ which python
/usr/bin/python
[]$ python --version
Python 3.8.17
# R is NOT available.
[]$ which R
/usr/bin/which: no R in (/home/salexan5/bin:/apps/s/projects/core_hour_usage/bin:
/apps/s/arcc/1.0/bin:/apps/s/slurm/latest/bin:/apps/s/turbovnc/turbovnc-2.2.6/bin:
/home/salexan5/bin:/usr/local/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/home/salexan5/.local/bin:
/home/salexan5/bin:/home/salexan5/.local/bin:/home/salexan5/bin)
# Nothing returned.
[]$ echo $R_HOME
[]$ 

Load a Compiler

Code Block
# Load a compiler, then see what becomes available.
[]$ module load gcc/12.2.0
[]$ module avail
# Notice there are a lot more applications available under this loaded compiler.
[]$ gcc --version
gcc (Spack GCC) 12.2.0
[]$ which gcc
/apps/u/spack/gcc/8.5.0/gcc/12.2.0-orvuxnl/bin/gcc
# Notice that the environment variables have been extended.
[]$ echo $PATH
/apps/u/spack/gcc/8.5.0/gcc/12.2.0-orvuxnl/bin:/apps/u/spack/gcc/12.2.0/zstd/1.5.2-5gdwnny/bin:
/home/salexan5/bin:/apps/s/projects/core_hour_usage/bin:/apps/s/arcc/1.0/bin:
/apps/s/slurm/latest/bin:/apps/s/turbovnc/turbovnc-2.2.6/bin:/home/salexan5/bin:
/usr/local/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/home/salexan5/.local/bin:
/home/salexan5/bin:/home/salexan5/.local/bin:/home/salexan5/bin
# Notice that R, and newer versions of Python, are now available as modules under gcc/12.2.0.

Load a Newer Version of Python

Code Block
[]$ module load python/3.10.6
[]$ which python
/apps/u/spack/gcc/12.2.0/python/3.10.6-7ginwsd/bin/python
[]$ python --version
Python 3.10.6

Loading R: The Typical Approach

Code Block
[]$ module load r/4.4.0
# Notice the environment variable has now been set.
[]$ echo $R_HOME
/apps/u/spack/gcc/12.2.0/r/4.4.0-7i7afpk/rlib/R
[]$ which R
/apps/u/spack/gcc/12.2.0/r/4.4.0-7i7afpk/bin/R
[]$ R --version
R version 4.4.0 (2024-04-24) -- "Puppy Cup"
Note

You then install packages yourself using install.packages(), and manage them yourself.

Note

The same applies to Python: use pip install to install whichever Python packages you require.

Load R/RStudio for this Workshop

You can use module purge to reset your environment, or start a new terminal.

Code Block
[]$ module use /project/biocompworkshop/software/modules/
[]$ module avail
-------------------------------------------------- /project/biocompworkshop/software/modules --------------------------------------------------
   bam-readcount/0.8.0    fastp/0.23.4    r/4.4.0-biocomp    regtools/1.0.0    rseqc_hawsh/1.0.0    subread/2.0.6    tophat/2.1.1
[]$ module load r/4.4.0-biocomp
[]$ module load rstudio/2023.9.0
[]$ rstudio

Configure your R Environment for this Workshop

Code Block
# Within the R Terminal:
> library(Seurat)
Error in library(Seurat) : there is no package called 'Seurat'
> .libPaths(c('/project/biocompworkshop/software/r/libraries/4.4.0', '/apps/u/spack/gcc/12.2.0/r/4.4.0-7i7afpk/rlib/R/library'))
# Notice how the list of System Library packages listed in RStudio has changed.
> library(Seurat)
Loading required package: SeuratObject
Loading required package: sp
Attaching package: 'SeuratObject'
The following objects are masked from 'package:base':
    intersect, t
Note

To use the pre-installed libraries within an R script, add the same .libPaths(...) line to the start of the script.

Request Interactive Session (Compute Node) from a Login Node

Code Block
# Short form:
# Notice we can request more memory.
[@blog1 ~]$ salloc -A biocompworkshop -t 4:00:00 --mem=4G -c 1 --reservation=biocompworkshop 
# Long form
# MUST define account/A and time/t
[@blog1 ~]$ salloc --account=biocompworkshop --time=4:00:00 --mem=4G --cpus-per-task=1 --reservation=biocompworkshop
salloc: Granted job allocation 16053847
salloc: Nodes t402 are ready for job
# Notice how the node name has changed in the command prompt.
[@t402 ~]$
[@t402 ~]$ exit
exit
salloc: Relinquishing job allocation 16053847
# Back on the login node. salloc has many more options:
[@blog1 ~]$ salloc --help

Request Interactive Session (Compute Node) with a GPU

Code Block
[@blog1 ~]$ salloc -A biocompworkshop -t 8:00:00 --mem=8G -c 2 -p dgx --gres=gpu:1 --reservation=biocompworkshop
salloc: Granted job allocation 16053855
salloc: Nodes mdgx01 are ready for job
[@mdgx01 ~]$
# Depending on which DGX node you are given, the prompt may instead show e.g. [@tdgx01 ~]$
# Check you have a GPU allocated:
[@mdgx01 ~]$ nvidia-smi -L
GPU 0: Tesla V100-SXM2-16GB (UUID: GPU-454b58ba-2ea3-c2d3-ca97-bfac4265c0b1)
# You get what you ask for!
# No GPU requested.
[@blog1 ~]$ salloc -A biocompworkshop -t 8:00:00 --mem=8G -c 2 --partition=dgx
salloc: Granted job allocation 16053857
salloc: Nodes mdgx01 are ready for job
[@mdgx01 ~]$ nvidia-smi -L
No devices found.

Request what you Need!

Code Block
# You're telling this command to use 4 threads (-p 4), so it needs 4 cores.
[]$ hisat2-build -p 4 ...
[@blog1]$ salloc --account=biocompworkshop --time=30:00 --reservation=biocompworkshop
# Setup the Environment
[@t402]$ hisat2-build -p 4 --ss $INDEX/splicesites.tsv --exon $INDEX/exons.tsv $REFERENCE/chr22_with_ERCC92.fa $INDEX/chr22
...
Joining reference sequences
  Time to join reference sequences: 00:00:00
  Time to read SNPs and splice sites: 00:00:00
Killed
# "Killed" most likely means the process exceeded its memory allocation. With no memory
# option, each requested core comes with the default 1G, so -c 4 also raises the memory.
[@blog1]$ salloc --account=biocompworkshop --time=30:00 --reservation=biocompworkshop -c 4
# Setup the Environment
[@t402]$ hisat2-build -p 4 --ss $INDEX/splicesites.tsv --exon $INDEX/exons.tsv $REFERENCE/chr22_with_ERCC92.fa $INDEX/chr22
...
Total time for call to driver() for forward index: 00:01:27

06 Create a basic workflow and submit jobs

  • Since RStudio is a GUI, demonstrate moving from running a script within RStudio to running it with Rscript from the command line.

  • Put the elements that make up a basic workflow (loading modules, moving into a folder, running an R file) into a script that can be submitted to Slurm using sbatch.

  • Map the salloc arguments to #SBATCH.

  • Show how to monitor a job using squeue, as well as the email-related Slurm options.

  • Show how to request the DGX nodes, and how to define gres to specifically request a GPU.

  • Provide a basic template.

Based on:

Why Submit a Job

A single computation can take minutes, hours, days, weeks or even months, so an interactive session quickly becomes impractical.

Submit a job to the Slurm queue, and Slurm manages everything for you.

Everything you did on the command line while working out your workflow is put into a script.

Workflow:

  • What resources do you require? (Interactive desktop configuration, salloc options)

  • Which modules are loaded.

  • Which folder you’re running your computation within, where the data is stored, and where you want the results.

  • The command-line calls being made.

  • The software applications being run.

Submit a Job to the Cluster

Convert salloc command-line options to an sbatch related script.

Options have defaults if not defined.

Code Block
# salloc
[@blog1 ~]$ salloc -A biocompworkshop -t 8:00:00 --mem=8G -c 2 -p dgx --gres=gpu:1 --reservation=biocompworkshop
# sbatch
# Options within your bash script.
#SBATCH --account=biocompworkshop       # Account. MUST be defined.
#SBATCH --time=8:00:00                  # Time.    MUST be defined.    
#SBATCH --mem=8G                        # Memory.
##SBATCH --mem-per-cpu=1G               # Commented out. Default is 1G if no memory values defined.
#SBATCH --cpus-per-task=2               # CPUs per Task - default is 1 if not defined.
#SBATCH --partition=dgx                 # Partition - If not defined, Slurm will select.
#SBATCH --gres=gpu:1                    # Generic Resources
#SBATCH --reservation=biocompworkshop   # Reservation
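A common pitfall when choosing --time: Slurm reads a value with a single colon as minutes:seconds, not hours:minutes, so --time=10:00 (used in the R example) means ten minutes. Slurm's documented formats are minutes, minutes:seconds, hours:minutes:seconds, days-hours, days-hours:minutes and days-hours:minutes:seconds. The sketch below is a hypothetical helper (not part of Slurm) that converts those formats to seconds so you can check what you are asking for:

```shell
#!/usr/bin/env bash
# Hypothetical helper: convert a Slurm --time value to seconds.
# Accepted formats: M, M:S, H:M:S, D-H, D-H:M, D-H:M:S
slurm_time_to_seconds() {
  local t=$1 days=0 h=0 m=0 s=0
  local -a p
  if [[ $t == *-* ]]; then
    days=${t%%-*}                       # the part before the dash is days
    IFS=: read -r -a p <<< "${t#*-}"    # the rest is H, H:M or H:M:S
    h=${p[0]}; m=${p[1]:-0}; s=${p[2]:-0}
  else
    IFS=: read -r -a p <<< "$t"
    case ${#p[@]} in
      1) m=${p[0]} ;;                         # M
      2) m=${p[0]}; s=${p[1]} ;;              # M:S
      *) h=${p[0]}; m=${p[1]}; s=${p[2]} ;;   # H:M:S
    esac
  fi
  # 10# forces base 10, so values such as 08 are not read as octal.
  echo $(( ((10#$days * 24 + 10#$h) * 60 + 10#$m) * 60 + 10#$s ))
}

for t in 10:00 30:00 4:00:00 8:00:00 8-08:00:00; do
  echo "--time=$t -> $(slurm_time_to_seconds "$t") seconds"
done
```

The last value, 8-08:00:00, is the Duration shown for the workshop reservation: 8 days and 8 hours.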

Additional sbatch Options

Code Block
#SBATCH --job-name=<job-name>
#SBATCH --nodes=<#nodes>                # Default is 1 if not defined.                
#SBATCH --ntasks-per-node=<#tasks/node> # Default is 1 if not defined.
#SBATCH --mail-type=ALL
#SBATCH --mail-user=<email-addr>
#SBATCH --output=<filename>_%A.out      # Appends the job ID to <filename>
                                        # If not defined: slurm-<job-id>.out

Example Script: What Goes into It?

The bash script can contain:

  • Linux/bash commands and script.

  • Module loads.

  • Application command-line calls.

 

Let's consider our R workflow. I have:

  • R scripts copied into my /gscratch folder.

  • R-related modules to load.

  • R scripts to run.

  • Commands to track the time the job starts and ends.

Example Script: Running R Script

Code Block
#!/bin/bash
# Comment: The first line 'shebang' is followed by the interpreter or the command that should be used to execute the script.
#SBATCH --job-name=r_job
#SBATCH --account=biocompworkshop
#SBATCH --time=10:00
#SBATCH --reservation=biocompworkshop
#SBATCH --mail-type=ALL
#SBATCH --mail-user=<email-addr>
#SBATCH --output=r_%A.out
export R_FILES=/gscratch/$USER
echo "R Workflow Example"
START=$(date +'%D %T')
echo "Start: $START"
echo "SLURM_JOB_ID: $SLURM_JOB_ID"
echo "SLURM_JOB_NAME: $SLURM_JOB_NAME"
echo "SLURM_JOB_NODELIST: $SLURM_JOB_NODELIST"
module use /project/biocompworkshop/software/modules
module load r/4.4.0-biocomp
cd "$R_FILES"
Rscript test_r_libraries.R
END=$(date +'%D %T')
echo "End: $END"
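Two properties of this script are worth knowing. To bash, every #SBATCH line is just a comment; only sbatch interprets them. And sbatch stops reading #SBATCH lines once it reaches the first non-comment, non-blank line, so all directives must come before the first command. A sketch that writes a cut-down, hypothetical copy of the script to a temporary file, lists its directives, and uses bash -n to check its syntax without running anything:

```shell
#!/usr/bin/env bash
set -euo pipefail
# A cut-down stand-in for run_r.sh, written to a temporary file.
script=$(mktemp)
cat > "$script" <<'EOF'
#!/bin/bash
#SBATCH --job-name=r_job
#SBATCH --account=biocompworkshop
#SBATCH --time=10:00
#SBATCH --reservation=biocompworkshop
#SBATCH --output=r_%A.out
echo "SLURM_JOB_ID: $SLURM_JOB_ID"
EOF

# List the directives sbatch will see (they all precede the first command).
grep '^#SBATCH' "$script"
# bash -n parses the script without executing it: a cheap check before sbatch.
bash -n "$script" && echo "Syntax OK"
```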

Submit your Job

Code Block
# From your Working Directory - the folder you are currently in.
[@blog2]$ ls
run_r.sh  test_data
# You can submit the job from the login node.
# Make a note of the job id.
[@blog2]$ sbatch run_r.sh
Submitted batch job 16054193
# ST column: status. PD means Pending, R means Running.
[@blog2]$ squeue -u salexan5
             JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
          16054193     teton    r_job salexan5  R       0:06      1 t402
# Once the job is running, the defined output file will be generated.
[@blog2]$ ls
r_16054193.out  run_r.sh  test_data

Monitor your Job

Code Block
# You can view the contents of your output file:
[@blog2]$ cat r_16054193.out
R Workflow Example
Start: 06/05/24 14:02:01
SLURM_JOB_ID: 16054193
SLURM_JOB_NAME: r_job
SLURM_JOB_NODELIST: t402
Sleeping...
[@blog1]$ squeue -u salexan5
             JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
          16054193     teton    r_job salexan5  R       0:18      1 t402
# If the job ID is no longer in the queue, the job is no longer running.
# It might have completed, or failed and exited.
[@blog1]$ squeue -u salexan5
             JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)

Monitor your Job: Continued

Code Block
# You can monitor the queue and/or the output file to check whether the job is running.
[salexan5@blog2 salexan5]$ cat r_16054193.out
R Workflow Example
Start: 06/05/24 14:02:01
SLURM_JOB_ID: 16054193
SLURM_JOB_NAME: r_job
SLURM_JOB_NODELIST: t402
Sleeping...
Loading required package: SeuratObject
Loading required package: sp
Attaching package: ‘SeuratObject’
The following objects are masked from ‘package:base’:
    intersect, t
End: 06/05/24 14:02:29
# OR...

Alternative Monitoring of Job via Email: Job Efficiency

Code Block
# Monitor your email:
Email 1:
Subject: beartooth Slurm Job_id=16054193 Name=r_job Began, Queued time 00:00:01
Email 2: Job Efficiency:
Subject: beartooth Slurm Job_id=16054193 Name=r_job Ended, Run time 00:00:28, COMPLETED, ExitCode 0
Job ID: 16054193
Cluster: beartooth
User/Group: salexan5/salexan5
State: COMPLETED (exit code 0)
Cores: 1
CPU Utilized: 00:00:07
CPU Efficiency: 25.00% of 00:00:28 core-walltime
Job Wall-clock time: 00:00:28
Memory Utilized: 0.00 MB (estimated maximum)
Memory Efficiency: 0.00% of 1000.00 MB (1000.00 MB/core)
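The CPU efficiency line is a simple ratio: the CPU time actually used divided by the core-walltime reserved (cores × job wall-clock time). Here that is 7 s / (1 core × 28 s) = 25%. A small sketch of the arithmetic (the function name cpu_efficiency is made up) shows why requesting cores that your program does not use lowers the figure:

```shell
#!/usr/bin/env bash
# CPU efficiency = CPU time used / (cores x job wall-clock time), as a percentage.
cpu_efficiency() {
  local cpu_used_s=$1 cores=$2 wall_s=$3
  # Work in hundredths of a percent to keep two decimals with integer arithmetic.
  local hundredths=$(( cpu_used_s * 10000 / (cores * wall_s) ))
  printf '%d.%02d%%\n' $(( hundredths / 100 )) $(( hundredths % 100 ))
}

cpu_efficiency 7 1 28    # the r_job above: 7 s of CPU in 28 s on 1 core -> 25.00%
cpu_efficiency 7 4 28    # the same work with 4 cores requested -> 6.25%
```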

Example Script 2

This might look like something you will cover in later sessions:

Code Block
#!/bin/bash
#SBATCH --job-name=hisat2
#SBATCH --account=biocompworkshop       
#SBATCH --time=8:00:00                  
#SBATCH --cpus-per-task=4 
#SBATCH --mem=8G                        
#SBATCH --reservation=biocompworkshop   
#SBATCH --mail-type=ALL
#SBATCH --mail-user=<email-addr>
#SBATCH --output=hisat2_%A.out
START=$(date +'%D %T')
echo "Start:" $START
echo "SLURM_JOB_ID:" $SLURM_JOB_ID
echo "SLURM_JOB_NAME:" $SLURM_JOB_NAME
echo "SLURM_JOB_NODELIST:" $SLURM_JOB_NODELIST
module load gcc/12.2.0 hisat2/2.2.1
export REFERENCE=/project/biocompworkshop/rshukla/Grch38/fasta
export INDEX=/project/biocompworkshop/rshukla/Grch38/Hisat2
# Comment: Location of the splicesites.tsv file.
cd /gscratch/$USER
hisat2-build -p 4 --ss splicesites.tsv --exon $INDEX/exons.tsv $REFERENCE/chr22_with_ERCC92.fa $INDEX/chr22
END=$(date +'%D %T')
echo "End:" $END

Examples and Cheat Sheets

Can be copied from: /project/biocompworkshop/arcc_notes

07 Summary and Next Steps

Review the goals we’ve looked at.

Point towards the previous workshops for additional details.