We will provide a quick tour covering the high-level ideas for using Linux and HPC on our cluster, which should allow you to access and use our Beartooth Cluster to perform the analysis associated with this workshop.
Goals:
Introduce ARCC and what types of services we provide including “what is HPC?”
Define “what is a cluster”, and how is it made of partitions and compute nodes.
How to access and start using ARCC’s Beartooth cluster - using our SouthPass service.
How to start an interactive desktop and open a terminal to use Linux commands within.
Introduce the basics of Linux, the command-line, and how its File System looks on Beartooth.
Introduce Linux commands to allow navigation and file/folder manipulation.
Introduce Linux commands to allow text files to be searched and manipulated.
Introduce using a command-line text-editor and an alternative GUI based application.
How to set up a Linux environment to use R(/Python) and start RStudio, by loading modules.
How to start interactive sessions to run on a compute node, to allow computation, requesting appropriate resources.
How to put elements together to construct a workflow that can be submitted as a job to the cluster, which can then be monitored.
Demonstrating how to get help in CLI
Demonstrating vi/vim text editor
Vi/Vim is one of several text editors available on the Linux command line. (vi filename or vim filename)
i - to start insert mode (allows you to enter text)
<esc> key - to exit out of insert mode
dd - when not in insert mode, to delete a whole line
:q - outside of insert mode, to quit
:wq - outside of insert mode, to write the contents to the file, and then quit
cat - reads file(s) sequentially, displaying content to the terminal
Demonstrating file navigation in CLI
File Navigation demonstrating the use of:
pwd (Print Working Directory)
ls (“List” lists information about directories and any type of files in the working directory)
ls flags:
-l (shows the mode, number of links, owner, group, size (in bytes), and time of last modification for each file)
-a (lists all entries in the directory, including the hidden entries that begin with a .)
cd (Change Directory)
cd .. (Change Directory - up one level)
...
Code Block |
---|
[arcc-t10@blog2 ~]$ pwd
/home/arcc-t10
[arcc-t10@blog2 ~]$ ls
Desktop Documents Downloads ondemand R
[arcc-t10@blog2 ~]$ cd /project/biocompworkshop
[arcc-t10@blog2 biocompworkshop]$ pwd
/project/biocompworkshop
[arcc-t10@blog2 biocompworkshop]$ cd arcc-t10
[arcc-t10@blog2 arcc-t10]$ ls -la
total 2.0K
drwxr-sr-x 2 arcc-t10 biocompworkshop 4.0K May 23 11:05 .
drwxrws--- 80 root biocompworkshop 4.0K Jun 4 14:39 ..
[arcc-t10@blog2 arcc-t10]$ pwd
/project/biocompworkshop/arcc-t10
[arcc-t10@blog2 arcc-t10]$ cd ..
[arcc-t10@blog2 biocompworkshop]$ pwd
/project/biocompworkshop |
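The difference between absolute and relative paths in the session above can be tried out safely in a throwaway folder. This is a minimal sketch: the folder created by mktemp -d and the names BASE, project and user1 are assumptions made for the illustration, not locations on Beartooth.

```bash
#!/bin/bash
# BASE is a hypothetical practice folder created just for this example.
BASE=$(mktemp -d)
mkdir -p "$BASE/project/user1"

cd "$BASE/project/user1"        # absolute path: starts from the top of the file system
WHERE_1=$(basename "$(pwd)")    # user1

cd ..                           # relative path: up one level
WHERE_2=$(basename "$(pwd)")    # project

cd user1                        # relative path: down into a sub-folder of the current folder
WHERE_3=$(basename "$(pwd)")    # user1

echo "$WHERE_1 $WHERE_2 $WHERE_3"
```

An absolute path works from anywhere, while a relative path depends on the folder you are currently in, which is why checking with pwd first is a useful habit.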
Demonstrating how to create and remove files and folders using CLI
Creating, moving and copying files and folders:
touch (Creates an empty file, i.e. a file without content)
mkdir (Make Directory - creates an empty directory)
mv (Move - moves a file or directory from one location to another)
cd .. (Change Directory - up one level)
cp (Copy - copies a file or directory from one location to another)
-r flag (Recursive)
~ (Alias for /home/user)
rm (Remove - removes a file or, if used with -r, removes a directory and recursively removes the files within it)
...
Code Block |
---|
[arcc-t10@blog2 arcc-t10]$ touch testfile
[arcc-t10@blog2 arcc-t10]$ mkdir testdirectory
[arcc-t10@blog2 arcc-t10]$ ls
testdirectory testfile
[arcc-t10@blog2 arcc-t10]$ mv testfile testdirectory
[arcc-t10@blog2 arcc-t10]$ cd testdirectory
[arcc-t10@blog2 testdirectory]$ ls
testfile
[arcc-t10@blog2 testdirectory]$ cd ..
[arcc-t10@blog2 arcc-t10]$ cp -r testdirectory ~
[arcc-t10@blog2 arcc-t10]$ cd ~
[arcc-t10@blog2 ~]$ ls
Desktop Documents Downloads ondemand R testdirectory
[arcc-t10@blog2 ~]$ cd testdirectory
[arcc-t10@blog2 testdirectory]$ ls
testfile
[arcc-t10@blog2 testdirectory]$ rm testfile
[arcc-t10@blog2 testdirectory]$ ls
|
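To rehearse these commands without touching real project data, the same steps can be run inside a throwaway folder. This is only a sketch: the use of mktemp -d and the names SANDBOX and copy_of_testdirectory are assumptions made for the illustration.

```bash
#!/bin/bash
# SANDBOX is a hypothetical practice folder; mktemp -d creates a fresh, empty directory.
SANDBOX=$(mktemp -d)
cd "$SANDBOX"

touch testfile                              # create an empty file
mkdir testdirectory                         # create an empty directory
mv testfile testdirectory                   # move the file into the directory
cp -r testdirectory copy_of_testdirectory   # -r copies the directory and its contents

ls copy_of_testdirectory                    # lists: testfile

rm copy_of_testdirectory/testfile           # remove a single file
rm -r testdirectory                         # remove a directory and everything inside it

REMAINING=$(ls)                             # only the (now empty) copy is left
echo "$REMAINING"
```

Note that rm does not move anything to a recycle bin: once a file or folder is removed from the command line it is gone, so it is worth running ls before rm -r.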
Text Editor Cheatsheets
Vi/Vim Cheatsheet | Nano Cheatsheet |
---|---|
https://phoenixnap.com/kb/vim-commands-cheat-sheet | https://geek-university.com/nano-text-editor/ |
Try the vim tutor
Vim Tutor is a walkthrough for new users to get used to Vim. Run vimtutor in the command line to begin learning interactively.
04 Using Linux to Search/Parse Text Files
Goals:
Using the command-line, demonstrate how to search and parse text files.
Show how export can be used to set up environment variables and echo to see what values they store.
Linux Commands:
find
cat / head / tail / grep
sort / uniq
Pipe | output from one command to the input of another, and redirect to a file using > , >> .
Based on: Intro to Linux Command-Line: View Find and Search Files
...
Your Environment: Echo and Export
Code Block |
---|
# View the settings configured within your environment.
[]$ env
# View a particular environment variable
# PATH: Where your environment will look for executables/commands.
[]$ echo $PATH
# Create an environment variable that points to the workshop data folder.
[]$ export WS_DATA=/project/biocompworkshop/Data_Vault
# Check it has been correctly set.
[]$ echo $WS_DATA
/project/biocompworkshop/Data_Vault |
...
Use Our Environment Variable
Code Block |
---|
# Let's use it.
# Navigate to your home.
[]$ cd
# Navigate to the workshop data folder.
[~]$ cd $WS_DATA
[]$ pwd
/project/biocompworkshop/Data_Vault
# These are only available within this particular terminal/session.
# Once you close this terminal, they are gone.
# They are not available across other terminals.
# Advanced: To make 'permanent' you can update your ~/.bashrc |
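Why export matters can be seen by comparing an exported variable with a plain one. The sketch below is illustrative only: LOCAL_ONLY and its value are hypothetical, and bash -c is used simply as a stand-in for any program or script started from your terminal.

```bash
#!/bin/bash
# WS_DATA is exported, so programs started from this shell inherit it.
export WS_DATA=/project/biocompworkshop/Data_Vault
# LOCAL_ONLY is a hypothetical variable that is set but NOT exported.
LOCAL_ONLY=/tmp/example

# 'bash -c' starts a child process, similar to running a script.
# ${NAME:-unset} prints the word "unset" when the variable has no value.
CHILD_SEES_EXPORTED=$(bash -c 'echo "${WS_DATA:-unset}"')
CHILD_SEES_LOCAL=$(bash -c 'echo "${LOCAL_ONLY:-unset}"')

echo "exported:     $CHILD_SEES_EXPORTED"
echo "not exported: $CHILD_SEES_LOCAL"
```

The exported variable is visible to the child process and the plain one is not, which is why tools that read settings from the environment need the export form.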
...
Search for a File
Based on: Search for a File
Code Block |
---|
[]$ cd /project/biocompworkshop/salexan5/test_data
# Find a file using its full name.
[]$ find . -name "epithelial_overrep_gene_list.tsv"
./scRNASeq_Results/epithelial_overrep_gene_list.tsv
# Remember, Linux is case sensitive
# Returned to command prompt with no output.
[]$ find . -name "Epithelial_overrep_gene_list.tsv"
[]$
# Use case-insensitive option:
[]$ find . -iname "Epithelial_overrep_gene_list.tsv"
./scRNASeq_Results/epithelial_overrep_gene_list.tsv |
...
Use Wildcards *
Code Block |
---|
# Use Wildcards:
[]$ find . -name "epithelial*"
./scRNASeq_Results/epithelial_overrep_gene_list.tsv
./scRNASeq_Results/epithelial_de_gsea.tsv
[]$ find . -name "*.tsv"
./Grch38/Hisat2/exons.tsv
./Grch38/Hisat2/splicesites.tsv
./DE_Results/DE_sig_genes_DESeq2.tsv
./DE_Results/DE_all_genes_DESeq2.tsv
./scRNASeq_Results/epithelial_overrep_gene_list.tsv
./scRNASeq_Results/epithelial_de_gsea.tsv
./Pathway_Results/fc.go.cc.p.down.tsv
./Pathway_Results/fc.go.cc.p.up.tsv
./BatchCorrection_Results/DE_genes_uhr_vs_hbr_corrected.tsv |
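The find options above can be practised on a small made-up folder tree. In this sketch the folder DEMO and the file names inside it are hypothetical; -type f is an additional standard find option that restricts matches to regular files.

```bash
#!/bin/bash
# DEMO is a hypothetical practice folder with a few empty files in it.
DEMO=$(mktemp -d)
mkdir -p "$DEMO/results" "$DEMO/reference"
touch "$DEMO/results/epithelial_genes.tsv" \
      "$DEMO/results/Epithelial_notes.txt" \
      "$DEMO/reference/exons.tsv"
cd "$DEMO"

# Wildcard on the extension: every .tsv file below the current folder.
TSV_COUNT=$(find . -type f -name "*.tsv" | wc -l)

# -name is case-sensitive: matches only the lower-case file.
EXACT_CASE=$(find . -type f -name "epithelial*")

# -iname ignores case: matches both files.
ANY_CASE_COUNT=$(find . -type f -iname "epithelial*" | wc -l)

echo "tsv files: $TSV_COUNT"
echo "case-sensitive match: $EXACT_CASE"
echo "case-insensitive matches: $ANY_CASE_COUNT"
```

Quoting the pattern, as in "*.tsv", stops the shell from expanding the wildcard itself before find sees it.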
...
View the Contents of a File
Based on: View/Search a File
Code Block |
---|
[]$ cd /project/biocompworkshop/salexan5/test_data/scRNASeq_Results
# View the contents of a TEXT based file:
# Prints everything.
[]$ cat epithelial_overrep_gene_list.tsv
# View 'page-by-page'
# Press 'q' to exit and return to the command-line prompt.
[]$ more epithelial_overrep_gene_list.tsv |
...
View the Start and End of a File
Code Block |
---|
# View the first 10 lines.
[]$ head epithelial_overrep_gene_list.tsv
# View the first 15 lines.
[]$ head -n 15 epithelial_overrep_gene_list.tsv
# View the last 10 lines.
[]$ tail epithelial_overrep_gene_list.tsv
# View the last 15 lines.
[]$ tail -n 15 epithelial_overrep_gene_list.tsv
# On a login node, remember you can use 'man head'
# or tail --help to look up all the options for a command. |
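head and tail can also be combined to pull out a block of lines from the middle of a file. The sketch below uses a hypothetical file of the numbers 1 to 100 (created with seq) so that the result is easy to check by eye.

```bash
#!/bin/bash
# NUMBERS is a hypothetical practice file: one number per line, 1 to 100.
NUMBERS=$(mktemp)
seq 1 100 > "$NUMBERS"

# Lines 11-15: take the first 15 lines, then keep the last 5 of those.
MIDDLE=$(head -n 15 "$NUMBERS" | tail -n 5)

# tail -n +2 prints from line 2 onwards, e.g. to skip a header row.
AFTER_HEADER=$(tail -n +2 "$NUMBERS" | head -n 1)

echo "$MIDDLE"
echo "first line after skipping one: $AFTER_HEADER"
```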
...
Search the Contents of a Text File
Code Block |
---|
[]$ cd /project/biocompworkshop/salexan5/test_data/scRNASeq_Results
# Find rows containing "Zfp1"
# Remember: Linux is case-sensitive
# Searching for all lower case: zfp1
[]$ grep zfp1 epithelial_overrep_gene_list.tsv
[]$
# Searching with correct upper/lower case combination: Zfp1
# Returns all the lines that contain this piece of text.
[]$ grep Zfp1 epithelial_overrep_gene_list.tsv
Zfp106
Zfp146
Zfp185
Zfp1 |
...
Grep-ing with Case-Insensitive and Line Numbers
Code Block |
---|
# Grep ignoring case.
[]$ grep -i zfp1 epithelial_overrep_gene_list.tsv
Zfp106
Zfp146
Zfp185
Zfp1
# What line numbers are the elements on?
[]$ grep -n -i zfp1 epithelial_overrep_gene_list.tsv
696:Zfp106
1998:Zfp146
2041:Zfp185
2113:Zfp1 |
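Because grep matches any line that contains the text, searching for Zfp1 also returns Zfp106 and the others. If only the exact gene name is wanted, the standard -w (whole word) and -x (whole line) options can help, and -c counts matching lines. The sketch below uses a small hypothetical file, GENES, rather than the workshop data.

```bash
#!/bin/bash
# GENES is a hypothetical five-line gene list used only for this example.
GENES=$(mktemp)
printf 'Zfp106\nZfp146\nZfp185\nZfp1\nActb\n' > "$GENES"

# Plain search: counts every line containing "Zfp1".
CONTAINS=$(grep -c Zfp1 "$GENES")

# -w: "Zfp1" must be a whole word, so Zfp106 etc. are excluded.
WHOLE_WORD=$(grep -w Zfp1 "$GENES")

# -x: the entire line must equal "Zfp1" (useful for one-name-per-line files).
WHOLE_LINE=$(grep -x Zfp1 "$GENES")

echo "lines containing Zfp1: $CONTAINS"
echo "whole-word match: $WHOLE_WORD"
echo "whole-line match: $WHOLE_LINE"
```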
...
Pipe: Count, Sort
Based on: Output Redirection and Pipes
Code Block |
---|
[]$ cd /project/biocompworkshop/salexan5/test_data/scRNASeq_Results
# Pipe: direct the output of one command to the input of another.
# Count how many lines/rows are in a file.
[]$ cat epithelial_overrep_gene_list.tsv | wc -l
2254
# Alphabetically sort a file:
[]$ sort epithelial_overrep_gene_list.tsv
...
Zswim4
Zyx
Zzz3
Zzz3
# Count lines after sorting.
[]$ sort epithelial_overrep_gene_list.tsv | wc -l
2254 |
...
Uniq
Code Block |
---|
# Find and list the unique elements within a file.
# You need to sort your elements first.
[]$ sort epithelial_overrep_gene_list.tsv | uniq
...
Zswim4
Zyx
Zzz3
# You can pipe multiple commands together.
# Find, list and count the unique elements within a file:
[]$ sort epithelial_overrep_gene_list.tsv | uniq | wc -l
2253 |
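The line counts above differ by one (2254 versus 2253), which suggests a single repeated entry. The standard uniq -d option lists only the repeated lines, and the sketch below also shows why sorting first matters: uniq only compares neighbouring lines. The four-line file GENES is a hypothetical stand-in for the workshop data.

```bash
#!/bin/bash
# GENES is a hypothetical list in which Zzz3 appears twice, but not on adjacent lines.
GENES=$(mktemp)
printf 'Zzz3\nZyx\nZswim4\nZzz3\n' > "$GENES"

# Without sorting, the two Zzz3 lines are not neighbours, so nothing is removed.
UNSORTED_COUNT=$(uniq "$GENES" | wc -l)

# After sorting, the duplicates sit next to each other and are collapsed.
SORTED_COUNT=$(sort "$GENES" | uniq | wc -l)

# -d prints only the lines that occur more than once.
DUPLICATES=$(sort "$GENES" | uniq -d)

echo "uniq without sort: $UNSORTED_COUNT lines"
echo "sort then uniq:    $SORTED_COUNT lines"
echo "repeated entries:  $DUPLICATES"
```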
...
Redirect Output into a File
Code Block |
---|
# Redirect an output into a file.
# > : Overwrites a file. >> : Appends to a file.
[]$ sort epithelial_overrep_gene_list.tsv > sorted.tsv
# This will fail for anyone other than the folder's owner (salexan5).
-bash: sorted.tsv: Permission denied
# You do not have write permission within this folder.
[]$ cd ..
[]$ ls -al
drwxr-sr-x 2 salexan5 biocompworkshop 4096 May 31 13:50 scRNASeq_Results
# Redirect to a location where you do have write permission - your home folder.
[]$ cd scRNASeq_Results/
[]$ sort epithelial_overrep_gene_list.tsv > ~/sorted.tsv
[]$ ls ~
... sorted.tsv ...
[]$ head ~/sorted.tsv |
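The difference between > and >> is easiest to see by writing to the same file several times. This is a minimal sketch; the file OUT is a hypothetical temporary file created just for the example.

```bash
#!/bin/bash
# OUT is a hypothetical scratch file.
OUT=$(mktemp)

echo "first run"  > "$OUT"    # > creates the file, or replaces whatever was in it
echo "second run" > "$OUT"    # the first line is overwritten
AFTER_OVERWRITE=$(wc -l < "$OUT")

echo "third run" >> "$OUT"    # >> adds to the end, keeping the existing content
AFTER_APPEND=$(wc -l < "$OUT")

cat "$OUT"
echo "lines after overwrite: $AFTER_OVERWRITE, after append: $AFTER_APPEND"
```

Since > silently replaces an existing file, it is worth double-checking the target name before redirecting results you want to keep.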
...
05 Let's start using R(/Python) and RStudio
Goals:
Using a terminal (via an Interactive Desktop), demonstrate how to load modules to set up an environment that uses R/RStudio and how to start the GUI.
Mention how the module system will be used, in later workshops, to load other software applications.
(Indicate how this relates to setting up environment variables behind the scenes.)
Further explain the differences between using a login node that requires an salloc to access a compute node, and that you're already running on a compute node (with limited resources) via an interactive desktop.
Confirm arguments for partition, gres/gpu, reservation.
Note that you can confirm a GPU device is available by running nvidia-smi -L from the command-line.
Show how the resources from the Interactive Desktop configuration start mapping to those used by salloc (including defining reservations, and maybe partitions).
Based on Intro to Accessing the Cluster and the Module System
...
Open a Terminal
You can access a Linux terminal from SouthPass by:
Opening up an Interactive Desktop (reservation is biocompworkshop) and opening a terminal.
Running on a compute node: Command prompt: [<username>@t402 ~]$
The reservation is only available for this workshop: StartTime=06.09-09:00:00 EndTime=06.17-17:00:00 Duration=8-08:00:0
Only select what you require:
How many hours? Your session will NOT run any longer than the number of hours you requested.
Some Desktop Configurations will NOT work with some GPU Types.
Do you actually need a GPU?
Unless your software/library/package has been developed to utilize a GPU, simply selecting one will NOT make any difference - this won’t make your code magically run faster.
Selecting a Beartooth Shell Access, which opens up a new browser tab.
Running on the login node: [<username>@blog1/2 ~]$
To run any GUI application, you must use SouthPass and an Interactive Desktop.
...
Setting Up a Session Environment
Across the week, you’ll be using a number of different environments.
Running specific software applications.
Programming with R and using various R libraries.
Programming with Python and using various Python packages.
Environments built with Miniconda - a package/environment manager.
Since the cluster has to cater for everyone, we cannot provide a simple desktop environment that provides everything.
Instead we provide modules that a user loads to configure their environment for their particular needs within a session.
Loading a module configures various environment variables within that Session.
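The idea that a tool becomes available once an environment variable such as PATH is extended can be imitated with plain shell commands. The sketch below is only an illustration of that idea: it does not use the module command and makes no claim about how the cluster's module system is implemented. FAKE_APP and mytool are hypothetical names.

```bash
#!/bin/bash
# FAKE_APP stands in for the install folder of some application.
FAKE_APP=$(mktemp -d)
mkdir "$FAKE_APP/bin"
printf '#!/bin/bash\necho "hello from mytool"\n' > "$FAKE_APP/bin/mytool"
chmod +x "$FAKE_APP/bin/mytool"

# Before PATH is changed, the shell cannot find the tool.
BEFORE=$(command -v mytool || echo "not found")

# Put the application's bin folder at the front of PATH.
export PATH="$FAKE_APP/bin:$PATH"

# Now the shell finds it, and it can be run by name.
AFTER=$(command -v mytool)
OUTPUT=$(mytool)

echo "before: $BEFORE"
echo "after:  $AFTER"
echo "output: $OUTPUT"
```

Because the change is made with export inside one session, it disappears when the terminal is closed, which matches the per-session behaviour described above.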
...
What is Available?
We have environments available based on compilers, Singularity containers, Conda, Linux Binaries
Code Block |
---|
[]$ module avail
[]$ gcc --version
gcc (GCC) 8.5.0 20210514 (Red Hat 8.5.0-20)
[]$ which gcc
/usr/bin/gcc
[]$ echo $PATH
/home/salexan5/bin:/apps/s/projects/core_hour_usage/bin:/apps/s/arcc/1.0/bin:/apps/s/slurm/latest/bin:
/apps/s/turbovnc/turbovnc-2.2.6/bin:/home/salexan5/bin:/usr/local/bin:/usr/bin:/usr/local/sbin:
/usr/sbin:/home/salexan5/.local/bin:/home/salexan5/bin:/home/salexan5/.local/bin:/home/salexan5/bin
[]$ module spider rstudio
----------------------------------------------------------------------------
rstudio: rstudio/2023.9.0 |
...
Is Python and/or R available?
Code Block |
---|
# An old version of Python is available on the System.
# Systems are updated! Do NOT rely on the system versions for your environment with regard to versions/reproducibility.
[]$ which python
/usr/bin/python
[]$ python --version
Python 3.8.17
# R is NOT available.
[]$ which R
/usr/bin/which: no R in (/home/salexan5/bin:/apps/s/projects/core_hour_usage/bin:
/apps/s/arcc/1.0/bin:/apps/s/slurm/latest/bin:/apps/s/turbovnc/turbovnc-2.2.6/bin:
/home/salexan5/bin:/usr/local/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/home/salexan5/.local/bin:
/home/salexan5/bin:/home/salexan5/.local/bin:/home/salexan5/bin)
# Nothing returned.
[]$ echo $R_HOME
[]$ |
...
Load a Compiler
Code Block |
---|
# What's avail for a compiler?
[]$ module load gcc/12.2.0
[]$ module avail
# Notice there are a lot more applications available under this loaded compiler.
[]$ gcc --version
gcc (Spack GCC) 12.2.0
[]$ which gcc
/apps/u/spack/gcc/8.5.0/gcc/12.2.0-orvuxnl/bin/gcc
# Notice that the environment variables have been extended.
[]$ echo $PATH
/apps/u/spack/gcc/8.5.0/gcc/12.2.0-orvuxnl/bin:/apps/u/spack/gcc/12.2.0/zstd/1.5.2-5gdwnny/bin:
/home/salexan5/bin:/apps/s/projects/core_hour_usage/bin:/apps/s/arcc/1.0/bin:
/apps/s/slurm/latest/bin:/apps/s/turbovnc/turbovnc-2.2.6/bin:/home/salexan5/bin:
/usr/local/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/home/salexan5/.local/bin:
/home/salexan5/bin:/home/salexan5/.local/bin:/home/salexan5/bin
# Notice R is now available and newer versions of Python are available under gcc/12.2.0 |
...
Load a Newer Version of Python
Code Block |
---|
[]$ module load python/3.10.6
[]$ which python
/apps/u/spack/gcc/12.2.0/python/3.10.6-7ginwsd/bin/python
[]$ python --version
Python 3.10.6 |
...
Typically Loading R
Code Block |
---|
[]$ module load r/4.4.0
# Notice the environment variable has now been set.
[]$ echo $R_HOME
/apps/u/spack/gcc/12.2.0/r/4.4.0-7i7afpk/rlib/R
[]$ which R
/apps/u/spack/gcc/12.2.0/r/4.4.0-7i7afpk/bin/R
[]$ R --version
R version 4.4.0 (2024-04-24) -- "Puppy Cup" |
Note |
---|
You then perform: |
Note |
---|
Same with Python: You perform the |
...
Load R/RStudio for this Workshop
You can use module purge to reset your environment, or start a new terminal.
Code Block |
---|
[]$ module use /project/biocompworkshop/software/modules/
[]$ module avail
-------------------------------------------------- /project/biocompworkshop/software/modules --------------------------------------------------
bam-readcount/0.8.0 fastp/0.23.4 r/4.4.0-biocomp regtools/1.0.0 rseqc_hawsh/1.0.0 subread/2.0.6 tophat/2.1.1
[]$ module load r/4.4.0-biocomp
[]$ module load rstudio/2023.9.0
[]$ rstudio |
...
Configure your R Environment for this Workshop
Code Block |
---|
# Within the R Terminal:
> library(Suerat)
Error in library(Suerat) : there is no package called 'Suerat'
> .libPaths(c('/project/biocompworkshop/software/r/libraries/4.4.0', '/apps/u/spack/gcc/12.2.0/r/4.4.0-7i7afpk/rlib/R/library'))
# Notice how the list of System Library packages listed in RStudio has changed.
> library(Seurat)
Loading required package: SeuratObject
Loading required package: sp
Attaching package: 'SeuratObject'
The following objects are masked from 'package:base':
intersect, t |
Note |
---|
To use the pre-installed libraries within an R script you will need to add the |
...
Request Interactive Session (Compute Node) from a Login Node
|
---|
...
Request Interactive Session (Compute Node) with a GPU
|
---|
...
Request what you Need!
Code Block |
---|
# You're telling this command to use 4 threads - 4 cores
[1]$ hisat2-build -p 4 ...
[@blog1]$ salloc --account=biocompworkshop --time=30:00 --reservation=biocompworkshop
# Setup the Environment
[@t402]$ hisat2-build -p 4 --ss $INDEX/splicesites.tsv --exon $INDEX/exons.tsv $REFERENCE/chr22_with_ERCC92.fa $INDEX/chr22
...
Joining reference sequences
Time to join reference sequences: 00:00:00
Time to read SNPs and splice sites: 00:00:00
Killed
[@blog1]$ salloc --account=biocompworkshop --time=30:00 --reservation=biocompworkshop -c 4
# Setup the Environment
[@t402]$ hisat2-build -p 4 --ss $INDEX/splicesites.tsv --exon $INDEX/exons.tsv $REFERENCE/chr22_with_ERCC92.fa $INDEX/chr22
...
Total time for call to driver() for forward index: 00:01:27 |
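One way to keep the thread count given to an application in step with what was requested from Slurm is to read it from the environment rather than typing it twice. Slurm sets SLURM_CPUS_PER_TASK only when -c/--cpus-per-task is specified, so the sketch below falls back to 1 otherwise. The helper name threads_for_job is hypothetical, and the hisat2-build line is only echoed, not run.

```bash
#!/bin/bash
# threads_for_job is a hypothetical helper: it prints the number of CPUs
# requested with -c/--cpus-per-task, or 1 if that option was not used.
threads_for_job() {
    echo "${SLURM_CPUS_PER_TASK:-1}"
}

THREADS=$(threads_for_job)

# Only print the command that would be run; the remaining arguments are elided.
echo "Would run: hisat2-build -p $THREADS ..."
```

With this pattern, changing -c 4 to -c 8 on the salloc line changes the thread count passed to the application without editing the command itself.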
...
06 Creating a basic workflow and submitting jobs.
Since RStudio is a GUI, demonstrate moving from running a script within RStudio to running using Rscript from the command-line.
Put the various elements of loading modules, moving into a folder, running an R file, that make up a basic workflow, into a script that can be submitted using sbatch to Slurm.
Map the salloc arguments to #SBATCH.
Show how to monitor jobs using squeue as well as using the email related Slurm options.
Show how to request the DGX nodes and define gres to specifically request a GPU.
Provide a basic template.
Based on:
...
Why Submit a Job
A single computation can take minutes, hours, days, weeks, or months. An interactive session quickly becomes impractical.
Submit a job to the Slurm queue - Slurm manages everything for you.
Everything you do on the command-line, working out your workflow, is put into a script.
Workflow:
What resources do you require? (Interactive desktop configuration, salloc options)
What modules are loaded.
Which folder you’re running your computation within. Where the data is stored. Where you want the results.
Command-line calls being called.
Software applications being run.
...
Submit a Job to the Cluster
Convert salloc command-line options to an sbatch related script.
Options have defaults if not defined.
Code Block |
---|
# salloc
[@blog1 ~]$ salloc -A biocompworkshop -t 8:00:00 --mem=8G -c 2 -p dgx --gres=gpu:1 --reservation=biocompworkshop
# sbatch
# Options within your bash script.
#SBATCH --account=biocompworkshop # Account. MUST be defined.
#SBATCH --time=8:00:00 # Time. MUST be defined.
#SBATCH --mem=8G # Memory.
##SBATCH --mem-per-cpu=1G # Commented out. Default is 1G if no memory values defined.
#SBATCH --cpus-per-task=2 # CPUs per Task - default is 1 if not defined.
#SBATCH --partition=dgx # Partition - If not defined, Slurm will select.
#SBATCH --gres=gpu:1 # Generic Resources
#SBATCH --reservation=biocompworkshop # Reservation |
...
Additional sbatch Options
Code Block |
---|
#SBATCH --job-name=<job-name>
#SBATCH --nodes=<#nodes> # Default is 1 if not defined.
#SBATCH --ntasks-per-node=<#tasks/node> # Default is 1 if not defined.
#SBATCH --mail-type=ALL
#SBATCH --mail-user=<email-addr>
#SBATCH --output=<filename>_%A.out # Postfix the job id to <filename>
# If not defined: slurm-<job-id>.out |
...
Example Script: What Goes into It?
The bash script can contain:
Linux/bash commands and script.
Module loads.
Application command-line calls.
Let's consider our R workflow. I have:
R scripts copied into my /gscratch folder.
R related modules to load.
R scripts to run.
Commands to track the time the job starts and ends.
...
Example Script: Running R Script
Code Block |
---|
#!/bin/bash
# Comment: The first line 'shebang' is followed by the interpreter or the command that should be used to execute the script.
#SBATCH --job-name=r_job
#SBATCH --account=biocompworkshop
#SBATCH --time=10:00
#SBATCH --reservation=biocompworkshop
#SBATCH --mail-type=ALL
#SBATCH --mail-user=<email-addr>
#SBATCH --output=r_%A.out
export R_FILES=/gscratch/$USER
echo "R Workflow Example"
START=$(date +'%D %T')
echo "Start:" $START
echo "SLURM_JOB_ID:" $SLURM_JOB_ID
echo "SLURM_JOB_NAME:" $SLURM_JOB_NAME
echo "SLURM_JOB_NODELIST:" $SLURM_JOB_NODELIST
module use /project/biocompworkshop/software/modules
module load r/4.4.0-biocomp
cd $R_FILES
Rscript test_r_libraries.R
END=$(date +'%D %T')
echo "End:" $END |
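The script above prints the start and end times; if the elapsed time is also wanted in the log, one option is to record both as seconds and subtract. This is a sketch under the assumption that whole-second resolution is enough; the sleep 1 line is a stand-in for the real work (for example the Rscript call), and the variable names are hypothetical.

```bash
#!/bin/bash
# date +%s prints the current time as a number of seconds, which is easy to subtract.
START_SECONDS=$(date +%s)

sleep 1   # stand-in for the real computation

END_SECONDS=$(date +%s)
ELAPSED=$((END_SECONDS - START_SECONDS))

# Print as HH:MM:SS.
printf 'Elapsed: %02d:%02d:%02d\n' \
    $((ELAPSED / 3600)) $((ELAPSED % 3600 / 60)) $((ELAPSED % 60))
```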
...
Submit your Job
Code Block |
---|
# From your Working Directory - the folder you are currently in.
[@blog2]$ ls
run_r.sh test_data
# You can submit the job from the login node.
# Make a note of the job id.
[@blog2]$ sbatch run_r.sh
Submitted batch job 16054193
# ST Column: Status of PD means Pending / R means Running.
[@blog2]$ squeue -u salexan5
JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON)
16054193 teton r_job salexan5 R 0:06 1 t402
# Once the job is running, the defined output file will be generated.
[@blog2]$ ls
r_16054193.out run_r.sh test_data |
...
Monitor your Job
Code Block |
---|
# You can view the contents of your output file:
[@blog2]$ cat r_16054193.out
R Workflow Example
Start: 06/05/24 14:02:01
SLURM_JOB_ID: 16054193
SLURM_JOB_NAME: r_job
SLURM_JOB_NODELIST: m221
Sleeping...
[@blog1]$ squeue -u salexan5
JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON)
16054193 teton r_job salexan5 R 0:18 1 t402
# If the job id is no longer in the queue then it means the job is no longer running.
# It might have completed, or failed and exited.
[@blog1]$ squeue -u salexan5
JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON) |
...
Monitor your Job: Continued…(1)
Code Block |
---|
# You can monitor the queue and/or log file to check if running.
[salexan5@blog2 salexan5]$ cat r_16054193.out
R Workflow Example
Start: 06/05/24 14:02:01
SLURM_JOB_ID: 16054193
SLURM_JOB_NAME: r_job
SLURM_JOB_NODELIST: t402
Sleeping...
Loading required package: SeuratObject
Loading required package: sp
Attaching package: ‘SeuratObject’
The following objects are masked from ‘package:base’:
intersect, t
End: 06/05/24 14:02:29
# OR... |
...
Alternative Monitoring of Job via Email: Job Efficiency
Code Block |
---|
# Monitor your email:
Email 1:
Subject: beartooth Slurm Job_id=16054193 Name=r_job Began, Queued time 00:00:01
Email 2: Job Efficiency:
Subject: beartooth Slurm Job_id=16054193 Name=r_job Ended, Run time 00:00:28, COMPLETED, ExitCode 0
Job ID: 16054193
Cluster: beartooth
User/Group: salexan5/salexan5
State: COMPLETED (exit code 0)
Cores: 1
CPU Utilized: 00:00:07
CPU Efficiency: 25.00% of 00:00:28 core-walltime
Job Wall-clock time: 00:00:28
Memory Utilized: 0.00 MB (estimated maximum)
Memory Efficiency: 0.00% of 1000.00 MB (1000.00 MB/core) |
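The CPU Efficiency figure in the Job Efficiency email can be reproduced by hand: it is the CPU time used divided by the core-walltime (wall-clock time multiplied by the number of cores). The sketch below redoes that arithmetic with the values from the email; the helper to_seconds and the variable names are hypothetical.

```bash
#!/bin/bash
# to_seconds is a hypothetical helper that converts HH:MM:SS into seconds.
# The 10# prefix forces base 10 so values such as "08" are not read as octal.
to_seconds() {
    local h m s
    IFS=: read -r h m s <<< "$1"
    echo $((10#$h * 3600 + 10#$m * 60 + 10#$s))
}

CPU_UTILIZED=$(to_seconds "00:00:07")   # "CPU Utilized" from the email
WALL_CLOCK=$(to_seconds "00:00:28")     # "Job Wall-clock time" from the email
CORES=1                                 # "Cores" from the email

# Efficiency (%) = 100 * CPU time used / (wall-clock time * cores)
EFFICIENCY=$(awk -v cpu="$CPU_UTILIZED" -v wall="$WALL_CLOCK" -v cores="$CORES" \
    'BEGIN { printf "%.2f", 100 * cpu / (wall * cores) }')

echo "CPU Efficiency: ${EFFICIENCY}%"
```

A low percentage on a multi-core request is a hint that fewer cores (or a shorter time) could be requested next time.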
...
Example Script 2
This might look like something you cover in later sessions:
Code Block |
---|
#!/bin/bash
#SBATCH --job-name=hisat2
#SBATCH --account=biocompworkshop
#SBATCH --time=8:00:00
#SBATCH --cpus-per-task=4
#SBATCH --mem=8G
#SBATCH --reservation=biocompworkshop
#SBATCH --mail-type=ALL
#SBATCH --mail-user=<email-addr>
#SBATCH --output=hisat2_%A.out
START=$(date +'%D %T')
echo "Start:" $START
echo "SLURM_JOB_ID:" $SLURM_JOB_ID
echo "SLURM_JOB_NAME:" $SLURM_JOB_NAME
echo "SLURM_JOB_NODELIST:" $SLURM_JOB_NODELIST
module load gcc/12.2.0 hisat2/2.2.1
export REFERENCE=/project/biocompworkshop/rshukla/Grch38/fasta
export INDEX=/project/biocompworkshop/rshukla/Grch38/Hisat2
# Comment: Location of the splicesites.tsv file.
cd /gscratch/$USER
hisat2-build -p 4 --ss splicesites.tsv --exon $INDEX/exons.tsv $REFERENCE/chr22_with_ERCC92.fa $INDEX/chr22
END=$(date +'%D %T')
echo "End:" $END |
...
Examples and Cheat Sheets
Can be copied from: /project/biocompworkshop/arcc_notes
...
07 Summary and Next Steps
Run over the goals we’ve looked at.
Point towards the previous workshops for additional details.