Introduction: The workshop session will provide a quick tour covering high-level concepts, commands and processes for using Linux and HPC on our Beartooth cluster. It will cover enough to allow an attendee to access the cluster and to perform analysis associated with this workshop.

...

  • Users may log in with their BYODs (do you have a computer with you to follow along with the workshop?)

    • Log into UWYO wifi if you can. (Non-UW users will be unable to.)

    • Follow along with our slides available at: final link.

  • Logging in:

    • If you have a UWYO username and password: UW Users may test their HPC access by opening a browser and then going to the following URL: https://southpass.arcc.uwyo.edu.

    • The standard wyologin page will be presented. Log in with your UWYO username and password.

    • If you do not have a UWYO username and password: Come see me for a Yubikey and directions that will allow you to access the Beartooth HPC cluster without a UW account.

Directions for Logging into Southpass: see the "Connecting to Southpass" section of the Web Access to Beartooth: SouthPass wiki page.

...

More extensive and in-depth information and walkthroughs are available on our wiki under workshops/tutorials. You are welcome to dive into those in your own time. Their content should provide you with a lot of the foundational concepts you would need to become a proficient HPC user.

...

Based on: Wiki Front Page: About ARCC

  • In short, we maintain internally housed scientific resources including more than one HPC Cluster, data storage, and several research computing servers and resources.

  • We are here to assist UW researchers like yourself with your research computing needs.

...

What is HPC

HPC stands for High Performance Computing and is one of UW ARCC’s core services. HPC is the practice of aggregating computing power in a way that delivers much higher performance than one could get from a typical desktop or workstation. HPC is commonly used to solve large problems, and has some common use cases:

...

  • We typically have multiple users independently running jobs concurrently across compute nodes.

  • Resources are shared, but do not interfere with anyone else’s resources.

    • i.e. you have your own cores, your own block of memory.

  • If someone else’s job fails it does NOT affect yours.

  • Example: The two GPU compute nodes that are part of this reservation each have 8 GPU devices. We can have different, individual jobs running on each of these compute nodes without affecting each other.

...

Homogeneous vs Heterogeneous HPCs

There are 2 types of HPC systems:

  1. Homogeneous: All compute nodes in the system share the same architecture. CPU, memory, and storage are the same across the system. (Ex: NWSC’s Derecho)

  2. Heterogeneous: The compute nodes in the system can vary architecturally with respect to CPU, memory, even storage, and whether they have GPUs or not. Usually, the nodes are grouped in partitions. Beartooth is a heterogeneous cluster and our partitions are described on the Beartooth Hardware Summary Table on our ARCC Wiki.
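As a quick illustration (a sketch only; output columns vary by Slurm version and this is not real Beartooth data), you can list a cluster's partitions and the hardware behind them from a terminal using sinfo:

Code Block
# Summarize the partitions and their nodes:
[<username>@blog2 ~]$ sinfo --summarize
# One line per partition showing CPUs, memory (MB) and GPUs (gres) per node:
[<username>@blog2 ~]$ sinfo -o "%P %c %m %G"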

...

Beartooth Partition Table: see the Beartooth Hardware Summary Table on the ARCC Wiki.

Beartooth GPU Table: see the GPU Table on the Beartooth Hardware Summary Table wiki page.

...

A reservation can be considered a temporary partition.

It is a set of compute nodes reserved for a period of time for a set of users/projects, who get priority use.
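For example, a sketch of requesting an interactive session on reserved nodes (replace the placeholder project and reservation names with your own):

Code Block
# Request a 1-hour interactive session on nodes held by a reservation:
[<username>@blog2 ~]$ salloc --account=<project-name> --reservation=<reservation-name> --time=1:00:00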

...

Important Dates:

  1. After the 17th of June, this reservation will end and you will drop down to general usage if you have another Beartooth project.

  2. The project itself will be removed after the 24th of June. You will not be able to use or access it after that, so please copy anything you require out of the project beforehand.

...

Southpass is our Open OnDemand resource allowing users to access Beartooth over a web-based portal. Learn more about Southpass here.

Goals:

  • Demonstrate how users log into Southpass

  • Demonstrate requesting and using an XFCE Desktop Session

  • Introduce the Linux File System and how it compares to common workstation environments

    • Introduce HPC specific directories and how they’re used

    • Introduce Beartooth specific directories and how they’re used

  • Demonstrate how to access files using the Beartooth File Browsing Application

  • Demonstrate the use of emacs, available as a GUI-based text editor

Based on: SouthPass

...

Log in and Access the Cluster

...

Walk through (displays a filled-out web form) going through the steps for requesting a Beartooth XFCE Desktop:
  1. Click on Beartooth XFCE Desktop
    You will be presented with a form asking for specific information.

    1. Project/Account: specifies the project you have access to on the HPC Cluster

    2. Reservation: not usually needed for general cluster use, but set here to access the specific hardware that has been reserved for this workshop.

    3. Number of Hours: How long you plan to use the Remote Desktop Connection to the Beartooth HPC.

    4. Desktop Configuration: How many CPUs and Memory you require to perform your computations within this remote desktop session.

    5. GPU Type: the GPU hardware you want to access, specific to your use case. This may be set to "None - No GPU" if your computations do not require a GPU. Note: you can select DGX GPUs (listed as V100s in the GPU Type drop-down).

  2. You should see an interactive session starting. When it’s ready, it will turn green.

    1. Note the Host: field. Your Interactive session has been allocated to a specific host on the cluster. This is the node you are working on when you’re using your remote desktop session.

    2. Click Launch Beartooth XFCE Desktop to open your Remote Desktop session

  3. You should now see a Linux Desktop in your browser window


    1. Beartooth runs Red Hat Enterprise Linux. If you’ve worked on a Red Hat System, it will probably look familiar.

    2. If not, hopefully it looks similar enough to a Windows or Mac Graphical OS Interface.

      1. Apps dock at the bottom (Similar to Mac OS, or Pinned apps in taskbar on Windows OS)

      2. Desktop icons provide links to specific folder locations and files (like Mac and PC).

Note: While we use a webform to request Beartooth resources on Southpass, later training will show how resource configurations can be requested through command line via salloc or sbatch commands.

...

  1. /apps (Specific to ARCC HPC) is like Program Files on Windows or Applications on a Mac.

    1. Where applications are installed and where modules are loaded from. (More on that later).

  2. /alcova (Specific to ARCC HPC).

    1. Additional research storage for research projects that may not require HPC but is accessible from Beartooth.

    2. You won’t have access to it unless you were added to an Alcova project by the PI (one way to check is sketched below).
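A minimal sketch of that check: project memberships appear as Linux groups, so listing your groups shows which /project (and /alcova) areas you should be able to reach. The output below is illustrative only.

Code Block
# List the groups (including project groups) your account belongs to:
[<username>@blog2 ~]$ groups
<username> biocompworkshop ...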

...

Exercise: File Browsing in Southpass GUI

...

How to access your files using the SouthPass Files application: see the "Accessing your Beartooth Data in Southpass" section of the Web Access to Beartooth: SouthPass wiki page.

...

  • The Beartooth Shell Access app opens a new browser tab with a shell running on a login node. Do not run any computation on these nodes.
    [<username>@blog2 ~]$

  • The SouthPass Interactive Desktop (terminal) is already running on a compute node.
    [<username>@t402 ~]$
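If you are ever unsure which kind of node a terminal is on, the hostname in the prompt tells you, and you can also print it directly (a minimal sketch; the node names are examples):

Code Block
# On a login node:
[<username>@blog2 ~]$ hostname
blog2
# On a compute node (e.g. within a SouthPass Interactive Desktop terminal):
[<username>@t402 ~]$ hostname
t402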

...

Login Node Policy

...

  1. Anything compute-intensive (tasks using significant computational/hardware resources - Ex: using 100% of a login node’s CPUs)

  2. Long running tasks (over 10 min)

  3. Any collection of a large # of tasks resulting in a similar hardware footprint to actions mentioned previously.  

  4. Not sure? Use salloc to be on the safe side. This will be covered later.
    Ex: salloc --account=arccanetrain --time=40:00

  5. See more on ARCC’s Login Node Policy here

...

  • man - Short for the manual page. This is an interface to view the reference manual for the application or command.

  • man pages are only available on the login nodes.

Code Block
[arcc-t10@blog2 ~]$ man pwd
NAME
       pwd - print name of current/working directory
SYNOPSIS
       pwd [OPTION]...
DESCRIPTION
       Print the full filename of the current working directory.
       -L, --logical
              use PWD from environment, even if it contains symlinks
       -P, --physical
              avoid all symlinks
       --help display this help and exit
       --version
              output version information and exit
       If no option is specified, -P is assumed.
       NOTE:  your  shell  may have its own version of pwd, which usually supersedes the version described here.  Please refer to your shell's documentation
       for details about the options it supports.
  • --help - an option accepted by most commands. It prints a brief summary of the command's usage and available options.

Code Block
[arcc-t10@blog1 ~]$ cp --help
Usage: cp [OPTION]... [-T] SOURCE DEST
  or:  cp [OPTION]... SOURCE... DIRECTORY
  or:  cp [OPTION]... -t DIRECTORY SOURCE...
Copy SOURCE to DEST, or multiple SOURCE(s) to DIRECTORY.

...

File Navigation demonstrating the use of:

  • pwd (Print Working Directory)

  • ls (“List” lists information about directories and any type of files in the working directory)

  • ls flags

    • -l (tells the mode, # of links, owner, group, size (in bytes), and time of last modification for each file)

    • -a (Lists all entries in the directory, including the entries that begin with a . which are hidden)

  • cd (Change Directory)

  • cd .. (Change Directory - up one level)

Code Block
[arcc-t10@blog2 ~]$ pwd
/home/arcc-t10
[arcc-t10@blog2 ~]$ ls
Desktop  Documents  Downloads  ondemand  R
[arcc-t10@blog2 ~]$ cd /project/biocompworkshop
[arcc-t10@blog2 biocompworkshop]$ pwd
/project/biocompworkshop
[arcc-t10@blog2 biocompworkshop]$ cd arcc-t10
[arcc-t10@blog2 arcc-t10]$ ls -la
total 2.0K
drwxr-sr-x  2 arcc-t10 biocompworkshop 4.0K May 23 11:05 .
drwxrws--- 80 root     biocompworkshop 4.0K Jun  4 14:39 ..
[arcc-t10@blog2 arcc-t10]$ pwd
/project/biocompworkshop/arcc-t10
[arcc-t10@blog2 arcc-t10]$ cd ..
[arcc-t10@blog2 biocompworkshop]$ pwd
/project/biocompworkshop

...

Creating, moving and copying files and folders:

  • touch (Used to create a file without content. The file created using the touch command is empty)

  • mkdir (Make Directory - to create an empty directory)

  • mv (Move - moves a file or directory from one location to another)

  • cd .. (Change Directory - up one level)

  • cp (Copy - copies a file or directory from one location to another)

    • -r flag (Recursive)

  • ~ (Alias for your home directory, /home/<username>)

  • rm (Remove - removes a file or if used with -r, removes directory and recursively removes files in directory)

Code Block
[arcc-t10@blog2 arcc-t10]$ touch testfile
[arcc-t10@blog2 arcc-t10]$ mkdir testdirectory
[arcc-t10@blog2 arcc-t10]$ ls
testdirectory  testfile
[arcc-t10@blog2 arcc-t10]$ mv testfile testdirectory
[arcc-t10@blog2 arcc-t10]$ cd testdirectory
[arcc-t10@blog2 testdirectory]$ ls
testfile
[arcc-t10@blog2 testdirectory]$ cd .. 
[arcc-t10@blog2 arcc-t10]$ cp -r testdirectory ~
[arcc-t10@blog2 arcc-t10]$ cd ~
[arcc-t10@blog2 ~]$ ls
Desktop  Documents  Downloads  ondemand  R  testdirectory 
[arcc-t10@blog2 ~]$ cd testdirectory
[arcc-t10@blog2 testdirectory]$ ls
testfile
[arcc-t10@blog2 testdirectory]$ rm testfile
[arcc-t10@blog2 testdirectory]$ ls

...

Text Editor Cheatsheets

Note: On Beartooth, vi maps to vim i.e. if you open vi, you're actually starting vim.
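A minimal sketch of a typical vi/vim session (only the most common keystrokes, not a full cheatsheet):

Code Block
[arcc-t10@blog2 ~]$ vim testfile     # open (or create) a file
# Inside vim:
#   i       switch to Insert mode and type your text
#   Esc     return to Normal mode
#   :w      write (save) the file
#   :q      quit; use :wq to save and quit, or :q! to quit without saving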

...

Demonstrating vi/vim text editor

...

Vim Tutor is a walkthrough for new users to get used to Vim.

Run vimtutor in the command line to begin learning interactively.

Code Block
[arcc-t10@blog2 ~]$ vimtutor
===============================================================================
=    W e l c o m e   t o   t h e   V I M   T u t o r    -    Version 1.7      =
=============================================================================== 
     Vim is a very powerful editor that has many commands, too many to 
     explain in a tutor such as this. This tutor is designed to describe 
     enough of the commands that you will be able to easily use Vim as 
     an all-purpose editor. 
     ...

...

*** Break ***

...

04 Using Linux to Search/Parse Text Files

...

Since the cluster has to cater to everyone, we cannot provide a single desktop environment that provides everything.

Instead, we provide modules that a user loads to configure their environment for their particular needs within a session.
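For example, a sketch of configuring a session with modules (gcc/12.2.0 and hisat2/2.2.1 are the modules used later in this workshop; module spider assumes an Lmod-style module system):

Code Block
# Discover, load and review modules for this session:
[arcc-t10@blog2 ~]$ module avail                        # list available modules
[arcc-t10@blog2 ~]$ module spider hisat2                # search for a specific application
[arcc-t10@blog2 ~]$ module load gcc/12.2.0 hisat2/2.2.1
[arcc-t10@blog2 ~]$ module list                         # show what is currently loaded
[arcc-t10@blog2 ~]$ module purge                        # unload everything when done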

...

Code Block
# Within the R Terminal:
> library(Seurat)
Error in library(Seurat) : there is no package called 'Seurat'
> .libPaths(c('/project/biocompworkshop/software/r/libraries/4.4.0', '/apps/u/spack/gcc/12.2.0/r/4.4.0-7i7afpk/rlib/R/library'))
# Notice how the list of System Library packages listed in RStudio has changed.
> library(Seurat)
Loading required package: SeuratObject
Loading required package: sp
Attaching package: 'SeuratObject'
The following objects are masked from 'package:base':
    intersect, t

...

  • Linux/bash commands and script.

  • Module loads.

  • Application command-line calls.

 

Let's consider our R workflow. I have:

...

Code Block
# You can view the contents of your output file:
[@blog2]$ cat r_16054193.out
R Workflow Example
Start: 06/05/24 14:02:01
SLURM_JOB_ID: 16054193
SLURM_JOB_NAME: r_job
SLURM_JOB_NODELIST: t402
Sleeping...
[@blog1]$ squeue -u salexan5
             JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
          16054193     teton    r_job salexan5  R       0:18      1 t402
# If the job id is no longer in the queue then it means the job is no longer running.
# It might have completed, or failed and exited.
[@blog1]$ squeue -u salexan5
             JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)

...

Why is my Job Not Running?

Previously we explained: The two GPU compute nodes that are part of this reservation each have 8 GPU devices. We can have different, individual jobs running on each of these compute nodes without affecting each other.

So, we can have 16 concurrent jobs all running with a single GPU each.

But, what if a 17th person submitted a similar job?

Slurm will add this job to the queue, but it will be PENDING (PD) while it waits for the necessary resources to become available.

As soon as resources free up, this 17th job will start and its status will update to RUNNING (R).

Slurm manages this for you.
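A sketch of what this looks like from the 17th submitter's point of view (job IDs, names and times below are illustrative, not real output): the ST column reads PD while the job waits and R once it is running.

Code Block
# Right after submitting, the job waits for resources:
[arcc-t10@blog2 ~]$ squeue -u $USER
             JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
          16054300     teton    r_job arcc-t10 PD       0:00      1 (Resources)
# Once a GPU frees up, the state changes to R and a node is listed:
[arcc-t10@blog2 ~]$ squeue -u $USER
             JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
          16054300     teton    r_job arcc-t10  R       0:05      1 t402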

...

Monitor your Job: Continued…

...

Code Block
# You can monitor the queue and/or log file to check if running.
[salexan5@blog2 salexan5]$ cat r_16054193.out
R Workflow Example
Start: 06/05/24 14:02:01
SLURM_JOB_ID: 16054193
SLURM_JOB_NAME: r_job
SLURM_JOB_NODELIST: t402
Sleeping...
Loading required package: SeuratObject
Loading required package: sp
Attaching package: ‘SeuratObject’
The following objects are masked from ‘package:base’:
    intersect, t
End: 06/05/24 14:02:29
# OR...

...

Code Block
#!/bin/bash
#SBATCH --job-name=hisat2
#SBATCH --account=biocompworkshop       
#SBATCH --time=8:00:00                  
#SBATCH --cpus-per-task=4 
#SBATCH --mem=8G                        
#SBATCH --reservation=biocompworkshop   
#SBATCH --mail-type=ALL
#SBATCH --mail-user=<email-addr>
#SBATCH --output=hisat2_%A.out
START=$(date +'%D %T')
echo "Start:" $START
echo "SLURM_JOB_ID:" $SLURM_JOB_ID
echo "SLURM_JOB_NAME:" $SLURM_JOB_NAME
echo "SLURM_JOB_NODELIST:" $SLURM_JOB_NODELIST
module load gcc/12.2.0 hisat2/2.2.1
export REFERENCE=/project/biocompworkshop/rshukla/Grch38/fasta
export INDEX=/project/biocompworkshop/rshukla/Grch38/Hisat2
# Comment: Location of the splicesites.tsv file.
cd /gscratch/$USER
hisat2-build -p 4 --ss splicesites.tsv --exon $INDEX/exons.tsv $REFERENCE/chr22_with_ERCC92.fa $INDEX/chr22
END=$(date +'%D %T')
echo "End:" $END

...

Examples and Cheat Sheets

Can be copied from: /project/biocompworkshop/arcc_notes

...

07 Summary and Next Steps

  • We’ve covered the following high-level concepts, commands and processes:

  • What is HPC and what is a cluster - focusing on ARCC’s Beartooth cluster.

  • An introduction to Linux and its File System, and how to navigate around using an Interactive Desktop and/or using the command-line.

  • Linux command-line commands to view, search, parse, sort text files.

  • How to pipe the output of one command to the input of another, and how to redirect output to a file.

  • Using vim as a command-line text editor and/or emacs as a GUI within an Interactive Desktop.

  • Setting up your environment (using modules) to provide R/Python environments, and other software applications.

  • Accessing compute nodes via a SouthPass Interactive Desktop, and requesting different resources (cores, memory, GPUs).

  • Requesting interactive sessions (from a login node) using salloc.

  • Setting up a workflow, within a script, that can then be submitted to the Slurm queue using sbatch, and how to monitor jobs.

Further Assistance:

  • Everything covered can be found in previous workshops and additional information can be found on our Wiki.

  • ARCC personnel will be around in-person for the first three days to assist with cluster/Linux related questions and issues.

  • We will provide virtual support over Thursday/Friday. Submit questions via the Slack channel and these will be passed on to us, and we will endeavor to set up a Zoom session via our Office Hours.