Genomic Data Science

Introduction: The workshop session will provide a quick tour covering high-level concepts, commands and processes for using Linux and HPC on our MedicineBow cluster. It will cover enough to allow an attendee to access the cluster and to perform analysis associated with this workshop.

Goals:

  • Introduce ARCC and what types of services we provide including “what is HPC?”

  • Define “what is a cluster?” and how it is made up of partitions and compute nodes.

  • How to access and start using ARCC’s MedicineBow cluster - using our OnDemand service.

  • How to start an interactive desktop and open a terminal to use Linux commands within.

  • Introduce the basics of Linux, the command-line, and how its File System looks on MedicineBow.

  • Introduce Linux commands to allow navigation and file/folder manipulation.

  • Introduce Linux commands to allow text files to be searched and manipulated.

  • Introduce using a command-line text-editor and an alternative GUI based application.

  • How to set up a Linux environment to use R(/Python) and start RStudio, by loading modules.

  • How to start an interactive session on a compute node, requesting appropriate resources for your computation.

  • How to put elements together to construct a workflow that can be submitted as a job to the cluster, which can then be monitored.


We will not be covering:

We will not be covering, but workshops are available on:


Sections



*** Class 01 ***



00 Introduction and Setting the Scope:


01 About UW ARCC and HPC


About ARCC and how to reach us

ARCC Wiki

What is HPC

  • Users log in from their clients (desktops, laptops, workstations) into a login node.

  • In an HPC cluster, each compute node can be thought of as its own desktop, but the hardware resources of the cluster are available collectively as a single system.

  • Users may request specific allocations of resources available on the cluster - beyond that of a single node.

  • Allocated resources may include CPUs (Cores), Nodes, RAM/Memory, GPUs, etc.


What is a Compute Node?


Homogeneous vs Heterogeneous HPCs


Cluster: Heterogeneous: Partitions


02 Using OnDemand to access the MedicineBow HPC Cluster


Log in and Access the Cluster


Structure of the Linux File System and HPC Directories


Linux Operating Systems (Generally)


Compare and Contrast: Linux, HPC Specific, MedicineBow Specific


03 Using Linux and the Command Line


Exercise: Shell Terminal Introducing Command Line


What am I Using?


Login Node Policy


Demonstrating how to get help in CLI

  • man - Short for the manual page. This is an interface to view the reference manual for the application or command.

  • man pages are only available on the login nodes.

 

[<username>@mblog1 ~]$ man pwd
NAME
       pwd - print name of current/working directory

SYNOPSIS
       pwd [OPTION]...

DESCRIPTION
       Print the full filename of the current working directory.

       -L, --logical
              use PWD from environment, even if it contains symlinks

       -P, --physical
              avoid all symlinks

       --help display this help and exit

       --version
              output version information and exit

       If no option is specified, -P is assumed.

       NOTE: your shell may have its own version of pwd, which usually supersedes the version described here. Please refer to your shell's documentation for details about the options it supports.

  • --help - an option accepted by most commands. It prints a short usage summary and a list of the available options directly in the terminal.

[<username>@mblog1 ~]$ cp --help
Usage: cp [OPTION]... [-T] SOURCE DEST
  or:  cp [OPTION]... SOURCE... DIRECTORY
  or:  cp [OPTION]... -t DIRECTORY SOURCE...
Copy SOURCE to DEST, or multiple SOURCE(s) to DIRECTORY.
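The help output can be long, so it is often handy to pull out just the line you need. The following is a minimal sketch, assuming a standard Linux shell with grep available; the variable name usage_line is only for illustration.

```shell
# man pwd          # opens the full manual page in a pager; press q to quit

# Capture just the first usage line of cp's built-in help text.
# 2>&1 is included because some non-GNU versions of cp print help to standard error.
usage_line=$(cp --help 2>&1 | grep -i -m 1 'usage')
echo "$usage_line"
```

The same pattern works for most commands: replace cp with the command you are curious about.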

Demonstrating file navigation in CLI

File Navigation demonstrating the use of:

  • pwd (Print Working Directory)

  • ls (List - lists information about the files and directories in the working directory)

  • ls flags

    • -l (tells the mode, # of links, owner, group, size (in bytes), and time of last modification for each file)

    • -a (Lists all entries in the directory, including the entries that begin with a . which are hidden)

  • cd (Change Directory)

  • cd .. (Change Directory - up one level)

[<username>@mblog1 ~]$ pwd
/home/<username>
[<username>@mblog1 ~]$ ls
Desktop  Documents  Downloads  ondemand  R
[<username>@mblog1 ~]$ cd /project/genomicdatasci
[<username>@mblog1 genomicdatasci]$ pwd
/project/genomicdatasci
[<username>@mblog1 genomicdatasci]$ cd <username>
[<username>@mblog1 <username>]$ ls -la
total 2.0K
drwxr-sr-x  2 <username> genomicdatasci 4.0K May 23 11:05 .
drwxrws--- 80 root       genomicdatasci 4.0K Jun  4 14:39 ..
[<username>@mblog1 <username>]$ pwd
/project/genomicdatasci/<username>
[<username>@mblog1 <username>]$ cd ..
[<username>@mblog1 genomicdatasci]$ pwd
/project/genomicdatasci
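If you would like to practice these navigation commands without touching your home or project folders, the following sketch can be run anywhere. It is an illustration only: the folder name demo and the file names are made up, and it uses mktemp, mkdir and touch to build a small scratch area to explore.

```shell
# Work in a throwaway directory so nothing in your home or project space changes.
workdir=$(mktemp -d)
cd "$workdir"

# Build a small tree to explore: one folder holding a visible file and a hidden file.
mkdir demo
touch demo/visible.txt demo/.hidden.txt

cd demo                  # move into the new folder
pwd                      # print the full path of the current directory
ls                       # list visible entries only
ls -a                    # also list entries starting with a dot (hidden)
ls -l                    # long listing: mode, links, owner, group, size, time

# Keep the listings in variables so they can be compared afterwards.
plain_listing=$(ls)
all_listing=$(ls -a)

cd ..                    # up one level, back to the scratch directory
parent=$(pwd)
```

Comparing the output of ls and ls -a shows that .hidden.txt only appears when the -a flag is used.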

Demonstrating how to create and remove files and folders using CLI

Creating, moving and copying files and folders:

  • touch (Used to create a file without content. The file created using the touch command is empty)

  • mkdir (Make Directory - to create an empty directory)

  • mv (Move - moves a file or directory from one location to another)

  • cd .. (Change Directory - up one level)

  • cp (Copy - copies a file or directory from one location to another)

    • -r flag (Recursive)

  • ~ (Shortcut for your home directory, /home/<username>)

  • rm (Remove - removes a file or, if used with -r, removes a directory and, recursively, the files within it)
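The commands above can be chained into a short practice run. This is a minimal sketch in a temporary directory; the names notes.txt, results, copy.txt and results_backup are hypothetical and only used for illustration.

```shell
# Work in a throwaway directory so no real files are affected.
workdir=$(mktemp -d)
cd "$workdir"

touch notes.txt                   # create an empty file
mkdir results                     # create an empty directory
mv notes.txt results/             # move the file into the directory
cp results/notes.txt copy.txt     # copy a single file
cp -r results results_backup      # copy a directory and everything inside it

rm copy.txt                       # remove a single file
rm -r results                     # remove a directory and its contents

# Record what is left in the scratch directory.
remaining=$(ls)
echo "$remaining"
```

Be careful with rm, and especially rm -r: files removed this way do not go to a trash folder.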


04 Text Editors



*** Class 02 ***



05 Using Linux to Search/Parse Text Files


Your Environment: Echo and Export


Use Our Environment Variable


Search for a File


Use Wildcards *


View the Contents of a File


View the Start and End of a File


Search the Contents of a Text File


Grep-ing with Case-Insensitive and Line Numbers


Pipe: Count, Sort


Uniq


Redirect Output into a File
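The topics listed in this section can be combined in a single short session. The following is a sketch only, not the class exercise itself: the file genes.txt, its contents, and the environment variable SAMPLE_FILE are made up for illustration.

```shell
# Work in a throwaway directory with a small, made-up sample file.
workdir=$(mktemp -d)
cd "$workdir"
printf 'geneA\ngeneB\nGENEA\ngeneC\ngeneB\ngeneB\n' > genes.txt

# Export an environment variable (hypothetical name) and use it.
export SAMPLE_FILE="$workdir/genes.txt"
echo "$SAMPLE_FILE"

find "$workdir" -name "*.txt"     # search for files using a wildcard
cat "$SAMPLE_FILE"                # view the whole file
head -n 2 "$SAMPLE_FILE"          # view the first two lines
tail -n 2 "$SAMPLE_FILE"          # view the last two lines

# Case-insensitive search (-i) that also reports line numbers (-n).
matches=$(grep -i -n 'genea' "$SAMPLE_FILE")
echo "$matches"

# Pipe: sort the lines, count duplicates with uniq -c, sort by count,
# then redirect the result into a file.
sort "$SAMPLE_FILE" | uniq -c | sort -rn > counts.txt
cat counts.txt
```

Note that uniq only collapses adjacent duplicate lines, which is why the file is sorted before it is piped into uniq -c.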


06 Let's start using R(/Python) and RStudio


Open a Terminal


Setting Up a Session Environment


What is Available?


Are Python and/or R available?


Load a Compiler


Load a Newer Version of Python


Typically Loading R


Using module purge to reset your session/environment


Modules Specific for this Class


Using R/4.4.0 + Library

Version: r/4.4.0-genomic (deprecated)

Version: r/4.4.0-genomic-gcc14


R/4.3.3 and R Package Pigengene


Using RStudio with R/Library of Packages for this Class


Using RStudio and R/Pigengene for this Class


Other Class Modules


Request Interactive Session (Compute Node) from a Login Node

 

 


Request Interactive Session (Compute Node) with a GPU

 

 


Request what you Need!


07 Create a Basic Workflow and Submit Jobs


Why Submit a Job


Submit a Job to the Cluster


Additional sbatch Options


Example Script: What Goes into It?


Example Script: Running R Script


Submit your Job


Monitor your Job


Why is my Job Not Running?


Monitor your Job: Continued…


Alternative Monitoring of Job via Email: Job Efficiency


Example Script 2

This might look like something you cover in later sessions:


Being a Good Cluster Citizen


08 Summary and Next Steps