Genomic Data Science
Introduction: This workshop session provides a quick tour of the high-level concepts, commands and processes for using Linux and HPC on our MedicineBow cluster. It covers enough for an attendee to access the cluster and perform the analysis associated with this workshop.
Goals:
Introduce ARCC and what types of services we provide including “what is HPC?”
Define “what is a cluster”, and how it is made up of partitions and compute nodes.
How to access and start using ARCC’s MedicineBow cluster - using our OnDemand service.
How to start an interactive desktop and open a terminal to use Linux commands within.
Introduce the basics of Linux, the command-line, and how its File System looks on MedicineBow.
Introduce Linux commands to allow navigation and file/folder manipulation.
Introduce Linux commands to allow text files to be searched and manipulated.
Introduce using a command-line text-editor and an alternative GUI based application.
How to set up a Linux environment to use R(/Python) and start RStudio, by loading modules.
How to start interactive sessions to run on a compute node, to allow computation, requesting appropriate resources.
How to put elements together to construct a workflow that can be submitted as a job to the cluster, which can then be monitored.
We will not be covering the following topics, although separate workshops are available on them:
Using a terminal to SSH onto the Cluster - see Intro to Accessing the Cluster.
Data Management or Data Transfer (such as using Globus).
Using / Creating Conda Environments - one method for installing your own software.
Using the Jupyter Service via OnDemand.
Sections
- *** Class 01 ***
- 00 Introduction and Setting the Scope:
- 01 About UW ARCC and HPC
- 02 Using OnDemand to access the MedicineBow HPC Cluster
- 03 Using Linux and the Command Line
- 04 Text Editors
- *** Class 02 ***
- 05 Using Linux to Search/Parse Text Files
- 06 Let's start using R(/Python) and RStudio
- 07 Create a basic workflow and submit jobs.
- 08 Summary and Next Steps
*** Class 01 ***
00 Introduction and Setting the Scope:
01 About UW ARCC and HPC
About ARCC and how to reach us
What is HPC
What is a Compute Node?
Homogeneous vs Heterogeneous HPCs
Cluster: Heterogeneous: Partitions
02 Using OnDemand to access the MedicineBow HPC Cluster
Log in and Access the Cluster
Structure of the Linux File System and HPC Directories
Linux Operating Systems (Generally)
Compare and Contrast: Linux, HPC Specific, MedicineBow Specific
03 Using Linux and the Command Line
Exercise: Shell Terminal Introducing Command Line
What am I Using?
Login Node Policy
Demonstrating how to get help in CLI
| [<username>@mblog1 ~]$ man pwd
NAME
pwd - print name of current/working directory
SYNOPSIS
pwd [OPTION]...
DESCRIPTION
Print the full filename of the current working directory.
-L, --logical
use PWD from environment, even if it contains symlinks
-P, --physical
avoid all symlinks
--help display this help and exit
--version
output version information and exit
If no option is specified, -P is assumed.
NOTE: your shell may have its own version of pwd, which usually supersedes the version described here. Please refer to your shell's documentation
for details about the options it supports. |
| [<username>@mblog1 ~]$ cp --help
Usage: cp [OPTION]... [-T] SOURCE DEST
or: cp [OPTION]... SOURCE... DIRECTORY
or: cp [OPTION]... -t DIRECTORY SOURCE...
Copy SOURCE to DEST, or multiple SOURCE(s) to DIRECTORY. |
Demonstrating file navigation in CLI
File navigation, demonstrating the use of pwd, ls and cd:
| [<username>@mblog1 ~]$ pwd
/home/<username>
[<username>@mblog1 ~]$ ls
Desktop Documents Downloads ondemand R
[<username>@mblog1 ~]$ cd /project/genomicdatasci
[<username>@mblog1 genomicdatasci]$ pwd
/project/genomicdatasci
[<username>@mblog1 genomicdatasci]$ cd <username>
[<username>@mblog1 <username>]$ ls -la
total 2.0K
drwxr-sr-x 2 <username> genomicdatasci 4.0K May 23 11:05 .
drwxrws--- 80 root genomicdatasci 4.0K Jun 4 14:39 ..
[<username>@mblog1 <username>]$ pwd
/project/genomicdatasci/<username>
[<username>@mblog1 <username>]$ cd ..
[<username>@mblog1 genomicdatasci]$ pwd
/project/genomicdatasci |
Demonstrating how to create and remove files and folders using CLI
Creating, moving and copying files and folders:
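The commands for this exercise are not reproduced on this page, so the following is only a minimal sketch of creating, copying, moving and removing files and folders. It works inside a throwaway directory made with `mktemp -d`; the names `demo`, `backup`, `notes.txt` and so on are hypothetical and chosen purely for illustration.

```shell
# Work in a scratch directory so nothing in your home or project space is touched.
WORKDIR=$(mktemp -d)
cd "$WORKDIR"

mkdir demo                              # create a folder
touch demo/notes.txt                    # create an empty file
cp demo/notes.txt demo/copy.txt         # copy a file
mkdir backup
mv demo/copy.txt backup/                # move a file into another folder
mv backup/copy.txt backup/renamed.txt   # mv is also how you rename
cp -r demo demo_copy                    # -r copies a folder and its contents
rm demo_copy/notes.txt                  # remove a file
rmdir demo_copy                         # remove a folder (it must be empty)
ls -R                                   # list what is left, recursively
```

Note that `rm` does not use a recycle bin: a removed file is gone, so check the path before pressing Enter.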
04 Text Editors
*** Class 02 ***
05 Using Linux to Search/Parse Text Files
Your Environment: Echo and Export
Use Our Environment Variable
Search for a File
Use Wildcards *
View the Contents of a File
View the Start and End of a File
Search the Contents of a Text File
Grep-ing with Case-Insensitive and Line Numbers
Pipe: Count, Sort
Uniq
Redirect Output into a File
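As a rough illustration of the topics listed in this section, the sketch below builds a tiny tab-separated file and then runs the usual commands against it. The file `genes.tsv`, its contents, and the variable name `DATA_DIR` are made up for this example and are not part of the class data.

```shell
# Build a small sample file in a scratch directory (hypothetical data).
WORKDIR=$(mktemp -d)
cd "$WORKDIR"
printf 'gene\tchrom\nBRCA1\tchr17\nbrca2\tchr13\nTP53\tchr17\nBRCA1\tchr17\n' > genes.tsv

# Environment variables: echo shows a value, export makes it visible to child processes.
echo "$HOME"
export DATA_DIR="$WORKDIR"
echo "$DATA_DIR"

# Search for a file by name; the * wildcard matches any run of characters.
find "$DATA_DIR" -name "*.tsv"
ls "$DATA_DIR"/*.tsv

# View a whole file, or just its start and end.
cat genes.tsv
head -n 2 genes.tsv
tail -n 2 genes.tsv

# Search the contents: -i ignores case, -n prefixes each match with its line number.
grep -i -n "brca" genes.tsv

# Pipes pass the output of one command to the next.
grep "chr17" genes.tsv | wc -l          # count the matching lines
cut -f 2 genes.tsv | sort | uniq -c     # uniq only merges adjacent lines, so sort first

# Redirect output into a file: > overwrites, >> appends.
grep -i "brca" genes.tsv | sort | uniq > brca_hits.txt
cat brca_hits.txt
```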
06 Let's start using R(/Python) and RStudio
Open a Terminal
Setting Up a Session Environment
What is Available?
Are Python and/or R Available?
Load a Compiler
Load a Newer Version of Python
Typically Loading R
Using module purge to reset your session/environment
Modules Specific for this Class
Using R/4.4.0 + Library
Version: r/4.4.0-genomic
(deprecated)
Version: r/4.4.0-genomic-gcc14
R/4.3.3 and R Package Pigengene
Using RStudio with R/Library of Packages for this Class
Using RStudio and R/Pigengene for this Class
Other Class Modules
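The module steps above might look roughly like the sketch below. It assumes the cluster uses Lmod (which provides `module spider`) and that the version label `r/4.4.0-genomic-gcc14` listed above is also the name to pass to `module load`; both are assumptions to check on the cluster itself. The guard simply skips the commands on a machine where `module` is not defined.

```shell
# Assumed module name, taken from the version label in this section.
R_MODULE="r/4.4.0-genomic-gcc14"

# 'module' only exists where the module system is set up, such as on the cluster.
if command -v module >/dev/null 2>&1; then
    module avail             # what can be loaded right now
    module spider r          # search all modules, including those that need a compiler loaded first
    module load "$R_MODULE"  # add that R build to this session
    module list              # confirm what is currently loaded
    module purge             # unload everything and return to a clean session
else
    echo "module command not found; run this in a terminal on the cluster"
fi
```

Loaded modules only last for the current session, so a new terminal or a batch job needs its own `module load` lines.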
Request Interactive Session (Compute Node) from a Login Node
Request Interactive Session (Compute Node) with a GPU
Request what you Need!
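The exact requests used in class are not shown on this page, so here is a hedged sketch of what an interactive request with Slurm's `salloc` could look like. The account name is assumed to match the project folder `genomicdatasci`, the resource values are arbitrary examples, and the GPU partition is left as a placeholder because partition names are specific to the cluster. The sketch only prints the commands so they can be reviewed before being run on a login node.

```shell
# Assumptions: the Slurm account matches the project name, and the GPU
# partition below is a placeholder that must be replaced with a real one.
ACCOUNT="genomicdatasci"
GPU_PARTITION="REPLACE_WITH_GPU_PARTITION"

# CPU-only session: 1 core, 4 GB of memory, 1 hour of wall time.
CPU_REQUEST="--account=$ACCOUNT --time=01:00:00 --cpus-per-task=1 --mem=4G"

# GPU session: the same request plus a partition and one GPU.
GPU_REQUEST="$CPU_REQUEST --partition=$GPU_PARTITION --gres=gpu:1"

# Print the commands for review; copy one into a login-node terminal to run it.
echo "salloc $CPU_REQUEST"
echo "salloc $GPU_REQUEST"

# When you have finished, type 'exit' to release the allocation.
```

Asking for only the cores, memory and time you expect to use generally means a shorter wait in the queue and leaves resources free for others.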
07 Create a basic workflow and submit jobs.
Why Submit a Job
Submit a Job to the Cluster
Additional sbatch Options
Example Script: What Goes into It?
Example Script: Running R Script
Submit your Job
Monitor your Job
Why is my Job Not Running?
Monitor your Job: Continued…
Alternative Monitoring of Job via Email: Job Efficiency
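Putting the pieces together, a batch script for an R job could be sketched as follows. This is an illustration under stated assumptions rather than the script used in class: the account name, the module name, the email address and the file `my_analysis.R` are all placeholders. The sketch writes the script to a scratch directory and checks its syntax; the submit and monitoring commands are shown as comments because they only work on the cluster.

```shell
WORKDIR=$(mktemp -d)
cd "$WORKDIR"

# Write the batch script; quoting 'EOF' stops the shell expanding anything inside it.
cat > run_analysis.sh <<'EOF'
#!/bin/bash
#SBATCH --job-name=r_analysis
#SBATCH --account=genomicdatasci
#SBATCH --time=00:30:00
#SBATCH --cpus-per-task=1
#SBATCH --mem=4G
#SBATCH --output=r_analysis_%j.out
#SBATCH --mail-type=END,FAIL
#SBATCH --mail-user=you@example.edu

# Start from a clean environment, then load what the job needs.
module purge
module load r/4.4.0-genomic-gcc14

# Hypothetical R script; replace with your own.
Rscript my_analysis.R
EOF

# Check the script for shell syntax errors without running it.
bash -n run_analysis.sh && echo "script syntax OK"

# On the cluster you would then:
#   sbatch run_analysis.sh    # submit; prints the job ID
#   squeue -u "$USER"         # state PD = pending, R = running
#   sacct -j <jobid>          # accounting record once it has finished
#   scancel <jobid>           # cancel the job if needed
```

In the `--output` file name, `%j` is replaced by the job ID, so repeated submissions do not overwrite each other's logs.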
Example Script 2
This might look like something you cover in later sessions:
Being a Good Cluster Citizen
08 Summary and Next Steps