We will provide a quick tour covering high-level ideas for using Linux and HPC on our cluster, which should allow you to access and use our Beartooth Cluster to perform the analysis associated with this workshop.

Goals:

  • Introduce ARCC and what types of services we provide including “what is HPC?”

  • Define “what is a cluster”, and how is it made of partitions and compute nodes.

  • How to access and start using ARCC’s Beartooth cluster - using our SouthPass service.

  • How to start an interactive desktop and open a terminal to use Linux commands within.

  • Introduce the basics of Linux, the command-line, and how its File System looks on Beartooth.

  • Introduce Linux commands to allow navigation and file/folder manipulation.

  • Introduce Linux commands to allow text files to be searched and manipulated.

  • Introduce using a command-line text-editor and an alternative GUI based application.

  • How to set up a Linux environment to use R(/Python) and start RStudio, by loading modules.

  • How to start interactive sessions to run on a compute node, to allow computation, requesting appropriate resources.

  • How to put elements together to construct a workflow that can be submitted as a job to the cluster, which can then be monitored.



0 Getting Started

  • Users may log in with their BYODs (do you have a computer with you to follow along with the workshop?)

    • Log into UWYO wifi if you can. (Non-UW users will be unable to)

  • Follow along with our slides available at: final link

  • Logging in:

    • If you have a UWYO username and password: UW users may test their HPC access by opening a browser and going to the following URL: https://southpass.arcc.uwyo.edu.

    • The standard wyologin page will be presented. Log in with your UWYO username and password.

    • If you do not have a UWYO username and password: Come see me for a Yubikey and directions that will allow you to access the Beartooth HPC cluster.

 Directions for Logging into Southpass



00 Introduction and Setting the Scope:

The roadmap to becoming a proficient HPC user can be long, complicated, and varies depending on the user. There are a large number of concepts to cover. Some of these concepts are included in today’s training but given time constraints, it’s impossible to get to all of them. This workshop session introduces key high-level concepts, and follows a very hands-on demonstration approach, for you to follow.


Our training will help provide the foundation necessary for you to use Beartooth cluster specifically to perform some of the exercises later in this workshop.

Because of our limited time this morning, please submit any questions to the slack channel for this workshop and workshop instructors can address them as they are available.

More extensive and in-depth information and walkthroughs are available on our wiki and you are welcome to dive into those in your own time. Content within them should provide you with a lot of the foundational concepts you would need to be familiar with to become a proficient HPC user.


01 About UW ARCC and HPC

Goals:

  • Describe ARCC’s role at UW

  • Provide resources for ARCC Researchers to seek help

  • Introduce staff members, including those available throughout the workshop

  • Introduce the concept of an HPC cluster, its architecture and when to use one

  • Introduce the Beartooth HPC architecture, hardware, and partitions


About ARCC and how to reach us

Based on: Wiki Front Page: About ARCC

ARCC Wiki

How to Reach Us:

  • E-Mail: arcc-help@uwyo.edu

  • Request our Services using our Service Portal

  • Office Hours (new schedule effective Fall 2024): Tues 11am-1pm, Wed 1:30-3:30pm, still hosted over Zoom

  • Visit our UWYO Webpage: https://uwyo.edu/arcc

About ARCC:

UW ARCC is the primary research computing facility for the University of Wyoming and our department is housed within the Division of Research and Economic Development.  Our expert staff are committed to providing the UW research community with access to specialized research computing infrastructure, knowledge, support, and a large range of scientific software pre-configured for your use. If you are new to ARCC, please begin with our “getting started” wiki pages.   

We manage, maintain, and support all internally housed scientific computing resources, including HPC and high performance data storage. UW ARCC also aims to support all high performance computing resources available to any UW researcher. This includes support for External HPC Resources, including but not limited to Wyoming-NCAR Alliance Allocations through NWSC.

A list of our internally offered services is detailed in our service list.

  • In short, we maintain internally housed scientific resources including more than one HPC Cluster, data storage, and several research computing servers and resources.

  • We are here to assist UW researchers like yourself with your research computing needs.

3 ARCC Staff Members will be available through the course of the workshop if you need help using Beartooth:

ARCC End User Support

  • Simon Alexander: HPC & Research Software Manager

  • Dylan Perkins: Research Computing Facilitator

  • Lisa Stafford: Research Computing Facilitator

What is HPC

HPC stands for High Performance Computing and is one of UW ARCC’s core services. HPC is the practice of aggregating computing power in a way that delivers a much higher performance than one could get out of a typical desktop or workstation. HPC is commonly used to solve large problems, and has some common use cases:

  1. Performing computation-intensive analyses on large datasets: MB/GB/TB in a single or many files, computations requiring RAM in excess of what is available on a single workstation, or analysis performed across multiple CPUs (cores) or GPUs.

  2. Performing long, large-scale simulations: Hours, days, weeks, spread across multiple nodes each using multiple cores.

  3. Running repetitive tasks in parallel: 10s/100s/1000s of small short tasks.

  • Users log in from their clients (desktops, laptops, workstations) into a login node.

  • In an HPC Cluster, each compute node can be thought of as its own desktop, but the hardware resources of the cluster are available collectively as a single system.

  • Users may request specific allocations of resources available on the cluster - beyond that of a single node.

  • Allocated resources may include CPUs (Cores), Nodes, RAM/Memory, GPUs, etc.


Homogeneous vs Heterogeneous HPCs

There are 2 types of HPC systems:

  1. Homogeneous: All compute nodes in the system share the same architecture. CPU, memory, and storage are the same across the system. (Ex: NWSC’s Derecho)

  2. Heterogeneous: The compute nodes in the system can vary architecturally with respect to CPU, memory, even storage, and whether they have GPUs or not. Usually, the nodes are grouped in partitions. Beartooth is a heterogeneous cluster and our partitions are described on the Beartooth Hardware Summary Table on our ARCC Wiki.


Beartooth Hardware and Partitions

 Beartooth Partition Table

| Slurm Partition name | Requestable features | Node count | Sockets/Node | Cores/Socket | Threads/Core | Total Cores/Node | RAM (GB) | Processor (x86_64) | Local Disks | OS | Use Case | Key Attributes |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| moran | fdr, intel, sandy, ivy, community | 273 | 2 | 8 | 1 | 16 | 64 or 128 | Intel Ivybridge/Sandybridge | 1 TB HD | RHEL 8.8 | For compute jobs not needing the latest and greatest hardware. | Original Moran compute |
| moran-bigmem | fdr, intel, haswell | 2 | 2 | 8 | 1 | 16 | 512 | Intel Haswell | 1 TB HD | RHEL 8.8 | For jobs not needing the latest hardware, w/ above average memory requirements. | Moran compute w/ 512G of RAM |
| moran-hugemem | fdr, intel, haswell, community | 2 | 2 | 8 | 1 | 16 | 1024 | Intel Haswell | 1 TB HD | RHEL 8.8 | For jobs that don’t need the latest hardware, w/ escalated memory requirements. | Moran compute w/ 1TB of RAM |
| dgx | edr, intel, broadwell | 2 | 2 | 20 | 2 | 40 | 512 | Intel Broadwell | 7 TB SSD | RHEL 8.8 | For GPU and AI-enabled workloads. | Special DGX GPU compute nodes |
| teton | edr, intel, broadwell, community | 175 | 2 | 16 | 1 | 32 | 128 | Intel Broadwell | 240 GB SSD | RHEL 8.8 | For regular compute jobs. | Teton compute |
| teton-cascade | edr, intel, cascade, community | 56 | 2 | 20 | 1 | 40 | 192 or 768 | Intel Cascade Lake | 240 GB SSD | RHEL 8.8 | For compute jobs on newer (though not the newest) hardware, w/ somewhat higher memory requirements. | Teton compute w/ Cascade Lake CPUs |
| teton-gpu | edr, intel, broadwell, community | 6 | 2 | 16 | 1 | 32 | 512 | Intel Broadwell | 240 GB SSD | RHEL 8.8 | For compute jobs utilizing GPUs on prior cluster hardware. | Teton GPU compute |
| teton-hugemem | edr, intel, broadwell | 8 | 2 | 16 | 1 | 32 | 1024 | Intel Broadwell | 240 GB SSD | RHEL 8.8 | For compute jobs w/ large memory requirements, running on fast prior cluster hardware. | Teton compute w/ 1TB of RAM |
| teton-massmem | edr, amd, epyc | 2 | 2 | 24 | 1 | 48 | 4096 | AMD/EPYC | 4096 GB SSD | RHEL 8.6 | For compute jobs w/ exceedingly demanding memory requirements. | Teton compute w/ 4TB of RAM |
| teton-knl | edr, intel, knl | 12 | 1 | 72 | 4 | 72 | 384 | Intel Knights Landing | 240 GB SSD | RHEL 8.8 | For jobs using many cores on a single node, where speed isn’t critical. | Teton compute w/ Intel Knights Landing CPUs |
| beartooth | edr, intel, icelake | 2 | 2 | 28 | 1 | 56 | 256 | Intel Icelake | 436 GB SSD | RHEL 8.8 | For general compute jobs running the latest and greatest Beartooth hardware. | Beartooth compute |
| beartooth-gpu | edr, intel, icelake | 4 | 2 | 28 | 1 | 56 | 250 or 1024 | Intel Icelake | 436 GB SSD | RHEL 8.8 | For compute jobs needing GPU on the latest and greatest hardware. | Beartooth GPU compute |
| beartooth-bigmem | edr, intel, icelake | 6 | 2 | 28 | 1 | 56 | 515 | Intel Icelake | 436 GB SSD | RHEL 8.8 | For jobs w/ above average memory requirements, on the latest and greatest hardware. | Beartooth compute w/ 512G of RAM |
| beartooth-hugemem | edr, intel, icelake | 8 | 2 | 28 | 1 | 56 | 1024 | Intel Icelake | 436 GB SSD | RHEL 8.8 | For jobs w/ large memory requirements on the latest and greatest hardware. | Beartooth compute w/ 1TB of RAM |
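
The Total Cores/Node column can be cross-checked against the columns before it: in the rows above it equals Sockets/Node times Cores/Socket, and the Threads/Core value is not multiplied in (dgx lists 2 threads per core but 40 total cores). A minimal shell sketch of that arithmetic, using values copied from the table; the helper name total_cores is hypothetical:

```shell
# total_cores SOCKETS CORES_PER_SOCKET
# Hypothetical helper: prints the physical core count per node
# (hardware threads are not counted).
total_cores() {
    echo $(( $1 * $2 ))
}

# Values copied from the partition table.
teton_cores=$(total_cores 2 16)       # teton: 2 sockets x 16 cores/socket
dgx_cores=$(total_cores 2 20)         # dgx: 2 sockets x 20 cores/socket
beartooth_cores=$(total_cores 2 28)   # beartooth: 2 sockets x 28 cores/socket

echo "teton=${teton_cores} dgx=${dgx_cores} beartooth=${beartooth_cores}"
```

This number is worth keeping in mind when sizing a request: asking for more cores on one node than the partition's nodes have means the request cannot be satisfied by a single node there.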

 Beartooth GPU Table

The ARCC Beartooth cluster has a number of compute nodes that contain GPUs. The following tables list each node that has GPUs and the type of GPU installed.

| GPU Type | Partition | Example slurm value to request | # of Nodes | GPU devices per node | CUDA Cores | Tensor Cores | GPU Memory Size (GB) | Compute Capability |
|---|---|---|---|---|---|---|---|---|
| Tesla P100 | teton-gpu (all available on non-investor) | #SBATCH --partition=teton-gpu; #SBATCH --gres=gpu:? | 8 | 2 | 3584 | 0 | 16 | 6.0 |
| V100 | dgx (both available on non-investor) | #SBATCH --partition=dgx; #SBATCH --gres=gpu:? | 2 | 8 | 5120 | 640 | 16/32 | 7.0 |
| A30 | beartooth-gpu (4), non-investor (3) | #SBATCH --partition=beartooth-gpu; #SBATCH --gres=gpu:? | 7 | 2 | 3584 | 224 | 25 | 8.0 |
| T4 | non-investor | | 2 | 3 | 2560 | 320 | 16 | 7.5 |

See Beartooth Hardware Summary Table on the ARCC Wiki.
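
The #SBATCH lines in the GPU table belong at the top of a batch script, with the ? replaced by the number of GPUs wanted. A minimal sketch that writes such a script to a file without submitting it; the account name biocompworkshop (borrowed from the workshop project path), the 10 minute time limit, and the file name gpu_test.sh are assumptions for illustration only:

```shell
# Write an example batch script; nothing is submitted to the cluster here.
script="gpu_test.sh"   # hypothetical file name

cat > "$script" <<'EOF'
#!/bin/bash
# Assumption: replace with your own project/account.
#SBATCH --account=biocompworkshop
# Partition and GPU request as listed in the GPU table ("?" replaced by 1).
#SBATCH --partition=teton-gpu
#SBATCH --gres=gpu:1
# Assumption: a short 10 minute limit for a quick test.
#SBATCH --time=00:10:00

# Report which GPU(s) the job was given.
nvidia-smi
EOF

# Show the generated script.
cat "$script"
```

On the cluster, a script like this would be handed to the scheduler with sbatch gpu_test.sh.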


02 Using Southpass to access the Beartooth HPC Cluster

Southpass is our Open OnDemand resource allowing users to access Beartooth over a web-based portal. Learn more about Southpass here.

Goals:

  • Demonstrate how users log into Southpass

  • Demonstrate requesting and using an XFCE Desktop Session

  • Introduce the Linux File System and how it compares to common workstation environments

    • Introduce HPC specific directories and how they’re used

    • Introduce Beartooth specific directories and how they’re used

  • Demonstrate how to access files using the Beartooth File Browsing Application

  • Demonstrate the use of emacs, available as a GUI based text-editor

Based on: SouthPass


Log in and Access the Cluster

Login to Southpass

 How to log in to Southpass

If you haven’t yet:

  1. Open a browser of your choice.

  2. Go to https://southpass.arcc.uwyo.edu

  3. Log in with your UWYO username and password, or the username and password to the training account you’ve been provided.

  4. Once in you will be presented with the Southpass Dashboard:

    [Screenshot: Southpass Dashboard]

Using Southpass

Interactive Applications in Southpass are requested by filling out a webform to specify hardware requirements while you use the application.

Other applications can be accessed without filling out a webform:

  1. Job Composer (To create batch scripts)

  2. Active Jobs (To view your active jobs)

  3. Home Directory (File Explorer/Upload/Download)

  4. Beartooth System Status (View cluster status)

Exercise: Beartooth XFCE Desktop

Requests are made through a webform in which you specifically request certain hardware or software to use on Beartooth.

 Walk through (displays a filled out web form) going through steps for requesting a Beartooth XFCE Desktop
  1. Click on Beartooth XFCE Desktop
    You will be presented with a form asking for specific information.

    1. Project/Account: specifies the project you have access to on the HPC Cluster

    2. Reservation: not normally needed for general cluster use; here it gives access to specific hardware that has been reserved for this workshop.

    3. Number of Hours: How long you plan to use the Remote Desktop Connection to the Beartooth HPC.

    4. Desktop Configuration: How many CPUs and Memory you require to perform your computations within this remote desktop session.

    5. GPU Type: GPU Hardware you want to access, specific to your use case. This may be set to “None - No GPU” if your computations do not require a GPU. Note: you can select DGX GPUs (listed as V100s in the GPU Type drop-down).

  2. You should see an interactive session starting. When it’s ready, it will turn green.

    1. Note the Host: field. Your Interactive session has been allocated to a specific host on the cluster. This is the node you are working on when you’re using your remote desktop session.

    2. Click Launch Beartooth XFCE Desktop to open your Remote Desktop session

  3. You should now see a Linux Desktop in your browser window

    [Screenshot: Linux Desktop in the browser window]

    1. Beartooth runs Red Hat Enterprise Linux. If you’ve worked on a Red Hat System, it will probably look familiar.

    2. If not, hopefully it looks similar enough to a Windows or Mac Graphical OS Interface.

      1. Apps dock at the bottom (Similar to Mac OS, or Pinned apps in taskbar on Windows OS)

      2. Desktop icons provide links to specific folder locations and files (like Mac and PC).

While we use a webform to request Beartooth resources on Southpass, later training will show how resource configurations can be requested through command line via salloc or sbatch commands.
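
As a rough preview of that command-line route, the webform fields map onto salloc options approximately as sketched below. All values are placeholders, the account name is an assumption, and the command is only printed rather than run, since salloc is available only on the cluster:

```shell
# Placeholder values mirroring the webform fields.
account="biocompworkshop"   # Project/Account (assumption: use your own)
hours="02:00:00"            # Number of Hours, written as HH:MM:SS
cpus=2                      # Desktop Configuration: CPUs
mem="8G"                    # Desktop Configuration: Memory

# Build the equivalent request. A --gres=gpu:N option would be added
# only when a GPU is needed.
request="salloc --account=${account} --time=${hours} --cpus-per-task=${cpus} --mem=${mem}"

# Print the command instead of running it.
echo "$request"
```

The same options can also be written as #SBATCH lines at the top of a script submitted with sbatch.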


Structure of the Linux File System and HPC Directories

Linux File Structure

 Walk through getting to the main file system from within the Beartooth XFCE Desktop Application

We are now remotely logged in to a Linux Desktop.

  1. To take a look at the top level of the file structure, click on “Filesystem”.

This is specific to the Beartooth HPC but most Linux environments will look very similar

[Screenshot: top level of the Beartooth file system]

Linux Operating Systems (Generally)


Compare and Contrast: Linux, HPC Specific, Beartooth Specific

Based on: Beartooth Filesystem

HPC Specific Folders:

  1. /home (Common across most shared HPC Resources)

    1. What is it for? Similar to on a PC, and Macintosh HD → Users on a Mac

    2. Permissions: It should have files specific to you, personally, as the HPC user. By default no one else has access to your files in your home.

    3. Directory Path: Every HPC user on Beartooth has a folder under /home/<your_username> or $HOME

    4. Default Quota: 25GB

  2. /project (Common across most shared HPC Resources)

    1. What is it for? Think of it as a shared folder for you and all your project members. Similar to /glade/campaign on NCAR HPC.

    2. Permissions: All project members have access to the folder. By default, all project members can read any files or folders within, and can write in the main project directory.

    3. Directory path: get to it at /project/biocompworkshop/

    4. A subfolder in /project/biocompworkshop/ is created for each user when that user is added to the project; only that user can write to their own subfolder.

    5. Default Quota: 1TB, which covers the project folder itself and all its contents and subfolders.

  3. /gscratch (Scratch folder, common across most HPC resources but sometimes just called "scratch")

    1. What is it for? It’s “scratch space”, so it’s storage dedicated for you to store temporary data you need access to.

    2. Permissions: Like /home, contents are specific to you, personally, as the HPC user. By default no one else has access to your files in your /gscratch.

    3. Directory Path: Every HPC user on Beartooth has a gscratch directory under /gscratch/<your_username> or $SCRATCH

    4. Default Quota: 5TB

      1. Don’t store anything in /gscratch that you can't afford to lose or don't have backed up elsewhere. It's not meant to store anything long term.

      2. Everyone’s /gscratch directory is subject to ARCC's purge policy.

Beartooth Specific

  1. /apps (Specific to ARCC HPC) is like on Windows or on a Mac.

    1. Where applications are installed and where modules are loaded from. (More on that later).

  2. /alcova (Specific to ARCC HPC).

    1. Additional research storage for research projects that may not require HPC but is accessible from beartooth.

    2. You won’t have access to it unless you were added to an alcova project by the PI.
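
From a terminal, the per-user locations described above can be reached through ordinary shell variables. A minimal sketch, assuming the workshop project name biocompworkshop; the fallback values exist only so the snippet also runs on machines where these variables or folders are not defined:

```shell
# Resolve the three per-user storage locations.
user="${USER:-$(id -un 2>/dev/null || echo unknown)}"
home_dir="${HOME:-/home/$user}"
# Your subfolder in the workshop project.
project_dir="/project/biocompworkshop/$user"
# $SCRATCH on Beartooth, otherwise the documented /gscratch path.
scratch_dir="${SCRATCH:-/gscratch/$user}"

# Report which of them exist on the current machine.
for d in "$home_dir" "$project_dir" "$scratch_dir"; do
    if [ -d "$d" ]; then
        echo "found:   $d"
    else
        echo "missing: $d (expected only on Beartooth)"
    fi
done
```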


Exercise: File Browsing in Southpass GUI

Users can access their files using the Southpass file browser app.

 How to access your files using the Southpass files application



Demonstration opening emacs GUI based text editor

 Walk through opening emacs GUI Text Editor

Once you’re in an XFCE Desktop Session:

  1. Open the Applications Menu in the top right corner of your desktop

  2. Choose Run Program

  3. An Application Finder window will pop up. In the text box, type in: emacs

  4. Click Launch

  5. This will open a new window with the emacs text editor

  6. Users can click on the File menu and select Visit New File to create a new file, or Open file to continue working on one they’ve already started.


03 Using Linux and the Command Line

Goals:

  • Introduce the shell terminal and command line interface

    • Demonstrate starting a Beartooth SSH shell using Southpass

    • Demonstrate information provided in a command prompt

  • Introduce Policy for HPC Login Nodes

  • Demonstrate how to navigate the file system to create and remove files and folders using command line interface (CLI)

    • mkdir, cd, ls, mv, cp

  • Demonstrate the use of man, --help and identify when these should be used

  • Demonstrate using a command-line text editor, vi

Based on: The Command Line Interface


Exercise: Shell Terminal Introducing Command Line

 How to access Beartooth in a Shell Terminal from Southpass
  1. Click the following Icon on the Beartooth Dashboard

  2. This opens up a Beartooth SSH session in a web-based terminal:

  3. Login will display:

    1. Cluster you’ve logged into

    2. How to get help

    3. Important message(s) of the day

    4. A printout of arccquota

  4. Anatomy of Command Line Prompt: 

    1. Who (am I?):

    2. What (system am I talking to/working on?):

    3. Where (am I on the system?): 
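
Each part of the prompt corresponds to something you can ask the shell for directly. A small sketch that rebuilds a prompt of the form [who@what where]$, as seen in the transcripts on this page; uname -n is used for the host name (it reports the same name as hostname), and trimming to the short host name is an assumption about how the prompt is configured:

```shell
who="$(whoami 2>/dev/null || echo unknown)"   # Who am I?
what="$(uname -n)"                            # What system am I working on?
what="${what%%.*}"                            # keep only the short host name
where="$(basename "$PWD")"                    # Where am I? (last path component)

# This style of prompt shows ~ when you are in your home directory.
if [ "$PWD" = "${HOME:-}" ]; then
    where="~"
fi

prompt="[${who}@${what} ${where}]\$"
echo "$prompt"
```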


Login Node Policy

As a courtesy to your colleagues, please do not run the following on any login nodes:  

  1. Anything compute-intensive (tasks using significant computational/hardware resources - Ex: using 100% cluster CPU)

  2. Long running tasks (over 10 min)

  3. Any collection of a large # of tasks resulting in a similar hardware footprint to actions mentioned previously.  

  4. Not sure?  Use salloc to be on the safe side. 
    Ex: salloc --account=arccanetrain --time 40:00

  5. See more on ARCC’s Login Node Policy here
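
One quick way to tell whether a terminal is inside a Slurm allocation is to look at the SLURM_JOB_ID environment variable, which Slurm sets for shells and scripts started through salloc, srun, or sbatch. A minimal sketch; the function name session_kind is hypothetical, and on some cluster configurations an salloc shell may still be running on the login node, so treat the result as a hint rather than proof:

```shell
# session_kind JOB_ID
# Hypothetical helper: classify a shell from the value of SLURM_JOB_ID.
session_kind() {
    if [ -n "$1" ]; then
        echo "inside Slurm job $1"
    else
        echo "no Slurm job: keep work light or request resources with salloc"
    fi
}

# Pass the variable explicitly; it is empty when no allocation is active.
session_kind "${SLURM_JOB_ID:-}"
```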


Demonstrating how to get help in CLI

  • man - Short for the manual page. This is an interface to view the reference manual for the application or command.

[arcc-t10@blog2 ~]$ man pwd
NAME
       pwd - print name of current/working directory
SYNOPSIS
       pwd [OPTION]...
DESCRIPTION
       Print the full filename of the current working directory.
       -L, --logical
              use PWD from environment, even if it contains symlinks
       -P, --physical
              avoid all symlinks
       --help display this help and exit
       --version
              output version information and exit
       If no option is specified, -P is assumed.
       NOTE:  your  shell  may have its own version of pwd, which usually supersedes the version described here.  Please refer to your shell's documentation
       for details about the options it supports.
  • --help - an option accepted by most commands. It prints a short usage summary and a list of the command's options directly to the terminal.

[arcc-t10@blog1 ~]$ cp --help
Usage: cp [OPTION]... [-T] SOURCE DEST
  or:  cp [OPTION]... SOURCE... DIRECTORY
  or:  cp [OPTION]... -t DIRECTORY SOURCE...
Copy SOURCE to DEST, or multiple SOURCE(s) to DIRECTORY.

Demonstrating file navigation in CLI

File Navigation demonstrating the use of:

  • pwd (Print Working Directory)

  • ls (“List” lists information about directories and any type of files in the working directory)

  • ls flags

    • -l (tells the mode, # of links, owner, group, size (in bytes), and time of last modification for each file)

    • -a (Lists all entries in the directory, including the entries that begin with a . which are hidden)

  • cd (Change Directory)

  • cd .. (Change Directory - up one level)

[arcc-t10@blog2 ~]$ pwd
/home/arcc-t10
[arcc-t10@blog2 ~]$ ls
Desktop  Documents  Downloads  ondemand  R
[arcc-t10@blog2 ~]$ cd /project/biocompworkshop
[arcc-t10@blog2 biocompworkshop]$ pwd
/project/biocompworkshop
[arcc-t10@blog2 biocompworkshop]$ cd arcc-t10
[arcc-t10@blog2 arcc-t10]$ ls -la
total 2.0K
drwxr-sr-x  2 arcc-t10 biocompworkshop 4.0K May 23 11:05 .
drwxrws--- 80 root     biocompworkshop 4.0K Jun  4 14:39 ..
[arcc-t10@blog2 arcc-t10]$ pwd
/project/biocompworkshop/arcc-t10
[arcc-t10@blog2 arcc-t10]$ cd ..
[arcc-t10@blog2 biocompworkshop]$ pwd
/project/biocompworkshop

Demonstrating how to create and remove files and folders using CLI

Creating, moving and copying files and folders:

  • touch (Used to create a file without content. The file created using the touch command is empty)

  • mkdir (Make Directory - to create an empty directory)

  • mv (Move - moves a file or directory from one location to another)

  • cd .. (Change Directory - up one level)

  • cp (Copy - copies a file or directory from one location to another)

    • -r flag (Recursive)

  • ~ (Shorthand for your home directory, /home/<your_username>)

  • rm (Remove - removes a file or if used with -r, removes directory and recursively removes files in directory)

[arcc-t10@blog2 arcc-t10]$ touch testfile
[arcc-t10@blog2 arcc-t10]$ mkdir testdirectory
[arcc-t10@blog2 arcc-t10]$ ls
testdirectory  testfile
[arcc-t10@blog2 arcc-t10]$ mv testfile testdirectory
[arcc-t10@blog2 arcc-t10]$ cd testdirectory
[arcc-t10@blog2 testdirectory]$ ls
testfile
[arcc-t10@blog2 testdirectory]$ cd ..
[arcc-t10@blog2 arcc-t10]$ cp -r testdirectory ~
[arcc-t10@blog2 arcc-t10]$ cd ~
[arcc-t10@blog2 ~]$ ls
Desktop  Documents  Downloads  ondemand  R  testdirectory
[arcc-t10@blog2 ~]$ cd testdirectory
[arcc-t10@blog2 testdirectory]$ ls
testfile
[arcc-t10@blog2 testdirectory]$ rm testfile
[arcc-t10@blog2 testdirectory]$ ls
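
The same sequence can be rehearsed safely as a script in throwaway directories. The sketch below uses mktemp -d to create them (an addition for safety, not part of the demonstration) and copies into a second temporary directory instead of ~:

```shell
# Work inside throwaway directories so nothing real is touched.
work="$(mktemp -d)"
dest="$(mktemp -d)"   # stands in for ~ in the demonstration
cd "$work" || exit 1

touch testfile                 # create an empty file
mkdir testdirectory            # create an empty directory
mv testfile testdirectory      # move the file into the directory
cp -r testdirectory "$dest"    # copy the directory and its contents

copied="$(ls "$dest/testdirectory")"
echo "copied: $copied"

rm "$dest/testdirectory/testfile"          # remove the copied file
remaining="$(ls "$dest/testdirectory")"    # the directory is now empty
echo "remaining: '${remaining}'"

# Clean up both temporary directories.
cd / && rm -r "$work" "$dest"
```

Note that rm does not move files to a trash folder; once removed from the command line, a file is gone.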

Text Editor Cheatsheets


Demonstrating vi/vim text editor

VI/Vim is one of several text editors available for Linux Command Line. (vi filename or vim filename)

  • i - to start insert mode (allows you to enter text)

  • <esc> key - to exit out of insert mode

  • dd - when not in insert mode, to delete a whole line

  • :q - outside of insert mode to quit

  • :wq - outside of insert mode to write the contents to the file, and then quit

cat - reads file(s) sequentially, displaying content to the terminal

[arcc-t10@blog2 arcc-t10]$ vi testfile

stuff and things
~                                                                                                                                
~                                                                                                                                
~                                                                                                                                
~                                                                                                                                
:wq            

[arcc-t10@blog2 arcc-t10]$ cat testfile
stuff and things

Try the vim tutor

Vim Tutor is a walkthrough for new users to get used to Vim.

Run vimtutor in the command line to begin learning interactively.

[arcc-t10@blog2 ~]$ vimtutor
===============================================================================
=    W e l c o m e   t o   t h e   V I M   T u t o r    -    Version 1.7      =
=============================================================================== 
     Vim is a very powerful editor that has many commands, too many to 
     explain in a tutor such as this. This tutor is designed to describe 
     enough of the commands that you will be able to easily use Vim as 
     an all-purpose editor. 
     ...
