
Steps to get started in HPC with ARCC:

1: Get an ARCC HPC account by being added to an HPC project

To access an ARCC HPC resource, you must be added as a member of a project on that resource, whether you’re a UWyo faculty member (Principal Investigator; PI), researcher, or student. (If you’ve received an e-mail from arcc-admin@uwyo.edu indicating you’ve been added to a project, you already have access to the HPC cluster.)

  1. If you are a PI, you may request a project be created in which you will automatically be listed as PI and added as a member.

  2. If you are not a PI, a PI may request creation of a project on your behalf, then request that you be added as a member.

2: Log into HPC

ARCC HPC users should be aware that accessing our HPC means working in a Linux environment, which will be different from a Windows/Mac PC.

  1. If you’ve received e-mails from arcc-admin (granting you access to a project) you’re ready to connect/login to the cluster!

  2. Log in through Southpass: ARCC’s OnDemand resource, which makes ARCC’s Beartooth HPC available through your web browser. You will be redirected to the wyologin page and be prompted for your UWYO login credentials and 2-factor authentication.

 Southpass Login Directions


  1. If you prefer to log into the cluster over SSH/command line, directions depend on the client you’re connecting from. See this page to log in using SSH.

  2. In your command line window, type the following command: ssh <your_username>@<cluster_name>.arcc.uwyo.edu (a concrete example appears after this list).

    1. When connected, a bunch of text will scroll by. This will vary depending on the cluster. On Beartooth, for example, there are usage rules, tips, and a summary of your storage utilization across all projects that you are part of.

    2. Upon login to the HPC, the command prompt will look something like this: [arccuser@blog1 ~]$. To learn more about the command prompt and command line, please look through our documentation on Command Line Interface.
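
As a concrete sketch of the SSH command above (the username is hypothetical, and the hostname follows the <cluster_name>.arcc.uwyo.edu pattern; the exact cluster name depends on which ARCC system you have access to):

ssh cowboyjoe@beartooth.arcc.uwyo.edu
# After authenticating, the prompt changes to show you are on a login node, e.g.:
[cowboyjoe@blog1 ~]$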

3: Start Processing

While processing, you may also need to access software (3a), transfer data on or off the HPC (3b), or use a graphical interface (3c); see the subsections below.

A key principle of any shared computing environment is that resources are shared among users and therefore must be scheduled. Please DO NOT simply log into the HPC and run your computations without requesting or scheduling resources from Slurm through a batch script or interactive job.

ARCC uses the Slurm Workload Manager to regulate and schedule user-submitted jobs on our HPC systems. In order for your job to submit properly to Slurm, you must at minimum specify your account and a time in your submission. There are two ways to run your work on an ARCC HPC system from the command line:

Option 1: Run it as an Interactive Job

These are jobs that allow users access to computing nodes where applications can be run in real time. This may be necessary when performing heavy processing of files, or compiling large applications. Interactive jobs can be requested with an salloc command. ARCC has configured the clusters so that interactive jobs provide shell access on compute nodes themselves rather than running on the login node. An example of an salloc request can be expanded below.

 Interactive Job Examples and Explanations (Examples include using: salloc, --account, --time, --partition, --nodes, --cpus-per-task, --mem)

The following is the simplest example of a command to start an interactive job. This command has the bare minimum information (account and time) in order to run any job on an ARCC cluster:

[cowboyjoe@hpclog1 ~]$ salloc --account=<your project name> --time=01:00

Breaking it down:

  1. [cowboyjoe@hpclog1 ~]$ This is our command prompt and shows our username (cowboyjoe), the node we're currently working on (hpclog1), and the folder we're located in on the HPC (~, which is short for our /home directory). To learn more about the command line in Linux, see our Linux Command Line tutorial here.

  2. salloc is the Slurm command to allocate a work session on the cluster.

  3. --account is a flag specifying the account/project under which you’re performing your work in the session.

  4. --time is a flag specifying your “walltime limit”, which is how long you will have access to the HPC resources you’re requesting in your work session. At the end of this time, you will be disconnected from your requested resources.

    1. Syntax for time may be in the form of “minutes”, “minutes:seconds”, “hours:minutes:seconds”, “days-hours”, “days-hours:minutes”, or “days-hours:minutes:seconds” (see the sketch after this list).

    2. On ARCC HPC, the maximum time you may request for any job is 7 days (job runtime may be extended, upon request).

  5. In this basic example, aside from time requested and account used, all allocated resources are set to the default.

    1. Total CPUs/cores available for this session is set to 1 by default.

    2. Total nodes we have access to in our session is set to 1 by default.

    3. Total memory allocated to the session is 1GB by default.
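
As a sketch of the accepted --time formats described in item 4 above (the values here are arbitrary; pick whatever fits your work):

# All of the following are valid walltime requests
salloc --account=<your project name> --time=30            # 30 minutes
salloc --account=<your project name> --time=02:30:00      # 2 hours, 30 minutes
salloc --account=<your project name> --time=1-12          # 1 day, 12 hours
salloc --account=<your project name> --time=2-00:00:00    # 2 days (under the 7-day maximum)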

The next example is a set of commands. The first line is a command to allocate an interactive job requesting specific hardware to perform the computations in our session. The second line runs a python script:

 [cowboyjoe@hpclog1 ~]$ salloc --account=arcc --time=40:00 --partition=moran --nodes=1 --cpus-per-task=8 --mem=8G
 python my_job_sequential_steps.py

Breaking it down:

  1. --account is a flag specifying the account/project under which you’re performing your work in the session. In this case, the project’s name is arcc.

  2. --time is a flag specifying the “walltime limit” for this job, in this case, 40 minutes.

  3. --partition is a flag specifying which partition we want our nodes to come from for this session. You can view the partitions and learn about hardware specific to different partitions by viewing the hardware summary associated with the HPC you’re using. Since our overall computational needs are not significant, we requested the moran partition. These nodes have 16 cores/node and at least 32GB of RAM/node.

  4. --nodes is a flag telling Slurm how many compute nodes we need available to us to run the computations we want to run in the session. In this example, we only ask for 1.

  5. --cpus-per-task is a flag specifying how many cores/cpus we need available to run any single task. By default, ntasks (the number of tasks we run concurrently) is set to 1. Since our salloc command didn't specify --ntasks or any other "tasks-per" related parameters, the total CPU count for the requested session will be what was requested using the --cpus-per-task flag. In this example, 8 cores x (default) 1 task run concurrently = 8 cores total.

  6. --mem is a flag to request the minimum memory/RAM per node that we’ll need for our job. Above we requested 8G, so 8GB. This is under the 32GB of RAM/node for the partition (moran) we asked for in our salloc command, so Slurm accepts this memory request.

    1. The --mem value should include a unit suffix (G for GB).

    2. If a unit suffix is not specified and only an integer is provided, the default unit is M (megabytes).

  7. In response, Slurm assigns us a single node (node1), with access to a total of 8 cores and 8GB RAM for 40 minutes.

salloc: Granted job allocation 1012024
salloc: Nodes node1 are ready for job
  1. In the last line of the example we use the granted resources to run a python script named ‘my_job_sequential_steps.py' using the default version of python installed on the cluster. If we need more hardware resources than we asked for in our salloc command, we may get an error (such as an “oom-kill” or out-of-memory error indicating we didn’t request enough RAM), or our job will run for an extremely long time (which may mean we didn’t request enough CPU and/or we didn’t parse out our computational work appropriately in the script). A sketch of the full interactive session appears below.
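
Putting the pieces together, a minimal sketch of a complete interactive session might look like the following (the compute node name, job id, and python invocation are illustrative, not output you should expect verbatim):

# Request an interactive session from the login node
[cowboyjoe@hpclog1 ~]$ salloc --account=arcc --time=40:00 --partition=moran --nodes=1 --cpus-per-task=8 --mem=8G
salloc: Granted job allocation 1012024
salloc: Nodes node1 are ready for job

# We now have a shell on the compute node; run the work there
[cowboyjoe@node1 ~]$ python my_job_sequential_steps.py

# When finished, exit to release the allocation back to the scheduler
[cowboyjoe@node1 ~]$ exit
salloc: Relinquishing job allocation 1012024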

Option 2: Run it as a Batch Job

This means running one or more tasks on a compute environment. Batch jobs are initiated using scripts or command-line parameters. They run to completion without further human intervention (fire and forget). Batch jobs are submitted to a job scheduler (on ARCC HPC, Slurm) and run on the first available compute node(s).

 Batch Script Example and Explanation (Example using --account, --time, --partition, --mem, --job-name, --mail-type, --mail-user)

In the following example we create our own batch script, which Slurm then runs to execute our job and any associated tasks. Below is an example of a batch script we created named myfirstjob.sh that runs our computational work in a python script named my_job_sequential_steps.py:

#!/bin/bash
#SBATCH --account=myproject
#SBATCH --time=1-01:00:00
#SBATCH --partition=moran
#SBATCH --mem=8G
#SBATCH --job-name sequential_run
#SBATCH --mail-type=ALL
#SBATCH --mail-user=cowboyjoe@uwyo.edu

python my_job_sequential_steps.py

Breaking it down:

  1. #!/bin/bash is the “shebang” line, telling the HPC to use the bash shell to interpret the script.

  2. --account is a flag specifying the account/project under which you’re performing your work in the session. Here the account we’re using is myproject.

  3. --time is a flag specifying your “walltime limit”. This is how long the script can run on HPC resources once it begins. At the end of this time, the script will end regardless of whether computations are completed. The example sets the total time for our job to 1-01:00:00, i.e., 1 day and 1 hour.

    1. Syntax for time may be in the form of “minutes”, “minutes:seconds”, “hours:minutes:seconds”, “days-hours”, “days-hours:minutes”, or “days-hours:minutes:seconds”.

    2. On ARCC HPC, the maximum time you may request for any job is 7 days (job runtime may be extended, upon request).

  4. --partition is a flag specifying which nodes we want to use for our job. You can view the partitions and learn about hardware specific to different partitions by viewing the hardware summary associated with the HPC you’re using. Since our overall computational needs are not significant, we requested to run on the moran partition. These nodes have 16 cores/node and at least 32GB of RAM/node.

  5. --mem is a flag to request the minimum memory/RAM per node that we’ll need for our job. Above we specify 8G, so 8GB. This is under the 32GB of RAM/node for the partition (moran) we asked for in our batch script, so Slurm accepts this memory request.

    1. The --mem value should include a unit suffix (G for GB).

    2. If a unit suffix is not specified and only an integer is provided, the default unit is M (megabytes).

  6. --job-name is a flag to specify the name of our job allocation. This will appear with the job id number if we query running jobs on the HPC.

  7. --mail-type is a flag to specify which job events should trigger notification e-mails. Setting it to ALL means notification e-mails will be sent when a job begins, ends, fails, hits time limits, never runs due to problems with the request, or gets requeued (a sketch of a more selective setting appears after this list).

  8. --mail-user is a flag to specify the e-mail address to notify when job events occur.

  9. In this basic example, aside from time requested and account used, all allocated resources are set to the default.

    1. Total CPUs/cores available for this session is set to 1 by default.

    2. Total nodes we have access to in our session is set to 1 by default.

    3. Total tasks we will run concurrently is set to 1 by default.

  10. In the last line we run a python script named ‘my_job_sequential_steps.py' using the default version of python installed on the cluster.
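
As a small sketch of a more selective --mail-type setting (mentioned in item 7 above), the header lines below would send e-mail only when the job ends or fails; the other #SBATCH lines in the script stay the same:

#SBATCH --mail-type=END,FAIL
#SBATCH --mail-user=cowboyjoe@uwyo.edu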

Assuming our batch script and the python script are complete and ready to run, we log into the HPC, navigate to the location of our script, and submit our job with the following command:

sbatch myfirstjob.sh

Since this batch script first makes a request to Slurm to schedule our job and allocate resources before performing any computations, we can submit it on the login node.
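
Once the job is submitted, you can check on it from the login node. A minimal sketch (the job id here is made up; by default Slurm writes the job’s output to a file named slurm-<jobid>.out in the directory you submitted from):

[cowboyjoe@hpclog1 ~]$ sbatch myfirstjob.sh
Submitted batch job 1012025

# List your own pending and running jobs
[cowboyjoe@hpclog1 ~]$ squeue -u $USER

# After the job finishes, review its output
[cowboyjoe@hpclog1 ~]$ cat slurm-1012025.out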

To learn more about running parallel jobs, running jobs with GPUs, and avoiding common issues, see our SLURM tutorial.

3a. Get access to software

Option 1: Use The Module System

LMOD is software used on HPC clusters to maintain dynamic user environments and allow users to switch between software stacks and packages. You can check whether software is available as a module by running module spider, as shown in the following expandable example.

 Using module spider to search for software modules:

Module spider

The spider subcommand is a great search tool to find out if the software package has been installed as a system package. From the command line, run the module spider command to output a list of the available software for the entire system:

$ module spider

To search for specific packages and/or versions, you can supply the names/arguments to the command:

 $ module spider samtools 

If there is only one version, the output will contain information regarding which compilers are required to be loaded before the package can be loaded, as well as a brief help section for the module. If there are multiple versions available, the output lists which versions are available and instructions to get more information on an individual version. To get information regarding a specific version of the package, include the version as part of the argument:

 $ module spider samtools/1.6 

You can also use regular expressions to search for modules. See the output from module help for more information.

Example

The general process for all apps you might want to load is:

  1. Find versions.

  2. Find a version’s dependencies.

  3. Check what is already loaded and what is missing.

  4. Load required (missing) dependencies.

  5. Load application.

# Find versions of samtools
[@blog2 ~]$ module spider samtools
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
  samtools:
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
     Versions:
        samtools/1.14
        samtools/1.16.1
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
  For detailed information about a specific "samtools" package (including how to load the modules) use the module's full name.
  Note that names that have a trailing (E) are extensions provided by other modules.
  For example:
     $ module spider samtools/1.16.1
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

# Does samtools/1.16.1 have any dependencies that need to be loaded first?
[@blog2 ~]$ module spider samtools/1.16.1
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
  samtools: samtools/1.16.1
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
    You will need to load all module(s) on any one of the lines below before the "samtools/1.16.1" module is available to load.
      arcc/1.0  gcc/12.2.0
    Help:
      SAM Tools provide various utilities for manipulating alignments in the
      SAM format, including sorting, merging, indexing and generating
      alignments in a per-position format

# Check what is already loaded
# In this case, arcc/1.0 is loaded by default whenever a new session is started.
# But gcc/12.2.0 is missing.
[@blog2 ~]$ ml
Currently Loaded Modules:
  1) slurm/latest (S)   2) arcc/1.0 (S)   3) singularity/3.10.3
  Where:
   S:  Module is Sticky, requires --force to unload or purge

# You can load gcc/12.2.0 on the same line as samtools/1.16.1, but it must be loaded before it, i.e. it appears to the left of it.
[@blog2 ~]$ module load gcc/12.2.0 samtools/1.16.1

# If you'd tried the following, i.e. loading samtools before gcc, you'd see the following error:
[@blog2 ~]$ module load samtools/1.16.1 gcc/12.2.0
Lmod has detected the following error:  These module(s) or extension(s) exist but cannot be loaded as requested: "samtools/1.16.1"
   Try: "module spider samtools/1.16.1" to see how to load the module(s).
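
As a follow-on sketch, the same load line can go inside a batch script so the software is available when the job runs on a compute node (the project name and input file below are hypothetical):

#!/bin/bash
#SBATCH --account=myproject
#SBATCH --time=01:00:00

# Load the dependency first, then the application
module load gcc/12.2.0 samtools/1.16.1

# Run the tool on a hypothetical input file
samtools sort my_alignments.bam -o my_alignments.sorted.bam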

If you have a software package that is not installed as a module, but you think it would be widely utilized, make a request with us to see if it can be installed. Learn more about using LMOD here.

Option 2: Install it Yourself

If your software packages are somewhat research specific, you may install them to your project. ARCC will be providing an additional allocation of 250GB in every MedicineBow /project directory under /project/ for software installations. Information on installing software on your own will vary depending on the software. General instructions may be found here.
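
Details vary by package, but as one hedged sketch, a Python package could be installed into a personal virtual environment under your project’s software space (the module name, project path, and package name below are placeholders, not an ARCC-prescribed layout):

# Load a Python module first (the exact module name/version varies by cluster)
module load python

# Create and activate a virtual environment in your project's software area
python -m venv /project/<your_project>/software/my-env
source /project/<your_project>/software/my-env/bin/activate

# Install the package you need into that environment
pip install <package_name>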

3b. Transfer Data on/off HPC

Data transfer can be performed between HPC resources using a number of methods. The two easiest ways to transfer data are detailed below. A cumulative list of methods to transfer data on or off of ARCC resources is detailed here.

Option 1: Southpass

 Southpass Directions


Option 2: Globus (For big data transfers)

 Globus Configuration Directions

Configuring Globus Online

  1. Login to Globus' Web app.

    1. Click “Login” on the top right corner of the webpage.

    2. To use your UWYO organizational login, search for ‘University of Wyoming’. It should autofill as you type.

      1. Hit 'Continue' to continue setting up your Globus account, then click 'Allow' to allow Globus to search data using your ID and groups, and to manage transfers.

    3. Note that sign-up and setup may be skipped if you’ve already logged into Globus and your session is still cached.

  2. On the left side of your browser window will be a file manager menu.

  3. Select this menu, then type uw-arcc in the Collection field to pull up the ARCC storage spaces you have access to. ARCC manages several data storage endpoints, so be sure you pick the one associated with your storage and HPC cluster (if applicable). You may have access to multiple ARCC endpoints/collections.

    1. MedicineBow and Data-Alcova are accessible under the GCSv5.4 endpoint named Medicine Bow.

      1. MedicineBow data (/gscratch, /home, and /project) are under /cluster/medbow

      2. Alcova storage (aka new Alcova) is under path /cluster/alcova

    2. Alcova (aka the Old Alcova) is designated with the GCSv5.4 endpoint named Alcova FileSystem Access.

    3. Beartooth is designated with GCSv5.4 endpoint named TetonCreek/Beartooth.

      1. Older Beartooth storage will be available on the TetonCreek/Beartooth Collection, and clicking on the link will take you to your /~/ home directory.

        1. Putting / as the path will display all shares you have access to. You should be able to access your project, home and gscratch directories from this collection.

    4. Pathfinder is designated with the GCSv5.4 endpoint named Pathfinder S3 Access. You will need to set up S3 keys for access through Globus.

  4. Note: Recently migrated Globus endpoints will be set as “Managed Mapped Collection (GCS)” and you should use these endpoints to access ARCC resources through Globus. If you have mapped private collections (those shared from your personal computer or elsewhere) those will be set as “Private Mapped Collections” (GCP). Older collections will be mapped as GCSv4 Shares.

  5. When you get to the folder you’re looking for, you can save it for future access by clicking the bookmark icon to the right of the Path text box.

  6. To return to a saved location, go to the Bookmarks option on the far left side of the screen.

  7. If you wish to copy, sync, or share, select the files/folders you want, choose a location or e-mail address to transfer or share with, and click ‘Start’.

  8. Click ‘Activity’ in the left pane to observe the transfer progress.

3c. View Visual Data or Access HPC with Graphics / Visual Interface

If you want to view visual output you’ve created on Beartooth or just need access to a GUI (Graphical User Interface), please use Southpass. Pages have been created for accessing Beartooth and Wildiris in a graphical user interface.
