Getting Started with ARCC

Steps to get started in HPC with ARCC:

1: Get an ARCC HPC account by being added to an HPC project

To access an ARCC HPC resource, you must be added to a project on that resource, whether you’re a UWyo faculty member (Principal Investigator; PI), researcher, or student.

If you’ve received an e-mail from arcc-admin@uwyo.edu indicating that you’ve been added to a project, you already have access to the HPC cluster. Otherwise, you can be added to a project in one of two ways:

  1. If you are a PI, you may request that a project be created. You will automatically be listed as its PI and added as a member.

  2. If you are not a PI, a PI may request that a project be created on your behalf and then request that you be added as a member, or simply request that you be added to a project they already have.

2: Log into HPC

ARCC HPC users should be aware that accessing our HPC means accessing a Linux environment. This will be different from a Windows/Mac PC.

  1. If you’ve received e-mails from arcc-admin granting you access to a project, you’re ready to connect and log in to the cluster!

  2. Go to the OnDemand resource for the cluster you’re trying to access. OnDemand makes HPCs available through your web browser.

    1. On MedicineBow, this is MedicineBow OnDemand, available at https://medicinebow.arcc.uwyo.edu

    2. On Beartooth, this is Southpass, available at https://southpass.arcc.uwyo.edu

  3. Or log in over SSH

    1. If you prefer to log into the cluster over SSH/command line, the directions depend on the client you’re connecting from and the HPC resource you’re accessing.

      1. On MedicineBow, you must configure your client to log in using an SSH key and certificate. Information for configuring keys is provided here.

      2. On Beartooth, see this page to log in with SSH.

    2. In your command line window, type the following command, replacing <your_username> and <cluster_name> with your own values:

        ssh <your_username>@<cluster_name>.arcc.uwyo.edu

    3. When connected, a bunch of text will scroll by. This will vary depending on the cluster. On Beartooth, for example, there are usage rules, tips, and a summary of your storage utilization across all projects that you are part of.

    4. Upon login to the HPC, the command prompt will look something like this: [arccuser@blog1 ~]$. To learn more about the command prompt and command line, please look through our documentation on Command Line Interface.

3: Start Processing

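While processing, you may also need to get access to software, transfer data on or off the HPC, or view visual output through a graphical interface. These tasks are covered in sections 3a, 3b, and 3c below.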

A key principle of any shared computing environment is that resources are shared among users and therefore must be scheduled. Please DO NOT simply log into the HPC and run your computations without requesting or scheduling resources from Slurm through a batch script or Interactive job.

ARCC uses the Slurm Workload Manager to regulate and schedule user-submitted jobs on our HPC systems. For your job to be submitted properly to Slurm, you must at minimum specify your account and a time limit in your submission. There are two ways to run your work on ARCC HPC systems from the command line:

Option 1: Run it as an Interactive Job

These are jobs that give users access to compute nodes where applications can be run in real time. This may be necessary when performing heavy processing of files or compiling large applications. Interactive jobs are requested with the salloc command. ARCC has configured the clusters so that interactive jobs provide shell access on the compute nodes themselves rather than running on the login node.

The following is the simplest example of a command to start an interactive job. This command has the bare minimum information (account and time) in order to run any job on an ARCC cluster:

[cowboyjoe@hpclog1 ~]$ salloc --account=<your project name> --time=01:00

Breaking it down:

  1. [cowboyjoe@hpclog1 ~]$ is our command prompt. It shows our username (cowboyjoe), the node we're currently working on (hpclog1), and the directory we're in on the HPC (~, shorthand for our home directory). To learn more about the command line in Linux, see our Linux Command Line tutorial here.

  2. salloc is the Slurm command to allocate a work session on the cluster.

  3. --account is a flag specifying the account/project under which you’re performing your work in the session.

  4. --time is a flag specifying your “walltime limit”, which is how long you will have access to the HPC resources you’re requesting in your work session. At the end of this time, you will be disconnected from your requested resources.

    1. Syntax for time may be in the form of “minutes”, “minutes:seconds”, “hours:minutes:seconds”, “days-hours”, “days-hours:minutes”, or “days-hours:minutes:seconds”.

    2. On ARCC HPC, the maximum time you may request for any job is 7 days (job runtime may be extended, upon request).

  5. In this basic example, aside from time requested and account used, all allocated resources are set to the default.

    1. Total CPUs/cores available for this session is set to 1 by default.

    2. Total nodes we have access to in our session is set to 1 by default.

    3. Total memory allocated to the session is 1GB since by default we are allocated 1GB RAM per CPU in our request.
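To make the time formats concrete, here is a small bash sketch that converts a --time value into seconds. Read this way, the --time=01:00 in the example above is in minutes:seconds form and requests one minute, while a value such as 40:00 means 40 minutes. slurm_time_to_seconds and within_limit are hypothetical helpers written only for this illustration; they are not Slurm or ARCC commands, and Slurm does its own parsing of --time.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not a Slurm or ARCC command): convert a --time value
# to seconds using the six formats listed above.
slurm_time_to_seconds() {
  local t="$1" days=0 h=0 m=0 s=0
  local -a f
  if [[ "$t" == *-* ]]; then
    # days-hours, days-hours:minutes, days-hours:minutes:seconds
    days="${t%%-*}"
    IFS=: read -r -a f <<< "${t#*-}"
    h="${f[0]}"; m="${f[1]:-0}"; s="${f[2]:-0}"
  else
    IFS=: read -r -a f <<< "$t"
    case "${#f[@]}" in
      1) m="${f[0]}" ;;                            # minutes
      2) m="${f[0]}"; s="${f[1]}" ;;               # minutes:seconds
      3) h="${f[0]}"; m="${f[1]}"; s="${f[2]}" ;;  # hours:minutes:seconds
      *) echo "unrecognized time: $t" >&2; return 1 ;;
    esac
  fi
  # 10# forces base 10 so zero-padded values such as "08" are not read as octal
  echo $(( ((10#$days * 24 + 10#$h) * 60 + 10#$m) * 60 + 10#$s ))
}

# The 7-day maximum stated above, in seconds
MAX_SECONDS=$(( 7 * 24 * 60 * 60 ))

# Succeeds only if the requested time is within the 7-day maximum
within_limit() {
  local secs
  secs=$(slurm_time_to_seconds "$1") || return 1
  (( secs <= MAX_SECONDS ))
}

slurm_time_to_seconds 01:00        # 60    (one minute)
slurm_time_to_seconds 40:00        # 2400  (40 minutes)
slurm_time_to_seconds 1-02:15:45   # 94545 (1 day, 2 hours, 15 minutes, 45 seconds)
```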

The next example is a set of commands. The first line is a command to allocate an interactive job requesting specific hardware to perform the computations in our session. The second line runs a python script:

[cowboyjoe@hpclog1 ~]$ salloc --account=arcc --time=40:00 --partition=moran --nodes=1 --cpus-per-task=8 --mem=8G
[cowboyjoe@node1 ~]$ python my_job_sequential_steps.py

Breaking it down:

  1. --account is a flag specifying the account/project under which you’re performing your work in the session. In this case, the project’s name is arcc.

  2. --time is a flag specifying the “walltime limit” for this job, in this case 40 minutes.

  3. --partition is a flag specifying which partition we want our nodes to come from for this session. You can view the partitions and learn about hardware specific to different partitions by viewing the hardware summary associated with the HPC you’re using. Since our overall computational needs are not significant, we requested the moran partition. These nodes have 16 cores/node and at least 32GB of RAM/node.

  4. --nodes is a flag telling Slurm how many compute nodes we need available to us to run the computations we want to run in the session. In this example, we only ask for 1.

  5. --cpus-per-task is a flag specifying how many cores/CPUs we need available to run any single task. By default, ntasks (the number of tasks we run concurrently) is set to 1. Since our salloc command didn't specify --ntasks or any other "tasks-per" related parameters, the total number of CPUs for the requested session is what we requested with the --cpus-per-task flag. In this example, 8 cores x (default) 1 task run concurrently = 8 cores total.

  6. --mem is a flag to request the minimum memory/RAM per node that we’ll need for our job. Above we requested 8G, so 8GB. This is under the 32GB of RAM/node for the partition (moran) we asked for in our salloc command, so Slurm accepts this memory request.

    1. The --mem value should be followed by a unit suffix (G for GB).

    2. If no unit suffix is given and only an integer is provided, the default unit is M (megabytes).

  7. In response, Slurm assigns us a single node (node1), with access to a total of 8 cores and 8GB RAM for 40 minutes.

salloc: Granted job allocation 1012024
salloc: Nodes node1 are ready for job

  8. Once Slurm grants our requested resources, we use them to run a python script named ‘my_job_sequential_steps.py’ with the default version of python installed on the cluster. If the work needs more hardware resources than we asked for in our salloc command, we may get an error (such as an “oom-kill” or out-of-memory error, indicating we didn’t request enough RAM for our session), or the job may run for an extremely long time (which may mean we didn’t request enough CPUs and/or didn’t divide up the computational work appropriately in the script).
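As a rough illustration of items 5 and 6 above, here is a bash sketch of the arithmetic: total cores are --cpus-per-task times the number of tasks, and a --mem value is read as megabytes unless it carries a unit suffix. mem_to_mb and total_cpus are hypothetical helpers written for this example, not Slurm or ARCC commands. They assume an uppercase M, G, or T suffix and, as Slurm does, a factor of 1024 between units; the 16 cores and 32GB per moran node come from the partition description above.

```shell
#!/usr/bin/env bash
# Hypothetical helpers illustrating the resource arithmetic described above;
# Slurm does this bookkeeping itself.

# Convert a --mem value to megabytes. A bare integer is already MB;
# G and T scale by 1024 and 1024*1024.
mem_to_mb() {
  local v="$1"
  case "$v" in
    *T) echo $(( 10#${v%T} * 1024 * 1024 )) ;;
    *G) echo $(( 10#${v%G} * 1024 )) ;;
    *M) echo $(( 10#${v%M} )) ;;
    *)  echo $(( 10#$v )) ;;          # no suffix: megabytes
  esac
}

# Total cores = cores per task x concurrent tasks (ntasks defaults to 1).
total_cpus() {
  local cpus_per_task="$1" ntasks="${2:-1}"
  echo $(( cpus_per_task * ntasks ))
}

# The second salloc example: --cpus-per-task=8 and --mem=8G on a moran node
cpus=$(total_cpus 8)          # 8 cores x 1 task = 8
mem=$(mem_to_mb 8G)           # 8192 MB
node_mem=$(mem_to_mb 32G)     # moran: at least 32GB per node
node_cores=16                 # moran: 16 cores per node

if (( mem <= node_mem && cpus <= node_cores )); then
  echo "request fits: ${cpus} cores, ${mem} MB"   # prints: request fits: 8 cores, 8192 MB
fi
```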

Option 2: Run it as a Batch Job

A batch job runs one or more tasks on a computing environment. Batch jobs are initiated using scripts or command-line parameters and run to completion without further human intervention (fire and forget). They are submitted to a job scheduler (on ARCC HPC, Slurm) and run on the first available compute node(s).

In the following example, we create our own batch script, which Slurm then runs to execute our job and any associated tasks. Below is an example batch script named myfirstjob.sh that runs our computational work in a python script named my_job_sequential_steps.py:
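A minimal sketch of myfirstjob.sh, assembled from the options described in the breakdown that follows. The job name and e-mail address are placeholders (assumptions, not values from the original script), and the order of the #SBATCH lines is arbitrary.

```shell
#!/bin/bash
#SBATCH --account=myproject
#SBATCH --time=1-02:15:45
#SBATCH --partition=moran
#SBATCH --mem=8G
# Placeholder job name: choose your own
#SBATCH --job-name=myfirstjob
#SBATCH --mail-type=ALL
# Placeholder: replace with your own e-mail address
#SBATCH --mail-user=<your_email>

# Run the computational work with the cluster's default python
python my_job_sequential_steps.py
```

Plain # lines between #SBATCH directives are ordinary comments; Slurm only reads #SBATCH lines that appear before the first command in the script.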

Breaking it down:

  1. #!/bin/bash is the “shebang” line, telling the system to use the bash shell to interpret the script.

  2. --account is a flag specifying the account/project under which you’re performing your work in the session. Here the account we’re using is myproject.

  3. --time is a flag specifying your “walltime limit”. This is how long the script can run on HPC resources once it begins. At the end of this time, the script will end regardless of whether computations are completed. The example sets the total time for our job to 1-02:15:45: 1 day, 2 hours, 15 minutes, 45 seconds.

    1. Syntax for time may be in the form of “minutes”, “minutes:seconds”, “hours:minutes:seconds”, “days-hours”, “days-hours:minutes”, or “days-hours:minutes:seconds”.

    2. On ARCC HPC, the maximum time you may request for any job is 7 days (job runtime may be extended, upon request).

  4. --partition is a flag specifying which nodes we want to use for our job. You can view the partitions and learn about hardware specific to different partitions by viewing the hardware summary associated with the HPC you’re using. Since our overall computational needs are not significant, we requested to run on the moran partition. These nodes have 16 cores/node and at least 32GB of RAM/node.

  5. --mem is a flag to request the minimum memory/RAM per node that we’ll need for our job. Above we specify 8G, so 8GB. This is under the 32GB of RAM/node for the partition (moran) we asked for in our batch script, so Slurm accepts this memory request.

    1. The --mem value should be followed by a unit suffix (G for GB).

    2. If no unit suffix is given and only an integer is provided, the default unit is M (megabytes).

  6. --job-name is a flag to specify the name of our job allocation. This will appear with the job id number if we query running jobs on the HPC.

  7. --mail-type is a flag to specify which job events should trigger notification e-mails. Setting it to ALL means notification e-mails will be sent when a job begins, ends, fails, hits time limits, never runs due to problems with the request, or gets requeued.

  8. --mail-user is a flag specifying the e-mail address to notify when job events occur.

  9. In this basic example, aside from the options listed above, all allocated resources are set to the default.

    1. Total CPUs/cores available for this session is set to 1 by default.

    2. Total nodes we have access to in our session is set to 1 by default.

    3. Total tasks we will run concurrently is set to 1 by default.

  10. In the last line we run a python script named ‘my_job_sequential_steps.py’ using the default version of python installed on the cluster.

Assuming our batch script and the python script are complete and ready to run, we log into the HPC, navigate to the directory containing our script, and submit the job to Slurm with the sbatch command (sbatch myfirstjob.sh).

Since this batch script first makes a request to Slurm to schedule our job and allocate resources before performing any computations, we can submit it on the login node.
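A sketch of submitting the script from the login node and then checking on it. cowboyjoe is the example username used earlier on this page, and the job ID Slurm assigns will differ for every submission.

```shell
# Submit the batch script; sbatch replies with "Submitted batch job <job id>"
sbatch myfirstjob.sh

# List your pending and running jobs; the NAME column shows the --job-name value
squeue -u cowboyjoe
```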

To learn more about running parallel jobs, running jobs with GPUs, and avoiding common issues, see our SLURM tutorial.

3a. Get access to software

Option 1: Use The Module System

LMOD is a module system used on HPC clusters to maintain a number of dynamic user environments, allowing users to switch between software stacks and packages on HPC resources. You can check whether software is available as a module by running module spider followed by the software’s name.
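A typical module workflow might look like the following. python is only an example name: actual module names, versions, and any prerequisite modules on ARCC clusters may differ, and module spider reports what is actually installed and how to load it.

```shell
# Search all installed modules whose name matches "python"
module spider python

# Load the default version of that module into the current shell
module load python

# Confirm which modules are now loaded
module list
```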

If you have a software package that is not installed as a module but you think it would be widely utilized, submit a request to see whether it can be installed. Learn more about using LMOD here.

Option 2: Install it Yourself

If your software packages are somewhat research-specific, you may install them to your project. ARCC will be providing an additional allocation of 250GB in every MedicineBow /project directory under /project/for software installations. How to install software on your own varies depending on the software. General instructions may be found here.

3b. Transfer Data on/off HPC

Data transfer can be performed between HPC resources using a number of methods. The two easiest ways to transfer data are detailed below. A comprehensive list of methods for transferring data on or off ARCC resources is available here.

Option 1: Southpass

Option 2: Globus (For big data transfers)

 

3c. View Visual Data or Access HPC with Graphics / Visual Interface

If you want to view visual output you’ve created on Beartooth or just need access to a GUI (Graphical User Interface), please use Southpass/OnDemand. Pages have been created for accessing Beartooth in a graphical user interface.