ARCC uses the Slurm Workload Manager to schedule and manage user-submitted jobs on our HPC systems. Unless otherwise noted, if you’re running a job, Slurm is managing the resources. There are two primary ways of doing work on an ARCC HPC system:
Batch Processing / Batch Jobs: these involve the execution of one or more tasks in a computing environment. Batch jobs are initiated using scripts or command-line parameters and run to completion without further human intervention (fire and forget). Batch jobs are submitted to a job scheduler (such as Slurm) and run on the first available compute node(s).
Serial: These use only one core for the process. Tasks/jobs run one after another, in series.
Parallel: These can utilize multiple cores on a single machine, or even multiple cores across multiple machines. Tasks/jobs can run simultaneously across multiple cores and/or nodes.
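The difference can be sketched in plain bash (the task names and echo commands below are illustrative placeholders, not ARCC commands): a serial workflow runs each task to completion before starting the next, while a parallel one launches tasks in the background and waits for all of them.

```shell
#!/bin/bash

# Serial: each task finishes before the next one starts.
for task in A B C; do
    echo "serial task $task"
done

# Parallel: launch all tasks in the background at once,
# then wait until every one of them has finished.
for task in A B C; do
    echo "parallel task $task" &
done
wait
```

On a cluster, Slurm plays the role of the scheduler that decides where those serial or parallel tasks actually run.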
Interactive Session: At ARCC, interactive sessions run on the login node. This is a simple way to become familiar with the computing environment and to test your code before attempting a long production run as a batch job. The login nodes have finite resources, and over-utilization can impact other users; users found to be running intensive or batch jobs on login nodes will receive a warning and have their jobs canceled.
Let’s look at a bash command “Hello World” vs an interactive “Hello World” vs a batch job “Hello World”:
If, at the command prompt, you type echo "Hello world" and press Enter:
[arccuser@tlog1 ~]$ echo "Hello world"
you’ll see results similar to these:
Hello world
[arccuser@tlog1 ~]$
These are jobs that provide shell access to compute nodes, where applications can be run interactively, files can be heavily processed, or large applications can be compiled. They can be requested with arguments similar to those of batch jobs. ARCC has configured the clusters such that Slurm interactive allocations give shell access on the compute nodes themselves rather than keeping the shell on the login node. Use the salloc (session allocation) command to launch interactive jobs. Executing this from the command line:
$ salloc --account=<insert the name of your project> --time=01:00
should return something like this:
salloc: Pending job allocation <job#>
salloc: job <job#> queued and waiting for resources
salloc: job <job#> has been allocated resources
salloc: Granted job allocation <job#>
[arccuser@m045 ~]$
Note that you’re now “on” a compute node instead of a login node. Now run:
[arccuser@m045 ~]$ echo "Hello world"
which returns results similar to these:
Hello world
[arccuser@m045 ~]$
The exercise above is a very basic example of an interactive job. The value of interactive jobs is that they allow users to work interactively at the CLI, or to make interactive use of debuggers (ddt, gdb), profilers (map, gprof), or language interpreters such as Python, R, or Julia. Please see https://arccwiki.atlassian.net/wiki/spaces/DOCUMENTAT/pages/1502150689 to learn more about running GUI interactive jobs on ARCC resources, and check out the Slurm Workload Manager pages to learn more about Slurm and its options.
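As an illustration, an interactive allocation can be used to run an interpreter directly on a compute node. The session below is a sketch, not a verbatim transcript: the one-hour time limit is arbitrary, and whether python is already on your path or first requires a module load depends on the cluster’s configuration.

```
$ salloc --account=<insert the name of your project> --time=01:00:00
[arccuser@m045 ~]$ python
>>> print("Hello world")
Hello world
>>> exit()
[arccuser@m045 ~]$ exit
```

Typing exit in the compute-node shell ends the session and releases the allocation back to Slurm.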
Now imagine you have to run the “Hello World” command, or something much more complex, a thousand times. Nobody wants to type that simple text string a thousand times. Let’s write a batch script to do that for us. Using a text editor, create a file named hello.sh with the following contents (if you’re unfamiliar with shell scripting or working with the command line, see https://arccwiki.atlassian.net/wiki/spaces/DOCUMENTAT/pages/1596194853 for more information):
#!/bin/bash
#SBATCH --account=<insert the name of your project>
#SBATCH --time=00:01:00
echo "Hello World!"
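As written, the script prints the message only once. To actually repeat it, say, a thousand times, the single echo line can be replaced with an ordinary shell loop; a minimal sketch in plain bash:

```shell
#!/bin/bash
# Repeat the greeting 1000 times; in hello.sh this loop would
# replace the single echo line beneath the #SBATCH directives.
for i in $(seq 1 1000); do
    echo "Hello World! (run $i)"
done
```

The #SBATCH directives stay the same either way; Slurm only cares about the resource requests, not what the script body does with them.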
Save and exit the editor. Now run:
[arccuser@m045 scripts]$ sbatch hello.sh
and note the output:
Submitted batch job <job#>
Where’s the Hello World? Type ls to get a listing of the folder contents. In this example, there are only the hello.sh script and an output file with your job number embedded in its name:
Let’s take a look at the contents of the output file:
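By default, Slurm writes a job’s standard output to a file named slurm-<job#>.out in the directory the job was submitted from, so its contents can be viewed with cat (the job number in your filename will differ):

```
[arccuser@m045 scripts]$ cat slurm-<job#>.out
Hello World!
```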