HPC System and Job Queries

1 Overview: HPC Information and Compute Job Information
2 Common SLURM Commands
3 ARCCJOBS: Get a report of jobs currently running on the cluster
4 ARCCQUOTA: Get a report of your common HPC data storage locations and usage

Overview: HPC Information and Compute Job Information

Querying the system helps you understand what is happening on the cluster: which compute jobs are running, how much of your storage quota is in use, what your job history looks like, and so on. This page lists commands, with examples, for finding that information.

Common SLURM Commands

The following describes common SLURM commands and the flags you may want to include when running them. Flags are appended to the command, either in short form (`-f`) or long form (`--flag`), to specify what information the output should include.

SQUEUE: Get information about running and queued jobs on the cluster with `squeue`

This command shows information about the jobs that currently exist in the SLURM queue. Run with no flags, it prints all running and queued jobs on the cluster, listing each job's ID, partition, name, username, state, elapsed time, number of nodes, and node list (the names of the nodes allocated to the job, or the reason the job is still pending):

squeue
             JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
           1000001  inv-arcc myjob_11     user5  R 2-15:39:34      1 mba30-005
           1000002  inv-lab2  AIML-CE   joeblow  R 6-13:02:32      1 mba30-004
           1000005  inv-lab2  AIML-CE   joeblow  R 6-17:31:53      1 mba30-004
           1000012        mb interact cowboyjoe  R 2-21:28:49      1 mbcpu-010
           1000015        mb sys/dash    jsmith  R    1:05:19      1 mbcpu-001
           1000019    mb-a30 sys/dash  janesmit  R    8:45:36      1 mba30-006
           1000022    mb-a30 Script.s   doctorm PD       0:00      1 (Resources)
           1000025    mb-a30 Script.22  doctorz  R    7:05:44      1 mba30-001
           1000028   mb-h100 sys/dash    mmajor PD       0:00      1 (Resources)
           1000033   mb-h100 sys/dash    mmajor PD       0:00      1 (Priority)
           1000037   mb-h100 sys/dash  kjohnson PD       0:00      1 (Priority)
           1000041   mb-h100 sys/dash  kjohnson PD       0:00      1 (Priority)
           1000045   mb-h100 sys/dash    mmajor  R 2-02:25:37      1 mbh100-003
           1000058   mb-l40s Script.se  doctorz  R 1-00:58:25      1 mbl40s-003
           1000062     teton C1225-TT    user17  R 3-19:54:48      1 t507
           1000065     teton C1225-TT    user17  R 4-17:36:26      1 t502
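
As a rough illustration of working with this default output, the sketch below counts jobs per state code (the `ST` column). The function name `count_jobs_by_state` is hypothetical, and the sketch assumes job names contain no whitespace, so that `ST` is always the fifth field.

```shell
# Hypothetical helper: count jobs per state code from default `squeue`
# output supplied on stdin.
# Assumption: job names contain no whitespace, so ST is the 5th field.
count_jobs_by_state() {
    # NR > 1 skips the header row; NF >= 8 skips blank or short lines.
    awk 'NR > 1 && NF >= 8 { count[$5]++ }
         END { for (s in count) print s, count[s] }' | sort
}
```

On a login node this could be used as `squeue | count_jobs_by_state`, which prints one line per state code (for example `PD` for pending and `R` for running) followed by the number of jobs in that state.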

Helpful flags when calling `squeue` to tailor your query

Flag	Use this when	Short Form	Short Form Ex.	Long Form	Useful flag info, Long Form Example & Output
me	To get a printout with just your jobs	n/a	n/a	`--me`	The `--me` flag prints squeue info only for jobs submitted by you: `[jsmith@mblog1 ~]$ squeue --me JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON) 1000002 inv-lab2 AIML-CE jsmith R 6-13:02:32 1 mba30-004 1000005 inv-lab2 AIML-CE jsmith R 6-17:31:53 1 mba30-004`
user	To get a printout of a specific user’s jobs	`-u`	`squeue -u joeblow`	`--user`	The `--user` or `-u` flag, followed by a username, prints squeue info only for jobs submitted by that user: `[jsmith@mblog1 ~]$ squeue --user=joeblow JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON) 1000002 inv-lab2 AIML-CE joeblow R 6-13:02:32 1 mba30-004 1000005 inv-lab2 AIML-CE joeblow R 6-17:31:53 1 mba30-004`
long	To get a printout of jobs including wall time	`-l`	`squeue -l`	`--long`	The `--long` flag prints the default squeue columns plus the wall time requested for each job.
format	To get squeue printout with specified format & output	`-O`	`squeue -O Account,UserName,JobID,SubmitTime,StartTime,TimeLeft`	`--Format`	With the `--Format` (`-O`) flag, `squeue` prints only the columns you name, in the order given. Name the columns using the field names SLURM recognizes (hint: run `squeue --helpFormat` to get a list of them). Note that the lowercase `-o` / `--format` flag is a different option: it expects `%`-style specifiers such as `%i` rather than field names.

** You can also run `squeue --help` to get a comprehensive list of the flags available for the squeue command.
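
The flags above can be combined. The sketch below is one possible way to summarize why your own queued jobs are still waiting. The function name `my_pending_reasons` is hypothetical; the sketch relies on the standard `squeue` options `-h` (no header), `-t` (filter by state), and `-o` with the `%r` reason specifier.

```shell
# Hypothetical helper: tally the reasons your pending jobs are waiting.
#   --me        only your jobs
#   -h          no header line
#   -t PENDING  only queued (pending) jobs
#   -o "%r"     print just the reason each job is pending
my_pending_reasons() {
    squeue --me -h -t PENDING -o "%r" |
        awk '{ n[$0]++ } END { for (r in n) print n[r], r }' |
        sort -rn
}
```

Calling `my_pending_reasons` prints one line per reason, most frequent first, each prefixed by the number of jobs waiting for that reason.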

SACCT: Get information about recent or completed jobs on the cluster with `sacct`

The default `sacct` command: Run with no flags, `sacct` prints a list of your recent or recently completed jobs (by default, those since midnight of the current day).

Helpful flags when calling `sacct` to tailor your query

Flag	Use this when	Short Form	Short Form Ex.	Long Form	Useful flag info, Long Form Example & Output
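job	To get info about specific job#(s)	`-j`	`sacct -j 1000013`	`--jobs`	The `--jobs` or `-j` flag prints sacct info only for the specified job ID(s). Separate multiple job IDs with commas.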
batch script	To view batch / submission script for a specific job	`-B`	`sacct -j 1000101 -B`	`--batch-script`	You must specify a job with the `--jobs` or `-j` flag to use the `-B` or `--batch-script` flag and see its associated batch / submission script. This will not work for interactive jobs run from an `salloc` command, or for jobs that were not submitted from a script.
user	To get a printout of a specific user’s jobs	`-u`	`sacct -u joeblow`	`--user`	The `--user` or `-u` flag, followed by a username, prints sacct info only for jobs submitted by that user.
start	To get a printout of job(s) starting after a date/time	`-S`	`sacct -S 2024-11-01`	`--start`	Dates and times should be specified with the format `YYYY-MM-DD[THH:MM[:SS]]`, for example `2024-11-01` or `2024-11-01T08:00`
end	To get a printout of job(s) ending before a given date/time	`-E`	`sacct -E 2024-11-24T12:00:00`	`--end`	Dates and times should be specified with the format `YYYY-MM-DD[THH:MM[:SS]]`
format	To get sacct printout with specified format & output	`-o`	`sacct -o Account,JobID`	`--format`	With the `--format` (`-o`) flag, `sacct` prints only the columns you name, in the order given. Name the columns using the field names SLURM recognizes (hint: run `sacct --helpformat` to get a list of them)
submit line	To view the submit command for a specified job	`-o SubmitLine`	`sacct -o SubmitLine -j 1000101`	`--format=SubmitLine`	This uses the `--format` flag from above to print the command you entered to submit the job specified after the `-j` flag.
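WorkDir	To view the working directory used by the job to execute commands	`-o WorkDir`	`sacct -o WorkDir -j 1000101`	`--format=WorkDir`	This also uses the `--format` flag, here to print the working directory of the job specified after the `-j` flag.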
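
As a sketch of how these flags combine, the function below lists your jobs since a given start time that are neither completed, running, nor pending. The function name `problem_jobs_since` is hypothetical. Besides `-S` and `-o` from the table, it assumes the standard `sacct` options `-X` (one line per job, no job steps), `-n` (no header), and `-P` (fields separated by `|`).

```shell
# Hypothetical helper: list your jobs since a start time that did not
# finish cleanly (state is not COMPLETED, RUNNING, or PENDING).
#   -X  one line per job allocation (job steps are omitted)
#   -n  no header line
#   -P  parsable output, fields separated by "|"
# $1 is the start time, e.g. 2024-11-01 or 2024-11-01T08:00
problem_jobs_since() {
    sacct -X -n -P -S "$1" -o JobID,JobName,State,ExitCode |
        awk -F'|' '$3 != "COMPLETED" && $3 != "RUNNING" && $3 != "PENDING"'
}
```

For example, `problem_jobs_since 2024-11-01` prints the job ID, name, state, and exit code of each matching job, one job per line.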

My Job Failed. What Do these Exit Codes Mean?

Slurm records exit codes as numerical values that can seem rather cryptic. Without investigation we can't always be sure what caused a given code, but some causes are more likely than others. An exit code appears either as two numbers separated by a colon or as a single number. Common exit codes and their likely causes are listed below:

Exit Code	Likely Cause
0	The job ran successfully
Any non-zero value	The job failed in some form or another
1	A general failure
2	Something was wrong with a shell command in the script
3 and above	Job error associated with software commands (check software specific exit codes)
0:9	The job was cancelled (usually by the user or by Slurm/the system)
0:15	The job was cancelled (usually because the user cancelled it, or because it ran over its specified walltime)
0:53	Some file or directory referenced in the script was not readable or writable
0:125	Job ran out of memory
Anything else	Contact arcc-help@uwyo.edu to have us investigate
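
The table above can be turned into a small lookup, sketched below. The function name `explain_exit_code` is hypothetical, and treating a code of the form `N:0` the same as the single number `N` is an assumption made for this sketch, not something the table states.

```shell
# Hypothetical helper: map an exit code to its likely cause, following
# the table above.
# Assumption: a code of the form "N:0" is treated like the single number N.
explain_exit_code() {
    code=${1%:0}   # strip a trailing ":0", so "1:0" becomes "1"
    case "$code" in
        0)     echo "The job ran successfully" ;;
        1)     echo "A general failure" ;;
        2)     echo "Something was wrong with a shell command in the script" ;;
        0:9)   echo "The job was cancelled (usually by the user or by Slurm/the system)" ;;
        0:15)  echo "The job was cancelled (by the user, or it ran over its walltime)" ;;
        0:53)  echo "A file or directory referenced in the script was not readable or writable" ;;
        0:125) echo "Job ran out of memory" ;;
        ''|*[!0-9]*) echo "Contact arcc-help@uwyo.edu to have us investigate" ;;
        *)     echo "Job error associated with software commands" ;;
    esac
}
```

For example, `explain_exit_code 0:125` prints `Job ran out of memory`, and any plain number of 3 or above falls through to the software-error message.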

** You can also run `sacct --help` to get a comprehensive list of the flags available for the sacct command.

SINFO: Get information about cluster nodes and partitions

The default `sinfo` command: Run with no flags, `sinfo` prints a list of all partitions on the cluster, with their availability, states, and associated nodes.

Helpful flags when calling `sinfo` to tailor your query

Flag	Use this when	Short Form	Short Form Ex.	Long Form	Useful flag info, Long Form Example & Output
state	Shows any nodes in state(s) specified	`-t`	`sinfo -t reserved`	`--states`	The `--states` flag prints sinfo output listing the nodes (if any) in the specified state, along with the number of nodes from each partition in that state. If no nodes in a partition are in the state, the number of nodes will be 0 for that partition’s line.
format	To get sinfo printout with specified format & output	`-O`	`sinfo -O NodeAddr,AllocatedMem,Cores`	`--Format`	If appended with the `--Format` flag, `sinfo` info is given using specified format & output. Format should be indicated using column names recognized by SLURM (hint: run `sinfo --helpFormat` to get a list of SLURM’s recognized column names)
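
As one possible use of these flags, the sketch below totals idle nodes per partition. The function name `idle_nodes_by_partition` is hypothetical. It uses the lowercase `-o` option with the standard `%P` (partition) and `%D` (node count) specifiers, plus `-h` to suppress the header.

```shell
# Hypothetical helper: total the number of idle nodes in each partition.
#   -h          no header line
#   -t idle     only nodes in the idle state
#   -o "%P %D"  print partition name and node count
idle_nodes_by_partition() {
    sinfo -h -t idle -o "%P %D" |
        awk '{ idle[$1] += $2 } END { for (p in idle) print p, idle[p] }' |
        sort
}
```

Running `idle_nodes_by_partition` prints one line per partition with its idle node count, which can help when deciding where a job is likely to start soonest.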

SEFF: Analyze the efficiency of a completed job with `seff`

This section gives only a short breakdown of how to use the `seff` command. Please see this page for a detailed description of how to evaluate your job’s performance and efficiency.

The `seff` command reports the CPU and memory efficiency of a job when given a valid job number as its argument: `seff <job#>`. The report is only accurate for jobs that completed successfully. For jobs that are still running, or that ended with an out-of-memory error or other errors, the seff output will be inaccurate.
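
Because `seff` is only reliable for successfully completed jobs, one option is to check the job's state first. The sketch below does this with `sacct`. The function name `seff_if_completed` is hypothetical, and the state check is a suggestion rather than part of `seff` itself.

```shell
# Hypothetical helper: run seff only when sacct reports the job as COMPLETED.
# $1 is the job number.
seff_if_completed() {
    # -X: one line per job, -n: no header, -P: parsable output
    state=$(sacct -X -n -P -j "$1" -o State | head -n 1)
    if [ "$state" = "COMPLETED" ]; then
        seff "$1"
    else
        echo "Job $1 is in state '${state:-unknown}'; seff output would not be reliable." >&2
        return 1
    fi
}
```

For example, `seff_if_completed 1000101` prints the usual seff report if the job completed, and otherwise prints a short message to standard error and returns a non-zero status.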

ARCCJOBS: Get a report of jobs currently running on the cluster

`arccjobs` shows a summary of jobs, CPU resources, and requested/used CPU time. It doesn't take any arguments or options.

ARCCQUOTA: Get a report of your common HPC data storage locations and usage

`arccquota` shows information relating to storage quotas. By default, it displays $HOME and $SCRATCH quotas first, followed by the user's associated project quotas. This is a change on Teton compared to Mount Moran, and the tool is much more comprehensive. The command takes arguments to show project-only output (i.e., no $HOME or $SCRATCH info displayed), to give an extensive listing of users' quotas and usage within project directories, or to summarize quotas (i.e., no user-specific usage on project spaces).
