...
Flag | Use this when | Short Form | Short Form Ex. | Long Form | Useful flag info, Long Form Example & Output |
---|
job | To get info about specific job#(s) | -j
| sacct -j 1000013
| --jobs
| Expand |
---|
title | Expand to see an example of running sacct with --jobs flag |
---|
[user05@mblog1 ~]$ sacct --jobs=1000013,1000025
JobID JobName Partition Account AllocCPUS State ExitCode
------------ ---------- ---------- ---------- ---------- ---------- --------
1000013 sys/dashb+ mb mlproject 4 TIMEOUT 0:0
1000013.bat+ batch mlproject 4 CANCELLED 0:15
1000013.ext+ extern mlproject 4 COMPLETED 0:0
1000025 sys/dashb+ mb mlproject 8 RUNNING 0:0
1000025.bat+ batch mlproject 8 RUNNING 0:0
1000025.ext+ extern mlproject 8 RUNNING 0:0 |
|
|
batch script | To view batch / submission script for a specific job | -B
| sacct -j 1000101 -B
| --batch-script
| You must specify a job with the --jobs or -j flag to use the -B or --batch-script flag and see its associated batch / submission script. This will not work for interactive jobs started with an salloc command, or for jobs that were not submitted from a script. Expand |
---|
title | Expand to see an example of running sacct with --batch-script flag and output |
---|
[user05@mblog1 ~]$ sacct -j 1000101 --batch-script
Batch Script for 1000101
---------------------------------------------------------------------
#!/bin/bash
#SBATCH --account=extrememl
#SBATCH --time=1:00:00
#SBATCH --mail-user=johnsmith@uwyo.edu
#SBATCH --mail-type=all
# Clear out and then load necessary software
module purge
module load gcc/14.2.0 r/4.4.0
# Browse to my project folder
cd /project/myprojdir/johnsmith/scripts/
# Export useful connection variables
export HOSTNAME
# Run my code
Rscript myscript.R |
|
|
user | To get a printout of a specific user’s jobs | -u
| sacct -u joeblow
| --user
| The --user or -u flag (shown in the expandable example below with a username specified) prints sacct info only for jobs submitted by the specified user: Expand |
---|
title | Expand to see an example of the sacct command run with the --user flag, and output |
---|
[joeblow@mblog1 ~]$ sacct --user=joeblow
JobID JobName Partition Account AllocCPUS State ExitCode
------- ------- --------- --------- --------- ------- --------
1000002 AIML-CE mb extremeai 4 RUNNING 0:0
1000005 AIML-CE mb extremeai 4 RUNNING 0:0 |
|
|
start | To get a printout of job(s) starting after a date/time | -S
| sacct -S 2024-11-01
| --start
| Dates and times should be specified with the format YYYY-MM-DD[THH:MM[:SS]], for example 2024-11-01 or 2024-11-01T08:30 Expand |
---|
title | Expand to see an example of running sacct with --start and output |
---|
[user05@mblog1 ~]$ sacct --start=2024-11-01
JobID JobName Partition Account AllocCPUS State ExitCode
------------ ---------- ---------- ---------- ---------- ---------- --------
1000013 sys/dashb+ mb mlproject 4 TIMEOUT 0:0
1000013.bat+ batch mlproject 4 CANCELLED 0:15
1000013.ext+ extern mlproject 4 COMPLETED 0:0
1000025 sys/dashb+ mb mlproject 8 RUNNING 0:0
1000025.bat+ batch mlproject 8 RUNNING 0:0
1000025.ext+ extern mlproject 8 RUNNING 0:0 |
|
|
end | To get a printout of job(s) ending before a given date/time | -E
| sacct -E 2024-11-24T12:00:00
| --end
| Dates and times should be specified with the format YYYY-MM-DD[THH:MM[:SS]], for example 2024-11-24 or 2024-11-24T12:00:00 Expand |
---|
title | Expand to see an example of running sacct with --start and --end flags and output |
---|
[user05@mblog1 ~]$ sacct --start=2024-11-01 --end=2024-11-24
JobID JobName Partition Account AllocCPUS State ExitCode
------------ ---------- ---------- ---------- ---------- ---------- --------
1000013 sys/dashb+ mb mlproject 4 TIMEOUT 0:0
1000013.bat+ batch mlproject 4 CANCELLED 0:15
1000013.ext+ extern mlproject 4 COMPLETED 0:0
1000025 sys/dashb+ mb mlproject 8 RUNNING 0:0
1000025.bat+ batch mlproject 8 RUNNING 0:0
1000025.ext+ extern mlproject 8 RUNNING 0:0 |
|
|
format | To print sacct output with only the columns you specify | -o
| sacct -o Account,JobID
| --format
| When run with the --format flag, sacct prints only the columns you specify. List the columns using names recognized by Slurm (hint: run sacct --helpformat to get a list of Slurm’s recognized column names) Expand |
---|
title | Expand to see an example of sacct command run with --format flag, and output |
---|
[user17@mblog1 ~]$ sacct --format="Account,JobID"
Account JobID
------------ -----------
deeplearnlab 1000062
deeplearnlab 1000091
deeplearnlab 1000099 |
|
|
submit line | To view the submit command for a specified job | -o SubmitLine
| sacct -o SubmitLine -j 1000101
| --format=SubmitLine
| This uses the --format flag from above to print the command you entered to submit the job specified after the -j flag. Expand |
---|
title | Expand to see an example of running this command, and example output |
---|
[user11@mblog1 ~]$ sacct --format=SubmitLine -j 1000324
SubmitLine
--------------------
sbatch main_job.sh |
|
|
WorkDir | To view the working directory used by the job to execute commands | -o WorkDir
| sacct -o WorkDir -j 1000101
| --format=WorkDir
| Expand |
---|
title | Expand to see an example of running this command, and example output |
---|
[user11@mblog1 ~]$ sacct --format=WorkDir -j 1000324
WorkDir
--------------------
/project/deeplearnlab/ |
|
|
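The flags above can be combined, and the output can be passed to other tools. The following is a minimal sketch, not an official ARCC tool: failed_jobs is a hypothetical helper name, and it assumes sacct was run with the standard --parsable2 and --noheader flags (not covered in the table above) so that columns are separated by "|" and no header is printed. The sample input mirrors the example output shown earlier, with step names untruncated as an assumption.

```shell
#!/bin/sh
# Hypothetical helper: read pipe-delimited sacct output on stdin
# (columns assumed to be JobID|State|ExitCode) and print only the rows
# whose state is not COMPLETED, RUNNING, or PENDING.
failed_jobs() {
    awk -F'|' '$2 != "COMPLETED" && $2 != "RUNNING" && $2 != "PENDING" { print $1, $2, $3 }'
}

# Demo on sample text, so this runs without a Slurm cluster.
# On a cluster, the input would instead come from something like:
#   sacct -u joeblow -S 2024-11-01 --parsable2 --noheader \
#         --format=JobID,State,ExitCode | failed_jobs
failed_jobs <<'EOF'
1000013|TIMEOUT|0:0
1000013.batch|CANCELLED|0:15
1000013.extern|COMPLETED|0:0
1000025|RUNNING|0:0
EOF
```

Filtering on the State column rather than ExitCode is a deliberate choice here: as the TIMEOUT row shows, a job can end abnormally while still reporting 0:0.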
My Job Failed. What Do These Exit Codes Mean?
Slurm records exit codes as numerical values that can seem rather cryptic. While we can’t always know for sure what caused them without investigation, some causes are more likely than others. Exit codes usually appear either as two numbers separated by a colon (in sacct output, the exit code returned by the job, followed by the number of the signal that terminated it, if any) or as a single number. Common exit codes and their likely causes are below:
Exit Code | Likely Cause |
---|
0 | The job ran successfully |
Any non-zero value | The job failed in some form or another |
1 | A general failure |
2 | Something was wrong with a shell command in the script |
3 and above | Job error associated with software commands (check software specific exit codes) |
0:9 | The job was cancelled (usually by the user or by Slurm/the system) |
0:15 | The job was cancelled (usually because the user cancelled the job, or because it ran over its specified walltime) |
0:53 | Some file or directory referenced in the script was not readable or writable |
0:125 | Job ran out of memory |
Anything else | Contact arcc-help@uwyo.edu to have us investigate |
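The table above can be turned into a small lookup for use in your own scripts. This is a minimal sketch under stated assumptions: explain_exit_code is a hypothetical helper name, the short messages are paraphrases of the table rows, and a value such as 1:0 (as sacct would print it) is assumed to correspond to the single-number row 1.

```shell
#!/bin/sh
# Hypothetical helper: map an exit code string to the likely cause listed
# in the table above. The body is wrapped in ( ) so it runs in a subshell
# and its variables do not leak into the calling shell.
explain_exit_code() (
    raw=$1
    # Split "code:signal"; a bare number is treated as "code:0" (assumption).
    case "$raw" in
        *:*) code=${raw%%:*}; signal=${raw#*:} ;;
        *)   code=$raw; signal=0 ;;
    esac
    case "$code:$signal" in
        0:0)   echo "success" ;;
        0:9)   echo "cancelled" ;;
        0:15)  echo "cancelled (by the user, or walltime exceeded)" ;;
        0:53)  echo "file or directory not readable or writable" ;;
        0:125) echo "out of memory" ;;
        1:0)   echo "general failure" ;;
        2:0)   echo "problem with a shell command in the script" ;;
        *)
            # Codes of 3 and above with no signal: software-specific error.
            case "$code" in
                ''|*[!0-9]*) numeric=no ;;
                *)           numeric=yes ;;
            esac
            if [ "$numeric" = yes ] && [ "$signal" = 0 ] && [ "$code" -ge 3 ]; then
                echo "software-specific error (check the software's exit codes)"
            else
                echo "unknown: contact arcc-help@uwyo.edu"
            fi
            ;;
    esac
)

# Demo: explain a few of the codes that appear in the table.
for c in 0:0 0:15 0:125 2; do
    printf '%s -> %s\n' "$c" "$(explain_exit_code "$c")"
done
```

The lookup only restates the likely causes from the table. It does not replace checking the job's output and error files.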
** You can also run sacct --help to get a comprehensive list of the flags available for the sacct command
...