Flag

Use this when

Short Form

Short Form Ex.

Long Form

Useful flag info, Long Form Example & Output

job

To get info about one or more specific job IDs

-j

sacct -j 1000013

--jobs

Example of running sacct with the --jobs flag:

[user05@mblog1 ~]$ sacct --jobs=1000013,1000025

JobID           JobName  Partition    Account  AllocCPUS      State      ExitCode 
------------ ---------- ----------    ---------- ---------- ----------   -------- 
1000013      sys/dashb+         mb     mlproject     4        TIMEOUT      0:0 
1000013.bat+      batch                mlproject     4      CANCELLED     0:15 
1000013.ext+     extern                mlproject     4      COMPLETED      0:0 
1000025      sys/dashb+         mb     mlproject     8        RUNNING      0:0 
1000025.bat+      batch                mlproject     8        RUNNING      0:0 
1000025.ext+     extern                mlproject     8        RUNNING      0:0 

batch script

To view the batch (submission) script for a specific job

-B

sacct -j 1000101 -B

--batch-script

To use the -B or --batch-script flag, you must also specify a job with the -j or --jobs flag; sacct then prints that job's batch (submission) script. This does not work for interactive jobs started with an salloc command, or for jobs that were not submitted from a script.

Example of running sacct with the --batch-script flag, and output:

[user05@mblog1 ~]$ sacct -j 1000101 --batch-script
Batch Script for 1000101
---------------------------------------------------------------------
#!/bin/bash
#SBATCH --account=extrememl
#SBATCH --time=1:00:00
#SBATCH --mail-user=johnsmith@uwyo.edu
#SBATCH --mail-type=all

# Clear out and then load necessary software
module purge
module load gcc/14.2.0 r/4.4.0

# Browse to my project folder
cd /project/myprojdir/johnsmith/scripts/

# Export useful connection variables
export HOSTNAME

# Run my code
Rscript myscript.R

user

To get a printout of a specific user’s jobs

-u

sacct -u joeblow

--user

The --user or -u flag, followed by a username, prints sacct info only for jobs submitted by that user, as in the example below:

Example of the sacct command run with the --user flag, and output:

[joeblow@mblog1 ~]$ sacct --user=joeblow
JobID     JobName Partition  Account   AllocCPUS State   ExitCode
-------   ------- ---------  --------- --------- ------- --------
1000002   AIML-CE   mb       extremeai        4  RUNNING      0:0
1000005   AIML-CE   mb       extremeai        4  RUNNING      0:0

start

To get a printout of job(s) starting after a date/time

-S

sacct -S 2024-11-01

--starttime (can be abbreviated to --start)

Dates and times should be specified in the format YYYY-MM-DD[THH:MM[:SS]], for example 2024-11-01 or 2024-11-01T08:30

Example of running sacct with the --start flag, and output:

[user05@mblog1 ~]$ sacct --start=2024-11-01

JobID           JobName  Partition    Account  AllocCPUS      State      ExitCode 
------------ ---------- ----------    ---------- ---------- ----------   -------- 
1000013      sys/dashb+         mb     mlproject     4        TIMEOUT      0:0 
1000013.bat+      batch                mlproject     4      CANCELLED     0:15 
1000013.ext+     extern                mlproject     4      COMPLETED      0:0 
1000025      sys/dashb+         mb     mlproject     8        RUNNING      0:0 
1000025.bat+      batch                mlproject     8        RUNNING      0:0 
1000025.ext+     extern                mlproject     8        RUNNING      0:0 

end

To get a printout of job(s) ending before a given date/time

-E

sacct -E 2024-11-24T12:00:00

--endtime (can be abbreviated to --end)

Dates and times should be specified in the format YYYY-MM-DD[THH:MM[:SS]], for example 2024-11-24 or 2024-11-24T12:00:00

Example of running sacct with the --start and --end flags, and output:

[user05@mblog1 ~]$ sacct --start=2024-11-01 --end=2024-11-24

JobID           JobName  Partition    Account  AllocCPUS      State      ExitCode 
------------ ---------- ----------    ---------- ---------- ----------   -------- 
1000013      sys/dashb+         mb     mlproject     4        TIMEOUT      0:0 
1000013.bat+      batch                mlproject     4      CANCELLED     0:15 
1000013.ext+     extern                mlproject     4      COMPLETED      0:0 
1000025      sys/dashb+         mb     mlproject     8        RUNNING      0:0 
1000025.bat+      batch                mlproject     8        RUNNING      0:0 
1000025.ext+     extern                mlproject     8        RUNNING      0:0 
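
The start and end values can also be generated instead of typed by hand. The following is a minimal sketch, assuming GNU date (the usual date command on Linux login nodes). The helper name slurm_time is hypothetical and not part of Slurm. It prints a timestamp in the YYYY-MM-DDTHH:MM:SS form that sacct accepts, and the final line only prints the resulting command rather than running it.

```shell
# Hypothetical helper (not part of Slurm): print a timestamp in the
# YYYY-MM-DDTHH:MM:SS form accepted by the sacct -S and -E flags.
# Assumes GNU date, which understands phrases such as "7 days ago".
slurm_time() {
    date -d "$1" +%Y-%m-%dT%H:%M:%S
}

# Build a one-week window ending now.
window_start=$(slurm_time "7 days ago")
window_end=$(slurm_time "now")

# Print the command instead of running it, so the sketch also works
# on a machine where sacct is not installed.
echo "sacct -S $window_start -E $window_end"
```

On a login node you could paste the printed command, or replace the echo with a direct call to sacct.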

format

To choose which columns sacct prints

-o

sacct -o Account,JobID

--format

With the --format flag, sacct prints only the columns you list. Columns must be named using the field names that Slurm recognizes (hint: run sacct --helpformat to get a list of recognized field names).

Example of the sacct command run with the --format flag, and output:

[user17@mblog1 ~]$ sacct --format="Account,JobID"
  Account         JobID
  ------------    -----------
  deeplearnlab    1000062
  deeplearnlab    1000091
  deeplearnlab    1000099
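
When the output will be read by a script rather than a person, the fixed-width columns are awkward to split. The following is a minimal sketch, assuming the sacct options -P (--parsable2) and -n (--noheader), which print pipe-delimited fields without the header. The function name count_by_state is hypothetical, and the sample rows are copied from the examples on this page rather than taken from a live cluster.

```shell
# Hypothetical helper: count jobs and job steps per state.
# Expects lines of the form JobID|State on standard input, as produced by:
#   sacct -n -P --format=JobID,State
count_by_state() {
    awk -F'|' 'NF >= 2 { n[$2]++ } END { for (s in n) print s, n[s] }' | sort
}

# Sample input standing in for real sacct output (values from this page).
sample='1000013|TIMEOUT
1000013.batch|CANCELLED
1000013.extern|COMPLETED
1000025|RUNNING
1000025.batch|RUNNING
1000025.extern|RUNNING'

printf '%s\n' "$sample" | count_by_state
```

On a login node the sample would be replaced by the sacct call shown in the comment, piped straight into count_by_state.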

submit line

To view the submit command for a specified job

-o SubmitLine

sacct -o SubmitLine -j 1000101

--format=SubmitLine

This uses the --format flag described above to print the command you entered to submit the job specified with the -j flag.

Example of running this command, and output:

[user11@mblog1 ~]$ sacct --format=SubmitLine -j 1000324
          SubmitLine 
-------------------- 
  sbatch main_job.sh 

WorkDir

To view the working directory used by the job to execute commands

-o WorkDir

sacct -o WorkDir -j 1000101

--format=WorkDir

Example of running this command, and output:

[user11@mblog1 ~]$ sacct --format=WorkDir -j 1000324
             WorkDir 
-------------------- 
  /project/deeplearnlab/ 
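
SubmitLine and WorkDir can be requested in a single call, together with any other field. The sketch below wraps that in a small function. The name job_provenance is hypothetical, and it assumes the sacct options -X (--allocations), -n (--noheader), and -P (--parsable2), so that one pipe-delimited line is printed per job.

```shell
# Hypothetical wrapper: show how and from where a job was submitted.
# -X limits output to the job allocation (no .batch/.extern steps),
# -n drops the header, -P prints pipe-delimited fields.
job_provenance() {
    sacct -j "$1" -X -n -P --format=JobID,SubmitLine,WorkDir
}

# Usage on a login node (job ID taken from the examples above):
#   job_provenance 1000324
```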

My Job Failed. What Do These Exit Codes Mean?

Slurm records exit codes as numeric values that can seem rather cryptic. We can't always tell what caused one without investigating, but some causes are more likely than others. In sacct output, the ExitCode column shows two numbers separated by a colon: the exit status returned by the job script, followed by the number of the signal that terminated the job (0 if there was none). Elsewhere you may see a single number. Common exit codes and their likely causes are below:

Exit Code            Likely Cause

0                    The job ran successfully
Any non-zero value   The job failed in some form or another
1                    A general failure
2                    Something was wrong with a shell command in the script
3 and above          Job error associated with software commands (check software-specific exit codes)
0:9                  The job was cancelled (usually by the user or by Slurm/the system)
0:15                 The job was cancelled (usually because the user cancelled it, or because it ran past its specified walltime)
0:53                 Some file or directory referenced in the script was not readable or writable
0:125                The job ran out of memory
Anything else        Contact arcc-help@uwyo.edu to have us investigate
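
The table above can be turned into a quick lookup. The following is a minimal sketch, not an official ARCC or Slurm tool: the function name explain_exit is hypothetical, and it only restates the likely causes listed above, adding the standard signal names for 9 (SIGKILL) and 15 (SIGTERM). It takes the value of the ExitCode column, or a single number.

```shell
# Hypothetical helper: describe a Slurm ExitCode value using the
# likely causes listed in the table above.
explain_exit() {
    # Accept either "code:signal" or a bare "code" (treated as "code:0").
    case "$1" in
        *:*) pair=$1 ;;
        *)   pair="$1:0" ;;
    esac
    code=${pair%%:*}     # before the colon: exit status of the job script
    signal=${pair##*:}   # after the colon: signal that ended the job, if any

    case "$pair" in
        0:0)   cause="the job ran successfully" ;;
        0:9)   cause="the job was cancelled (SIGKILL)" ;;
        0:15)  cause="the job was cancelled or ran past its walltime (SIGTERM)" ;;
        0:53)  cause="a file or directory was not readable or writable" ;;
        0:125) cause="the job ran out of memory" ;;
        1:0)   cause="a general failure" ;;
        2:0)   cause="something was wrong with a shell command in the script" ;;
        *:0)   cause="software error; check that program's exit codes" ;;
        *)     cause="unrecognized; contact arcc-help@uwyo.edu" ;;
    esac
    echo "exit=$code signal=$signal: $cause"
}

explain_exit 0:15
explain_exit 2
explain_exit 0:125
```

These are likely causes only; a job's output and error files remain the best place to confirm what actually happened.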

** You can also run sacct --help to get a comprehensive list of the flags available for the sacct command.
