Goal: Introduction to Slurm: how to start interactive sessions, submit jobs, and monitor them.
...
Interactive Session: salloc
: workshop
Code Block |
---|
[]$ salloc -A arccanetrain -t 1:00 --reservation=<reservation-name> |
...
Code Block |
---|
[]$ salloc -A arccanetrain -t 1:00 --reservation=<reservation-name>
salloc: Granted job allocation 13526337
salloc: Nodes m233 are ready for job
# Make a note of the job id.
# Notice the server/node name has changed.
[arcc-t05@m233 intro_to_hpc]$ squeue -u arcc-t05
JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON)
13526337 moran interact arcc-t05 R 0:19 1 m233
# For an interactive session: Name = interact
# You have the command-line interactively available to you.
[]$
...
[]$ squeue -u arcc-t05
JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON)
13526337 moran interact arcc-t05 R 1:03 1 m233
# The session times out automatically once the one-minute limit (-t 1:00) is exceeded.
[]$ salloc: Job 13526337 has exceeded its time limit and its allocation has been revoked.
slurmstepd: error: *** STEP 13526337.interactive ON m233 CANCELLED AT 2024-03-22T09:36:53 DUE TO TIME LIMIT ***
exit
srun: Job step aborted: Waiting up to 32 seconds for job step to finish. |
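To confirm that a shell is inside the allocation, and to recover the job id without scrolling back, you can inspect the environment variables Slurm sets for a job. The following is a minimal sketch: `session_info` is a hypothetical helper name and not part of Slurm, while `SLURM_JOB_ID` and `SLURM_JOB_NODELIST` are standard Slurm variables.

```shell
# Hypothetical helper: report whether this shell is inside a Slurm allocation.
# SLURM_JOB_ID and SLURM_JOB_NODELIST are set by Slurm inside a job.
session_info() {
    if [ -n "${SLURM_JOB_ID:-}" ]; then
        echo "In allocation ${SLURM_JOB_ID} on ${SLURM_JOB_NODELIST:-unknown}"
    else
        echo "Not in a Slurm allocation"
    fi
}

session_info

# To release the allocation before the time limit, type `exit` inside the
# session, or run `scancel <jobid>` from a login node.
```

Ending a session early with `exit` or `scancel` frees the node for other users rather than leaving it idle until the time limit.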
...
You submit a job to the queue and walk away.
Monitor its progress/state using the command line and/or email notifications.
Once it completes, come back and analyze the results.
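The submit, monitor, and analyze cycle can be sketched as follows. This is an illustration under assumptions, not the workshop's template: the account is taken from the `salloc` example, while the job name, file names, and `<your-email>` are placeholders to replace with your own values. The submission step runs only where `sbatch` is installed.

```shell
# Write a minimal batch script. Job name, output file, and email address
# are placeholders (assumptions), not values from the workshop.
cat > example_job.sh <<'EOF'
#!/bin/bash
#SBATCH --account=arccanetrain
#SBATCH --time=1:00
#SBATCH --job-name=example
#SBATCH --output=example_%j.out
#SBATCH --mail-type=END,FAIL
#SBATCH --mail-user=<your-email>
echo "Running on $(hostname)"
EOF

# Submit and monitor only where Slurm is available.
if command -v sbatch >/dev/null 2>&1; then
    # --parsable prints the job id (followed by ;cluster on multi-cluster setups).
    jobid=$(sbatch --parsable example_job.sh)
    # Show the job's state while it is pending or running.
    squeue -j "$jobid"
    # After completion, accounting data is available with:
    #   sacct -j "$jobid" --format=JobID,JobName,State,Elapsed,ExitCode
fi
```

With `--mail-type=END,FAIL`, Slurm sends an email when the job finishes or fails, so there is no need to keep polling `squeue`. In `--output`, `%j` expands to the job id, which keeps the output of separate runs in separate files.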
...
Submit Jobs: sbatch
: Template
...