Goal: Introduce Slurm and show how to start interactive sessions, submit jobs, and monitor them.
...
Code Block |
---|
[]$ salloc -A arccanetrain -t 1:00 --reservation=<reservation-name> |
...
Interactive Session:
...
squeue: What’s happening?
Code Block |
---|
[]$ salloc -A arccanetrain -t 1:00 --reservation=<reservation-name> |
salloc: Granted job allocation 13526337 |
salloc: Nodes m233 are ready for job |
# Make a note of the job id. |
# Notice the server/node name has changed. |
[arcc-t05@m233 intro_to_hpc]$ squeue -u arcc-t05 |
JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON) |
13526337 moran interact arcc-t05 R 0:19 1 m233 |
# For an interactive session: Name = interact |
# You have the command-line interactively available to you. |
[]$ |
... |
[]$ squeue -u arcc-t05 |
JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON) |
13526337 moran interact arcc-t05 R 1:03 1 m233 |
# Session will automatically time out |
[]$ salloc: Job 13526337 has exceeded its time limit and its allocation has been revoked. |
slurmstepd: error: *** STEP 13526337.interactive ON m233 CANCELLED AT 2024-03-22T09:36:53 DUE TO TIME LIMIT *** |
exit |
srun: Job step aborted: Waiting up to 32 seconds for job step to finish. |
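
While an interactive session like the one above is running, a few things can be checked from inside it. The following is a minimal sketch, assuming Slurm exports `SLURM_JOB_ID` inside the allocation (it normally does) and that `squeue` is on the path. `session_info` is a hypothetical helper name, not a Slurm command. Note that `-t 1:00` is read as minutes:seconds, so the session above is limited to one minute.

```shell
#!/bin/bash
# Sketch: report on the current interactive allocation, if there is one.
# session_info is a hypothetical helper, not part of Slurm.
session_info() {
    if [ -z "${SLURM_JOB_ID:-}" ]; then
        echo "Not inside a Slurm allocation."
        return 1
    fi
    echo "Job ID: ${SLURM_JOB_ID}"
    if command -v squeue >/dev/null 2>&1; then
        # -h drops the header; %L prints the time left before the -t limit.
        echo "Time left: $(squeue -h -j "${SLURM_JOB_ID}" -o '%L')"
    fi
}

session_info || true

# To leave early instead of waiting for the time limit:
#   exit                        # end the interactive shell
#   scancel "${SLURM_JOB_ID}"   # or cancel the allocation by job id
```

Run outside an allocation, the function only reports that no allocation is active.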
...
You submit a job to the queue and walk away.
Monitor its progress and state from the command line and/or through email notifications.
Once complete, come back and analyze results.
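
The submit, monitor, and analyze steps above can be sketched as follows. This is an illustration under stated assumptions, not an official template: the file name `hello.sh`, the job name, and the output file pattern are made up for the example, while the account and time limit are reused from the interactive example. The script only submits if `sbatch` is available.

```shell
#!/bin/bash
# Sketch: write a minimal batch script, submit it, then check on it.
# hello.sh and the job name are placeholders chosen for this example.
cat > hello.sh <<'EOF'
#!/bin/bash
#SBATCH --account=arccanetrain
#SBATCH --time=1:00
#SBATCH --job-name=hello
#SBATCH --output=hello_%j.out
#SBATCH --mail-type=END,FAIL
echo "Running on $(hostname)"
EOF
# %j in the output name expands to the job id.
# Mail goes to the submitting user unless --mail-user=<address> is added.

if command -v sbatch >/dev/null 2>&1; then
    # --parsable prints the job id (plus ";cluster" on multi-cluster setups).
    jobid=$(sbatch --parsable hello.sh)
    jobid=${jobid%%;*}
    # Show the job's state while it is pending or running.
    squeue -j "${jobid}"
    # Once it has finished, sacct reports the final state:
    #   sacct -j "${jobid}" --format=JobID,JobName,State,Elapsed
else
    echo "sbatch not found; hello.sh written but not submitted."
fi
```

When the job ends, its standard output is in `hello_<jobid>.out` in the directory it was submitted from.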
...
Submit Jobs: sbatch: Template
...