...
Common Issues and How to Resolve
...
Required: Account and Walltime
Info |
---|
Remember: By default you must define the project (account) you’re using and a walltime. |
Code Block |
---|
[salexan5@mblog2 ~]$ salloc
salloc: error: You didn't specify a project account (-A,--account). Please open a ticket at arcc-help@uwyo.edu for help.
salloc: error: Job submit/allocate failed: Invalid account or account/partition combination specified
[salexan5@mblog2 ~]$ salloc -A arcc
salloc: error: You didn't specify a walltime (-t, --time=) for the job. Please open a ticket at arcc-help@uwyo.edu for help.
salloc: error: Job submit/allocate failed: Requested time limit is invalid (missing or exceeds some limit)
[salexan5@mblog2 ~]$ salloc -t 10:00
salloc: error: You didn't specify a project account (-A,--account). Please open a ticket at arcc-help@uwyo.edu for help.
salloc: error: Job submit/allocate failed: Invalid account or account/partition combination specified
# The bare minimum:
[salexan5@mblog2 ~]$ salloc -A arcc -t 10:00
salloc: Granted job allocation 1250349
salloc: Nodes mbcpu-025 are ready for job |
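The two errors in the session above can be caught before submitting. The following is a minimal sketch of a pre-flight check; `check_required_flags` is a hypothetical helper written for this page, not part of Slurm, and it only looks for the `-A`/`--account` and `-t`/`--time` options named in the error messages.

```shell
# Hypothetical pre-flight check (not part of Slurm): reports which of the
# two required options are missing from a salloc/sbatch argument list.
check_required_flags() {
  have_account=no
  have_time=no
  for arg in "$@"; do
    case $arg in
      -A*|--account|--account=*) have_account=yes ;;
      -t*|--time|--time=*)       have_time=yes ;;
    esac
  done
  missing=""
  [ "$have_account" = yes ] || missing="$missing account"
  [ "$have_time" = yes ]    || missing="$missing time"
  if [ -n "$missing" ]; then
    echo "missing:$missing"
    return 1
  fi
  echo "ok"
}
```

One possible use is to guard the real command, for example `check_required_flags -A arcc -t 10:00 && salloc -A arcc -t 10:00`.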
...
Correct Partitions
Info |
---|
...
Walltime and TIMEOUT:
...
If you need to explicitly request a partition, the name must be correct: |
Code Block |
---|
[salexan5@mblog2 ~]$ salloc -A arcc -t 10:00 --partition=mb-l40
salloc: error: invalid partition specified: mb-l40
salloc: error: Job submit/allocate failed: Invalid partition name specified |
Info |
---|
Use the |
Code Block |
---|
# Corrected:
[salexan5@mblog2 ~]$ salloc -A arcc -t 10:00 --partition=mb-l40s
salloc: Pending job allocation 1250907
salloc: job 1250907 queued and waiting for resources
salloc: job 1250907 has been allocated resources
salloc: Granted job allocation 1250907
salloc: Nodes mbl40s-001 are ready for job |
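A misspelled partition name such as `mb-l40` can be checked against the names the cluster actually reports. The sketch below assumes that `sinfo -h -o "%R"` prints one partition name per line; `partition_in_list` is a hypothetical helper, not a Slurm command.

```shell
# Hypothetical helper: succeeds only if the given name matches a whole line
# of the partition list read from standard input.
partition_in_list() {
  grep -Fxq -- "$1"
}

# On the cluster (assumption: sinfo is on your PATH):
#   sinfo -h -o "%R" | sort -u | partition_in_list mb-l40s && echo "valid"
```

The exact-line match (`-x`) matters here: `mb-l40` is a prefix of `mb-l40s`, so a plain substring search would wrongly accept it.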
...
Timeouts
Info |
---|
Timeouts aren’t errors as such; they mean that the time you requested was not long enough to complete the computation. |
Info |
---|
The maximum allowed wall time is 7 days: |
Code Block |
---|
[arcc-t01@mblog2 ~]$ salloc -A arccanetrain -t 7-00:00:01
salloc: error: Job submit/allocate failed: Requested time limit is invalid (missing or exceeds some limit)
[arcc-t01@mblog2 ~]$ salloc -A arccanetrain -t 7-00:00:00
salloc: Granted job allocation 1251651
salloc: Nodes mbcpu-010 are ready for job |
Note |
---|
Do not request 7 days just because you can! Slurm considers wall time when it tries to allocate your job. During busy times, a job with a shorter wall time is more likely to be backfilled (slotted onto the cluster) than pending jobs with longer wall times. |
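To compare a requested wall time against the 7 day limit, or against how long a previous job actually ran, it helps to convert Slurm time strings to seconds. The sketch below is illustrative only: `slurm_time_to_seconds` is a hypothetical helper, and it assumes the time formats Slurm documents for `--time` (`minutes`, `minutes:seconds`, `hours:minutes:seconds`, `days-hours`, `days-hours:minutes`, `days-hours:minutes:seconds`).

```shell
# Hypothetical helper: convert a Slurm --time string to seconds.
slurm_time_to_seconds() {
  echo "$1" | awk '{
    days = 0; h = 0; m = 0; s = 0; rest = $0
    if (index(rest, "-") > 0) {
      # days-hours[:minutes[:seconds]]
      split(rest, dh, "-"); days = dh[1]; rest = dh[2]
      n = split(rest, f, ":")
      h = f[1]; m = (n >= 2) ? f[2] : 0; s = (n >= 3) ? f[3] : 0
    } else {
      n = split(rest, f, ":")
      if (n == 1)      { m = f[1] }                       # minutes
      else if (n == 2) { m = f[1]; s = f[2] }             # minutes:seconds
      else             { h = f[1]; m = f[2]; s = f[3] }   # hours:minutes:seconds
    }
    print days * 86400 + h * 3600 + m * 60 + s
  }'
}
```

Seven days is 604800 seconds, so a request of `7-00:00:01` converts to one second over the limit. The elapsed time of a finished job can be read with `sacct -j <jobid> --format=JobID,Elapsed,Timelimit,State` and used as a guide for a tighter request.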
...
My Jobs Need to Run Longer than 7 Days
Info |
---|
ARCC can provide users with wall times longer than 7 days, but we require that you demonstrate that your job cannot be optimized, for example:
ARCC can provide assistance with trying to understand if a job can be optimized. |
...
Requested node configuration is not available
Info |
---|
This happens when you request a configuration that isn’t available, or one that requires more details. For example: |
Too many cores on a node:
Code Block |
---|
[salexan5@mblog2 ~]$ salloc -A arcc -t 10:00 -c 100
salloc: error: CPU count per node can not be satisfied
salloc: error: Job submit/allocate failed: Requested node configuration is not available |
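Before asking for a large `-c` value, it can help to see how many CPUs the largest node offers. The sketch below assumes `sinfo -h -o "%R %c"` prints a partition name and its CPUs per node on each line; `max_cpus_per_node` is a hypothetical helper, and the core counts in the comment are made up for illustration.

```shell
# Hypothetical helper: read "partition cpus" lines on standard input and
# print the largest CPU count seen. A trailing "+" (sinfo prints this when
# node sizes differ within a partition) is ignored by the numeric conversion.
max_cpus_per_node() {
  awk '($2 + 0) > max { max = $2 + 0 } END { print max + 0 }'
}

# On the cluster (assumption: sinfo is on your PATH):
#   sinfo -h -o "%R %c" | max_cpus_per_node
# With invented sample data:
#   printf 'mb-a30 96\nmb-l40s 64+\n' | max_cpus_per_node
```

Any `-c` value above the printed number cannot be satisfied by a single node.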
Must define a GPU enabled partition:
Code Block |
---|
[salexan5@mblog2 ~]$ salloc -A arcc -t 10:00 --gres=gpu:1
salloc: error: Job submit/allocate failed: Requested node configuration is not available
salloc: Job allocation 1253677 has been revoked.
[salexan5@mblog2 ~]$ salloc -A arcc -t 10:00 --gres=gpu:1 --partition=mb-a30
salloc: Granted job allocation 1253691
salloc: Nodes mba30-001 are ready for job |
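To find out which partitions can satisfy a `--gres=gpu:1` request, the generic resources reported by `sinfo` can be filtered. This is a sketch under assumptions: it relies on `sinfo -h -o "%R %G"` printing a partition name followed by its generic resources, `gpu_partitions` is a hypothetical helper, and the sample resource strings in the comment are invented.

```shell
# Hypothetical helper: read "partition gres" lines on standard input and
# print the partitions whose generic resources mention a GPU.
gpu_partitions() {
  awk '$2 ~ /gpu/ { print $1 }' | sort -u
}

# On the cluster (assumption: sinfo is on your PATH):
#   sinfo -h -o "%R %G" | gpu_partitions
# With invented sample data:
#   printf 'example-cpu (null)\nmb-a30 gpu:a30:8\n' | gpu_partitions
```

Any name it prints is a candidate value for `--partition` alongside `--gres=gpu:1`.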
...
OUT-OF-MEMORY: Segmentation Fault
Info |
---|
Segmentation faults are typically caused by an application trying to access memory outside what has been allocated to the job. In other words, your job has run out of the memory it requested. |
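One way to check whether a finished job ran out of memory is to compare its peak usage with its request, for example with `sacct -j <jobid> --format=JobID,ReqMem,MaxRSS,State`. The values come back with unit suffixes, so the sketch below normalizes them to megabytes. `mem_to_mb` is a hypothetical helper; it assumes a plain `K`, `M`, `G` or `T` suffix and treats a bare number as megabytes, which is Slurm's default unit for `--mem`.

```shell
# Hypothetical helper: convert a Slurm memory string (e.g. 2G, 512M,
# 1048576K) to whole megabytes, truncating any fraction.
mem_to_mb() {
  echo "$1" | awk '{
    value = $0 + 0                                  # numeric prefix, e.g. 2.5 from "2.5G"
    unit = toupper(substr($0, length($0), 1))       # last character as the unit
    if (unit == "K")      mb = value / 1024
    else if (unit == "G") mb = value * 1024
    else if (unit == "T") mb = value * 1024 * 1024
    else                  mb = value                # "M" or no suffix
    printf "%d\n", mb
  }'
}
```

If the converted `MaxRSS` is close to the converted `ReqMem`, a larger request through Slurm's `--mem` or `--mem-per-cpu` options is a reasonable next step.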
Info |
---|
Resolved: Request more memory using either the |
...
...