
Overview

Effective January 13, 2025, changes were made to the Medicinebow job scheduler. These changes are detailed here. As a result, your jobs may not run the same way they did previously: they may end in an error or wait in the queue for a longer period of time. Please refer to the troubleshooting section below for issues that may occur with jobs after maintenance, typical error messages, and the most common solutions. If this troubleshooting page does not resolve your issue, please don’t hesitate to contact arcc-help@uwyo.edu for assistance.

Troubleshooting Error Messages

sbatch/salloc: error: Interactive jobs cannot be longer than 8 hours

Following the maintenance, interactive jobs are restricted to an 8-hour walltime. Please submit your salloc command with a walltime of 8 hours or less.
Example:

salloc -A projectname -t 8:00:00

I can no longer request an OnDemand job longer than 8 hours

To encourage users to request only the time they need, all interactive jobs, including those requested through OnDemand, are limited to 8 hours in length. Please specify a time of 8 hours or less in the OnDemand webform.

sbatch/salloc: error: Use of --mem=0 is not permitted. Consider using --exclusive instead

Users may no longer request all memory on a node using the --mem=0 flag. If you know you need an entire node, replace the --mem=0 flag in your job with --exclusive to get use of the entire node and all its resources.
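As an illustration, a batch script header that previously used --mem=0 might be changed as sketched below. The account name, walltime, and node count are placeholders, not recommended values.

```shell
#!/bin/bash
#SBATCH --account=projectname   # placeholder project name, as in the salloc example
#SBATCH --time=1-00:00:00       # placeholder walltime (1 day)
#SBATCH --nodes=1               # placeholder node count
#SBATCH --exclusive             # replaces the former "#SBATCH --mem=0" line

# Placeholder job step; replace with your own commands.
echo "Job running on $(hostname)"
```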

sbatch/salloc: error: QOSMinGRES

Users must request at least one GPU device when submitting to a GPU partition. Assuming you plan to use a GPU in your computations, please request one by including either the --gres=gpu:# or --gpus-per-node=# flag in your job submission, where # is the number of GPUs.
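A minimal sketch of a GPU batch script header is shown below. The partition name is a hypothetical placeholder (substitute the GPU partition you actually use), and the account name and walltime are placeholders as well.

```shell
#!/bin/bash
#SBATCH --account=projectname            # placeholder project name
#SBATCH --time=4:00:00                   # placeholder walltime
#SBATCH --partition=gpu-partition-name   # hypothetical name; use your real GPU partition
#SBATCH --gres=gpu:1                     # request one GPU (or use --gpus-per-node=1)

# Slurm commonly exports CUDA_VISIBLE_DEVICES for GPU jobs; this is an
# assumption about the cluster configuration, so the fallback prints "unset".
echo "CUDA_VISIBLE_DEVICES=${CUDA_VISIBLE_DEVICES:-unset}"
```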

sbatch/salloc: error: Job submit/allocate failed: Job violates accounting/QOS policy (job submit limit, user's size and/or time limits)

This may occur for a number of reasons. Please e-mail arcc-help@uwyo.edu with the location of the batch script you’re attempting to run, or salloc command you’re attempting to run, and the error message you receive.

salloc: error: Interactive jobs must be run under the 'interactive' QOS or 'debug' QOS, not 'fast'

When requesting an interactive job, users must specify the interactive or debug queue, or a walltime of 8 hours or less.

salloc: error: Job submit/allocate failed: Invalid qos specification

Users should specify a walltime that fits within the limit of the requested queue:
Debug (<= 1 hr)
Interactive (<= 8 hrs)
Fast (<= 12 hrs)
Normal (<= 3 days)
Long (<= 7 days)
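The limits listed above can be expressed as a small lookup. The helper below is a hypothetical sketch, not an ARCC-provided tool: pick_qos is an invented name, it takes a walltime in whole hours, and the lowercase queue names are assumed from the error messages on this page. It only mirrors the time limits and does not account for any other policy.

```shell
#!/bin/bash
# Hypothetical helper: print the shortest queue whose time limit
# covers the requested walltime (given in whole hours).
pick_qos() {
  if [ "$1" -le 1 ]; then
    echo debug          # <= 1 hr
  elif [ "$1" -le 8 ]; then
    echo interactive    # <= 8 hrs
  elif [ "$1" -le 12 ]; then
    echo fast           # <= 12 hrs
  elif [ "$1" -le 72 ]; then
    echo normal         # <= 3 days
  elif [ "$1" -le 168 ]; then
    echo long           # <= 7 days
  else
    echo "walltime exceeds the 7 day limit" >&2
    return 1
  fi
}
```

For example, pick_qos 2 prints interactive, so a 2-hour interactive request could be written as salloc -A projectname -t 2:00:00 --qos=interactive.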

sbatch: error: Batch job submission failed: Requested node configuration is not available

This may occur for a number of reasons, but is likely due to the combination of nodes and hardware you’ve requested, and whether that hardware is available on the node/partition. If you need assistance please e-mail arcc-help@uwyo.edu with the location of the batch script you’re attempting to run, or salloc command you’re attempting to run, and the error message you receive.

My job has been sitting in queue for a very long time without running

This is usually the result of the specified walltime. If you have specified a walltime over 3 days (for example, 7 days) using the --time or -t flag, your job will be placed in the “long” queue, which may result in a longer wait time. If your job doesn’t require that much time, please try specifying a shorter walltime (ideally 3 days or less). This should result in your job being placed in a queue with a shorter wait time.
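As a sketch, a batch script header that requests a walltime within the 3-day limit might look like the following. Slurm accepts the days-hours:minutes:seconds format for --time; the account name and the 2-day value are placeholders.

```shell
#!/bin/bash
#SBATCH --account=projectname   # placeholder project name
#SBATCH --time=2-00:00:00       # 2 days, within the 3 day limit listed for "normal"

# Placeholder job step; replace with your own commands.
echo "Job started on $(hostname)"
```

Once the job is submitted, squeue -u $USER lists your jobs along with their state and, for pending jobs, the reason they are waiting.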
