Overview

The Slurm Workload Manager is a powerful and flexible workload manager used to schedule jobs on high performance computing (HPC) clusters. Slurm can be used to schedule jobs, control resource access, provide fairshare, implement preemption, and provide record keeping. All compute activity should be run from within a Slurm resource allocation (i.e., a job). This page details the prerequisites for using Slurm, the software and extensions needed to run it, configuration information, and directions for updating Slurm.

ARCC utilizes Slurm on Teton and Loren.

Contents

Table of Contents

Glossary

Training

Tip

Slurm Video Tutorials

Tip

Slurm Cheat Sheet (PDF to be uploaded)

Compiling and Installing

Prerequisites

Red Hat / EPEL provided dependencies (use yum with appropriate repositories configured):

  1. GCC

  2. readline(-devel)

  3. MariaDB(-devel)

  4. Perl(-devel)

  5. lua(-devel)

  6. cURL (curl & libcurl(-devel))

  7. JSON (json-c(-devel))

  8. munge(-devel)

Code Block
languagec
[root@tmgt1 ~]# yum -y install \
  gcc \
  readline readline-devel \
  mariadb mariadb-devel \
  perl perl-devel \
  lua lua-devel \
  curl libcurl libcurl-devel \
  json-c json-c-devel \
  munge munge-devel munge-libs

Mellanox provided dependencies (Use mlnxofedinstall script)

  1. libibmad(-devel)

  2. libibumad(-devel)

ARCC Supplied RPM

  1. PMIx

  2. UCX (Slurm is not currently configured to use it)

PMIx

PMIx is used to exchange information about the communications and launch environments of parallel applications (i.e., mpirun, srun, etc.). The PMIx implementation is a launcher that prefers to communicate in conjunction with the job scheduler rather than using older RSH/SSH methods. The time to start applications can also be reduced significantly at high node counts compared to the older ring start-up or even the Hydra implementation. The version presently installed was built as an RPM and inserted into the images or installed via repo. The version in EPEL is too old.

Code Block
languagec
powerman $ rpmbuild ...
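
The exact rpmbuild invocation is not recorded above. As a rough sketch, assuming the PMIx release tarball ships its own spec file (the URL and version below are illustrative, not necessarily the version deployed):

Code Block
languagec
powerman $ wget https://github.com/openpmix/openpmix/releases/download/v3.1.5/pmix-3.1.5.tar.bz2

powerman $ rpmbuild -ta pmix-3.1.5.tar.bz2   # builds binary RPMs if a .spec file is bundled in the tarball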

UCX

UCX is presently installed as an RPM, which is a fine method as well, but instructions for compiling from source are below. This will be updated with instructions on building the RPM as well. The version currently installed is NOT from EPEL; that version was too old.

Code Block
languagec
powerman $ wget https://github.com/openucx/ucx/releases/download/v1.6.1/ucx-1.6.1.tar.gz

powerman $ tar xf ucx-1.6.1.tar.gz

powerman $ cd ucx-1.6.1

powerman $ ./configure \
  --prefix=/apps/s/ucx/1.6.1 \
  --enable-devel-headers \
  --with-verbs \
  --with-rdmacm \
  --with-rc \
  --with-ud \
  --with-dc \
  --with-dm

powerman $ make -j8

powerman $ make install

RPM Build

Code Block
languagec
TODO
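
Until the RPM build steps are written up, a minimal sketch under the assumption that the UCX release tarball ships a ucx.spec file (verify this before relying on it):

Code Block
languagec
powerman $ rpmbuild -ta ucx-1.6.1.tar.gz   # produces UCX RPMs under ~/rpmbuild/RPMS/ if a spec file is bundled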

HDF5

Compile HDF5 from source and put it in a systems directory; this should not be confused with the user-accessible HDF5 installations, which may have additional dependencies such as Intel compilers or an MPI implementation.

Code Block
languagec
powerman $ cd /software/slurm

powerman $ tar xf hdf5-1.10.5.tar.bz2

powerman $ cd hdf5-1.10.5

powerman $ ./configure --prefix=/apps/s/hdf5/1.10.5

powerman $ make -j4

powerman $ make install

If you keep track of ABI compatibility (as a sysadmin, you should), you may want to create a symbolic link to the "latest" release in the parent of the installed directory, as shown below.

Code Block
languagec
powerman $ cd /apps/s/hdf5

powerman $ ln -s 1.10.5 latest

hwloc

Use the ultra-stable version of hwloc and install it in a global location. Users can use it if needed, but a separate user-facing hwloc installation can also be maintained. This installation specifically addresses cgroup support within the system and is used by Slurm.

Code Block
languagec
powerman $ cd /software/slurm

powerman $ tar xf hwloc-1.11.13.tar.bz2

powerman $ cd hwloc-1.11.13

powerman $ ./configure --prefix=/apps/s/hwloc/1.11.13

powerman $ make -j4

powerman $ make install

If you keep track of ABI compatibility (as a sysadmin, you should), you may want to create a symbolic link to the "latest" release in the parent of the installed directory, as shown below.

Code Block
languagec
powerman $ cd /apps/s/hwloc

powerman $ ln -s 1.11.13 latest

Slurm

Download the latest version that's available. Slurm is on a 9-month release cycle, but maintenance releases are published frequently for bug fixes, CVEs, and occasional hotfixes that need to be addressed.

Assuming the downloaded file is 'slurm-19.05.5.tar.bz2', the instructions are below. Note that the configure line references the "latest" symbolic links for the HDF5 and hwloc libraries.

Code Block
languagec
powerman $ cd /software/slurm

powerman $ tar xf slurm-19.05.5.tar.bz2

powerman $ cd slurm-19.05.5

powerman $ ./configure \
  --prefix=/apps/s/slurm/19.05.5 \
  --with-hdf5=/apps/s/hdf5/latest/bin/h5cc \
  --with-hwloc=/apps/s/hwloc/latest

powerman $ make -j8

powerman $ make install

Additional Features / Utilities

Code Block
languagec
powerman $ cd contribs
powerman $ make

powerman $ for i in lua openlava perlapi seff sjobexit torque;
do
  cd $i
  make install
  cd -
done

PAM libraries are special; if you want to use them, they go in a specific node-local location, generally /usr/lib64/security/. Consult the PAM manual pages to understand the implications before deploying them.

Code Block
languagec
powerman $ # Build and install the PAM modules from the contribs directory
powerman $ for i in pam pam_slurm_adopt;
do
  cd $i
  make install
  cd -
done
powerman $ # The resulting pam_slurm*.so libraries must then be placed in /usr/lib64/security/ on each node

NOTE: Honestly, the only module you should really be after, if you find users abusing nodes, is pam_slurm_adopt, which adopts user processes into their job's cgroups and denies access to a node if they don't have a job running on it. Additionally, remember that configuring PAM is more than just installing libraries; the PAM stack (/etc/pam.d/...) will need to be modified appropriately. An example will be posted at a later date.
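
In the meantime, a minimal sketch of the kind of PAM stack change involved (illustrative only, not the site-specific example mentioned above; consult the pam_slurm_adopt documentation for the full set of options and test on a single node first):

Code Block
languagec
# Line added to /etc/pam.d/sshd on compute nodes (illustrative):
account    required     pam_slurm_adopt.so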

Final Installation Process

Code Block
languagec
powerman $ unlink /apps/s/slurm/19.05

powerman $ ln -s /apps/s/slurm/19.05.5 /apps/s/slurm/19.05

powerman $ ln -s /apps/s/slurm/etc /apps/s/slurm/19.05/etc

powerman $ ln -s /apps/s/slurm/var /apps/s/slurm/19.05/var

First Installation & Configuration

Munge

The munge daemon is used to validate messages between Slurm daemons to make sure that users are who they claim to be. Realistically, this only needs to be configured once, at the first installation of Slurm; the same munge key can then be reused on subsequent updates unless you are security paranoid. To generate a decent munge key, use dd with either the /dev/random or /dev/urandom generator.

Code Block
languagec
root@tmgt1:~# dd if=/dev/random bs=1 count=1024 >/etc/munge/munge.key

root@tmgt1:~# chown munge:munge /etc/munge/munge.key

root@tmgt1:~# chmod 600 /etc/munge/munge.key

You then need to place the munge key and daemon on all servers that you intend to run Slurm commands on:

  • tmgt1

  • tdb1

  • tlog1

  • tlog2

  • ALL compute nodes
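
One way to distribute the key and restart the daemon is with the same pscp/psh tools used later on this page (a sketch; adjust the node list and groups for your site):

Code Block
languagec
root@tmgt1:~# pscp /etc/munge/munge.key tdb1,tlog1,tlog2,moran,teton:/etc/munge/

root@tmgt1:~# psh tdb1,tlog1,tlog2,moran,teton "chown munge:munge /etc/munge/munge.key && chmod 600 /etc/munge/munge.key"

root@tmgt1:~# psh tdb1,tlog1,tlog2,moran,teton systemctl restart munge.service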

MariaDB

To use proper accounting with fairshare and associations, Slurm requires a database, specifically a MySQL-compatible one such as MariaDB. Therefore, MariaDB should be configured on the tdb1 node to take advantage of the SSD storage in that system. Configure the basics of a normal MariaDB installation (root account, basic setup, etc.).

Then add the slurm database and create the appropriate user and password for slurmdbd to communicate with the MariaDB server. In this case the communication should happen over localhost when you configure the credentials.

See the Slurm documentation for configuring this along with the slurmdbd.conf file.
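
For example, creating the database and granting access might look roughly like the following (the database name, user, and password are placeholders; match them to what is configured in slurmdbd.conf):

Code Block
languagec
[root@tdb1]# mysql -u root -p

MariaDB [(none)]> create database slurm_acct_db;

MariaDB [(none)]> create user 'slurm'@'localhost' identified by 'CHANGE_ME';

MariaDB [(none)]> grant all on slurm_acct_db.* to 'slurm'@'localhost';

MariaDB [(none)]> flush privileges;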

Performing Upgrades

It is very important that Slurm upgrades happen in a specific order so that continuous service is provided and the backward-compatible communication scheme (clients talking to servers) is preserved. Specifically, the ordering is as follows:

  1. Slurm database

  2. Slurm controller

  3. Slurm compute nodes

Updating the Slurm Database

The Slurm database is a critical component of the ARCC infrastructure; keeping accounts and investors in line relies extensively on this database being active. Therefore, it is quite important to perform a backup of the database before attempting an upgrade. Use the normal MySQL/MariaDB backup capability to accomplish this. Also be aware that ARCC does not prune the database so far, but that may become an issue later if more high-throughput computing is introduced.

Code Block
languagec
[root@tmgt1]# ssh tdb1

[root@tdb1]# systemctl stop slurmdbd.service

[root@tdb1]# ## PERFORM DB BACKUP ##
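
[root@tdb1]# ## For example (illustrative; assumes the default slurmdbd database name slurm_acct_db):

[root@tdb1]# mysqldump -u root -p --single-transaction slurm_acct_db > /root/slurm_acct_db.$(date +%Y%m%d).sql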

[root@tdb1]# install -m 0644 /software/slurm/slurm-19.05.5/etc/slurmdbd.service /etc/systemd/system/slurmdbd.service

[root@tdb1]# systemctl daemon-reload

[root@tdb1]# su - slurm

bash-4.2$ cd /tmp

bash-4.2$ /apps/s/slurm/19.05.5/sbin/slurmdbd -D -vvv

Wait for slurmdbd to complete the necessary database schema changes and resume normal operation. Once the changes are done, press Ctrl-C to interrupt the foreground process, then restart the service:

Code Block
languagec
bash-4.2$ ^c

[root@tdb1]# systemctl start slurmdbd.service

[root@tdb1]# exit

Updating the Slurm Controller

Make sure the database has been properly updated prior to updating the controller.

Code Block
languagec
[root@tmgt1] systemctl stop slurmctld.service

[root@tmgt1] install -m 0644 /software/slurm/slurm-19.05.5/etc/slurmctld.service /etc/systemd/system/slurmctld.service

[root@tmgt1] systemctl daemon-reload

[root@tmgt1] systemctl start slurmctld.service

Updating the Slurm Compute Nodes

Running Nodes

If you don't want to reboot nodes because it would take too long, or for some other reason, they can be updated live. This isn't a perfect method, but it should work.

Code Block
languagec
root@tmgt1:~# pscp -f 20 /software/slurm/latest/etc/slurmd.service moran,teton:/etc/systemd/system/slurmd.service

root@tmgt1:~# psh -f 20 moran,teton systemctl stop slurmd.service

root@tmgt1:~# psh -f 20 moran,teton systemctl daemon-reload

root@tmgt1:~# psh -f 20 moran,teton systemctl start slurmd.service

root@tmgt1:~# psh -f 20 moran,teton "systemctl is-active slurmd.service" | xcoll

Compute Node Image

The RHEL-based image that is presently booted is t2019.08. The part of Slurm relevant to the compute node is the systemd service file. The compute.postinstall script copies this file into the appropriate location provided the symbolic link in the /software/slurm directory is correct (i.e., latest -> 19.05.5). Then, as the root user, use the standard xCAT tools to generate the image (genimage) and to pack and compress the image (packimage).

Code Block
languagec
powerman $ unlink /software/slurm/latest

powerman $ ln -s /software/slurm/slurm-19.05.5 /software/slurm/latest

powerman $ # Need to switch to root user

root@tmgt1:~# genimage t2019.08

root@tmgt1:~# packimage -c pigz -m tar t2019.08

Once this has been done, you can reboot one of the compute nodes for validation. However, make sure to upgrade the slurmdbd and slurmctld prior to doing these steps.
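
To reboot a single node into the new image, the standard xCAT workflow looks roughly like the following (the node name is hypothetical; use whichever compute node you are testing with):

Code Block
languagec
root@tmgt1:~# nodeset t405 osimage=t2019.08

root@tmgt1:~# rpower t405 reset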

Slurm Validation

Version Checks

Check command versions ...

Code Block
languagec
$ sinfo --version

$ squeue --version

$ sbatch --version

$ scontrol --version

Controller & Database Checks

Make sure scontrol, sacctmgr, sacct, sreport, and job_submit.lua work appropriately...
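
For example, a few quick sanity checks (a sketch; adjust the date ranges as needed):

Code Block
languagec
$ scontrol show config | head

$ sacctmgr show cluster

$ sacct -a --starttime=$(date -d yesterday +%F) | head

$ sreport cluster utilization start=$(date -d "-7 days" +%F)

$ # Submit a small test job to confirm job_submit.lua runs without errors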

PMI Checks

Make sure Intel MPI launches appropriately...

Intel MPI attempts to use the PMI library when running under Slurm, which is good. Newer versions also support PMIx, which provides interfaces that are _backwards_ compatible with the PMI1 and PMI2 interfaces.

Code Block
languagec
$ salloc --nodes=2 --account=arcc --time=20

$ echo $I_MPI_PMI_LIBRARY

$ srun ...

You shouldn't see any PKVS failures if everything is working properly.
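
As a sketch of what a working launch looks like (the library path and test binary below are illustrative; the actual libpmi.so path depends on the Slurm installation prefix):

Code Block
languagec
$ export I_MPI_PMI_LIBRARY=/apps/s/slurm/19.05/lib/libpmi.so

$ srun -n 4 ./hello_mpi   # illustrative test program; it should launch without PMI/KVS errors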

Special Nodes

DGX Systems

The DGX system used to be installed locally, since it was originally a single Ubuntu system installed as an _appliance_. However, once a second DGX system was procured, the installation was moved to an alternative GPFS directory, and a special symbolic link is maintained to select the proper OS version on the Ubuntu systems vs. the RHEL/CentOS systems.

The same installation process as above applies, but you only need to focus on the slurmd installation piece. See the _/software/slurm/jj.slurm.dgx_ file for directions. If you run into permission issues, use the appropriate mkdir, chown, and chgrp commands so the installation directory is owned by the powerman user.

Users will need to add the libswitch-perl package via apt to get the Torque wrappers working properly.

Slurm is an open source, fault-tolerant, and highly scalable cluster management and job scheduling system for large and small Linux clusters. Slurm requires no kernel modifications for its operation and is relatively self-contained.

Please view the child pages for more details and examples on understanding, learning and using Slurm.