Using PyTorch on Beartooth
ARCC is aware that the exact details and versions presented here are out-of-date, but the general process is still valid.
We will endeavor to update this page as soon as we can.
Overview
PyTorch is an open source machine learning library based on the Torch library, used for applications such as computer vision and natural language processing, and primarily developed by Facebook's AI Research lab. It is free and open-source software released under the Modified BSD license.
PyTorch is a Python package that provides two high-level features:
Tensor computation (like NumPy) with strong GPU acceleration
Deep neural networks built on a tape-based autograd system
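Both features can be seen in a few lines: tensors behave much like NumPy arrays, and operations on tensors created with requires_grad=True are recorded on the tape so gradients can be computed automatically. A minimal CPU-only sketch:

```python
import torch

# Tensor computation, NumPy-style
a = torch.tensor([[1.0, 2.0], [3.0, 4.0]])
b = a * 2 + 1          # elementwise arithmetic
print(b.sum())         # tensor(24.)

# Tape-based autograd: operations on x are recorded,
# and backward() replays the tape to compute gradients
x = torch.tensor([2.0, 3.0], requires_grad=True)
y = (x ** 2).sum()     # y = 4 + 9 = 13
y.backward()
print(x.grad)          # dy/dx = 2x -> tensor([4., 6.])
```

On a GPU node the same operations run on the GPU once a tensor has been moved there, e.g. with a.cuda() or a.to("cuda").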
Using PyTorch with GPU on Teton
Here we describe setting up a Conda environment, with pytorch installed, that allows you to run related source code and utilize GPUs.
The basic environment will:
Step through creating a basic Conda environment.
Provide a template for a bash script to submit jobs using sbatch.
Provide a very simple script that tests that PyTorch can be imported and can identify the allocated GPU.
Note:
This is a short page and assumes some familiarity with using Conda. The “Package and Dependency Management with Conda” training materials can be found on ARCC’s Training/Consultation page.
The installation of pytorch within the conda environment will also install related dependencies, but nothing else. Since you’re creating the conda environment, you can extend it and install other packages. You can view the installed conda packages by running conda list while in an active environment.
The bash script only uses a single node and single core. It is up to the user to explore other configurations.
In the scripts and examples below, please remember to appropriately edit to use your account, email address, folder locations etc.
Creating the Conda Environment
Setup the basic Conda environment to run with python version 3.8:
cd /project/arcc/salexan5/conda/gpu/pytorch
module load miniconda3/4.3.30
conda create -p pytorch_env python=3.8

There are a number of conda options for how/where to install an environment. In this case, -p will create an environment called pytorch_env in the folder you’re running the command from. Once setup is complete, make a note of the installation message that indicates how to activate your environment when you want to use it.
# To activate this environment, use:
# > source activate /pfs/tsfs1/project/arcc/salexan5/conda/gpu/pytorch/pytorch_env
#
# To deactivate an active environment, use:
# > source deactivate

Activate your environment, and install the pytorch related packages. Once installation has finished, deactivate your environment.
source activate /pfs/tsfs1/project/arcc/salexan5/conda/gpu/pytorch/pytorch_env
conda install pytorch torchvision cudatoolkit=10.1 -c pytorch
source deactivate

Bash Script to use with sbatch
Below is a basic template to use that you’ll need to insert your account and email details into.
#!/bin/bash
#SBATCH --account=<your_arcc_project>
#SBATCH --time=0:10:00
#SBATCH --job-name=pytorch_test
#SBATCH --nodes=1
#SBATCH --cpus-per-task=1
#SBATCH --mail-type=ALL
#SBATCH --mail-user=<your_email>
#SBATCH --output=pytorch_%A.log
#SBATCH --mem=8G
#SBATCH --partition=moran-bigmem-gpu
#SBATCH --gres=gpu:k80:1
echo "Load Modules:"
module load swset/2018.05
module load cuda/10.1.243
module load miniconda3/4.3.30
echo "Check GPU Allocation:"
echo "CUDA Visible Devices:" $CUDA_VISIBLE_DEVICES
echo "Running nvidia-smi:"
srun nvidia-smi -L
nvcc --version
echo "Activate Conda Environment"
source activate /pfs/tsfs1/project/arcc/salexan5/conda/gpu/pytorch/pytorch_env
python --version
echo "- - - - - - - - - - - - - - - - - - - - -"
srun python pytorch_test.py
echo "- - - - - - - - - - - - - - - - - - - - -"
echo "Deactivate Conda:"
source deactivate
echo "Done"

Simple Source Code Example
Below is some very simple source code that will test that your environment and GPU request are functioning properly.
It simply imports the torch package, and then uses it to check that it can identify the allocated GPU(s). To work with the bash script above, save this file as pytorch_test.py.
import torch
print("PyTorch Version: " + str(torch.__version__))
print("Cuda Available: " + str(torch.cuda.is_available()))
print("Device Name: " + str(torch.cuda.get_device_name(0)))
print("Device Count: " + str(torch.cuda.device_count()))
print("Device(0): " + str(torch.cuda.device(0)))
print("Device Current: " + str(torch.cuda.current_device()))

Requesting GPUs and Testing
We have a variety of GPUs on Teton, and depending on which you require, you'll need to adjust your bash script. The reason for the srun nvidia-smi -L within the bash script is that it will print output confirming the GPU configuration you’ve requested.
Below demonstrates the bash options for each GPU, as well as what you’d see from running the nvidia-smi -L command and source code from the bash script:
#SBATCH --partition=moran-bigmem-gpu
#SBATCH --gres=gpu:k80:1
Running nvidia-smi:
GPU 0: Tesla K80 (UUID: GPU-53acbde2-ec88-e8fa-d477-719e700fb22f)
PyTorch Version: 1.6.0
Cuda Available: True
Device Name: Tesla K80
Device Count: 1
Device(0): <torch.cuda.device object at 0x2ac6aadc6e80>
Device Current: 0

#SBATCH --partition=teton-gpu
#SBATCH --gres=gpu:p100:1
Running nvidia-smi:
GPU 0: Tesla P100-PCIE-16GB (UUID: GPU-3dc86d50-5ad9-21f2-db08-78f1b6aafb5d)
PyTorch Version: 1.6.0
Cuda Available: True
Device Name: Tesla P100-PCIE-16GB
Device Count: 1
Device(0): <torch.cuda.device object at 0x2aee95755eb0>
Device Current: 0

Depending on what you need, and available resources, you can also request multiple GPUs.
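The pytorch_test.py script above only inspects device 0. A looped variant that steps through every allocated GPU might look like the following sketch (this modified version is our illustration, not part of the original script; with no GPUs allocated, the loop simply runs zero times):

```python
import torch

print("PyTorch Version: " + str(torch.__version__))
print("Cuda Available: " + str(torch.cuda.is_available()))
print("Device Count: " + str(torch.cuda.device_count()))

# Step through every GPU the scheduler has allocated to this job.
for i in range(torch.cuda.device_count()):
    print("Device(" + str(i) + "): " + str(torch.cuda.device(i)))
    print("Device Name: " + str(torch.cuda.get_device_name(i)))

if torch.cuda.is_available():
    print("Device Current: " + str(torch.cuda.current_device()))
```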
#SBATCH --partition=moran-bigmem-gpu
#SBATCH --gres=gpu:k80:2
Running nvidia-smi:
GPU 0: Tesla K80 (UUID: GPU-53acbde2-ec88-e8fa-d477-719e700fb22f)
GPU 1: Tesla K80 (UUID: GPU-4529ea7c-6085-9b22-ebdf-07f39556d0f7)
# With a little modification to the test script you can step through each GPU device.
PyTorch Version: 1.6.0
Cuda Available: True
Device Count: 2
Device(0): <torch.cuda.device object at 0x2b4c5c160ac0>
Device Name: Tesla K80
Device(1): <torch.cuda.device object at 0x2b4c5c160ac0>
Device Name: Tesla K80
Device Current: 0

PyTorch + k20/k40 GPUs
The installed version of PyTorch (1.6) does not work on some of our earlier GPUs.
#SBATCH --gres=gpu:1
Running nvidia-smi:
GPU 0: Tesla K20m (UUID: GPU-6b95c19a-916e-f488-d5e0-1d87f752ffe6)
1.6.0
Cuda Available: True
Device Name: Tesla K20m
Device Count: 1
Device(0): <torch.cuda.device object at 0x2b0b6572ae80>
Device Current: 0
/pfs/tsfs1/project/arcc/salexan5/conda/gpu/pytorch/pytorch_env/lib/python3.8/site-packages/torch/cuda/__init__.py:125: UserWarning:
Tesla K20m with CUDA capability sm_35 is not compatible with the current PyTorch installation.
The current PyTorch install supports CUDA capabilities sm_37 sm_50 sm_60 sm_61 sm_70 sm_75 compute_37.
If you want to use the Tesla K20m GPU with PyTorch, please check the instructions at https://pytorch.org/get-started/locally/
warnings.warn(incompatible_device_warn.format(device_name, capability, " ".join(arch_list), device_name))

If you need to use these GPUs, then consider installing an older version of PyTorch, and/or contact ARCC and we can assist.
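To see which compute capability PyTorch reports for an allocated card (useful when diagnosing the warning above), you can query the device directly. A sketch, which prints a fallback message on CPU-only nodes:

```python
import torch

if torch.cuda.is_available():
    # Returns a (major, minor) tuple, e.g. (3, 5) for a Tesla K20m
    major, minor = torch.cuda.get_device_capability(0)
    print("Compute capability: sm_" + str(major) + str(minor))
else:
    print("No CUDA device visible to PyTorch")
```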
Running/Testing in an Interactive Session
If you’re just exploring, trying things out, and/or performing tests, then you can just as easily use an interactive session. Below is an example using salloc. Notice the steps are the same as if running a bash script via sbatch.
Once logged onto one of the login nodes, request an interactive session:
[salexan5@tlog2 ~]$ salloc --account=arcc --time=01:00:00 -N 1 -c 1 --partition=moran-bigmem-gpu --gres=gpu:k80:1
salloc: Granted job allocation 9974878

Load the modules you require. Since we’re using GPUs, we need to load the appropriate NVIDIA drivers.
[salexan5@mbm01 ~]$ module load miniconda3/4.3.30
[salexan5@mbm01 ~]$ module load cuda/10.1.243

If you want, you can check that the requested GPU has been allocated.
[salexan5@mbm01 ~]$ srun nvidia-smi -L
GPU 0: Tesla K80 (UUID: GPU-53acbde2-ec88-e8fa-d477-719e700fb22f)

Because we are using a Conda environment, we need to activate it.
[salexan5@mbm01 ~]$ source activate /pfs/tsfs1/project/arcc/salexan5/conda/gpu/pytorch/pytorch_env

Navigate to the folder containing the source code and then run it:
(/pfs/tsfs1/project/arcc/salexan5/conda/gpu/pytorch/pytorch_env) [salexan5@mbm01 ~]$ cd /project/arcc/salexan5/conda/gpu/pytorch/
(/pfs/tsfs1/project/arcc/salexan5/conda/gpu/pytorch/pytorch_env) [salexan5@mbm01 pytorch]$ srun python pytorch_test.py
PyTorch Version: 1.6.0
Cuda Available: True
Device Name: Tesla K80
Device Count: 1
Device(0): <torch.cuda.device object at 0x2abf1d5fde50>
Device Current: 0

Once finished, deactivate the Conda environment and cancel your interactive session.
(/pfs/tsfs1/project/arcc/salexan5/conda/gpu/pytorch/pytorch_env) [salexan5@mbm01 pytorch]$ source deactivate
[salexan5@mbm01 pytorch]$ scancel 9974878
salloc: Job allocation 9974878 has been revoked.
[salexan5@mbm01 pytorch]$ srun: Job step aborted: Waiting up to 32 seconds for job step to finish.
srun: error: m029: task 0: Killed
srun: Terminating job step 9974853.0
[salexan5@tlog2 ~]$
GPU Not Found/Detected
Remember to prefix the line where you call your application/program with srun. This runs your program as a job step that is granted access to the GPU allocation you requested.
If you forget, then you’ll see a warning like the following:
python pytorch_test.py
PyTorch Version: 1.6.0
Cuda Available: False
THCudaCheck FAIL file=/opt/conda/conda-bld/pytorch_1595629411241/work/aten/src/THC/THCGeneral.cpp line=47 error=100 : no CUDA-capable device is detected
Traceback (most recent call last):
File "pytorch_test.py", line 5, in <module>
print("Device Name: " + str(torch.cuda.get_device_name(0)))
File "/pfs/tsfs1/project/arcc/salexan5/conda/gpu/pytorch/pytorch_env/lib/python3.8/site-packages/torch/cuda/__init__.py", line 293, in get_device_name
return get_device_properties(device).name
File "/pfs/tsfs1/project/arcc/salexan5/conda/gpu/pytorch/pytorch_env/lib/python3.8/site-packages/torch/cuda/__init__.py", line 314, in get_device_properties
_lazy_init() # will define _get_device_properties
File "/pfs/tsfs1/project/arcc/salexan5/conda/gpu/pytorch/pytorch_env/lib/python3.8/site-packages/torch/cuda/__init__.py", line 190, in _lazy_init
torch._C._cuda_init()
RuntimeError: cuda runtime error (100) : no CUDA-capable device is detected at /opt/conda/conda-bld/pytorch_1595629411241/work/aten/src/THC/THCGeneral.cpp:47
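One way to make the test script fail gracefully rather than raising the traceback above is to check torch.cuda.is_available() before querying any device. A defensive sketch (not the original pytorch_test.py):

```python
import torch

print("PyTorch Version: " + str(torch.__version__))

if torch.cuda.is_available():
    print("Device Name: " + str(torch.cuda.get_device_name(0)))
    print("Device Count: " + str(torch.cuda.device_count()))
    print("Device Current: " + str(torch.cuda.current_device()))
else:
    # Typically means the program was launched without srun,
    # or that no GPU was requested in the job allocation.
    print("No CUDA-capable device detected: did you launch with srun?")
```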