
Overview

Dadi is a powerful software tool for simulating the joint frequency spectrum (FS) of genetic variation among multiple populations and employing the FS for population-genetic inference.

An important aspect of dadi is its flexibility, particularly in model specification, but with that flexibility comes some complexity. dadi is not a GUI program, nor can dadi be run usefully with a single command at the command-line; using dadi requires at least rudimentary Python scripting. Luckily for us, Python is a beautiful and simple language. Together with a few examples, this manual will quickly get you productive with dadi even if you have no prior Python experience.
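To give a flavor of that scripting, here is a minimal sketch that computes the expected FS under dadi's built-in one-population standard neutral model; the sample size and grid points below are illustrative assumptions, not part of this page's workflow:

import dadi

# Illustrative sketch only: wrap dadi's built-in standard neutral model so
# it extrapolates over several grid sizes, then compute the expected FS for
# 20 sampled chromosomes. The sample size and grid points are arbitrary.
func_ex = dadi.Numerics.make_extrap_log_func(dadi.Demographics1D.snm)
fs = func_ex([], [20], [40, 50, 60])
print(fs)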

Using Dadi with GPU on Teton

Here we will describe setting up a Conda environment, with dadi installed, that allows you to run related source code and utilize GPUs.

This page will:

  1. Step through creating a basic Conda environment. dadi uses the PyCuda and scikit-cuda packages to interface with GPUs.

  2. Provide a template for a bash script to submit jobs using sbatch.

  3. Provide a very simple script that tests that dadi can be imported and can identify the allocated GPU.

Note:

  • This is a short page and assumes some familiarity with using Conda. The “Package and Dependency Management with Conda” training can be found on ARCC’s Training/Consultation page.

  • The installation of dadi within the conda environment will also install related dependencies, but nothing else. Since you’re creating the conda environment, you can extend it and install other packages. You can view the installed conda packages by running conda list while the environment is active.

  • The bash script only uses a single node and single core. It is up to the user to explore other configurations.

  • In the scripts and examples below, please remember to edit them appropriately to use your account, email address, folder locations, etc.

Creating the Conda Environment

Set up the basic Conda environment to run with Python version 3.8:

cd /project/arcc/salexan5/conda/gpu/dadi
module load miniconda3/4.3.30
conda create --prefix=dadi_env python=3.8

There are a number of conda options for how/where to install an environment. In this case --prefix (or -p) will create an environment called dadi_env in the folder you’re running the command from. Once set up, make a note of the installation message that indicates how to activate your environment when you want to use it.

# To activate this environment, use:
# > source activate /pfs/tsfs1/project/arcc/salexan5/conda/gpu/dadi/dadi_env
#
# To deactivate an active environment, use:
# > source deactivate

Activate your environment, and install dadi and its related packages. Once the installation has finished, deactivate your environment.

source activate /pfs/tsfs1/project/arcc/salexan5/conda/gpu/dadi/dadi_env
conda install -c conda-forge dadi
conda install numpy scipy matplotlib ipython
python3 -m pip install pycuda
python3 -m pip install scikit-cuda
source deactivate
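If you want a quick sanity check at this point, you can reactivate the environment and confirm the packages import cleanly; a minimal sketch (checking the install location rather than any particular version attribute):

import dadi
import pycuda
import skcuda

# Confirm the packages import cleanly from the new environment.
print("dadi imported from:", dadi.__file__)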

Bash Script to use with sbatch

Below is a basic template to use; you’ll need to insert your account and email details. Save it as a file, e.g. <script_name>.sh, and submit it with sbatch <script_name>.sh.

#!/bin/bash
#SBATCH --account=<your_arcc_project>
#SBATCH --time=0:10:00
#SBATCH --job-name=dadi_test
#SBATCH --nodes=1
#SBATCH --cpus-per-task=1
#SBATCH --mail-type=ALL
#SBATCH --mail-user=<your_email>
#SBATCH --output=dadi_%A.log
#SBATCH --mem=8G
#SBATCH --partition=moran-bigmem-gpu
#SBATCH --gres=gpu:1

echo "Load Modules:"
module load swset/2018.05
module load cuda/10.1.243
module load miniconda3/4.3.30

echo "Check GPU Allocation:"
echo "CUDA Visibale Devices:" $CUDA_VISIBLE_DEVICES
echo "Running nvidia-smi:"
srun nvidia-smi -L
nvcc --version

echo "Activate Conda Environment"
source activate /pfs/tsfs1/project/arcc/salexan5/conda/gpu/dadi/dadi_env

python --version

echo "- - - - - - - - - - - - - - - - - - - - -"
srun python dadi_test.py
echo "- - - - - - - - - - - - - - - - - - - - -"

echo "Deactivate Conda:"
source deactivate

echo "Done"

Simple Source Code Example

Below is some very simple source code that will test that your environment and GPU request are functioning properly.

It simply imports the dadi package, and then uses it to check that it can identify the allocated GPU(s). To work with the bash script above, save this file as dadi_test.py

import dadi

# dadi.cuda_enabled(True) attempts to enable GPU support and reports
# whether it succeeded.
print("Cuda Enabled: " + str(dadi.cuda_enabled(True)))

Requesting GPUs and Testing

We have a variety of GPUs on Teton, and depending on which you require, you'll need to adjust your bash script. The reason for the nvidia-smi -L call within the bash script is that it prints out confirmation of the GPU configuration you’ve requested.

Below demonstrates the bash options for each GPU, as well as what you’d see from running the nvidia-smi -L command and the source code from the bash script:

To request a K20 GPU on one of the moran nodes, use:

#SBATCH --gres=gpu:1

Running nvidia-smi:
GPU 0: Tesla K20m (UUID: GPU-6b95c19a-916e-f488-d5e0-1d87f752ffe6)

Cuda Enabled: True
/pfs/tsfs1/project/arcc/salexan5/conda/gpu/dadi/dadi_env/lib/python3.8/site-packages/skcuda/cublas.py:284: UserWarning: creating CUBLAS context to get version number
  warnings.warn('creating CUBLAS context to get version number')

For other GPUs, you’ll also have to specify the partition.

#SBATCH --partition=moran-bigmem-gpu
#SBATCH --gres=gpu:k80:1

Running nvidia-smi:
GPU 0: Tesla K80 (UUID: GPU-53acbde2-ec88-e8fa-d477-719e700fb22f)

Cuda Enabled: True
...
#SBATCH --partition=teton-gpu
#SBATCH --gres=gpu:p100:1

Running nvidia-smi:
GPU 0: Tesla P100-PCIE-16GB (UUID: GPU-8884193d-ab04-d5fc-bf5d-f921f264512a)

Cuda Enabled: True
...
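If you prefer to confirm the allocation from within Python itself, a minimal sketch using PyCuda (installed alongside dadi above) could look like the following; this is an illustration, not part of the dadi API:

import pycuda.driver as drv

# Initialize the CUDA driver and list every GPU visible to this job step.
drv.init()
for i in range(drv.Device.count()):
    print("GPU %d: %s" % (i, drv.Device(i).name()))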

Running/Testing in an Interactive Session

If you’re just exploring, trying things out, and/or performing tests, then you can just as easily use an interactive session. Below is an example of using salloc. Notice the steps are the same as if running a bash script via sbatch.

Once logged onto one of the login nodes, request an interactive session:

[salexan5@tlog2 ~]$ salloc --account=arcc --time=01:00:00 -N 1 -c 1 --partition=moran-bigmem-gpu --gres=gpu:k80:1
salloc: Granted job allocation 10044335

Load the modules you require. Since we’re using GPUs, we need to load the appropriate NVIDIA drivers.

[salexan5@mbm01 ~]$ module load miniconda3/4.3.30
[salexan5@mbm01 ~]$ module load cuda/10.1.243

If you want, you can check that the requested GPU has been allocated.

[salexan5@mbm01 ~]$ srun nvidia-smi -L
GPU 0: Tesla K80 (UUID: GPU-53acbde2-ec88-e8fa-d477-719e700fb22f)

Because we are using a Conda environment, we need to activate it.

[salexan5@mbm01 ~]$ source activate /pfs/tsfs1/project/arcc/salexan5/conda/gpu/dadi/dadi_env

Navigate to the folder containing the source code and then run it:

(/pfs/tsfs1/project/arcc/salexan5/conda/gpu/dadi/dadi_env) [salexan5@mbm01 ~]$ cd /project/arcc/salexan5/conda/gpu/dadi

(/pfs/tsfs1/project/arcc/salexan5/conda/gpu/dadi/dadi_env) [salexan5@mbm01 dadi]$ srun python dadi_test.py
Cuda Enabled: True
/pfs/tsfs1/project/arcc/salexan5/conda/gpu/dadi/dadi_env/lib/python3.8/site-packages/skcuda/cublas.py:284: UserWarning: creating CUBLAS context to get version number
  warnings.warn('creating CUBLAS context to get version number')

Once finished, deactivate the Conda environment and cancel your interactive session.

(/pfs/tsfs1/project/arcc/salexan5/conda/gpu/dadi/dadi_env) [salexan5@mbm01 dadi]$ source deactivate

[salexan5@mbm01 dadi]$ scancel 10044335
salloc: Job allocation 10044335 has been revoked.
[salexan5@mbm01 dadi]$ srun: Job step aborted: Waiting up to 32 seconds for job step to finish.
srun: error: mbm01: task 0: Killed
srun: Terminating job step 10044335.0

[salexan5@tlog1 ~]

GPU Not Found/Detected

Remember to prefix the line where you call your application/program with srun. This is what actually gives the job step access to the GPU allocation you requested.

If you forget, then you’ll see a warning like the following:

python dadi_test.py

Failed to import dadi.cuda
Cuda Enabled: False
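If you hit this, a quick way to diagnose it from Python is to check what Slurm has exposed to the process before importing dadi; a minimal sketch (the fallback string is just a placeholder):

import os

# If the application was launched without srun, one common symptom is that
# no GPU is visible to the process; checking the environment first helps
# narrow this down.
print("CUDA_VISIBLE_DEVICES:", os.environ.get("CUDA_VISIBLE_DEVICES", "<not set>"))

import dadi
print("Cuda Enabled: " + str(dadi.cuda_enabled(True)))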