Goal: Work through an exercise that brings together the various concepts covered in this workshop.
Exercise: Put Things Together
Exercise: Create a self-contained Conda environment that allows a user to run a Python file that uses PyTorch.
The environment should be created in a location that can be shared with other users within a project.
Make sure the environment can use GPUs with CUDA 12.4.
Using the pytorch_gpu_test.py script, create a bash script that submits a job to the cluster to run it with the Conda environment you created. The job should use one NVIDIA H100 GPU on a single compute node, with 8 cores and 16G of memory.
Any output should be written to a slurms subfolder, with a filename of the form pytorch_<job-id>.out
Send email notifications to yourself regarding the status of the job.
Ignoring any cached packages, how much storage space does this Conda environment use?
Which version of Python is being used?
Exercise: Extend with Pandas
Exercise: I also want to use the Python Pandas package, specifically version 2.1.4, to assist with some further analysis.
Can I extend my existing Conda environment?
If so, how?
Should I?
Exercise: Suggested Answers: Create the Environment
Considerations:
Asking for a shared environment: install within a /project/<project-name>/ location, under a project that both you and the users you want to share with are part of.
Read the documentation!
The installation process offers both conda and pip install directions. Since we want a self-contained environment, we suggest using the conda install approach. If you use pip install, you would need to set the PYTHONUSERBASE environment variable.
The directions indicate how to install CUDA version 12.4.
Look at the Linux du command to calculate the storage a folder takes.
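The conda-based directions described above can be sketched as follows. This is a minimal sketch, not the workshop's official answer: the module name miniconda3 and the PROJECT placeholder are assumptions, and since the recommended PyTorch install commands change between releases, check the current PyTorch "Get Started" page before copying the install line.

```shell
# Minimal sketch. PROJECT is a placeholder: replace it with your project name.
PROJECT="<project-name>"
ENV_PREFIX="/cluster/medbow/project/${PROJECT}/software/pytorch/pytorch_env"

# Assumption: conda is provided by a module with this name on your cluster.
module load miniconda3

# Create an (empty) environment at a shared project path via --prefix,
# rather than as a named environment under your home directory.
conda create --prefix "$ENV_PREFIX"

# Make `conda activate` available in the current shell, then activate by path.
eval "$(conda shell.bash hook)"
conda activate "$ENV_PREFIX"

# Conda-style directions from the PyTorch install page for CUDA 12.4.
# Python itself is pulled in as a dependency during this step.
conda install pytorch torchvision torchaudio pytorch-cuda=12.4 -c pytorch -c nvidia
```

Installing with --prefix keeps Python, PyTorch, and the CUDA runtime libraries inside that one folder, which is what makes the environment self-contained; other project members activate it by the same path.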
Regarding the storage space used:
[pytorch]$ du -d 1 -h
6.3G    ./pytorch_env
6.3G    .
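To measure only the environment folder, and to see where the package cache lives so it can be left out of the count, a minimal sketch could look like this. It assumes the project path used in this workshop; PROJECT is a placeholder you must replace.

```shell
# Placeholder: replace with your project name.
PROJECT="<project-name>"

# Show where conda stores its package cache; this is normally separate from
# the environment folder, so it is not included in the size below.
conda config --show pkgs_dirs

# Summarize (-s) the environment folder in human-readable units (-h).
du -sh "/cluster/medbow/project/${PROJECT}/software/pytorch/pytorch_env"
```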
Regarding the Python version, notice the following (or something similar) in the output of the PyTorch installation:
...
python-3.12.3              |hab00c5b_0_cpython        30.5 MB  conda-forge
...
You can confirm this within the activated environment:
(/cluster/medbow/project/<project-name>/software/pytorch/pytorch_env) []$ python --version
Python 3.12.3
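Since the exercise also asks for CUDA 12.4 support, a quick check from within the activated environment could be the one-liner below. It prints the PyTorch version, the CUDA version PyTorch was built with, and whether a GPU is visible. Run it on a GPU node (for example inside an salloc session); on a login node without a GPU, the last value is expected to be False.

```shell
# Print the PyTorch version, its build CUDA version, and GPU visibility.
python -c "import torch; print(torch.__version__); print(torch.version.cuda); print(torch.cuda.is_available())"
```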
Exercise: Suggested Answers: Run the Code
Considerations:
Decide on the working directory that will hold the Python test script and the bash submission script. Do you want to share this?
Look at the required resources:
One H100 - which partition do you need to define? Remember: if you don’t ask, you don’t get.
How do you request 8 cores and 16G of memory?
What are all the steps you need to perform? You can test them in an interactive salloc session, and then copy them into your bash submission script.
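A submission script covering these considerations might look like the sketch below. It is one possible answer under stated assumptions, not the workshop's official solution: the partition name mb-h100, the miniconda3 module name, and the 10 minute wall time are assumptions, and every <...> value is a placeholder. Depending on how the cluster defines its GPU resources, --gres=gpu:h100:1 may be needed instead of --gres=gpu:1.

```shell
#!/bin/bash
# Minimal sketch: replace every <...> placeholder before submitting.
#SBATCH --job-name=pytorch
#SBATCH --account=<project-name>
# Assumption: the H100 partition name; confirm with `sinfo -s`.
#SBATCH --partition=mb-h100
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=8
#SBATCH --mem=16G
#SBATCH --gres=gpu:1
# Assumption: a short wall time is enough for the test script.
#SBATCH --time=00:10:00
# %j expands to the job ID; the slurms folder must already exist.
#SBATCH --output=slurms/pytorch_%j.out
#SBATCH --mail-type=ALL
#SBATCH --mail-user=<your-email>

# Assumption: the module name providing conda on your cluster.
module load miniconda3
# Make `conda activate` usable in this non-interactive shell.
eval "$(conda shell.bash hook)"
conda activate "/cluster/medbow/project/<project-name>/software/pytorch/pytorch_env"

# Show which GPU was allocated, then run the test script.
nvidia-smi
python pytorch_gpu_test.py
```

Assuming the script is saved under the hypothetical name run_pytorch.sh, create the output folder first with mkdir -p slurms (Slurm does not create it), then submit with sbatch run_pytorch.sh. The same resource options can be passed to salloc on the command line to test the steps interactively before submitting.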
If successful, your output should take the following form:
Exercise: Suggested Answers: Extend with Pandas
Considerations:
Can you add Pandas to the environment? Yes. You can always go back to an existing environment, activate, and update.
How will you install this package? Using conda install or pip install?
You’re explicitly asked for version 2.1.4. Is this version available? How can you check?
If you use conda install, is there anything you need to take note of during the solving stage?
Because we want to try to keep our environment self-contained, we suggest first looking at using Conda.
Perform the following to check if it is available as a conda package:
(/cluster/medbow/project/<project-name>/software/pytorch/pytorch_env) []$ conda search pandas
Loading channels: done
# Name                       Version           Build  Channel
...
pandas                         2.1.4  py39hddac248_0  conda-forge
...
pandas                         2.2.2  py39hfc16268_1  conda-forge
Notice this package is within the conda-forge channel. Do you have this channel configured?
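To answer both questions from the command line, a minimal sketch is shown below. Adding conda-forge with --env is one option, chosen here so that only the active environment's configuration changes rather than your global ~/.condarc.

```shell
# List the channels conda searches, in priority order.
conda config --show channels

# Search conda-forge explicitly for the requested version.
conda search -c conda-forge "pandas==2.1.4"

# If conda-forge is not configured, one option is to add it for the
# currently active environment only (written to that environment's .condarc).
conda config --env --add channels conda-forge
```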
What details are shown during the solving stage of the conda install?
(/cluster/medbow/project/<project-name>/software/pytorch/pytorch_env) []$ conda install pandas==2.1.4

The following packages will be downloaded:

    package                    |            build
    ---------------------------|-----------------
    numpy-1.26.4               |  py312heda63a1_0         7.1 MB  conda-forge
    pandas-2.1.4               |  py312hfb8ada1_0        14.0 MB  conda-forge
    ------------------------------------------------------------
                                           Total:        21.1 MB

The following NEW packages will be INSTALLED:

  pandas             conda-forge/linux-64::pandas-2.1.4-py312hfb8ada1_0
  ...

The following packages will be DOWNGRADED:

  numpy              2.1.0-py312h1103770_0 --> 1.26.4-py312heda63a1_0
Notice: The numpy package was installed as part of the original torch install, but is going to be downgraded from 2.1.0 to 1.26.4!
This is one reason not to set the always_yes option in your ~/.condarc file. If it is set to yes, the installation would have continued regardless. How might this downgraded version affect torch? Without testing, we don’t know.
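Before confirming a conda install, you can check both concerns without changing the environment. The sketch below uses the --dry-run option, which reports what conda would install, update, or downgrade and then exits.

```shell
# Check whether always_yes is set (which would skip the confirmation prompt).
conda config --show always_yes

# Note the numpy version currently installed in the environment.
conda list numpy

# Preview the solve, including any downgrades, without changing anything.
conda install --dry-run "pandas==2.1.4"
```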
We suggest not downgrading. Instead, create a separate Conda environment for Pandas to avoid potential dependency issues.
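Creating that separate environment could look like the minimal sketch below. The PROJECT placeholder and the pandas_env location are hypothetical; choose a shared project path in the same way as for the PyTorch environment.

```shell
# Placeholders/assumptions: PROJECT and the pandas_env location are hypothetical.
PROJECT="<project-name>"
PANDAS_PREFIX="/cluster/medbow/project/${PROJECT}/software/pandas/pandas_env"

# Create a new environment containing the requested Pandas version from conda-forge.
conda create --prefix "$PANDAS_PREFIX" -c conda-forge "pandas==2.1.4"

# Activate it by path and confirm the installed version.
eval "$(conda shell.bash hook)"
conda activate "$PANDAS_PREFIX"
python -c "import pandas; print(pandas.__version__)"
```

This leaves the PyTorch environment untouched. The trade-off is that a single script needing both torch and Pandas would not run in either environment as-is, so the analysis steps would need to be split between them.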