Create a Shared TensorFlow Conda Environment
Goal: General process for creating and sharing a conda environment under a project.
Use Case
We want to create a software installation of TensorFlow (an end-to-end platform for machine learning) that can be shared across a project by all users.
From the documentation (as of time of writing):
The install documentation only details using a pip install.
TensorFlow is tested and supported on the following 64-bit systems: Python 3.8–3.11
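Before installing, it can help to confirm that the interpreter in the environment falls inside the range quoted above. A minimal sketch, assuming the 3.8–3.11 range from the documentation excerpt still applies (it may have changed since); the helper name is_supported is hypothetical:

```python
import sys

# Range quoted from the TensorFlow install documentation at the time of writing.
# Assumption: adjust these bounds if the documentation has since changed.
MIN_SUPPORTED = (3, 8)
MAX_SUPPORTED = (3, 11)


def is_supported(version=None):
    """Return True if (major, minor) lies within the quoted range."""
    major, minor = (version or sys.version_info)[:2]
    return MIN_SUPPORTED <= (major, minor) <= MAX_SUPPORTED


if __name__ == "__main__":
    current = "{}.{}".format(*sys.version_info[:2])
    status = "within" if is_supported() else "outside"
    print(f"Python {current} is {status} the quoted range")
```

Run inside the activated environment, this reports on the interpreter that pip will install into.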
General Process
[]$ module purge
[]$ module load miniconda3/24.3.0
[]$ cd /cluster/medbow/project/<project-name>/software/tensorflow
[]$ conda create -p 2.16 python=3.11
[]$ conda activate /cluster/medbow/project/<project-name>/software/tensorflow/2.16
(/cluster/medbow/project/<project-name>/software/tensorflow/2.16) []$ export PYTHONUSERBASE=$CONDA_PREFIX
(/cluster/medbow/project/<project-name>/software/tensorflow/2.16) []$ echo $PYTHONUSERBASE
/cluster/medbow/project/<project-name>/software/tensorflow/2.16
(/cluster/medbow/project/<project-name>/software/tensorflow/2.16) []$ pip install tensorflow
(/cluster/medbow/project/<project-name>/software/tensorflow/2.16) []$ python --version
Python 3.11.9
(/cluster/medbow/project/<project-name>/software/tensorflow/2.16) []$ python ~/tf_test.py
...
TensorFlow Version: 2.16.1
tf.Tensor(997.42975, shape=(), dtype=float32)
(/cluster/medbow/project/<project-name>/software/tensorflow/2.16) []$ conda deactivate
[]$
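The contents of tf_test.py are not shown here. A minimal sketch that would produce output of the same shape, assuming the script simply prints the version and sums a random tensor (the function name tensorflow_report is hypothetical, and the numeric value will differ on every run):

```python
import importlib.util


def tensorflow_report():
    """Return the lines a simple TensorFlow smoke test would print."""
    # Guard so the script gives a readable message outside the environment.
    if importlib.util.find_spec("tensorflow") is None:
        return ["TensorFlow is not installed in the active environment"]

    import tensorflow as tf

    # Sum a 1000x1000 tensor of random normal values; the result is a scalar tensor.
    total = tf.reduce_sum(tf.random.normal([1000, 1000]))
    return [f"TensorFlow Version: {tf.__version__}", str(total)]


if __name__ == "__main__":
    print("\n".join(tensorflow_report()))
```

Keeping the import inside the function means the script can still explain what is wrong if it is run before the environment is activated.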
Notice Where Related Packages Were Installed
Since we set export PYTHONUSERBASE=$CONDA_PREFIX, nothing was added to the home folder under .local/lib/, i.e. there is no python3.11 folder:
[~]$ ls .local/lib/
python3.10 python3.12
Everything is self-contained within the conda environment itself:
(/cluster/medbow/project/<project-name>/software/tensorflow/2.16) []$ pwd
/project/<project-name>/software/tensorflow
(/cluster/medbow/project/<project-name>/software/tensorflow/2.16) []$ ls 2.16/lib/python3.11/site-packages/
absl grpc mdurl-0.1.2.dist-info pkg_resources tensorboard_data_server-0.7.2.dist-info
absl_py-2.1.0.dist-info grpcio-1.64.1.dist-info ml_dtypes protobuf-4.25.3.dist-info tensorflow
astunparse h5py ml_dtypes-0.3.2.dist-info __pycache__ tensorflow-2.16.1.dist-info
astunparse-1.6.3.dist-info h5py-3.11.0.dist-info namex pygments tensorflow_io_gcs_filesystem
certifi h5py.libs namex-0.0.8.dist-info pygments-2.18.0.dist-info tensorflow_io_gcs_filesystem-0.37.0.dist-info
...
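The effect of PYTHONUSERBASE can be checked from Python itself: the standard site module reports the user base directory, which is where user-scheme installs (for example pip install --user) would land. A minimal sketch, using a temporary directory as a stand-in for $CONDA_PREFIX; the helper name user_base_with is hypothetical:

```python
import os
import subprocess
import sys
import tempfile


def user_base_with(python_user_base):
    """Ask a fresh interpreter for its user base with PYTHONUSERBASE set."""
    # A new process is used because the site module caches the value at startup.
    env = dict(os.environ, PYTHONUSERBASE=python_user_base)
    result = subprocess.run(
        [sys.executable, "-c", "import site; print(site.getuserbase())"],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


if __name__ == "__main__":
    # Stand-in for $CONDA_PREFIX (assumption: any directory path works here).
    with tempfile.TemporaryDirectory() as fake_prefix:
        print(user_base_with(fake_prefix))
```

With the variable exported as in the process above, the reported user base is the environment folder rather than ~/.local.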
How is this Shared?
Since the environment was created under a project, anyone who is part of this project can also access and activate this environment.
[]$ pwd
/project/<project-name>/software/tensorflow
[]$ ls -al
...
drwxr-sr-x 2 <username> <project-name> 4096 Jun 24 16:33 2.16
Assume the user user02 is part of the <project-name> project.
[user02@mblog1 ~]$ module load miniconda3/24.3.0
[user02@mblog1 ~]$ conda activate /project/<project-name>/software/tensorflow/2.16/
(/project/<project-name>/software/tensorflow/2.16) [user02@mblog1 ~]$ python -c "import tensorflow as tf; print('TensorFlow Version: ' + tf.__version__)"
...
TensorFlow Version: 2.16.1
(/project/<project-name>/software/tensorflow/2.16) [user02@mblog1 ~]$ conda deactivate
[user02@mblog1 ~]$
Because anyone who is part of this project can access and activate this environment, anyone with write permission on it can also update and modify it.
Their changes will affect everyone else!
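Whether other project members can actually modify the environment depends on the group permission bits of its files and directories. A minimal sketch for checking a path, using only the standard library; the helper name group_can_write is hypothetical and the example path is a temporary stand-in rather than a real environment:

```python
import os
import stat
import tempfile


def group_can_write(path):
    """Return True if the group write bit is set on path."""
    return bool(os.stat(path).st_mode & stat.S_IWGRP)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as env_dir:
        os.chmod(env_dir, 0o755)  # rwxr-xr-x: group can read and traverse only
        print(stat.filemode(os.stat(env_dir).st_mode), group_can_write(env_dir))
        os.chmod(env_dir, 0o775)  # rwxrwxr-x: group can also write
        print(stat.filemode(os.stat(env_dir).st_mode), group_can_write(env_dir))
```

In the listing above the top-level folder shows drwxr-sr-x, which gives the group read and traverse access but not write access, so it is worth checking the actual permissions before assuming who can change the environment.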
Exercise: TensorFlow CPU vs GPU
Note: The example above only utilizes CPUs.
Reading the documentation, notice the variation to the pip install call:
pip install tensorflow[and-cuda]
This will additionally install any required NVIDIA/CUDA related libraries.
Exercise:
Create a GPU-enabled TensorFlow Conda environment.
Start an interactive session that requests one GPU.
Run the tf_test_gpu.py Python script.
Expected Output:
...
TensorFlow Version: 2.17.0
Num GPUs Available: 1
[PhysicalDevice(name='/physical_device:GPU:0', device_type='GPU')]
The version might be different if a newer version has been made available since this example was created.
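The tf_test_gpu.py script is not listed here. A minimal sketch that would print output of the shape shown above, assuming it only reports the version and the GPUs TensorFlow can see (the function name gpu_report is hypothetical):

```python
import importlib.util


def gpu_report():
    """Return the lines a simple TensorFlow GPU check would print."""
    # Guard so the script gives a readable message outside the environment.
    if importlib.util.find_spec("tensorflow") is None:
        return ["TensorFlow is not installed in the active environment"]

    import tensorflow as tf

    # Ask TensorFlow which physical GPU devices are visible to this process.
    gpus = tf.config.list_physical_devices("GPU")
    return [
        f"TensorFlow Version: {tf.__version__}",
        f"Num GPUs Available: {len(gpus)}",
        str(gpus),
    ]


if __name__ == "__main__":
    print("\n".join(gpu_report()))
```

If the count is 0 inside a GPU session, the environment was most likely installed without the [and-cuda] variation.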