This page highlights the interplay between using Python and Miniconda on the Beartooth cluster, and how pip/conda install work alongside each other.

This page assumes:

  • You already know how to program using Python and have used pip install to install packages before.

  • You have already used miniconda to create/activate environments and have used conda install.

Since this is an interplay, please try to understand the individual parts first, and then how they do/don’t work together.

Where and Which Versions of Python are Available?

On the System:

 Python on the System
# Login Node: But you DO NOT run computation on the login nodes!
[salexan5@blog2 ~]$ python --version
Python 3.8.12

# Compute Node
[salexan5@blog2 ~]$ salloc -A arcc -t 5:00
[salexan5@mtest2 ~]$ python --version
Python 3.8.12

Via Modules:

 Python Modules
[salexan5@blog2 ~]$ module spider python
----------------------------------------------------------------------------------------------------------------------------------------------------------
  python:
----------------------------------------------------------------------------------------------------------------------------------------------------------
     Versions:
        python/2.7.18
        python/3.10.6

Check for the latest versions, and request one if you require a newer version.

A particular version might have some common packages pre-installed. Load the module and use pip list / pip show to check which packages are available.

 Check basic environment configuration:
[salexan5@blog2 ~]$ module load gcc/12.2.0 python/3.10.6
[salexan5@blog2 ~]$ python --version
Python 3.10.6

[salexan5@blog2 ~]$ python -m site
sys.path = [
    '/pfs/tc1/home/salexan5',
    '/apps/u/spack/gcc/12.2.0/python/3.10.6-7ginwsd/lib/python310.zip',
    '/apps/u/spack/gcc/12.2.0/python/3.10.6-7ginwsd/lib/python3.10',
    '/apps/u/spack/gcc/12.2.0/python/3.10.6-7ginwsd/lib/python3.10/lib-dynload',
    '/home/salexan5/.local/lib/python3.10/site-packages',
    '/apps/u/spack/gcc/12.2.0/python/3.10.6-7ginwsd/lib/python3.10/site-packages',
]
USER_BASE: '/home/salexan5/.local' (exists)
USER_SITE: '/home/salexan5/.local/lib/python3.10/site-packages' (exists)
ENABLE_USER_SITE: True
 Which packages are available?
[salexan5@blog2 ~]$ pip list
Package                  Version
------------------------ -----------
absl-py                  1.4.0
aiohttp                  3.8.4
aiosignal                1.3.1
...
wheel                    0.37.1
xxhash                   3.2.0
yarl                     1.8.2
 Where is a package installed?
# Local Home
[salexan5@blog2 ~]$ pip show torch
Name: torch
Version: 2.0.0
Summary: Tensors and Dynamic neural networks in Python with strong GPU acceleration
Home-page: https://pytorch.org/
Author: PyTorch Team
Author-email: packages@pytorch.org
License: BSD-3
Location: /pfs/tc1/home/salexan5/.local/lib/python3.10/site-packages
Requires: filelock, jinja2, networkx, nvidia-cublas-cu11, nvidia-cuda-cupti-cu11, nvidia-cuda-nvrtc-cu11, nvidia-cuda-runtime-cu11, nvidia-cudnn-cu11, nvidia-cufft-cu11, nvidia-curand-cu11, nvidia-cusolver-cu11, nvidia-cusparse-cu11, nvidia-nccl-cu11, nvidia-nvtx-cu11, sympy, triton, typing-extensions
Required-by: lightning-transformers, pytorch-lightning, torchmetrics, triton

# System Wide
[salexan5@blog2 ~]$ pip show matplotlib
Name: matplotlib
Version: 3.6.2
Summary: Python plotting package
Home-page: https://matplotlib.org
Author: John D. Hunter, Michael Droettboom
Author-email: matplotlib-users@python.org
License: PSF
Location: /apps/u/spack/gcc/12.2.0/python/3.10.6-7ginwsd/lib/python3.10/site-packages
Requires: contourpy, cycler, fonttools, kiwisolver, numpy, packaging, pillow, pyparsing, python-dateutil
Required-by:
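
The same check can be made from inside Python. The following is a minimal sketch, assuming the standard-library importlib.metadata and site modules (Python 3.8+); the helper names install_location and is_user_install are made up for illustration and are not part of any cluster tooling.

```python
import site
from importlib import metadata
from pathlib import Path
from typing import Optional


def install_location(package: str) -> Optional[Path]:
    """Return the directory a distribution is installed in, or None if it is absent."""
    try:
        dist = metadata.distribution(package)
    except metadata.PackageNotFoundError:
        return None
    # locate_file("") resolves to the directory that holds the distribution,
    # i.e. the site-packages folder reported by "pip show" as Location.
    return Path(str(dist.locate_file(""))).resolve()


def is_user_install(location: Path) -> bool:
    """True when the location is the per-user site-packages (under ~/.local by default)."""
    return location == Path(site.getusersitepackages()).resolve()


if __name__ == "__main__":
    # torch and matplotlib are simply the two packages used in the listing above.
    for name in ("torch", "matplotlib"):
        where = install_location(name)
        if where is None:
            print(f"{name}: not installed for this Python")
        else:
            kind = "user (home)" if is_user_install(where) else "module or environment"
            print(f"{name}: {where} [{kind}]")
```

Run it with the same module loaded as for the pip show commands, since the answer depends on which python is first on your PATH.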

Notice: The version of Python in use dictates where a user-installed package is placed:

 Where are Packages Installed?
[salexan5@blog2 lib]$ pwd
/home/salexan5/.local/lib

[salexan5@blog2 lib]$ ls
python2.7  python3.10  python3.6  python3.8  python3.9

[salexan5@blog2 lib]$ ls python3.10/site-packages/
absl                                google                                     nvidia_cuda_nvrtc_cu11-11.7.99.dist-info    responses-0.18.0.dist-info
...

[salexan5@blog2 ~]$ python --version
Python 3.10.6

# Notice how this ties in with:
[salexan5@blog2 ~]$ python -m site
sys.path = [
    ...
    '/home/salexan5/.local/lib/python3.10/site-packages',
    ...
]

So, if you have Python 3.9.x active within your environment, any pip-installed packages would be located under: /home/salexan5/.local/lib/python3.9/site-packages

Why does this matter?

If you have previously been using python 3.9.x, and then switch to python 3.10.x, any previously installed packages won’t be available and have to be re-installed for the newer version.
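
To make the version dependence concrete, here is a small sketch that rebuilds the user site-packages path from a Python version. It assumes the Linux layout shown above (<user base>/lib/pythonX.Y/site-packages); user_site_for is a hypothetical helper, and the home path is just the one from the listings on this page.

```python
import site
import sys
from pathlib import Path


def user_site_for(major: int, minor: int, user_base: str) -> Path:
    """Build the Linux-style user site-packages path for a given Python version."""
    return Path(user_base) / "lib" / f"python{major}.{minor}" / "site-packages"


if __name__ == "__main__":
    # What the running interpreter reports for itself.
    print("Running Python:", sys.version.split()[0])
    print("User site     :", site.getusersitepackages())

    # Two versions, two different folders: packages do not carry over.
    base = "/home/salexan5/.local"  # example path taken from the listings above
    print(user_site_for(3, 9, base).as_posix())
    print(user_site_for(3, 10, base).as_posix())
```

Only the major.minor part of the version appears in the path, so moving between patch releases such as 3.10.6 and 3.10.9 keeps the same folder, while moving from 3.9 to 3.10 does not.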

Via Miniconda:

Remember that conda/miniconda is described as “Package, dependency and environment management for any language - Python, R, Ruby, Lua, Scala, Java, JavaScript, C/ C++, Fortran, and more.” Notice that it allows an environment to be created for many more languages than just Python, and that it installs system-level libraries, not just Python packages.

The versions of miniconda on the cluster do ship with their own version of Python:

 Miniconda Version and Python Version
[salexan5@blog2 ~]$ module load miniconda3/4.12.0
[salexan5@blog2 ~]$ python --version
Python 3.9.12

[salexan5@blog2 ~]$ module load miniconda3/23.1.0
The following have been reloaded with a version change:
  1) miniconda3/4.12.0 => miniconda3/23.1.0
[salexan5@blog1 499]$ python --version
Python 3.10.9

Why does this matter?

As previously indicated, if you pip install, the package will be installed in a different folder for each miniconda module, because the Python versions are different.

This is why as a best practice you should always define the version of a module you’re loading.

Recommendation: Do not rely on the version of Python shipped with miniconda as the way to define which version of Python you actually require.

Why? Because users typically do not define the miniconda version, and do not track the version they are using.

Suggestion: Simply use miniconda to manage your environment, and explicitly define the required version of python to use within it.
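
One way to notice early that the wrong interpreter is in use is a version guard at the top of a script. This is a minimal sketch, not an ARCC-provided tool: python_matches and require_python are hypothetical names, and (3, 10) is only an example of a pinned version.

```python
import sys
from typing import Optional, Sequence


def python_matches(required: Sequence[int], running: Optional[Sequence[int]] = None) -> bool:
    """Compare the leading components of the running version with the required ones."""
    if running is None:
        running = tuple(sys.version_info)
    return tuple(running[: len(required)]) == tuple(required)


def require_python(required: Sequence[int]) -> None:
    """Stop early with a clear message when the interpreter is not the expected one."""
    if not python_matches(required):
        wanted = ".".join(str(part) for part in required)
        raise RuntimeError(
            f"Expected Python {wanted}, but running {sys.version.split()[0]} "
            f"from {sys.executable}"
        )


if __name__ == "__main__":
    # (3, 10) is only an example; use the version pinned in your environment.
    try:
        require_python((3, 10))
        print("Python version matches the pinned version.")
    except RuntimeError as err:
        print(err)
```

The error message includes sys.executable, which shows whether the interpreter came from the miniconda module itself or from your own environment.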

Using conda install in an Active Conda Environment

The conda install command "accepts a list of package specifications (e.g, bitarray=0.8) and installs a set of packages consistent with those specifications and compatible with the underlying environment." These are installed within your active conda environment, in the folder where it has been created.

Pip Installing in an Active Conda Environment

If you use pip install within an active conda environment then, as described above, it will by default install these packages within your home folder, under the site-packages folder associated with the version of Python being used within your conda environment.

Is this an issue?

Potentially yes.

  • It will work for you (the user) but probably not if you try and share the environment with others who do not have permission to access your home.

  • If you install a newer version of a package, it will typically be used over any previous versions. Does this introduce an error into your environment? Maybe…

  • How are you explicitly tracking the version of a python package?
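
As a rough way to spot the second and third points, the sketch below lists every distribution visible to the running interpreter and flags any package that is installed in more than one location (for example once in the conda environment and once under your home). It assumes only the standard-library importlib.metadata module; installed_versions and duplicated are hypothetical helper names.

```python
from collections import defaultdict
from importlib import metadata


def installed_versions() -> dict:
    """Map each distribution name to a list of (version, location) pairs on sys.path."""
    found = defaultdict(list)
    for dist in metadata.distributions():
        try:
            name = dist.metadata.get("Name")
            version = dist.metadata.get("Version")
        except Exception:
            # Skip entries whose metadata cannot be read (e.g. a damaged install).
            continue
        if name:
            found[name.lower()].append((version, str(dist.locate_file(""))))
    return dict(found)


def duplicated(found: dict) -> dict:
    """Keep only the packages that are installed in more than one location."""
    return {
        name: copies
        for name, copies in found.items()
        if len({location for _, location in copies}) > 1
    }


if __name__ == "__main__":
    for name, copies in sorted(duplicated(installed_versions()).items()):
        print(name)
        for version, location in copies:
            print(f"    {version}  {location}")
```

An empty result means no package was found in two places for this interpreter; it does not by itself confirm that the environment is shareable.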

As a best practice, we suggest forcing pip install to place packages in your conda environment folder and not your home folder.

Previously, the simplest way to do this was to set the following environment variable:

export PYTHONUSERBASE=intentionally-disabled

The PYTHONUSERBASE environment variable “Defines the user base directory, which is used to compute the path of the user site-packages directory”: https://docs.python.org/3/using/cmdline.html#envvar-PYTHONUSERBASE Essentially, we are forcing conda to pip install within its own folder.

Or, you could define this location as:

export PYTHONUSERBASE=<absolute-path-to-conda-env>

There are further tricks you can do as described here: https://stackoverflow.com/questions/62352699/conda-uses-local-packages

For example, the conda environment variable CONDA_PREFIX is documented as “The path to the conda environment used to build the package, such as /path/to/conda/env. Useful to pass as the environment prefix parameter to various conda tools, usually labeled -p or --prefix.”

The suggested approach is to use:

export PYTHONUSERBASE=$CONDA_PREFIX
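
To see what this setting changes, the sketch below starts a fresh interpreter with PYTHONUSERBASE set and asks it for its user base via site.getuserbase(). It is an illustration only: user_base_with is a hypothetical helper, and /path/to/conda/env is a placeholder used when CONDA_PREFIX is not set.

```python
import os
import subprocess
import sys


def user_base_with(python_user_base: str) -> str:
    """Ask a fresh interpreter what its user base is when PYTHONUSERBASE is set."""
    env = dict(os.environ, PYTHONUSERBASE=python_user_base)
    result = subprocess.run(
        [sys.executable, "-c", "import site; print(site.getuserbase())"],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


if __name__ == "__main__":
    # Fall back to a placeholder path when no conda environment is active.
    target = os.environ.get("CONDA_PREFIX", "/path/to/conda/env")
    print("PYTHONUSERBASE:", target)
    print("USER_BASE seen:", user_base_with(target))
```

The user site-packages folder is then computed under that base, which is why the variable redirects where user installs end up.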

Issue

This appears not to work if the created conda environment does not explicitly define its own version of Python, i.e. it is using the version that comes with miniconda3.

Performing a pip install places the python packages under /apps/u/opt/linux/miniconda3/x.y.z/lib/python3.xx/site-packages/
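
A quick way to tell whether you are in this situation is to compare the interpreter's own prefix with CONDA_PREFIX. The sketch below assumes that an environment with its own Python has sys.prefix equal to the environment folder, while an environment without one falls back to the interpreter under the miniconda install; interpreter_in_env is a hypothetical helper name.

```python
import os
import sys
from pathlib import Path
from typing import Optional


def interpreter_in_env(prefix: str, conda_prefix: Optional[str]) -> bool:
    """True when the interpreter prefix is the active conda environment itself."""
    if not conda_prefix:
        return False
    return Path(prefix).resolve() == Path(conda_prefix).resolve()


if __name__ == "__main__":
    conda_prefix = os.environ.get("CONDA_PREFIX")
    print("sys.prefix  :", sys.prefix)
    print("CONDA_PREFIX:", conda_prefix)
    if interpreter_in_env(sys.prefix, conda_prefix):
        print("Python belongs to the active conda environment.")
    else:
        print("Python does NOT come from the active conda environment.")
```

If the second message appears while an environment is active, the environment most likely has no Python of its own, and installing an explicit version into it is the fix suggested on this page.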

Overall Suggestion:

If you’re performing local development, and not sharing, use a module load python/x.y.z - track the version of python and pip install into your local home.

If you’re using a conda environment and want to share this environment:

  1. Create the conda environment - within your project space.

  2. Explicitly define the version of Python to use, and do NOT rely on the version shipped with miniconda.

  3. Set the PYTHONUSERBASE environment variable so pip install packages are installed within the conda folder and not your home folder.
