Setting Up Environments

Goal: Understand how to load modules, how to reset your environment by purging, and how to recognize potential dependency issues.


Setup Python environment

Let’s set up an environment to enable us to run Python scripts:

[]$ module purge
[]$ module load gcc/14.2.0
[]$ module load python/3.12.0
[]$ python --version

Can we combine the two module load commands into a single line?

What happens when we try the following two orderings, and how do they differ?

[]$ module purge
[]$ module load python/3.12.0 gcc/14.2.0

vs

[]$ module purge
[]$ module load gcc/14.2.0 python/3.12.0

Order matters!

Modules on a single line are loaded from left to right, so dependencies must be listed first.
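The left-to-right rule can be illustrated with a toy model. This sketch does not call the module system at all: prereq_of and load_in_order are hypothetical helper names, and the only dependencies encoded are the ones this tutorial describes (python/3.12.0 and r/4.4.0 needing gcc/14.2.0). A real module system tracks far more than this.

```shell
# Toy model only: this does not call Lmod. Each module has at most one
# prerequisite, taken from the dependencies described in this tutorial.
prereq_of() {
  case "$1" in
    python/3.12.0 | r/4.4.0) echo "gcc/14.2.0" ;;
    *) echo "" ;;
  esac
}

# Process the arguments from left to right, one module at a time.
load_in_order() {
  loaded=""
  for m in "$@"; do
    need=$(prereq_of "$m")
    # -x: match the whole line, -F: treat the name as a fixed string.
    if [ -n "$need" ] && ! printf '%s\n' $loaded | grep -qxF "$need"; then
      echo "cannot load $m: $need is not loaded yet"
      return 1
    fi
    loaded="$loaded $m"
  done
  echo "loaded:$loaded"
}

load_in_order gcc/14.2.0 python/3.12.0
load_in_order python/3.12.0 gcc/14.2.0 || true
```

The first call succeeds because gcc/14.2.0 is already in place when python/3.12.0 is reached; the second stops at python/3.12.0 because nothing has been loaded yet.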


What’s happened to the PATH environment variable?

What is happening to our environment when we’re loading modules?

[]$ module purge
[]$ echo $PATH
/apps/s/arcc/1.0/bin:/apps/s/slurm/latest/bin:
/home/<username>/.local/bin:/home/<username>/bin:
/usr/share/Modules/bin:/usr/local/bin:/usr/bin:/usr/local/sbin:/usr/sbin
[]$ which python
/usr/bin/python

Now let’s load a version of Python and see what happens:

[]$ module load gcc/14.2.0 python/3.12.0
[]$ echo $PATH
/apps/u/spack/gcc/14.2.0/python/3.12.0-4e5he6r/bin:
/apps/u/spack/gcc/14.2.0/util-linux-uuid/2.38.1-6f6zqay/bin:
...
/apps/s/slurm/latest/bin:
/home/<username>/.local/bin:/home/<username>/bin:
/usr/share/Modules/bin:/usr/local/bin:/usr/bin:/usr/local/sbin:/usr/sbin
[]$ which python
/apps/u/spack/gcc/14.2.0/python/3.12.0-4e5he6r/bin/python

Loading modules updates a number of environment variables exposing the compiler/application/software we want to use.


What’s Happening to My Environment?

Loading modules does not change anything in your /home or /project storage space - nothing is being installed.

What is being updated are the environment variables within your current session. These are not persistent across other sessions.

You can have two or more separate/independent sessions opened at the same time, each with different modules loaded - but all still sharing the same file storage system.
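A rough way to see that environment changes stay within one session is to let a child shell play the part of a second session. This is only an analogy, since a child shell is not a separate login, and DEMO_TOOL_HOME is a hypothetical variable standing in for whatever a module might set.

```shell
# DEMO_TOOL_HOME is a hypothetical variable, standing in for anything a
# module might set (PATH, LD_LIBRARY_PATH, ...).
unset DEMO_TOOL_HOME

# A child shell acts like a second session: it sets the variable and exits.
child_value=$(sh -c 'export DEMO_TOOL_HOME=/tmp/demo-tool; echo "$DEMO_TOOL_HOME"')

# Back in the parent, the variable was never set.
parent_value=${DEMO_TOOL_HOME:-unset}

echo "child:  $child_value"
echo "parent: $parent_value"
```

The child sees its own value, while the parent still reports the variable as unset. Nothing was written to disk in either case.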

Remember: The order that the system will look within paths is from left to right.

The /apps/u/spack/gcc/14.2.0/python/3.12.0-4e5he6r/bin path appears before /usr/bin, so the shell finds python/3.12.0 before the system’s version.
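The left-to-right search can be reproduced without any modules. The sketch below builds two throwaway directories, each holding a script called demo-tool (a hypothetical name standing in for two Python installs), and shows that whichever directory comes first in PATH wins.

```shell
# Build two throwaway directories that each provide a command called
# "demo-tool" (a hypothetical name), standing in for two Python installs.
workdir=$(mktemp -d)
mkdir -p "$workdir/system/bin" "$workdir/module/bin"
printf '#!/bin/sh\necho "system version"\n' > "$workdir/system/bin/demo-tool"
printf '#!/bin/sh\necho "module version"\n' > "$workdir/module/bin/demo-tool"
chmod +x "$workdir/system/bin/demo-tool" "$workdir/module/bin/demo-tool"

original_path=$PATH

# Only the "system" directory is on PATH, so its copy is found.
PATH="$workdir/system/bin:$original_path"
before=$(demo-tool)

# Prepending a second directory puts the other copy first in the search.
PATH="$workdir/module/bin:$PATH"
hash -r   # forget any cached command locations
after=$(demo-tool)

echo "before: $before"
echo "after:  $after"

# Restore PATH and clean up.
PATH=$original_path
rm -rf "$workdir"
```

Both copies of demo-tool exist on disk the whole time; only the order of directories in PATH decides which one runs.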


Can we use the R language?

Can we use R within an environment?

[]$ module purge
[]$ module avail

Do we see R available within the default list of available modules? No.

Can we search for it?

[]$ module spider r

From what is returned by module spider, can you see why it isn’t currently available?

Can we load it?

[]$ module load r/4.4.0

How do we fix this?


Can we use the R language? Fixed

Notice from the output of module spider r/4.4.0 that this module has a dependency on gcc/14.2.0.

Until gcc/14.2.0 has been loaded, we will not see r/4.4.0 being available.

[]$ module purge
[]$ module load gcc/14.2.0 r/4.4.0

Once r/4.4.0 has been loaded, how many additional dependencies/libraries have also been loaded?

[]$ ml
Currently Loaded Modules:
  1) slurm/latest (S)        42) libxau/1.0.8
  ...
 40) libpthread-stubs/0.4    81) r/4.4.0
 41) xproto/7.0.31

A lot! The module system hides the loading of all these additional libraries.

Let’s check R is loaded:

[]$ R --version
R version 4.4.0 (2024-04-24) -- "Puppy Cup"

Can we also use the Python language?

We know we can only have one compiler loaded at a time.

Can we have more than one language loaded at a time?

First load r/4.4.0, and then python/3.12.0 - what do we notice?

[]$ module purge
[]$ module load gcc/14.2.0 r/4.4.0
[]$ module load python/3.12.0
-------------------------------------------------------------------------------
The following dependent module(s) are not currently loaded: python/3.10.6 (required by: glib/2.78.0, xcb-proto/1.15.2, gobject-introspection/1.76.1)
-------------------------------------------------------------------------------

The following have been reloaded with a version change:
  1) python/3.10.6 => python/3.12.0

[]$ R --version
R version 4.4.0 (2024-04-24) -- "Puppy Cup"
[]$ python --version
Python 3.12.0

Notice: Loading R automatically loaded python/3.10.6 as one of its dependencies.

After loading a module, consider calling ml to see what dependencies have also been loaded.

Loading python/3.12.0 replaced this version.

Question: Does replacing python/3.10.6 with python/3.12.0 affect R?

Potentially: We do not know how this is being used and thus can’t say definitively if it does/doesn’t affect the running of R.

We would suggest erring on the side of caution and checking that everything runs as expected if you notice modules being replaced.
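One way to spot replaced modules is to compare a module list saved before a load with one saved after it. The sketch below assumes each list is a plain file with one name/version entry per line (the numbered output of ml would need tidying first); version_changes is a hypothetical helper, and the sample lists are a cut-down version of the session above.

```shell
# Print modules whose version differs between two lists of name/version lines.
version_changes() {
  # $1: list saved before the load, $2: list saved after it
  awk -F/ '
    NR == FNR { before[$1] = $2; next }
    ($1 in before) && before[$1] != $2 {
      printf "%s: %s => %s\n", $1, before[$1], $2
    }
  ' "$1" "$2"
}

# Sample lists based on the session above; on a real system they would come
# from saving the loaded-module list before and after the load.
before_list=$(mktemp)
after_list=$(mktemp)
printf '%s\n' gcc/14.2.0 r/4.4.0 python/3.10.6 > "$before_list"
printf '%s\n' gcc/14.2.0 r/4.4.0 python/3.12.0 > "$after_list"

changes=$(version_changes "$before_list" "$after_list")
echo "$changes"

rm -f "$before_list" "$after_list"
```

Any line printed is a module worth a second look before trusting the environment.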


Load Another Compiler:

After loading R and Python, what happens if you replace the gcc compiler?

[]$ module purge
[]$ module load gcc/14.2.0 r/4.4.0
[]$ R --version
R version 4.4.0 (2024-04-24) -- "Puppy Cup"
...
[]$ python --version
Python 3.10.6

Load the Intel oneapi compiler and see what happens to your environment:

[]$ module load oneapi/2024.1.0

Lmod is automatically replacing "gcc/14.2.0" with "oneapi/2024.1.0".

Inactive Modules:
  1) cairo/1.16.0    30) libxcb/1.14
  ...
 29) libxau/1.0.8    58) xtrans/1.4.0

Due to MODULEPATH changes, the following have been reloaded:
  1) berkeley-db/18.1.40    6) libiconv/1.17    11) pigz/2.7    16) zlib-ng/2.1.4
  ...
  5) libbsd/0.11.7         10) perl/5.38.0      15) xz/5.4.1

[]$ R --version
-bash: R: command not found
[]$ python --version
Python 3.9.18
[]$ which python
/usr/bin/python

Remember: You can only have one compiler loaded at a time.

Replacing one compiler with another will affect modules that have the first compiler as a dependency.

So, in the example above, R has been made inactive and is no longer available, and Python has also been made inactive, so the python command reverts to the system version.
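Because a command can silently disappear or fall back to a system copy, it can help to check that the commands a workflow needs are actually on PATH before using them. The following is a minimal sketch; require_cmds is a hypothetical helper, and the command names in the demonstration are placeholders.

```shell
# Return success only if every named command can be found on PATH.
require_cmds() {
  missing=0
  for cmd in "$@"; do
    if ! command -v "$cmd" >/dev/null 2>&1; then
      echo "missing command: $cmd" >&2
      missing=1
    fi
  done
  return "$missing"
}

# "sh" should exist everywhere; the second name is deliberately bogus.
require_cmds sh && echo "sh is available"
require_cmds no-such-command-demo || echo "stop here and fix the environment"
```

In a script, a line such as require_cmds R python || exit 1 placed after the module load commands would stop early instead of failing part-way through. Note that this only confirms a command exists, not which version was found, so which python is still worth checking.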


But R is still in my Module List?

What can you see if you list the modules loaded?

[]$ ml
Currently Loaded Modules:
  1) slurm/latest (S)    8) xz/5.4.1          15) berkeley-db/18.1.40
  ...
  6) bzip2/1.0.8        13) gettext/0.22.3    20) libbsd/0.11.7
  7) libiconv/1.17      14) pkgconf/1.9.5

  Where:
   S:  Module is Sticky, requires --force to unload or purge

Inactive Modules:
  ...
 13) python/3.10.6    42) libtirpc/1.3.3
  ...
 29) pixman/0.42.2    58) r/4.4.0

Yes, both R and Python are listed.

But these are listed as Inactive Modules, so they can no longer be used - which is why R cannot be called.

As good practice, we suggest always performing a module purge before changing compilers so that you start from a clean environment.

[]$ module purge
[]$ module load oneapi/2024.1.0
[]$ ml
Currently Loaded Modules:
  1) slurm/latest (S)   2) arcc/1.0 (S)   3) oneapi/2024.1.0

Remember:

  • Only one compiler/version can be loaded into your environment at a time.

  • You can only load languages/applications built with the same compiler.

  • But, even this can introduce dependency issues.

Will changing the version of Python affect R? In this case probably not.

But if underlying versions of libraries are changing - then maybe - and that’s the best we can say…

Remember: The more complicated your environments, the more dependencies there’ll be, the more potential for dependency hell.


Exercises:

Try answering the following questions:

  • Why do we need a module system?

  • What modules are available that relate to netcdf?

  • What modules become available after loading nvhpc-sdk/24.3?

  • How would you identify modules that have no dependencies?


Exercises: Answers:

To enable researchers to individually configure sessions with the compilers/libraries/applications they require for their specific workflows.

Use the module spider command to search for modules. You will see something of the form:

[]$ module spider netcdf
----------------------------------------------------------------------------
  netcdf-c: netcdf-c/4.9.2-ompi
----------------------------------------------------------------------------

    You will need to load all module(s) on any one of the lines below before the "netcdf-c/4.9.2-ompi" module is available to load.

      arcc/1.0  gcc/13.2.0  openmpi/4.1.6

    Help:
      NetCDF (network Common Data Form) is a set of software libraries and machine-independent data formats that support the creation, access, and sharing of array-oriented scientific data. This is the C distribution.

----------------------------------------------------------------------------
  netcdf-cxx4: netcdf-cxx4/4.3.1-ompi
----------------------------------------------------------------------------

    You will need to load all module(s) on any one of the lines below before the "netcdf-cxx4/4.3.1-ompi" module is available to load.

      arcc/1.0  gcc/13.2.0  openmpi/4.1.6

    Help:
      NetCDF (network Common Data Form) is a set of software libraries and machine-independent data formats that support the creation, access, and sharing of array-oriented scientific data. This is the C++ distribution.

----------------------------------------------------------------------------
  parallel-netcdf: parallel-netcdf/1.12.3-ompi
----------------------------------------------------------------------------

    You will need to load all module(s) on any one of the lines below before the "parallel-netcdf/1.12.3-ompi" module is available to load.

      arcc/1.0  gcc/13.2.0  openmpi/4.1.6

    Help:
      PnetCDF (Parallel netCDF) is a high-performance parallel I/O library for accessing files in format compatibility with Unidata's NetCDF, specifically the formats of CDF-1, 2, and 5.

First, load the nvhpc-sdk/24.3 compiler module, and then use module avail to see what additional modules are now available.

[]$ module purge
[]$ module load nvhpc-sdk/24.3
[]$ module avail
...
----------------- /apps/u/opt/compilers/nvhpc/24.3/modulefiles -----------------
   nvhpc-byo-compiler/24.3    nvhpc-hpcx/24.3     nvhpc-openmpi3/24.3
   nvhpc-hpcx-cuda12/24.3     nvhpc-nompi/24.3    nvhpc/24.3
...

Modules that are available from the containers, conda-envs, and linux trees are typically installed with no dependencies.

[]$ module avail
...
-------------- /apps/s/lmod/mf/opt/linux-rhel9-x86_64/containers ---------------
   bam-readcount/0.8.0    regtools/1.0.0    stress-ng/0.17.08
...
-------------- /apps/s/lmod/mf/opt/linux-rhel9-x86_64/conda-envs ---------------
   beast1/1.10.4    mafft/7.526       qiime2-amplicon/2024.5
   bowtie2/2.5.4    multiqc/1.24.1    qiime2-metagenome/2024.5
   julia/1.10.3     python2/2.7.18    rseqc/5.0.3
...

Modules that are based on Linux binaries might/might not have dependencies, but these will be automatically loaded. For example:

[]$ module avail
...
----------------- /apps/s/lmod/mf/opt/linux-rhel9-x86_64/linux -----------------
...
   fastqc/0.12.1           matlab/2024a         sratoolkit/3.1.1
   gaussian/16.AVX2.b01    miniconda3/24.3.0    subread/2.0.6
   gsutil/491.0.0          muscle/5.2           tophat/2.1.1
   guppy-cpu/6.5.7         nextflow/23.10.1     trimmomatic/0.39
...

Compare matlab:

[]$ module spider matlab/2024a
----------------------------------------------------------------------------
  matlab: matlab/2024a
----------------------------------------------------------------------------

    You will need to load all module(s) on any one of the lines below before the "matlab/2024a" module is available to load.

      arcc/1.0

versus trimmomatic:

[]$ module spider trimmomatic/0.39
----------------------------------------------------------------------------
  trimmomatic: trimmomatic/0.39
----------------------------------------------------------------------------

    You will need to load all module(s) on any one of the lines below before the "trimmomatic/0.39" module is available to load.

      arcc/1.0

    Help:
      trimmomativ : 0.39
      A flexible read trimming tool for Illumina NGS data.
      http://www.usadellab.org/cms/?page=trimmomatic
      Loads dependencies: gcc/14.2.0 and openjdk/11.0.20.1_1

Notice that gcc/14.2.0 and openjdk/11.0.20.1_1 will be automatically loaded:

[]$ module purge
[]$ module load trimmomatic/0.39
[]$ gcc --version
gcc (Spack GCC) 14.2.0
...
[]$ java -version
openjdk version "11.0.20.1" 2023-08-24
...

Consider: How does this affect your environment and any existing loaded modules? Look at any warning messages.


 
