What is Conda? Using Miniconda3 on the Cluster

Goal: Introduce conda, define some terminology, show how to use it on the cluster, and explain where to find help.


What is Conda?

  • Getting Started with Conda: A powerful command line tool for package and environment management that runs on Windows, macOS, and Linux.

  • Conda Documentation: Provides package, dependency, and environment management for any language.


Getting Conda: Miniconda vs Anaconda vs Miniforge

Miniconda: (<0.5G) A free minimal installer for conda. It is a small bootstrap version of Anaconda that includes only conda, Python, the packages (<70) they both depend on, and a small number of other useful packages (like pip, zlib, and a few others).

  • This is what ARCC provides.

Anaconda: (4.4G) (The Company) Anaconda® Distribution is a free Python/R data science distribution that contains:

  • Conda and Anaconda Navigator (a desktop GUI application built on conda, with options to launch other development applications from your managed environments).

  • 250 automatically-installed packages and access to the Anaconda Public Repository (>8K open-source data science and ML packages).

Miniforge: A community-driven repository that holds the minimal installers for Conda and Mamba (a reimplementation of the conda package manager in C++). The installers are specific to conda-forge (community-led recipes, infrastructure and distributions for conda), which is set as the default and only channel.

Should I use Anaconda Distribution or Miniconda?


Terminology

Glossary:

Term

Definition

package manager

A collection of software tools that automates the process of installing, updating, configuring, and removing computer programs for a computer's operating system.

conda

Conda is a package manager. The package and environment manager program … that installs and updates conda packages and their dependencies.

conda package

A compressed file that contains everything that a software program needs in order to be installed and run, so that you do not have to manually find and install each dependency separately.

conda environment

A folder or directory that contains a specific collection of conda packages and their dependencies, so they can be maintained and run separately without interference from each other.

conda repository

A cloud-based repository that contains packages that are easily installed.

channels

The locations of the repositories where conda looks for packages.


Dependency Hell

The concept of dependency hell was introduced in the Module System workshop and rears its ugly head when trying to set up Python and/or R environments with a lot of packages.

Conda environments provide a way to create self-contained, independent environments, each focused on a specific analysis. This avoids the dependency clashes that arise when everything is installed into a single behemoth environment.
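As a minimal sketch of the idea above, the following plans two independent environments that each pin a different Python version. The environment names (`analysis_a`, `analysis_b`) and the versions are hypothetical, chosen only for illustration. The `--dry-run` flag asks conda to report what it would install without creating anything, and the script only prints the commands if conda is not on your path.

```shell
#!/usr/bin/env bash
# Hypothetical environment names and Python versions, for illustration only.
ENVS_PLANNED=""
for spec in "analysis_a:3.10" "analysis_b:3.12"; do
    name="${spec%%:*}"      # text before the colon
    version="${spec##*:}"   # text after the colon
    if command -v conda >/dev/null 2>&1; then
        # --dry-run reports the plan without creating the environment.
        conda create --name "$name" --dry-run --yes "python=$version" \
            || echo "dry run failed for $name (is a channel reachable?)"
    else
        echo "conda not found; would run: conda create --name $name python=$version"
    fi
    ENVS_PLANNED="$ENVS_PLANNED $name"
done
echo "planned environments:$ENVS_PLANNED"
```

Removing `--dry-run` would create the environments for real. Each one lives in its own directory, so changing packages in one does not affect the other.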


Using Miniconda3/Conda on the Cluster

You do NOT need to install miniconda3 yourself. It is provided as a pre-installed module.

[]$ module spider miniconda3
----------------------------------------------------------------------------
  miniconda3: miniconda3/24.3.0
----------------------------------------------------------------------------
    You will need to load all module(s) on any one of the lines below before the "miniconda3/24.3.0" module is available to load.
      arcc/1.0
    Help:
      The minimalist bootstrap toolset for conda and Python3.

We update miniconda3 on a semi-frequent basis.

You can install Miniconda or Anaconda yourself, but doing so will modify your cluster environment.

Make sure you understand what you’re doing.
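One change a personal install can make is to your shell startup file: running `conda init` writes a block between `# >>> conda initialize >>>` and `# <<< conda initialize <<<` markers. Whether this is the modification meant above is an assumption. The sketch below checks a startup file for that block. The function name `has_conda_init_block` is hypothetical, and the default path `~/.bashrc` assumes a bash login shell.

```shell
#!/usr/bin/env bash
# Report whether a shell startup file contains the block that `conda init` writes.
# The default file (~/.bashrc) is an assumption; pass another path as the first argument.
has_conda_init_block() {
    local rc_file="${1:-$HOME/.bashrc}"
    # `conda init` wraps its changes in ">>> conda initialize >>>" markers.
    if [ -f "$rc_file" ] && grep -q '>>> conda initialize >>>' "$rc_file"; then
        echo "conda init block found in $rc_file"
        return 0
    fi
    echo "no conda init block in $rc_file"
    return 1
}

# Check the current user's startup file; do not treat "not found" as an error.
has_conda_init_block "$HOME/.bashrc" || true
```

If the block is present, a personal conda may take precedence over the one provided by the module.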


Conda Version and Help

[]$ module purge
[]$ module load miniconda3/24.3.0
[]$ conda --version
conda 24.3.0
[]$ conda --help
usage: conda [-h] [-v] [--no-plugins] [-V] COMMAND ...

conda is a tool for managing and deploying applications, environments and packages.
...
[]$ conda install --help
usage: conda install [-h] [--revision REVISION] [-n ENVIRONMENT | -p PATH] [-c CHANNEL] [--use-local] [--override-channels]
                     [--repodata-fn REPODATA_FNS] [--experimental {jlap,lock}] [--no-lock]
                     [--repodata-use-zst | --no-repodata-use-zst] [--strict-channel-priority] [--no-channel-priority]
                     [--no-deps | --only-deps] [--no-pin] [--copy] [--no-shortcuts] [--shortcuts-only SHORTCUTS_ONLY]
                     [-C] [-k] [--offline] [--json] [-v] [-q] [-d] [-y] [--download-only] [--show-channel-urls]
                     [--file FILE] [--solver {classic,libmamba}] [--force-reinstall]
                     [--freeze-installed | --update-deps | -S | --update-all | --update-specs] [-m] [--clobber] [--dev]
                     [package_spec ...]
...
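The same `--help` pattern works for other conda subcommands. As a small sketch, the loop below prints the first line of help for a few common ones. The choice of subcommands is only an example, and the script falls back to printing the command if conda is not on your path (for instance, before the module is loaded).

```shell
#!/usr/bin/env bash
# Print the first line of help text for a few common conda subcommands.
SUBCOMMANDS="create install list remove"
CHECKED=0
for sub in $SUBCOMMANDS; do
    if command -v conda >/dev/null 2>&1; then
        conda "$sub" --help | head -n 1
    else
        echo "conda not found; would run: conda $sub --help"
    fi
    CHECKED=$((CHECKED + 1))
done
```

Drop the `| head -n 1` to read the full help text for a subcommand.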