What is Conda? Using Miniconda3 on the Cluster

Goal: Introduce conda, define some terminology, and show how to use it on the cluster and where to find help.


What is Conda?

  • Getting Started with Conda: A powerful command line tool for package and environment management that runs on Windows, macOS, and Linux.

  • Conda Documentation: Provides package, dependency, and environment management for any language.


Getting Conda: Miniconda vs Anaconda vs Miniforge

Miniconda: (<0.5G) A free minimal installer for conda. It is a small bootstrap version of Anaconda that includes only conda, Python, the packages (<70) they both depend on, and a small number of other useful packages (like pip, zlib, and a few others).

  • This is what ARCC provides.

Anaconda: (4.4G) (The Company) Anaconda® Distribution is a free Python/R data science distribution that contains:

  • Conda and Anaconda Navigator (a desktop GUI application built on conda, with options to launch other development applications from your managed environments).

  • 250 automatically-installed packages and access to the Anaconda Public Repository (>8K open-source data science and ML packages).

Miniforge: This community-driven repository holds the minimal installers for Conda and Mamba (a reimplementation of the conda package manager in C++) specific to conda-forge (community-led recipes, infrastructure and distributions for conda), which is set as the default and only channel.

Should I use Anaconda Distribution or Miniconda?


Terminology

Glossary:

package manager: A collection of software tools that automates the process of installing, updating, configuring, and removing computer programs for a computer's operating system.

conda: Conda is a package manager. The package and environment manager program … that installs and updates conda packages and their dependencies.

conda package: A compressed file that contains everything that a software program needs in order to be installed and run, so that you do not have to manually find and install each dependency separately.

conda environment: A folder or directory that contains a specific collection of conda packages and their dependencies, so they can be maintained and run separately without interference from each other.

conda repository: A cloud-based repository that contains packages that are easily installed.

channels: The locations of the repositories where conda looks for packages.
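
The glossary terms map onto a handful of conda subcommands. The following is a minimal sketch, assuming conda is already available in your shell (for example through the miniconda3 module); the helper name show_conda_setup is hypothetical and not part of conda.

```shell
# Minimal sketch: inspect channels, environments, and installed packages.
# Assumes `conda` is already on PATH. show_conda_setup is a hypothetical
# helper name used only for illustration.
show_conda_setup() {
    env_name="${1:-}"

    # channels: the locations conda searches for packages
    conda config --show channels

    # conda environments: the directories conda currently knows about
    conda env list

    # conda packages installed in one environment (skipped if no name is given)
    if [ -n "$env_name" ]; then
        conda list --name "$env_name"
    fi
}
```

Calling show_conda_setup with no argument prints only the configured channels and the known environments; passing an environment name also lists the packages installed in that environment.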


Dependency Hell

The concept of dependency hell was introduced in the Module System workshop and rears its ugly head when trying to set up Python and/or R environments with a lot of packages.

Conda environments provide a way to create self-contained, independent environments, each focused on a specific analysis. This avoids the dependency clashes that arise when everything is installed into a single behemoth environment.
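
As an illustration of keeping one small environment per analysis, here is a minimal sketch. It assumes conda is already available in your shell; the helper name create_analysis_env and the environment and package names in the usage note are placeholders, not recommendations.

```shell
# Minimal sketch: one small environment per analysis instead of one large one.
# create_analysis_env is a hypothetical helper name; it only wraps
# `conda create` with a name and a list of package specifications.
create_analysis_env() {
    if [ "$#" -lt 2 ]; then
        echo "usage: create_analysis_env ENV_NAME PACKAGE [PACKAGE ...]" >&2
        return 2
    fi
    env_name="$1"
    shift
    # --yes answers the confirmation prompt, so this also works non-interactively.
    conda create --name "$env_name" --yes "$@"
}
```

For example, create_analysis_env py_stats python=3.11 scipy and create_analysis_env r_plots r-base r-ggplot2 would give a Python analysis and an R analysis their own directories and their own dependency sets, so an update in one cannot break the other.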


Using Miniconda3/Conda on the Cluster

You do NOT need to install miniconda3 yourself. It is provided as a pre-installed module.

[]$ module spider miniconda3
----------------------------------------------------------------------------
  miniconda3: miniconda3/24.3.0
----------------------------------------------------------------------------

    You will need to load all module(s) on any one of the lines below before the "miniconda3/24.3.0" module is available to load.

      arcc/1.0

    Help:
      The minimalist bootstrap toolset for conda and Python3.

We update miniconda3 on a semi-frequent basis.
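
Because the module version changes over time, a job script may want to load an explicit version rather than whatever is current. The following is a minimal sketch, assuming the module command is available in your shell and that miniconda3/24.3.0 (the version shown by module spider) is still installed; the helper name load_conda is hypothetical.

```shell
# Minimal sketch: load a pinned miniconda3 module and report the conda version.
# The version string is the one reported by `module spider miniconda3`;
# load_conda is a hypothetical helper name.
CONDA_MODULE="miniconda3/24.3.0"

load_conda() {
    # Start from a clean module environment, then load the pinned version.
    module purge
    module load "$CONDA_MODULE" || {
        echo "could not load $CONDA_MODULE; run 'module spider miniconda3'" >&2
        return 1
    }
    # Confirm which conda is now active.
    conda --version
}
```

Calling load_conda at the top of a script makes the conda version explicit, and the error message points back to module spider if the pinned version has been retired.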

You can install Miniconda or Anaconda yourself, but doing so will modify your cluster environment.

Make sure you understand what you’re doing.
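
One common way a personal install changes your environment is through conda init, which writes a block starting with "# >>> conda initialize >>>" into a shell startup file such as ~/.bashrc. The following minimal sketch checks a file for that marker; the helper name has_conda_init is hypothetical, and other kinds of changes (for example manual PATH edits) are not detected.

```shell
# Minimal sketch: detect the block that `conda init` adds to a startup file.
# has_conda_init is a hypothetical helper name. It only looks for the
# marker line and does not inspect PATH or other settings.
has_conda_init() {
    rc_file="${1:-$HOME/.bashrc}"
    # A missing file cannot contain the block.
    [ -f "$rc_file" ] || return 1
    grep -q '>>> conda initialize >>>' "$rc_file"
}

if has_conda_init; then
    echo "conda init block found in $HOME/.bashrc"
else
    echo "no conda init block in $HOME/.bashrc"
fi
```

If the block is present, a personally installed conda may take precedence over the miniconda3 module every time you log in.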


Conda Version and Help

[]$ module purge
[]$ module load miniconda3/24.3.0
[]$ conda --version
conda 24.3.0

[]$ conda --help
usage: conda [-h] [-v] [--no-plugins] [-V] COMMAND ...

conda is a tool for managing and deploying applications, environments and packages.
...

[]$ conda install --help
usage: conda install [-h] [--revision REVISION] [-n ENVIRONMENT | -p PATH] [-c CHANNEL] [--use-local]
                     [--override-channels] [--repodata-fn REPODATA_FNS] [--experimental {jlap,lock}]
                     [--no-lock] [--repodata-use-zst | --no-repodata-use-zst] [--strict-channel-priority]
                     [--no-channel-priority] [--no-deps | --only-deps] [--no-pin] [--copy] [--no-shortcuts]
                     [--shortcuts-only SHORTCUTS_ONLY] [-C] [-k] [--offline] [--json] [-v] [-q] [-d] [-y]
                     [--download-only] [--show-channel-urls] [--file FILE] [--solver {classic,libmamba}]
                     [--force-reinstall] [--freeze-installed | --update-deps | -S | --update-all | --update-specs]
                     [-m] [--clobber] [--dev]
                     [package_spec ...]
...

Full output of conda --help:

[]$ conda --help
usage: conda [-h] [-v] [--no-plugins] [-V] COMMAND ...

conda is a tool for managing and deploying applications, environments and packages.

options:
  -h, --help          Show this help message and exit.
  -v, --verbose       Can be used multiple times. Once for detailed output, twice for INFO logging, thrice for DEBUG logging, four times for TRACE logging.
  --no-plugins        Disable all plugins that are not built into conda.
  -V, --version       Show the conda version number and exit.

commands:
  The following built-in and plugins subcommands are available.

  COMMAND
    activate          Activate a conda environment.
    clean             Remove unused packages and caches.
    compare           Compare packages between conda environments.
    config            Modify configuration values in .condarc.
    content-trust     Signing and verification tools for Conda
    create            Create a new conda environment from a list of specified packages.
    deactivate        Deactivate the current active conda environment.
    doctor            Display a health report for your environment.
    export            Export a given environment
    info              Display information about current conda install.
    init              Initialize conda for shell interaction.
    install           Install a list of packages into a specified conda environment.
    list              List installed packages in a conda environment.
    notices           Retrieve latest channel notifications.
    package           Create low-level conda packages. (EXPERIMENTAL)
    remove (uninstall)
                      Remove a list of packages from a specified conda environment.
    rename            Rename an existing environment.
    repoquery         Advanced search for repodata.
    run               Run an executable in a conda environment.
    search            Search for packages and display associated information using the MatchSpec format.
    update (upgrade)  Update conda packages to the latest compatible version.

Full output of conda install --help:

[]$ conda install --help
usage: conda install [-h] [--revision REVISION] [-n ENVIRONMENT | -p PATH] [-c CHANNEL] [--use-local]
                     [--override-channels] [--repodata-fn REPODATA_FNS] [--experimental {jlap,lock}]
                     [--no-lock] [--repodata-use-zst | --no-repodata-use-zst] [--strict-channel-priority]
                     [--no-channel-priority] [--no-deps | --only-deps] [--no-pin] [--copy] [--no-shortcuts]
                     [--shortcuts-only SHORTCUTS_ONLY] [-C] [-k] [--offline] [--json] [-v] [-q] [-d] [-y]
                     [--download-only] [--show-channel-urls] [--file FILE] [--solver {classic,libmamba}]
                     [--force-reinstall] [--freeze-installed | --update-deps | -S | --update-all | --update-specs]
                     [-m] [--clobber] [--dev]
                     [package_spec ...]

Install a list of packages into a specified conda environment.

This command accepts a list of package specifications (e.g, bitarray=0.8) and installs a set of packages consistent with those specifications and compatible with the underlying environment. If full compatibility cannot be assured, an error is reported and the environment is not changed.

Conda attempts to install the newest versions of the requested packages. To accomplish this, it may update some packages that are already installed, or install additional packages. To prevent existing packages from updating, use the --freeze-installed option. This may force conda to install older versions of the requested packages, and it does not prevent additional dependency packages from being installed.

If you wish to skip dependency checking altogether, use the '--no-deps' option. This may result in an environment with incompatible packages, so this option must be used with great caution.

conda can also be called with a list of explicit conda package filenames (e.g. ./lxml-3.2.0-py27_0.tar.bz2). Using conda in this mode implies the --no-deps option, and should likewise be used with great caution. Explicit filenames and package specifications cannot be mixed in a single command.

positional arguments:
  package_spec
      List of packages to install or update in the conda environment.

options:
  -h, --help
      Show this help message and exit.
  --revision REVISION
      Revert to the specified REVISION.
  --file FILE
      Read package versions from the given file. Repeated file specifications can be passed (e.g. --file=file1 --file=file2).
  --dev
      Use `sys.executable -m conda` in wrapper scripts instead of CONDA_EXE. This is mainly for use during tests where we test new conda sources against old Python versions.

Target Environment Specification:
  -n ENVIRONMENT, --name ENVIRONMENT
      Name of environment.
  -p PATH, --prefix PATH
      Full path to environment location (i.e. prefix).

Channel Customization:
  -c CHANNEL, --channel CHANNEL
      Additional channel to search for packages. These are URLs searched in the order they are given (including local directories using the 'file://' syntax or simply a path like '/home/conda/mychan' or '../mychan'). Then, the defaults or channels from .condarc are searched (unless --override-channels is given). You can use 'defaults' to get the default packages for conda. You can also use any name and the .condarc channel_alias value will be prepended. The default channel_alias is https://conda.anaconda.org/.
  --use-local
      Use locally built packages. Identical to '-c local'.
  --override-channels
      Do not search default or .condarc channels. Requires --channel.
  --repodata-fn REPODATA_FNS
      Specify file name of repodata on the remote server where your channels are configured or within local backups. Conda will try whatever you specify, but will ultimately fall back to repodata.json if your specs are not satisfiable with what you specify here. This is used to employ repodata that is smaller and reduced in time scope. You may pass this flag more than once. Leftmost entries are tried first, and the fallback to repodata.json is added for you automatically. For more information, see conda config --describe repodata_fns.
  --experimental {jlap,lock}
      jlap: Download incremental package index data from repodata.jlap; implies 'lock'. lock: use locking when reading, updating index (repodata.json) cache. Now enabled.
  --no-lock
      Disable locking when reading, updating index (repodata.json) cache.
  --repodata-use-zst, --no-repodata-use-zst
      Check for/do not check for repodata.json.zst. Enabled by default.

Solver Mode Modifiers:
  --strict-channel-priority
      Packages in lower priority channels are not considered if a package with the same name appears in a higher priority channel.
  --no-channel-priority
      Package version takes precedence over channel priority. Overrides the value given by `conda config --show channel_priority`.
  --no-deps
      Do not install, update, remove, or change dependencies. This WILL lead to broken environments and inconsistent behavior. Use at your own risk.
  --only-deps
      Only install dependencies.
  --no-pin
      Ignore pinned file.
  --solver {classic,libmamba}
      Choose which solver backend to use.
  --force-reinstall
      Ensure that any user-requested package for the current operation is uninstalled and reinstalled, even if that package already exists in the environment.
  --freeze-installed, --no-update-deps
      Do not update or change already-installed dependencies.
  --update-deps
      Update dependencies that have available updates.
  -S, --satisfied-skip-solve
      Exit early and do not run the solver if the requested specs are satisfied. Also skips aggressive updates as configured by the 'aggressive_update_packages' config setting. Use 'conda info --describe aggressive_update_packages' to view your setting. --satisfied-skip-solve is similar to the default behavior of 'pip install'.
  --update-all, --all
      Update all installed packages in the environment.
  --update-specs
      Update based on provided specifications.

Package Linking and Install-time Options:
  --copy
      Install all packages using copies instead of hard- or soft-linking.
  --no-shortcuts
      Don't install start menu shortcuts
  --shortcuts-only SHORTCUTS_ONLY
      Install shortcuts only for this package name. Can be used several times.
  -m, --mkdir
      Create the environment directory, if necessary.
  --clobber
      Allow clobbering (i.e. overwriting) of overlapping file paths within packages and suppress related warnings.

Networking Options:
  -C, --use-index-cache
      Use cache of channel index files, even if it has expired. This is useful if you don't want conda to check whether a new version of the repodata file exists, which will save bandwidth.
  -k, --insecure
      Allow conda to perform "insecure" SSL connections and transfers. Equivalent to setting 'ssl_verify' to 'false'.
  --offline
      Offline mode. Don't connect to the Internet.

Output, Prompt, and Flow Control Options:
  --json
      Report all output as json. Suitable for using conda programmatically.
  -v, --verbose
      Can be used multiple times. Once for detailed output, twice for INFO logging, thrice for DEBUG logging, four times for TRACE logging.
  -q, --quiet
      Do not display progress bar.
  -d, --dry-run
      Only display what would have been done.
  -y, --yes
      Sets any confirmation values to 'yes' automatically. Users will not be asked to confirm any adding, deleting, backups, etc.
  --download-only
      Solve an environment and ensure package caches are populated, but exit prior to unlinking and linking packages into the prefix.
  --show-channel-urls
      Show channel urls. Overrides the value given by `conda config --show show_channel_urls`.

Examples:

Install the package 'scipy' into the currently-active environment::

    conda install scipy

Install a list of packages into an environment, myenv::

    conda install -n myenv scipy curl wheel

Install a specific version of 'python' into an environment, myenv::

    conda install -p path/to/myenv python=3.11
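
The help text lists -d/--dry-run ("Only display what would have been done"), which can be combined with -n/--name to preview an install before an environment is changed. The following is a minimal sketch, assuming conda is already available in your shell; the helper name preview_install is hypothetical.

```shell
# Minimal sketch: preview an install without changing the environment.
# preview_install is a hypothetical helper name; --name and --dry-run are
# options listed in the `conda install --help` output.
preview_install() {
    if [ "$#" -lt 2 ]; then
        echo "usage: preview_install ENV_NAME PACKAGE_SPEC [PACKAGE_SPEC ...]" >&2
        return 2
    fi
    env_name="$1"
    shift
    # --dry-run only displays what would have been done.
    conda install --name "$env_name" --dry-run "$@"
}
```

For example, preview_install myenv scipy curl wheel mirrors the second example in the help text but stops after showing the planned changes.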

  
