Anaconda, from Anaconda, Inc., is a free, enterprise-ready distribution for large-scale data processing, predictive analytics, and scientific computing. It includes over 195 of the most popular Python packages for science, math, engineering, and data analysis. It also makes it easy to create custom environments by mixing and matching different versions of Python and/or R and other packages into isolated environments that individual users are free to create. Anaconda includes the conda package and environment manager to make managing these environments straightforward.
While the standard methods of installing packages via pip and easy_install work with Anaconda, the preferred method is using the conda command.
Full documentation on using Conda is available at http://conda.pydata.org/docs/. A cheatsheet is also provided. A few examples of the basic commands are given here; for a full explanation of all of Anaconda/Conda's capabilities, see the documentation linked above.
Anaconda is provided through the anaconda module on HCC machines. To begin using it, load the anaconda module.
module load anaconda
To display general information about Conda/Anaconda, use the info subcommand.
conda info
Conda allows the easy creation of isolated, custom environments with packages and versions of your choosing. To show all currently available environments, and which is active, use the info subcommand with the -e option.
conda info -e
The active environment will be marked with an asterisk (*) character.
The list
command will show all packages installed
in the currently active environment.
conda list
To find packages, use the search
subcommand.
conda search numpy
If the package is available, this will also display the available package versions and the compatible Python versions under which the package may be installed.
The create subcommand is used to create a new environment. It requires at minimum a name for the environment and at least one package to install. For example, suppose we wish to create a new environment that requires version 1.17 of NumPy.
conda create -n mynumpy numpy=1.17
This will create a new environment called 'mynumpy' and install NumPy version 1.17, along with any required dependencies.
To use the environment, we must first activate it.
conda activate mynumpy
Our new environment is now active, and we can use it. The shell prompt will change to indicate this as well.
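As a quick sanity check, you can confirm from Python which NumPy version the active environment provides. This is a minimal sketch, not tied to any particular cluster; with the environment created above, a 1.17.x release would be expected.

```python
import numpy as np

# Print the NumPy version provided by the currently active environment
print(np.__version__)
```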
By default, conda environments are installed in the user's home directory at ~/.conda/envs. This is fine for smaller environments, but larger environments (especially ML/AI-based ones) can quickly exhaust the space in the home directory.
For larger environments, we recommend using the $COMMON folder instead. To do so, use the -p option instead of -n for conda create. For example, to create the same environment as above but place it in the folder $COMMON/mynumpy instead:
conda create -p $COMMON/mynumpy numpy=1.17
To activate the environment, you must use the full path.
conda activate $COMMON/mynumpy
Please note that you'll need to add the #SBATCH --licenses=common directive to your submit scripts as described here in order to use environments in $COMMON.
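To illustrate, a minimal submit script using an environment stored in $COMMON might look like the following sketch; the job name, resource values, and script name are placeholders, not HCC-specific recommendations.

```bash
#!/bin/bash
#SBATCH --job-name=mynumpy-job    # placeholder job name
#SBATCH --time=01:00:00           # placeholder runtime
#SBATCH --mem=4gb                 # placeholder memory request
#SBATCH --licenses=common         # required to use environments in $COMMON

module load anaconda
conda activate $COMMON/mynumpy
python my_script.py               # my_script.py is a hypothetical script
```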
To install additional packages in an environment, use the install subcommand. Suppose we want to install iPython in our 'mynumpy' environment. While the environment is active, use install with no additional arguments.
conda install ipython
If you aren't currently in the environment you wish to install the package in, add the -n option to specify the name.
conda install -n mynumpy ipython
The remove subcommand, which uninstalls a package, functions similarly.
conda remove ipython
conda remove -n mynumpy ipython
To exit an environment, we deactivate it.
conda deactivate
Finally, to completely remove an environment, add the --all option to remove.
conda remove -n mynumpy --all
Sometimes conda environments need to be moved (e.g., from $HOME to $COMMON in order to reduce the space used in $HOME) or recreated (e.g., when shared with someone). This is done using an environment.yml file as shown below. First, activate the environment to be exported.
conda activate mynumpy
Then export the active conda environment to the file environment.yml.
conda env export > environment.yml
Next, deactivate the conda environment.
conda deactivate
The file environment.yml contains both the pip and conda packages installed in the activated environment. This file can now be shared or used to recreate the conda environment elsewhere.
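For reference, an exported environment.yml is a short YAML file along these lines; the exact contents depend on the environment, and the versions and pip package below are purely illustrative:

```yaml
name: mynumpy
channels:
  - defaults
dependencies:
  - python=3.7           # illustrative Python version
  - numpy=1.17
  - pip
  - pip:
      - example-package  # hypothetical pip-installed package
```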
The exported environment can be recreated in $COMMON with:
conda env create -p $COMMON/mynumpy -f environment.yml
After the conda environment has been exported or recreated, if needed, the original conda environment can be removed.
conda remove -n mynumpy --all
The migrated environment can then be activated with:
conda activate $COMMON/mynumpy
Please note that you'll need to add the #SBATCH --licenses=common directive to your submit scripts as described here in order to use environments in $COMMON.
By default, conda environments are installed in the user's home directory at ~/.conda/envs. Conda caches and package tarballs are stored in ~/.conda/ as well. For larger or many conda environments, the size of the directory ~/.conda/ can easily reach the $HOME space quota limit of 20 GiB per user.
In addition to moving and recreating existing environments as described above, one can remove unused conda packages and caches.
conda clean --all
Please note that this command only removes the index cache and unused packages and tarballs, and will not affect or break your current conda environments.
We provide GPU versions of various frameworks, such as tensorflow, keras, and theano, via modules. However, sometimes you may need additional libraries or packages that are not available as part of these modules. In this case, you will need to create your own GPU Anaconda environment. To do this, first clone one of our GPU modules to a new Anaconda environment, and then install the desired packages in this new environment. The reason for this is that the GPU modules we support are built using the specific CUDA drivers our GPU nodes have. If you just create a custom GPU environment without cloning the module, your code will not utilize the GPUs correctly.
For example, if you want to use tensorflow with additional packages, first do:
module load tensorflow-gpu/py311/2.15
module load anaconda
conda create -n tensorflow-gpu-2.15-custom --clone $CONDA_DEFAULT_ENV
module purge
While tensorflow-gpu/py311/2.15 is used here as an example module and version, please make sure you use the newest available version of the module you want to clone, or the version that is needed for your particular research needs. This will create a new tensorflow-gpu-2.15-custom environment in your home directory that is a copy of the tensorflow-gpu module.
Then, you can install the additional packages you need in this environment.
module load anaconda
conda activate tensorflow-gpu-2.15-custom
conda install --no-update-deps <packages>
When installing packages in an existing or cloned environment, please use --no-update-deps. This ensures that already installed dependencies are not updated or changed.
Next, whenever you want to use this custom GPU Anaconda environment, you need to add these two lines in your submit script:
module load anaconda
conda activate tensorflow-gpu-2.15-custom
If you have a custom GPU Anaconda environment, please use only the two lines above and DO NOT load the module you cloned earlier. Using module load tensorflow-gpu/py311/2.15 and conda activate tensorflow-gpu-2.15-custom in the same script is wrong and may give you various errors and incorrect results.
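Put together, a submit script for a GPU job using the cloned environment could be sketched as below; the partition name, GPU request, resource values, and script name are placeholders that depend on your cluster and job:

```bash
#!/bin/bash
#SBATCH --job-name=tf-custom   # placeholder job name
#SBATCH --partition=gpu        # placeholder GPU partition
#SBATCH --gres=gpu:1           # request one GPU
#SBATCH --time=02:00:00        # placeholder runtime
#SBATCH --mem=16gb             # placeholder memory request

module load anaconda
conda activate tensorflow-gpu-2.15-custom
python train.py                # train.py is a hypothetical script
```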
Some conda packages available on conda-forge and bioconda support MPI (via openmpi or mpich). However, just using the openmpi and mpich packages from conda-forge often does not work on HPC systems. More information about this can be found here.
In order to be able to correctly use these MPI packages with the MPI libraries installed on our clusters, two steps need to be performed.
First, at install time, besides the package itself, the "dummy" package openmpi=4.1.*=external_* or mpich=4.0.*=external_* needs to be installed for openmpi or mpich respectively. These "dummy" packages are empty, but they allow the solver to create correct environments and use the system-wide modules when the environment is activated.
Secondly, when activating the conda environment and using the package, the system-wide openmpi/4.1 or mpich/4.0 module needs to be loaded, depending on the MPI library used. Currently, only packages that were built using openmpi 4.1 and mpich 4.0 are supported on HCC clusters.
For example, the steps for creating a conda environment with mpi4py that supports openmpi are:
module purge
module load anaconda
conda create -n mpi4py-openmpi mpi4py openmpi=4.1.*=external_*
module purge
module load compiler/gcc/10 openmpi/4.1 anaconda
conda activate mpi4py-openmpi
The steps for creating a conda environment with mpi4py that supports mpich are:
module purge
module load anaconda
conda create -n mpi4py-mpich mpi4py mpich=4.0.*=external_*
module purge
module load compiler/gcc/10 mpich/4.0 anaconda
conda activate mpi4py-mpich
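For completeness, a submit script that runs an mpi4py program from one of these environments might be sketched as follows; the task count, runtime, and hello_mpi.py are placeholders, and the mpich variant would load mpich/4.0 and activate mpi4py-mpich instead:

```bash
#!/bin/bash
#SBATCH --job-name=mpi4py-test   # placeholder job name
#SBATCH --ntasks=4               # placeholder number of MPI tasks
#SBATCH --time=00:30:00          # placeholder runtime

module purge
module load compiler/gcc/10 openmpi/4.1 anaconda
conda activate mpi4py-openmpi
srun python hello_mpi.py         # hello_mpi.py is a hypothetical MPI script
```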
It is not difficult to make an Anaconda environment available to a Jupyter Notebook. To do so, follow the steps below, replacing myenv with the name of the Python or R environment you wish to use:
Stop any running Jupyter Notebooks and ensure you are logged out of the JupyterHub instance on the cluster you are using.
Using the command-line environment of the login node, load the target conda environment:
conda activate myenv
Install the Jupyter kernel and add the environment:
For a Python conda environment, install the IPykernel package, and then the kernel specification:
# Install ipykernel
conda install ipykernel
# Install the kernel specification
python -m ipykernel install --user --name "$CONDA_DEFAULT_ENV" --display-name "Python ($CONDA_DEFAULT_ENV)" --env PATH $PATH
If the conda environment is located in $COMMON (e.g., $COMMON/conda_env), please use the name of the environment instead of $CONDA_DEFAULT_ENV, e.g.:
python -m ipykernel install --user --name conda_env --display-name "Python (conda_env)" --env PATH $PATH
where conda_env is replaced with the name of your conda environment.
If needed, other variables can be set via additional --env arguments, e.g.:
python -m ipykernel install --user --name "$CONDA_DEFAULT_ENV" --display-name "Python ($CONDA_DEFAULT_ENV)" --env PATH $PATH --env VAR value
where VAR and value are the name and the value of the variable, respectively.
For an R conda environment, install the jupyter_client and IRkernel packages, and then the kernel specification:
# Install PNG support for R, the R kernel for Jupyter, and the Jupyter client
conda install r-png
conda install r-irkernel jupyter_client
# Install jupyter_client 5.2.3 from anaconda channel for bug workaround
conda install -c anaconda jupyter_client
# Install the kernel specification
R -e "IRkernel::installspec(name = '$CONDA_DEFAULT_ENV', displayname = 'R ($CONDA_DEFAULT_ENV)', user = TRUE)"
Once you have the environment set up, deactivate it:
conda deactivate
Log in to JupyterHub and create a new notebook using the environment by selecting the correct entry in the New dropdown menu in the top right corner.
Mamba is an alternative to Conda that is generally faster and performs better at resolving dependencies in conda environments. Mamba is available as part of the anaconda modules on Swan. Mamba can be used by simply replacing conda with mamba in all conda commands shown here.
module load anaconda
To create a new environment called ‘mynumpy’ and install NumPy version 1.17, along with any required dependencies, the command is:
mamba create -n mynumpy numpy=1.17