Using the Anaconda Package Manager

Anaconda, from Anaconda, Inc., is a free, enterprise-ready distribution for large-scale data processing, predictive analytics, and scientific computing. It includes over 195 of the most popular Python packages for science, math, engineering, and data analysis. Individual users can also create isolated custom environments that mix and match different versions of Python, R, and other packages. Anaconda includes the conda package and environment manager to make managing these environments straightforward.

Using Anaconda

While the standard methods of installing packages via pip and easy_install work with Anaconda, the preferred method is to use the conda command.

Full documentation on using Conda is available at http://conda.pydata.org/docs/

A cheatsheet is also provided.

A few examples of the basic commands are provided here.  For a full explanation of all of Anaconda/Conda’s capabilities, see the documentation linked above. 

Anaconda is provided through the anaconda module on HCC machines.  To begin using it, load the Anaconda module.

Load the Anaconda module to start using Conda
module load anaconda
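
In a script, it may be worth confirming that the module actually put conda on the PATH before going any further. The following is a minimal sketch; the helper name require_cmd is hypothetical and is not part of Conda or the module system.

```shell
# Hypothetical helper: fail with a message if a command is not on the PATH.
require_cmd() {
    if ! command -v "$1" >/dev/null 2>&1; then
        echo "error: '$1' not found; has the anaconda module been loaded?" >&2
        return 1
    fi
}
```

After module load anaconda, a line such as require_cmd conda || exit 1 would stop the script early instead of failing later at the first conda command.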

To display general information about Conda/Anaconda, use the info subcommand.

Display general information about Conda/Anaconda
conda info

Conda allows the easy creation of isolated, custom environments with packages and versions of your choosing.  To show all currently available environments, and which is active, use the info subcommand with the -e option.

List available environments
conda info -e

The active environment will be marked with an asterisk (*) character.

The list command will show all packages installed in the currently active environment.

List installed packages in current environment
conda list

To find the names of packages, use the search subcommand.

Search for packages
conda search numpy

If the package is available, this will also display available package versions and compatible Python versions the package may be installed under.

Creating a Custom Anaconda Environment

The create subcommand creates a new environment. It requires, at a minimum, a name for the environment and at least one package to install. For example, suppose we wish to create a new environment and need version 1.8 of NumPy.

Create a new environment by providing a name and package specification
conda create -n mynumpy numpy=1.8 

This will create a new environment called ‘mynumpy’ and install NumPy version 1.8, along with any required dependencies.
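
In a script that may run more than once, it can help to create the environment only if it does not already exist. The following is a minimal sketch, assuming the listing printed by conda info -e keeps the environment name in the first column; the helper name env_exists is hypothetical and is not part of Conda.

```shell
# Hypothetical helper (not part of Conda): succeed if an environment with the
# given name appears in the output of `conda info -e` read from standard input.
env_exists() {
    awk -v name="$1" '$1 == name { found = 1 } END { exit !found }'
}
```

It could be used as conda info -e | env_exists mynumpy || conda create -n mynumpy numpy=1.8, so that the create step runs only when ‘mynumpy’ is missing from the listing.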

To use the environment, we must first activate it.

Activate environment
source activate mynumpy

Our new environment is now active, and we can use it. The shell prompt will change to indicate this as well (this can be disabled if desired).

Creating a Custom GPU Anaconda Environment

We provide GPU versions of various frameworks, such as TensorFlow, Keras, and Theano, via modules. However, you may sometimes need additional libraries or packages that are not available as part of these modules. In this case, you will need to create your own GPU Anaconda environment.

To do this, you need to first clone one of our GPU modules to a new Anaconda environment, and then install the desired packages in this new environment.

The reason for this is that the GPU modules we support are built using the specific CUDA drivers our GPU nodes have. If you create a custom GPU environment without cloning the module, your code will not utilize the GPUs.

For example, if you want to use tensorflow with additional packages, first do:

Cloning GPU module to a new Anaconda environment
module load tensorflow-gpu/py36/1.12 anaconda
conda create -n tensorflow-gpu-1.12-custom --clone $CONDA_DEFAULT_ENV
module purge

This will create a new tensorflow-gpu-1.12-custom environment in your home directory that is a copy of the tensorflow-gpu module. Then, you can install the additional packages you need in this environment.

Install new packages in the currently active environment
module load anaconda
source activate tensorflow-gpu-1.12-custom
conda install <packages>
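
The clone step shown earlier relies on $CONDA_DEFAULT_ENV having been set when the GPU module was loaded. The following is a minimal sketch of a guard for that case; the wrapper name clone_active_env is hypothetical, and it only calls the same conda create command shown above.

```shell
# Hypothetical wrapper: clone the currently active environment under a new
# name, refusing to run if no environment is active.
clone_active_env() {
    if [ -z "${CONDA_DEFAULT_ENV:-}" ]; then
        echo "error: CONDA_DEFAULT_ENV is empty; load the GPU module first" >&2
        return 1
    fi
    conda create -n "$1" --clone "$CONDA_DEFAULT_ENV"
}
```

After module load tensorflow-gpu/py36/1.12 anaconda, calling clone_active_env tensorflow-gpu-1.12-custom would take the place of the conda create line.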

Next, whenever you want to use this custom GPU Anaconda environment, you need to add these two lines in your submit script:

module load anaconda
source activate tensorflow-gpu-1.12-custom

If you have a custom GPU Anaconda environment, use only the two lines above and DO NOT load the module you cloned earlier. Using module load tensorflow-gpu/py36/1.12 and source activate tensorflow-gpu-1.12-custom in the same script is wrong and may cause various errors and incorrect results.
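
The mistake described above is easy to check for before submitting a job. The following is a minimal sketch, assuming the submit script uses the module load and source activate forms shown in this document; the function name check_submit_script is hypothetical and is not an HCC tool.

```shell
# Hypothetical check (not an HCC tool): report a submit script that both loads
# a tensorflow-gpu module and activates a Conda environment.
check_submit_script() {
    if grep -Eq '^[[:space:]]*module[[:space:]]+load[[:space:]].*tensorflow-gpu' "$1" &&
       grep -Eq '^[[:space:]]*source[[:space:]]+activate[[:space:]]' "$1"; then
        echo "warning: $1 loads a tensorflow-gpu module and activates an environment" >&2
        return 1
    fi
}
```

The function returns a non-zero status only when both patterns are found, so a script containing just module load anaconda and source activate tensorflow-gpu-1.12-custom passes the check.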

Adding Packages to an Existing Environment

To install additional packages in an environment, use the install subcommand. Suppose we want to install IPython in our ‘mynumpy’ environment. While the environment is active, use install with only the package name.

Install a new package in the currently active environment
conda install ipython

If you aren’t currently in the environment you wish to install the package in, add the -n option to specify the name.

Install new packages in a specified environment
conda install -n mynumpy ipython

The remove subcommand, which uninstalls a package, works similarly.

Remove package from currently active environment
conda remove ipython

Remove package from environment specified by name
conda remove -n mynumpy ipython

To exit an environment, we deactivate it.

Exit current environment
source deactivate

Finally, to completely remove an environment, add the --all option to remove.

Completely remove an environment
conda remove -n mynumpy --all

Using an Anaconda Environment in a Jupyter Notebook on Crane

It is not difficult to make an Anaconda environment available to a Jupyter Notebook. To do so, follow the steps below, replacing myenv with the name of the Python or R environment you wish to use:

  1. Stop any running Jupyter Notebooks and ensure you are logged out of the JupyterHub instance at https://crane.unl.edu

    1. If you are not logged out, please click the Control Panel button located in the top right corner.
    2. Click the “Stop My Server” Button to terminate the Jupyter server.
    3. Click the logout button in the top right corner.

  2. Using the command-line environment, load the target conda environment:

    source activate myenv

  3. Install the Jupyter kernel and add the environment:

    1. For a Python conda environment, install the ipykernel package, and then the kernel specification:

              # Install ipykernel
              conda install ipykernel
      
              # Install the kernel specification
              python -m ipykernel install --user --name "$CONDA_DEFAULT_ENV" --display-name "Python ($CONDA_DEFAULT_ENV)"
              
    2. For an R conda environment, install the jupyter_client and IRkernel packages, and then the kernel specification:

              # Install PNG support for R, the R kernel for Jupyter, and the Jupyter client
              conda install r-png
              conda install r-irkernel jupyter_client
      
              # Install jupyter_client 5.2.3 from anaconda channel for bug workaround
              conda install -c anaconda jupyter_client
      
              # Install the kernel specification
              R -e "IRkernel::installspec(name = '$CONDA_DEFAULT_ENV', displayname = 'R ($CONDA_DEFAULT_ENV)', user = TRUE)"
              
  4. Once you have the environment set up, deactivate it:

    source deactivate

  5. To make your conda environments accessible from the worker nodes, enter the following commands:

        mkdir -p "$WORK/.jupyter"
        mv ~/.local/share/jupyter/kernels "$WORK/.jupyter"
        ln -s "$WORK/.jupyter/kernels" ~/.local/share/jupyter/kernels
        

    Note: Step 5 only needs to be done once. Any environments you create afterward will automatically be accessible from SLURM notebooks.

  6. Log in to JupyterHub at https://crane.unl.edu and create a new notebook using the environment by selecting the correct entry in the New dropdown menu in the top right corner.
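
Because step 5 is meant to be run only once, a version that is safe to repeat may be useful. The following is a minimal sketch that skips the move when the link is already in place; the function name relocate_kernels is hypothetical and is not an HCC tool.

```shell
# Hypothetical helper (not an HCC tool): move a kernels directory under a new
# parent directory and leave a symbolic link behind. Does nothing if the
# source is already a symbolic link.
#   $1: kernels directory, e.g. ~/.local/share/jupyter/kernels
#   $2: new parent directory, e.g. $WORK/.jupyter
relocate_kernels() {
    if [ -L "$1" ]; then
        echo "already linked: $1"
        return 0
    fi
    if [ ! -d "$1" ]; then
        echo "error: $1 is not a directory" >&2
        return 1
    fi
    if [ -e "$2/$(basename "$1")" ]; then
        echo "error: $2/$(basename "$1") already exists" >&2
        return 1
    fi
    mkdir -p "$2" &&
    mv "$1" "$2" &&
    ln -s "$2/$(basename "$1")" "$1"
}
```

With the paths from step 5, the call would be relocate_kernels ~/.local/share/jupyter/kernels "$WORK/.jupyter". A second call prints a message and leaves everything untouched.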