Notes on Anaconda
Update: I’ve learned more since writing these notes. I can no longer recommend that scientists use conda, at least to manage Python, R, and R package installations. The reason is quite simple: channels create a dependency nightmare, while the whole appeal of conda is to avoid such nightmares. I ran into this issue trying to install pylibseq from bioconda alongside Python 3.6, R 3.4.1, and r-dbplyr from wherever. Some packages may be installed from the defaults channel while others may be installed from bioconda. But here’s the problem: nothing guarantees that these packages are compatible. You can end up with loads of dynamic linking errors, and conda’s only recommendation is to be careful with how you order channels.
Furthermore, the whole reason to use conda, in my opinion, is the environments. In theory, the best part for scientists is the environment.yml file, which allows you to recreate a specific project environment from a file that can live in your git repository. However, if you need package X from conda-forge and package Y from bioconda for things to work properly, you’re in trouble. As far as I can tell, there’s no way to specify which channel a package should be installed from in the YAML file (pip is an exception here). This is horrendous, as channels can disagree on what they include in their packages, even when the versions match. For example, one channel’s bzip2-1.0.6 is packaged with the corresponding .so, while the defaults channel’s version of bzip2-1.0.6 is packaged without the .so (source)! This is preposterous in terms of usability and reproducibility, especially given you can’t specify the exact channel to use in the YAML file.
Overall, I can’t recommend folks use conda, as I think a few key design choices render it unusable for managing scientific project environments (at least if you’re using Python and R from conda). I’ll keep my notes intact (below), but be warned of these serious limitations.
I’ve been wary of Anaconda, but recently I’ve employed it to manage an environment shared between my OS X development machine and bonjovi, our Coop Lab server. Overall, I think it’s good technology, as it greatly reduces the time spent installing software, which gets annoying as a scientist. Below are some of my notes on how to build a project’s environment up from scratch.
Here are some general things to keep in mind – these are things I learned the hard way.
Don’t install Anaconda on OS X for all users – permissions will be wacky and cause issues.
Don’t use conda env export -n your-env to create environment.yml files for recreating a project’s environment across different operating systems. Conda may install OS-specific dependencies, which hinders portability. Instead, handcraft a minimal environment YAML file with only top-level project dependencies (more on this below). Then, only use conda env export -n your-env > project_depends.yml to save the versions of all dependencies for reproducibility, not to reconstruct your environment on a different machine (thanks to Joshua Shapiro and Jaime Ashander for this advice).
Channels are like Homebrew’s taps. You can add new channels with:
$ conda config --add channels r
$ conda config --add channels bioconda
$ conda config --add channels conda-forge
These are the essential channels as far as I know.
Building an environment interactively
Each project needs its own environment (or a general environment, e.g. one for all projects requiring scipy, IPython, numpy, and R). Below I quickly cover how I build up an environment.
Create a new environment
First, let’s create an example new environment, with only Python 3 and R:
$ conda create -n rpy-base python=3.6.2 r=3.4.1
Now, we see this new environment:
$ conda env list
# conda environments:
#
default-fwdpy11          /Users/vinceb/anaconda/envs/default-fwdpy11
rpy-base                 /Users/vinceb/anaconda/envs/rpy-base
root                  *  /Users/vinceb/anaconda
The asterisk indicates we’re currently using the root environment. This means all programs executed will be found through your default $PATH, which looks in the usual places.
Now, we switch to our new environment:
$ source activate rpy-base
(rpy-base)
$ echo $PATH
/Users/vinceb/anaconda/envs/rpy-base/bin:/usr/local/bin [...]
(rpy-base)
Our Anaconda environment is now first in our search path.
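To make the effect of that ordering concrete, here is a small sketch that does not need conda at all. It builds two throwaway directories, each containing a program called whichenv (a made-up name, as are the directory names), and shows that whichever directory comes first in $PATH wins. This only mimics what activation does to $PATH; it is not how conda itself is implemented.

```shell
# Build two throwaway bin directories, each with its own "whichenv" program.
tmp=$(mktemp -d)
mkdir -p "$tmp/env/bin" "$tmp/system/bin"
printf '#!/bin/sh\necho "from env"\n' > "$tmp/env/bin/whichenv"
printf '#!/bin/sh\necho "from system"\n' > "$tmp/system/bin/whichenv"
chmod +x "$tmp/env/bin/whichenv" "$tmp/system/bin/whichenv"

# Mimic activation inside a subshell: the environment's bin directory goes first.
activated=$(PATH="$tmp/env/bin:$tmp/system/bin:$PATH"; export PATH; whichenv)

# Without the environment directory, the "system" copy is found instead.
deactivated=$(PATH="$tmp/system/bin:$PATH"; export PATH; whichenv)

echo "activated:   $activated"
echo "deactivated: $deactivated"
rm -rf "$tmp"
```

The same logic explains why python and R resolve to the environment's copies after source activate: they are simply the first matches on the search path.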
To add new packages to a specific environment (i.e. not the root environment), we use:
$ conda install -n rpy-base r-tidyverse
This will ask you to proceed. After installation is complete, we see that this R package (and its dependencies) is now in the environment:
$ conda list
# packages in environment at /Users/vinceb/anaconda/envs/rpy-base:
#
[...]
r-tidyverse               1.1.1                  r3.4.1_0    r
Saving the Conda environment
Now, we can export the environment to a YAML file which can be used to mirror the environment elsewhere. However, this is not recommended for maintaining project environments. The reason is that conda env export returns all installed packages and their dependencies, some of which may be OS X-specific and not portable to Linux servers. A better approach is to maintain a minimal list of packages used by your project, and let Conda find the appropriate dependencies on whatever machine it’s being run on. Then, the following can be used to make a manifest of the versions per system (for reproducibility, not for mirroring an environment):
$ conda env export -n rpy-base > project_depends.yml
Ideally, then hand-edit project_depends.yml so that it includes only the minimal dependencies. We’ll call the result rpy-base.yml. Here is a minimal example:
name: rpy-base
channels:
  - conda-forge
  - bioconda
  - r
  - defaults
dependencies:
  - python=3.6.2=0
  - r=3.4.1=r3.4.1_0
  - r-tidyverse=1.1.1=r3.4.1_0
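The pins in this file still carry build strings (the third field, e.g. r3.4.1_0), and build strings can differ between platforms. One possible further step, which is my assumption and not something the notes above do, is to drop the build field and keep only name=version. The sketch below does this with sed on a scratch copy of the example file; the temporary directory is just for illustration. Conda also has a --no-builds flag for conda env export that omits build strings at export time.

```shell
tmp=$(mktemp -d)

# The pinned example file from above, written to a scratch directory.
cat > "$tmp/project_depends.yml" <<'EOF'
name: rpy-base
channels:
  - conda-forge
  - bioconda
  - r
  - defaults
dependencies:
  - python=3.6.2=0
  - r=3.4.1=r3.4.1_0
  - r-tidyverse=1.1.1=r3.4.1_0
EOF

# Turn "- name=version=build" into "- name=version"; other lines pass through.
sed -E 's/^([[:space:]]*-[[:space:]]*[^=[:space:]]+=[^=[:space:]]+)=[^=[:space:]]+$/\1/' \
    "$tmp/project_depends.yml" > "$tmp/rpy-base.yml"

cat "$tmp/rpy-base.yml"
```

Only lines with two = signs are rewritten, so the name and channels entries are left untouched. Whether the remaining version pins resolve on another operating system still depends on what each channel provides there.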
Loading a Conda environment
This creates a new environment, a clone of rpy-base, from the YAML file:
$ conda env create -n rpy-base2 -f rpy-base.yml
which we see with:
$ conda env list
# conda environments:
#
default-fwdpy11          /Users/vinceb/anaconda/envs/default-fwdpy11
rpy-base              *  /Users/vinceb/anaconda/envs/rpy-base
rpy-base2                /Users/vinceb/anaconda/envs/rpy-base2
root                     /Users/vinceb/anaconda
which we can now activate with source activate rpy-base2.
Removing an environment
Here’s how to delete an environment, such as the cloned environment rpy-base2 we just created:
$ conda env remove -n rpy-base2
and now it’s gone:
$ conda env list
# conda environments:
#
default-fwdpy11          /Users/vinceb/anaconda/envs/default-fwdpy11
rpy-base              *  /Users/vinceb/anaconda/envs/rpy-base
root                     /Users/vinceb/anaconda
Python notebook hacks
Here’s a simple example script I use to start a Jupyter notebook kernel on a server that can be accessed locally through SSH forwarding.
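#!/bin/bash
# Launch a Jupyter notebook locally or on a server, or set up SSH forwarding to it.
SERVER=bonjovi  # yes, bonjovi is the name of my server

# Client mode: forward local port 8888 to port 8890 on the server.
if [[ "$1" == "client" ]]; then
    echo "setting up ssh forwarding..."
    ssh -N -f -L localhost:8888:localhost:8890 "$SERVER" || { echo "error: ssh forwarding failed." >&2; exit 1; }
    exit
fi

# A notebook file is required; the "server" argument is optional.
if [[ $# -lt 1 ]]; then
    echo "usage: bash launch_notebook.sh notebook.ipynb [server]"
    exit 1
fi

source activate default-fwdpy11

if [[ "$2" == "server" ]]; then
    # On the server: no browser, and the port the client forwards to.
    jupyter notebook "$1" --no-browser --port=8890
else
    jupyter notebook "$1"
fi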
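The script has three modes, and it can be handy to check the argument handling without touching SSH or Jupyter. The sketch below is a dry-run stand-in: the plan function and the notebook name analysis.ipynb are made up for illustration, and it treats the server argument as optional, as the usage message suggests. It prints the command each invocation would run instead of running it.

```shell
# Dry-run sketch: print the command that would run instead of executing it.
SERVER=bonjovi

plan() {
    if [ "$1" = "client" ]; then
        # On the local machine: forward local port 8888 to port 8890 on the server.
        echo "ssh -N -f -L localhost:8888:localhost:8890 $SERVER"
    elif [ "$#" -lt 1 ]; then
        echo "usage: bash launch_notebook.sh notebook.ipynb [server]"
        return 1
    elif [ "$2" = "server" ]; then
        # On the server: headless notebook on the forwarded port.
        echo "jupyter notebook $1 --no-browser --port=8890"
    else
        # Plain local use.
        echo "jupyter notebook $1"
    fi
}

plan client
plan analysis.ipynb server
plan analysis.ipynb
```

In practice the server mode runs on the server, the client mode runs on the local machine, and the notebook is then reachable on local port 8888.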