Using Bioconda

Many of our users work in the bioinformatics field. Conda has a huge number of packages for this in the Bioconda channel. A “channel” is a repository for a collection of packages, so before you can install anything from Bioconda you will need to add this channel to your conda configuration.

First see: https://bioconda.github.io You should read this page, as it covers what Bioconda is and how to cite Bioconda in your publications. Also see: https://hpc.research.uts.edu.au/faq/acknowledgement/

Then have a browse of the package index here https://bioconda.github.io/conda-package_index.html to get an overview of what is available.

Remember our help for Miniconda is here: https://hpc.research.uts.edu.au/software_general/python/python_miniconda/ You might need to refer to that for some of the steps below.

Install the Bioconda Channel

The instructions below are just following: https://bioconda.github.io/user/install.html#set-up-channels

Activate your Miniconda environment. Then add the channels in this order.

(base) $ conda config --add channels defaults
(base) $ conda config --add channels bioconda
(base) $ conda config --add channels conda-forge
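
Why does the order matter? Each "conda config --add channels" call puts the new channel in front of the ones already listed, so the channel added last ends up with the highest priority. The sketch below imitates that with a plain shell variable (the variable "channels" is only for illustration; conda does not read it) and then, if conda is available, prints the real list so you can compare the two.

```shell
# Imitate how "conda config --add channels NAME" builds the list:
# each new channel is placed in front of the ones already there.
channels=""
for ch in defaults bioconda conda-forge; do
    channels="$ch${channels:+ $channels}"
done
echo "Expected priority order: $channels"

# Compare with what conda actually has configured (needs conda on your PATH).
if command -v conda >/dev/null 2>&1; then
    conda config --show channels
else
    echo "conda not found; activate Miniconda first" >&2
fi
```

If you started from a default configuration, the list printed by conda should show conda-forge first, then bioconda, then defaults.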

Searching for the Package you Need

(base)$ conda search cd-hit
Loading channels: done
# Name                       Version           Build  Channel             
cd-hit                         4.6.4               0  bioconda            
cd-hit                         4.6.4               1  bioconda            
cd-hit                         4.6.6               0  bioconda            
cd-hit                         4.6.8               0  bioconda            
cd-hit                         4.6.8      hfc679d8_2  bioconda            
cd-hit                         4.8.1      h2e03b76_4  bioconda            
cd-hit                         4.8.1      h2e03b76_5  bioconda            
cd-hit                         4.8.1      h5b5514e_6  bioconda            
cd-hit                         4.8.1      h8b12597_3  bioconda            
cd-hit                         4.8.1      hdbcaa40_0  bioconda            
cd-hit                         4.8.1      hdbcaa40_1  bioconda            
cd-hit                         4.8.1      hdbcaa40_2  bioconda            
(base)$ 
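
If the full list is longer than you need, the search can be narrowed. This is a minimal sketch, assuming you only care about Bioconda builds of cd-hit at version 4.8 or later; the version constraint is just an example, so substitute your own.

```shell
# Package spec to look for; the version constraint is only an example.
spec="cd-hit>=4.8"

if command -v conda >/dev/null 2>&1; then
    # Quote the spec so the shell does not treat ">" as a redirection.
    conda search --channel bioconda "$spec"
else
    echo "conda not found; activate Miniconda first" >&2
fi
```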

Installing your Required Package

Remember: do not install your computational packages in the base environment. Install them in a virtual environment and keep the base environment pristine.
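
If you do not have a suitable environment yet, you can create one first. This is only a sketch: the name "my-bio-env" is a made-up example, and our Miniconda help page linked above remains the place to look for the details.

```shell
# Hypothetical environment name; pick your own.
env_name="my-bio-env"

if command -v conda >/dev/null 2>&1; then
    # Create a new, empty environment so that base stays untouched.
    conda create --yes --name "$env_name"
    # Confirm that the new environment now appears in the list.
    conda env list
else
    echo "conda not found; activate Miniconda first" >&2
fi
```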

Here I list my Python virtual environments:

(base) hpcnode01 playbooks/$ conda env list
# conda environments:
base         *  /shared/homes/XXX/miniconda3
bio-projects    /shared/homes/XXX/miniconda3/envs/bio-projects
geo-stuff       /shared/homes/XXX/miniconda3/envs/geo-stuff
physics         /shared/homes/XXX/miniconda3/envs/physics

I’ll install cd-hit version 4.6.8 into my bio-projects environment:

$ conda install --name bio-projects cd-hit=4.6.8

The -n option is the short form of --name, so the same command can be written as:

(base)$ conda install -n bio-projects cd-hit=4.6.8
The following NEW packages will be INSTALLED:
cd-hit    bioconda/linux-64::cd-hit-4.6.8-hfc679d8_2
etc ....

Using your New Package

Just activate that environment:

(base)$ conda activate bio-projects
(bio-projects)$ 

Use the package:

(bio-projects)$ cd-hit -h

CD-HIT version 4.7 (built on Jul 13 2018)
Usage: cd-hit [Options] 

Options

   -i	input filename in fasta format, required
   -o	output filename, required
   -c	sequence identity threshold, default 0.9
 	this is the default cd-hit's "global sequence identity" calculated as:
 	number of identical amino acids in alignment
etc... 
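
To show how the -i, -o and -c options from the help text fit together, here is a small sketch of an actual run. The file names "example.fasta" and "example_clustered" and the two sequences are made up for illustration; use your own data in practice.

```shell
# Write a tiny made-up protein FASTA file so the example is self-contained.
cat > example.fasta <<'EOF'
>seq1
MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQ
>seq2
MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQAPILSRVGDGTQDNLSGAEK
EOF

# Run cd-hit only if it is on the PATH (i.e. the environment is activated).
if command -v cd-hit >/dev/null 2>&1; then
    # -i input FASTA, -o output name, -c sequence identity threshold
    cd-hit -i example.fasta -o example_clustered -c 0.9
else
    echo "cd-hit not found; activate the bio-projects environment first" >&2
fi
```

After a successful run you should find the representative sequences in example_clustered and the cluster listing in example_clustered.clstr.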

When you're finished, you can deactivate that environment and then your base Miniconda environment:

(bio-projects)$ conda deactivate
(base)$

(base)$ conda deactivate
$