Many of our users work in the bioinformatics field. Conda provides a huge number of packages for this through the Bioconda channel. A “channel” is a repository for a collection of packages. So first you will need to add this channel to your conda configuration.
First see: https://bioconda.github.io You should read this as it covers what Bioconda is and how to cite Bioconda in your publications. Also see: https://hpc.research.uts.edu.au/faq/acknowledgement/.
Then have a browse of the package index here https://bioconda.github.io/conda-package_index.html to get an overview of what is available.
Remember our help for Miniconda is here: https://hpc.research.uts.edu.au/software_general/python/python_miniconda/ You might need to refer to that for some of the steps below.
The instructions below are just following: https://bioconda.github.io/user/install.html#set-up-channels
Activate your Miniconda environment. Then add the channels in this order:
(base) $ conda config --add channels defaults
(base) $ conda config --add channels bioconda
(base) $ conda config --add channels conda-forge
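Each `conda config --add channels` call puts the new channel at the top of the list, so the channel added last gets the highest priority. Assuming no channels were configured beforehand, the channels section of your ~/.condarc should then look roughly like this:

```yaml
# Sketch of ~/.condarc after the three commands above
# (assumes no earlier channel configuration).
channels:
  - conda-forge
  - bioconda
  - defaults
```

You can print the same list at any time with `conda config --show channels`.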
You can now search for a package, for example cd-hit:
(base)$ conda search cd-hit
Loading channels: done
# Name    Version  Build       Channel
cd-hit    4.6.4    0           bioconda
cd-hit    4.6.4    1           bioconda
cd-hit    4.6.6    0           bioconda
cd-hit    4.6.8    0           bioconda
cd-hit    4.6.8    hfc679d8_2  bioconda
cd-hit    4.8.1    h2e03b76_4  bioconda
cd-hit    4.8.1    h2e03b76_5  bioconda
cd-hit    4.8.1    h5b5514e_6  bioconda
cd-hit    4.8.1    h8b12597_3  bioconda
cd-hit    4.8.1    hdbcaa40_0  bioconda
cd-hit    4.8.1    hdbcaa40_1  bioconda
cd-hit    4.8.1    hdbcaa40_2  bioconda
(base)$
Remember: do not install your computational packages in the base environment. Install them in a virtual environment and keep the base environment pristine.
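One way to guard against accidentally installing into base is to check which environment is active before running a plain `conda install`. The sketch below assumes that conda exports the name of the active environment in the CONDA_DEFAULT_ENV variable; the function name refuse_base is made up for this example and is not part of conda.

```shell
# Hypothetical helper, not part of conda: prints a warning and returns
# non-zero when the active environment is base.
# Assumes conda sets CONDA_DEFAULT_ENV to the active environment's name.
refuse_base() {
    if [ "${CONDA_DEFAULT_ENV:-}" = "base" ]; then
        echo "Active environment is base; install with 'conda install --name <env> ...' instead." >&2
        return 1
    fi
    return 0
}

# Usage: the install only runs when base is not the active environment.
# refuse_base && conda install cd-hit=4.6.8
```

Installing with an explicit `--name` is the other way to stay safe, since the target environment is then stated on the command line.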
Here I list my Python virtual environments:
(base) hpcnode01 playbooks/$ conda env list
# conda environments:
base          *  /shared/homes/XXX/miniconda3
bio-projects     /shared/homes/XXX/miniconda3/envs/bio-projects
geo-stuff        /shared/homes/XXX/miniconda3/envs/geo-stuff
physics          /shared/homes/XXX/miniconda3/envs/physics
I’ll install cd-hit version 4.6.8 into my bio-projects environment:
$ conda install --name bio-projects cd-hit=4.6.8
(base)$ conda install -n bio-projects cd-hit=4.6.8
The following NEW packages will be INSTALLED:
cd-hit    bioconda/linux-64::cd-hit-4.6.8-hfc679d8_2
etc ....
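To confirm that the package landed in the intended environment and not in base, you can list it by environment name. This is a small sketch: the wrapper name pkg_in_env is hypothetical, and it relies on `conda list` accepting `--name` followed by a package name pattern.

```shell
# Hypothetical wrapper: show a package as installed in a named environment.
# $1 = environment name, $2 = package name (conda treats it as a regex).
pkg_in_env() {
    conda list --name "$1" "$2"
}

# Example (requires conda on your PATH):
# pkg_in_env bio-projects cd-hit
```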
Then activate that environment:
(base)$ conda activate bio-projects
(bio-projects)$
Use the package:
(bio-projects)$ cd-hit -h
CD-HIT version 4.7 (built on Jul 13 2018)
Usage: cd-hit [Options]
Options
  -i  input filename in fasta format, required
  -o  output filename, required
  -c  sequence identity threshold, default 0.9
      this is the default cd-hit's "global sequence identity" calculated as:
      number of identical amino acids in alignment
etc...
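Putting the three options from the help text together, a clustering run might look like the sketch below. The file names are placeholders, and the CDHIT_BIN variable and run_cdhit function are invented for this example so that the command can be swapped out; cd-hit itself does not read CDHIT_BIN.

```shell
# Assumption: cd-hit is on PATH because the bio-projects environment is active.
# CDHIT_BIN is a hypothetical override used only by this sketch.
CDHIT_BIN="${CDHIT_BIN:-cd-hit}"

# $1 = input FASTA file, $2 = output file name,
# $3 = sequence identity threshold (optional; 0.9 matches the documented default).
run_cdhit() {
    "$CDHIT_BIN" -i "$1" -o "$2" -c "${3:-0.9}"
}

# Example with placeholder file names:
# run_cdhit proteins.fasta proteins_clustered 0.95
```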
When you’re finished, you can deactivate that environment and then your base Miniconda environment:
(bio-projects)$ conda deactivate
(base)$
(base)$ conda deactivate
$