Biodata Module

HCC hosts multiple databases (BLAST, KEGG, PANTHER, InterProScan), genome files, and short-read alignment indices on both Tusker and Crane.
To use these resources, load the “biodata” module first.
For instructions on loading modules, see Module Commands.

Loading the “biodata” module pre-sets many environment variables, though you will most likely need only a subset of them. To use an environment variable in a command or script, prefix its name with $.

The major environment variables are:
$DATA - Main directory
$BLAST - Directory containing all available BLAST (nucleotide and protein) databases
$KEGG - KEGG database main entry point (requires license)
$PANTHER - PANTHER database main entry point (latest)
$IPR - InterProScan database main entry point (latest)
$GENOMES - Directory containing all available genomes (multiple sources, builds possible)
$INDICES - Directory containing indices for Bowtie, Bowtie2, and BWA for all available genomes
$UNIPROT - Directory containing latest release of full UniProt database
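
As a quick check, the sketch below reports which of the variables listed above are set in the current shell. The function name biodata_check is hypothetical and is not provided by the module; it is meant to be run after the “biodata” module has been loaded.

```shell
# Hypothetical helper (not provided by the biodata module): report which of
# the major variables listed above are set in the current shell.
biodata_check() {
    for name in DATA BLAST KEGG PANTHER IPR GENOMES INDICES UNIPROT; do
        # Indirect lookup that works in plain sh: read the variable named by $name
        eval "value=\${$name:-}"
        if [ -n "$value" ]; then
            echo "$name=$value"
        else
            echo "$name is not set"
        fi
    done
}
```

Calling biodata_check prints one line per variable, either NAME=path or "NAME is not set".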

To see which genomes are available, type:

$ ls $GENOMES

To see which BLAST databases are available, type:

$ ls $BLAST
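
A BLAST database is usually split across several files, so the raw listing can be long. The sketch below condenses it to one line per database. The function name blast_db_names is hypothetical, and it assumes files are named like nt.00.nhr or nt.nal, so that everything from the first dot onward can be dropped; databases whose names contain a dot would be truncated.

```shell
# Hypothetical helper: list the distinct database names in a BLAST directory.
# Assumes files are named <database>.<extension...>, e.g. nt.00.nhr or nt.nal,
# so everything from the first dot onward is stripped.
blast_db_names() {
    ls "$1" | sed 's/\..*$//' | sort -u
}

# Example use after loading the module:
#   blast_db_names "$BLAST"
```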

The example below runs a Bowtie2 local alignment on Crane using the default horse (Equus caballus) index, $BOWTIE2_HORSE, with paired-end FASTA files and 8 CPUs:

bowtie2_alignment.submit
#!/bin/sh
#SBATCH --job-name=Bowtie2
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=8
#SBATCH --time=168:00:00
#SBATCH --mem=10gb
#SBATCH --output=Bowtie2.%J.out
#SBATCH --error=Bowtie2.%J.err

module load bowtie/2.2
module load biodata

bowtie2 -x $BOWTIE2_HORSE -f -1 input_reads_pair_1.fasta -2 input_reads_pair_2.fasta -S bowtie2_alignments.sam --local -p $SLURM_NTASKS_PER_NODE
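
If the “biodata” module is not loaded, $BOWTIE2_HORSE is empty and the job fails only once bowtie2 starts. A minimal sketch of a guard that could be placed before the bowtie2 command is shown below. The function name bowtie2_index_ok is hypothetical, and the sketch assumes the variable holds an index prefix, as the -x option expects.

```shell
# Hypothetical helper: succeed if a Bowtie2 index exists for the given prefix.
# Bowtie2 index files are named <prefix>.1.bt2, or <prefix>.1.bt2l for large indexes.
bowtie2_index_ok() {
    [ -n "$1" ] && { [ -e "$1.1.bt2" ] || [ -e "$1.1.bt2l" ]; }
}

# Example use in the submit script, before the bowtie2 command:
#   bowtie2_index_ok "$BOWTIE2_HORSE" || { echo "Bowtie2 index not found" >&2; exit 1; }
```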

The example below runs BLAST against the non-redundant nucleotide database available on Crane:

blastn_alignment.submit
#!/bin/sh
#SBATCH --job-name=BlastN
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=8
#SBATCH --time=168:00:00
#SBATCH --mem=10gb
#SBATCH --output=BlastN.%J.out
#SBATCH --error=BlastN.%J.err

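module load blast/2.7
module load biodata

# Stage the database and the query file on local scratch
cp $BLAST/nt.* /scratch
cp input_reads.fasta /scratch

# Run blastn with all CPUs requested above
blastn -db /scratch/nt -query /scratch/input_reads.fasta -out /scratch/blast_nucleotide.results -num_threads $SLURM_NTASKS_PER_NODE

# Copy the results back to the current working directory
cp /scratch/blast_nucleotide.results .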
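
Because the nt database files can be large, it may be worth confirming that the scratch directory has enough room before copying. The sketch below is one way to do that with df and du. The function names has_free_kb and total_kb are hypothetical, and the -c option of du is assumed to be available (it is in GNU and BSD du).

```shell
# Hypothetical helper: succeed if directory $1 has at least $2 KB available.
has_free_kb() {
    # df -P prints one line per filesystem; column 4 is the available space
    avail=$(df -Pk "$1" | awk 'NR==2 {print $4}')
    [ "$avail" -ge "$2" ]
}

# Hypothetical helper: total size in KB of the files given as arguments.
total_kb() {
    # du -c appends a grand-total line; keep only its first field
    du -ck "$@" | awk 'END {print $1}'
}

# Example use in the submit script, before the cp commands:
#   has_free_kb /scratch "$(total_kb $BLAST/nt.*)" || { echo "not enough space on /scratch" >&2; exit 1; }
```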

The organisms and their corresponding environment variables for all genome and chromosome files, as well as for short-read alignment indices, are listed at the link below:
Organisms