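BLAST+ provides the following basic search commands: blastn (nucleotide query against a nucleotide database), blastp (protein query against a protein database), blastx (translated nucleotide query against a protein database), tblastn (protein query against a translated nucleotide database), and tblastx (translated nucleotide query against a translated nucleotide database).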
The basic usage of blastn is:
$ blastn -query input_reads.fasta -db input_reads_db -out blastn_output.alignments [options]
Additional parameters can be found in the BLAST manual, or by typing:
$ blastn -help
These BLAST alignment commands are multi-threaded, so using the BLAST option -num_threads <number_of_CPUs>, set to the number of CPUs requested for the job, is recommended.
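A minimal sketch of choosing the thread count inside a job script, assuming the job requests CPUs with --ntasks-per-node as in the examples on this page (SLURM exports SLURM_NTASKS_PER_NODE only when that option is given). The function name blast_threads is hypothetical; falling back to one thread lets the same command also run outside a SLURM job, for example during a quick interactive test.

```shell
#!/bin/bash
# Hypothetical helper: print the number of threads to pass to -num_threads.
# Uses SLURM_NTASKS_PER_NODE (set by SLURM only when --ntasks-per-node is given)
# and falls back to 1 when it is unset, empty, non-numeric, or 0.
blast_threads() {
  local n="${SLURM_NTASKS_PER_NODE:-1}"
  case "$n" in
    ''|*[!0-9]*|0) n=1 ;;
  esac
  echo "$n"
}

# Example: blastn ... -num_threads "$(blast_threads)"
echo "blastn would use $(blast_threads) thread(s)"
```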
HCC hosts multiple BLAST databases and indices on Swan. To use these resources, load the “biodata” module first. The $BLAST variable points to the directory containing the following currently available databases:
If you want to create and use a BLAST database that is not mentioned above, check Create Local BLAST Database. If you want a database to be added to the “biodata” module, please send a request to bcrf-support@unl.edu.
To access the older (version 4) format of BLAST databases, which works with BLAST+ 2.9 and lower, use the $BLAST_V4 variable. The $BLAST variable points to the directory with the newer version 5 nucleotide and protein databases, which are required for BLAST+ 2.10 and higher.
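The version rule above can be sketched as a small shell function. This is illustrative only: blast_db_dir is a hypothetical name, and the sketch assumes that $BLAST and $BLAST_V4 have been set by loading the “biodata” module and that the BLAST+ version is given as a string such as 2.9.0 (for example, the second field of the first line printed by blastn -version).

```shell
#!/bin/bash
# Hypothetical helper: print the database directory matching a BLAST+ version.
#   BLAST+ 2.9 and lower  -> $BLAST_V4 (version 4 databases)
#   BLAST+ 2.10 and higher -> $BLAST   (version 5 databases)
blast_db_dir() {
  local major minor rest
  # Split e.g. "2.10.0+" into major=2, minor=10, rest="0+"
  IFS=. read -r major minor rest <<< "$1"
  if (( major < 2 || (major == 2 && minor <= 9) )); then
    echo "$BLAST_V4"
  else
    echo "$BLAST"
  fi
}

# Example (inside a job, after "module load biodata/1.0"):
#   version="$(blastn -version | head -n 1 | awk '{print $2}')"
#   cp "$(blast_db_dir "$version")"/nt.* /scratch/
```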
A basic SLURM example of a nucleotide BLAST run against the non-redundant nt BLAST database with 8 CPUs is provided below. When running a BLAST alignment, it is recommended to first copy the query and database files to the /scratch/ directory of the worker node. The BLAST output is also written to this directory (/scratch/blastn_output.alignments), and after BLAST finishes, the output file is copied from the worker node back to your current working directory.
This example assumes that the input file exists in your current directory. Reading the database and query from “scratch”, the faster storage local to the worker node, can greatly improve performance.
blastn_alignment.submit
#!/bin/bash
#SBATCH --job-name=BlastN
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=8
#SBATCH --time=168:00:00
#SBATCH --mem=20gb
#SBATCH --output=BlastN.%J.out
#SBATCH --error=BlastN.%J.err
module load blast/2.10
module load biodata/1.0
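# Be sure to use a directory under $WORK for your job
# Copy the nt database files and the query to the node-local /scratch directory
cp "$BLAST"/nt.* /scratch/
cp input_reads.fasta /scratch/
# Run blastn against the local copies, with one thread per requested task
blastn -query /scratch/input_reads.fasta -db /scratch/nt -out /scratch/blastn_output.alignments -num_threads $SLURM_NTASKS_PER_NODE
# Copy the results back to the current working directory
cp /scratch/blastn_output.alignments .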
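The stage-in, run, stage-out pattern used in this script can also be wrapped in a function so that the output is copied back even when the command fails, which keeps partial results from a job that hits an error. This is a sketch, not an HCC-provided tool: run_in_scratch and its argument convention are hypothetical, and the scratch location is passed in rather than hard-coded to /scratch.

```shell
#!/bin/bash
# Hypothetical helper sketching the stage-in / run / stage-out pattern.
# Usage: run_in_scratch SCRATCH_DIR OUTPUT_NAME INPUT... -- COMMAND [ARGS...]
# Copies each INPUT into SCRATCH_DIR, runs COMMAND from inside SCRATCH_DIR,
# then copies SCRATCH_DIR/OUTPUT_NAME back to the current directory if it exists.
# Returns the exit status of COMMAND (or 1 if staging the inputs fails).
#
# Example inside the job script above:
#   run_in_scratch /scratch blastn_output.alignments input_reads.fasta "$BLAST"/nt.* -- \
#     blastn -query input_reads.fasta -db nt -out blastn_output.alignments \
#     -num_threads "$SLURM_NTASKS_PER_NODE"
run_in_scratch() {
  local scratch="$1" output="$2"
  shift 2
  local inputs=()
  while [ "$#" -gt 0 ] && [ "$1" != "--" ]; do
    inputs+=("$1")
    shift
  done
  shift  # drop the "--" separator
  local workdir="$PWD"
  cp -- "${inputs[@]}" "$scratch"/ || return 1
  local status=0
  ( cd "$scratch" && "$@" ) || status=$?
  # Copy back whatever output exists, even after a failure.
  if [ -e "$scratch/$output" ]; then
    cp -- "$scratch/$output" "$workdir"/
  fi
  return "$status"
}
```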
One important BLAST parameter is the e-value threshold (-evalue), which limits the reported hits to those with an e-value lower than the given value. To report only hits with an e-value lower than 1e-10, modify the blastn line of the script as follows:
blastn -query /scratch/input_reads.fasta -db /scratch/nt -out /scratch/blastn_output.alignments -num_threads $SLURM_NTASKS_PER_NODE -evalue 1e-10
The default BLAST output is in pairwise format. However, BLAST’s -outfmt parameter supports other output formats that are easier to parse; for example, -outfmt 6 produces tab-separated tabular output with one hit per line.
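Because tabular output has one hit per line, it is easy to post-process. As a minimal sketch, assuming the default -outfmt 6 columns (query, subject, percent identity, alignment length, mismatches, gap opens, query start and end, subject start and end, e-value, bit score), the e-value is column 11, so hits in an existing tabular file can be narrowed to a stricter threshold without re-running BLAST. filter_evalue is a hypothetical name. This can only tighten the threshold used for the search (BLAST's default -evalue is 10); hits that BLAST already discarded cannot be recovered this way.

```shell
#!/bin/bash
# Hypothetical helper: print only the hits in a tabular (-outfmt 6) BLAST file
# whose e-value (column 11) is less than or equal to THRESHOLD.
# Usage: filter_evalue THRESHOLD FILE
filter_evalue() {
  # Adding 0 forces numeric comparison, so values such as 2e-45 or 0.0 compare correctly.
  awk -F '\t' -v max="$1" '($11 + 0) <= (max + 0)' "$2"
}

# Example: filter_evalue 1e-10 blastx_output.alignments > blastx_filtered.tsv
```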
A basic SLURM example of a blastx run (translated nucleotide query against a protein database) against the non-redundant nr BLAST database, with tabular output format and 8 CPUs, is shown below. As before, the query and database files are first copied to the /scratch/ directory of the worker node, assuming that the input file exists in your current directory; this faster local storage can greatly improve performance. The BLAST output is also saved in this directory (/scratch/blastx_output.alignments), and after BLAST finishes, the output file is copied from the worker node back to your current working directory.
blastx_alignment.submit
#!/bin/bash
#SBATCH --job-name=BlastX
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=8
#SBATCH --time=168:00:00
#SBATCH --mem=20gb
#SBATCH --output=BlastX.%J.out
#SBATCH --error=BlastX.%J.err
module load blast/2.10
module load biodata/1.0
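# Be sure to use a directory under $WORK for your job
# Copy the nr database files and the query to the node-local /scratch directory
cp "$BLAST"/nr.* /scratch/
cp input_reads.fasta /scratch/
# Run blastx against the local copies with tabular output (-outfmt 6)
blastx -query /scratch/input_reads.fasta -db /scratch/nr -outfmt 6 -out /scratch/blastx_output.alignments -num_threads $SLURM_NTASKS_PER_NODE
# Copy the results back to the current working directory
cp /scratch/blastx_output.alignments .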
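A common next step with tabular blastx output is to keep only the best-scoring hit for each query. A minimal sketch, again assuming the default -outfmt 6 columns (query ID in column 1, bit score in column 12); best_hits is a hypothetical name. Ties between equal bit scores are broken by sort's whole-line comparison rather than by any biological criterion.

```shell
#!/bin/bash
# Hypothetical helper: print the single highest-bit-score hit for each query
# in a tabular (-outfmt 6) BLAST file.
# Usage: best_hits FILE
best_hits() {
  local tab
  tab="$(printf '\t')"
  # Group lines by query ID (column 1) with the highest bit score (column 12) first;
  # -g compares general numeric values, so scores like 1.2e+03 also sort correctly.
  # awk then keeps only the first line seen for each query ID.
  sort -t "$tab" -k1,1 -k12,12gr "$1" | awk -F '\t' '!seen[$1]++'
}

# Example: best_hits blastx_output.alignments > blastx_best_hits.tsv
```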