BLAST

BLAST (Basic Local Alignment Search Tool) is a local alignment tool that finds regions of similarity between sequences. It compares nucleotide or protein sequences against sequence databases and calculates the statistical significance of the matches. When the input sequences are large, the command-line version of BLAST is required.

The following pages, Create Local BLAST Database and Running BLAST Alignment, describe how to run some of the most common BLAST executables as a single job using the SLURM scheduler on HCC.

Useful Information

To test the performance of BLAST (blast/2.2) on Tusker, we aligned three nucleotide query datasets, small.fasta, medium.fasta, and large.fasta, against the non-redundant nucleotide database nt.fasta from NCBI. Statistics about the query datasets, along with the time and memory used for each alignment, are shown in the table below:

| query dataset | total # of sequences | total # of bases | total size in MB | used time | used memory | # of used CPUs |
|---|---|---|---|---|---|---|
| small.fasta | 41,715 | 35,581,740 | 37.627 | ~ 2 hours | ~ 23 GB | 8 |
| medium.fasta | 110,478 | 147,543,113 | 149 | ~ 4 hours | ~ 24 GB | 8 |
| large.fasta | 592,593 | 827,629,204 | 836 | ~ 15 hours | ~ 47 GB | 8 |
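
The figures above can guide the resource requests in a SLURM submit script. The following is only a minimal sketch for the small.fasta case, not the exact job used for the benchmark. The file name blast_small.submit, the job name, the output file names, and the database name nt (assumed to be an already formatted BLAST database reachable from the working directory) are illustrative assumptions, and the requested time and memory simply add some headroom to the measured ~2 hours and ~23 GB.

```shell
#!/bin/sh
# Write a hypothetical SLURM submit script sized from the small.fasta row.
cat > blast_small.submit <<'EOF'
#!/bin/sh
#SBATCH --job-name=blast_small
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=8        # the benchmark used 8 CPUs
#SBATCH --mem=24G                # measured ~23 GB, plus a little headroom (assumption)
#SBATCH --time=03:00:00          # measured ~2 hours, plus headroom (assumption)
#SBATCH --output=blast_small.%j.out
#SBATCH --error=blast_small.%j.err

# Module name as given in this page; check what is installed on your cluster.
module load blast/2.2

# "nt" is an assumed database name; adjust it to the location of your database.
blastn -query small.fasta -db nt -out small_alignments.txt \
       -num_threads "$SLURM_CPUS_PER_TASK"
EOF

# Submit the job only where SLURM is available; otherwise just keep the file.
if command -v sbatch >/dev/null 2>&1; then
    sbatch blast_small.submit
else
    echo "sbatch not found; submit script written to blast_small.submit"
fi
```

For the medium.fasta and large.fasta datasets, the same sketch would need larger --time and --mem values in line with the table, for example more than 15 hours and more than 47 GB for large.fasta.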