BLAST (Basic Local Alignment Search Tool) is a local alignment tool that finds regions of similarity between sequences. It compares nucleotide or protein sequences against sequence databases and calculates the statistical significance of the matches. When the input sequences are large, the command-line version of BLAST is required.
The following pages, Create Local BLAST Database and Running BLAST Alignment, describe how to run some of the most common BLAST executables as a single job using the SLURM scheduler on HCC.
To test the performance of BLAST (blast/2.2) on Swan, we aligned three nucleotide query datasets, small.fasta, medium.fasta, and large.fasta, against the non-redundant nucleotide nt.fasta database from NCBI. Statistics about the query datasets, along with the time and memory used for each alignment, are shown in the table below:
query dataset | total # of sequences | total # of bases | total size in MB | used time | used memory | # of used CPUs |
---|---|---|---|---|---|---|
small.fasta | 41,715 | 35,581,740 | 37.627 | ~ 2 hours | ~ 23 GB | 8 |
medium.fasta | 110,478 | 147,543,113 | 149 | ~ 4 hours | ~ 24 GB | 8 |
large.fasta | 592,593 | 827,629,204 | 836 | ~ 15 hours | ~ 47 GB | 8 |
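
To make the figures in the table easier to compare, the following minimal Python sketch derives two rough rates from them: sequences aligned per hour, and megabases aligned per CPU-hour. The dictionary `runs` and the helper `throughput` are hypothetical names introduced here for illustration. The sketch assumes the approximate run times in the table (2, 4, and 15 hours) can be treated as exact, so the results are estimates only.

```python
# Figures copied from the table above. The run times are approximate ("~"),
# so every derived rate is a rough estimate, not a measurement.
runs = {
    "small.fasta":  {"sequences": 41_715,  "bases": 35_581_740,  "hours": 2,  "cpus": 8},
    "medium.fasta": {"sequences": 110_478, "bases": 147_543_113, "hours": 4,  "cpus": 8},
    "large.fasta":  {"sequences": 592_593, "bases": 827_629_204, "hours": 15, "cpus": 8},
}


def throughput(run):
    """Return (sequences per hour, megabases per CPU-hour) for one run."""
    seqs_per_hour = run["sequences"] / run["hours"]
    cpu_hours = run["hours"] * run["cpus"]
    mbases_per_cpu_hour = run["bases"] / 1e6 / cpu_hours
    return seqs_per_hour, mbases_per_cpu_hour


# Hypothetical summary structure: dataset name -> (seqs/hour, Mbases/CPU-hour)
rates = {name: throughput(run) for name, run in runs.items()}

for name, (seqs, mbases) in rates.items():
    print(f"{name}: ~{seqs:,.0f} sequences/hour, ~{mbases:.1f} Mbases per CPU-hour")
```

Because the reported times are rounded, these rates are best treated as rough guidance when estimating the time request for a job with a similar query dataset and database.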