Bowtie is an ultrafast, memory-efficient aligner that maps large sets of sequencing reads to a reference genome. It indexes the genome with a Burrows-Wheeler index to keep its memory footprint small, and it can use multiple processors to speed up alignment.
The first step in running Bowtie is to build an index from the reference genome with the bowtie-build command. Its basic usage is:
$ bowtie-build input_reference.fasta index_prefix
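Because building an index can take a while for large genomes, it can be useful to skip the build when the index is already present. The sketch below is one possible way to check for it. The function name bowtie_index_exists is hypothetical, and the check assumes the six .ebwt files that bowtie-build writes for a small index; large indexes use a different suffix and are not covered here.

```shell
#!/bin/bash
# Hypothetical helper (not part of Bowtie): succeeds only if all six
# small-index files for the given prefix exist and are non-empty.
# Assumption: the index was built as a "small" index with .ebwt suffixes.
bowtie_index_exists() {
    local prefix="$1" suffix
    for suffix in 1 2 3 4 rev.1 rev.2; do
        [ -s "${prefix}.${suffix}.ebwt" ] || return 1
    done
    return 0
}

# Usage sketch: build only when the index is missing.
# bowtie_index_exists index_prefix || bowtie-build input_reference.fasta index_prefix
```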
Once the index of the reference genome has been generated, the next step is to align the reads. The basic usage of bowtie is:
$ bowtie [-q|-f|-r|-c] index_prefix [-1 input_reads_pair_1.[fasta|fastq] -2 input_reads_pair_2.[fasta|fastq] | input_reads.[fasta|fastq]] [options]
Bowtie supports both single-end (input_reads.[fasta|fastq]) and paired-end (input_reads_pair_1.[fasta|fastq], input_reads_pair_2.[fasta|fastq]) files in fasta or fastq format. The format of the input files must also be specified with one of the following flags: -q (fastq files), -f (fasta files), -r (raw, one sequence per line), or -c (sequences given on the command line).
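Since the format flag has to match the read file, a submit script can derive it from the file name instead of hard-coding it. The following is a minimal sketch under the assumption that read files use conventional extensions (.fastq/.fq or .fasta/.fa); the function name bowtie_format_flag and the extension mapping are illustrative and not part of Bowtie. The -r and -c modes cannot be inferred from a file name and still have to be chosen by hand.

```shell
#!/bin/bash
# Hypothetical helper: print the Bowtie input-format flag that matches a
# read file's extension. The extension mapping is an assumption, since
# Bowtie itself does not inspect file names.
bowtie_format_flag() {
    case "$1" in
        *.fastq|*.fq) echo "-q" ;;   # fastq reads
        *.fasta|*.fa) echo "-f" ;;   # fasta reads
        *)
            echo "unrecognized read file extension: $1" >&2
            return 1
            ;;
    esac
}

# Usage sketch (single-end):
# flag=$(bowtie_format_flag input_reads.fastq) || exit 1
# bowtie "$flag" index_prefix input_reads.fastq > bowtie_alignments.out
```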
An example of how to run a Bowtie alignment on Swan with a single-end fastq file and 8 CPUs is shown below:
bowtie_alignment.submit
#!/bin/bash
#SBATCH --job-name=Bowtie
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=8
#SBATCH --time=168:00:00
#SBATCH --mem=10gb
#SBATCH --output=Bowtie.%J.out
#SBATCH --error=Bowtie.%J.err
module load bowtie/1.1
bowtie -q index_prefix input_reads.fastq -p $SLURM_NTASKS_PER_NODE > bowtie_alignments.sam
By default, Bowtie writes alignments in its own tab-delimited format, one alignment per line; SAM output is produced only when the -S (--sam) option is given. The command above does not pass -S, so despite its .sam extension, bowtie_alignments.sam contains the default format. Each line of the default format has 8 tab-separated fields: the name of the aligned read, the reference strand aligned to, the name of the reference sequence where the alignment occurs, the 0-based offset into the forward reference strand where the leftmost character of the alignment occurs, the read sequence, the read qualities, the number of other instances where the same sequence aligned against the same reference characters, and a comma-separated list of mismatch descriptors.
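Because the default output is plain tab-separated text, standard tools such as awk can summarize it. The sketch below counts alignments per reference sequence and strand using fields 3 and 2 as described above. The function name count_hits_by_reference is hypothetical, and the sketch assumes the 8-column default format rather than SAM.

```shell
#!/bin/bash
# Hypothetical helper: count alignments per (reference, strand) pair in
# Bowtie's default 8-column output. Reads the named files, or stdin when
# no file is given. Lines with fewer than 7 fields are skipped; the 8th
# field (mismatch descriptors) may be empty.
count_hits_by_reference() {
    awk -F '\t' '
        NF >= 7 { n[$3 "\t" $2]++ }                  # key: reference, strand
        END     { for (k in n) print k "\t" n[k] }
    ' "$@" | LC_ALL=C sort
}

# Usage sketch:
# count_hits_by_reference bowtie_alignments.sam
```

Each output line holds the reference name, the strand, and the number of alignments, separated by tabs.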