Bowtie

Bowtie is an ultrafast, memory-efficient aligner that maps large sets of sequencing reads to a reference genome. It indexes the genome with a Burrows-Wheeler index to keep its memory footprint small, and it can use multiple processors to increase alignment speed.

The first step in running Bowtie is to build an index from the reference genome with the bowtie-build command. Its basic usage is:

$ bowtie-build input_reference.fasta index_prefix
where input_reference.fasta is the reference genome sequence in fasta format, and index_prefix is the prefix of the generated index files.
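As a quick sanity check after bowtie-build finishes, it can help to confirm that the index files were actually written. The sketch below assumes a standard (small) Bowtie 1 index, which consists of six files named index_prefix.1.ebwt through index_prefix.4.ebwt plus index_prefix.rev.1.ebwt and index_prefix.rev.2.ebwt. Large indexes use a different extension and are not covered here. The function name index_complete is a hypothetical helper, not part of Bowtie.

```shell
#!/bin/bash
# Hypothetical helper: verify that all six files of a small Bowtie 1 index
# exist and are non-empty for the given prefix.
# Assumption: small-index naming (*.ebwt); large indexes are not handled.
index_complete() {
    local prefix=$1 suffix
    for suffix in 1 2 3 4 rev.1 rev.2; do
        if [ ! -s "${prefix}.${suffix}.ebwt" ]; then
            echo "missing or empty: ${prefix}.${suffix}.ebwt" >&2
            return 1
        fi
    done
    return 0
}

# Example use (index_prefix is a placeholder):
#   index_complete index_prefix && echo "index looks complete"
```

Running this check before submitting a long alignment job avoids waiting in the queue only to fail on a missing index file.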

After the index of the reference genome is generated, the next step is to align the reads. The basic usage of bowtie is:

$ bowtie [-q|-f|-r|-c] index_prefix [-1 input_reads_pair_1.[fasta|fastq] -2 input_reads_pair_2.[fasta|fastq] | input_reads.[fasta|fastq]] [options]
where index_prefix is the prefix of the index generated by the bowtie-build command, and options are optional parameters described in the Bowtie manual.

Bowtie supports both single-end (input_reads.[fasta|fastq]) and paired-end (input_reads_pair_1.[fasta|fastq], input_reads_pair_2.[fasta|fastq]) files in fasta or fastq format. The format of the input files must also be specified with one of the following flags: -q (fastq files), -f (fasta files), -r (raw, one sequence per line), or -c (sequences given on the command line).
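To illustrate how the format flag and the paired-end options fit together, here is a minimal sketch that picks -q or -f from the file extension and prints the resulting paired-end command. The helper format_flag and the extension mapping are assumptions made for illustration. Bowtie itself does not infer the format from the file name, and the read file names are placeholders.

```shell
#!/bin/bash
# Hypothetical helper: choose the bowtie input-format flag from a file extension.
# Assumption: .fastq/.fq means fastq and .fasta/.fa means fasta.
format_flag() {
    case "$1" in
        *.fastq|*.fq) echo "-q" ;;
        *.fasta|*.fa) echo "-f" ;;
        *) echo "unknown read format: $1" >&2; return 1 ;;
    esac
}

# Paired-end example with placeholder file names. The command is only
# printed here, not executed.
reads_1=input_reads_pair_1.fastq
reads_2=input_reads_pair_2.fastq
echo bowtie "$(format_flag "$reads_1")" index_prefix -1 "$reads_1" -2 "$reads_2"
# prints: bowtie -q index_prefix -1 input_reads_pair_1.fastq -2 input_reads_pair_2.fastq
```

Removing the echo runs the alignment itself. Both mate files are expected to be in the same format, so a single flag covers the pair.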

An example of how to run a Bowtie alignment on Swan with a single-end fastq file and 8 CPUs is shown below:

bowtie_alignment.submit
#!/bin/bash
#SBATCH --job-name=Bowtie
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=8
#SBATCH --time=168:00:00
#SBATCH --mem=10gb
#SBATCH --output=Bowtie.%J.out
#SBATCH --error=Bowtie.%J.err

module load bowtie/1.1

bowtie -q -S index_prefix input_reads.fastq -p $SLURM_NTASKS_PER_NODE > bowtie_alignments.sam

Bowtie Output

With the -S option, Bowtie writes alignments in SAM format. Without it, Bowtie writes its default output format, where one line is one alignment. Each line is a collection of 8 fields separated by tabs. The fields are: the name of the aligned read, the reference strand aligned to, the name of the reference sequence where the alignment occurs, the 0-based offset into the forward reference strand where the leftmost character of the alignment occurs, the read sequence, the read qualities, the number of other instances where the same sequence aligned against the same reference characters, and a comma-separated list of mismatch descriptors.
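The 8-field, tab-separated layout described above is what Bowtie writes when -S is not given, and it is simple to summarize with standard tools. The sketch below counts alignments per reference sequence and strand using fields 3 and 2. The function name summarize_alignments is a hypothetical helper, and the input file name in the usage comment is a placeholder.

```shell
#!/bin/bash
# Hypothetical helper: count alignments per reference sequence and strand
# from Bowtie's default (non-SAM) tab-separated output.
# Field 2 is the strand, field 3 is the reference sequence name.
summarize_alignments() {
    awk -F'\t' '
        # Accept lines with at least 7 fields: the 8th field (mismatch
        # descriptors) may be empty or absent for perfect matches.
        NF >= 7 { count[$3 "\t" $2]++ }
        END { for (k in count) print k "\t" count[k] }
    ' "$@" | LC_ALL=C sort
}

# Example use (bowtie_alignments.txt is a placeholder file name):
#   summarize_alignments bowtie_alignments.txt
```

Each output line holds the reference name, the strand, and the number of alignments, separated by tabs. For SAM output, a dedicated tool such as samtools is a better fit than this awk one-liner.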