Bowtie2

Bowtie2 is an ultrafast and memory-efficient tool for aligning sequencing reads to long reference sequences. Although Bowtie and Bowtie2 are both fast read aligners, there are a few main differences between them:

  • Bowtie2 supports gapped alignment with affine gap penalties, without restrictions on the number of gaps and gap lengths.
  • For reads longer than about 50 bp, Bowtie2 is generally faster, more sensitive, and uses less memory than Bowtie.
  • Bowtie supports only end-to-end alignments, while Bowtie2 supports both end-to-end and local alignment.
  • Bowtie has an upper limit on read length of around 1,000 bp, while Bowtie2 does not have any.
  • Bowtie2’s paired-end alignment is more flexible than Bowtie’s.
  • Bowtie2 does not align colorspace reads.
  • Bowtie and Bowtie2 indices are not compatible.

As with Bowtie, the first step in running Bowtie2 is to build a Bowtie2 index from a reference genome sequence. The basic usage of the bowtie2-build command is:

$ bowtie2-build -f input_reference.fasta index_prefix
where input_reference.fasta is the reference sequence file in fasta format, and index_prefix is the prefix of the generated index files. Besides the -f option, which is used when the reference input is a fasta file, the -c option can be used when the reference sequences are given on the command line.
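
Before submitting an alignment job, it can be worth confirming that bowtie2-build produced a complete index. The sketch below assumes a small (non-large) index, for which Bowtie2 writes six files with the suffixes .1.bt2, .2.bt2, .3.bt2, .4.bt2, .rev.1.bt2, and .rev.2.bt2. The function name index_complete is a hypothetical helper, not part of Bowtie2.

```shell
#!/bin/bash
# Hypothetical helper: succeed only if all six small-index files exist
# for the given prefix. Large indexes use the .bt2l extension instead
# and are not handled here.
index_complete() {
    local prefix="$1" suffix
    for suffix in 1 2 3 4 rev.1 rev.2; do
        if [ ! -f "${prefix}.${suffix}.bt2" ]; then
            echo "missing: ${prefix}.${suffix}.bt2" >&2
            return 1
        fi
    done
    return 0
}

# Usage: report whether the index for index_prefix is ready to use.
if index_complete index_prefix; then
    echo "index_prefix is complete"
else
    echo "index_prefix is incomplete; rerun bowtie2-build"
fi
```

A check like this can be placed at the top of a submit script so that a job fails quickly instead of waiting in the queue only to stop on a missing index file.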

The bowtie2 command takes a Bowtie2 index and a set of sequencing read files and outputs a set of alignments in SAM format. The general bowtie2 usage is:

$ bowtie2 -x index_prefix [-q|--qseq|-f|-r|-c] [-1 input_reads_pair_1.[fasta|fastq] -2 input_reads_pair_2.[fasta|fastq] | -U input_reads.[fasta|fastq]] -S bowtie2_alignments.sam [options]
where index_prefix is the prefix of the index generated with the bowtie2-build command, and options are optional parameters that can be found in the Bowtie2 manual. Bowtie2 supports both single-end (input_reads.[fasta|fastq]) and paired-end (input_reads_pair_1.[fasta|fastq], input_reads_pair_2.[fasta|fastq]) files in fasta or fastq format. The format of the input files is specified with one of the following flags: -q (fastq files, the default), --qseq (Illumina’s qseq format), -f (fasta files), -r (raw sequences, one per line), or -c (sequences given on the command line).
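
Since the format flag has to match the read files, one option is to derive it from the file name. The following is a minimal sketch, assuming the reads use one of the common fastq or fasta extensions (optionally followed by .gz); format_flag is a hypothetical helper, and the extension list is an assumption rather than anything Bowtie2 enforces.

```shell
#!/bin/bash
# Hypothetical helper: map a read file name to a bowtie2 input-format flag.
# Only the extensions listed here are recognised; anything else returns 1.
format_flag() {
    local name="${1%.gz}"      # ignore a trailing .gz, if present
    case "$name" in
        *.fastq|*.fq)        printf '%s\n' "-q" ;;
        *.fasta|*.fa|*.fna)  printf '%s\n' "-f" ;;
        *)                   echo "unknown read format: $1" >&2; return 1 ;;
    esac
}

reads=input_reads.fastq
if flag=$(format_flag "$reads"); then
    # Print the command instead of running it, so it can be reviewed first.
    echo bowtie2 -x index_prefix "$flag" -U "$reads" -S bowtie2_alignments.sam
fi
```

Printing the command rather than executing it keeps the sketch safe to run anywhere; dropping the echo runs the single-end alignment directly.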

An example of how to run Bowtie2 local alignment on Swan with paired-end fasta files and 8 CPUs is shown below:

bowtie2_alignment.submit
#!/bin/bash
#SBATCH --job-name=Bowtie2
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=8
#SBATCH --time=168:00:00
#SBATCH --mem=10gb
#SBATCH --output=Bowtie2.%J.out
#SBATCH --error=Bowtie2.%J.err

module load bowtie/2.3

bowtie2 -x index_prefix -f -1 input_reads_pair_1.fasta -2 input_reads_pair_2.fasta -S bowtie2_alignments.sam --local -p $SLURM_NTASKS_PER_NODE

Bowtie2 Output

Bowtie2 outputs alignments in SAM format, which can be further manipulated with tools such as SAMtools and GATK. Each non-header line in the file describes an alignment and is a collection of at least 12 fields separated by tabs. Detailed information about Bowtie2 output fields can be found in the Bowtie2 manual.
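
As a quick illustration of the tab-separated layout, the sketch below counts aligned and unaligned records in a SAM file using only awk. It relies on two properties of the SAM format: header lines start with "@", and the second field is a bitwise FLAG in which bit 0x4 marks an unmapped read. The function name sam_summary is hypothetical, and the counts are per record, so each mate of a pair is counted separately.

```shell
#!/bin/bash
# Count aligned and unaligned records in a SAM file.
# Header lines start with "@"; field 2 is the bitwise FLAG, and bit 0x4
# set means the read is unmapped.
sam_summary() {
    awk -F'\t' '
        /^@/ { next }                                # skip header lines
        int($2 / 4) % 2 == 1 { unaligned++; next }   # FLAG bit 0x4 is set
        { aligned++ }
        END { printf "aligned=%d unaligned=%d\n", aligned, unaligned }
    ' "$1"
}

# Usage: summarise the alignment output if it exists.
if [ -f bowtie2_alignments.sam ]; then
    sam_summary bowtie2_alignments.sam
fi
```

Bowtie2 also prints its own alignment summary to standard error, so a count like this is mainly useful as a cross-check or when only the SAM file is at hand.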