Running BWA Commands
BWA Index¶
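The first step in using BWA is to build an index of the reference genome, which must be in fasta format. The -a option selects the index construction algorithm and the -p option sets the prefix of the generated index files. The basic usage of bwa index is:
$ bwa index [-a bwtsw|is] -p index_prefix input_reference.fasta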
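Once bwa index has finished, the index consists of several files that share the chosen prefix. The following is a minimal sketch for checking that they are all present before submitting an alignment job. The function name bwa_index_complete is hypothetical, and the list of extensions (.amb, .ann, .bwt, .pac, .sa) assumes the file set written by bwa 0.7:

```shell
# Hypothetical helper: succeed only if every index file for a prefix exists
# and is non-empty. Assumes the five-file layout written by `bwa index`
# in bwa 0.7 (.amb, .ann, .bwt, .pac, .sa).
bwa_index_complete() {
    prefix="$1"
    for ext in amb ann bwt pac sa; do
        if [ ! -s "${prefix}.${ext}" ]; then
            echo "missing or empty: ${prefix}.${ext}" >&2
            return 1
        fi
    done
    return 0
}
```

A possible use is to rebuild the index only when the check fails:
$ bwa_index_complete index_prefix || bwa index -p index_prefix input_reference.fasta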
BWA Mem¶
The bwa mem algorithm is one of the three algorithms provided by BWA. It performs local alignment and can produce alignments for different parts of the query sequence. The basic usage of bwa mem is:
$ bwa mem [options] index_prefix [input_reads.fastq|input_reads_pair_1.fastq input_reads_pair_2.fastq]
A simple SLURM script for running bwa mem on Swan with paired-end fastq input data, index_prefix as the reference genome index, a SAM output file, and 8 CPUs is shown below:
#!/bin/bash
#SBATCH --job-name=Bwa_Mem
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=8
#SBATCH --time=168:00:00
#SBATCH --mem=10gb
#SBATCH --output=BwaMem.%J.out
#SBATCH --error=BwaMem.%J.err
module load bwa/0.7
bwa mem -t $SLURM_NTASKS_PER_NODE index_prefix input_reads_pair_1.fastq input_reads_pair_2.fastq > bwa_mem_alignments.sam
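The script above passes $SLURM_NTASKS_PER_NODE to -t. SLURM sets that variable only inside a job that requested --ntasks-per-node. The following is a small sketch of a fallback so that the same command line can also be tried interactively. The function name bwa_threads and the default of 1 thread are assumptions made for illustration:

```shell
# Hypothetical helper: print the thread count to pass to `bwa mem -t`.
# Uses SLURM_NTASKS_PER_NODE when SLURM has set it, otherwise falls back to 1.
bwa_threads() {
    echo "${SLURM_NTASKS_PER_NODE:-1}"
}

# Print the command that would be run, without executing it.
echo "bwa mem -t $(bwa_threads) index_prefix input_reads_pair_1.fastq input_reads_pair_2.fastq"
```

Inside the job script, -t "$(bwa_threads)" could then replace -t $SLURM_NTASKS_PER_NODE.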
BWA Bwasw¶
The bwa bwasw algorithm is another algorithm provided by BWA. For input files with single-end reads, it aligns the query sequences. For input files with paired-end reads, it performs paired-end alignment, which only works for Illumina reads.
An example of bwa bwasw for the single-end input file input_reads.fasta in fasta format, with the alignments stored in the output file bwa_bwasw_alignments.sam, is shown below:
$ bwa bwasw -t $SLURM_NTASKS_PER_NODE index_prefix input_reads.fasta > bwa_bwasw_alignments.sam
BWA Aln¶
The third BWA algorithm, bwa aln, aligns the input file of sequence data to the reference genome and writes the result as a .sai file. An example of running bwa aln with the single-end input file input_reads.fasta and 8 CPUs is shown below:
$ bwa aln -0 -t $SLURM_NTASKS_PER_NODE index_prefix input_reads.fasta > bwa_aln_alignments.sai
BWA Samse and BWA Sampe¶
The command bwa samse uses the bwa_aln_alignments.sai output from bwa aln to generate a SAM file from the alignments for single-end reads.
$ bwa samse -f bwa_aln_alignments.sam index_prefix bwa_aln_alignments.sai input_reads.fasta
The command bwa sampe uses the two .sai outputs from bwa aln, one per read file, to generate a SAM file from the alignments for paired-end reads.
$ bwa sampe -f bwa_aln_alignments.sam index_prefix bwa_aln_alignments_pair_1.sai bwa_aln_alignments_pair_2.sai input_reads_pair_1.fasta input_reads_pair_2.fasta
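For paired-end data, bwa aln is run once per read file and the two .sai files are then combined with bwa sampe. A sketch of that sequence follows. The run wrapper and the DRY_RUN variable are hypothetical additions that print each command instead of executing it, so the steps can be reviewed before they are placed in a job script. The file names follow the examples in this section:

```shell
# Hypothetical dry-run wrapper: with DRY_RUN=1 (the default here) the command
# is only printed; with DRY_RUN=0 it is executed.
DRY_RUN="${DRY_RUN:-1}"
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Fall back to one thread when not running inside a SLURM job.
threads="${SLURM_NTASKS_PER_NODE:-1}"

# Step 1: align each mate file separately; -f names the .sai output file.
run bwa aln -t "$threads" -f bwa_aln_alignments_pair_1.sai index_prefix input_reads_pair_1.fasta
run bwa aln -t "$threads" -f bwa_aln_alignments_pair_2.sai index_prefix input_reads_pair_2.fasta

# Step 2: combine both .sai files and both read files into one SAM file.
run bwa sampe -f bwa_aln_alignments.sam index_prefix \
    bwa_aln_alignments_pair_1.sai bwa_aln_alignments_pair_2.sai \
    input_reads_pair_1.fasta input_reads_pair_2.fasta
```

Using -f for the bwa aln output, rather than shell redirection, keeps every step a plain argument list that the wrapper can either print or execute.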
BWA Fastmap¶
The command bwa fastmap identifies and outputs super-maximal exact matches (SMEMs). The basic usage of bwa fastmap is:
$ bwa fastmap index_prefix input_reads.fasta > bwa_fastmap.matches
BWA Pemerge¶
The command bwa pemerge merges overlapping paired ends and can print either only the merged reads or only the unmerged ones. An example of bwa pemerge with input_reads_pair_1.fastq and input_reads_pair_2.fastq, 8 CPUs, and the output file output_reads_merged.fastq that contains only the merged reads is shown below:
$ bwa pemerge -m -t $SLURM_NTASKS_PER_NODE input_reads_pair_1.fastq input_reads_pair_2.fastq > output_reads_merged.fastq
BWA Fa2pac¶
The command bwa fa2pac converts fasta to pac files. The general usage of bwa fa2pac is:
$ bwa fa2pac input_reads.fasta pac_prefix
BWA Pac2bwt and BWA Pac2bwtgen¶
The commands bwa pac2bwt and bwa pac2bwtgen convert pac to bwt files.
$ bwa pac2bwt input_reads.pac output_reads.bwt
$ bwa pac2bwtgen input_reads.pac output_reads.bwt
BWA Bwtupdate¶
The command bwa bwtupdate updates bwt files to the new format. The general usage of bwa bwtupdate is:
$ bwa bwtupdate input_reads.bwt
BWA Bwt2sa¶
The command bwa bwt2sa generates sa files from bwt and Occ files. The basic usage of bwa bwt2sa is:
$ bwa bwt2sa input_reads.bwt output_reads.sa
Useful Information¶
To test the scalability of BWA (bwa/0.7) on Swan, we used two paired-end input fastq files, large_1.fastq and large_2.fastq, and one single-end input fasta file, large.fasta. Some statistics about the input files and the time and memory resources used by bwa mem are shown in the table below:
input file | total # of sequences | total size in MB | used time for 4 CPUs | used memory for 4 CPUs | used time for 8 CPUs | used memory for 8 CPUs | used time for 16 CPUs | used memory for 16 CPUs |
---|---|---|---|---|---|---|---|---|
large_1.fastq | 10,174,715 | 3,376 | ~ 35 minutes | ~ 12 GB | ~ 18.5 minutes | ~ 18 GB | ~ 10 minutes | ~ 19 GB |
large_2.fastq | 10,174,715 | 3,376 | | | | | | |
large.fasta | 592,593 | 836 | ~ 5.5 minutes | ~ 3 GB | ~ 3 minutes | ~ 4 GB | ~ 2 minutes | ~ 6.2 GB |
The time and memory figures for the paired-end files are listed once, in the large_1.fastq row, since both files are inputs to the same bwa mem run.
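The timings in the table can be turned into speedup and parallel-efficiency figures relative to the 4-CPU run. The following is a short sketch using awk. The function name scaling is hypothetical, and the input values are the approximate paired-end timings from the table:

```shell
# Hypothetical helper: read "cpus minutes" pairs from standard input and
# report speedup and parallel efficiency relative to the first line
# (the baseline run).
scaling() {
    awk 'NR == 1 { base_cpus = $1; base_time = $2 }
         {
             speedup = base_time / $2
             efficiency = speedup / ($1 / base_cpus)
             printf "%d CPUs: speedup %.2f, efficiency %.3f\n", $1, speedup, efficiency
         }'
}

# Approximate bwa mem wall times (minutes) for the paired-end fastq files.
scaling <<'EOF'
4 35
8 18.5
16 10
EOF
```

With these approximate numbers, going from 4 to 16 CPUs gives a speedup of about 3.5, or roughly 87.5% of ideal scaling, while memory use rises from about 12 GB to about 19 GB.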