Running SAMtools Commands

SAMtools View

One of the most frequently used SAMtools commands is view. The basic usage of samtools view is:

$ samtools view input_alignments.[bam|sam] [options] -o output_alignments.[sam|bam]
where input_alignments.[bam|sam] is the input file with the alignments in BAM/SAM format, and output_alignments.[sam|bam] is the output file, converted to SAM or BAM format respectively.

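Running samtools view on Tusker with 8 CPUs, input file input_alignments.sam, output in BAM format (-b), and output file output_alignments.bam is shown below. The -S option historically marked the input as SAM; current SAMtools versions detect the input format automatically and accept -S only for compatibility: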

#!/bin/bash
#SBATCH --job-name=SAMtools_View
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=8
#SBATCH --time=168:00:00
#SBATCH --mem=10gb
#SBATCH --output=SAMtools.%J.out
#SBATCH --error=SAMtools.%J.err

module load samtools/1.9

samtools view -bS -@ $SLURM_NTASKS_PER_NODE input_alignments.sam -o output_alignments.bam

The most intensive SAMtools commands (samtools view, samtools sort) are multi-threaded, so using the -@ option, which sets the number of additional threads, is recommended.
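As a minimal sketch of how the thread count could be chosen, the snippet below reads SLURM_NTASKS_PER_NODE and falls back to 1 when the variable is not set, for example when testing outside a SLURM job. The helper name samtools_threads is hypothetical and not part of SAMtools.

```shell
# Hypothetical helper: choose a value for samtools -@.
# SLURM_NTASKS_PER_NODE is only set inside a SLURM job that requested
# --ntasks-per-node, so fall back to 1 everywhere else.
samtools_threads() {
    echo "${SLURM_NTASKS_PER_NODE:-1}"
}

# Print the resulting command; drop the leading "echo" to actually run it.
echo samtools view -b -@ "$(samtools_threads)" input_alignments.sam -o output_alignments.bam
```

Printing the command first makes it easy to check the thread count before submitting a long job.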

SAMtools Sort

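Sorting BAM files is recommended for further analysis of these files. By default, samtools sort orders the alignments by their position in the reference (coordinate order); with the -n option it orders them by read name instead. In SAMtools 1.3 and later the output file is given with -o, since the older form with an output prefix is no longer accepted. An example of using 4 CPUs to sort the input file input_alignments.bam by read name follows:

$ samtools sort -n -@ 4 -o output_alignments_sorted.bam input_alignments.bam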

SAMtools Index

The samtools index command creates an index file that allows fast look-up of the data in a coordinate-sorted BAM file. A file sorted by read name (samtools sort -n) cannot be indexed.

$ samtools index input_alignments_sorted.bam output_index.bai
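Because indexing needs coordinate order, sorting and indexing are often run back to back. The following is only a sketch under that assumption: the variable names are illustrative, and the commands are printed rather than executed when samtools or the input file is not available.

```shell
in_bam=input_alignments.bam
# Strip the .bam suffix and append _sorted.bam -> input_alignments_sorted.bam
sorted_bam="${in_bam%.bam}_sorted.bam"

if command -v samtools >/dev/null 2>&1 && [ -f "$in_bam" ]; then
    # Coordinate sort (no -n), then index. With no index name given,
    # the index is written next to the BAM file as ${sorted_bam}.bai.
    samtools sort -@ 4 -o "$sorted_bam" "$in_bam" && samtools index "$sorted_bam"
else
    echo "would run: samtools sort -@ 4 -o $sorted_bam $in_bam"
    echo "would run: samtools index $sorted_bam"
fi
```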

SAMtools Idxstats

The samtools idxstats command prints summary statistics from the index of a BAM file. The output is TAB-delimited, with each line consisting of the reference sequence name, sequence length, number of mapped reads, and number of unmapped reads.

$ samtools idxstats input_alignments_sorted.bam
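Since the output is TAB-delimited, it can be summarized with standard tools. The sketch below totals the mapped and unmapped columns with awk. The sample lines are made up for illustration (the final line starting with * stands for reads that are not placed on any reference sequence), and the function name summarize_idxstats is hypothetical.

```shell
# Made-up idxstats-style output: name, length, mapped, unmapped (TAB-delimited).
idxstats_example=$(printf 'chr1\t1000\t120\t3\nchr2\t800\t80\t1\n*\t0\t0\t25\n')

# Sum the third (mapped) and fourth (unmapped) columns over all lines.
summarize_idxstats() {
    awk -F'\t' '{ mapped += $3; unmapped += $4 }
                END { printf "mapped=%d unmapped=%d\n", mapped, unmapped }'
}

printf '%s\n' "$idxstats_example" | summarize_idxstats
# prints: mapped=200 unmapped=29
```

With real data, the same function could be fed directly: samtools idxstats input_alignments_sorted.bam | summarize_idxstats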

SAMtools Merge

The samtools merge command merges multiple sorted alignment files into a single sorted output file.

$ samtools merge output_alignments_merge.bam input_alignments_sorted_1.bam input_alignments_sorted_2.bam

SAMtools Faidx

The samtools faidx command indexes a reference sequence in FASTA format, or extracts a subsequence from an indexed reference sequence.

$ samtools faidx input_reference.fasta
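To extract a subsequence, a region is given after the file name in the form name:start-end, with 1-based, inclusive coordinates. The sketch below builds such a region string. The sequence name chr1 and the helper faidx_region are assumptions for illustration, and the command is only printed when samtools or the reference file is not available.

```shell
# Hypothetical helper: build a region string "name:start-end".
faidx_region() {
    printf '%s:%d-%d\n' "$1" "$2" "$3"
}

# chr1 is an assumed sequence name; use a name present in your FASTA file.
region=$(faidx_region chr1 101 200)

if command -v samtools >/dev/null 2>&1 && [ -f input_reference.fasta ]; then
    samtools faidx input_reference.fasta "$region"
else
    echo "would run: samtools faidx input_reference.fasta $region"
fi
```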

SAMtools Mpileup

The samtools mpileup command generates a file in pileup or BCF format for one or multiple BAM files. For each genomic coordinate, the overlapping read bases and indels at that position in the input BAM file are printed. Without further options the output is text pileup, as in the example below; BCF output has to be requested explicitly.

$ samtools mpileup input_alignments_sorted.bam > output_alignments.pileup
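As a rough illustration of working with text pileup output, the sketch below averages the fourth TAB-separated column, which holds the number of reads covering each reported position. The two sample lines are made up, the function name mean_depth is hypothetical, and the average only covers positions that appear in the output.

```shell
# Made-up pileup-style lines: name, position, reference base, depth, bases, qualities.
pileup_example=$(printf 'chr1\t100\tN\t3\tACA\tIII\nchr1\t101\tN\t5\tGGGGG\tIIIII\n')

# Average the depth column (column 4) over all reported positions.
mean_depth() {
    awk -F'\t' '{ sum += $4; n++ } END { if (n > 0) printf "%.1f\n", sum / n }'
}

printf '%s\n' "$pileup_example" | mean_depth
# prints: 4.0
```

With real data, the function could read from a pipe instead: samtools mpileup input_alignments_sorted.bam | mean_depth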

SAMtools Tview

The samtools tview command starts an interactive text alignment viewer that can be used to visualize how reads are aligned to specific regions of the reference genome.

$ samtools tview input_alignments_sorted.bam