One of the most frequently used SAMtools commands is view. The basic usage of samtools view is:
$ samtools view input_alignments.[bam|sam] [options] -o output_alignments.[sam|bam]
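The example below runs samtools view on Swan with 8 CPUs, reading the input file input_alignments.sam, which includes a header (-S), and writing output in BAM format (-b) to the file output_alignments.bam. Since SAMtools 1.0 the input format is detected automatically, so -S is accepted only for backward compatibility.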
samtools_view.submit
#!/bin/bash
#SBATCH --job-name=SAMtools_View
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=8
#SBATCH --time=168:00:00
#SBATCH --mem=10gb
#SBATCH --output=SAMtools.%J.out
#SBATCH --error=SAMtools.%J.err
module load samtools/1.9
samtools view -bS -@ $SLURM_NTASKS_PER_NODE input_alignments.sam -o output_alignments.bam
The most intensive SAMtools commands (samtools view, samtools sort) are multi-threaded, so using the SAMtools option -@ to set the number of threads is recommended.
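As a rough sketch of how the thread count might be passed to -@, the snippet below reads SLURM_NTASKS_PER_NODE (the variable used in the submit script above) and falls back to a single thread when the variable is not set, for example outside a SLURM job. The helper name samtools_threads is hypothetical and is not part of SAMtools.

```shell
# Hypothetical helper: pick a thread count for the -@ option.
# Uses SLURM_NTASKS_PER_NODE when the job sets it, otherwise 1.
samtools_threads() {
    echo "${SLURM_NTASKS_PER_NODE:-1}"
}

# Print the command that would be run (remove "echo" to execute it).
echo samtools view -b -@ "$(samtools_threads)" input_alignments.sam -o output_alignments.bam
```

Keeping the fallback in one place means the same command line can be reused both in a submit script and in an interactive session.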
Sorting BAM files is recommended for further analysis of these files. By default, samtools sort orders the alignments by their position in the reference, as determined by the alignment; the -n option sorts by read name instead. An example of using 4 CPUs to sort the input file input_alignments.bam by read name follows:
$ samtools sort -n -@ 4 input_alignments.bam -o output_alignments_sorted.bam
The samtools index command creates an index file that allows fast look-up of the data in a coordinate-sorted BAM file.
$ samtools index input_alignments_sorted.bam output_index.bai
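Indexing expects coordinate-sorted input, so a file sorted by read name (-n) is not suitable for samtools index. The following is a minimal sketch, under that assumption, of sorting by position and then indexing the result. The function name sort_and_index and the SAMTOOLS override variable are hypothetical and exist only so the example can be tried as a dry run without SAMtools installed.

```shell
# SAMTOOLS is a hypothetical override; set SAMTOOLS=echo for a dry run.
sort_and_index() {
    in_bam="$1"
    out_bam="$2"
    threads="${3:-1}"
    # Default sort order is by reference position (no -n).
    ${SAMTOOLS:-samtools} sort -@ "$threads" "$in_bam" -o "$out_bam" &&
        ${SAMTOOLS:-samtools} index "$out_bam"   # writes "$out_bam.bai"
}

# Dry run: prints the arguments of the two commands instead of running samtools.
SAMTOOLS=echo sort_and_index input_alignments.bam input_alignments_sorted.bam 4
```

Without the SAMTOOLS=echo prefix the function calls samtools itself; the && ensures the index step runs only if sorting succeeded.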
The samtools idxstats command prints statistics from the BAM index file. The output is tab-delimited, with each line consisting of the reference sequence name, sequence length, number of mapped reads, and number of unmapped reads.
$ samtools idxstats input_alignments_sorted.bam
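Because the idxstats output is tab-delimited, it can be summarized with standard tools. The sketch below totals the mapped and unmapped read counts with awk. The sample lines are made up for illustration and only follow the four-column layout described above, including the final "*" line that idxstats prints for reads with no reference assigned.

```shell
# Made-up sample in idxstats layout: name, length, mapped, unmapped.
# The final "*" line holds reads with no reference assigned.
idxstats_sample="$(printf 'chr1\t1000\t150\t2\nchr2\t500\t50\t1\n*\t0\t0\t7\n')"

# Sum column 3 (mapped) and column 4 (unmapped) across all lines.
idxstats_totals="$(printf '%s\n' "$idxstats_sample" |
    awk -F '\t' '{ mapped += $3; unmapped += $4 }
                 END { printf "mapped=%d unmapped=%d\n", mapped, unmapped }')"
echo "$idxstats_totals"
```

For the sample above this prints mapped=200 unmapped=10. With a real file, the same awk program could be fed from samtools idxstats input_alignments_sorted.bam through a pipe.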
The samtools merge command merges multiple sorted alignments into one output file.
$ samtools merge output_alignments_merge.bam input_alignments_sorted_1.bam input_alignments_sorted_2.bam
The samtools faidx command indexes a reference sequence in FASTA format or extracts a subsequence from an indexed reference sequence.
$ samtools faidx input_reference.fasta
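The index is written next to the reference as input_reference.fasta.fai, a tab-delimited file with one line per sequence. Once it exists, a region can be extracted with a call such as samtools faidx input_reference.fasta chr1:1-100, assuming the reference contains a sequence named chr1. As a small sketch of reusing the index itself, the snippet below pulls sequence names and lengths out of .fai-style lines; the sample values are made up for illustration.

```shell
# Made-up sample in .fai layout:
# name, length, byte offset, bases per line, bytes per line.
fai_sample="$(printf 'chr1\t1000\t6\t60\t61\nchr2\t500\t1029\t60\t61\n')"

# Keep only the sequence name and length (columns 1 and 2).
chrom_sizes="$(printf '%s\n' "$fai_sample" | cut -f 1,2)"
echo "$chrom_sizes"
```

With a real index, cut -f 1,2 input_reference.fasta.fai would produce the same two-column listing.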
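The samtools mpileup command generates a file in BCF or pileup format for one or multiple BAM files. For each genomic coordinate, the overlapping read bases and indels at that position in the input BAM file are printed. The default output is text pileup; in samtools 1.9, BCF output is requested with the -g option, and the reference is supplied with -f:
$ samtools mpileup -g -f input_reference.fasta input_alignments_sorted.bam -o output_alignments.bcf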
The samtools tview command starts an interactive text alignment viewer that can be used to visualize how reads are aligned to specific regions of the reference genome.
$ samtools tview input_alignments_sorted.bam