CAP3

CAP3 (Contig Assembly Program) is a DNA sequence assembly program for small-scale assembly with or without quality values.

The basic usage of CAP3 is:

$ cap3 input_reads.fasta [options] > output.txt

where input_reads.fasta is an input file of sequence reads in FASTA format, and options are optional parameters. The full list of options is printed when CAP3 is run without arguments:

$ cap3
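As a minimal sketch of passing options, the script below sets two cutoffs that appear in the CAP3 help text: -o (overlap length cutoff) and -p (overlap percent identity cutoff). The values 40 and 90 are illustrative assumptions, not recommendations, and the variable names are hypothetical. Check the help output of the installed version before relying on either flag.

```bash
#!/bin/bash
# Illustrative values only (assumptions); adjust them for your data.
min_overlap=40     # -o: overlap length cutoff, in bases
min_identity=90    # -p: overlap percent identity cutoff

# Run only when cap3 is available and the input file exists.
if command -v cap3 >/dev/null 2>&1 && [ -f input_reads.fasta ]; then
    cap3 input_reads.fasta -o "$min_overlap" -p "$min_identity" > output.txt
else
    echo "cap3 not found or input_reads.fasta missing; nothing was run" >&2
fi
```

The guard keeps the script from failing when the module has not been loaded yet.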

An example of a basic SLURM script for running CAP3 on Swan is shown below:

#!/bin/bash
#SBATCH --job-name=CAP3
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=1
#SBATCH --time=168:00:00
#SBATCH --mem=10gb
#SBATCH --output=CAP3.%J.out
#SBATCH --error=CAP3.%J.err

module load cap3/122107

cap3 input_reads.fasta > output.txt

CAP3 is a single-threaded program, and therefore both #SBATCH --nodes and #SBATCH --ntasks-per-node are set to 1.

CAP3 Output

CAP3 produces several output files: input_reads.fasta.cap.singlets, input_reads.fasta.cap.contigs, input_reads.fasta.cap.contigs.links, input_reads.fasta.cap.qual, input_reads.fasta.cap.ace, and input_reads.fasta.cap.info.

The consensus sequences are saved in FASTA format in the file input_reads.fasta.cap.contigs, while the reads that were not used in the assembly are stored in the FASTA file input_reads.fasta.cap.singlets.
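A quick way to summarize an assembly is to count the FASTA records in the contigs and singlets files. The sketch below assumes the input file was named input_reads.fasta, as in the example above. The helper function count_fasta_records is a hypothetical name introduced here for illustration.

```bash
#!/bin/bash
# Count FASTA records (header lines starting with ">") in a file.
# Prints 0 when the file is missing or empty.
count_fasta_records() {
    local file="$1"
    if [ -s "$file" ]; then
        grep -c '^>' "$file"
    else
        echo 0
    fi
}

prefix="input_reads.fasta"   # assumed input file name
echo "contigs:  $(count_fasta_records "${prefix}.cap.contigs")"
echo "singlets: $(count_fasta_records "${prefix}.cap.singlets")"
```

Run from the directory that holds the CAP3 output, this prints one count per file.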

Useful Information

To test the performance of CAP3 (cap3/122107) on Swan, we created three separate nucleotide datasets: small.fasta, medium.fasta and large.fasta. Statistics for the input datasets, along with the time and memory used by CAP3 on Swan, are shown in the table below:

dataset	total # of sequences	total # of bases	total size in MB	used time	used memory	# of used CPUs
small.fasta	41,715	35,581,740	37.627	~ 1.6 hours	~ 1.5 GB	1
medium.fasta	110,478	147,543,113	149	~ 2 hours	~ 5 GB	1
large.fasta	592,593	827,629,204	836	~ 12 hours	~ 28 GB	1
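To see where your own dataset falls relative to the table above, you can compute the same input statistics before choosing values for --time and --mem. The following is a minimal sketch: fasta_stats is a hypothetical helper, and the file size is reported in bytes because the table does not say how it defines a megabyte.

```bash
#!/bin/bash
# Print the number of sequences and the total number of bases in a FASTA file.
fasta_stats() {
    awk '
        /^>/ { nseq++; next }                           # header line: one more sequence
        { gsub(/[ \t\r]/, ""); nbases += length($0) }   # sequence line: count its bases
        END { printf "sequences=%d bases=%d\n", nseq, nbases }
    ' "$1"
}

input="input_reads.fasta"   # assumed input file name
if [ -f "$input" ]; then
    fasta_stats "$input"
    echo "size in bytes: $(wc -c < "$input")"
fi
```

Memory and run time depend on the data as well as its size, so treat the table as a rough guide when sizing a job rather than a guarantee.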