CAP3

CAP3 (Contig Assembly Program) is a DNA sequence assembly program for small-scale assembly with or without quality values.

The basic usage of CAP3 is:

$ cap3 input_reads.fasta [options] > output.txt
where input_reads.fasta is the input file of sequence reads in FASTA format and [options] are optional parameters. Running cap3 without any arguments prints the list of available options:
$ cap3
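For repeated runs it can help to wrap this command in a small shell function that checks the input file before calling cap3 and saves the standard output next to the input. The sketch below is a hypothetical helper (run_cap3 is not part of CAP3). It assumes cap3 is already on the PATH, for example after module load cap3/122107, and it passes any extra arguments to cap3 unchanged after the input file name.

```shell
# run_cap3: hypothetical convenience wrapper, not part of the CAP3 distribution.
# Usage: run_cap3 <reads.fasta> [cap3 options...]
run_cap3() {
    if [ $# -lt 1 ]; then
        echo "usage: run_cap3 <reads.fasta> [cap3 options...]" >&2
        return 2
    fi
    local input="$1"
    shift
    # Refuse to start on a missing or empty input file.
    if [ ! -s "$input" ]; then
        echo "run_cap3: '$input' is missing or empty" >&2
        return 1
    fi
    # The input file goes first; remaining arguments are passed through as
    # options. CAP3's standard output is saved to <input>.cap3.out, and the
    # function returns cap3's own exit status.
    cap3 "$input" "$@" > "${input}.cap3.out"
}
```

For example, run_cap3 input_reads.fasta writes the standard output to input_reads.fasta.cap3.out; CAP3 still writes its own result files as usual.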

An example SLURM script for running a basic CAP3 job on Swan is shown below:

cap3.submit
#!/bin/bash
#SBATCH --job-name=CAP3
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=1
#SBATCH --time=168:00:00
#SBATCH --mem=10gb
#SBATCH --output=CAP3.%J.out
#SBATCH --error=CAP3.%J.err

module load cap3/122107

cap3 input_reads.fasta > output.txt

CAP3 is a single-threaded program, so both #SBATCH --nodes and #SBATCH --ntasks-per-node are set to 1.
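When the same job has to be run on several inputs, the script above can be generated per dataset instead of edited by hand. The function below is a hypothetical sketch (write_cap3_submit is not an HCC or SLURM tool) that prints a submit script with the same directives as cap3.submit, taking the input file name, the memory request, and the time limit as arguments. The defaults repeat the values from cap3.submit, and the input file name is assumed to contain no quotes or dollar signs.

```shell
# write_cap3_submit: hypothetical helper that prints a SLURM script equivalent
# to cap3.submit for a given input file.
# Usage: write_cap3_submit <reads.fasta> [mem] [time]
write_cap3_submit() {
    if [ $# -lt 1 ]; then
        echo "usage: write_cap3_submit <reads.fasta> [mem] [time]" >&2
        return 2
    fi
    local input="$1"
    local mem="${2:-10gb}"
    local walltime="${3:-168:00:00}"
    # Unquoted EOF: $input, $mem and $walltime are expanded now, while %J is
    # left in place for SLURM to replace with the job ID.
    cat <<EOF
#!/bin/bash
#SBATCH --job-name=CAP3
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=1
#SBATCH --time=${walltime}
#SBATCH --mem=${mem}
#SBATCH --output=CAP3.%J.out
#SBATCH --error=CAP3.%J.err

module load cap3/122107

cap3 "${input}" > "${input}.cap3.out"
EOF
}
```

For example, write_cap3_submit large.fasta 40gb > cap3_large.submit followed by sbatch cap3_large.submit would request 40 GB, leaving some headroom above the ~28 GB that large.fasta used in the tests in the table below.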

CAP3 Output

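CAP3 produces several output files, each named after the input file: input_reads.fasta.cap.contigs, input_reads.fasta.cap.contigs.links, input_reads.fasta.cap.contigs.qual, input_reads.fasta.cap.ace, input_reads.fasta.cap.info, and input_reads.fasta.cap.singlets. The report that CAP3 prints to standard output is redirected to output.txt in the examples above.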

The consensus sequences of the assembled contigs are saved in FASTA format in input_reads.fasta.cap.contigs, while the reads that were not assembled into any contig (singlets) are stored in FASTA format in input_reads.fasta.cap.singlets.
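A quick way to check the result of a run is to count the FASTA records in the contig and singlet files. The sketch below is a hypothetical helper (summarize_cap3 is not part of CAP3). It assumes the output files follow the input_reads.fasta.cap.* naming described above and sit at the same path prefix as the input file.

```shell
# summarize_cap3: hypothetical helper that counts FASTA records in the CAP3
# contig and singlet files produced for a given input file.
# Usage: summarize_cap3 <reads.fasta>
summarize_cap3() {
    local input="$1" f n
    for f in "${input}.cap.contigs" "${input}.cap.singlets"; do
        if [ -f "$f" ]; then
            # Each FASTA record starts with a '>' header line. grep -c exits
            # with status 1 when the count is 0, so "|| true" keeps that case
            # from being treated as an error.
            n=$(grep -c '^>' "$f" || true)
            echo "$(basename "$f"): $n sequences"
        else
            echo "$(basename "$f"): missing"
        fi
    done
}
```

For example, summarize_cap3 input_reads.fasta prints one line for the contigs file and one for the singlets file.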

Useful Information

To test the performance of CAP3 (cap3/122107) on Swan, we created three separate nucleotide datasets: small.fasta, medium.fasta, and large.fasta. Statistics for each dataset, along with the time and memory CAP3 used on Swan, are shown in the table below:

| Dataset | Total # of sequences | Total # of bases | Total size (MB) | Time used | Memory used | # of CPUs used |
|---|---|---|---|---|---|---|
| small.fasta | 41,715 | 35,581,740 | 37.627 | ~1.6 hours | ~1.5 GB | 1 |
| medium.fasta | 110,478 | 147,543,113 | 149 | ~2 hours | ~5 GB | 1 |
| large.fasta | 592,593 | 827,629,204 | 836 | ~12 hours | ~28 GB | 1 |
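Before submitting a job, similar statistics for your own input can help in comparing it against the datasets in the table above. The function below is a hypothetical sketch (fasta_stats is not part of CAP3) that reports the number of sequences, the number of bases, and the file size. The table does not say how MB is defined, so the sketch assumes 1 MB = 10^6 bytes; whitespace and carriage returns in sequence lines are not counted as bases.

```shell
# fasta_stats: hypothetical helper that prints, tab-separated:
#   file name, number of sequences, number of bases, size in MB
# where MB is assumed to mean 10^6 bytes.
# Usage: fasta_stats <reads.fasta>
fasta_stats() {
    local f="$1" bytes
    bytes=$(wc -c < "$f") || return 1
    awk -v bytes="$bytes" -v name="$(basename "$f")" '
        /^>/ { n++; next }                          # header line: one more sequence
        { gsub(/[ \t\r]/, ""); b += length($0) }    # sequence line: count bases only
        END { printf "%s\t%d\t%d\t%.3f\n", name, n, b, bytes / 1000000 }
    ' "$f"
}
```

For example, for f in small.fasta medium.fasta large.fasta; do fasta_stats "$f"; done prints one row per dataset.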