CAP3
CAP3 (Contig Assembly Program) is a DNA sequence assembly program for small-scale assembly with or without quality values.
The basic usage of CAP3 is:
$ cap3 input_reads.fasta [options] > output.txt
Here input_reads.fasta is an input file of sequence reads in FASTA format, and [options] stands for optional parameters. To list the available options, run cap3 with no arguments:
$ cap3
An example SLURM script for running a basic CAP3 job on Swan is shown below:
#!/bin/bash
#SBATCH --job-name=CAP3
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=1
#SBATCH --time=168:00:00
#SBATCH --mem=10gb
#SBATCH --output=CAP3.%J.out
#SBATCH --error=CAP3.%J.err
module load cap3/122107
cap3 input_reads.fasta > output.txt
CAP3 is a single-threaded program, so both #SBATCH --nodes and #SBATCH --ntasks-per-node are set to 1.
CAP3 Output
CAP3 writes several output files, each named after the input file: input_reads.fasta.cap.singlets, input_reads.fasta.cap.contigs, input_reads.fasta.cap.contigs.links, input_reads.fasta.cap.qual, input_reads.fasta.cap.ace, and input_reads.fasta.cap.info.
The consensus sequences are saved in FASTA format in input_reads.fasta.cap.contigs, while the reads that were not used in the assembly are stored in the FASTA file input_reads.fasta.cap.singlets.
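Because both of these files are FASTA, a quick sanity check after a run is to count their header lines. The following is a minimal sketch, not part of CAP3 itself: the helper name count_fasta_records is hypothetical, and the file names assume the input was called input_reads.fasta as in the example above.

```shell
#!/bin/bash
# Hypothetical helper: count FASTA records (lines starting with '>') in a file.
count_fasta_records() {
    # grep -c prints 0 but exits non-zero when nothing matches;
    # '|| true' keeps that case from being treated as a failure.
    grep -c '^>' "$1" || true
}

# Output prefix assumed from the example input name.
prefix="input_reads.fasta.cap"

for kind in contigs singlets; do
    file="${prefix}.${kind}"
    if [ -f "$file" ]; then
        echo "${kind}: $(count_fasta_records "$file")"
    else
        echo "${kind}: ${file} not found" >&2
    fi
done
```

The number of contigs plus the number of singlets will normally be much smaller than the number of input reads, since each contig is built from many reads.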
Useful Information
To test the performance of CAP3 (cap3/122107) on Swan, we created three separate nucleotide datasets: small.fasta, medium.fasta, and large.fasta. Statistics for the input datasets, along with the time and memory CAP3 used on Swan, are shown in the table below:
| | total # of sequences | total # of bases | total size in MB | used time | used memory | # of used CPUs |
|---|---|---|---|---|---|---|
| small.fasta | 41,715 | 35,581,740 | 37.627 | ~ 1.6 hours | ~ 1.5 GB | 1 |
| medium.fasta | 110,478 | 147,543,113 | 149 | ~ 2 hours | ~ 5 GB | 1 |
| large.fasta | 592,593 | 827,629,204 | 836 | ~ 12 hours | ~ 28 GB | 1 |
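To place your own input relative to the rows of this table before choosing --time and --mem, you can compute the same input statistics. This is a rough sketch under stated assumptions: the helper name fasta_stats is hypothetical, the input is assumed to be a plain nucleotide FASTA file named input_reads.fasta, and 1 MB is taken as 1,000,000 bytes, which may differ slightly from how the table's sizes were measured.

```shell
#!/bin/bash
# Hypothetical helper: print the number of sequences and total bases in a FASTA file.
fasta_stats() {
    awk '
        /^>/ { seqs++; next }                          # header line: one more sequence
        { gsub(/[ \t\r]/, ""); bases += length($0) }   # sequence line: add its length
        END { printf "sequences=%d bases=%d\n", seqs, bases }
    ' "$1"
}

# Assumed input name, matching the example above.
input="input_reads.fasta"

if [ -f "$input" ]; then
    fasta_stats "$input"
    bytes=$(wc -c < "$input")
    # Integer megabytes, with 1 MB taken as 1,000,000 bytes.
    echo "size_MB=$(( bytes / 1000000 ))"
else
    echo "${input} not found" >&2
fi
```

The timings in the table come from three specific datasets, so treat them as a starting point and leave a margin when requesting resources.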