Velvet is a general sequence assembler designed to produce assemblies from both short and long reads. Running Velvet consists of two commands run in sequence, velveth and velvetg: velveth produces a hash table of k-mers, while velvetg constructs the genome assembly. The k-mer length, also known as the hash length, is the length, in base pairs, of the words of the reads being hashed.
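To make the idea of "words of the reads being hashed" concrete, here is a minimal Python sketch that splits reads into overlapping k-mers and counts them. It is only a toy illustration, not Velvet's implementation: the reads are made up, the function names `kmers` and `count_kmers` are hypothetical, and reverse complements and everything else a real assembler handles are ignored.

```python
from collections import Counter


def kmers(read, k):
    """Yield every overlapping word of length k in a read."""
    for i in range(len(read) - k + 1):
        yield read[i:i + k]


def count_kmers(reads, k):
    """Toy k-mer table: map each k-mer to the number of times it occurs."""
    table = Counter()
    for read in reads:
        table.update(kmers(read, k))
    return table


# Made-up reads, for illustration only.
reads = ["ACGTACGT", "CGTACGTA"]
table = count_kmers(reads, k=5)

for word, count in sorted(table.items()):
    print(word, count)
```

A read of length L contributes L - k + 1 k-mers, so a read shorter than k contributes none.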
Velvet has many parameters, all of which are described in its manual. Of these, the k-mer value is crucial for obtaining optimal assemblies: higher k-mer values increase the specificity, and lower k-mer values increase the sensitivity.
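Because the best k-mer value depends on the data, a common approach is to try several values and compare the resulting assemblies. The Python sketch below only builds the velveth and velvetg argument lists for a range of odd k-mer lengths (Velvet works with odd values) and does not run anything. The file names, the output directory prefix, and the helper `velvet_commands` are hypothetical, and the flags shown (`-fastq`, `-shortPaired`, `-separate`, `-exp_cov auto`, `-cov_cutoff auto`) should be checked against the manual of the installed Velvet version.

```python
def velvet_commands(out_prefix, k_values, left, right):
    """Return a (velveth, velvetg) pair of argument lists for each k."""
    jobs = []
    for k in k_values:
        out_dir = f"{out_prefix}_k{k}"  # one output directory per k-mer length
        velveth = ["velveth", out_dir, str(k),
                   "-fastq", "-shortPaired", "-separate", left, right]
        velvetg = ["velvetg", out_dir,
                   "-exp_cov", "auto", "-cov_cutoff", "auto"]
        jobs.append((velveth, velvetg))
    return jobs


# Hypothetical input files and k-mer range, for illustration only.
jobs = velvet_commands("velvet_out", range(21, 32, 4),
                       "reads_1.fastq", "reads_2.fastq")

for velveth, velvetg in jobs:
    print(" ".join(velveth))
    print(" ".join(velvetg))
```

Each pair could then be passed to `subprocess.run` or written into a submit script, with velvetg started only after the matching velveth has finished.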
Velvet supports multiple file formats, including gerald. Velvet also supports different read categories for different sequencing technologies and libraries.
Each step of Velvet (velveth and velvetg) may be run as its own job. The following pages describe how to run Velvet in this manner on HCC and provide example submit scripts:
To test the performance of Velvet (velvet/1.2) on Tusker, we used three paired-end input fastq data sets of increasing size, the largest being large_1.fastq and large_2.fastq. Some statistics about the input files, and the time and memory used by Velvet on Tusker, are shown in the table below:
|input file||total # of sequences||total # of bases||total size in MB||velveth used time||velveth used memory||velvetg used time||velvetg used memory||# of CPUs used|
|small_1.fastq||50,121||2,506,050||8.010||~ 0.02 minutes||~ 0.3 GB||~ 0.08 minutes||~ 0.2 GB||8|
|medium_1.fastq||786,742||59,792,392||152||~ 0.4 minutes||~ 1.5 GB||~ 0.8 minutes||~ 0.9 GB||8|
|large_1.fastq||10,174,715||1,027,646,215||3,376||~ 7 minutes||~ 23 GB||~ 45 minutes||~ 51 GB||8|
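The table also lets us recover the read length of each data set, which matters when choosing a k-mer value because k must be shorter than the reads. As a small worked example, the Python snippet below divides the total number of bases by the total number of sequences, using only the figures copied from the table. The result is 50, 76, and 101 bases per read for the small, medium, and large files respectively.

```python
# (total sequences, total bases) for each input file, copied from the table.
inputs = {
    "small_1.fastq": (50_121, 2_506_050),
    "medium_1.fastq": (786_742, 59_792_392),
    "large_1.fastq": (10_174_715, 1_027_646_215),
}

# Mean read length = total bases / total sequences.
mean_read_length = {
    name: bases / seqs for name, (seqs, bases) in inputs.items()
}

for name, length in mean_read_length.items():
    print(f"{name}: {length:.1f} bases per read")
```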