Velvet

Velvet is a general sequence assembler designed to produce assemblies from short as well as long reads. Running Velvet consists of two commands run in sequence: velveth and velvetg. velveth produces a hash table of k-mers, and velvetg then constructs the genome assembly. The k-mer length, also known as the hash length, is the length, in base pairs, of the words of the reads being hashed.
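To make the idea of a hash length concrete, the sketch below splits reads into overlapping words of length k and counts them. It is only an illustration in Python, not how velveth is implemented; the function name count_kmers and the sample reads are made up for this example.

```python
from collections import Counter


def count_kmers(reads, k):
    """Count every overlapping word of length k (k-mer) across the reads."""
    counts = Counter()
    for read in reads:
        # A read of length L contains L - k + 1 overlapping k-mers;
        # reads shorter than k contribute nothing.
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


if __name__ == "__main__":
    reads = ["ACGTACGT", "CGTACGTT"]  # made-up reads, for illustration only
    for kmer, n in sorted(count_kmers(reads, 5).items()):
        print(kmer, n)
```

K-mers that appear in both reads get a count of 2, which is the kind of shared word an assembler can use to connect reads.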

Velvet has many parameters, all of which are described in its manual. Among them, the k-mer value is crucial for obtaining optimal assemblies: higher k-mer values increase specificity, while lower k-mer values increase sensitivity.
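One way to get a feel for this trade-off is to count how many k-mers a single read contributes at different k values. The sketch below is a rough illustration; the read length of 50 and the k values are assumptions chosen for the example, not recommendations.

```python
def kmers_per_read(read_length, k):
    """Number of overlapping k-mers in one read: L - k + 1 (0 if the read is shorter than k)."""
    return max(read_length - k + 1, 0)


if __name__ == "__main__":
    read_length = 50  # assumed read length, for illustration only
    for k in (21, 31, 41):
        print(f"k={k}: {kmers_per_read(read_length, k)} k-mers per read")
```

A larger k leaves fewer, longer words per read, so matches between reads are less likely to occur by chance but are also easier to miss. A smaller k gives more, shorter words, with the opposite effect.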

Velvet supports multiple file formats: fasta, fastq, fasta.gz, fastq.gz, sam, bam, eland, gerald. Velvet also supports different read categories for different sequencing technologies and libraries, e.g. short, shortPaired, short2, shortPaired2, long, longPaired.
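As a rough sketch of how the file format and read category fit into the two commands, the Python helper below builds the argument lists without running anything. It assumes the usage pattern from the Velvet manual (output directory, hash length, then -format and -category flags followed by file names). The directory name, hash length, and input file name are hypothetical, and the single input file is assumed to hold interleaved paired reads. Check the velveth and velvetg help output of the installed version before relying on it.

```python
import shlex


def velveth_command(out_dir, k, file_format, read_category, files):
    """Build the velveth argument list: output directory, hash length, format, category, files."""
    return ["velveth", out_dir, str(k), f"-{file_format}", f"-{read_category}", *files]


def velvetg_command(out_dir):
    """Build the velvetg argument list; velvetg works on the directory written by velveth."""
    return ["velvetg", out_dir]


if __name__ == "__main__":
    # Hypothetical values: output directory, hash length, and input file name.
    cmd_h = velveth_command("velvet_out", 31, "fastq", "shortPaired", ["reads_interleaved.fastq"])
    cmd_g = velvetg_command("velvet_out")
    # Print the commands as they would be typed in a shell; nothing is executed here.
    print(shlex.join(cmd_h))
    print(shlex.join(cmd_g))
```

Keeping the two commands separate mirrors the way Velvet itself is split, and makes it straightforward to place each one in its own job.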

Each step of Velvet (velveth and velvetg) may be run as its own job. The following pages describe how to run Velvet in this manner on HCC and provide example submit scripts:

Useful Information

To test the performance of Velvet (velvet/1.2) on Tusker, we used three pairs of paired-end input fastq files: small_1.fastq and small_2.fastq, medium_1.fastq and medium_2.fastq, and large_1.fastq and large_2.fastq. The table below shows statistics for the input files, along with the time and memory Velvet used on Tusker:

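| Input file | Total # of sequences | Total # of bases | Total size (MB) | velveth time | velveth memory | velvetg time | velvetg memory | # of CPUs used |
|---|---|---|---|---|---|---|---|---|
| small_1.fastq | 50,121 | 2,506,050 | 8.010 | ~ 0.02 minutes | ~ 0.3 GB | ~ 0.08 minutes | ~ 0.2 GB | 8 |
| small_2.fastq | 50,121 | 2,506,050 | 8.010 | | | | | |
| medium_1.fastq | 786,742 | 59,792,392 | 152 | ~ 0.4 minutes | ~ 1.5 GB | ~ 0.8 minutes | ~ 0.9 GB | 8 |
| medium_2.fastq | 786,742 | 59,792,392 | 152 | | | | | |
| large_1.fastq | 10,174,715 | 1,027,646,215 | 3,376 | ~ 7 minutes | ~ 23 GB | ~ 45 minutes | ~ 51 GB | 8 |
| large_2.fastq | 10,174,715 | 1,027,646,215 | 3,376 | | | | | |

Time, memory, and CPU values apply to each pair of files and are listed on the row of the first file in the pair.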
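The sequence and base counts above also give the average read length of each data set, which is worth knowing when choosing a hash length, since a read shorter than k contains no k-mers. The short Python sketch below does the arithmetic; the dictionary and function names are made up for this example.

```python
# (total sequences, total bases) per file, copied from the table above
INPUTS = {
    "small": (50_121, 2_506_050),
    "medium": (786_742, 59_792_392),
    "large": (10_174_715, 1_027_646_215),
}


def mean_read_length(sequences, bases):
    """Average read length in bases: total bases divided by number of sequences."""
    return bases / sequences


if __name__ == "__main__":
    for name, (sequences, bases) in INPUTS.items():
        print(f"{name}: {mean_read_length(sequences, bases):.1f} bases per read")
```

This works out to 50, 76, and 101 bases per read for the small, medium, and large files respectively.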