Velvet by itself assembles contigs from DNA data. With the Oases extension, it can also produce a de novo transcriptome assembly from RNA-Seq data: Oases takes the preliminary assembly produced by Velvet as input and constructs transcripts from it.
To run Oases, velvetg must be run after velveth with the -read_trkg yes option:
$ velvetg output_directory/ -min_contig_lgth 200 -read_trkg yes
After velvetg is run with the -read_trkg option on, output_directory/ contains the following files:
Output directory after velvetg
$ ls output_directory/
contigs.fa Graph2 LastGraph Log PreGraph Roadmaps Sequences stats.txt
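
Before handing the directory to Oases, it can help to confirm that velvetg actually left these files behind. The following is a minimal sketch, not part of Velvet or Oases: check_velvet_output is a hypothetical helper name, and the file list is simply copied from the listing above.

```bash
#!/bin/bash
# Hypothetical helper: report any file from the listing above that is
# missing from a Velvet output directory. Returns 0 if all are present.
check_velvet_output() {
    local dir="$1"
    local missing=0
    local f
    for f in contigs.fa Graph2 LastGraph Log PreGraph Roadmaps Sequences stats.txt; do
        if [ ! -e "$dir/$f" ]; then
            # Name the missing file on stderr so the job log shows it
            echo "missing: $dir/$f" >&2
            missing=1
        fi
    done
    return "$missing"
}
```

For example, `check_velvet_output output_directory/ && echo "ready for Oases"` prints the message only when every listed file exists.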
Oases has many parameters, all described in its manual. Unlike Velvet, which is multi-threaded, Oases runs on a single thread.
A simple SLURM script that runs Oases on the Velvet output stored in output_directory/ with a minimum transcript length of 200 is shown below:
oases.submit
#!/bin/bash
#SBATCH --job-name=Velvet_Oases
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=1
#SBATCH --time=168:00:00
#SBATCH --mem=10gb
#SBATCH --output=Oases.%J.out
#SBATCH --error=Oases.%J.err
module load oases/0.2
oases output_directory/ -min_trans_lgth 200
After Oases finishes, output_directory/ contains the following files:
Output directory after Oases
$ ls output_directory/
contig-ordering.txt contigs.fa Graph2 LastGraph Log PreGraph Roadmaps Sequences stats.txt transcripts.fa
Oases produces two additional output files: transcripts.fa and contig-ordering.txt. The predicted transcript sequences are found in the FASTA file transcripts.fa.
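
A quick way to inspect the result is to count the predicted transcripts and their combined length. This is a minimal sketch: fasta_summary is a hypothetical helper name, and it assumes only that transcripts.fa is a standard FASTA file in which each record starts with a `>` header line.

```bash
#!/bin/bash
# Hypothetical helper: print "<number of sequences><TAB><total bases>"
# for a FASTA file.
fasta_summary() {
    awk '
        /^>/ { n++; next }            # header line: one more sequence
        { bases += length($0) }       # sequence line: add its length
        END { printf "%d\t%d\n", n, bases }
    ' "$1"
}
```

Running `fasta_summary output_directory/transcripts.fa` prints the transcript count followed by the total number of bases.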
To test the performance of Oases (oases/0.2.8), we used three pairs of paired-end input FASTQ files: small_1.fastq and small_2.fastq, medium_1.fastq and medium_2.fastq, and large_1.fastq and large_2.fastq. Statistics about the input files and the time and memory resources used by Oases are shown in the table below; resource usage is reported once per pair of files:
input file | total # of sequences | total # of bases | total size in MB | used time | used memory | # of used CPUs |
---|---|---|---|---|---|---|
small_1.fastq | 50,121 | 2,506,050 | 8.010 | ~ 0.05 minutes | ~ 0.02 GB | 1 |
small_2.fastq | 50,121 | 2,506,050 | 8.010 | | | |
medium_1.fastq | 786,742 | 59,792,392 | 152 | ~ 0.25 minutes | ~ 0.315 GB | 1 |
medium_2.fastq | 786,742 | 59,792,392 | 152 | | | |
large_1.fastq | 10,174,715 | 1,027,646,215 | 3,376 | ~ 15 minutes | ~ 30 GB | 1 |
large_2.fastq | 10,174,715 | 1,027,646,215 | 3,376 | | | |
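
The sequence and base counts in the table can be gathered for any input file with a short awk command. This is a sketch under one assumption: each FASTQ record occupies exactly four lines (header, sequence, `+`, qualities), so the sequence is always the second line of each group of four. fastq_summary is a hypothetical helper name.

```bash
#!/bin/bash
# Hypothetical helper: print "<number of reads><TAB><total bases>" for a
# FASTQ file, assuming strictly four lines per record.
fastq_summary() {
    awk '
        NR % 4 == 2 { n++; bases += length($0) }   # 2nd line of each record is the sequence
        END { printf "%d\t%d\n", n, bases }
    ' "$1"
}
```

Dividing the total bases by the number of sequences gives the read length: 2,506,050 / 50,121 = 50 bases for the small files, 59,792,392 / 786,742 = 76 for the medium files, and 1,027,646,215 / 10,174,715 = 101 for the large files.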