Oases

Velvet by itself assembles contigs from DNA data. Oases is an extension of Velvet for de novo assembly of RNA-Seq data: it takes the preliminary assembly produced by Velvet as input and constructs transcripts from it, yielding a transcriptome assembly.

Before Oases can be run, velvetg must be run (after velveth) with the -read_trkg yes option:

$ velvetg output_directory/ -min_contig_lgth 200 -read_trkg yes

After velvetg has been run with the -read_trkg yes option, output_directory/ contains the following files:

Output directory after Velvetg
$ ls output_directory/
contigs.fa  Graph2  LastGraph  Log  PreGraph  Roadmaps  Sequences  stats.txt
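
Before handing this directory to Oases, it can be worth confirming that velvetg finished and wrote the files listed above. The following is a minimal sketch, not part of Velvet or Oases: the function name check_velvet_output is hypothetical, and the file list is simply copied from the listing above rather than being a statement of which files Oases strictly requires.

```shell
# Hypothetical helper: return 0 if every file from the listing above
# exists in the given directory, otherwise report what is missing.
check_velvet_output() {
    dir=$1
    missing=0
    # File names copied from the velvetg listing above (an assumption
    # about what a complete run should contain).
    for f in contigs.fa Graph2 LastGraph Log PreGraph Roadmaps Sequences stats.txt; do
        if [ ! -e "$dir/$f" ]; then
            echo "missing: $dir/$f" >&2
            missing=1
        fi
    done
    return "$missing"
}

# Example use:
#   check_velvet_output output_directory/ && echo "ready for Oases"
```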

Oases has many parameters, all of which are described in its manual. While Velvet is multi-threaded, Oases is not, so an Oases job needs only a single core.

A simple SLURM script that runs Oases on the Velvet output stored in output_directory/, with a minimum transcript length of 200, is shown below:

oases.submit
#!/bin/sh
#SBATCH --job-name=Velvet_Oases
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=1
#SBATCH --time=168:00:00
#SBATCH --mem=10gb
#SBATCH --output=Oases.%J.out
#SBATCH --error=Oases.%J.err

module load oases/0.2

oases output_directory/ -min_trans_lgth 200

Oases Output

The output_directory/ after Oases contains the following files:

Output directory after Oases
$ ls output_directory/
contig-ordering.txt  contigs.fa  Graph2  LastGraph  Log  PreGraph  Roadmaps  Sequences  stats.txt  transcripts.fa

Oases produces two additional output files: transcripts.fa and contig-ordering.txt. The predicted transcript sequences are found in the fasta file transcripts.fa.
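
As a quick sanity check on transcripts.fa, the transcripts and loci can be counted from the FASTA headers. This is a sketch under one assumption: that headers follow the pattern described in the Oases manual, for example ">Locus_12_Transcript_1/3_Confidence_0.571_Length_3815". Verify this against your own file before relying on the locus count. The function names are hypothetical.

```shell
# Number of predicted transcripts = number of FASTA header lines.
count_transcripts() {
    grep -c '^>' "$1"
}

# Number of distinct loci, ASSUMING headers start with ">Locus_<n>_".
# The sed expression keeps only the "Locus_<n>" prefix of each header.
count_loci() {
    grep '^>' "$1" | sed 's/^>\(Locus_[0-9]*\)_.*/\1/' | sort -u | wc -l | tr -d ' '
}

# Example use:
#   count_transcripts output_directory/transcripts.fa
#   count_loci output_directory/transcripts.fa
```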

Useful Information

To test the performance of Oases (oases/0.2.8) on Tusker, we used three pairs of paired-end input fastq files: small_1.fastq and small_2.fastq, medium_1.fastq and medium_2.fastq, and large_1.fastq and large_2.fastq. Statistics for the input files, along with the time and memory Oases used on Tusker for each pair, are shown in the table below:

input file       total # of sequences   total # of bases   total size in MB   used time        used memory   # of used CPUs
small_1.fastq    50,121                 2,506,050          8.010              ~ 0.05 minutes   ~ 0.02 GB     1
small_2.fastq    50,121                 2,506,050          8.010
medium_1.fastq   786,742                59,792,392         152                ~ 0.25 minutes   ~ 0.315 GB    1
medium_2.fastq   786,742                59,792,392         152
large_1.fastq    10,174,715             1,027,646,215      3,376              ~ 15 minutes     ~ 30 GB       1
large_2.fastq    10,174,715             1,027,646,215      3,376
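
To compare your own input against the table, the sequence and base counts of a fastq file can be computed with awk. This is a minimal sketch that assumes every record spans exactly four lines (header, sequence, "+", qualities); the function name fastq_stats is hypothetical.

```shell
# Print "<number of sequences> <number of bases>" for a fastq file,
# ASSUMING each record is exactly four lines long, so the sequence
# is always the second line of each group of four.
fastq_stats() {
    awk 'NR % 4 == 2 { n++; bases += length($0) }
         END { print n + 0, bases + 0 }' "$1"
}

# Example use:
#   fastq_stats small_1.fastq
```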