Trinity is a method for efficient and robust de novo reconstruction of transcriptomes from RNA-Seq data. Trinity combines four independent software modules:
Assembly. All these modules can be applied sequentially to process large RNA-Seq datasets.
The basic usage of Trinity is:
$ Trinity --seqType [fa|fq] --max_memory <maximum_memory> --left input_reads_pair_1.[fa|fq] --right input_reads_pair_2.[fa|fq] [options]
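As a minimal sketch, the bash function below assembles the basic command above for one read pair. It picks `--seqType fq` or `fa` from the file extension. The helper name `trinity_cmd` and the file names in the usage line are hypothetical. It only prints the command, so the command can be checked before it is pasted into a job script.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the basic Trinity command for one read pair.
# Arguments: left reads, right reads, memory limit (e.g. 50G).
trinity_cmd() {
  local left="$1" right="$2" max_memory="$3"
  local seq_type
  # --seqType must match the read format: fq for FASTQ, fa for FASTA.
  case "$left" in
    *.fq|*.fastq) seq_type=fq ;;
    *.fa|*.fasta) seq_type=fa ;;
    *) echo "unrecognized read file extension: $left" >&2; return 1 ;;
  esac
  echo "Trinity --seqType $seq_type --max_memory $max_memory --left $left --right $right"
}
```

For example, `trinity_cmd reads_1.fastq reads_2.fastq 50G` prints `Trinity --seqType fq --max_memory 50G --left reads_1.fastq --right reads_2.fastq`.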
Additional Trinity options can be found on the Trinity website, or by typing:
Running the Trinity pipeline from beginning to end on large datasets may exceed the walltime limit for a single job. Therefore, Trinity provides a mechanism to run the workflow in four separate steps, each resuming from where the previous one stopped. Every step runs the same Trinity command and options plus one step-specific option; on the last step, the Trinity command is run as normal.
$ Trinity [options] --no_run_inchworm                # step 1
$ Trinity [options] --no_run_chrysalis               # step 2
$ Trinity [options] --no_distributed_trinity_exec    # step 3
$ Trinity [options]                                  # step 4
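The step-specific options can be kept in a small lookup, sketched below in bash. `trinity_step_flag` is a hypothetical helper. The flags are the ones listed above, and `[options]` stands for whatever options the run already uses, unchanged between steps.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the extra option for Trinity step 1-4.
trinity_step_flag() {
  case "$1" in
    1) echo "--no_run_inchworm" ;;
    2) echo "--no_run_chrysalis" ;;
    3) echo "--no_distributed_trinity_exec" ;;
    4) echo "" ;;  # last step: no extra option, Trinity runs as normal
    *) echo "step must be 1, 2, 3 or 4" >&2; return 1 ;;
  esac
}

# Same command and options for every step; only the extra option changes.
for step in 1 2 3 4; do
  flag=$(trinity_step_flag "$step")
  echo "Trinity [options]${flag:+ $flag}"
done
```

Keeping the flags in one place makes it harder for the per-step job scripts to drift apart in their other options.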
Each step may be run as its own job, which works around the single-job walltime limit. To see how to run each step of Trinity as a separate job under the SLURM scheduler on HCC, please check:
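One way to chain the separate jobs is with SLURM job dependencies, so that each step starts only after the previous one finishes successfully. The sketch below assumes four hypothetical batch scripts, `trinity_step1.slurm` to `trinity_step4.slurm`, each running one of the step commands. `sbatch --parsable` and `--dependency=afterok:<jobid>` are standard SLURM options, but the HCC instructions remain the reference for site-specific settings.

```bash
#!/usr/bin/env bash
# Submit four hypothetical job scripts, trinity_step1.slurm ... trinity_step4.slurm,
# so that each job starts only after the previous one completes successfully.
submit_trinity_chain() {
  local prefix="${1:-trinity_step}"
  local jobid step
  # --parsable makes sbatch print just "jobid" (or "jobid;cluster").
  jobid=$(sbatch --parsable "${prefix}1.slurm") || return 1
  jobid=${jobid%%;*}
  echo "$jobid"
  for step in 2 3 4; do
    jobid=$(sbatch --parsable --dependency=afterok:"$jobid" "${prefix}${step}.slurm") || return 1
    jobid=${jobid%%;*}
    echo "$jobid"
  done
}
```

Running `submit_trinity_chain` prints the four job IDs in order. With `afterok`, a failed step may leave the later jobs pending with a dependency that can never be satisfied, so they may need to be cancelled with `scancel` before the failed step is resubmitted.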
To test the performance of Trinity (trinity/r2014-04-13p1), we used three paired-end input fastq files,
large_2.fastq. Statistics about the input files, together with the time and memory used by each Trinity step, are shown in the table below:
| Input file | Total # of sequences | Total # of bases | Total size (MB) | Step 1 time | Step 1 memory | Step 2 time | Step 2 memory | Step 3 time | Step 3 memory | Step 4 time | Step 4 memory | # of CPUs used |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| small_1.fastq | 50,121 | 2,506,050 | 8.010 | ~1 minute | ~35 GB | ~0.01 hours | ~0.6 GB | ~0.2 minutes | ~0.07 GB | ~0.008 hours | ~0.8 GB | 8 |
| medium_1.fastq | 786,742 | 59,792,392 | 152 | ~3 minutes | ~68 GB | ~0.1 hours | ~3 GB | ~0.8 minutes | ~0.6 GB | ~0.3 hours | ~5 GB | 8 |
| large_1.fastq | 10,174,715 | 1,027,646,215 | 3,376 | ~58 minutes | ~80 GB | ~5 hours | ~30 GB | ~35 minutes | ~8 GB | ~13 hours | ~30 GB | 8 |
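The sequence and base counts in the table can be computed directly from a FASTQ file. Below is a minimal awk sketch. It assumes standard four-line FASTQ records with no wrapped sequence lines, and `fastq_stats` is a hypothetical helper name. As a sanity check, dividing the table's bases by its sequences gives mean read lengths of 50 (small), 76 (medium) and 101 bases (large).

```bash
#!/usr/bin/env bash
# Hypothetical helper: count sequences and bases in a FASTQ file.
# Assumes standard 4-line records (header, sequence, "+", qualities),
# so every 2nd line of each 4-line block is a sequence.
fastq_stats() {
  awk 'NR % 4 == 2 { n++; bases += length($0) }
       END { printf "%d sequences, %.0f bases, mean length %.1f\n", n, bases, (n ? bases / n : 0) }' "$1"
}
```

For gzipped input, the same awk program can read from `zcat file.fastq.gz` instead of a file argument.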
The Inchworm (step 1) and Chrysalis (step 2) steps can be memory-intensive. A basic recommendation is to have about 1 GB of RAM per 1 million ~76-base Illumina paired-end reads.
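The rule of thumb can be turned into a quick starting value for `--max_memory`. The sketch below assumes the argument is the number of read pairs (for example, the number of sequences in the `_1` file) given as a plain integer, and rounds up to whole gigabytes. `estimate_trinity_mem_gb` is a hypothetical helper. The measured usage in the table is well above this estimate (about 80 GB in step 1 for the large dataset, versus 11 GB from the rule), so treat the result as a rough starting point rather than a guarantee.

```bash
#!/usr/bin/env bash
# Hypothetical helper: rough RAM estimate from the rule of thumb
# "about 1 GB per 1 million reads", rounded up to whole GB.
# Assumption: the argument is the number of read pairs, as a plain integer (no commas).
estimate_trinity_mem_gb() {
  local reads="$1"
  echo $(( (reads + 999999) / 1000000 ))
}
```

For the large dataset, `--max_memory "$(estimate_trinity_mem_gb 10174715)G"` expands to `--max_memory 11G`.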