Trinity

Trinity is a method for efficient and robust de novo reconstruction of transcriptomes from RNA-Seq data. Trinity combines three independent software modules, Inchworm, Chrysalis, and Butterfly, which are applied sequentially to process large RNA-Seq datasets.

The basic usage of Trinity is:

$ Trinity --seqType [fa|fq] --JM <jellyfish_memory> --left input_reads_pair_1.[fa|fq] --right input_reads_pair_2.[fa|fq] [options]
where input_reads_pair_1.[fa|fq] and input_reads_pair_2.[fa|fq] are the paired-end input files of sequence reads in FASTA/FASTQ format, and --seqType is the type of these input reads. The option --JM sets the amount of system memory, in GB, to use for k-mer counting by Jellyfish.
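As an illustration, the basic usage can be filled in for a pair of FASTQ files. This is a minimal sketch, not a tested production command: the file names reads_1.fq and reads_2.fq are hypothetical, and the value 10G is only an assumed format and size for --JM, so check the Trinity usage output before relying on it. The snippet prints the command for review instead of running it.

```shell
# Hypothetical input files; replace with your own paired-end reads.
left=reads_1.fq
right=reads_2.fq

# Assumed memory value for jellyfish k-mer counting (format not confirmed here).
jm=10G

# Assemble the command from the options described above.
trinity_cmd="Trinity --seqType fq --JM $jm --left $left --right $right"

# Dry run: print the command so it can be checked before it is executed.
echo "$trinity_cmd"
```

Once the printed command looks right, it can be pasted into a terminal or a job script.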

Additional Trinity options can be found on the Trinity website, or by typing:

$ Trinity

Running the Trinity pipeline from beginning to end on large datasets may exceed the walltime limit for a single job. Therefore, Trinity provides a mechanism to run the workflow in four separate steps, where each step resumes from the previous one. Each step uses the same Trinity command and options, plus one additional step-specific option. The last step runs the Trinity command as normal, with no additional option.

Step 1 Options
Trinity.pl [options] --no_run_chrysalis
Step 2 Options
Trinity.pl [options] --no_run_quantifygraph
Step 3 Options
Trinity.pl [options] --no_run_butterfly
Step 4 Options
Trinity.pl [options]
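
The step options above can be sketched as a small helper that maps a step number to its extra option. This is only an illustration: the function name step_option and the contents of common_opts (file names and the --JM value) are hypothetical, and the loop prints each command instead of running it.

```shell
# Map a step number (1-4) to the additional option listed for that step.
step_option() {
    case "$1" in
        1) echo "--no_run_chrysalis" ;;
        2) echo "--no_run_quantifygraph" ;;
        3) echo "--no_run_butterfly" ;;
        4) echo "" ;;   # the last step takes no additional option
        *) echo "unknown step: $1" >&2; return 1 ;;
    esac
}

# Hypothetical options shared by all four steps; adjust for your data.
common_opts="--seqType fq --JM 10G --left reads_1.fq --right reads_2.fq"

# Dry run: print the command for each step in order.
for step in 1 2 3 4; do
    extra=$(step_option "$step")
    echo "Step $step: Trinity.pl $common_opts${extra:+ $extra}"
done
```

Each printed command could then be placed in its own job, so that every step stays within the walltime limit.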

Each step may be run as its own job, providing a workaround for the single job walltime limit. To see how to run each step of Trinity as a single job under the SLURM scheduler on HCC, please check:

Useful Information

To test the performance of Trinity (trinity/r2014-04-13p1) on Tusker, we used three pairs of paired-end FASTQ input files: small_1.fastq and small_2.fastq, medium_1.fastq and medium_2.fastq, and large_1.fastq and large_2.fastq. Statistics for the input files, along with the time and memory Trinity used on Tusker, are shown below:

Input files:

File             Total # of sequences   Total # of bases   Total size in MB
small_1.fastq    50,121                 2,506,050          8.010
small_2.fastq    50,121                 2,506,050          8.010
medium_1.fastq   786,742                59,792,392         152
medium_2.fastq   786,742                59,792,392         152
large_1.fastq    10,174,715             1,027,646,215      3,376
large_2.fastq    10,174,715             1,027,646,215      3,376

Time and memory used by each Trinity step, per pair of input files:

Dataset   Trinity step   Used time       Used memory   # of used CPUs
small     1              ~ 1 minute      ~ 35 GB       8
small     2              ~ 0.01 hours    ~ 0.6 GB      8
small     3              ~ 0.2 minutes   ~ 0.07 GB     8
small     4              ~ 0.008 hours   ~ 0.8 GB      8
medium    1              ~ 3 minutes     ~ 68 GB       8
medium    2              ~ 0.1 hours     ~ 3 GB        8
medium    3              ~ 0.8 minutes   ~ 0.6 GB      8
medium    4              ~ 0.3 hours     ~ 5 GB        8
large     1              ~ 58 minutes    ~ 80 GB       8
large     2              ~ 5 hours       ~ 30 GB       8
large     3              ~ 35 minutes    ~ 8 GB        8
large     4              ~ 13 hours      ~ 30 GB       8

The Inchworm (step 1) and Chrysalis (step 2) steps can be memory intensive. A basic recommendation is to have 1 GB of RAM per 1 million ~76-base Illumina paired-end reads.
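
The recommendation above can be turned into a quick estimate. The sketch below makes several assumptions: the function names are hypothetical, each FASTQ record is taken to span exactly four lines, the read count comes from one file of the pair, and the result is rounded up to a whole GB. Treat the result as a rough starting point rather than a guarantee; the measured memory use reported for the test datasets above is higher.

```shell
# Count the reads in a FASTQ file, assuming standard 4-line records.
count_fastq_reads() {
    awk 'END { print int(NR / 4) }' "$1"
}

# Estimate RAM in GB at 1 GB per 1 million reads, rounded up.
estimate_mem_gb() {
    echo $(( ($1 + 999999) / 1000000 ))
}
```

For example, `estimate_mem_gb "$(count_fastq_reads large_1.fastq)"` would report 11 for a file with 10,174,715 sequences.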