Trinity

Trinity is a method for efficient and robust de novo reconstruction of transcriptomes from RNA-Seq data. Trinity combines four independent software modules: Normalization, Inchworm, Chrysalis and Assembly. All these modules can be applied sequentially to process large RNA-Seq datasets.

The basic usage of Trinity is:

$ Trinity --seqType [fa|fq] --max_memory <maximum_memory> --left input_reads_pair_1.[fa|fq] --right input_reads_pair_2.[fa|fq] [options]

where input_reads_pair_1.[fa|fq] and input_reads_pair_2.[fa|fq] are the input paired-end files of sequence reads in fasta/fastq format, and --seqType is the type of these input reads. The option --max_memory specifies the maximum memory to use with Trinity.
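As a concrete illustration of the usage line above, the following sketch fills in the placeholders. The file names reads_1.fq and reads_2.fq, the 50G memory cap, the 8 threads passed with --CPU, and the trinity_out output directory are all hypothetical values, not recommendations. The run helper is likewise an assumption made for this example: it only prints the command unless DRY_RUN=0 is set, so the sketch can be checked on a machine without Trinity.

```shell
# Sketch only: file names and resource values below are hypothetical.
DRY_RUN=${DRY_RUN:-1}   # 1 = print the command, 0 = actually run it

run() {
    # Print the command in dry-run mode, otherwise execute it as given.
    if [ "$DRY_RUN" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

run Trinity --seqType fq --max_memory 50G \
    --left reads_1.fq --right reads_2.fq \
    --CPU 8 --output trinity_out
```

Setting DRY_RUN=0 in the environment before running the script would launch Trinity with these options instead of printing them.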

Note

Trinity produces many intermediate files, which can put a heavy load on the file system. To avoid problems, copy all input data to the faster local storage called "scratch", write the output to "scratch", and then copy the output files you need from "scratch" to /work. The "scratch" directories are unique per job and are deleted when the job finishes, so anything left there is lost. Using "scratch" can greatly improve performance.
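The staging pattern described in the note above can be sketched roughly as follows. The function name stage_run_collect and the directory layout (an input subdirectory under the work directory, an output subdirectory in scratch, a results subdirectory for what is copied back) are assumptions chosen for this example; the actual scratch path depends on the cluster.

```shell
# stage_run_collect WORK_DIR SCRATCH_DIR COMMAND [ARGS...]
# Hypothetical helper: copies WORK_DIR/input/* into SCRATCH_DIR, runs COMMAND
# from inside SCRATCH_DIR, then copies SCRATCH_DIR/output back to WORK_DIR/results.
stage_run_collect() {
    work_dir=$1
    scratch_dir=$2
    shift 2

    # 1. Stage the input data onto scratch.
    mkdir -p "$scratch_dir/output" || return 1
    cp "$work_dir"/input/* "$scratch_dir"/ || return 1

    # 2. Run the command with scratch as the working directory.
    ( cd "$scratch_dir" && "$@" ) || return 1

    # 3. Copy the results back before the job ends and scratch is deleted.
    mkdir -p "$work_dir/results" &&
    cp -r "$scratch_dir/output/." "$work_dir/results/"
}
```

Inside a job script, the function might be called with the project directory under /work, the job's scratch directory, and a Trinity command whose --left and --right arguments are bare file names and whose --output points below output/ (for example output/trinity_out), since the command runs from inside scratch.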

Additional Trinity options can be found on the Trinity website, or by typing:

$ Trinity

Running the Trinity pipeline from beginning to end on large datasets may exceed the walltime limit for a single job. Therefore, Trinity provides a mechanism to run the workflow in four separate steps, where each step resumes from the previous one. The same Trinity command and options are used for every step; each of the first three steps adds one extra option, listed below. In the last step, the Trinity command is run as normal.

Step 1 Options

Trinity [options] --no_run_inchworm

Step 2 Options

Trinity [options] --no_run_chrysalis

Step 3 Options

Trinity [options] --no_distributed_trinity_exec

Step 4 Options

Trinity [options]
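The four steps above differ only in the trailing option, so the per-step commands can be generated from one shared option string. The sketch below does that; the helper name trinity_step_option and the values in common_opts are hypothetical, and the loop only prints the commands rather than running them.

```shell
# Print the extra option for a given step (1-4), following the list above.
trinity_step_option() {
    case "$1" in
        1) echo "--no_run_inchworm" ;;
        2) echo "--no_run_chrysalis" ;;
        3) echo "--no_distributed_trinity_exec" ;;
        4) echo "" ;;   # last step: no extra option
        *) echo "unknown step: $1" >&2; return 1 ;;
    esac
}

# Hypothetical options shared by every step.
common_opts="--seqType fq --max_memory 50G --left reads_1.fq --right reads_2.fq --CPU 8 --output trinity_out"

for step in 1 2 3 4; do
    opt=$(trinity_step_option "$step")
    # ${opt:+ $opt} appends the option only when it is non-empty.
    echo "step $step: Trinity $common_opts${opt:+ $opt}"
done
```

Keeping the shared options in one place helps ensure that every step is run with identical inputs and the same output directory, which the resume mechanism relies on.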

Each step may be run as its own job, providing a workaround for the single job walltime limit. To see how to run each step of Trinity as a single job under the SLURM scheduler on HCC, please check:

Running Trinity in Multiple Steps

Description: How to run Trinity in multiple steps on HCC resources
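One possible way to submit the four steps as separate SLURM jobs is to chain them with job dependencies, so that each step starts only after the previous one has finished successfully. The sketch below is an assumption about how this could be scripted, not the procedure from the linked page: the function name submit_chain and the submit script names in the usage note are hypothetical, and it assumes a single-cluster setup where sbatch --parsable prints just the job ID.

```shell
# Sketch: chain jobs so each one starts only after the previous one succeeds.
SBATCH=${SBATCH:-sbatch}   # overridable so the logic can be tried without SLURM

# submit_chain SCRIPT1 [SCRIPT2 ...]
submit_chain() {
    prev=""
    for script in "$@"; do
        if [ -z "$prev" ]; then
            # First job: no dependency.
            prev=$("$SBATCH" --parsable "$script") || return 1
        else
            # Later jobs wait for the previous job to exit with status 0.
            prev=$("$SBATCH" --parsable --dependency=afterok:"$prev" "$script") || return 1
        fi
        echo "submitted $script as job $prev"
    done
}
```

A call such as submit_chain trinity_step1.submit trinity_step2.submit trinity_step3.submit trinity_step4.submit would queue all four steps at once, with each submit script containing the Trinity command for its step.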

Useful Information

To test the performance of Trinity (trinity/r2014-04-13p1), we used three pairs of paired-end input fastq files: small_1.fastq and small_2.fastq, medium_1.fastq and medium_2.fastq, and large_1.fastq and large_2.fastq. Statistics about the input files, together with the time and memory used by each Trinity step, are shown in the table below:

	total # of sequences	total # of bases	total size in MB	Trinity step 1 used time	Trinity step 1 used memory	Trinity step 2 used time	Trinity step 2 used memory	Trinity step 3 used time	Trinity step 3 used memory	Trinity step 4 used time	Trinity step 4 used memory	# of used CPUs
small_1.fastq	50,121	2,506,050	8.010	~ 1 minute	~ 35 GB	~ 0.01 hours	~ 0.6 GB	~ 0.2 minutes	~ 0.07 GB	~ 0.008 hours	~ 0.8 GB	8
small_2.fastq	50,121	2,506,050	8.010	~ 1 minute	~ 35 GB	~ 0.01 hours	~ 0.6 GB	~ 0.2 minutes	~ 0.07 GB	~ 0.008 hours	~ 0.8 GB	8
medium_1.fastq	786,742	59,792,392	152	~ 3 minutes	~ 68 GB	~ 0.1 hours	~ 3 GB	~ 0.8 minutes	~ 0.6 GB	~ 0.3 hours	~ 5 GB	8
medium_2.fastq	786,742	59,792,392	152	~ 3 minutes	~ 68 GB	~ 0.1 hours	~ 3 GB	~ 0.8 minutes	~ 0.6 GB	~ 0.3 hours	~ 5 GB	8
large_1.fastq	10,174,715	1,027,646,215	3,376	~ 58 minutes	~ 80 GB	~ 5 hours	~ 30 GB	~ 35 minutes	~ 8 GB	~ 13 hours	~ 30 GB	8
large_2.fastq	10,174,715	1,027,646,215	3,376	~ 58 minutes	~ 80 GB	~ 5 hours	~ 30 GB	~ 35 minutes	~ 8 GB	~ 13 hours	~ 30 GB	8

Tip

The Inchworm (step 1) and Chrysalis (step 2) steps can be memory intensive. A basic recommendation is to have 1 GB of RAM per 1 million ~76-base Illumina paired-end reads.
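The rule of thumb in the tip above is simple arithmetic, which can be sketched as a small helper. The function name estimate_mem_gb is hypothetical, and rounding up to the next whole GB is an assumption made here. Note that the step 1 memory reported in the table above is well beyond this baseline for the test files, so the result is best treated as a starting point rather than a guarantee.

```shell
# Rule-of-thumb estimate: 1 GB of RAM per 1 million reads, rounded up.
estimate_mem_gb() {
    reads=$1
    echo $(( (reads + 999999) / 1000000 ))
}

# Read count of large_1.fastq from the table above.
estimate_mem_gb 10174715   # prints 11
```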
