Running Trinity in Multiple Steps

This example runs Trinity on paired-end fastq data using 8 CPUs and 100 GB of RAM, split across multiple job steps.

Trinity produces many intermediate files that can put a heavy load on the shared file system. To avoid any issues, copy all the input data to the faster per-job local storage called /scratch, write the output to /scratch, and finally copy the needed output files from /scratch back to /work. The /scratch directories are unique per job and are deleted when the job finishes, so any output that is needed must be copied back before the job ends. Staging data on /scratch this way can greatly improve performance!
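Each submit script below follows the same staging pattern, shown here as a minimal sketch (file and directory names are placeholders):

# stage input on the fast per-job /scratch storage
cp my_input_files /scratch
# run the tool with its output directed to /scratch
my_command --output /scratch/my_out/
# copy the results back before the job ends and /scratch is deleted
cp -r /scratch/my_out/ .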

The first step is to run Trinity with the option --no_run_inchworm, which stops the run after the initial Jellyfish k-mer counting step, before Inchworm:

trinity_step1.submit
#!/bin/bash
#SBATCH --job-name=Trinity_Step1
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=8
#SBATCH --time=168:00:00
#SBATCH --mem=100gb
#SBATCH --output=Trinity_Step1.%J.out
#SBATCH --error=Trinity_Step1.%J.err

module load trinity

# copy input data to /scratch
cp input_reads_pair_1.fastq /scratch
cp input_reads_pair_2.fastq /scratch 

Trinity --seqType fq --max_memory 100G --left /scratch/input_reads_pair_1.fastq --right /scratch/input_reads_pair_2.fastq --SS_lib_type FR --output /scratch/trinity_out/ --CPU $SLURM_NTASKS_PER_NODE --no_run_inchworm

# copy output back to the current directory
cp -r /scratch/trinity_out/ .
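Once the script is saved as trinity_step1.submit, submit it with the standard SLURM commands and wait for the job to finish before moving on to step 2:

sbatch trinity_step1.submit
squeue -u $USER    # check the job status; it must have completed before step 2 is submitted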

The second step is to run Trinity with the option --no_run_chrysalis, which runs the Inchworm contig assembly and stops before Chrysalis:

trinity_step2.submit
#!/bin/bash
#SBATCH --job-name=Trinity_Step2
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=8
#SBATCH --time=168:00:00
#SBATCH --mem=100gb
#SBATCH --output=Trinity_Step2.%J.out
#SBATCH --error=Trinity_Step2.%J.err

module load trinity

# copy input data to /scratch
cp input_reads_pair_1.fastq /scratch
cp input_reads_pair_2.fastq /scratch
cp -r trinity_out /scratch/

Trinity --seqType fq --max_memory 100G --left /scratch/input_reads_pair_1.fastq --right /scratch/input_reads_pair_2.fastq --SS_lib_type FR --output /scratch/trinity_out/ --CPU $SLURM_NTASKS_PER_NODE --no_run_chrysalis

# copy output back to the current directory
cp -r /scratch/trinity_out/ .

The third step is to run Trinity with the option --no_distributed_trinity_exec, which runs the Chrysalis read clustering and stops before the final distributed per-cluster assembly:

trinity_step3.submit
#!/bin/bash
#SBATCH --job-name=Trinity_Step3
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=8
#SBATCH --time=168:00:00
#SBATCH --mem=100gb
#SBATCH --output=Trinity_Step3.%J.out
#SBATCH --error=Trinity_Step3.%J.err

module load trinity

# copy input data to /scratch
cp input_reads_pair_1.fastq /scratch
cp input_reads_pair_2.fastq /scratch
cp -r trinity_out /scratch/

Trinity --seqType fq --max_memory 100G --left /scratch/input_reads_pair_1.fastq --right /scratch/input_reads_pair_2.fastq --SS_lib_type FR --output /scratch/trinity_out/ --CPU $SLURM_NTASKS_PER_NODE --no_distributed_trinity_exec

# copy output back to the current directory
cp -r /scratch/trinity_out/ .

The fourth and final step is to run Trinity without any additional option; it resumes from the existing trinity_out/ directory and completes the assembly of the individual read clusters, producing the final transcripts:

trinity_step4.submit
#!/bin/bash
#SBATCH --job-name=Trinity_Step4
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=8
#SBATCH --time=168:00:00
#SBATCH --mem=100gb
#SBATCH --output=Trinity_Step4.%J.out
#SBATCH --error=Trinity_Step4.%J.err

module load trinity

# copy input data to /scratch
cp input_reads_pair_1.fastq /scratch
cp input_reads_pair_2.fastq /scratch
cp -r trinity_out /scratch/

Trinity --seqType fq --max_memory 100G --left /scratch/input_reads_pair_1.fastq --right /scratch/input_reads_pair_2.fastq --SS_lib_type FR --output /scratch/trinity_out/ --CPU $SLURM_NTASKS_PER_NODE

# copy output back to the current directory
cp -r /scratch/trinity_out/ .
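Alternatively, all four steps can be chained with SLURM job dependencies so that each step starts only after the previous one completes successfully (a sketch; script names as above):

jid1=$(sbatch --parsable trinity_step1.submit)
jid2=$(sbatch --parsable --dependency=afterok:$jid1 trinity_step2.submit)
jid3=$(sbatch --parsable --dependency=afterok:$jid2 trinity_step3.submit)
sbatch --dependency=afterok:$jid3 trinity_step4.submit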

Trinity Output

Trinity writes a number of files to its trinity_out/ output directory after each executed step. The output file Trinity.fasta is the final Trinity output and contains the assembled transcripts.
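As a quick check (a sketch; assuming trinity_out/ has been copied back to the work directory and that the trinity module sets $TRINITY_HOME), the number of assembled transcripts and basic assembly statistics can be obtained with:

# count the assembled transcripts
grep -c "^>" trinity_out/Trinity.fasta
# N50 and related statistics using the TrinityStats.pl utility shipped with Trinity;
# $TRINITY_HOME is an assumption -- adjust the path if the module does not set it
$TRINITY_HOME/util/TrinityStats.pl trinity_out/Trinity.fasta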

The Inchworm (step 2) and Chrysalis (step 3) steps can be memory intensive. A basic recommendation is 1 GB of RAM per 1 million ~76 bp Illumina paired-end reads.
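For example, by this guideline the 100 GB requested in the scripts above is sized for a library of up to roughly 100 million ~76 bp paired-end reads.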