Trinity produces many intermediate files, which can put a heavy load on the shared file system. To avoid problems, copy all input data to the faster local storage called “scratch”, write the output to “scratch”, and finally copy the output files you need from “scratch” back to /work. The “scratch” directories are unique per job and are deleted when the job finishes, so anything not copied back is lost. Working in “scratch” can greatly improve performance.
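The stage-in, run, stage-out pattern described above can be sketched as two small shell helpers. This is a minimal sketch, not part of Trinity or of the cluster tooling: the function names `stage_in` and `stage_out` are hypothetical, and the scratch location is passed in as an argument rather than assumed.

```shell
#!/bin/bash
# Hypothetical helpers illustrating the stage-in / stage-out pattern.

# stage_in DEST FILE...: copy each input into DEST, failing early if one is missing.
stage_in() {
    local dest=$1
    shift
    local f
    for f in "$@"; do
        if [ ! -e "$f" ]; then
            echo "stage_in: missing input: $f" >&2
            return 1
        fi
        cp -r "$f" "$dest"/ || return 1
    done
}

# stage_out SRC DEST: copy a result file or directory from scratch back to DEST.
stage_out() {
    local src=$1 dest=$2
    if [ ! -e "$src" ]; then
        echo "stage_out: nothing to copy: $src" >&2
        return 1
    fi
    cp -r "$src" "$dest"/
}
```

In a submit script these might be called as `stage_in /scratch input_reads_pair_1.fastq input_reads_pair_2.fastq` before the Trinity command and `stage_out /scratch/trinity_out .` after it. Checking for missing files first means the job fails quickly rather than partway through a long run.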
The first step is to run Trinity with the option --no_run_inchworm, which stops the run before the Inchworm stage:
trinity_step1.submit
#!/bin/bash
#SBATCH --job-name=Trinity_Step1
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=8
#SBATCH --time=168:00:00
#SBATCH --mem=100gb
#SBATCH --output=Trinity_Step1.%J.out
#SBATCH --error=Trinity_Step1.%J.err
module load trinity
# copy input data to /scratch
cp input_reads_pair_1.fastq /scratch
cp input_reads_pair_2.fastq /scratch
Trinity --seqType fq --max_memory 100G --left /scratch/input_reads_pair_1.fastq --right /scratch/input_reads_pair_2.fastq --SS_lib_type FR --output /scratch/trinity_out/ --CPU $SLURM_NTASKS_PER_NODE --no_run_inchworm
# copy output to the current directory
cp -r /scratch/trinity_out/ .
The second step is to run Trinity with the option --no_run_chrysalis, which runs Inchworm and stops before Chrysalis:
trinity_step2.submit
#!/bin/bash
#SBATCH --job-name=Trinity_Step2
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=8
#SBATCH --time=168:00:00
#SBATCH --mem=100gb
#SBATCH --output=Trinity_Step2.%J.out
#SBATCH --error=Trinity_Step2.%J.err
module load trinity
# copy input data and the output of the previous step to /scratch
cp input_reads_pair_1.fastq /scratch
cp input_reads_pair_2.fastq /scratch
cp -r trinity_out /scratch/
Trinity --seqType fq --max_memory 100G --left /scratch/input_reads_pair_1.fastq --right /scratch/input_reads_pair_2.fastq --SS_lib_type FR --output /scratch/trinity_out/ --CPU $SLURM_NTASKS_PER_NODE --no_run_chrysalis
# copy output to the current directory
cp -r /scratch/trinity_out/ .
The third step is to run Trinity with the option --no_distributed_trinity_exec, which runs Chrysalis and stops before the remaining assembly tasks are executed:
trinity_step3.submit
#!/bin/bash
#SBATCH --job-name=Trinity_Step3
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=8
#SBATCH --time=168:00:00
#SBATCH --mem=100gb
#SBATCH --output=Trinity_Step3.%J.out
#SBATCH --error=Trinity_Step3.%J.err
module load trinity
# copy input data and the output of the previous step to /scratch
cp input_reads_pair_1.fastq /scratch
cp input_reads_pair_2.fastq /scratch
cp -r trinity_out /scratch/
Trinity --seqType fq --max_memory 100G --left /scratch/input_reads_pair_1.fastq --right /scratch/input_reads_pair_2.fastq --SS_lib_type FR --output /scratch/trinity_out/ --CPU $SLURM_NTASKS_PER_NODE --no_distributed_trinity_exec
# copy output to the current directory
cp -r /scratch/trinity_out/ .
The fourth step is to run Trinity without any additional options, which completes the assembly:
trinity_step4.submit
#!/bin/bash
#SBATCH --job-name=Trinity_Step4
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=8
#SBATCH --time=168:00:00
#SBATCH --mem=100gb
#SBATCH --output=Trinity_Step4.%J.out
#SBATCH --error=Trinity_Step4.%J.err
module load trinity
# copy input data and the output of the previous step to /scratch
cp input_reads_pair_1.fastq /scratch
cp input_reads_pair_2.fastq /scratch
cp -r trinity_out /scratch/
Trinity --seqType fq --max_memory 100G --left /scratch/input_reads_pair_1.fastq --right /scratch/input_reads_pair_2.fastq --SS_lib_type FR --output /scratch/trinity_out/ --CPU $SLURM_NTASKS_PER_NODE
# copy output to the current directory
cp -r /scratch/trinity_out/ .
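Because each step starts from the trinity_out directory copied back by the previous one, the four jobs have to run in order. One way to arrange this is with Slurm job dependencies, sketched below. The function name `submit_chain` and the `SUBMIT_CMD` variable are hypothetical and exist only for this illustration; `--parsable` and `--dependency=afterok:` are standard `sbatch` options.

```shell
#!/bin/bash
# Hypothetical helper: submit scripts in order, each one starting only
# after the previous job has finished successfully (afterok).
submit_chain() {
    # SUBMIT_CMD is an assumption of this sketch; it allows a dry run
    # with a stand-in command. It defaults to the real sbatch.
    local submit_cmd=${SUBMIT_CMD:-sbatch}
    local prev=""
    local jid script
    for script in "$@"; do
        if [ -z "$prev" ]; then
            jid=$($submit_cmd --parsable "$script") || return 1
        else
            jid=$($submit_cmd --parsable --dependency=afterok:"$prev" "$script") || return 1
        fi
        # With --parsable, sbatch prints "jobid" or "jobid;cluster".
        jid=${jid%%;*}
        echo "$script -> job $jid"
        prev=$jid
    done
}
```

A typical call would be `submit_chain trinity_step1.submit trinity_step2.submit trinity_step3.submit trinity_step4.submit`. If any step fails, Slurm does not start the later jobs, since their `afterok` dependency can never be satisfied.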
Trinity writes a number of files to its output directory, trinity_out/, after each executed step. The file Trinity.fasta is the final Trinity output and contains the assembled transcripts.
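After the last step, a quick sanity check is to confirm that the final FASTA file exists and to count the sequences in it. The sketch below does this with standard tools only. The function name `check_assembly` is hypothetical, and the path to pass in depends on where the output was copied.

```shell
#!/bin/bash
# Hypothetical helper: print the number of sequences in a FASTA file.
# Fails if the file is missing or empty, or if it contains no header lines.
check_assembly() {
    local fasta=$1
    if [ ! -s "$fasta" ]; then
        echo "check_assembly: $fasta is missing or empty" >&2
        return 1
    fi
    # Each FASTA record starts with a '>' header line.
    # grep -c prints the count and returns non-zero when the count is 0.
    grep -c '^>' "$fasta"
}
```

For the scripts above, this might be run as `check_assembly trinity_out/Trinity.fasta` from the directory the job was submitted from.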
The Inchworm (step 2) and Chrysalis (step 3) steps can be memory intensive. As a rule of thumb, allow about 1 GB of RAM per 1 million ~76-base Illumina paired-end reads.
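The rule of thumb above can be turned into a rough estimate for the `--mem` and `--max_memory` values. The sketch below makes several assumptions: the function names are hypothetical, the FASTQ file is uncompressed with exactly four lines per record, and the result is rounded up to a whole number of GB. Treat the number as a starting point, not a guarantee.

```shell
#!/bin/bash
# Hypothetical helpers for a rough memory estimate.

# count_fastq_reads FILE: number of records in an uncompressed FASTQ file
# (assumes the standard 4 lines per record).
count_fastq_reads() {
    local lines
    lines=$(wc -l < "$1")
    echo $(( lines / 4 ))
}

# estimate_mem_gb READS: about 1 GB per 1 million reads, rounded up,
# with a floor of 1 GB.
estimate_mem_gb() {
    local reads=$1
    local gb=$(( (reads + 999999) / 1000000 ))
    if [ "$gb" -lt 1 ]; then
        gb=1
    fi
    echo "$gb"
}
```

For example, `estimate_mem_gb "$(count_fastq_reads input_reads_pair_1.fastq)"` gives an estimate based on the read count of one mate file. Whether the rule counts read pairs or individual reads is not stated, so doubling the estimate is the more cautious reading.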