Downloading SRA data from NCBI

One way to download high-volume data from NCBI is to use command-line utilities such as wget, ftp, or the Aspera Connect ascp plugin. Aspera Connect is a commonly used high-performance transfer plugin and typically offers the highest transfer speed of the three.

This plugin is available on our clusters as a module. In order to use it, load the appropriate module first:

$ module load aspera-cli

The basic usage of the Aspera plugin is:

$ ascp -i $ASPERA_PUBLIC_KEY -k 1 -T -l <max_download_rate_in_Mbps>m anonftp@ftp.ncbi.nlm.nih.gov:<files_to_transfer> <local_work_output_directory>
where -k 1 enables resuming of partial transfers, -T disables encryption for maximum throughput, and -l sets the maximum transfer rate.

The <files_to_transfer> argument in the basic usage must follow a specific pattern:

<files_to_transfer> = /sra/sra-instant/reads/ByRun/sra/SRR|ERR|DRR/<first_6_characters_of_accession>/<accession>/<accession>.sra
where SRR|ERR|DRR is one of SRR, ERR, or DRR and must match the prefix of the target .sra file.
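
The pattern can be built mechanically from an accession. The following is a minimal sketch in POSIX shell; the function name sra_remote_path is a hypothetical helper introduced here for illustration and is not part of the Aspera tools.

```shell
# Hypothetical helper (not part of the Aspera CLI): print the remote path
# for one accession, following the <files_to_transfer> pattern.
sra_remote_path() {
    accession=$1
    prefix=$(printf '%.3s' "$accession")   # first 3 characters: SRR, ERR or DRR
    first6=$(printf '%.6s' "$accession")   # first 6 characters, e.g. SRR304
    printf '/sra/sra-instant/reads/ByRun/sra/%s/%s/%s/%s.sra\n' \
        "$prefix" "$first6" "$accession" "$accession"
}

sra_remote_path SRR304976
```

The helper only assembles the string; it does not check that the accession exists on the server.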

To list further ascp options, run:

$ ascp --help

For example, to download the SRR304976 file from NCBI into the data/ directory under your $WORK directory at a maximum rate of 1000 Mbps, use the following command:

$ ascp -i $ASPERA_PUBLIC_KEY -k 1 -T -l 1000m anonftp@ftp.ncbi.nlm.nih.gov:/sra/sra-instant/reads/ByRun/sra/SRR/SRR304/SRR304976/SRR304976.sra /work/[groupname]/[username]/data/
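
When several runs are needed, the same command can be generated once per accession. The sketch below only prints the commands so they can be reviewed before anything is transferred. The function name print_ascp_commands, its argument order, and the ./data/ destination are illustrative assumptions, not something provided by the aspera-cli module.

```shell
# Sketch: print (do not run) one ascp command per accession.
# Arguments: <destination_directory> <max_rate_in_Mbps> <accession>...
# print_ascp_commands is a hypothetical helper, not part of aspera-cli.
print_ascp_commands() {
    dest=$1
    rate_mbps=$2
    shift 2
    for acc in "$@"; do
        # Same path pattern as in the single-file example above.
        remote="/sra/sra-instant/reads/ByRun/sra/$(printf '%.3s' "$acc")/$(printf '%.6s' "$acc")/$acc/$acc.sra"
        # $ASPERA_PUBLIC_KEY is printed literally and expanded when the command is run.
        printf 'ascp -i "$ASPERA_PUBLIC_KEY" -k 1 -T -l %sm anonftp@ftp.ncbi.nlm.nih.gov:%s %s\n' \
            "$rate_mbps" "$remote" "$dest"
    done
}

print_ascp_commands ./data/ 1000 SRR304976
```

Further accessions can be appended as extra arguments. After checking the printed lines, they can be pasted into a shell in which the aspera-cli module is loaded.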