Handling Data

Sensitive and Protected Data
HCC currently has no storage that is suitable for HIPAA-regulated or other protected/personally identifiable (PID) data sets. Users are not permitted to store such data on HCC machines.

All HCC machines provide three separate storage areas for every user, each intended for a different purpose:

  • /home - your home directory, with a 20 GiB quota; backed up for best-effort disaster recovery.
  • /work - the high-performance, I/O-focused area for running jobs; it has a 50 TiB per-group quota, is not backed up, and is subject to a purge policy after 6 months of inactivity on a file.
  • /common - works similarly to /work but is mounted read-write on all HCC clusters, so files on /common can be accessed from every cluster, unlike /home and /work, which are cluster-specific.

More information on the three storage areas on HCC's clusters is available on the Data Storage page.
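As a quick orientation, the commands below show how you might check your usage in each area from a login shell. The paths are illustrative (assumed here to follow /home/<group>/<user> and similar); substitute your own group and username.

    # Check current usage in each storage area (paths are illustrative).
    du -sh /home/mygroup/myuser      # 20 GiB quota, backed up
    du -sh /work/mygroup/myuser      # 50 TiB per group, purged after 6 months of inactivity
    du -sh /common/mygroup/myuser    # mounted read-write on all HCC clusters

    # Keep active job input/output on /work for performance.
    cd /work/mygroup/myuser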

HCC also offers Attic, a separate near-line archive with space available for lease. Attic provides large-scale data storage designed to be more reliable than /work and larger than /home. More information on Attic and how to transfer data to and from it can be found on the Using Attic page.

You can also use your UNL OneDrive account to upload and download files from any of the HCC clusters.

For moving general data into or out of HCC resources, scp is recommended for command-line transfers on Windows 10, Windows 11, macOS, and Linux; for graphical transfers, WinSCP is recommended on Windows and Cyberduck on macOS and Linux.
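For example, a command-line transfer with scp might look like the following; the hostname swan.unl.edu and the paths are illustrative, so substitute your own cluster hostname, group, and username.

    # Copy a local file to your /work directory on the cluster.
    scp data.tar.gz myuser@swan.unl.edu:/work/mygroup/myuser/

    # Copy a results file from the cluster back to the current local directory.
    scp myuser@swan.unl.edu:/work/mygroup/myuser/results.csv .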

For moving large amounts of data into or out of HCC resources, users are strongly encouraged to use Globus Connect.
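If you also use the Globus command-line interface, a transfer can be submitted as sketched below; this is a minimal sketch, and the two endpoint UUIDs are placeholders for your source and destination collections.

    # Authenticate once, then submit an asynchronous transfer between
    # two Globus collections (UUIDs are placeholders).
    globus login
    globus transfer \
        aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee:/work/mygroup/myuser/dataset \
        ffffffff-0000-1111-2222-333333333333:/destination/dataset \
        --recursive --label "HCC data transfer"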

If you have space requirements outside what is currently provided or any questions regarding moving data around, please email hcc-support@unl.edu.

For information on using /scratch storage space to improve running jobs, see the Using Scratch page.
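As an illustration of the pattern that page describes, a job can stage its input to fast node-local scratch, compute there, and copy results back before exiting. The sketch below assumes SLURM as the scheduler and a node-local /scratch path; the application name and paths are placeholders, so check the Using Scratch page for the exact locations on each cluster.

    #!/bin/bash
    #SBATCH --time=01:00:00
    #SBATCH --mem=4gb
    #SBATCH --job-name=scratch-demo

    # Stage input to node-local scratch (path assumed for illustration).
    cp /work/mygroup/myuser/input.dat /scratch/

    # Run the computation against the local copy (placeholder command).
    my_app --in /scratch/input.dat --out /scratch/output.dat

    # Copy results back before the job ends; node-local scratch does not
    # persist after the job finishes.
    cp /scratch/output.dat /work/mygroup/myuser/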

Storing public software-specific and research datasets

Software-specific datasets

Many software packages available on Swan (e.g., AlphaFold, HUMAnN) require accompanying datasets. Where possible, HCC has pre-downloaded these datasets and configured the modules to use them, which avoids any quota and purge-policy issues.

If you are unsure whether a dataset required by the software is already available on Swan, please check the module information, or email hcc-support@unl.edu before attempting to download it yourself.
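For example, with the Lmod commands available on Swan you can look up a package and read its help text, which typically notes any pre-downloaded datasets; alphafold is used here purely as an illustration.

    # Search the module tree for a package and its versions.
    module spider alphafold

    # Read the module's help text for dataset locations and usage notes.
    module help alphafold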

Research datasets

Many public datasets are commonly used for running jobs across various scientific fields. To avoid any per-user or per-group quota issues, HCC can host these datasets in a system-wide location on Swan that is excluded from the purge policy, so the entire HCC community can benefit from a shared copy.

HCC currently hosts a few public datasets on Swan that can be accessed via data modules:

  • biodata/1.0 - Static data resources for bioinformatics/computational biology
  • mldata/1.0 - Static data resources for machine-learning/AI (e.g., ImageNet, TCGA, CAMELYON, TCIA)
  • mridata/1.0 - Static data resources for MRI/NeuroImaging (e.g., Penn Memory Center 3T ASHS 1.0 Atlas)
  • geodata/1.0 - Static data resources for geo data (e.g., NLDAS-2)
  • chemdata/1.0 - Static data resources for computational chemistry (e.g., Tetramers, Zinc)

If you are unsure whether a public dataset is already available on Swan, please check the information for the available data modules (e.g., module help mldata/1.0), or email hcc-support@unl.edu before attempting to download the dataset yourself.
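For instance, you can inspect a data module before loading it; the module names come from the list above, while the exact environment variables each module sets (shown by module show) vary per module.

    # Read the help text and see what environment the module sets up.
    module help mldata/1.0
    module show mldata/1.0

    # Load the module, then reference the paths/variables it defines.
    module load mldata/1.0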

To request a version update of the system-wide available datasets, please email hcc-support@unl.edu.

If you have a licensed dataset you want to share with your research group, please email hcc-support@unl.edu for assistance.