Handling Data

Sensitive and Protected Data
HCC currently has no storage that is suitable for HIPAA-regulated or other protected/personally identifiable (PID) data sets. Users are not permitted to store such data on HCC machines.

All HCC machines provide three separate storage areas for every user, each intended for a different purpose:

  • /home - your home directory, with a 20 GiB quota; backed up for best-effort disaster recovery.
  • /work - the high-performance, I/O-focused area for running jobs; it has a 50 TiB per-group quota, is not backed up, and is subject to a purge policy after 6 months of inactivity on a file.
  • /common - works similarly to /work but is mounted read-write on all HCC clusters, so files on /common can be accessed from every cluster, unlike /home and /work, which are cluster-specific.

More information on the three storage areas on HCC's clusters is available on the Data Storage page.
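As a quick orientation, the commands below show how you might check your usage in each area from a login shell. The paths are illustrative (assumed here to follow /home/<group>/<user> and similar); substitute your own group and username.

    # Check current usage in each storage area (paths are illustrative).
    du -sh /home/mygroup/myuser      # 20 GiB quota, backed up
    du -sh /work/mygroup/myuser      # 50 TiB per group, purged after 6 months of inactivity
    du -sh /common/mygroup/myuser    # mounted read-write on all HCC clusters

    # Keep active job input/output on /work for performance.
    cd /work/mygroup/myuser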

HCC also offers Attic, a separate near-line archive with space available for lease. Attic provides large-scale data storage designed to be more reliable than /work and larger than /home. More information on Attic and how to transfer data to and from it can be found on the Using Attic page.

You can also use your UNL OneDrive account to upload and download files from any of the HCC clusters.

For moving general data into or out of HCC resources, scp is recommended for command-line transfers on Windows 10, Windows 11, macOS, and Linux; for graphical transfers, WinSCP is recommended on Windows and Cyberduck on macOS and Linux.
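For example, a command-line transfer with scp might look like the following; the hostname swan.unl.edu and the paths are illustrative, so substitute your own cluster hostname, group, and username.

    # Copy a local file to your /work directory on the cluster.
    scp data.tar.gz myuser@swan.unl.edu:/work/mygroup/myuser/

    # Copy a results file from the cluster back to the current local directory.
    scp myuser@swan.unl.edu:/work/mygroup/myuser/results.csv .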

For moving large amounts of data into or out of HCC resources, users are strongly encouraged to use Globus Connect.
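If you also use the Globus command-line interface, a transfer can be submitted as sketched below; this is a minimal sketch, and the two endpoint UUIDs are placeholders for your source and destination collections.

    # Authenticate once, then submit an asynchronous transfer between
    # two Globus collections (UUIDs are placeholders).
    globus login
    globus transfer \
        aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee:/work/mygroup/myuser/dataset \
        ffffffff-0000-1111-2222-333333333333:/destination/dataset \
        --recursive --label "HCC data transfer"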

If you have space requirements outside what is currently provided or any questions regarding moving data around, please email hcc-support@unl.edu.

For information on using /scratch storage space to improve running jobs, see the Using Scratch page.
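As an illustration of the pattern that page describes, a job can stage its input to fast node-local scratch, compute there, and copy results back before exiting. The sketch below assumes SLURM as the scheduler and a node-local /scratch path; the application name and paths are placeholders, so check the Using Scratch page for the exact locations on each cluster.

    #!/bin/bash
    #SBATCH --time=01:00:00
    #SBATCH --mem=4gb
    #SBATCH --job-name=scratch-demo

    # Stage input to node-local scratch (path assumed for illustration).
    cp /work/mygroup/myuser/input.dat /scratch/

    # Run the computation against the local copy (placeholder command).
    my_app --in /scratch/input.dat --out /scratch/output.dat

    # Copy results back before the job ends; node-local scratch does not
    # persist after the job finishes.
    cp /scratch/output.dat /work/mygroup/myuser/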

Storing public software-specific and research datasets

Software-specific datasets

Many software packages available on Swan (e.g., AlphaFold, HUMAnN) require accompanying datasets. Where possible, HCC has pre-downloaded these datasets and configured the modules to use them, which avoids any quota and purge-policy issues.

If you are unsure whether a dataset required by the software is already available on Swan, please check the module information, or email hcc-support@unl.edu before attempting to download it yourself.
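For example, with the Lmod commands available on Swan you can look up a package and read its help text, which typically notes any pre-downloaded datasets; alphafold is used here purely as an illustration.

    # Search the module tree for a package and its versions.
    module spider alphafold

    # Read the module's help text for dataset locations and usage notes.
    module help alphafold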

Research datasets

Many public datasets are commonly used for running jobs across various scientific fields. To avoid any per-user or per-group quota issues, HCC can host these datasets in a system-wide location on Swan that is excluded from the purge policy, so the entire HCC community can benefit from a shared copy.

HCC currently hosts a few public datasets on Swan that can be accessed via data modules:

  • biodata/1.0 - Static data resources for bioinformatics/computational biology
  • mldata/1.0 - Static data resources for machine-learning/AI (e.g., ImageNet, TCGA, CAMELYON, TCIA)
  • mridata/1.0 - Static data resources for MRI/NeuroImaging (e.g., Penn Memory Center 3T ASHS 1.0 Atlas)
  • geodata/1.0 - Static data resources for geo data (e.g., NLDAS-2)
  • chemdata/1.0 - Static data resources for computational chemistry (e.g., Tetramers, Zinc)

If you are unsure whether a public dataset is already available on Swan, please check the information for the available data modules (e.g., module help mldata/1.0), or email hcc-support@unl.edu before attempting to download the dataset yourself.
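For instance, you can inspect a data module before loading it; the module names come from the list above, while the exact environment variables each module sets (shown by module show) vary per module.

    # Read the help text and see what environment the module sets up.
    module help mldata/1.0
    module show mldata/1.0

    # Load the module, then reference the paths/variables it defines.
    module load mldata/1.0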

To request a version update of the system-wide available datasets, please email hcc-support@unl.edu.

If you have a licensed dataset you want to share with your research group, please email hcc-support@unl.edu for assistance.