Preventing File Loss

Each research group is allocated 50TB of storage in /work on HCC clusters. With over 400 active groups, HCC does not have the resources to provide regular backups of /work without sacrificing the performance of the existing filesystem. No matter how careful a user might be, there is always the risk of file loss due to user error, natural disasters, or equipment failure.

However, there are a number of solutions available for backing up your data. By carefully considering the benefits and limitations of each, users can select the backup methods that work best for their particular needs. For truly robust file backups, we recommend combining multiple methods. For example, use Git regularly along with manual backups to an external hard drive at regular intervals, such as monthly or biannually.


1. Use your local machine:

If you have sufficient hard drive space, regularly back up your /work directories to your personal computer using either Globus Connect or an SCP client, such as Cyberduck or WinSCP. For help setting up an SCP client, check out our Connecting Guides. To avoid filling up your personal hard drive, consider using an external drive, which can easily be placed in a fireproof safe or at an off-site location for an extra level of protection.

For those worried about personal hard drive crashes, UNL offers the backup service NSave. For a small monthly fee, users can install software that will automatically back up selected files from their personal machine.

Benefits:

Limitations:

  • The amount you can back up is limited by available hard drive space.
  • Manual backups of many files can be time-consuming.
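
One way to reduce the time spent transferring many small files is to bundle a directory into a single archive first. The following is a minimal sketch, not an HCC-provided tool: the function name archive_directory and its arguments are hypothetical, and it assumes Python 3 is available wherever the files live. It only creates the archive; moving it to your own machine is still done with Globus Connect or an SCP client.

```python
import tarfile
import time
from pathlib import Path


def archive_directory(source_dir, dest_dir):
    """Pack source_dir into a timestamped .tar.gz inside dest_dir.

    Returns the path of the new archive. dest_dir should live outside
    source_dir so the archive does not end up inside itself.
    """
    source = Path(source_dir).resolve()
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)

    # A timestamp in the file name keeps earlier backups from being overwritten.
    stamp = time.strftime("%Y%m%d-%H%M%S")
    archive_path = dest / f"{source.name}-{stamp}.tar.gz"

    with tarfile.open(archive_path, "w:gz") as tar:
        # arcname stores paths relative to the directory name, not the full path.
        tar.add(source, arcname=source.name)
    return archive_path
```

For example, archive_directory("my_project", "backups") would write a file such as backups/my_project-<timestamp>.tar.gz; both directory names here are placeholders.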

2. Use Git to preserve files and revision history:

Git is a revision control system that can be run locally or paired with a repository hosting service, such as GitHub, to provide a remote backup of your files. Git works best with smaller files such as source code and manuscripts. Anyone with an InCommon login can use UNL’s GitLab instance for free.

Benefits:

  • Git is naturally collaboration-friendly: it allows multiple people to easily work on the same project and provides built-in tools for controlling contributions and managing conflicting changes.
  • Create individual repositories for each project, allowing you to compartmentalize your work.
  • Using UNL’s GitLab instance allows you to create private or internal (accessible by anyone within your organization) repositories.

Limitations:

  • Git is not designed to handle large files. GitHub does not allow files larger than 100MB unless you use their Git Large File Storage. With other repository hosts, tracking files over 1GB in size can be time-consuming and lead to errors.
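
Before committing, it can help to check a project for files that exceed the host's size limit. The sketch below is illustrative only: find_large_files is a hypothetical helper, and the default limit simply reuses the 100MB figure quoted above (interpreting it as 100 * 1000 * 1000 bytes is an assumption, so adjust it for your repository host).

```python
import os


def find_large_files(root, limit_bytes=100 * 1000 * 1000):
    """Return (path, size) pairs under root larger than limit_bytes.

    Results are sorted from largest to smallest. Git's own .git
    directory is skipped, as are symbolic links.
    """
    large = []
    for dirpath, dirnames, filenames in os.walk(root):
        # Editing dirnames in place stops os.walk from descending into .git.
        dirnames[:] = [d for d in dirnames if d != ".git"]
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.islink(path):
                continue
            size = os.path.getsize(path)
            if size > limit_bytes:
                large.append((path, size))
    return sorted(large, key=lambda item: item[1], reverse=True)
```

Any files this reports are candidates for Git Large File Storage or for one of the other backup methods described here.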

3. Use Attic:

HCC offers long-term, near-line data storage through Attic. HCC users with an existing account can apply for an Attic account for a small annual fee that is substantially lower than the cost of other cloud services.

Benefits:

  • Attic files are backed up regularly at both HCC locations in Omaha and Lincoln to help provide disaster tolerance and a second security layer against file loss.
  • No limits on individual or total file sizes.
  • High-speed data transfers between Attic and the clusters when using Globus Connect and HCC’s high-speed data servers.

Limitations:

  • Backups must be done manually, which can be time-consuming. Setting up automated scripts can help speed up this process.
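
As one possible building block for such an automated script, the sketch below records a SHA-256 checksum for every file in a directory and reports which files are new or changed since the previous run, so only those need to be transferred. It is a sketch under assumptions: the function names are hypothetical, it does not talk to Attic itself, and the actual transfer would still go through Globus Connect or another supported method.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1024 * 1024):
    """Return the SHA-256 hex digest of a file, read in chunks to limit memory use."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root):
    """Map each file's path (relative to root) to its SHA-256 digest."""
    root = Path(root)
    return {
        p.relative_to(root).as_posix(): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def changed_files(old_manifest, new_manifest):
    """Return relative paths that are new or whose digest differs from old_manifest."""
    return sorted(
        rel
        for rel, digest in new_manifest.items()
        if old_manifest.get(rel) != digest
    )
```

Saving each manifest alongside the backup (for instance as a JSON file) also gives you a way to verify later that the stored copies match the originals.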

4. Use a cloud-based service, such as Box:

Many of us are familiar with services such as Google Drive, Dropbox, Box, and OneDrive. These cloud-based services provide a convenient portal for accessing your files from any computer. NU offers OneDrive and Box services to all students, staff, and faculty. But did you know that you can link your Box account to HCC’s clusters to provide quick and easy access to files stored there? After a few setup steps, you can add files to and access files stored in your Box account directly from HCC clusters. Set up your submit scripts to automatically upload results as they are generated, or use Box interactively to store important workflow scripts and maintain a backup of your analysis results.

Benefits:

  • Box@UNL offers unlimited file storage while you are associated with UNL.
  • Integrating with HCC clusters provides a quick and easy way to automate backups of analysis results and workflow scripts.

Limitations:

  • Box limits the size of individual files. Larger files will need to be backed up using an alternate method.

5. Copy important files to /home:

While /work files and directories are not backed up, files and directories in /home are backed up on a daily basis. Due to the limitations of the /home filesystem, we strongly recommend that only source code and compiled programs are backed up to /home. If you do use /home to back up datasets, please keep a working copy in your /work directories to prevent negatively impacting the functionality of the cluster.

Benefits:

  • No need to make manual backups. /home files are automatically backed up daily.
  • Files in /home are not subject to the 6-month purge policy that exists on /work.
  • Doesn’t require the use of third-party software or tools.

Limitations:

  • Home storage is limited to 20GB per user. Larger file sets will need to be backed up using an alternate method.
  • Home is low performance and not suitable for active job output.
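
Because of the 20GB limit, it is worth checking how large a directory is before copying it to /home. The following sketch shows one way to do that. The helper names are hypothetical, the quota constant simply restates the figure above (treating 20GB as 20 * 1024**3 bytes is an assumption), and the amount of /home space you already use must be supplied by you.

```python
import os

# Assumption: the 20GB per-user figure from the text, expressed in bytes.
HOME_QUOTA_BYTES = 20 * 1024**3


def directory_size(root):
    """Return the total size in bytes of all regular files under root."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            # Skip symbolic links so linked files are not counted.
            if not os.path.islink(path):
                total += os.path.getsize(path)
    return total


def fits_in_quota(candidate_dir, already_used_bytes, quota_bytes=HOME_QUOTA_BYTES):
    """Return True if copying candidate_dir would stay within quota_bytes."""
    return already_used_bytes + directory_size(candidate_dir) <= quota_bytes
```

If fits_in_quota returns False, the directory is better suited to one of the other methods described above.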

If you would like more information or assistance in setting up any of these methods, contact us at hcc-support@unl.edu.