FAQ

I have an account, now what?
How do I change my password?
I forgot my password, how can I retrieve it?
I just deleted some files and didn't mean to! Can I get them back?
How do I (re)activate Duo?
How many nodes/memory/time should I request?
I am trying to run a job but nothing happens?
I keep getting the error "slurmstepd: error: Exceeded step memory limit at some point." What does this mean and how do I fix it?
I keep getting the error "Some of your processes may have been killed by the cgroup out-of-memory handler." What does this mean and how do I fix it?
I keep getting the error "Job cancelled due to time limit." What does this mean and how do I fix it?
I want to talk to a human about my problem. Can I do that?
My submitted job takes long time waiting in the queue or it is not running?
What IP's do I use to allow connections to/from HCC resources?
Why my job is showing (ReqNodeNotAvail, Reserved for maintenance) before a downtime?
Why my Open OnDemand JupyterLab or Interactive App Session is stuck and is not starting?
My directories are not full, but my Open OnDemand JupyterLab Session is still not starting. What else should I try?
Why my Open OnDemand RStudio Session is crashing?
Why I can not access files under shared Attic/Swan Globus Collection?
I am graduating soon, what will happen with my HCC account?
I need more resources than I can select with Open OnDemand Apps, can I do that?
How can I check which directories utilize the most storage on Swan?
I want to create HCC account, but when I try to request one, I am getting the error "Your account email must match the email on record for this group". What should I do?
I want to compress large directory with many files. How can I do that?
Can HCC provide training for my group?
Can HCC provide help and resources for my workshop?
Where can I get training on using HCC resources?
My job is submitted to the highmem partition and is pending with QOSMinMemory reason. What does this mean?
I used Globus to copy my data across HCC file systems. How can I check that all data was successfully transferred and the data checksums match?

I have an account, now what?¶

Congrats on getting an HCC account! Now you need to connect to a Holland cluster. To do this, we use an SSH connection. SSH stands for Secure Shell, and it allows you to securely connect to a remote computer and operate it just like you would a personal machine.

Depending on your operating system, you may need to install software to make this connection. Check out our documentation on Connecting to HCC Clusters.

How do I change my password?¶

I forgot my password, how can I retrieve it?¶

Information on how to change or retrieve your password can be found on the documentation page: How to change your password

All passwords must be at least 8 characters in length and must contain at least one capital letter and one numeric digit. Passwords also cannot contain any dictionary words. If you need help picking a good password, consider using a (secure!) password generator such as this one provided by Random.org

To preserve the security of your account, we recommend changing the default password you were given as soon as possible.

I just deleted some files and didn't mean to! Can I get them back?¶

That depends. Where were the files you deleted?

If the files were in your $HOME directory (/home/group/user/): It's possible.

$HOME directories are backed up daily and we can restore your files as they were at the time of our last backup. Please note that any changes made to the files between when the backup was made and when you deleted them will not be preserved. To have these files restored, please contact HCC Support at hcc-support@unl.edu as soon as possible.

If the files were in your $WORK directory (/work/group/user/): No.

Unfortunately, the $WORK directories are created as a short term place to hold job files. This storage was designed to be quickly and easily accessed by our worker nodes and as such is not conducive to backups. Any irreplaceable files should be backed up in a secondary location, such as Attic, the cloud, or on your personal machine. For more information on how to prevent file loss, check out Preventing File Loss.

How do I (re)activate Duo?¶

If you have not activated Duo before:

Please ~~stop by our offices~~ join our Remote Open Office hours or schedule another remote session at hcc-support@unl.edu and show your photo ID and we will be happy to activate it for you.

If you have activated Duo previously but now have a different phone number:

~~Stop by our offices along with a photo ID and we can help you reactivate Duo and update your account with your new phone number.~~

Join our Remote Open Office hours or schedule another remote session at hcc-support@unl.edu and show your photo ID and we will be happy to activate it for you.

If you have activated Duo previously and have the same phone number:

Email us at hcc-support@unl.edu from the email address your account is registered under and we will send you a new link that you can use to activate Duo.

How many nodes/memory/time should I request?¶

Short answer: We don’t know.

Long answer: The amount of resources required is highly dependent on the application you are using, the input file sizes and the parameters you select. Sometimes it can help to speak with someone else who has used the software before to see if they can give you an idea of what has worked for them.

Ultimately, it comes down to trial and error; try different combinations and see what works and what doesn’t. Good practice is to check the output and utilization of each job you run. This will help you determine what parameters you will need in the future.

For more information on how to determine how many resources a completed job used, check out the documentation on Monitoring Jobs.

I am trying to run a job but nothing happens?¶

Where are you trying to run the job from? You can check this by typing the command `pwd` into the terminal.

If you are running from inside your $HOME directory (/home/group/user/):

Move your files to your $WORK directory (/work/group/user) and resubmit your job. The $HOME folder is not meant for job output. You may be attempting to write too much data from the job.

If you are running from inside your $WORK directory:

Contact us at hcc-support@unl.edu with your login, the name of the cluster you are running on, and the full path to your submit script and we will be happy to help solve the issue.

I keep getting the error "slurmstepd: error: Exceeded step memory limit at some point." What does this mean and how do I fix it?¶

This error occurs when the job you are running uses more memory than was requested in your submit script.

If you specified --mem or --mem-per-cpu in your submit script, try increasing this value and resubmitting your job.

If you did not specify --mem or --mem-per-cpu in your submit script, chances are the default amount allotted is not sufficient. Add the line

#SBATCH --mem=<memory_amount>

to your script with a reasonable amount of memory and try running it again. If you keep getting this error, continue to increase the requested memory amount and resubmit the job until it finishes successfully.

For additional details on how to monitor usage on jobs, check out the documentation on Monitoring Jobs.

If you continue to run into issues, please contact us at hcc-support@unl.edu for additional assistance.

I keep getting the error "Some of your processes may have been killed by the cgroup out-of-memory handler." What does this mean and how do I fix it?¶

This is another error that occurs when the job you are running uses more memory than was requested in your submit script.

If you specified --mem or --mem-per-cpu in your submit script, try increasing this value and resubmitting your job.

If you did not specify --mem or --mem-per-cpu in your submit script, chances are the default amount allotted is not sufficient. Add the line

#SBATCH --mem=<memory_amount>

to your script with a reasonable amount of memory and try running it again. If you keep getting this error, continue to increase the requested memory amount and resubmit the job until it finishes successfully.

For additional details on how to monitor usage on jobs, check out the documentation on Monitoring Jobs.

If you continue to run into issues, please contact us at hcc-support@unl.edu for additional assistance.

I keep getting the error "Job cancelled due to time limit." What does this mean and how do I fix it?¶

This error occurs when the job you are running reached the time limit than was requested in your submit script without finishing successfully.

If you specified --time in your submit script, try increasing this value and resubmitting your job.

If you did not specify --time in your submit script, chances are the default runtime of 1 hour is not sufficient. Add the line

#SBATCH --time=<runtime>

to your script with increased runtime value and try running it again. The maximum runtime on Swan is 7 days (168 hours).

For additional details on how to monitor usage on jobs, check out the documentation on Monitoring Jobs.

If you continue to run into issues, please contact us at hcc-support@unl.edu for additional assistance.

I want to talk to a human about my problem. Can I do that?¶

Of course! We have an open door policy and invite you to ~~stop by either of our offices anytime Monday through Friday between 9 am and 5 pm. One of the HCC staff would be happy to help you with whatever problem or question you have.~~ join our Remote Open Office hours, schedule a remote session at hcc-support@unl.edu, or you can drop one of us a line and we'll arrange a time to meet: Contact Us.

My submitted job takes long time waiting in the queue or it is not running?¶

If your submitted jobs are taking long time waiting in the queue, that usually means your account is over-utilizing and your fairshare score is low, this might be due submitting big number of jobs over the past period of time; and/or the amount of resources (memory, time) you requested for your job is big. For additional details on how to monitor usage on jobs, check out the documentation on Monitoring queued Jobs.

What IP's do I use to allow connections to/from HCC resources?¶

Under normal circumstances no special network permissions are needed to access HCC resources. Occasionally, it may be necessary to whitelist the public IP addresses HCC utilizes. Most often this is needed to allow incoming connections for an external-to-HCC license server, but may also be required if your local network blocks outgoing connections. To allow HCC IP's, add the following ranges to the whitelist:

129.93.175.0/26
129.93.227.64/26
129.93.241.16/28

If you are unsure on how to do this, contact your local IT support staff for assistance. For additional questions or issues with this, please Contact Us.

Why my job is showing (ReqNodeNotAvail, Reserved for maintenance) before a downtime?¶

Jobs submitted before a downtime may pend and show (ReqNodeNotAvail, Reserved for maintenance) for their status. (Information on upcoming downtimes can be found at status.hcc.unl.edu.) Any job which cannot finish before a downtime is scheduled to begin will pend and show this message. For example, the downtime starts in 6 days but the script is requesting (via the --time option) 7 days of runtime. If you are sure your job can finish in time, you can lower the requested time to be less than the interval before the downtime begins (for example, 4 days if the downtime starts in 6 days). Use this with care however to ensure your job isn't prematurely terminated. Alternatively, you can simply wait until the downtime is completed. Jobs will automatically resume normally afterwards; no special action is required.

Why my Open OnDemand JupyterLab or Interactive App Session is stuck and is not starting?¶

The most common reason for this is full $HOME directory. You can check the size of the directories in $HOME by running ncdu on the terminal from the $HOME directory.

Then, please remove any unnecessary data; move data to $COMMON or $WORK; or back up important data elsewhere. If the majority of storage space in $HOME is utilized by conda environments, please move the conda environments or remove unused conda packages and caches.

My directories are not full, but my Open OnDemand JupyterLab Session is still not starting. What else should I try?¶

If the inode and storage quotas of the $HOME and $WORK directories on Swan are not exceeded, and your Open OnDemand JupyterLab Session is still not starting, there are two additional things you can check:

If a custom conda environment is not used, when installing packages with pip locally, these libraries are installed in $HOME/.local. When using other modules or applications that are based on Python/conda (e.g., Open OnDemand JupyterLab), these local installs can cause conflicts and errors. In this case, please rename the $HOME/.local directory (e.g., mv $HOME/.local $HOME/.local_old).
Please make sure that you don't set variables such as PYTHONPATH in your $HOME/.bashrc file. If you have this variable set in the $HOME/.bashrc file, please comment out that line and run source $HOME/.bashrc to apply the changes.

Whether you have renamed the $HOME/.local directory and/or modified the file $HOME/.bashrc, please cancel and restart your JupyterLab Session.

Why my Open OnDemand RStudio Session is crashing?¶

There are two main reasons why this may be happening: 1) The requested RAM is not enough for the analyses you are performing. In this case, please terminate your running Session and start a new one requesting more RAM. 2) Some R packages installed as part of the OOD RStudio App may be incompatible with each other. In this case, please terminate your running Session and rename the directory where these packages are installed (e.g., mv $HOME/R $HOME/R.bak). To reduce the number of R packages you need to install, please use specific variants such as Bioconductor, Tidyverse or Geospatial when needed instead of installing Bioconductor packages using the OOD RStudio Basic variant for example.

Why I can not access files under shared Attic/Swan Globus Collection?¶

In some occasions, errors such as "Mapping collection to specified ID failed.", may occur when accessing files from shared Attic/Swan Globus Collection. In order to resolve this issue, the owner of the collection needs to login to Globus and activate the hcc#attic or hcc#swan endpoint respectively. This should reactivate the correct permissions for the collection.

I am graduating soon, what will happen with my HCC account?¶

Access to HCC resources is separate from access to NU resources, so you do not lose access to HCC when you graduate.

If the HCC account is part of a research group, the account will remain active until the owner of the group requests that the account needs to be deactivated or until the account hasn't been used for a minimum of an year, whichever comes first.
If the account holder continues collaborating with the HCC group owner as an outside collaborator, a proof of collaboration may be required. For more information on the User regulations please see here.
If the account is only part of a course group, then according to our class policy, the account will be deactivated one week after the course end date.

I need more resources than I can select with Open OnDemand Apps, can I do that?¶

The Open OnDemand Apps are meant to be used for learning, development and light testing and have limited resources compared to the resources available for batch submissions. If the resources provided by OOD Apps are not enough, then they should migrate their workflow to batch script.

How can I check which directories utilize the most storage on Swan?¶

You can run ncdu from the Swan terminal with the location in question and (re)move directories and data if needed, e.g.,:

ncdu $HOME/my-folder

Note

If you have thousands or millions files in a location on Swan, please run ncdu only on a sub-directory you suspect may contain large numbers of files.

You may also use ncdu on locations in $WORK or $COMMON. Note that running ncdu puts additional load on the filesystem(s), so please run it sparingly. HCC suggests running ncdu once and saving the output to a file; ncdu will read from this file instead of potentially scanning the filesystem multiple times. To run ncdu in this manner, first scan the location using the -o option

ncdu -o ncdu_output.txt $HOME/my-folder

Then use the -f option to start ncdu graphically using this file, i.e.

ncdu -f ncdu_output.txt

Note that re-reading the filesystem to see changes in real time is not supported in this mode. After making changes (deleting/moving files), a new output file will need to be created and read by repeating the steps above.

I want to create HCC account, but when I try to request one, I am getting the error "Your account email must match the email on record for this group". What should I do?¶

This error message indicates that you have probably selected the "I am the owner of this group and this account is for me." checkbox when filling out the New User Request Form. This checkbox should be selected only by the owner of the HCC group. If you are not the owner of the HCC group, please do not select this checkbox and try re-submitting the form again.

I want to compress large directory with many files. How can I do that?¶

In general, we recommend using zip as the archive format as zip files keep an index of the files. Moreover, zip files can be quickly indexed by the various zip tools, and allow extraction of all files or a subset of files. To compress the directory named input_folder into output.zip, you can use:

zip -r output.zip input_folder/

If you don't need to list and extract subsets of the archived data, we recommend using tar instead. To compress the directory named input_folder into output.tar.gz, you can use:

tar zcf output.tar.gz input_folder/

Depending on the size of the directory and number of files you want to compress, you can perform the compressing via an Interactive Job or SLURM job.

Can HCC provide training for my group?¶

HCC can provide an introductory training (up to 2 hours) for groups (more than 2 people) on request via Zoom and In-Person.

Before submitting a request for training, please ensure everyone who will be attending has an active HCC account and has activated DUO for their HCC account.

Training requests can be submitted to hcc-support@unl.edu. Please include:

A list of available times
How many will be attending
Preference on Zoom or on-site training
If on-site, what location is it at?

An HCC staff member will reach out to confirm the date and location.

HCC also provides virtual Open Office Hours every Tuesday and Thursday from 2-3 PM. More details are available on the Office Hours webpage.

Can HCC provide help and resources for my workshop?¶

We are happy to help with your workshop! We are able to provide up to 40 demo accounts for participants who don't already have HCC accounts and can have a staff member on-site to provide assistance with issues related to HCC and answering HCC related questions.

Please submit your request atleast 1 month in advance. Requests may not be able to be fulfilled based on staff availability. It is strongly recommended to involve HCC staff during the initial planning of the hands-on portion of the workshop in order to provide smooth and timely experience with HCC resources.

Before submitting a request for workshop support, please fully test your materials using Swan and provide us a complete list of any software packages and environments that are needed.

Workshop support requests can be submitted to hcc-support@unl.edu. Please include:

The date(s) and time(s) HCC will be utilized during the workshop, including any neccessarry setup.
How many will be attending. If possible, please provide how many already have HCC accounts.
Location of the workshop
A list of software packages or environments that everyone would need.
If you are using Open OnDemand JupterLab or RStudio, we can create custom kernel/image for the purpose of the workshop.
If you are using a conda environment, we can create a shared environment for participants to use without the need to have participants creating their own.

An HCC staff member will reach out to confirm the date and location.

Where can I get training on using HCC resources?¶

HCC provides free and low cost training events throughout the year. Most events are held in-person, but some will be hybrid or Zoom.

New events are posted on our upcoming events page and announced through our hcc-announce mailing list.

Past events and their materials are also available on our past events page.

My job is submitted to the highmem partition and is pending with QOSMinMemory reason. What does this mean?¶

The majority of nodes in the batch partition on Swan have 256GBs of RAM, with a few nodes with up to 2TBs of RAM. To ensure that the jobs that require lots of memory will run on the nodes with more RAM memory, SLURM uses the highmem partition, which is part of the batch partition. This is not an actual partition, so it can not be separately used. SLURM internally submits the job to both highmem and batch partitions, and depending on the requested RAM memory, allocates the requested resources. During this process, when checking the job status, you may see:

$ squeue -u demo
             JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
           1000000 highmem,b job_name demo PD       0:00      1 (QOSMinMemory)

This message means that the job does not require high memory and it will be submitted to the batch partition when the requested resources are available. Once this internal process is completed, the NODELIST(REASON) message will be updated accordingly.

Please note that highmem,b is truncated from highmem,batch. The expanded output can be seen with:

$ squeue -u demo -o "%.18i %.20P %.8j %.8u %.2t %.10M %.6D %R"
             JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
           1000000 highmem,batch job_name demo PD       0:00      1 (QOSMinMemory)

Note

The number of nodes with high memory is limited, so please only request high amounts of memory if the job really needs it. Otherwise, you may encounter longer waiting times, lower submission priority and underutilized resources.

I used Globus to copy my data across HCC file systems. How can I check that all data was successfully transferred and the data checksums match?¶

Globus automatic file integrity verification using checksums is turned off for HCC-specific Globus collections/endpoints. By performing automatic file integrity verification using checksums via Globus all the source and destination files are read again and compared, and this adds significant I/O load on the HCC file-systems. All HCC-specific file-systems have already built-in integrity checksums, thus additional verification is not needed. If the status of your Globus transfer is SUCCEEDED, then the checksums of the source and destination files should match.

If you would still like to compare the checksums of the transferred files regardless of the Globus transfer status, there are two separate ways how you can achieve this:

Start another Globus transfer and select both "sync - only transfer new or changed files" and "checksum is different" under "Transfer & Timer Options" under the File Manager Globus tab.
Run rsync with the --checksum option using an Interactive Session or SLURM job.