Messages & Announcements

  • 2018-11-28:  Crane: GPU driver update Wed., Nov 28 @ 9am
    Category:  Maintenance

    The Crane GPU nodes will be unavailable on Wednesday, Nov. 28th starting at 9am to apply GPU driver updates. This update is necessary to support newer versions of the CUDA Toolkit. Jobs will be held prior to the maintenance and will resume normally afterwards. During this time, running a Jupyter Notebook with GPU support will not be possible. We expect the maintenance to be concluded no later than 5pm the same day.

    Please contact hcc-support@unl.edu with any questions or issues regarding this maintenance.


  • 2018-10-23:  Upcoming removal of $WORK on Tusker
    Category:  General Announcement

    During the upcoming relocation of the Tusker resource scheduled for early 2019, ALL DATA ON THE $WORK FILE SYSTEM WILL BE LOST. In preparation for the migration, USERS WILL NEED TO MOVE ANY IRREPLACEABLE DATA FROM $WORK AS SOON AS POSSIBLE AND NO LATER THAN DECEMBER 15TH. After the move completes successfully, data can then be reloaded onto Tusker.

    In the interim, possible storage destinations are $COMMON and Attic. $COMMON offers the convenience of keeping the data accessible from Crane. For additional data security, our extended resource Attic provides a backed-up option at a minimal cost.

    For more information on the relocation, please see the announcement posted in the September release of the Holland newsletter: https://newsroom.unl.edu/announce/holland/8444/48362

    If you have any questions or concerns, please contact us at hcc-support@unl.edu so we can assist you.


  • 2018-10-19:  Anvil, Crane and Tusker services restored
    Category:  General Announcement

    HCC's datacenter at PKI in Omaha suffered an unexpected power outage the morning of Friday, Oct 19th during a preventative maintenance window.

    This type of maintenance has occurred without issue many times in the past. It requires that the datacenter UPS (battery backup) be bypassed, meaning all equipment relies directly on city power. While the bypass was in place, there was an issue with the city power feed that caused many servers to reboot unexpectedly and various pieces of networking equipment to fail.

    HCC staff have worked throughout the day to restore services and believe this has now been accomplished. All services hosted at PKI were affected, including:

    - ANVIL: Many VM hosts were rebooted, including the instances running on those hosts. Please check your instances and contact hcc-support@unl.edu with your instance ID if you have any problems.

    - CRANE / TUSKER: Running jobs were killed, and users should check any /home and /work files that may have been open or in the process of being written. Files being written during the power outage are likely lost or corrupted.

    - COMMON Filesystem: Users should check that their files exist and are accessible. Files being written during the power outage are likely lost or corrupted.

    This is the first major power issue at this datacenter in a very long time, and we will investigate and take any possible actions to prevent it from happening again. At this time it appears to have been a very unfortunate coincidence: the equipment was off battery power at the moment the main power feed failed unexpectedly.

    Please contact hcc-support@unl.edu with any questions or issues resulting from this outage.
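
    One way to spot files that may have been caught mid-write is to list files modified during the outage window and inspect them by hand. This is only a hedged illustration; the directory is a placeholder (on the clusters you would point it at your $WORK or /home area), and -newermt assumes GNU find:

```shell
#!/bin/sh
# Sketch: list files modified during the Oct 19th outage window so they
# can be checked for truncation or corruption. The directory below is a
# placeholder; the mkdir only makes the snippet runnable off-cluster.
DIR="${WORK:-/tmp/demo_work}"
mkdir -p "$DIR"
# GNU find's -newermt compares against a timestamp; this prints regular
# files last modified on Oct 19th, 2018.
find "$DIR" -type f -newermt "2018-10-19 00:00" ! -newermt "2018-10-20 00:00" -print
```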


  • 2018-10-19:  Anvil, Crane and Tusker impacted by power outage at PKI data center
    Category:  General Announcement

    HCC staff are investigating nodes impacted now and are working to bring the systems back online. A follow-up announcement will be sent once the systems are brought to a production state.


  • 2018-10-15:  Rolling updates on HCC cluster resources
    Category:  General Announcement

    Security updates are being applied to the Crane, Sandhills and Tusker clusters. The rolling updates are applied as soon as the oldest job on a worker node finishes, which can keep new jobs from starting until the updates are applied. As a result, pending jobs may report lengthened wait times in the queue with a reason of "Priority," "Resources," or "ReqNodeNotAvail, UnavailableNodes," particularly if the time requested by a job exceeds the time remaining before the worker node's resources drain. Shorter jobs will likely have more resources to run on over the next couple of days, provided they finish before the oldest job completes on draining worker nodes.

    All updates will be finished by 10PM Friday, October 19th.


