Messages & Announcements

2018-08-04: Crane: /work filesystem restored
Category: General Announcement
The /work filesystem for Crane is back in service. A filesystem check was completed with no errors found. Jobs which were running during the outage may have exceeded their time limit or had errors accessing data on /work. There was no data loss from this outage.

One of the storage servers experienced a disk drive failure, leading to a RAID controller reset. This caused the filesystems to report corruption and switch to read-only mode. While recovering the system, the initial consistency checks showed significant numbers of errors. However, these were likely spurious errors related to the journal. After the journal was replayed, the repair process went smoothly.
2018-08-04: Crane: /work filesystem unplanned downtime
Category: System Failure
The /work filesystem for Crane is partially unavailable. One of the storage servers experienced hardware issues leading to corruption on the Lustre /work filesystem. Filesystem consistency checks are currently running, and the output unfortunately suggests that data loss or corruption is likely.

Pending jobs will be held until the maintenance is complete.
2018-07-30: Anvil filesystem maintenance completed
Category: Maintenance
The maintenance on the Ceph filesystem has been completed and the system is available for use. Please check your VMs and reboot them if they appear unresponsive. As always, contact hcc-support@unl.edu if you require assistance.
2018-07-12: HCC Anvil filesystem maintenance planned --- 23rd July
Category: Maintenance
This announcement is for Ceph filesystem maintenance affecting Anvil only. The maintenance window starts at 9:00 AM on 23rd July and may take up to a week. Anvil will keep running during the maintenance; however, some performance impact is expected. Users are encouraged to shut down their VMs if possible. Running VMs may be suspended or rendered unresponsive during this timeframe. A follow-up announcement will be posted when the system is ready for production use.
2018-06-29: SANDHILLS: Unexpected outage
Category: System Failure
An unexpected power outage occurred in SCHORR at approximately 9:40 PM on Friday, June 29. Cluster infrastructure maintained service, but worker nodes rebooted. Please check the status of your running and queued jobs on SANDHILLS.
