- 2017-06-05: Crane /work filesystem downtime resolved
Category: General Announcement
The /work filesystem for Crane is restored as of 2:55pm.
One of the storage servers crashed and rebooted. A filesystem check was completed with no errors found. Running jobs which were accessing /work stalled until the filesystem was restored. This may have caused jobs to exceed their time limit. There was no data loss from this outage.
We believe the storage server crash was triggered by I/O delays as the RAID controller was rebuilding a failed disk drive. The rebuild is still running and we are monitoring the system.
- 2017-06-05: Crane /work filesystem unplanned downtime
Category: System Failure
The /work filesystem for Crane is partially unavailable. One of the storage servers crashed and rebooted. We are now running a filesystem check before placing the server back online. Pending jobs will be held until the maintenance is complete.
The filesystem check has been completed with no errors found. The /work filesystem is back online. Running jobs may be affected, but there was no data loss from this outage.
We believe the storage server crash was triggered by I/O delays as the RAID controller was rebuilding a failed disk drive. The rebuild is still running and we are monitoring the system.
- 2017-05-19: Crane: Maintenance complete
Category: General Announcement
Maintenance is complete on Crane. The changes made to the cluster are listed below. Please let us know of any trouble using the cluster by sending email to hcc-support@unl.edu.
Changes made during this downtime:
- All partitions on Crane now have a uniform maximum time limit of 7 days.
- The opa partition has been removed. To limit job submissions to nodes that have Omni-Path fabric, use the following Slurm directive when submitting jobs to the batch partition:
  #SBATCH --constraint=opa
- The opaguest partition has been renamed to guest. Priority-access nodes having Infiniband and Omni-Path fabrics have been added to this partition. Use
  #SBATCH --constraint=ib
  or
  #SBATCH --constraint=opa
  to limit which nodes your jobs will be considered on. Any job that runs within the guest partition can be preempted by jobs submitted by the owners of the respective hardware.
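The constraint directives above fit into an ordinary Slurm submit script. A minimal sketch follows; the job name, time limit, and payload command are hypothetical examples, not values required by the cluster:

```shell
#!/bin/bash
# Hypothetical submit script: restrict a guest-partition job to Infiniband nodes.
#SBATCH --partition=guest
#SBATCH --constraint=ib        # only schedule on nodes with Infiniband fabric
#SBATCH --time=01:00:00        # example time limit (assumed; max is 7 days)
#SBATCH --job-name=ib-example  # example job name

# Replace with your actual workload.
srun hostname
```

Submit with `sbatch script.sh`; swap `--constraint=ib` for `--constraint=opa` to target Omni-Path nodes instead.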
- 2017-05-17: Sandhills: Maintenance complete
Category: General Announcement
Maintenance is complete on Sandhills. Please let us know of any trouble using the cluster by sending email to hcc-support@unl.edu.
- 2017-05-11: Tusker: Maintenance complete
Category: General Announcement
Maintenance is complete on Tusker. Please let us know of any trouble using the cluster by sending email to hcc-support@unl.edu.