Messages & Announcements

  • 2018-08-05:  Tusker /work temporary outage
    Category:  General Announcement

    The Tusker /work filesystem unexpectedly went offline at 12:25pm today. Maintenance was performed and the system was restored less than an hour later. Accesses to /work may have hung during the outage, so users should check that their jobs did not time out. No data loss or other consequences from this event are expected.

  • 2018-08-04:  Crane: /work filesystem restored
    Category:  General Announcement

    The /work filesystem for Crane is back in service. A filesystem check was completed with no errors found. Jobs which were running during the outage may have exceeded their time limit or had errors accessing data on /work. There was no data loss from this outage.

    One of the storage servers experienced a disk drive failure, leading to a RAID controller reset. This caused the filesystems to report corruption and switch to read-only mode. While recovering the system, the initial consistency checks showed significant numbers of errors. However, these were likely spurious errors related to the journal. After the journal was replayed, the repair process went smoothly.

  • 2018-08-04:  Crane: /work filesystem unplanned downtime
    Category:  System Failure

    The /work filesystem for Crane is partially unavailable. One of the storage servers experienced hardware issues, leading to corruption on the Lustre /work filesystem. Filesystem consistency checks are currently running, and the output unfortunately suggests that data loss or corruption is likely.

    Pending jobs will be held until the maintenance is complete.

  • 2018-07-30:  Anvil filesystem maintenance completed
    Category:  Maintenance

    The maintenance on the Ceph filesystem has been completed and the system is available for use. Please check your VMs and reboot any that appear unresponsive. As always, contact HCC support if you require assistance.

  • 2018-07-12:  HCC Anvil filesystem maintenance planned --- 23rd July
    Category:  Maintenance

    This announcement is for Ceph filesystem maintenance affecting Anvil only. The maintenance window starts at 9:00 AM on 23rd July and may take up to a week. Anvil will keep running during the maintenance; however, some performance impact is expected. Users are encouraged to shut down their VMs if possible. Running VMs may be suspended or rendered unresponsive during this timeframe. A follow-up announcement will be posted when the system is ready for production use.