Messages & Announcements

  • 2013-05-30:  Tusker Downtime update
    Category:  Maintenance

    The Tusker /work filesystem migration is complete. However, we have noted a small number of potentially damaged files. We are working to identify the cause, and to avoid complicating that effort, Tusker will be unavailable at least until noon Thursday.

    Lustre RAID sets have been reconfigured from RAID5 to RAID6. This required moving data off the RAID5 sets onto newly created RAID6 sets and then adding the new sets to Lustre. This lengthy process finished on Wednesday morning.

    Final file system checks revealed damaged files that require further investigation before the file system can be brought online for job output. The cause of the damage is still unknown, and it must be found and resolved before the file system is returned to service.


  • 2013-05-23:  OSG Computing Workshop June 5
    Category:  General Announcement

    The Holland Computing Center would like to invite you or your interested students to attend a small, informal, and free workshop at the Schorr Center. We'll be covering topics including job submission to local schedulers and the Open Science Grid.

    This style of computing is well suited to large batches of independent jobs. It is used in bioinformatics, particle physics, proteomics, film rendering, and other computational fields. Bring your laptop -- we will do our best to have you submitting jobs to the Open Science Grid before lunch.

    The day will begin at 9 a.m. and finish in the afternoon. Seating is limited, and we will serve lunch to those who request it. To make arrangements, we need those interested to respond before May 24th (this coming Friday). Sign up now! We still have a few seats available.

    Those planning to attend should email Carl Lundstedt (clundstedt@unl.edu) with the subject line "Pivot 2013 Workshop".

    Regards,
    Holland Computing Center


  • 2013-05-21:  Tusker Downtime update
    Category:  Maintenance

    The Lustre upgrade and transition are ongoing. Details may be found here: http://hcc.unl.edu/hcccreditedit/messages.php?idmessages=111

    Further updates will be posted only at the URL above, with a final notice sent to this list when Tusker is again available for general use.


  • 2013-05-17:  REMINDER: TUSKER downtime Monday, May 20, 2013
    Category:  Maintenance

    Tusker will be down Monday, May 20 for file system maintenance. The scheduler will also be changed to SLURM. This was announced previously (http://hcc.unl.edu/hcccreditedit/messages.php?idmessages=111); this is a reminder only.

    Jobs submitted before Monday will not be started unless their requested run time allows them to finish before the shutdown. We anticipate an extended downtime of at least one full day. Other HCC machines are not affected by this downtime.


  • 2013-05-08:  Tusker downtime May 20 for maintenance
    Category:  Maintenance

    Tusker will be unavailable on May 20 for filesystem maintenance. This will be an extended downtime; the Lustre filesystem will be expanded and reconfigured. Data will not be deleted, but significant changes will be implemented.

    ***Users are asked to remove any unneeded data from the /work filesystem before that time; the less data, the shorter the required downtime.***

    The scheduler will be changed from Maui/Torque to SLURM during this time as well. SLURM has been in place on Sandhills for some time with good results; an overview may be found here: https://hcc-docs.unl.edu/display/HCCDOC/Submitting+Jobs. An open house workshop will be held the week of May 20 to help users adapt existing scripts to the SLURM scheduler.

    As always, please contact hcc-support@unl.edu if you have questions or concerns.

    Best regards,
    David Swanson


    Maintenance progress:
    * Backup of the Lustre metadata (MDT) is complete
    * Upgrade of storage servers to SL6.4 and Lustre 2.1.5 is complete
    * Migration of existing data onto new storage hardware is in progress
    Next steps:
    * Reconfiguration of RAID-5 arrays into RAID-6

    Lustre: Lustre on Tusker is currently provided by Terascala, a commercial partner recommended by Dell (and previously included in a Dell bid). HCC has experienced repeated intermittent issues on Tusker caused by the current Lustre implementation. Because these issues have not been adequately addressed, we will no longer rely on Terascala's commercial support or accept the constraints Terascala requires to continue the relationship. Lustre is open source, and HCC has run its own Lustre deployment on Sandhills for several years, so we have decided to do the same on Tusker.

    SLURM: We have also had intermittent issues with parallel jobs run under Maui/Torque. SLURM has run on Sandhills for over a year with considerable success, so we are going to use it on Tusker as well. Most of the syntax is very similar, and some Maui/Torque scripts will work unaltered (SLURM wraps and interprets them). Most users find SLURM highly addictive since it is much more responsive. We will _not_ be changing the scheduler on Firefly, since it will be retired no later than this fall.