Messages & Announcements

  • 2014-04-22:  Tusker Downtime: update
    Category:  System Failure

    This affects Tusker's /work filesystem only.

    The /work filesystem on Tusker has become unstable, and there is recent evidence that attempting to access files will result in data loss. We had planned an extensive downtime in mid-May to do significant upgrades and repairs on /work -- this work must now begin immediately.

    The repairs required are time-consuming due to the nearly 40 million files stored on Tusker. We anticipate it will take days, not hours, but a precise estimate is not possible. Follow-up information will be added to the reference linked below as it becomes available. We will not know the status of specific files before tomorrow.

    *** If you have an urgent deadline, please contact us at hcc-support@unl.edu ***

    We apologize for this situation; it was not planned. Tusker will be unavailable starting immediately and lasting until this repair is complete.


    /work is designed for performance, not permanence. Nonetheless, we acknowledge it is painful when this filesystem suffers even partial failure -- we attempt to minimize this, but it is not designed for long-term archival use.

    Because of the planned upgrade, a recent and largely complete backup of /work exists. Some files may not have been replicated, but overall data loss will be much lower due to this.

    *** Backups of /work are not normally done -- this was a fortunate circumstance that should not be expected in the future ... ever ***

    The backup was not made intending to allow access to users, so several steps must be taken to make this data available -- we have started this process.

    4/26/2014

    The Lustre filesystem has been upgraded in multiple ways. Files are still held in temporary storage and are being copied back to Tusker. We will be able to bring Tusker back online once the restore copy is finished -- 40M files take a while. An email will be sent out at that time.

    4/29/2014

    To date, 27,784,169 inodes (files and directories) out of 44M, and 88 TB (out of 320 TB) of data, have been restored to Tusker. Further work remains; there is reason to expect it will go more quickly than the copies to date.

    4/30/2014 (late)

    To date, 160 TB and 32M files/inodes have been restored. Things are picking up as the majority of the smallest files are now copied. MPI has been recompiled for the new drivers and versions. Maintenance now consists of very few outstanding issues (none crippling) and the time to restore another 200 TB of data. By tomorrow evening we should be able to give a better estimate of the date for reopening Tusker.

    5/1/2014 (4pm)

    To date, 210 TB and 34M files/inodes have been restored. We are aiming for 360 TB and 44M files/inodes. It seems likely we will be able to open the machine back up on Monday.
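
    For readers who want to follow the restore progress, the figures reported in the updates above can be turned into rough completion percentages. The sketch below is only illustrative: it assumes the targets given in the 5/1 update (360 TB and 44M files/inodes) apply to every update, even though the 4/29 update quoted 320 TB, and the names UPDATES, TARGET_TB, TARGET_INODES, percent and progress_report are made up for this example.

```python
# Restore figures as reported in the updates: (date, TB restored, inodes restored).
UPDATES = [
    ("4/29/2014", 88, 27_784_169),
    ("4/30/2014 (late)", 160, 32_000_000),
    ("5/1/2014 (4pm)", 210, 34_000_000),
]

# Assumption: targets taken from the 5/1 update (the 4/29 update quoted 320 TB).
TARGET_TB = 360
TARGET_INODES = 44_000_000


def percent(done, total):
    """Return done/total as a percentage rounded to one decimal place."""
    return round(100 * done / total, 1)


def progress_report(updates, target_tb, target_inodes):
    """Return (date, percent of data, percent of inodes) for each update."""
    return [
        (date, percent(tb, target_tb), percent(inodes, target_inodes))
        for date, tb, inodes in updates
    ]


if __name__ == "__main__":
    for date, pct_tb, pct_inodes in progress_report(UPDATES, TARGET_TB, TARGET_INODES):
        print(f"{date}: {pct_tb}% of data, {pct_inodes}% of files/inodes")
```

    The inode percentage runs well ahead of the data percentage, which is consistent with the note that most of the smallest files were copied first.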


  • 2014-04-21:  Unplanned downtime for Tusker
    Category:  General Announcement

    This outage affects Tusker only. The /work filesystem requires immediate maintenance and/or repair. An unplanned downtime will start shortly -- we apologize for any inconvenience this causes (further details at link).


    We have had reports of missing files and other odd errors with Tusker's
    /work file system. The servers that back /work are reporting errors and
    require us to bring the system off-line to begin a file system check.

    This may be a lengthy process as there are multiple storage target units
    that need to be verified. We don't have an estimate of how long this will
    take but will send updates as we progress through the process.

    The /work filesystem must be taken off-line for this necessary maintenance to proceed.
    Users may queue jobs, but no new jobs will start until the maintenance is
    complete; users attempting to interact with the /work file system may be
    logged off. Jobs that were running will be canceled.

    We apologize for any inconvenience this causes our users;
    Crane and Sandhills are unaffected by this maintenance period.

  • 2014-04-18:  Position Opening, Holland Computing Center: System Administrator
    Category:  General Announcement

    HCC has an open search for a System Administrator. Please see the link below for the official job advertisement. We encourage you to notify any possible candidates. Review will begin April 30.

    Many thanks,
    David Swanson



    Official advertisement:

    SYSTEMS ADMINISTRATOR
    Holland Computing Center
    University of Nebraska-Lincoln

    The Holland Computing Center (HCC) is a large High Performance Computing facility supporting the NU system. Position will be required to maintain HCC core resources including, but not limited to, LDAP and group Content Management System (CMS), provide resolution to user issues, support and maintain user authentication mechanisms and work with HCC staff to maintain systems security integrity. Position will be expected to keep up on industry security announcements and coordinate with other HCC staff to respond appropriately to maintain system security and high availability.

    Various tasks including the following will be required: maintaining and upgrading hardware, operating systems, and related aspects (networking, storage, computing) of various systems; performing backups; managing user accounts; installing new software (commercial and public domain); maintaining system security; monitoring system usage; evaluating, recommending and/or negotiating specific hardware purchases. HCC systems are integrated with national computational grids; installation and maintenance of grid protocol software will be required.

    Supervision, distribution and coordination of tasks for graduate and/or undergraduate students is often expected. Position works in cooperation with other University staff and researchers associated with research computing. Since this is a research-oriented position, a curious, self-motivated individual who can operate successfully without procedural details already established is essential; good problem-solving skills needed. Position requires frequent and effective communication of technical, operational and educational information in a variety of formats.

    Periodic travel between NU campuses (UNO, UNMC, UNL) is required. Criminal background check will be conducted. Excellent benefits including staff/dependent scholarship program. Applicant review begins April 30. View requisition S_140131 at https://employment.unl.edu for details and to apply. UNL is committed to a pluralistic campus community through affirmative action, equal opportunity, work-life balance, and dual careers.



  • 2014-04-16:  Globus Connect now available at HCC
    Category:  General Announcement

    We have implemented Globus Connect on both Crane and Tusker -- this is a transfer tool that facilitates high performance data transfers and data sharing. There are numerous details in an HCC document (link below). This is completely optional -- users need not change current practices, but here are a few reasons to give it a try:

    1) a graphical interface for quickly and easily transferring files between Tusker and Crane
    2) the ability to "fire and forget" transfers to Tusker and Crane (you will be emailed when the transfer is complete)
    3) the ability for HCC users to transfer files to/from Tusker or Crane with high performance

    Several more details may be found here:

    https://hcc-docs.unl.edu/display/HCCDOC/Globus+Connect

    All are encouraged to give it a try!

    Best regards,
    David Swanson


    We plan to hold several tutorials related to Globus Connect in the near future. If you would like one held at your department, please let me know. Globus Connect is now deployed at research institutions across the country, and is free of charge to NU researchers (HCC bears a small annual fee for this service).

    There are advanced features available to "Globus Plus" users -- if you are interested in trying these, please let us know (hcc-support@unl.edu). In particular, one can share files from laptop to laptop with these plans. In general, a Plus account can readily share data with off-campus (non-HCC) collaborators -- and control this access without the need for system administrator intervention. Globus Plus accounts require a small fee (currently $7/month). HCC has access to several pre-paid accounts for this first year -- first come, first served.

  • 2014-04-15:  Permissions changes on /work recommended
    Category:  General Announcement

    HCC is implementing a new file transfer tool from Globus Online; this will ultimately make sharing of files on /work much easier (announcement coming in separate mailing).

    While preparing for this implementation, we realized that several /work directories on both Tusker and Crane are world-readable. This allows anyone with access to a machine to read these files and, once Globus Online is implemented, to make them widely available in practice.

    We've thus made two small (and reversible) changes since we expect this is not desired. New default permissions now only allow the group members of a /work/group directory to read it. In general, we recommend tightening existing permissions to only allow the owner of the files in /work/group/user (e.g., /work/swanson/dswanson) to read these files. If you want us to do that for you, we will be happy to do so.
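
    For users who prefer to tighten permissions themselves, the owner-only recommendation above can be sketched in Python as follows. This is a minimal illustration, not an HCC-provided tool: the function names (is_world_readable, owner_only, tighten) are hypothetical, and the example path is simply the one quoted above. It defaults to a dry run that only lists what would change.

```python
import stat
from pathlib import Path
from typing import List


def is_world_readable(mode: int) -> bool:
    """Return True if the 'other' read bit is set in a permission mode."""
    return bool(mode & stat.S_IROTH)


def owner_only(mode: int) -> int:
    """Clear every group and 'other' permission bit; owner bits are unchanged."""
    return mode & ~(stat.S_IRWXG | stat.S_IRWXO)


def tighten(top: Path, dry_run: bool = True) -> List[Path]:
    """Restrict top and everything beneath it to owner-only access.

    Returns the paths whose mode differs from the owner-only mode.
    With dry_run=True (the default) nothing is modified.
    """
    changed = []
    for path in [top, *top.rglob("*")]:
        if path.is_symlink():
            continue  # symlink permissions are not meaningful; skip them
        old = stat.S_IMODE(path.lstat().st_mode)
        new = owner_only(old)
        if new != old:
            changed.append(path)
            if not dry_run:
                path.chmod(new)
    return changed


if __name__ == "__main__":
    # Hypothetical usage: list what would change, without modifying anything.
    for p in tighten(Path("/work/swanson/dswanson"), dry_run=True):
        print("would restrict:", p)
```

    Calling tighten(path, dry_run=False) applies the change. Because only the group and other bits are cleared, the owner's own access is unaffected, and permissions can be loosened again later if collaborators need to read the files.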

    You retain the ability to set permissions on your files as you desire; if you have questions or concerns, please contact us at hcc-support@unl.edu.

    David Swanson