

By Al Geist (ORNL), Marc Snir (ANL), Eric Roman (LBNL), Bert Still (LLNL), Robert Clay (SNL), Christian Engelmann (ORNL), Rob Ross (ANL), Martin Schulz (LLNL), Sriram Krishnamoorthy (PNNL), Bob Lucas (ISI), Shekhar Borkar (Intel), Mootaz Elnozahy (IBM), Andrew Chien (ANL), John Wu (LBNL), Nathan Debardeleben (LANL), Larry Kaplan (Cray), Mike Heroux (SNL) and Lucy Nowell (DOE)


at the BWI Airport Marriott hotel in Maryland. The goals of this workshop were to:
1. Describe the required HPC resilience for critical DOE mission needs
2. Detail what HPC resilience research is already being done at the DOE national laboratories and is expected to be done by industry or other groups
3. Determine what fault management research is a priority for DOE’s Office of Science and National Nuclear Security Administration (NNSA) over the next five years
4. Develop a roadmap for getting the necessary research accomplished in the timeframe when it will be needed by the large computing facilities across DO

Year: 2012
OAI identifier: oai:CiteSeerX.psu:
Provided by: CiteSeerX