4,342 research outputs found
Havens: Explicit Reliable Memory Regions for HPC Applications
Supporting error resilience in future exascale-class supercomputing systems
is a critical challenge. Due to transistor scaling trends and increasing memory
density, scientific simulations are expected to experience more interruptions
caused by transient errors in the system memory. Existing hardware-based
detection and recovery techniques will be inadequate to manage the presence of
high memory fault rates.
In this paper we propose a partial memory protection scheme based on
region-based memory management. We define the concept of regions called havens
that provide fault protection for program objects. We provide reliability for
the regions through a software-based parity protection mechanism. Our approach
enables critical program objects to be placed in these havens. The fault
coverage provided by our approach is application agnostic, unlike
algorithm-based fault tolerance techniques.Comment: 2016 IEEE High Performance Extreme Computing Conference (HPEC '16),
September 2016, Waltham, MA, US
Research and Education in Computational Science and Engineering
Over the past two decades the field of computational science and engineering
(CSE) has penetrated both basic and applied research in academia, industry, and
laboratories to advance discovery, optimize systems, support decision-makers,
and educate the scientific and engineering workforce. Informed by centuries of
theory and experiment, CSE performs computational experiments to answer
questions that neither theory nor experiment alone is equipped to answer. CSE
provides scientists and engineers of all persuasions with algorithmic
inventions and software systems that transcend disciplines and scales. Carried
on a wave of digital technology, CSE brings the power of parallelism to bear on
troves of data. Mathematics-based advanced computing has become a prevalent
means of discovery and innovation in essentially all areas of science,
engineering, technology, and society; and the CSE community is at the core of
this transformation. However, a combination of disruptive
developments---including the architectural complexity of extreme-scale
computing, the data revolution that engulfs the planet, and the specialization
required to follow the applications to new frontiers---is redefining the scope
and reach of the CSE endeavor. This report describes the rapid expansion of CSE
and the challenges to sustaining its bold advances. The report also presents
strategies and directions for CSE research and education for the next decade.Comment: Major revision, to appear in SIAM Revie
- …