Background
Memory, not processing, is the crux of the exascale co-design problem. Exascale machines will push the limits of memory capacity, power, and performance. DRAM, the universal memory technology of today, may not scale to meet the needs of exascale. Disk storage, critical for checkpointing and for archiving computational inputs and results, may also fail to provide adequate performance, reliability, and power efficiency. We confront a memory/storage crisis.
The Blackcomb effort seeks to create and understand new memory technologies, develop their roles in exascale systems, adapt applications to them, and assess their relative merits. We focus on emerging nonvolatile memory (NVM) technologies, including spin-torque-transfer RAM (STT-RAM), phase-change RAM (PC-RAM), and memristor (resistive RAM, or R-RAM).
Objectives
• Understand and improve these new NVM technologies
• Propose new architectures that address the resilience, energy, and performance needs of exascale applications
• Adopt the most promising NVM technologies
• Flatten the memory hierarchy
• Place low-power compute cores close to the data
• Replace mechanical disk-based storage with energy-efficient NVM
• Define programmer APIs that expose nonvolatility and enable resilient applications that are easily written, energy efficient, and fast
Approach
The project is structured around five tasks:
1. NVM Technology: identifies and characterizes the most promising NVM technologies. We will assess and improve wearout, error rate, durability, energy, and latency.
2. Memory Architecture: explores the architecture space, considering how to combine NVMs with a range of future processors, and also examines the uses of NVM for resilience.
3. System Architecture: proposes a novel HPC system architecture. The idea is to explore the use of NVM as a single-level data store, co-located with ultra-low-voltage processors and balanced network capability. This entails a design-space exploration of the various architecture options, as well as an analysis of how the software stack can be simplified and optimized.
4. System and Runtime Software: identifies the most useful programming abstractions for the new NVM architectures. We will look into new programming paradigms that help take full advantage of memory nonvolatility. We will re-examine the Message Passing Interface (MPI) and Partitioned Global Address Space (PGAS) programming models, and their respective I/O models, such as MPI-IO and the Hierarchical Data Format (HDF5), in light of the new memory and storage architectures.
5. Applications: identifies, characterizes, and transforms key DOE applications for NVM. The results of the characterization will be made available to the other tasks to provide a quantitative basis for research decisions. New programming and other software techniques will be ported to the selected applications and tested. We will also seek to understand the sensitivity of the studied applications to faults.
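To make the idea of a programming abstraction that exposes nonvolatility more concrete, the following is a minimal sketch, not the project's API. It assumes that byte-addressable NVM can be approximated by a memory-mapped file, and every name in it (`open_region`, `load`, `commit`, `run`, the region layout) is hypothetical. The application keeps its state in the mapped region, commits after each step, and after a failure simply resumes from the last committed step instead of reading a checkpoint back from disk.

```python
import mmap
import os
import struct

# Hypothetical layout of the persistent region: an 8-byte step counter
# followed by N 8-byte floats holding the application state.
HEADER = struct.Struct("<q")
N = 4
STATE = struct.Struct("<%dd" % N)
REGION_SIZE = HEADER.size + STATE.size


def open_region(path):
    """Map a file into memory; the file is a stand-in for byte-addressable NVM."""
    fd = os.open(path, os.O_RDWR | os.O_CREAT)
    try:
        if os.fstat(fd).st_size < REGION_SIZE:
            # A freshly extended file is zero-filled: step 0, state all 0.0.
            os.ftruncate(fd, REGION_SIZE)
        return mmap.mmap(fd, REGION_SIZE)
    finally:
        os.close(fd)


def load(region):
    """Read the last committed step counter and state from the region."""
    (step,) = HEADER.unpack_from(region, 0)
    state = list(STATE.unpack_from(region, HEADER.size))
    return step, state


def commit(region, step, state):
    """Store the state, then the step counter, then flush to the backing file.

    Assumption: this sketch is NOT crash-atomic. A failure between the two
    stores could leave new state next to an old counter; a real design would
    need double buffering or logging.
    """
    STATE.pack_into(region, HEADER.size, *state)
    HEADER.pack_into(region, 0, step)
    region.flush()


def run(path, total_steps, crash_after=None):
    """Advance a toy computation, resuming from whatever the region holds."""
    region = open_region(path)
    try:
        step, state = load(region)
        while step < total_steps:
            state = [x + 1.0 for x in state]  # toy "computation"
            step += 1
            commit(region, step, state)
            if crash_after is not None and step == crash_after:
                return step, state  # simulate a failure by stopping early
        return step, state
    finally:
        region.close()
```

Calling `run(path, 10, crash_after=3)` and then `run(path, 10)` on the same path resumes at step 3 rather than restarting from zero. The design choice illustrated here is that recovery reuses the in-place data structure, which is the kind of simplification a single-level store might allow.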
Impact
• Better energy scalability: NV memories have zero standby power
• Increase system reliability: MRAM/PCRAM are resilient to soft errors
• Boost performance: NVM will be much faster than magnetic disk
• Improve programmability and application fault tolerance with enhanced programming models
Challenges
• Understand and mitigate the limitations of NVMs as a general-purpose memory: higher write overheads and lower endurance than SRAM or DRAM
• Need a novel hybrid analytical/simulation model to understand the tradeoffs between energy efficiency, resilience, and performance
• Evaluate the productivity of proposed programming models that exploit NVM to improve the fault tolerance of distributed applications
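The endurance limitation named in the first challenge can be illustrated with a common first-order estimate: if writes are spread evenly over the device, lifetime is roughly capacity × per-cell endurance ÷ sustained write rate. The sketch below encodes that estimate. The function name, the `leveling_efficiency` parameter, and all numeric inputs are illustrative assumptions, not measurements of any specific NVM technology or results from this project.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 3600


def lifetime_seconds(capacity_bytes, endurance_cycles, write_bytes_per_s,
                     leveling_efficiency=1.0):
    """First-order device lifetime under sustained writes.

    Assumes writes are spread over the whole device; leveling_efficiency
    (0 < e <= 1) scales the result down for imperfect wear leveling.
    """
    if write_bytes_per_s <= 0:
        raise ValueError("write_bytes_per_s must be positive")
    if not 0.0 < leveling_efficiency <= 1.0:
        raise ValueError("leveling_efficiency must be in (0, 1]")
    # Total bytes the device can absorb before cells wear out.
    total_writable_bytes = capacity_bytes * endurance_cycles * leveling_efficiency
    return total_writable_bytes / write_bytes_per_s


if __name__ == "__main__":
    # Placeholder inputs chosen only to show how endurance drives lifetime.
    capacity = 64e9       # bytes (assumed)
    write_rate = 10e9     # bytes per second of sustained writes (assumed)
    for label, endurance in [("lower-endurance cell", 1e6),
                             ("higher-endurance cell", 1e12)]:
        secs = lifetime_seconds(capacity, endurance, write_rate)
        print(f"{label}: {secs / SECONDS_PER_YEAR:.4g} years")
```

Because the estimate is linear in endurance, it suggests why a technology with far lower endurance than DRAM may need wear leveling or write reduction before it can serve as general-purpose memory.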
Research Products & Artifacts
• Identify and characterize a few key DOE applications, making results available to the other areas of the study to provide a quantitative basis for research decisions
• Broadly explore opportunities for NVM in high-performance computing, such as using active NVM for I/O staging
• Explore novel architectures that include NVM in the memory and storage hierarchies; investigate the bandwidth requirements for 3D-stacked and side-stacked NVM; develop peripheral circuitry designs for NVM devices that can achieve higher memory density
• Investigate NVM-based programming to improve resilience and productivity, including efficient checkpointing
• Apply these new technologies to exascale proxy applications and test them on realistic datasets
Recent Publications

