3,789 research outputs found
Improving DRAM Performance by Parallelizing Refreshes with Accesses
Modern DRAM cells are periodically refreshed to prevent data loss due to
leakage. Commodity DDR DRAM refreshes cells at the rank level. This degrades
performance significantly because it prevents an entire rank from serving
memory requests while being refreshed. DRAM designed for mobile platforms,
LPDDR DRAM, supports an enhanced mode, called per-bank refresh, that refreshes
cells at the bank level. This enables a bank to be accessed while another in
the same rank is being refreshed, alleviating part of the negative performance
impact of refreshes. However, there are two shortcomings of per-bank refresh.
First, the per-bank refresh scheduling scheme does not exploit the full
potential of overlapping refreshes with accesses across banks because it
restricts the banks to be refreshed in a sequential round-robin order. Second,
accesses to a bank that is being refreshed have to wait.
To mitigate the negative performance impact of DRAM refresh, we propose two
complementary mechanisms, DARP (Dynamic Access Refresh Parallelization) and
SARP (Subarray Access Refresh Parallelization). The goal is to address the
drawbacks of per-bank refresh by building more efficient techniques to
parallelize refreshes and accesses within DRAM. First, instead of issuing
per-bank refreshes in a round-robin order, DARP issues per-bank refreshes to
idle banks in an out-of-order manner. Furthermore, DARP schedules refreshes
during intervals when a batch of writes are draining to DRAM. Second, SARP
exploits the existence of mostly-independent subarrays within a bank. With
minor modifications to DRAM organization, it allows a bank to serve memory
accesses to an idle subarray while another subarray is being refreshed.
Extensive evaluations show that our mechanisms improve system performance and
energy efficiency compared to state-of-the-art refresh policies and the benefit
increases as DRAM density increases.Comment: The original paper published in the International Symposium on
High-Performance Computer Architecture (HPCA) contains an error. The arxiv
version has an erratum that describes the error and the fix for i
Main memory in HPC: do we need more, or could we live with less?
An important aspect of High-Performance Computing (HPC) system design is the choice of main memory capacity. This choice becomes increasingly important now that 3D-stacked memories are entering the market. Compared with conventional Dual In-line Memory Modules (DIMMs), 3D memory chiplets provide better performance and energy efficiency but lower memory capacities. Therefore, the adoption of 3D-stacked memories in the HPC domain depends on whether we can find use cases that require much less memory than is available now.
This study analyzes the memory capacity requirements of important HPC benchmarks and applications. We find that the High-Performance Conjugate Gradients (HPCG) benchmark could be an important success story for 3D-stacked memories in HPC, but High-Performance Linpack (HPL) is likely to be constrained by 3D memory capacity. The study also emphasizes that the analysis of memory footprints of production HPC applications is complex and that it requires an understanding of application scalability and target category, i.e., whether the users target capability or capacity computing. The results show that most of the HPC applications under study have per-core memory footprints in the range of hundreds of megabytes, but we also detect applications and use cases that require gigabytes per core. Overall, the study identifies the HPC applications and use cases with memory footprints that could be provided by 3D-stacked memory chiplets, making a first step toward adoption of this novel technology in the HPC domain.This work was supported by the Collaboration Agreement between Samsung Electronics Co., Ltd. and BSC, Spanish Government through Severo Ochoa programme (SEV-2015-0493), by the Spanish Ministry of Science and Technology through TIN2015-65316-P project and by the Generalitat de Catalunya (contracts 2014-SGR-1051 and 2014-SGR-1272). This work has also received funding from the European Union’s Horizon
2020 research and innovation programme under ExaNoDe project (grant agreement No 671578). Darko Zivanovic holds the Severo Ochoa grant (SVP-2014-068501) of the Ministry of Economy and Competitiveness
of Spain. The authors thank Harald Servat from BSC and Vladimir Marjanovi´c from High Performance Computing Center Stuttgart for their technical support.Postprint (published version
Demystifying the Characteristics of 3D-Stacked Memories: A Case Study for Hybrid Memory Cube
Three-dimensional (3D)-stacking technology, which enables the integration of
DRAM and logic dies, offers high bandwidth and low energy consumption. This
technology also empowers new memory designs for executing tasks not
traditionally associated with memories. A practical 3D-stacked memory is Hybrid
Memory Cube (HMC), which provides significant access bandwidth and low power
consumption in a small area. Although several studies have taken advantage of
the novel architecture of HMC, its characteristics in terms of latency and
bandwidth or their correlation with temperature and power consumption have not
been fully explored. This paper is the first, to the best of our knowledge, to
characterize the thermal behavior of HMC in a real environment using the AC-510
accelerator and to identify temperature as a new limitation for this
state-of-the-art design space. Moreover, besides bandwidth studies, we
deconstruct factors that contribute to latency and reveal their sources for
high- and low-load accesses. The results of this paper demonstrates essential
behaviors and performance bottlenecks for future explorations of
packet-switched and 3D-stacked memories.Comment: EEE Catalog Number: CFP17236-USB ISBN 13: 978-1-5386-1232-
- …