73 research outputs found
DRAM Bender: An Extensible and Versatile FPGA-based Infrastructure to Easily Test State-of-the-art DRAM Chips
To understand and improve DRAM performance, reliability, security and energy
efficiency, prior works study characteristics of commodity DRAM chips.
Unfortunately, state-of-the-art open source infrastructures capable of
conducting such studies are obsolete, poorly supported, or difficult to use, or
their inflexibility limit the types of studies they can conduct.
We propose DRAM Bender, a new FPGA-based infrastructure that enables
experimental studies on state-of-the-art DRAM chips. DRAM Bender offers three
key features at the same time. First, DRAM Bender enables directly interfacing
with a DRAM chip through its low-level interface. This allows users to issue
DRAM commands in arbitrary order and with finer-grained time intervals compared
to other open source infrastructures. Second, DRAM Bender exposes easy-to-use
C++ and Python programming interfaces, allowing users to quickly and easily
develop different types of DRAM experiments. Third, DRAM Bender is easily
extensible. The modular design of DRAM Bender allows extending it to (i)
support existing and emerging DRAM interfaces, and (ii) run on new commercial
or custom FPGA boards with little effort.
To demonstrate that DRAM Bender is a versatile infrastructure, we conduct
three case studies, two of which lead to new observations about the DRAM
RowHammer vulnerability. In particular, we show that data patterns supported by
DRAM Bender uncovers a larger set of bit-flips on a victim row compared to the
data patterns commonly used by prior work. We demonstrate the extensibility of
DRAM Bender by implementing it on five different FPGAs with DDR4 and DDR3
support. DRAM Bender is freely and openly available at
https://github.com/CMU-SAFARI/DRAM-Bender.Comment: To appear in TCAD 202
Retrospective: RAIDR: Retention-Aware Intelligent DRAM Refresh
Dynamic Random Access Memory (DRAM) is the prevalent memory technology used
to build main memory systems of almost all computers. A fundamental shortcoming
of DRAM is the need to refresh memory cells to keep stored data intact. DRAM
refresh consumes energy and degrades performance. It is also a technology
scaling challenge as its negative effects become worse as DRAM cell size
reduces and DRAM chip capacity increases.
Our ISCA 2012 paper, RAIDR, examines the DRAM refresh problem from a modern
computing systems perspective, demonstrating its projected impact on systems
with higher-capacity DRAM chips expected to be manufactured in the future. It
proposes and evaluates a simple and low-cost solution that greatly reduces the
performance & energy overheads of refresh by exploiting variation in data
retention times across DRAM rows. The key idea is to group the DRAM rows into
bins in terms of their minimum data retention times, store the bins in low-cost
Bloom filters, and refresh rows in different bins at different rates.
Evaluations in our paper (and later works) show that the idea greatly improves
performance & energy efficiency and its benefits increase with DRAM chip
capacity. The paper embodies an approach we have termed system-DRAM co-design.
This short retrospective provides a brief analysis of our RAIDR paper and its
impact. We briefly describe the mindset and circumstances that led to our focus
on the DRAM refresh problem and RAIDR's development, discuss later works that
provided improved analyses and solutions, and make some educated guesses on
what the future may bring on the DRAM refresh problem (and more generally in
DRAM technology scaling).Comment: Selected to the 50th Anniversary of ISCA (ACM/IEEE International
Symposium on Computer Architecture), Commemorative Issue, 202
Proceedings of the 22nd Conference on Formal Methods in Computer-Aided Design – FMCAD 2022
The Conference on Formal Methods in Computer-Aided Design (FMCAD) is an annual conference on the theory and applications of formal methods in hardware and system verification. FMCAD provides a leading forum to researchers in academia and industry for presenting and discussing groundbreaking methods, technologies, theoretical results, and tools for reasoning formally about computing systems. FMCAD covers formal aspects of computer-aided system design including verification, specification, synthesis, and testing
Design Space Exploration and Resource Management of Multi/Many-Core Systems
The increasing demand of processing a higher number of applications and related data on computing platforms has resulted in reliance on multi-/many-core chips as they facilitate parallel processing. However, there is a desire for these platforms to be energy-efficient and reliable, and they need to perform secure computations for the interest of the whole community. This book provides perspectives on the aforementioned aspects from leading researchers in terms of state-of-the-art contributions and upcoming trends
DAMOV: A New Methodology and Benchmark Suite for Evaluating Data Movement Bottlenecks
Data movement between the CPU and main memory is a first-order obstacle
against improving performance, scalability, and energy efficiency in modern
systems. Computer systems employ a range of techniques to reduce overheads tied
to data movement, spanning from traditional mechanisms (e.g., deep multi-level
cache hierarchies, aggressive hardware prefetchers) to emerging techniques such
as Near-Data Processing (NDP), where some computation is moved close to memory.
Our goal is to methodically identify potential sources of data movement over a
broad set of applications and to comprehensively compare traditional
compute-centric data movement mitigation techniques to more memory-centric
techniques, thereby developing a rigorous understanding of the best techniques
to mitigate each source of data movement.
With this goal in mind, we perform the first large-scale characterization of
a wide variety of applications, across a wide range of application domains, to
identify fundamental program properties that lead to data movement to/from main
memory. We develop the first systematic methodology to classify applications
based on the sources contributing to data movement bottlenecks. From our
large-scale characterization of 77K functions across 345 applications, we
select 144 functions to form the first open-source benchmark suite (DAMOV) for
main memory data movement studies. We select a diverse range of functions that
(1) represent different types of data movement bottlenecks, and (2) come from a
wide range of application domains. Using NDP as a case study, we identify new
insights about the different data movement bottlenecks and use these insights
to determine the most suitable data movement mitigation mechanism for a
particular application. We open-source DAMOV and the complete source code for
our new characterization methodology at https://github.com/CMU-SAFARI/DAMOV.Comment: Our open source software is available at
https://github.com/CMU-SAFARI/DAMO
Security Analysis of the Silver Bullet Technique for RowHammer Prevention
The purpose of this document is to study the security properties of the
Silver Bullet algorithm against worst-case RowHammer attacks. We mathematically
demonstrate that Silver Bullet, when properly configured and implemented in a
DRAM chip, can securely prevent RowHammer attacks. The demonstration focuses on
the most representative implementation of Silver Bullet, the patent claiming
many implementation possibilities not covered in this demonstration. Our study
concludes that Silver Bullet is a promising RowHammer prevention mechanism that
can be configured to operate securely against RowHammer attacks at various
efficiency-area tradeoff points, supporting relatively small hammer count
values (e.g., 1000) and Silver Bullet table sizes (e.g., 1.06KB).Comment: 40 page
Benchmarking a New Paradigm: An Experimental Analysis of a Real Processing-in-Memory Architecture
Many modern workloads, such as neural networks, databases, and graph
processing, are fundamentally memory-bound. For such workloads, the data
movement between main memory and CPU cores imposes a significant overhead in
terms of both latency and energy. A major reason is that this communication
happens through a narrow bus with high latency and limited bandwidth, and the
low data reuse in memory-bound workloads is insufficient to amortize the cost
of main memory access. Fundamentally addressing this data movement bottleneck
requires a paradigm where the memory system assumes an active role in computing
by integrating processing capabilities. This paradigm is known as
processing-in-memory (PIM).
Recent research explores different forms of PIM architectures, motivated by
the emergence of new 3D-stacked memory technologies that integrate memory with
a logic layer where processing elements can be easily placed. Past works
evaluate these architectures in simulation or, at best, with simplified
hardware prototypes. In contrast, the UPMEM company has designed and
manufactured the first publicly-available real-world PIM architecture.
This paper provides the first comprehensive analysis of the first
publicly-available real-world PIM architecture. We make two key contributions.
First, we conduct an experimental characterization of the UPMEM-based PIM
system using microbenchmarks to assess various architecture limits such as
compute throughput and memory bandwidth, yielding new insights. Second, we
present PrIM, a benchmark suite of 16 workloads from different application
domains (e.g., linear algebra, databases, graph processing, neural networks,
bioinformatics).Comment: Our open source software is available at
https://github.com/CMU-SAFARI/prim-benchmark
Extending Memory Capacity in Consumer Devices with Emerging Non-Volatile Memory: An Experimental Study
The number and diversity of consumer devices are growing rapidly, alongside
their target applications' memory consumption. Unfortunately, DRAM scalability
is becoming a limiting factor to the available memory capacity in consumer
devices. As a potential solution, manufacturers have introduced emerging
non-volatile memories (NVMs) into the market, which can be used to increase the
memory capacity of consumer devices by augmenting or replacing DRAM. Since
entirely replacing DRAM with NVM in consumer devices imposes large system
integration and design challenges, recent works propose extending the total
main memory space available to applications by using NVM as swap space for
DRAM. However, no prior work analyzes the implications of enabling a real
NVM-based swap space in real consumer devices.
In this work, we provide the first analysis of the impact of extending the
main memory space of consumer devices using off-the-shelf NVMs. We extensively
examine system performance and energy consumption when the NVM device is used
as swap space for DRAM main memory to effectively extend the main memory
capacity. For our analyses, we equip real web-based Chromebook computers with
the Intel Optane SSD, which is a state-of-the-art low-latency NVM-based SSD
device. We compare the performance and energy consumption of interactive
workloads running on our Chromebook with NVM-based swap space, where the Intel
Optane SSD capacity is used as swap space to extend main memory capacity,
against two state-of-the-art systems: (i) a baseline system with double the
amount of DRAM than the system with the NVM-based swap space; and (ii) a system
where the Intel Optane SSD is naively replaced with a state-of-the-art (yet
slower) off-the-shelf NAND-flash-based SSD, which we use as a swap space of
equivalent size as the NVM-based swap space
- …