743 research outputs found
A Study on Performance and Power Efficiency of Dense Non-Volatile Caches in Multi-Core Systems
In this paper, we present a novel cache design based on Multi-Level Cell
Spin-Transfer Torque RAM (MLC STTRAM) that can dynamically adapt the set
capacity and associativity to use efficiently the full potential of MLC STTRAM.
We exploit the asymmetric nature of the MLC storage scheme to build cache lines
featuring heterogeneous performances, that is, half of the cache lines are
read-friendly, while the other is write-friendly. Furthermore, we propose to
opportunistically deactivate ways in underutilized sets to convert MLC to
Single-Level Cell (SLC) mode, which features overall better performance and
lifetime. Our ultimate goal is to build a cache architecture that combines the
capacity advantages of MLC and performance/energy advantages of SLC. Our
experiments show an improvement of 43% in total numbers of conflict misses, 27%
in memory access latency, 12% in system performance, and 26% in LLC access
energy, with a slight degradation in cache lifetime (about 7%) compared to an
SLC cache
Improving Phase Change Memory Performance with Data Content Aware Access
A prominent characteristic of write operation in Phase-Change Memory (PCM) is
that its latency and energy are sensitive to the data to be written as well as
the content that is overwritten. We observe that overwriting unknown memory
content can incur significantly higher latency and energy compared to
overwriting known all-zeros or all-ones content. This is because all-zeros or
all-ones content is overwritten by programming the PCM cells only in one
direction, i.e., using either SET or RESET operations, not both. In this paper,
we propose data content aware PCM writes (DATACON), a new mechanism that
reduces the latency and energy of PCM writes by redirecting these requests to
overwrite memory locations containing all-zeros or all-ones. DATACON operates
in three steps. First, it estimates how much a PCM write access would benefit
from overwriting known content (e.g., all-zeros, or all-ones) by
comprehensively considering the number of set bits in the data to be written,
and the energy-latency trade-offs for SET and RESET operations in PCM. Second,
it translates the write address to a physical address within memory that
contains the best type of content to overwrite, and records this translation in
a table for future accesses. We exploit data access locality in workloads to
minimize the address translation overhead. Third, it re-initializes unused
memory locations with known all-zeros or all-ones content in a manner that does
not interfere with regular read and write accesses. DATACON overwrites unknown
content only when it is absolutely necessary to do so. We evaluate DATACON with
workloads from state-of-the-art machine learning applications, SPEC CPU2017,
and NAS Parallel Benchmarks. Results demonstrate that DATACON significantly
improves system performance and memory system energy consumption compared to
the best of performance-oriented state-of-the-art techniques.Comment: 18 pages, 21 figures, accepted at ACM SIGPLAN International Symposium
on Memory Management (ISMM
Exploiting Inter- and Intra-Memory Asymmetries for Data Mapping in Hybrid Tiered-Memories
Modern computing systems are embracing hybrid memory comprising of DRAM and
non-volatile memory (NVM) to combine the best properties of both memory
technologies, achieving low latency, high reliability, and high density. A
prominent characteristic of DRAM-NVM hybrid memory is that it has NVM access
latency much higher than DRAM access latency. We call this inter-memory
asymmetry. We observe that parasitic components on a long bitline are a major
source of high latency in both DRAM and NVM, and a significant factor
contributing to high-voltage operations in NVM, which impact their reliability.
We propose an architectural change, where each long bitline in DRAM and NVM is
split into two segments by an isolation transistor. One segment can be accessed
with lower latency and operating voltage than the other. By introducing tiers,
we enable non-uniform accesses within each memory type (which we call
intra-memory asymmetry), leading to performance and reliability trade-offs in
DRAM-NVM hybrid memory. We extend existing NVM-DRAM OS in three ways. First, we
exploit both inter- and intra-memory asymmetries to allocate and migrate memory
pages between the tiers in DRAM and NVM. Second, we improve the OS's page
allocation decisions by predicting the access intensity of a newly-referenced
memory page in a program and placing it to a matching tier during its initial
allocation. This minimizes page migrations during program execution, lowering
the performance overhead. Third, we propose a solution to migrate pages between
the tiers of the same memory without transferring data over the memory channel,
minimizing channel occupancy and improving performance. Our overall approach,
which we call MNEME, to enable and exploit asymmetries in DRAM-NVM hybrid
tiered memory improves both performance and reliability for both single-core
and multi-programmed workloads.Comment: 15 pages, 29 figures, accepted at ACM SIGPLAN International Symposium
on Memory Managemen
Improving the Performance and Endurance of Persistent Memory with Loose-Ordering Consistency
Persistent memory provides high-performance data persistence at main memory.
Memory writes need to be performed in strict order to satisfy storage
consistency requirements and enable correct recovery from system crashes.
Unfortunately, adhering to such a strict order significantly degrades system
performance and persistent memory endurance. This paper introduces a new
mechanism, Loose-Ordering Consistency (LOC), that satisfies the ordering
requirements at significantly lower performance and endurance loss. LOC
consists of two key techniques. First, Eager Commit eliminates the need to
perform a persistent commit record write within a transaction. We do so by
ensuring that we can determine the status of all committed transactions during
recovery by storing necessary metadata information statically with blocks of
data written to memory. Second, Speculative Persistence relaxes the write
ordering between transactions by allowing writes to be speculatively written to
persistent memory. A speculative write is made visible to software only after
its associated transaction commits. To enable this, our mechanism supports the
tracking of committed transaction ID and multi-versioning in the CPU cache. Our
evaluations show that LOC reduces the average performance overhead of memory
persistence from 66.9% to 34.9% and the memory write traffic overhead from
17.1% to 3.4% on a variety of workloads.Comment: This paper has been accepted by IEEE Transactions on Parallel and
Distributed System
Aging-Aware Request Scheduling for Non-Volatile Main Memory
Modern computing systems are embracing non-volatile memory (NVM) to implement
high-capacity and low-cost main memory. Elevated operating voltages of NVM
accelerate the aging of CMOS transistors in the peripheral circuitry of each
memory bank. Aggressive device scaling increases power density and temperature,
which further accelerates aging, challenging the reliable operation of
NVM-based main memory. We propose HEBE, an architectural technique to mitigate
the circuit aging-related problems of NVM-based main memory. HEBE is built on
three contributions. First, we propose a new analytical model that can
dynamically track the aging in the peripheral circuitry of each memory bank
based on the bank's utilization. Second, we develop an intelligent memory
request scheduler that exploits this aging model at run time to de-stress the
peripheral circuitry of a memory bank only when its aging exceeds a critical
threshold. Third, we introduce an isolation transistor to decouple parts of a
peripheral circuit operating at different voltages, allowing the decoupled
logic blocks to undergo long-latency de-stress operations independently and off
the critical path of memory read and write accesses, improving performance. We
evaluate HEBE with workloads from the SPEC CPU2017 Benchmark suite. Our results
show that HEBE significantly improves both performance and lifetime of
NVM-based main memory.Comment: To appear in ASP-DAC 202
- …