1,050 research outputs found
A Study on Performance and Power Efficiency of Dense Non-Volatile Caches in Multi-Core Systems
In this paper, we present a novel cache design based on Multi-Level Cell
Spin-Transfer Torque RAM (MLC STTRAM) that can dynamically adapt the set
capacity and associativity to use efficiently the full potential of MLC STTRAM.
We exploit the asymmetric nature of the MLC storage scheme to build cache lines
featuring heterogeneous performances, that is, half of the cache lines are
read-friendly, while the other is write-friendly. Furthermore, we propose to
opportunistically deactivate ways in underutilized sets to convert MLC to
Single-Level Cell (SLC) mode, which features overall better performance and
lifetime. Our ultimate goal is to build a cache architecture that combines the
capacity advantages of MLC and performance/energy advantages of SLC. Our
experiments show an improvement of 43% in total numbers of conflict misses, 27%
in memory access latency, 12% in system performance, and 26% in LLC access
energy, with a slight degradation in cache lifetime (about 7%) compared to an
SLC cache
Reliable and Energy Efficient MLC STT-RAM Buffer for CNN Accelerators
We propose a lightweight scheme where the formation of a data block is changed in such a way that it can tolerate soft errors significantly better than the baseline. The key insight behind our work is that CNN weights are normalized between -1 and 1 after each convolutional layer, and this leaves one bit unused in half-precision floating-point representation. By taking advantage of the unused bit, we create a backup for the most significant bit to protect it against the soft errors. Also, considering the fact that in MLC STT-RAMs the cost of memory operations (read and write), and reliability of a cell are content-dependent (some patterns take larger current and longer time, while they are more susceptible to soft error), we rearrange the data block to minimize the number of costly bit patterns. Combining these two techniques provides the same level of accuracy compared to an error-free baseline while improving the read and write energy by 9% and 6%, respectively
Improving the Performance and Endurance of Persistent Memory with Loose-Ordering Consistency
Persistent memory provides high-performance data persistence at main memory.
Memory writes need to be performed in strict order to satisfy storage
consistency requirements and enable correct recovery from system crashes.
Unfortunately, adhering to such a strict order significantly degrades system
performance and persistent memory endurance. This paper introduces a new
mechanism, Loose-Ordering Consistency (LOC), that satisfies the ordering
requirements at significantly lower performance and endurance loss. LOC
consists of two key techniques. First, Eager Commit eliminates the need to
perform a persistent commit record write within a transaction. We do so by
ensuring that we can determine the status of all committed transactions during
recovery by storing necessary metadata information statically with blocks of
data written to memory. Second, Speculative Persistence relaxes the write
ordering between transactions by allowing writes to be speculatively written to
persistent memory. A speculative write is made visible to software only after
its associated transaction commits. To enable this, our mechanism supports the
tracking of committed transaction ID and multi-versioning in the CPU cache. Our
evaluations show that LOC reduces the average performance overhead of memory
persistence from 66.9% to 34.9% and the memory write traffic overhead from
17.1% to 3.4% on a variety of workloads.Comment: This paper has been accepted by IEEE Transactions on Parallel and
Distributed System
- …