
    Low Energy Solutions for Multi- and Triple-Level Cell Non-Volatile Memories

    Due to the high refresh power and scalability issues of DRAM, non-volatile memories (NVMs) such as phase change memory (PCM) and resistive RAM (RRAM) are being actively investigated as viable replacements for DRAM. Although these NVMs are more scalable than DRAM, they suffer from higher write energy and lower endurance. Further, the increased capacity of multi- and triple-level cells (MLC/TLC) in these NVM technologies comes at the cost of even higher write energy and lower endurance, attributable to the MLC/TLC program-and-verify (P&V) techniques. This dissertation makes the following contributions to address the high write energy of MLC/TLC NVMs. First, we describe MFNW, a Flip-N-Write encoding that effectively reduces the write energy and improves the endurance of MLC NVMs. MFNW encodes an MLC/TLC word into a number of candidate codewords and selects the one with the lowest write energy. Second, we present another encoding solution based on perfect-knowledge frequent value encoding (FVE). This technique leverages machine learning to cluster a set of general-purpose applications according to their frequency profiles and generates a dedicated offline FVE for every cluster to maximize energy reduction across a broad spectrum of applications. Whereas the proposed encodings are used as an add-on layer on top of the MLC/TLC P&V solutions, the third contribution is a low-latency, low-energy P&V (L3EP) approach for MLC/TLC PCM. The primary motivation of L3EP is to fix the problem at its origin by crafting a faster programming algorithm; a reduction in write latency implies a reduction in write energy as well as an improvement in cell endurance. Directions for future research include the integration and evaluation of a software-based hybrid encoding mechanism for MLC/TLC NVMs; this is a page-level encoding that employs a DRAM cache for coding/decoding purposes. The main challenge is how the cache block replacement algorithm can efficiently access the page-level auxiliary cells to encode the cache block correctly. In summary, this work presents multiple solutions to address major challenges of MLC/TLC NVMs, including write latency, write energy, and cell endurance.
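
    To make the codeword-selection step concrete, the sketch below illustrates the general Flip-N-Write idea that MFNW builds on: each candidate codeword (here, only the original data and its cell-wise complement, plus one flag cell) is costed against the currently stored word, and the cheapest one is written. The candidate set, the per-transition energy table, and the flag layout are illustrative assumptions, not MFNW's actual code construction.

```python
# Minimal sketch of Flip-N-Write-style codeword selection for a 2-bit MLC word
# (illustrative assumptions; not MFNW's actual candidate set or energy model).

# Assumed relative energy of reprogramming a cell from one level to another.
WRITE_ENERGY = {(old, new): abs(old - new) for old in range(4) for new in range(4)}

def word_energy(stored, candidate):
    """Energy to overwrite the stored word (list of MLC levels) with the candidate."""
    return sum(WRITE_ENERGY[(s, c)] for s, c in zip(stored, candidate))

def complement(word):
    """Cell-wise complement of a 2-bit MLC word (level -> 3 - level)."""
    return [3 - level for level in word]

def encode(stored_word, stored_flag, new_word):
    """Return (codeword, flag) minimizing write energy against the stored contents."""
    best = None
    for flag, candidate in ((0, new_word), (1, complement(new_word))):
        cost = word_energy(stored_word, candidate) + WRITE_ENERGY[(stored_flag, flag)]
        if best is None or cost < best[0]:
            best = (cost, candidate, flag)
    return best[1], best[2]

# Example: writing [3, 3, 3, 0] over an all-zero word is cheaper in complemented form.
print(encode([0, 0, 0, 0], 0, [3, 3, 3, 0]))
```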

    Anchor: Architecture for Secure Non-Volatile Memories

    The rapid growth of memory-intensive applications such as cloud computing, deep learning, and bioinformatics has propelled the memory industry to develop scalable, high-density, low-power non-volatile memory (NVM) technologies; however, computing systems that integrate these advanced NVMs are vulnerable to several security attacks that threaten (i) data confidentiality, (ii) data availability, and (iii) data integrity. This dissertation presents ANCHOR, which integrates four low-overhead, high-performance security solutions, SECRET, COVERT, ACME, and STASH, to thwart these attacks on NVM systems. SECRET is a low-cost security solution for data confidentiality in multi-/triple-level cell (MLC/TLC) NVMs. SECRET synergistically combines (i) smart encryption, which prevents re-encryption of unmodified or zero words during a write-back, with (ii) XOR-based energy masking, which further optimizes NVM writes by transforming a high-energy ciphertext into a low-energy ciphertext. SECRET outperforms state-of-the-art encryption solutions, with the lowest write energy and latency as well as the highest lifetime. COVERT and ACME complement SECRET to improve the system availability of counter mode encryption (CME). COVERT repurposes unused error correction resources to dynamically extend the time to counter overflow of fast-growing counters, thereby delaying frequent full-memory re-encryption (system freeze). ACME performs counter write leveling (CWL) to further increase the time to counter overflow, thereby further delaying full-memory re-encryption. COVERT+ACME achieves system availability of 99.999% during normal operation and 99.9% under a denial of memory service (DoMS) attack. In contrast, conventional CME achieves system availability of only 85.71% during normal operation and is rendered non-operational under a DoMS attack. Finally, STASH is a comprehensive end-to-end security architecture for state-of-the-art smart hybrid memories (SHMs), which employ a smart DRAM cache with smart NVM-based main memory. STASH integrates (i) CME for data confidentiality, (ii) page-level Merkle Tree authentication for data integrity, (iii) recovery-compatible MT updates to withstand power/system failures, and (iv) page-migration-friendly security metadata management. For security guarantees equivalent to the state of the art, STASH reduces memory overhead by 12.7x, improves system performance by 65%, and increases NVM lifetime by 5x. This dissertation thus addresses the core security challenges of next-generation NVM-based memory systems. Directions for future research include (i) exploring holistic architectures that ensure both security and reliability of smart memory systems, (ii) investigating applications of ANCHOR to reduce the security overhead of Internet-of-Things devices, and (iii) extending ANCHOR to safeguard emerging non-volatile processors, especially in light of advanced attacks like Spectre and Meltdown.
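
    The sketch below illustrates only the "smart encryption" idea in SECRET: on a write-back, words that are unmodified keep their stored ciphertext, and zero words are handled specially, so only changed non-zero words pay the encryption (and NVM write) cost. The toy keystream, the word size, and the zero-word handling are illustrative assumptions standing in for the dissertation's counter-mode design; XOR-based energy masking is omitted.

```python
# Minimal sketch of SECRET-style smart encryption (illustrative assumptions only).
import hashlib

WORD = 8  # assumed word size in bytes

def keystream(key: bytes, addr: int, counter: int, word_idx: int) -> bytes:
    """Toy per-word keystream (placeholder for a real AES counter-mode pad)."""
    seed = (key + addr.to_bytes(8, "little")
            + counter.to_bytes(8, "little") + bytes([word_idx]))
    return hashlib.sha256(seed).digest()[:WORD]

def smart_writeback(key, addr, counter, old_plain, new_plain, old_cipher):
    """Re-encrypt only modified, non-zero words of a cache line (lists of 8-byte words)."""
    new_cipher = list(old_cipher)
    for i, (old_w, new_w) in enumerate(zip(old_plain, new_plain)):
        if new_w == old_w:
            continue                      # unmodified word: reuse stored ciphertext
        if new_w == bytes(WORD):
            new_cipher[i] = new_w         # zero word: flagged, not encrypted (assumed)
            continue
        pad = keystream(key, addr, counter, i)
        new_cipher[i] = bytes(a ^ b for a, b in zip(new_w, pad))  # counter-mode-style XOR
    return new_cipher
```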

    Efficient Security in Emerging Memories

    The wide adoption of cloud computing has established the integrity and confidentiality of data in memory as a first-order design concern in modern computing systems. Data integrity is ensured by Merkle Tree (MT) memory authentication. However, in the context of emerging non-volatile memories (NVMs), the additional cell writes and memory accesses required by MT authentication impose significant energy, lifetime, and performance overheads. This dissertation presents ASSURE, an Authentication Scheme for SecURE, energy-efficient NVMs. ASSURE integrates (i) smart message authentication codes with (ii) multi-root MTs to decrease MT reads and writes, while also reducing the number of cell writes on each MT write. Whereas data confidentiality is effectively ensured by encryption, memory access patterns can be exploited as a side channel to obtain confidential data. Oblivious RAM (ORAM) is a secure cryptographic construct that effectively thwarts access-pattern-based attacks. However, in Path ORAM (the state-of-the-art efficient ORAM for main memories) and its variants, each last-level cache miss (read or write) is transformed into a sequence of memory reads and writes (collectively termed the read phase and write phase, respectively), increasing the number of memory writes due to data re-encryption, increasing the effective latency of memory accesses, and degrading system performance. This dissertation addresses the challenges of both the read-phase and write-phase operations of an ORAM access. First, it presents ReadPRO (Read Promotion), an efficient ORAM scheduler that leverages runtime identification of read accesses to prioritize the service of critical-path-bound read-phase operations while preserving all data dependencies. Second, it presents LEO (Low overhead Encryption ORAM), which reduces cell writes by opportunistically decreasing the number of block encryptions while preserving the security guarantees of the baseline Path ORAM. This dissertation therefore addresses the core challenges of read/write energy and latency, endurance, and system performance for the integration of essential security primitives in emerging memory architectures. Future research directions will focus on (i) exploring efficient solutions for ORAM read-phase optimization and secure ORAM resizing, (ii) investigating the security challenges of emerging processing-in-memory architectures, and (iii) investigating the interplay of security primitives with reliability-enhancing architectures.
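
    As background for the MT overheads discussed above, the sketch below shows baseline Merkle Tree memory authentication: data blocks are hashed, hashes are combined pairwise up to a root held in trusted storage, and every read is verified against that root. ASSURE's smart MACs and multi-root MTs refine this baseline; the flat in-memory layout below is an illustrative assumption.

```python
# Minimal sketch of baseline Merkle Tree (MT) memory authentication.
import hashlib

def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def build_tree(blocks):
    """Return tree levels, leaves first and root last (power-of-two block count assumed)."""
    level = [h(b) for b in blocks]
    levels = [level]
    while len(level) > 1:
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
        levels.append(level)
    return levels

def verify_read(block, index, levels):
    """Recompute the leaf-to-root path for one block and compare with the trusted root."""
    node = h(block)
    for level in levels[:-1]:
        sibling = level[index ^ 1]
        node = h(node + sibling) if index % 2 == 0 else h(sibling + node)
        index //= 2
    return node == levels[-1][0]

# Example: four 64-byte blocks; an unmodified read verifies, a tampered one does not.
blocks = [bytes([i]) * 64 for i in range(4)]
tree = build_tree(blocks)
print(verify_read(blocks[2], 2, tree), verify_read(b"\x00" * 64, 2, tree))
```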

    Architectures for Low Energy, Low Latency, High Performance, Durable Multi-/Triple-Level Cell Non-Volatile Memories

    Multi-level/triple-level cell non-volatile memories (MLC/TLC NVMs) such as phase-change memory and resistive RAM are potential replacement candidates for DRAM, which is limited by its high refresh power and poor scaling potential. Besides the benefits of non-volatility (low refresh power) and improved scalability, MLC/TLC NVMs offer higher data density and memory capacity than DRAM. However, the viability of MLC/TLC NVMs is limited by (i) high programming energy/latency and low endurance, (ii) security vulnerabilities due to non-volatility, and (iii) high read latency. This dissertation presents three architectures for low-energy, low-latency, high-performance, durable MLC/TLC NVMs. First, it presents CompEx/CompEx++ coding, a low-overhead scheme that synergistically integrates pattern-based compression with linear block expansion coding to realize simultaneous energy, latency, and lifetime improvements in MLC/TLC NVMs. Second, it presents CASTLE, a Compression-based Architecture providing a read-decrypt-free, block-level Secure solution for low-laTency, Low-Energy durable NVMs. At its core, CASTLE adopts a block-level write-only sequence to eliminate the latency of the read-decrypt steps found in state-of-the-art NVM security solutions. Whereas a write-only approach increases cell updates, and thereby energy and latency, CASTLE integrates pattern-based compression and expansion coding to realize energy reductions and lifetime improvements over the state of the art. Third, it presents RAPID, a no-overhead, critical-word-first read acceleration architecture for improved performance and durability in MLC/TLC NVMs. At its core, RAPID encodes the critical words in a cache line using only the most significant bits (MSbs) of the MLC/TLC cells. Since the MSb of an NVM cell can be decoded using a single read strobe, the data (i.e., the critical words) encoded in the MSbs can be decoded with low latency. This dissertation thus addresses the core challenges of write/read energy and latency, endurance, and security of MLC/TLC NVMs and proposes multiple solutions to these challenges.
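
    The sketch below illustrates the MSb observation behind RAPID: the critical word's bits are placed in the most significant bit plane of a group of TLC cells, so a single read strobe (one threshold comparison) recovers them before the remaining bit planes are sensed. The cell grouping, word width, and how the lower planes are filled are illustrative assumptions, not RAPID's actual data layout.

```python
# Minimal sketch of MSb-plane placement of a critical word in TLC cells.

def pack_cells(critical_bits, other_bits):
    """Pack one critical bit (MSb) and two filler bits per TLC cell (levels 0-7)."""
    cells = []
    for i, msb in enumerate(critical_bits):
        low = (other_bits[2 * i] << 1) | other_bits[2 * i + 1]
        cells.append((msb << 2) | low)
    return cells

def fast_read_critical(cells):
    """Single-strobe read: one threshold at the mid level recovers only the MSb plane."""
    return [1 if level >= 4 else 0 for level in cells]

# Example: an 8-bit critical word round-trips through the MSb plane of 8 TLC cells.
critical = [1, 0, 1, 1, 0, 0, 1, 0]
cells = pack_cells(critical, [0] * 16)
assert fast_read_critical(cells) == critical
```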

    Toward Compression at All Levels of the Memory Hierarchy

    Hardware compression techniques are typically simplifications of software compression methods; they must, however, comply with area, power, and latency constraints. This study examines the challenges of adopting compression in memory design. The goal of the analysis is not to summarize proposals, but to highlight the solutions they employ to handle those challenges. An in-depth description of the main characteristics of multiple methods is provided, as well as criteria that can be used as a basis for the assessment of such schemes. Typically, these schemes are not very efficient, and those that do compress well decompress slowly. This work explores their granularity to redefine their perspectives and improve their efficiency, through a concept called Region-Chunk compression. Its goal is to achieve a low (good) compression ratio and fast decompression latency. The key observation is that by further sub-dividing the chunks of data being compressed, one can reduce data duplication. This concept can be applied to several previously proposed compressors, resulting in a reduction of their average compressed size. In particular, a single-cycle-decompression compressor is boosted to reach a compressibility level competitive with state-of-the-art proposals. Finally, to increase the probability of successfully co-allocating compressed lines, Pairwise Space Sharing (PSS) is proposed. PSS can be applied orthogonally to compaction methods at no extra latency penalty and with a cost-effective metadata overhead. The proposed system (Region-Chunk+PSS) further enhances the normalized average cache capacity by 2.7% (geometric mean), while featuring short decompression latency.
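
    The sketch below illustrates the Region-Chunk observation in isolation: sub-dividing the chunks being compressed exposes duplicates that whole-chunk schemes miss. A 64-byte line is split into 4-byte sub-chunks and repeated sub-chunks become back-references. The sub-chunk size, the reference encoding, and the bit-cost model are illustrative assumptions, not the actual Region-Chunk format.

```python
# Minimal sketch of sub-chunk deduplication within one cache line.

def compress_line(line: bytes, sub_chunk: int = 4):
    """Return (tokens, compressed size in bits) for one cache line."""
    seen = {}        # sub-chunk value -> index of its first occurrence
    tokens = []
    size_bits = 0
    for i in range(0, len(line), sub_chunk):
        piece = line[i:i + sub_chunk]
        if piece in seen:
            tokens.append(("ref", seen[piece]))     # back-reference to an earlier sub-chunk
            size_bits += 1 + 4                      # 1 flag bit + assumed 4-bit index
        else:
            seen[piece] = len(tokens)
            tokens.append(("lit", piece))           # literal sub-chunk
            size_bits += 1 + 8 * sub_chunk
    return tokens, size_bits

# Example: 4-byte sub-chunks expose the repeated 0xdeadbeef pattern even though
# no full 8-byte chunk of the line repeats.
line = bytes.fromhex("".join(f"deadbeef{i:08x}" for i in range(8)))
print(compress_line(line)[1], "bits vs", 8 * len(line), "uncompressed")
```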