Search CORE

55 research outputs found

ELECTRICAL CHARACTERIZATION, PHYSICS, MODELING AND RELIABILITY OF INNOVATIVE NON-VOLATILE MEMORIES

Author: Zambelli Cristian
Publication venue
Publication date: 01/01/2012
Field of study

Enclosed in this thesis work it can be found the results of a three years long research activity performed during the XXIV-th cycle of the Ph.D. school in Engineering Science of the Università degli Studi di Ferrara. The topic of this work is concerned about the electrical characterization, physics, modeling and reliability of innovative non-volatile memories, addressing most of the proposed alternative to the floating-gate based memories which currently are facing a technology dead end. Throughout the chapters of this thesis it will be provided a detailed characterization of the envisioned replacements for the common NOR and NAND Flash technologies into the near future embedded and MPSoCs (Multi Processing System on Chip) systems. In Chapter 1 it will be introduced the non-volatile memory technology with direct reference on nowadays Flash mainstream, providing indications and comments on why the system designers should be forced to change the approach to new memory concepts. In Chapter 2 it will be presented one of the most studied post-floating gate memory technology for MPSoCs: the Phase Change Memory. The results of an extensive electrical characterization performed on these devices led to important discoveries such as the kinematics of the erase operation and potential reliability threats in memory operations. A modeling framework has been developed to support the experimental results and to validate them on projected scaled technology. In Chapter 3 an embedded memory for automotive environment will be shown: the SimpleEE p-channel memory. The characterization of this memory proven the technology robustness providing at the same time new insights on the erratic bits phenomenon largely studied on NOR and NAND counterparts. Chapter 4 will show the research studies performed on a memory device based on the Nano-MEMS concept. This particular memory generation proves to be integrated in very harsh environment such as military applications, geothermal and space avionics. A detailed study on the physical principles underlying this memory will be presented. In Chapter 5 a successor of the standard NAND Flash will be analyzed: the Charge Trapping NAND. This kind of memory shares the same principles of the traditional floating gate technology except for the storage medium which now has been substituted by a discrete nature storage (i.e. silicon nitride traps). The conclusions and the results summary for each memory technology will be provided in Chapter 6. Finally, on Appendix A it will be shown the results of a recently started research activity on the high level reliability memory management exploiting the results of the studies for Phase Change Memories

EprintsUnife

Archivio istituzionale della ricerca - Università di Ferrara

A Modern Primer on Processing in Memory

Author: Ausavarungnirun Rachata
Ghose Saugata
Gómez-Luna Juan
Mutlu Onur
Publication venue
Publication date: 05/12/2020
Field of study

Modern computing systems are overwhelmingly designed to move data to computation. This design choice goes directly against at least three key trends in computing that cause performance, scalability and energy bottlenecks: (1) data access is a key bottleneck as many important applications are increasingly data-intensive, and memory bandwidth and energy do not scale well, (2) energy consumption is a key limiter in almost all computing platforms, especially server and mobile systems, (3) data movement, especially off-chip to on-chip, is very expensive in terms of bandwidth, energy and latency, much more so than computation. These trends are especially severely-felt in the data-intensive server and energy-constrained mobile systems of today. At the same time, conventional memory technology is facing many technology scaling challenges in terms of reliability, energy, and performance. As a result, memory system architects are open to organizing memory in different ways and making it more intelligent, at the expense of higher cost. The emergence of 3D-stacked memory plus logic, the adoption of error correcting codes inside the latest DRAM chips, proliferation of different main memory standards and chips, specialized for different purposes (e.g., graphics, low-power, high bandwidth, low latency), and the necessity of designing new solutions to serious reliability and security issues, such as the RowHammer phenomenon, are an evidence of this trend. This chapter discusses recent research that aims to practically enable computation close to data, an approach we call processing-in-memory (PIM). PIM places computation mechanisms in or near where the data is stored (i.e., inside the memory chips, in the logic layer of 3D-stacked memory, or in the memory controllers), so that data movement between the computation units and memory is reduced or eliminated.Comment: arXiv admin note: substantial text overlap with arXiv:1903.0398

arXiv.org e-Print Archive

MFPA: Mixed-Signal Field Programmable Array for Energy-Aware Compressive Signal Processing

Author: Tatulian Adrian
Publication venue: 'Information Bulletin on Variable Stars (IBVS)'
Publication date: 01/01/2020
Field of study

Compressive Sensing (CS) is a signal processing technique which reduces the number of samples taken per frame to decrease energy, storage, and data transmission overheads, as well as reducing time taken for data acquisition in time-critical applications. The tradeoff in such an approach is increased complexity of signal reconstruction. While several algorithms have been developed for CS signal reconstruction, hardware implementation of these algorithms is still an area of active research. Prior work has sought to utilize parallelism available in reconstruction algorithms to minimize hardware overheads; however, such approaches are limited by the underlying limitations in CMOS technology. Herein, the MFPA (Mixed-signal Field Programmable Array) approach is presented as a hybrid spin-CMOS reconfigurable fabric specifically designed for implementation of CS data sampling and signal reconstruction. The resulting fabric consists of 1) slice-organized analog blocks providing amplifiers, transistors, capacitors, and Magnetic Tunnel Junctions (MTJs) which are configurable to achieving square/square root operations required for calculating vector norms, 2) digital functional blocks which feature 6-input clockless lookup tables for computation of matrix inverse, and 3) an MRAM-based nonvolatile crossbar array for carrying out low-energy matrix-vector multiplication operations. The various functional blocks are connected via a global interconnect and spin-based analog-to-digital converters. Simulation results demonstrate significant energy and area benefits compared to equivalent CMOS digital implementations for each of the functional blocks used: this includes an 80% reduction in energy and 97% reduction in transistor count for the nonvolatile crossbar array, 80% standby power reduction and 25% reduced area footprint for the clockless lookup tables, and roughly 97% reduction in transistor count for a multiplier built using components from the analog blocks. Moreover, the proposed fabric yields 77% energy reduction compared to CMOS when used to implement CS reconstruction, in addition to latency improvements

University of Central Florida (UCF): STARS (Showcase of Text, Archives, Research & Scholarship)

Recommended from our members

Continuous-Time and Companding Digital Signal Processors Using Adaptivity and Asynchronous Techniques

Author: Vezyrtzis Christos
Publication venue: 'Columbia University Libraries/Information Services'
Publication date: 01/01/2013
Field of study

The fully synchronous approach has been the norm for digital signal processors (DSPs) for many decades. Due to its simplicity, the classical DSP structure has been used in many applications. However, due to its rigid discrete-time operation, a classical DSP has limited efficiency or inadequate resolution for some emerging applications, such as processing of multimedia and biological signals. This thesis proposes fundamentally new approaches to designing DSPs, which are different from the classical scheme. The defining characteristic of all new DSPs examined in this thesis is the notion of "adaptivity" or "adaptability." Adaptive DSPs dynamically change their behavior to adjust to some property of their input stream, for example the rate of change of the input. This thesis presents both enhancements to existing adaptive DSPs, as well as new adaptive DSPs. The main class of DSPs that are examined throughout the thesis are continuous-time (CT) DSPs. CT DSPs are clock-less and event-driven; they naturally adapt their activity and power consumption to the rate of their inputs. The absence of a clock also provides a complete avoidance of aliasing in the frequency domain, hence improved signal fidelity. The core of this thesis deals with the complete and systematic design of a truly general-purpose CT DSP. A scalable design methodology for CT DSPs is presented. This leads to the main contribution of this thesis, namely a new CT DSP chip. This chip is the first general-purpose CT DSP chip, able to process many different classes of CT and synchronous signals. The chip has the property of handling various types of signals, i.e. various different digital modulations, both synchronous and asynchronous, without requiring any reconfiguration; such property is presented for the first time CT DSPs and is impossible for classical DSPs. As opposed to previous CT DSPs, which were limited to using only one type of digital format, and whose design was hard to scale for different bandwidths and bit-widths, this chip has a formal, robust and scalable design, due to the systematic usage of asynchronous design techniques. The second contribution of this thesis is a complete methodology to design adaptive delay lines. In particular, it is shown how to make the granularity, i.e. the number of stages, adaptive in a real-time delay line. Adaptive granularity brings about a significant improvement in the line's power consumption, up to 70% as reported by simulations on two design examples. This enhancement can have a direct large power impact on any CT DSP, since a delay line consumes the majority of a CT DSP's power. The robust methodology presented in this thesis allows safe dynamic reconfiguration of the line's granularity, on-the-fly and according to the input traffic. As a final contribution, the thesis also examines two additional DSPs: one operating the CT domain and one using the companding technique. The former operates only on level-crossing samples; the proposed methodology shows a potential for high-quality outputs by using a complex interpolation function. Finally, a companding DSP is presented for MPEG audio. Companding DSPs adapt their dynamic range to the amplitude of their input; the resulting can offer high-quality outputs even for small inputs. By applying companding to MPEG DSPs, it is shown how the DSP distortion can be made almost inaudible, without requiring complex arithmetic hardware

Columbia University Academic Commons

RAID Organizations for Improved Reliability and Performance: A Not Entirely Unbiased Tutorial (1st revision)

Author: Thomasian Alexander
Publication venue
Publication date: 06/01/2024
Field of study

RAID proposal advocated replacing large disks with arrays of PC disks, but as the capacity of small disks increased 100-fold in 1990s the production of large disks was discontinued. Storage dependability is increased via replication or erasure coding. Cloud storage providers store multiple copies of data obviating for need for further redundancy. Varitaions of RAID based on local recovery codes, partial MDS reduce recovery cost. NAND flash Solid State Disks - SSDs have low latency and high bandwidth, are more reliable, consume less power and have a lower TCO than Hard Disk Drives, which are more viable for hyperscalers.Comment: Submitted to ACM Computing Surveys. arXiv admin note: substantial text overlap with arXiv:2306.0876

arXiv.org e-Print Archive

Algorithm and Hardware Co-design for Learning On-a-chip

Author
Publication venue
Publication date: 01/01/2017
Field of study

abstract: Machine learning technology has made a lot of incredible achievements in recent years. It has rivalled or exceeded human performance in many intellectual tasks including image recognition, face detection and the Go game. Many machine learning algorithms require huge amount of computation such as in multiplication of large matrices. As silicon technology has scaled to sub-14nm regime, simply scaling down the device cannot provide enough speed-up any more. New device technologies and system architectures are needed to improve the computing capacity. Designing specific hardware for machine learning is highly in demand. Efforts need to be made on a joint design and optimization of both hardware and algorithm. For machine learning acceleration, traditional SRAM and DRAM based system suffer from low capacity, high latency, and high standby power. Instead, emerging memories, such as Phase Change Random Access Memory (PRAM), Spin-Transfer Torque Magnetic Random Access Memory (STT-MRAM), and Resistive Random Access Memory (RRAM), are promising candidates providing low standby power, high data density, fast access and excellent scalability. This dissertation proposes a hierarchical memory modeling framework and models PRAM and STT-MRAM in four different levels of abstraction. With the proposed models, various simulations are conducted to investigate the performance, optimization, variability, reliability, and scalability. Emerging memory devices such as RRAM can work as a 2-D crosspoint array to speed up the multiplication and accumulation in machine learning algorithms. This dissertation proposes a new parallel programming scheme to achieve in-memory learning with RRAM crosspoint array. The programming circuitry is designed and simulated in TSMC 65nm technology showing 900X speedup for the dictionary learning task compared to the CPU performance. From the algorithm perspective, inspired by the high accuracy and low power of the brain, this dissertation proposes a bio-plausible feedforward inhibition spiking neural network with Spike-Rate-Dependent-Plasticity (SRDP) learning rule. It achieves more than 95% accuracy on the MNIST dataset, which is comparable to the sparse coding algorithm, but requires far fewer number of computations. The role of inhibition in this network is systematically studied and shown to improve the hardware efficiency in learning.Dissertation/ThesisDoctoral Dissertation Electrical Engineering 201

ASU Digital Repository

Magnetic racetrack memory: from physics to the cusp of applications within a decade

Author: Bläsing R.
Castrillon J.
Filippou P.
Garg C.
Hameed F.
Khan A.
Parkin S.
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/08/2020
Field of study

Racetrack memory (RTM) is a novel spintronic memory-storage technology that has the potential to overcome fundamental constraints of existing memory and storage devices. It is unique in that its core differentiating feature is the movement of data, which is composed of magnetic domain walls (DWs), by short current pulses. This enables more data to be stored per unit area compared to any other current technologies. On the one hand, RTM has the potential for mass data storage with unlimited endurance using considerably less energy than today's technologies. On the other hand, RTM promises an ultrafast nonvolatile memory competitive with static random access memory (SRAM) but with a much smaller footprint. During the last decade, the discovery of novel physical mechanisms to operate RTM has led to a major enhancement in the efficiency with which nanoscopic, chiral DWs can be manipulated. New materials and artificially atomically engineered thin-film structures have been found to increase the speed and lower the threshold current with which the data bits can be manipulated. With these recent developments, RTM has attracted the attention of the computer architecture community that has evaluated the use of RTM at various levels in the memory stack. Recent studies advocate RTM as a promising compromise between, on the one hand, power-hungry, volatile memories and, on the other hand, slow, nonvolatile storage. By optimizing the memory subsystem, significant performance improvements can be achieved, enabling a new era of cache, graphical processing units, and high capacity memory devices. In this article, we provide an overview of the major developments of RTM technology from both the physics and computer architecture perspectives over the past decade. We identify the remaining challenges and give an outlook on its future

MPG.PuRe

Enabling Recovery of Secure Non-Volatile Memories

Author: Ye Mao
Publication venue: 'Information Bulletin on Variable Stars (IBVS)'
Publication date: 01/01/2020
Field of study

Emerging non-volatile memories (NVMs), such as phase change memory (PCM), spin-transfer torque RAM (STT-RAM) and resistive RAM (ReRAM), have dual memory-storage characteristics and, therefore, are strong candidates to replace or augment current DRAM and secondary storage devices. The newly released Intel 3D XPoint persistent memory and Optane SSD series have shown promising features. However, when these new devices are exposed to events such as power loss, many issues arise when data recovery is expected. In this dissertation, I devised multiple schemes to enable secure data recovery for emerging NVM technologies when memory encryption is used. With the data-remanence feature of NVMs, physical attacks become easier; hence, emerging NVMs are typically paired with encryption. In particular, counter-mode encryption is commonly used due to its performance and security advantages over other schemes (e.g., electronic codebook encryption). However, enabling data recovery in power failure events requires the recovery of security metadata associated with data blocks. Naively writing security metadata updates along with data for each operation can further exacerbate the write endurance problem of NVMs as they have limited write endurance and very slow write operations. Therefore, it is necessary to enable the recovery of data and security metadata (encryption counters) but without incurring a significant number of writes. The first work of this dissertation presents an explanation of Osiris, a novel mechanism that repurposes error correcting code (ECC) co-located with data to enable recovery of encryption counters by additionally serving as a sanity-check for encryption counters used. Thus, by using a stop-loss mechanism with a limited number of trials, ECC can be used to identify which encryption counter that was used most recently to encrypt the data and, hence, allow correct decryption and recovery. The first work of this dissertation explores how different stop-loss parameters along with optimizations of Osiris can potentially reduce the number of writes. Overall, Osiris enables the recovery of encryption counters while achieving better performance and fewer writes than a conventional write-back caching scheme of encryption counters, which lacks the ability to recover encryption counters. Later, in the second work, Osiris implementation is expanded to work with different counter-mode memory encryption schemes, where we use an epoch-based approach to periodically persist updated counters. Later, when a crash occurs, we can recover counters through test-and-verification to identify the correct counter within the size of an epoch for counter recovery. Our proposed scheme, Osiris-Global, incurs minimal performance overheads and write overheads in enabling the recovery of encryption counters. In summary, the findings of the present PhD work enable the recovery of secure NVM systems and, hence, allows persistent applications to leverage the persistency features of NVMs. Meanwhile, it also minimizes the number of writes required in meeting this crash consistency requirement of secure NVM systems

University of Central Florida (UCF): STARS (Showcase of Text, Archives, Research & Scholarship)

EFFICIENT SECURITY IN EMERGING MEMORIES

Author: Rakshit Joydeep
Publication venue
Publication date: 25/09/2018
Field of study

The wide adoption of cloud computing has established integrity and confidentiality of data in memory as a first order design concern in modern computing systems. Data integrity is ensured by Merkle Tree (MT) memory authentication. However, in the context of emerging non-volatile memories (NVMs), the MT memory authentication related increase in cell writes and memory accesses impose significant energy, lifetime, and performance overheads. This dissertation presents ASSURE, an Authentication Scheme for SecURE (ASSURE) energy efficient NVMs. ASSURE integrates (i) smart message authentication codes with (ii) multi-root MTs to decrease MT reads and writes, while also reducing the number of cell writes on each MT write. Whereas data confidentiality is effectively ensured by encryption, the memory access patterns can be exploited as a side-channel to obtain confidential data. Oblivious RAM (ORAM) is a secure cryptographic construct that effectively thwarts access-pattern-based attacks. However, in Path ORAM (state-of-the-art efficient ORAM for main memories) and its variants, each last-level cache miss (read or write) is transformed to a sequence of memory reads and writes (collectively termed read phase and write phase, respectively), increasing the number of memory writes due to data re-encryption, increasing effective latency of the memory accesses, and degrading system performance. This dissertation efficiently addresses the challenges of both read and write phase operations during an ORAM access. First, it presents ReadPRO (Read Promotion), which is an efficient ORAM scheduler that leverages runtime identification of read accesses to effectively prioritize the service of critical-path-bound read access read phase operations, while preserving all data dependencies. Second, it presents LEO (Low overhead Encryption ORAM) that reduces cell writes by opportunistically decreasing the number of block encryptions, while preserving the security guarantees of the baseline Path ORAM. This dissertation therefore addresses the core chal- lenges of read/write energy and latency, endurance, and system performance for integration of essential security primitives in emerging memory architectures. Future research directions will focus on (i) exploring efficient solutions for ORAM read phase optimization and secure ORAM resizing, (ii) investigating the security challenges of emerging processing-in-memory architectures, and (iii) investigating the interplay of security primitives with reliability enhancing architectures

D-Scholarship@Pitt

Energy-Aware Data Movement In Non-Volatile Memory Hierarchies

Author: Najafabadi Navid Khoshavi
Publication venue: 'Information Bulletin on Variable Stars (IBVS)'
Publication date: 01/01/2017
Field of study

While technology scaling enables increased density for memory cells, the intrinsic high leakage power of conventional CMOS technology and the demand for reduced energy consumption inspires the use of emerging technology alternatives such as eDRAM and Non-Volatile Memory (NVM) including STT-MRAM, PCM, and RRAM. The utilization of emerging technology in Last Level Cache (LLC) designs which occupies a signifcant fraction of total die area in Chip Multi Processors (CMPs) introduces new dimensions of vulnerability, energy consumption, and performance delivery. To be specific, a part of this research focuses on eDRAM Bit Upset Vulnerability Factor (BUVF) to assess vulnerable portion of the eDRAM refresh cycle where the critical charge varies depending on the write voltage, storage and bit-line capacitance. This dissertation broaden the study on vulnerability assessment of LLC through investigating the impact of Process Variations (PV) on narrow resistive sensing margins in high-density NVM arrays, including on-chip cache and primary memory. Large-latency and power-hungry Sense Amplifers (SAs) have been adapted to combat PV in the past. Herein, a novel approach is proposed to leverage the PV in NVM arrays using Self-Organized Sub-bank (SOS) design. SOS engages the preferred SA alternative based on the intrinsic as-built behavior of the resistive sensing timing margin to reduce the latency and power consumption while maintaining acceptable access time. On the other hand, this dissertation investigates a novel technique to prioritize the service to 1) Extensive Read Reused Accessed blocks of the LLC that are silently dropped from higher levels of cache, and 2) the portion of the working set that may exhibit distant re-reference interval in L2. In particular, we develop a lightweight Multi-level Access History Profiler to effciently identify ERRA blocks through aggregating the LLC block addresses tagged with identical Most Signifcant Bits into a single entry. Experimental results indicate that the proposed technique can reduce the L2 read miss ratio by 51.7% on average across PARSEC and SPEC2006 workloads. In addition, this dissertation will broaden and apply advancements in theories of subspace recovery to pioneer computationally-aware in-situ operand reconstruction via the novel Logic In Interconnect (LI2) scheme. LI2 will be developed, validated, and re?ned both theoretically and experimentally to realize a radically different approach to post-Moore\u27s Law computing by leveraging low-rank matrices features offering data reconstruction instead of fetching data from main memory to reduce energy/latency cost per data movement. We propose LI2 enhancement to attain high performance delivery in the post-Moore\u27s Law era through equipping the contemporary micro-architecture design with a customized memory controller which orchestrates the memory request for fetching low-rank matrices to customized Fine Grain Reconfigurable Accelerator (FGRA) for reconstruction while the other memory requests are serviced as before. The goal of LI2 is to conquer the high latency/energy required to traverse main memory arrays in the case of LLC miss, by using in-situ construction of the requested data dealing with low-rank matrices. Thus, LI2 exchanges a high volume of data transfers with a novel lightweight reconstruction method under specific conditions using a cross-layer hardware/algorithm approach

University of Central Florida (UCF): STARS (Showcase of Text, Archives, Research & Scholarship)