25 research outputs found

    Architectural Techniques to Enable Reliable and Scalable Memory Systems

    High capacity and scalable memory systems play a vital role in enabling our desktops, smartphones, and pervasive technologies like Internet of Things (IoT). Unfortunately, memory systems are becoming increasingly prone to faults. This is because we rely on technology scaling to improve memory density, and at small feature sizes, memory cells tend to break easily. Today, memory reliability is seen as the key impediment towards using high-density devices, adopting new technologies, and even building the next Exascale supercomputer. To ensure even a bare-minimum level of reliability, present-day solutions tend to have high performance, power and area overheads. Ideally, we would like memory systems to remain robust, scalable, and implementable while keeping the overheads to a minimum. This dissertation describes how simple cross-layer architectural techniques can provide orders of magnitude higher reliability and enable seamless scalability for memory systems while incurring negligible overheads. Comment: PhD thesis, Georgia Institute of Technology (May 2017)

    A design methodology for robust, energy-efficient, application-aware memory systems

    Memory design is a crucial component of VLSI system design from area, power and performance perspectives. To meet the increasingly challenging system specifications, architecture, circuit and device level innovations are required for existing memory technologies. Emerging memory solutions are widely explored to cater to strict budgets. This thesis presents design methodologies for custom memory design with the objective of power-performance benefits across specific applications. Taking STTRAM (spin transfer torque random access memory) as an example of an emerging memory candidate, the design space is explored to find the energy-optimal design solution. A thorough thermal reliability study is performed to estimate detection reliability challenges, and circuit solutions are proposed to ensure reliable operation. Adoption of the application-specific optimal energy solution is shown to yield considerable energy benefits in a read-heavy application called MBC (memory based computing). Circuit level customizations are studied for the volatile SRAM (static random access memory), which provide improved energy-delay product (EDP) for the same MBC application. Memory design has to be aware of upcoming challenges not only from the nature of the application but also from the packaging front. Taking 3D die-folding as an example, the SRAM performance shift under die-folding is illustrated. Overall, the thesis demonstrates how knowledge of the system and packaging can help in achieving power-efficient and high-performance memory design. Ph.D

    Variation Analysis, Fault Modeling and Yield Improvement of Emerging Spintronic Memories


    An Experimental Analysis of RowHammer in HBM2 DRAM Chips

    RowHammer (RH) is a significant and worsening security, safety, and reliability issue of modern DRAM chips that can be exploited to break memory isolation. Therefore, it is important to understand real DRAM chips' RH characteristics. Unfortunately, no prior work extensively studies the RH vulnerability of modern 3D-stacked high-bandwidth memory (HBM) chips, which are commonly used in modern GPUs. In this work, we experimentally characterize the RH vulnerability of a real HBM2 DRAM chip. We show that 1) different 3D-stacked channels of HBM2 memory exhibit significantly different levels of RH vulnerability (up to 79% difference in bit error rate), 2) the DRAM rows at the end of a DRAM bank (rows with the highest addresses) exhibit significantly fewer RH bitflips than other rows, and 3) a modern HBM2 DRAM chip implements undisclosed RH defenses that are triggered by periodic refresh operations. We describe the implications of our observations on future RH attacks and defenses and discuss future work for understanding RH in 3D-stacked memories. Comment: To appear at DSN Disrupt 202

    Hardware/Software Co-Design of Ultra-Low Power Biomedical Monitors

    Ongoing changes in world demographics and the prevalence of unhealthy lifestyles are imposing a paradigm shift in healthcare delivery. Nowadays, chronic ailments such as cardiovascular diseases, hypertension and diabetes, represent the most common causes of death according to the World Health Organization. It is estimated that 63% of deaths worldwide are directly or indirectly related to these non-communicable diseases (NCDs), and by 2030 it is predicted that the health delivery cost will reach an amount comparable to 75% of the current GDP. In this context, technologies based on Wireless Sensor Nodes (WSNs) effectively alleviate this burden enabling the conception of wearable biomedical monitors composed of one or several devices connected through a Wireless Body Sensor Network (WBSN). Energy efficiency is of paramount importance for these devices, which must operate for prolonged periods of time with a single battery charge. In this thesis I propose a set of hardware/software co-design techniques to drastically increase the energy efficiency of bio-medical monitors. To this end, I jointly explore different alternatives to reduce the required computational effort at the software level while optimizing the power consumption of the processing hardware by employing ultra-low power multi-core architectures that exploit DSP application characteristics. First, at the sensor level, I study the utilization of a heartbeat classifier to perform selective advanced DSP on state-of-the-art ECG bio-medical monitors. To this end, I developed a framework to design and train real-time, lightweight heartbeat neuro-fuzzy classifiers, detailing the required optimizations to efficiently execute them on a resource-constrained platform. Then, at the network level I propose a more complex transmission-aware WBSN for activity monitoring that provides different tradeoffs between classification accuracy and transmission volume.
In this work, I study the combination of a minimal set of WSNs with a smartphone, and propose two classification schemes that trade accuracy for transmission volume. The proposed method can achieve accuracies ranging from 88% to 97% and can save up to 86% of wireless transmissions, outperforming the state-of-the-art alternatives. Second, I propose a synchronization-based low-power multi-core architecture for bio-signal processing. I introduce a hardware/software synchronization mechanism that achieves high energy efficiency while parallelizing the execution of multi-channel DSP applications. Then, I generalize the methodology to support bio-signal processing applications with an arbitrarily high degree of parallelism. Due to the benefits of SIMD execution and software pipelining, the architecture can reduce its power consumption by up to 38% when compared to an equivalent low-power single-core alternative. Finally, I focus on the optimization of the multi-core memory subsystem, which is the major contributor to the overall system power consumption. First, I consider a hybrid memory subsystem featuring a small reliable partition that can operate at ultra-low voltage, enabling low-power buffering of data and obtaining up to 50% energy savings. Second, I explore a two-level memory hierarchy based on non-volatile memories (NVM) that allows for aggressive fine-grained power gating enabled by emerging low-power NVM technologies and monolithic 3D integration. Experimental results show that, by adopting this memory hierarchy, power consumption can be reduced by 5.42x in the DSP stage

    Straintronics: A Leap towards Ultimate Energy Efficiency of Magnetic Memory and Logic

    After decades of exponential growth of the semiconductor industries, predicted by Moore’s Law, the complementary metal-oxide semiconductor (CMOS) circuits are approaching their end of the road, as the feature sizes reach sub-10nm regimes, leaving electrical engineers with a profusion of design challenges in terms of energy limitations and power density. The latter has left the road for alternative technologies wide open to help CMOS overcome the present challenges. Magnetic random access memories (MRAM) are one of the candidates to assist with the aforesaid obstacles. Proposed in the early 90’s, MRAM has been under research and development for decades. The quest for energy-efficient MRAM is driven by the fact that magnetic logic, potentially, has orders of magnitude lower switching energy compared to a charge-based CMOS logic since, in a nanomagnet, magnetic domains would self-align with each other. Regrettably, conventional methods for switching the state of the cell in an MRAM, field induced magnetization switching (FIMS) and spin transfer torque (STT), use electric current (flow of charges) to switch the state of the magnet, nullifying the energy advantage stated above. In order to maximize the energy efficiency, the amount of charge required to switch the state of the MTJ should be minimized. To this end, straintronics has recently been proposed as an energy-efficient alternative to FIMS and STT for switching the state of a nanomagnet. The method states that by combining piezoelectricity and inverse magnetostriction, the magnetization state of the device can flip within a few nanoseconds while reducing the switching energy by orders of magnitude compared to STT and FIMS. This research focuses on analysis, design, modeling, and applications of straintronics-based MTJ. The first goal is to perform an in-depth analysis on the static and dynamic behavior of the device.
Next, we are aiming to increase the accuracy of the model by including the effect of temperature and thermal noise on the device’s behavior. The goal of performing such analysis is to create a comprehensive model of the device that predicts both static and dynamic responses of the magnetization to applied stress. The model will be used to interface the device with CMOS controllers and switches in large systems. Next, in an attempt to speed up the simulation of such devices in multi-megabyte memory systems, a liberal model has been developed by analytically approximating a solution to the magnetization dynamics, which should be numerically solved otherwise. The liberal model demonstrates more than two orders of magnitude speed improvement compared to the conventional numerical models. Highlighting the applications of the straintronics devices by combining such devices with peripheral CMOS circuitry is another goal of the research. Design of a proof-of-concept 2 kilo-bit nonvolatile straintronics-based memory was introduced in our recent work. To highlight the potential applications of the straintronics device, beyond data storage, the use of the principle in ultra-fast yet low power true random number generation and neuron/synapse design for artificial neural networks have been investigated. Lastly, in an attempt to investigate the practicality of the straintronics principle, the effect of process variations and interface imperfections on the switching behavior of the magnetization is investigated. The results reveal the destructive aftermath of fabrication imperfections on the switching pattern of the device, leaving careful pulse-shaping, alternative topologies, or combination with STT as the last resorts for successful strain-based magnetization switching. PhD, Electrical Engineering, University of Michigan, Horace H. Rackham School of Graduate Studies. https://deepblue.lib.umich.edu/bitstream/2027.42/137010/1/barangi_1.pd

    Power Profile Obfuscation using RRAMs to Counter DPA Attacks

    Side channel attacks, such as Differential Power Analysis (DPA), denote a special class of attacks in which sensitive key information is unveiled through information extracted from the physical device executing a cryptographic algorithm. This information leakage, known as side channel information, occurs from computations in a non-ideal system composed of electronic devices such as transistors. Power dissipation is one classic side channel source, which relays information of the data being processed. DPA uses statistical analysis to identify data-dependent correlations in sets of power measurements. Countermeasures against DPA focus on hiding or masking techniques at different levels of design abstraction and are typically associated with high power and area cost. Emerging technologies such as Resistive Random Access Memory (RRAM), offer unique opportunities to mitigate DPAs with their inherent memristor device characteristics such as variability in write time, ultra low power (0.1-3 pJ/bit), and high density (4F²). In this research, an RRAM based architecture is proposed to mitigate the DPA attacks by obfuscating the power profile. Specifically, a dual RRAM based memory module masks the power dissipation of the actual transaction by accessing both the data and its complement from the memory in tandem. DPA attack resiliency for a 128-bit AES cryptoprocessor using RRAM and CMOS memory modules is compared against baseline CMOS only technology. In the proposed AES architecture, four single port RRAM memory units store the intermediate state of the encryption. The correlation between the state data and sets of power measurement is masked due to power dissipated from inverse data access on dual RRAM memory. A customized simulation framework is developed to design the attack scenarios using Synopsys and Cadence tool suites, along with a Hamming weight DPA attack module. The attack mounted on a baseline CMOS architecture is successful and the full key is recovered.
However, DPA attacks mounted on the dual CMOS and RRAM based AES cryptoprocessor yielded unsuccessful results with no keys recovered, demonstrating the resiliency of the proposed architecture against DPA attacks
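
The masking idea in the abstract above (accessing a value and its complement in tandem) can be illustrated with a toy model. This is only a sketch under a simplifying assumption that is not taken from the thesis: power leakage is modeled as exactly proportional to the Hamming weight of the accessed byte. The function names are hypothetical and chosen for illustration.

```python
def hamming_weight(value: int) -> int:
    """Count the set bits of a non-negative integer."""
    return bin(value).count("1")


def single_access_leakage(byte: int) -> int:
    # Toy power model (assumption): leakage equals the Hamming weight
    # of the one byte that is accessed.
    return hamming_weight(byte)


def dual_access_leakage(byte: int) -> int:
    # Access the byte and its 8-bit complement together; the modeled
    # leakage is the sum of both Hamming weights.
    complement = byte ^ 0xFF
    return hamming_weight(byte) + hamming_weight(complement)


# Collect every distinct leakage value over all possible byte values.
single_levels = {single_access_leakage(b) for b in range(256)}
dual_levels = {dual_access_leakage(b) for b in range(256)}

print("single access leakage levels:", sorted(single_levels))
print("dual access leakage levels:", sorted(dual_levels))
```

In this idealized model the single access shows nine data-dependent levels, while the dual access always yields the same value, so a Hamming weight attacker sees no correlation with the data. Real devices deviate from the model, which is why the thesis evaluates the design through simulation.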

    Fabrication, Characterization and Integration of Resistive Random Access Memories

    The functionalities and performances of today's computing systems are increasingly dependent on the memory block. This phenomenon, also referred to as the Von Neumann bottleneck, is the main motivation for the research on memory technologies. Although CMOS technology has been improved over the last 50 years by continually increasing the device density, today's mainstream memories, such as SRAM, DRAM and Flash, are facing fundamental limitations to continue this trend. These memory technologies, based on charge storage mechanisms, are suffering from the easy loss of the stored state for devices scaled below 10 nm. This results in a degradation of the performance, reliability and noise margin. The main motivation for the development of emerging non-volatile memories is the study of a different mechanism to store the digital state in order to overcome this challenge. Among these emerging technologies, one of the strongest candidates is Resistive Random Access Memory (ReRAM), which relies on the formation or rupture of a conductive filament inside a dielectric layer. This thesis focuses on the fabrication, characterization and integration of ReRAM devices. The main subject is the qualitative and quantitative description of the main factors that influence the resistive memory electrical behavior. Such factors can be related either to the memory fabrication or to the test environment. The first category includes variations in the fabrication process steps, in the device geometry or composition. We discuss the effect of each variation, and we use the obtained database to gather insights on the ReRAM working mechanism and the adopted methodology by using statistical methods. The second category describes how differences in the electrical stimuli sent to the device change the memory performances. We show how these factors can influence the memory resistance states, and we propose an empirical model to describe such changes.
We also discuss how it is possible to control the resistance states by modulating the number of input pulses applied to the device. In the second part of this work, we present the integration of the fabricated devices in a CMOS technology environment. We discuss a Verilog-A model used to simulate the device characteristics, and we show two solutions to limit the sneak-path currents for ReRAM crossbars: a dedicated read circuit and the development of selector devices. We describe the selector fabrication, as well as the electrical characterization and the combination with our ReRAMs in a 1S1R configuration. Finally, we show two methods to integrate ReRAM devices in the BEoL of CMOS chips

    High-Performance Energy-Efficient and Reliable Design of Spin-Transfer Torque Magnetic Memory

    In this dissertation, new computing paradigms, architectures and design philosophy are proposed and evaluated for adopting the STT-MRAM technology as highly reliable, energy efficient and fast memory. For this purpose, a novel cross-layer framework from the cell-level all the way up to the system- and application-level has been developed. In this framework, the reliability issues are modeled accurately with appropriate fault models at different abstraction levels in order to analyze the overall failure rates of the entire memory and its Mean Time To Failure (MTTF), while also considering the temperature and process variation effects. Design-time, compile-time and run-time solutions have been provided to address the challenges associated with STT-MRAM. The effectiveness of the proposed solutions is demonstrated in extensive experiments that show significant improvements in comparison to state-of-the-art solutions, i.e. lower-power, higher-performance and more reliable STT-MRAM design

    Applications of Emerging Memory in Modern Computer Systems: Storage and Acceleration

    In recent years, heterogeneous architecture has emerged as a promising technology to overcome the constraints of homogeneous multi-core architecture, such as supply voltage scaling, off-chip communication bandwidth, and application parallelism. Various forms of accelerators, e.g., GPU and ASIC, have been extensively studied for their tradeoffs between computation efficiency and adaptivity. But with the increasing demand for capacity and continued technology scaling, accelerators also face limitations on cost-efficiency due to the use of traditional memory technologies and architecture design. Emerging memory has become a promising technology that inspires new designs by replacing traditional memory technologies in modern computer systems. In this dissertation, I will first summarize my research on the application of spin-transfer torque random access memory (STT-RAM) in the GPU memory hierarchy, which offers a simple cell structure and non-volatility to enable much smaller cell area than SRAM and almost zero standby power. Then I will introduce my research on memristor implementation as the computation component in a neuromorphic computing accelerator, which exploits the similarity between the programmable resistance state of memristors and the variable synaptic strengths of biological synapses to simplify the realization of neural network models. Finally, a dedicated interconnection network design for multicore neuromorphic computing systems will be presented to reduce the prominent average latency and power consumption brought by the NoC in a large-scale neuromorphic computing system