32 research outputs found

    Towards Energy-Efficient and Reliable Computing: From Highly-Scaled CMOS Devices to Resistive Memories

    Get PDF
    The continuous increase in transistor density based on Moore\u27s Law has led us to highly scaled Complementary Metal-Oxide Semiconductor (CMOS) technologies. These transistor-based process technologies offer improved density as well as a reduction in nominal supply voltage. An analysis regarding different aspects of 45nm and 15nm technologies, such as power consumption and cell area to compare these two technologies is proposed on an IEEE 754 Single Precision Floating-Point Unit implementation. Based on the results, using the 15nm technology offers 4-times less energy and 3-fold smaller footprint. New challenges also arise, such as relative proportion of leakage power in standby mode that can be addressed by post-CMOS technologies. Spin-Transfer Torque Random Access Memory (STT-MRAM) has been explored as a post-CMOS technology for embedded and data storage applications seeking non-volatility, near-zero standby energy, and high density. Towards attaining these objectives for practical implementations, various techniques to mitigate the specific reliability challenges associated with STT-MRAM elements are surveyed, classified, and assessed herein. Cost and suitability metrics assessed include the area of nanomagmetic and CMOS components per bit, access time and complexity, Sense Margin (SM), and energy or power consumption costs versus resiliency benefits. In an attempt to further improve the Process Variation (PV) immunity of the Sense Amplifiers (SAs), a new SA has been introduced called Adaptive Sense Amplifier (ASA). ASA can benefit from low Bit Error Rate (BER) and low Energy Delay Product (EDP) by combining the properties of two of the commonly used SAs, Pre-Charge Sense Amplifier (PCSA) and Separated Pre-Charge Sense Amplifier (SPCSA). ASA can operate in either PCSA or SPCSA mode based on the requirements of the circuit such as energy efficiency or reliability. Then, ASA is utilized to propose a novel approach to actually leverage the PV in Non-Volatile Memory (NVM) arrays using Self-Organized Sub-bank (SOS) design. SOS engages the preferred SA alternative based on the intrinsic as-built behavior of the resistive sensing timing margin to reduce the latency and power consumption while maintaining acceptable access time

    Energy-Aware Data Movement In Non-Volatile Memory Hierarchies

    Get PDF
    While technology scaling enables increased density for memory cells, the intrinsic high leakage power of conventional CMOS technology and the demand for reduced energy consumption inspires the use of emerging technology alternatives such as eDRAM and Non-Volatile Memory (NVM) including STT-MRAM, PCM, and RRAM. The utilization of emerging technology in Last Level Cache (LLC) designs which occupies a signifcant fraction of total die area in Chip Multi Processors (CMPs) introduces new dimensions of vulnerability, energy consumption, and performance delivery. To be specific, a part of this research focuses on eDRAM Bit Upset Vulnerability Factor (BUVF) to assess vulnerable portion of the eDRAM refresh cycle where the critical charge varies depending on the write voltage, storage and bit-line capacitance. This dissertation broaden the study on vulnerability assessment of LLC through investigating the impact of Process Variations (PV) on narrow resistive sensing margins in high-density NVM arrays, including on-chip cache and primary memory. Large-latency and power-hungry Sense Amplifers (SAs) have been adapted to combat PV in the past. Herein, a novel approach is proposed to leverage the PV in NVM arrays using Self-Organized Sub-bank (SOS) design. SOS engages the preferred SA alternative based on the intrinsic as-built behavior of the resistive sensing timing margin to reduce the latency and power consumption while maintaining acceptable access time. On the other hand, this dissertation investigates a novel technique to prioritize the service to 1) Extensive Read Reused Accessed blocks of the LLC that are silently dropped from higher levels of cache, and 2) the portion of the working set that may exhibit distant re-reference interval in L2. In particular, we develop a lightweight Multi-level Access History Profiler to effciently identify ERRA blocks through aggregating the LLC block addresses tagged with identical Most Signifcant Bits into a single entry. Experimental results indicate that the proposed technique can reduce the L2 read miss ratio by 51.7% on average across PARSEC and SPEC2006 workloads. In addition, this dissertation will broaden and apply advancements in theories of subspace recovery to pioneer computationally-aware in-situ operand reconstruction via the novel Logic In Interconnect (LI2) scheme. LI2 will be developed, validated, and re?ned both theoretically and experimentally to realize a radically different approach to post-Moore\u27s Law computing by leveraging low-rank matrices features offering data reconstruction instead of fetching data from main memory to reduce energy/latency cost per data movement. We propose LI2 enhancement to attain high performance delivery in the post-Moore\u27s Law era through equipping the contemporary micro-architecture design with a customized memory controller which orchestrates the memory request for fetching low-rank matrices to customized Fine Grain Reconfigurable Accelerator (FGRA) for reconstruction while the other memory requests are serviced as before. The goal of LI2 is to conquer the high latency/energy required to traverse main memory arrays in the case of LLC miss, by using in-situ construction of the requested data dealing with low-rank matrices. Thus, LI2 exchanges a high volume of data transfers with a novel lightweight reconstruction method under specific conditions using a cross-layer hardware/algorithm approach

    Spin-Transfer-Torque (STT) Devices for On-chip Memory and Their Applications to Low-standby Power Systems

    Get PDF
    With the scaling of CMOS technology, the proportion of the leakage power to total power consumption increases. Leakage may account for almost half of total power consumption in high performance processors. In order to reduce the leakage power, there is an increasing interest in using nonvolatile storage devices for memory applications. Among various promising nonvolatile memory elements, spin-transfer torque magnetic RAM (STT-MRAM) is identified as one of the most attractive alternatives to conventional SRAM. However, several design challenges of STT-MRAM such as shared read and write current paths, single-ended sensing, and high dynamic power are major challenges to be overcome to make it suitable for on-chip memories. To mitigate such problems, we propose a domain wall coupling based spin-transfer torque (DWCSTT) device for on-chip caches. Our proposed DWCSTT bit-cell decouples the read and the write current paths by the electrically-insulating magnetic coupling layer so that we can separately optimize read operation without having an impact on write-ability. In addition, the complementary polarizer structure in the read path of the DWCSTT device allows DWCSTT to enable self-referenced differential sensing. DWCSTT bit-cells improve the write power consumption due to the low electrical resistance of the write current path. Furthermore, we also present three different bit-cell level design techniques of Spin-Orbit Torque MRAM (SOT-MRAM) for alleviating some of the inefficiencies of conventional magnetic memories while maintaining the advantages of spin-orbit torque (SOT) based novel switching mechanism such as low write current requirement and decoupled read and write current path. Our proposed SOT-MRAM with supporting dual read/write ports (1R/1W) can address the issue of high-write latency of STT-MRAM by simultaneous 1R/1W accesses. Second, we propose a new type of SOT-MRAM which uses only one access transistor along with a Schottky diode in order to mitigate the area-overhead caused by two access transistors in conventional SOT-MRAM. Finally, a new design technique of SOT-MRAM is presented to improve the integration density by utilizing a shared bit-line structure

    Circuit and Architecture Co-Design of STT-RAM for High Performance and Low Energy

    Get PDF
    Spin-Transfer Torque Random Access Memory (STT-RAM) has been proved a promising emerging nonvolatile memory technology suitable for many applications such as cache mem- ory of CPU. Compared with other conventional memory technology, STT-RAM offers many attractive features such as nonvolatility, fast random access speed and extreme low leakage power. However, STT-RAM is still facing many challenges. First of all, programming STT-RAM is a stochastic process due to random thermal fluctuations, so the write errors are hard to avoid. Secondly, the existing STT-RAM cell designs can be used for only single-port accesses, which limits the memory access bandwidth and constraints the system performance. Finally, while other memory technology supports multi-level cell (MLC) design to boost the storage density, adopting MLC to STT-RAM brings many disadvantages such as requirement for large transistor and low access speed. In this work, we proposed solutions on both circuit and architecture level to address these challenges. For the write error issues, we proposed two probabilistic methods, namely write-verify- rewrite with adaptive period (WRAP) and verify-one-while-writing (VOW), for performance improvement and write failure reduction. For dual-port solution, we propose the design methods to support dual-port accesses for STT-RAM. The area increment by introducing an additional port is reduced by leveraging the shared source-line structure. Detailed analysis on the performance/reliability degrada- tion caused by dual-port accesses is performed, and the corresponding design optimization is provided. To unleash the potential of MLC STT-RAM cache, we proposed a new design through a cross-layer co-optimization. The memory cell structure integrated the reversed stacking of magnetic junction tunneling (MTJ) for a more balanced device and design trade-off. In architecture development, we presented an adaptive mode switching mechanism: based on application’s memory access behavior, the MLC STT-RAM cache can dynamically change between low latency SLC mode and high capacity MLC mode. Finally, we present a 4Kb test chip design which can support different types and sizes of MTJs. A configurable sensing solution is used in the test chip so that it can support wide range of MTJ resistance. Such test chip design can help to evaluate various type of MTJs in the future

    STT-MRAM characterization and its test implications

    Get PDF
    Spin torque transfer (STT)-magnetoresistive random-access memory (MRAM) has come a long way in research to meet the speed and power consumption requirements for future memory applications. The state-of-the-art STT-MRAM bit-cells employ magnetic tunnel junction (MTJ) with perpendicular magnetic anisotropy (PMA). The process repeatabil- ity and yield stability for wafer fabrication are some of the critical issues encountered in STT-MRAM mass production. Some of the yield improvement techniques to combat the e ect of process variations have been previously explored. However, little research has been done on defect oriented testing of STT-MRAM arrays. In this thesis, the author investi- gates the parameter deviation and non-idealities encountered during the development of a novel MTJ stack con guration. The characterization result provides motivation for the development of the design for testability (DFT) scheme that can help test and characterize STT-MRAM bit-cells and the CMOS peripheral circuitry e ciently. The primary factors for wafer yield degradation are the device parameter variation and its non-uniformity across the wafer due to the fabrication process non-idealities. There- fore, e ective in-process testing strategies for exploring and verifying the impact of the parameter variation on the wafer yield will be needed to achieve fabrication process opti- mization. While yield depends on the CMOS process variability, quality of the deposited MTJ lm, and other process non-idealities, test platform can enable parametric optimiza- tion and veri cation using the CMOS-based DFT circuits. In this work, we develop a DFT algorithm and implement a DFT circuit for parametric testing and prequali cation of the critical circuits in the CMOS wafer. The DFT circuit successfully replicates the electrical characteristics of MTJ devices and captures their spatial variation across the wafer with an error of less than 4%. We estimate the yield of the read sensing path by implement- ing the DFT circuit, which can replicate the resistance-area product variation up to 50% from its nominal value. The yield data from the read sensing path at di erent wafer loca- tions are analyzed, and a usable wafer radius has been estimated. Our DFT scheme can provide quantitative feedback based on in-die measurement, enabling fabrication process optimization through iterative estimation and veri cation of the calibrated parameters. Another concern that prevents mass production of STT-MRAM arrays is the defect formation in MTJ devices due to aging. Identifying manufacturing defects in the magnetic tunnel junction (MTJ) device is crucial for the yield and reliability of spin-torque-transfer (STT) magnetic random-access memory (MRAM) arrays. Several of the MTJ defects result in parametric deviations of the device that deteriorate over time. We extend our work on the DFT scheme by monitoring the electrical parameter deviations occurring due to the defect formation over time. A programmable DFT scheme was implemented for a sub-arrayin 65 nm CMOS technology to evaluate the feasibility of the test scheme. The scheme utilizes the read sense path to compare the bit-cell electrical parameters against known DFT cells characteristics. Built-in-self-test (BIST) methodology is utilized to trigger the onset of the fault once the device parameter crosses a threshold value. We demonstrate the operation and evaluate the accuracy of detection with the proposed scheme. The DFT scheme can be exploited for monitoring aging defects, modeling their behavior and optimization of the fabrication process. DFT scheme could potentially nd numerous applications for parametric characteriza- tion and fault monitoring of STT-MRAM bit-cell arrays during mass production. Some of the applications include a) Fabrication process feedback to improve wafer turnaround time, b) STT-MRAM bit-cell health monitoring, c) Decoupled characterization of the CMOS pe- ripheral circuitry such as read-sensing path and sense ampli er characterization within the STT-MRAM array. Additionally, the DFT scheme has potential applications for detec- tion of fault formation that could be utilized for deploying redundancy schemes, providing a graceful degradation in MTJ-based bit-cell array due to aging of the device, and also providing feedback to improve the fabrication process and yield learning

    High-Performance Energy-Efficient and Reliable Design of Spin-Transfer Torque Magnetic Memory

    Get PDF
    In this dissertation new computing paradigms, architectures and design philosophy are proposed and evaluated for adopting the STT-MRAM technology as highly reliable, energy efficient and fast memory. For this purpose, a novel cross-layer framework from the cell-level all the way up to the system- and application-level has been developed. In these framework, the reliability issues are modeled accurately with appropriate fault models at different abstraction levels in order to analyze the overall failure rates of the entire memory and its Mean Time To Failure (MTTF) along with considering the temperature and process variation effects. Design-time, compile-time and run-time solutions have been provided to address the challenges associated with STT-MRAM. The effectiveness of the proposed solutions is demonstrated in extensive experiments that show significant improvements in comparison to state-of-the-art solutions, i.e. lower-power, higher-performance and more reliable STT-MRAM design

    High-Density Solid-State Memory Devices and Technologies

    Get PDF
    This Special Issue aims to examine high-density solid-state memory devices and technologies from various standpoints in an attempt to foster their continuous success in the future. Considering that broadening of the range of applications will likely offer different types of solid-state memories their chance in the spotlight, the Special Issue is not focused on a specific storage solution but rather embraces all the most relevant solid-state memory devices and technologies currently on stage. Even the subjects dealt with in this Special Issue are widespread, ranging from process and design issues/innovations to the experimental and theoretical analysis of the operation and from the performance and reliability of memory devices and arrays to the exploitation of solid-state memories to pursue new computing paradigms

    Modélisation compacte et conception de circuit à base de jonction tunnel ferroélectrique et de jonction tunnel magnétique exploitant le transfert de spin assisté par effet Hall de spin

    Get PDF
    Non-volatile memory (NVM) devices have been attracting intensive research interest since they promise to solve the increasing static power issue caused by CMOS technology scaling. This thesis focuses on two fields related to NVM: the one is the ferroelectric tunnel junction (FTJ), which is a recent emerging NVM device. The other is the spin-Hall-assisted spin-transfer torque (STT), which is a recent proposed write approach for the magnetic tunnel junction (MTJ). Our objective is to develop the compact models for these two technologies and to explore their application in the non-volatile circuits through simulation.First, we investigated physical models describing the electrical behaviors of the FTJ such as tunneling resistance, dynamic ferroelectric switching and memristive response. The accuracy of these physical models is validated by a good agreement with experimental results. In order to develop an electrical model available for the circuit simulation, we programmed the aforementioned physical models with Verilog-A language and integrated them together. The developed electrical model can run on Cadence platform (a standard circuit simulation tool) and faithfully reproduce the behaviors of the FTJ.Then, using the developed FTJ model and STMicroelectronics CMOS design kit, we designed and simulated three types of circuits: i) FTJ-based random access memory (FTRAM), ii) two FTJ-based neuromorphic systems, one of which emulates spike-timing dependent plasticity (STDP) learning rule, the other implements supervised learning of logic functions, iii) FTJ-based Boolean logic block, by which NAND and NOR logic are demonstrated. The influences of the FTJ parameters on the performance of these circuits were analyzed based on simulation results.Finally, we focused on the reversal of the perpendicular magnetization driven by spin-Hall-assisted STT in a three-terminal MTJ. In this scheme, two write currents are applied to generate spin-Hall effect (SHE) and STT. Numerical simulation based on Landau-Lifshitz-Gilbert (LLG) equation demonstrates that the incubation delay of the STT can be eliminated by the strong SHE, resulting in ultrafast magnetization switching without the need to strengthen the STT. We applied this novel write approach to the design of the magnetic flip-flop and full-adder. Performance comparison between the spin-Hall-assisted and the conventional STT magnetic circuits were discussed based on simulation results and theoretical models.Les mémoires non-volatiles (MNV) sont l'objet d'un effort de recherche croissant du fait de leur capacité à limiter la consommation statique, qui obère habituellement la réduction des dimensions dans la technologie CMOS. Dans ce contexte, cette thèse aborde plus spécifiquement deux technologies de mémoires non volatiles : d'une part les jonctions tunnel ferroélectriques (JTF), dispositif non volatil émergent, et d'autre part les dispositifs à transfert de spin (TS) assisté par effet Hall de spin (EHS), approche alternative proposée récemment pour écrire les jonctions tunnel magnétiques (JTM). Mon objectif est de développer des modèles compacts pour ces deux technologies et d'explorer, par simulation, leur intégration dans les circuits non-volatiles.J'ai d'abord étudié les modèles physiques qui décrivent les comportements électriques des JTF : la résistance tunnel, la dynamique de la commutation ferroélectrique et leur comportement memristif. La précision de ces modèles physiques est validée par leur bonne adéquation avec les résultats expérimentaux. Afin de proposer un modèle compatible avec les simulateurs électriques standards, nous j'ai développé les modèles physiques mentionnés ci-dessus en langue Verilog-A, puis je les ai intégrés ensemble. Le modèle électrique que j'ai conçu peut être exploité sur la plate-forme Cadence (un outil standard pour la simulation de circuit). Il reproduit fidèlement les comportements de JTF. Ensuite, en utilisant ce modèle de JTF et le design-kit CMOS de STMicroelectronics, j'ai conçu et simulé trois types de circuits: i) une mémoire vive (RAM) basée sur les JTF, ii) deux systèmes neuromorphiques basés sur les JTF, l'un qui émule la règle d'apprentissage de la plasticité synaptique basée sur le décalage temporel des impulsions neuronale (STDP), l'autre mettant en œuvre l'apprentissage supervisé de fonctions logiques, iii) un bloc logique booléen basé sur les JTF, y compris la démonstration des fonctions logiques NAND et NOR. L'influence des paramètres de la JTF sur les performances de ces circuits a été analysée par simulation. Finalement, nous avons modélisé la dynamique de renversement de l'aimantation dans les dispositifs à anisotropie perpendiculaire à transfert de spin assisté par effet Hall de spin dans un JTM à trois terminaux. Dans ce schéma, deux courants d'écriture sont appliqués pour générer l'EHS et le TS. La simulation numérique basée sur l'équation de Landau-Lifshitz-Gilbert (LLG) démontre que le délai d'incubation de TS peut être éliminé par un fort EHS, conduisant à la commutation ultra-rapide de l'aimantation, sans pour autant requérir une augmentation excessive du TS. Nous avons appliqué cette nouvelle méthode d'écriture à la conception d'une bascule magnétique et d'un additionneur 1 bit magnétique. Les performances des circuits magnétiques assistés par l'EHS ont été comparés à ceux écrits par transfert de spin, par simulation et par une analyse fondée sur le modèle théorique
    corecore