13 research outputs found

    Energy-Aware Data Movement In Non-Volatile Memory Hierarchies

    Get PDF
    While technology scaling enables increased density for memory cells, the intrinsic high leakage power of conventional CMOS technology and the demand for reduced energy consumption inspires the use of emerging technology alternatives such as eDRAM and Non-Volatile Memory (NVM) including STT-MRAM, PCM, and RRAM. The utilization of emerging technology in Last Level Cache (LLC) designs which occupies a signifcant fraction of total die area in Chip Multi Processors (CMPs) introduces new dimensions of vulnerability, energy consumption, and performance delivery. To be specific, a part of this research focuses on eDRAM Bit Upset Vulnerability Factor (BUVF) to assess vulnerable portion of the eDRAM refresh cycle where the critical charge varies depending on the write voltage, storage and bit-line capacitance. This dissertation broaden the study on vulnerability assessment of LLC through investigating the impact of Process Variations (PV) on narrow resistive sensing margins in high-density NVM arrays, including on-chip cache and primary memory. Large-latency and power-hungry Sense Amplifers (SAs) have been adapted to combat PV in the past. Herein, a novel approach is proposed to leverage the PV in NVM arrays using Self-Organized Sub-bank (SOS) design. SOS engages the preferred SA alternative based on the intrinsic as-built behavior of the resistive sensing timing margin to reduce the latency and power consumption while maintaining acceptable access time. On the other hand, this dissertation investigates a novel technique to prioritize the service to 1) Extensive Read Reused Accessed blocks of the LLC that are silently dropped from higher levels of cache, and 2) the portion of the working set that may exhibit distant re-reference interval in L2. In particular, we develop a lightweight Multi-level Access History Profiler to effciently identify ERRA blocks through aggregating the LLC block addresses tagged with identical Most Signifcant Bits into a single entry. Experimental results indicate that the proposed technique can reduce the L2 read miss ratio by 51.7% on average across PARSEC and SPEC2006 workloads. In addition, this dissertation will broaden and apply advancements in theories of subspace recovery to pioneer computationally-aware in-situ operand reconstruction via the novel Logic In Interconnect (LI2) scheme. LI2 will be developed, validated, and re?ned both theoretically and experimentally to realize a radically different approach to post-Moore\u27s Law computing by leveraging low-rank matrices features offering data reconstruction instead of fetching data from main memory to reduce energy/latency cost per data movement. We propose LI2 enhancement to attain high performance delivery in the post-Moore\u27s Law era through equipping the contemporary micro-architecture design with a customized memory controller which orchestrates the memory request for fetching low-rank matrices to customized Fine Grain Reconfigurable Accelerator (FGRA) for reconstruction while the other memory requests are serviced as before. The goal of LI2 is to conquer the high latency/energy required to traverse main memory arrays in the case of LLC miss, by using in-situ construction of the requested data dealing with low-rank matrices. Thus, LI2 exchanges a high volume of data transfers with a novel lightweight reconstruction method under specific conditions using a cross-layer hardware/algorithm approach

    Circuit and Architecture Co-Design of STT-RAM for High Performance and Low Energy

    Get PDF
    Spin-Transfer Torque Random Access Memory (STT-RAM) has been proved a promising emerging nonvolatile memory technology suitable for many applications such as cache mem- ory of CPU. Compared with other conventional memory technology, STT-RAM offers many attractive features such as nonvolatility, fast random access speed and extreme low leakage power. However, STT-RAM is still facing many challenges. First of all, programming STT-RAM is a stochastic process due to random thermal fluctuations, so the write errors are hard to avoid. Secondly, the existing STT-RAM cell designs can be used for only single-port accesses, which limits the memory access bandwidth and constraints the system performance. Finally, while other memory technology supports multi-level cell (MLC) design to boost the storage density, adopting MLC to STT-RAM brings many disadvantages such as requirement for large transistor and low access speed. In this work, we proposed solutions on both circuit and architecture level to address these challenges. For the write error issues, we proposed two probabilistic methods, namely write-verify- rewrite with adaptive period (WRAP) and verify-one-while-writing (VOW), for performance improvement and write failure reduction. For dual-port solution, we propose the design methods to support dual-port accesses for STT-RAM. The area increment by introducing an additional port is reduced by leveraging the shared source-line structure. Detailed analysis on the performance/reliability degrada- tion caused by dual-port accesses is performed, and the corresponding design optimization is provided. To unleash the potential of MLC STT-RAM cache, we proposed a new design through a cross-layer co-optimization. The memory cell structure integrated the reversed stacking of magnetic junction tunneling (MTJ) for a more balanced device and design trade-off. In architecture development, we presented an adaptive mode switching mechanism: based on application’s memory access behavior, the MLC STT-RAM cache can dynamically change between low latency SLC mode and high capacity MLC mode. Finally, we present a 4Kb test chip design which can support different types and sizes of MTJs. A configurable sensing solution is used in the test chip so that it can support wide range of MTJ resistance. Such test chip design can help to evaluate various type of MTJs in the future

    Energy-Efficient System Architectures for Intermittently-Powered IoT Devices

    Get PDF
    Various industry forecasts project that, by 2020, there will be around 50 billion devices connected to the Internet of Things (IoT), helping to engineer new solutions to societal-scale problems such as healthcare, energy conservation, transportation, etc. Most of these devices will be wireless due to the expense, inconvenience, or in some cases, the sheer infeasibility of wiring them. With no cord for power and limited space for a battery, powering these devices for operating in a set-and-forget mode (i.e., achieve several months to possibly years of unattended operation) becomes a daunting challenge. Environmental energy harvesting (where the system powers itself using energy that it scavenges from its operating environment) has been shown to be a promising and viable option for powering these IoT devices. However, ambient energy sources (such as vibration, wind, RF signals) are often minuscule, unreliable, and intermittent in nature, which can lead to frequent intervals of power loss. Performing computations reliably in the face of such power supply interruptions is challenging

    High-Performance Energy-Efficient and Reliable Design of Spin-Transfer Torque Magnetic Memory

    Get PDF
    In this dissertation new computing paradigms, architectures and design philosophy are proposed and evaluated for adopting the STT-MRAM technology as highly reliable, energy efficient and fast memory. For this purpose, a novel cross-layer framework from the cell-level all the way up to the system- and application-level has been developed. In these framework, the reliability issues are modeled accurately with appropriate fault models at different abstraction levels in order to analyze the overall failure rates of the entire memory and its Mean Time To Failure (MTTF) along with considering the temperature and process variation effects. Design-time, compile-time and run-time solutions have been provided to address the challenges associated with STT-MRAM. The effectiveness of the proposed solutions is demonstrated in extensive experiments that show significant improvements in comparison to state-of-the-art solutions, i.e. lower-power, higher-performance and more reliable STT-MRAM design

    Heterogeneous Reconfigurable Fabrics for In-circuit Training and Evaluation of Neuromorphic Architectures

    Get PDF
    A heterogeneous device technology reconfigurable logic fabric is proposed which leverages the cooperating advantages of distinct magnetic random access memory (MRAM)-based look-up tables (LUTs) to realize sequential logic circuits, along with conventional SRAM-based LUTs to realize combinational logic paths. The resulting Hybrid Spin/Charge FPGA (HSC-FPGA) using magnetic tunnel junction (MTJ) devices within this topology demonstrates commensurate reductions in area and power consumption over fabrics having LUTs constructed with either individual technology alone. Herein, a hierarchical top-down design approach is used to develop the HSCFPGA starting from the configurable logic block (CLB) and slice structures down to LUT circuits and the corresponding device fabrication paradigms. This facilitates a novel architectural approach to reduce leakage energy, minimize communication occurrence and energy cost by eliminating unnecessary data transfer, and support auto-tuning for resilience. Furthermore, HSC-FPGA enables new advantages of technology co-design which trades off alternative mappings between emerging devices and transistors at runtime by allowing dynamic remapping to adaptively leverage the intrinsic computing features of each device technology. HSC-FPGA offers a platform for fine-grained Logic-In-Memory architectures and runtime adaptive hardware. An orthogonal dimension of fabric heterogeneity is also non-determinism enabled by either low-voltage CMOS or probabilistic emerging devices. It can be realized using probabilistic devices within a reconfigurable network to blend deterministic and probabilistic computational models. Herein, consider the probabilistic spin logic p-bit device as a fabric element comprising a crossbar-structured weighted array. The Programmability of the resistive network interconnecting p-bit devices can be achieved by modifying the resistive states of the array\u27s weighted connections. Thus, the programmable weighted array forms a CLB-scale macro co-processing element with bitstream programmability. This allows field programmability for a wide range of classification problems and recognition tasks to allow fluid mappings of probabilistic and deterministic computing approaches. In particular, a Deep Belief Network (DBN) is implemented in the field using recurrent layers of co-processing elements to form an n x m1 x m2 x ::: x mi weighted array as a configurable hardware circuit with an n-input layer followed by i ≥ 1 hidden layers. As neuromorphic architectures using post-CMOS devices increase in capability and network size, the utility and benefits of reconfigurable fabrics of neuromorphic modules can be anticipated to continue to accelerate

    High-Density Solid-State Memory Devices and Technologies

    Get PDF
    This Special Issue aims to examine high-density solid-state memory devices and technologies from various standpoints in an attempt to foster their continuous success in the future. Considering that broadening of the range of applications will likely offer different types of solid-state memories their chance in the spotlight, the Special Issue is not focused on a specific storage solution but rather embraces all the most relevant solid-state memory devices and technologies currently on stage. Even the subjects dealt with in this Special Issue are widespread, ranging from process and design issues/innovations to the experimental and theoretical analysis of the operation and from the performance and reliability of memory devices and arrays to the exploitation of solid-state memories to pursue new computing paradigms

    Modélisation compacte et conception de circuit à base de jonction tunnel ferroélectrique et de jonction tunnel magnétique exploitant le transfert de spin assisté par effet Hall de spin

    Get PDF
    Non-volatile memory (NVM) devices have been attracting intensive research interest since they promise to solve the increasing static power issue caused by CMOS technology scaling. This thesis focuses on two fields related to NVM: the one is the ferroelectric tunnel junction (FTJ), which is a recent emerging NVM device. The other is the spin-Hall-assisted spin-transfer torque (STT), which is a recent proposed write approach for the magnetic tunnel junction (MTJ). Our objective is to develop the compact models for these two technologies and to explore their application in the non-volatile circuits through simulation.First, we investigated physical models describing the electrical behaviors of the FTJ such as tunneling resistance, dynamic ferroelectric switching and memristive response. The accuracy of these physical models is validated by a good agreement with experimental results. In order to develop an electrical model available for the circuit simulation, we programmed the aforementioned physical models with Verilog-A language and integrated them together. The developed electrical model can run on Cadence platform (a standard circuit simulation tool) and faithfully reproduce the behaviors of the FTJ.Then, using the developed FTJ model and STMicroelectronics CMOS design kit, we designed and simulated three types of circuits: i) FTJ-based random access memory (FTRAM), ii) two FTJ-based neuromorphic systems, one of which emulates spike-timing dependent plasticity (STDP) learning rule, the other implements supervised learning of logic functions, iii) FTJ-based Boolean logic block, by which NAND and NOR logic are demonstrated. The influences of the FTJ parameters on the performance of these circuits were analyzed based on simulation results.Finally, we focused on the reversal of the perpendicular magnetization driven by spin-Hall-assisted STT in a three-terminal MTJ. In this scheme, two write currents are applied to generate spin-Hall effect (SHE) and STT. Numerical simulation based on Landau-Lifshitz-Gilbert (LLG) equation demonstrates that the incubation delay of the STT can be eliminated by the strong SHE, resulting in ultrafast magnetization switching without the need to strengthen the STT. We applied this novel write approach to the design of the magnetic flip-flop and full-adder. Performance comparison between the spin-Hall-assisted and the conventional STT magnetic circuits were discussed based on simulation results and theoretical models.Les mémoires non-volatiles (MNV) sont l'objet d'un effort de recherche croissant du fait de leur capacité à limiter la consommation statique, qui obère habituellement la réduction des dimensions dans la technologie CMOS. Dans ce contexte, cette thèse aborde plus spécifiquement deux technologies de mémoires non volatiles : d'une part les jonctions tunnel ferroélectriques (JTF), dispositif non volatil émergent, et d'autre part les dispositifs à transfert de spin (TS) assisté par effet Hall de spin (EHS), approche alternative proposée récemment pour écrire les jonctions tunnel magnétiques (JTM). Mon objectif est de développer des modèles compacts pour ces deux technologies et d'explorer, par simulation, leur intégration dans les circuits non-volatiles.J'ai d'abord étudié les modèles physiques qui décrivent les comportements électriques des JTF : la résistance tunnel, la dynamique de la commutation ferroélectrique et leur comportement memristif. La précision de ces modèles physiques est validée par leur bonne adéquation avec les résultats expérimentaux. Afin de proposer un modèle compatible avec les simulateurs électriques standards, nous j'ai développé les modèles physiques mentionnés ci-dessus en langue Verilog-A, puis je les ai intégrés ensemble. Le modèle électrique que j'ai conçu peut être exploité sur la plate-forme Cadence (un outil standard pour la simulation de circuit). Il reproduit fidèlement les comportements de JTF. Ensuite, en utilisant ce modèle de JTF et le design-kit CMOS de STMicroelectronics, j'ai conçu et simulé trois types de circuits: i) une mémoire vive (RAM) basée sur les JTF, ii) deux systèmes neuromorphiques basés sur les JTF, l'un qui émule la règle d'apprentissage de la plasticité synaptique basée sur le décalage temporel des impulsions neuronale (STDP), l'autre mettant en œuvre l'apprentissage supervisé de fonctions logiques, iii) un bloc logique booléen basé sur les JTF, y compris la démonstration des fonctions logiques NAND et NOR. L'influence des paramètres de la JTF sur les performances de ces circuits a été analysée par simulation. Finalement, nous avons modélisé la dynamique de renversement de l'aimantation dans les dispositifs à anisotropie perpendiculaire à transfert de spin assisté par effet Hall de spin dans un JTM à trois terminaux. Dans ce schéma, deux courants d'écriture sont appliqués pour générer l'EHS et le TS. La simulation numérique basée sur l'équation de Landau-Lifshitz-Gilbert (LLG) démontre que le délai d'incubation de TS peut être éliminé par un fort EHS, conduisant à la commutation ultra-rapide de l'aimantation, sans pour autant requérir une augmentation excessive du TS. Nous avons appliqué cette nouvelle méthode d'écriture à la conception d'une bascule magnétique et d'un additionneur 1 bit magnétique. Les performances des circuits magnétiques assistés par l'EHS ont été comparés à ceux écrits par transfert de spin, par simulation et par une analyse fondée sur le modèle théorique

    The Fifth NASA Symposium on VLSI Design

    Get PDF
    The fifth annual NASA Symposium on VLSI Design had 13 sessions including Radiation Effects, Architectures, Mixed Signal, Design Techniques, Fault Testing, Synthesis, Signal Processing, and other Featured Presentations. The symposium provides insights into developments in VLSI and digital systems which can be used to increase data systems performance. The presentations share insights into next generation advances that will serve as a basis for future VLSI design

    Dependable design for low-cost ultra-low-power processors

    Get PDF
    Emerging applications in the Internet of Things (IoT) domain, such as wearables, implantables, smart tags, and wireless sensor networks put severe power, cost, reliability, and security constraints on hardware system design. This dissertation focuses on the architecture and design of dependable ultra-low power computing systems. Specifically, it proposes architecture and design techniques that exploit the unique application and usage characteristics of future computing systems to deliver low power, while meeting the reliability and security constraints of these systems. First, this dissertation considers the challenge of achieving both low power and high reliability in SRAM memories. It proposes both an architectural technique to reduce the overheads of error correction and a technique that uses the nature of error correcting codes to allow lower voltage operation without sacrificing reliability. Next, this dissertation considers low power and low cost. By leveraging the fact that many IoT systems are embedded in nature and will run the same application for their entire lifetime, fine-grained usage characteristics of the hardware-software system can be determined at design time. This dissertation presents a novel hardware-software co-analysis based on symbolic simulation that can determine the possible states of the processor throughout any execution of a specific application. This enables power-gating where more gates are turned off for longer, bespoke processors customized to specific applications, and stricter determination of peak power bounds. Finally, this dissertation considers achieving secure IoT systems at low cost and power overhead. By leveraging the hardware-software co-analysis, this dissertation shows that gate-level information flow security guarantees can be provided without hardware overheads

    Laboratory Directed Research and Development FY 1998 Progress Report

    Full text link
    corecore