20 research outputs found

    Energy-Aware Data Movement In Non-Volatile Memory Hierarchies

    While technology scaling enables increased density for memory cells, the intrinsically high leakage power of conventional CMOS technology and the demand for reduced energy consumption inspire the use of emerging technology alternatives such as eDRAM and Non-Volatile Memory (NVM), including STT-MRAM, PCM, and RRAM. The use of emerging technology in the Last Level Cache (LLC), which occupies a significant fraction of total die area in Chip Multi Processors (CMPs), introduces new dimensions of vulnerability, energy consumption, and performance delivery. Specifically, part of this research focuses on the eDRAM Bit Upset Vulnerability Factor (BUVF) to assess the vulnerable portion of the eDRAM refresh cycle, where the critical charge varies depending on the write voltage and the storage and bit-line capacitance. This dissertation broadens the study of LLC vulnerability assessment by investigating the impact of Process Variations (PV) on narrow resistive sensing margins in high-density NVM arrays, including on-chip cache and primary memory. Large-latency and power-hungry Sense Amplifiers (SAs) have been adapted to combat PV in the past. Herein, a novel approach is proposed to leverage the PV in NVM arrays using a Self-Organized Sub-bank (SOS) design. SOS engages the preferred SA alternative based on the intrinsic as-built behavior of the resistive sensing timing margin to reduce latency and power consumption while maintaining acceptable access time. In addition, this dissertation investigates a novel technique to prioritize service to 1) Extensive Read Reused Accessed (ERRA) blocks of the LLC that are silently dropped from higher levels of cache, and 2) the portion of the working set that may exhibit a distant re-reference interval in L2. In particular, we develop a lightweight Multi-level Access History Profiler to efficiently identify ERRA blocks by aggregating the LLC block addresses tagged with identical Most Significant Bits into a single entry. 
Experimental results indicate that the proposed technique can reduce the L2 read miss ratio by 51.7% on average across PARSEC and SPEC2006 workloads. In addition, this dissertation will broaden and apply advancements in theories of subspace recovery to pioneer computationally-aware in-situ operand reconstruction via the novel Logic In Interconnect (LI2) scheme. LI2 will be developed, validated, and refined both theoretically and experimentally to realize a radically different approach to post-Moore's Law computing by leveraging the features of low-rank matrices, offering data reconstruction instead of fetching data from main memory to reduce the energy/latency cost per data movement. We propose an LI2 enhancement to attain high performance delivery in the post-Moore's Law era by equipping the contemporary micro-architecture design with a customized memory controller, which routes memory requests for low-rank matrices to a customized Fine Grain Reconfigurable Accelerator (FGRA) for reconstruction while other memory requests are serviced as before. The goal of LI2 is to overcome the high latency/energy required to traverse main memory arrays on an LLC miss by reconstructing the requested data in situ when it belongs to a low-rank matrix. Thus, LI2 replaces a high volume of data transfers with a novel lightweight reconstruction method under specific conditions using a cross-layer hardware/algorithm approach.
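
    The trade described above, reconstructing data from low-rank factors instead of fetching the full matrix, can be illustrated with a minimal sketch. This is not the dissertation's LI2 hardware or algorithm: the function names `reconstruct` and `words_moved` and the toy rank-1 factors are assumptions chosen for illustration. The sketch only shows that an m x n matrix of rank r can be rebuilt from r(m + n) stored values rather than m·n.

```python
# Illustrative sketch only: names and data are hypothetical, not from the dissertation.

def reconstruct(U, V):
    """Rebuild the m x n matrix A = U * V from an m x r factor U and an r x n factor V."""
    r = len(V)
    n = len(V[0])
    return [[sum(row[k] * V[k][j] for k in range(r)) for j in range(n)] for row in U]


def words_moved(m, n, r):
    """Return (elements moved for the full matrix, elements moved for the two factors)."""
    return m * n, r * (m + n)


# Toy rank-1 example: a 3 x 4 matrix stored as a 3 x 1 and a 1 x 4 factor.
U = [[1], [2], [3]]
V = [[4, 5, 6, 7]]
A = reconstruct(U, V)
full, factored = words_moved(3, 4, 1)
print(A)
print(f"full transfer: {full} elements, factored transfer: {factored} elements")
```

    The saving grows with matrix size as long as the rank stays small, which is consistent with the abstract's restriction of the scheme to "specific conditions".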

    Compact modeling and circuit design based on the ferroelectric tunnel junction and the magnetic tunnel junction using spin-Hall-assisted spin-transfer torque

    Non-volatile memory (NVM) devices have been attracting intensive research interest since they promise to solve the increasing static power issue caused by CMOS technology scaling. This thesis focuses on two fields related to NVM: one is the ferroelectric tunnel junction (FTJ), a recently emerging NVM device; the other is spin-Hall-assisted spin-transfer torque (STT), a recently proposed write approach for the magnetic tunnel junction (MTJ). Our objective is to develop compact models for these two technologies and to explore their application in non-volatile circuits through simulation. First, we investigated physical models describing the electrical behaviors of the FTJ, such as tunneling resistance, dynamic ferroelectric switching and memristive response. The accuracy of these physical models is validated by good agreement with experimental results. In order to develop an electrical model suitable for circuit simulation, we programmed the aforementioned physical models in the Verilog-A language and integrated them together. The developed electrical model can run on the Cadence platform (a standard circuit simulation tool) and faithfully reproduces the behaviors of the FTJ. Then, using the developed FTJ model and the STMicroelectronics CMOS design kit, we designed and simulated three types of circuits: i) an FTJ-based random access memory (FTRAM), ii) two FTJ-based neuromorphic systems, one of which emulates the spike-timing dependent plasticity (STDP) learning rule while the other implements supervised learning of logic functions, and iii) an FTJ-based Boolean logic block, with which NAND and NOR logic are demonstrated. The influence of the FTJ parameters on the performance of these circuits was analyzed based on simulation results. Finally, we focused on the reversal of the perpendicular magnetization driven by spin-Hall-assisted STT in a three-terminal MTJ. In this scheme, two write currents are applied to generate the spin-Hall effect (SHE) and STT. 
Numerical simulation based on the Landau-Lifshitz-Gilbert (LLG) equation demonstrates that the incubation delay of the STT can be eliminated by a strong SHE, resulting in ultrafast magnetization switching without the need to strengthen the STT. We applied this novel write approach to the design of a magnetic flip-flop and a magnetic full-adder. The performance of the spin-Hall-assisted and the conventional STT magnetic circuits was compared based on simulation results and theoretical models.
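
    For reference, the Landau-Lifshitz-Gilbert equation on which the numerical simulation is based is usually written as below. The abstract does not give the thesis's own formulation, so this is the generic textbook form. The two added torque terms, their prefactors $\tau_{\mathrm{STT}}$ and $\tau_{\mathrm{SHE}}$, and the polarization vectors $\mathbf{p}$ and $\boldsymbol{\sigma}$ are assumptions for illustration, and sign conventions vary between authors.

```latex
% Generic LLG equation for the unit magnetization m (textbook form, not taken from the thesis)
\frac{d\mathbf{m}}{dt}
  = -\gamma \mu_0 \, \mathbf{m} \times \mathbf{H}_{\mathrm{eff}}
  + \alpha \, \mathbf{m} \times \frac{d\mathbf{m}}{dt}

% Assumed extension: damping-like torques from STT (polarization p) and SHE (polarization sigma)
\frac{d\mathbf{m}}{dt}
  = -\gamma \mu_0 \, \mathbf{m} \times \mathbf{H}_{\mathrm{eff}}
  + \alpha \, \mathbf{m} \times \frac{d\mathbf{m}}{dt}
  - \tau_{\mathrm{STT}} \, \mathbf{m} \times (\mathbf{m} \times \mathbf{p})
  - \tau_{\mathrm{SHE}} \, \mathbf{m} \times (\mathbf{m} \times \boldsymbol{\sigma})
```

    Here $\gamma$ is the gyromagnetic ratio, $\alpha$ the Gilbert damping constant, and $\mathbf{H}_{\mathrm{eff}}$ the effective field. In a perpendicular MTJ, $\mathbf{p}$ is collinear with the initial magnetization, so the STT term starts near zero (the incubation delay), whereas an in-plane $\boldsymbol{\sigma}$ gives a non-zero torque from the outset.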

    Towards Computational Efficiency of Next Generation Multimedia Systems

    To address the throughput demands of complex applications (such as multimedia), a next-generation system designer needs to co-design and co-optimize the hardware and software layers. Hardware/software knobs must be tuned in synergy to increase throughput efficiency. This thesis provides such algorithmic and architectural solutions while considering new technology challenges (power cap and memory aging). The goal is to maximize throughput efficiency under timing and hardware constraints.

    Low Power Memory/Memristor Devices and Systems

    This reprint focuses on achieving low-power computation using memristive devices. The topic was designed as a convenient reference point: it contains a mix of techniques, from the fundamental manufacturing of memristive devices all the way to applications such as physically unclonable functions, and also covers perspectives on, e.g., in-memory computing, which is inextricably linked with emerging memory devices such as memristors. Finally, the reprint contains a few articles showing how other communities (from typical CMOS design to photonics) are fighting on their own fronts in the quest for low-power computation, as a comparison with the memristor literature. We hope that readers will enjoy discovering the articles within.

    Circuit design for embedded memory in low-power integrated circuits

    Thesis (Ph.D.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2012. Cataloged from PDF version of thesis. Includes bibliographical references (p. 141-152). This thesis explores the challenges of integrating embedded static random access memory (SRAM) and non-volatile memory, based on ferroelectric capacitor technology, into low-power integrated circuits. First considered is the impact of process variation in deep-submicron technologies on SRAM, which must exhibit higher density and performance at increased levels of integration with every new semiconductor generation. Techniques are developed to speed up the statistical analysis of physical memory designs by a factor of 100 to 10,000 relative to the conventional Monte Carlo method. The proposed methods build upon the Importance Sampling simulation algorithm and efficiently explore the sample space of transistor parameter fluctuation. Process variation in SRAM at low voltage is further investigated experimentally with a 512 kb 8T SRAM test chip in 45 nm SOI CMOS technology. For active operation, an AC-coupled sense amplifier and a regenerative global bitline scheme are designed to operate at the limit of on-current and off-current separation on a single-ended SRAM bitline. The SRAM operates from 1.2 V down to 0.57 V with access times from 400 ps to 3.4 ns. For standby power, a data retention voltage sensor predicts the mismatch-limited minimum supply voltage without corrupting the contents of the memory. The leakage power of SRAM forces the chip designer to seek non-volatile memory in applications such as portable electronics that retain significant quantities of data over long durations. In this scenario, the energy cost of accessing data must be minimized. 
This thesis presents a ferroelectric random access memory (FRAM) prototype that uses a time-to-digital sensing scheme to address the challenge of sensing diminishingly small charge under conditions favorable to low access energy. The 1 Mb 1T1C FRAM fabricated in 130 nm CMOS operates from 1.5 V to 1.0 V with corresponding access energy from 19.2 pJ to 9.8 pJ per bit. Finally, the computational state of sequential elements interspersed in CMOS logic also restricts the ability to power gate. To enable simple and fast turn-on, ferroelectric capacitors are integrated into the design of a standard cell register, whose non-volatile operation is made compatible with the digital design flow. A test-case circuit containing ferroelectric registers exhibits non-volatile operation and consumes less than 1.3 pJ per bit of state information and less than 10 clock cycles to save or restore, with no minimum standby power requirement between active periods. By Masood Qazi. Ph.D.
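
    The abstract states that the statistical analysis builds upon Importance Sampling to accelerate rare-failure estimation relative to plain Monte Carlo. The sketch below is a textbook mean-shift importance sampler for a one-dimensional Gaussian tail, not the thesis's algorithm. The failure criterion (a standard normal parameter exceeding a threshold `t`), the function name `tail_prob_importance_sampling`, and the sample count are all assumptions for illustration.

```python
import math
import random


def tail_prob_importance_sampling(t, n_samples, seed=0):
    """Estimate P(X > t) for X ~ N(0, 1) by sampling from the shifted density N(t, 1).

    Each sample that lands in the failure region is weighted by the likelihood
    ratio phi(x) / phi(x - t) = exp(-t * x + t * t / 2).
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_samples):
        x = rng.gauss(t, 1.0)  # draw from the shifted (biased) distribution
        if x > t:              # hypothetical failure criterion
            total += math.exp(-t * x + 0.5 * t * t)
    return total / n_samples


t = 4.0
estimate = tail_prob_importance_sampling(t, n_samples=20_000)
exact = 0.5 * math.erfc(t / math.sqrt(2.0))  # closed-form Gaussian tail for comparison
print(f"importance sampling: {estimate:.3e}, exact: {exact:.3e}")
```

    With plain Monte Carlo, a failure probability of a few parts in 100,000 would produce fewer than one failing sample in 20,000 draws on average. Shifting the sampling distribution toward the failure region makes roughly half the draws informative, which is the general mechanism behind the speed-up.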

    Spin-polarised currents and magnetic domain walls

    Electrical currents flowing in ferromagnetic materials are spin-polarised as a result of the spin-dependent band structure. When the spatial direction of the polarisation changes, as at a wall in a domain structure, the electrons must somehow accommodate the necessary change in direction of their spin angular momentum as they pass through the wall. Reflection, scattering, or a transfer of angular momentum onto the lattice are all possible outcomes, depending on the circumstances. This gives rise to a variety of different physical effects, most importantly a contribution to the electrical resistance caused by the wall, and a motion of the wall driven by the spin-polarised current.

    DAMOV: A New Methodology and Benchmark Suite for Evaluating Data Movement Bottlenecks

    Data movement between the CPU and main memory is a first-order obstacle to improving performance, scalability, and energy efficiency in modern systems. Computer systems employ a range of techniques to reduce overheads tied to data movement, spanning from traditional mechanisms (e.g., deep multi-level cache hierarchies, aggressive hardware prefetchers) to emerging techniques such as Near-Data Processing (NDP), where some computation is moved close to memory. Our goal is to methodically identify potential sources of data movement over a broad set of applications and to comprehensively compare traditional compute-centric data movement mitigation techniques to more memory-centric techniques, thereby developing a rigorous understanding of the best techniques to mitigate each source of data movement. With this goal in mind, we perform the first large-scale characterization of a wide variety of applications, across a wide range of application domains, to identify fundamental program properties that lead to data movement to/from main memory. We develop the first systematic methodology to classify applications based on the sources contributing to data movement bottlenecks. From our large-scale characterization of 77K functions across 345 applications, we select 144 functions to form the first open-source benchmark suite (DAMOV) for main memory data movement studies. We select a diverse range of functions that (1) represent different types of data movement bottlenecks, and (2) come from a wide range of application domains. Using NDP as a case study, we identify new insights about the different data movement bottlenecks and use these insights to determine the most suitable data movement mitigation mechanism for a particular application. We open-source DAMOV and the complete source code for our new characterization methodology at https://github.com/CMU-SAFARI/DAMOV.