7 research outputs found

    Normally-Off Computing Design Methodology Using Spintronics: From Devices to Architectures

    Energy-harvesting-powered computing offers vast opportunities to transform the landscape of Internet of Things (IoT) devices and wireless sensor networks by utilizing ambient light, thermal, kinetic, and electromagnetic energy to achieve battery-free computing. To operate within the restricted energy capacity and intermittency profile of battery-free operation, this project proposes Elastic Intermittent Computation (EIC), a new duty-cycle-variable computing approach leveraging the non-volatility inherent in post-CMOS switching devices. The foundations of EIC will be advanced from the ground up by extending Spin Hall Effect Magnetic Tunnel Junction (SHE-MTJ) device models to realize SHE-MTJ-based Majority Gate (MG) and Polymorphic Gate (PG) logic approaches and libraries that leverage intrinsic non-volatility to realize middleware-coherent intermittent computation without checkpointing, micro-tasking, or the associated software bloat and energy overheads, which is vital for IoT. Device-level EIC research concentrates on encapsulating SHE-MTJ behavior in a compact model that exploits the device's non-volatility to provide intermittent computation intrinsically and to reduce lifetime energy. Based on this model, the circuit-level EIC contributions will entail the design, simulation, and analysis of PG-based spintronic logic that is adaptable at the gate level to support variable-duty-cycle execution robust to brief and extended supply outages or unscheduled dropouts, along with the development of spin-based synthesis and optimization routines compatible with existing commercial toolchains. These tools will be employed to design a hybrid post-CMOS processing unit that uses pipelining and power-gating through state-holding properties within the datapath itself, thus eliminating checkpointing and data-transfer operations.
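    A minimal behavioral sketch (in Python, not the project's compact model) of the property EIC relies on: a non-volatile majority gate whose last written state is retained across a supply outage, so no checkpointing is needed. The class and names below are illustrative assumptions, not part of the project.

```python
# Behavioral sketch only: a 3-input majority gate with a retained
# (non-volatile) output state, illustrating intermittent computation
# across a simulated power loss. Not a device-accurate SHE-MTJ model.

def majority(a: int, b: int, c: int) -> int:
    """3-input majority: output is 1 when at least two inputs are 1."""
    return 1 if (a + b + c) >= 2 else 0

class NonVolatileMajorityGate:
    def __init__(self):
        self.state = 0  # retained magnetization state (survives power loss)

    def evaluate(self, a, b, c, powered=True):
        if powered:
            self.state = majority(a, b, c)  # write only while supply is up
        return self.state  # during an outage, the last written state remains

gate = NonVolatileMajorityGate()
gate.evaluate(1, 0, 1)                        # state -> 1 while powered
print(gate.evaluate(0, 0, 0, powered=False))  # supply outage: prints retained 1
```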

    Computing with Spintronics: Circuits and architectures

    This thesis makes the following contributions towards the design of computing platforms with spintronic devices. 1) It explores the use of spintronic memories in the design of a domain-specific processor for an emerging class of data-intensive applications, namely recognition, mining and synthesis (RMS). Two different spintronic memory technologies — Domain Wall Memory (DWM) and STT-MRAM — are utilized to realize the different levels in the memory hierarchy of the domain-specific processor, based on their respective access characteristics. Architectural tradeoffs created by the use of spintronic memories are analyzed. The proposed design achieves 1.5X-4X improvements in energy-delay product compared to a CMOS baseline. 2) It describes the first attempt to use DWM in the cache hierarchy of general-purpose processors. DWM promises unparalleled density by packing several bits of data into each bit-cell. TapeCache, the proposed DWM-based cache architecture, utilizes suitable circuit and architectural optimizations to address two key challenges: (i) the high energy and latency requirements of write operations and (ii) the need for shift operations to access the data stored in each DWM bit-cell. At the circuit level, DWM bit-cells that are tailored to the distinct design requirements of different levels in the cache hierarchy are proposed. At the architecture level, TapeCache proposes suitable cache organization and management policies to alleviate the performance impact of shift operations required to access data stored in DWM bit-cells. TapeCache achieves more than 7X improvements in both cache area and energy with virtually identical performance compared to an SRAM-based cache hierarchy. 3) It investigates the design of the on-chip memory hierarchy of general-purpose graphics processing units (GPGPUs)—massively parallel processors that are optimized for data-intensive high-throughput workloads—using DWM. STAG, a high-density, energy-efficient Spintronic Tape Architecture for GPGPU cache hierarchies, is described. STAG utilizes different DWM bit-cells to realize different memory arrays in the GPGPU cache hierarchy. To address the challenge of high access latencies due to shifts, STAG predicts upcoming cache accesses by leveraging unique characteristics of GPGPU architectures and workloads, and prefetches data that are both likely to be accessed and require large numbers of shift operations. STAG achieves 3.3X energy reduction and 12.1% performance improvement over CMOS SRAM under iso-area conditions. 4) While the potential of spintronic devices for memories is widely recognized, their utility in realizing logic is much less clear. The thesis presents Spintastic, a new paradigm that utilizes Stochastic Computing (SC) to realize spintronic logic. In SC, data is encoded in the form of pseudo-random bitstreams, such that the probability of a '1' in a bitstream corresponds to the numerical value that it represents. SC can enable compact, low-complexity logic implementations of various arithmetic functions. Spintastic establishes the synergy between stochastic computing and spin-based logic by demonstrating that they mutually alleviate each other's limitations. On the one hand, various building blocks of SC, which incur significant overheads in CMOS implementations, can be efficiently realized by exploiting the physical characteristics of spin devices. On the other hand, the reduced logic complexity and low logic depth of SC circuits alleviate the shortcomings of spintronic logic.
Based on this insight, the designs of spin-based stochastic arithmetic circuits, bitstream generators, bitstream permuters, and stochastic-to-binary converter circuits are presented. Spintastic achieves 7.1X energy reduction over CMOS implementations for a wide range of benchmarks from the image processing, signal processing, and RMS application domains. 5) In order to evaluate the proposed spintronic designs, the thesis describes various device-to-architecture modeling frameworks. Starting with device models that are calibrated to measurements, the characteristics of spintronic devices are successively abstracted into circuit-level and architectural models, which are incorporated into suitable simulation frameworks. (Abstract shortened by UMI.)
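    A brief Python illustration of the stochastic-computing encoding described above: a value in [0, 1] is represented by the fraction of 1s in a pseudo-random bitstream, and multiplication of independent streams reduces to a bitwise AND. This is a generic SC example, not the thesis's spin-based circuits; all function names are assumptions.

```python
# Generic stochastic-computing sketch: encode, multiply via AND, decode.
import random

def to_bitstream(p: float, n: int = 4096) -> list:
    """Encode p in [0, 1] as an n-bit stochastic bitstream (P(bit=1) = p)."""
    return [1 if random.random() < p else 0 for _ in range(n)]

def from_bitstream(bits: list) -> float:
    """Decode: the represented value is the fraction of 1s in the stream."""
    return sum(bits) / len(bits)

def sc_multiply(x: list, y: list) -> list:
    """Multiplying two independent streams needs only one AND gate per bit."""
    return [a & b for a, b in zip(x, y)]

x, y = to_bitstream(0.5), to_bitstream(0.6)
print(from_bitstream(sc_multiply(x, y)))  # ~0.30, within stochastic error
```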

    Compact Modeling and Circuit Design Based on Ferroelectric Tunnel Junctions and on Magnetic Tunnel Junctions Exploiting Spin-Hall-Assisted Spin-Transfer Torque

    Non-volatile memory (NVM) devices have been attracting intensive research interest since they promise to solve the increasing static power issue caused by CMOS technology scaling. This thesis focuses on two fields related to NVM: one is the ferroelectric tunnel junction (FTJ), a recently emerging NVM device; the other is spin-Hall-assisted spin-transfer torque (STT), a recently proposed write approach for the magnetic tunnel junction (MTJ). Our objective is to develop compact models for these two technologies and to explore their application in non-volatile circuits through simulation. First, we investigated physical models describing the electrical behaviors of the FTJ, such as tunneling resistance, dynamic ferroelectric switching, and memristive response. The accuracy of these physical models is validated by good agreement with experimental results. To develop an electrical model suitable for circuit simulation, we implemented the aforementioned physical models in the Verilog-A language and integrated them together. The developed electrical model can run on the Cadence platform (a standard circuit simulation tool) and faithfully reproduces the behaviors of the FTJ. Then, using the developed FTJ model and the STMicroelectronics CMOS design kit, we designed and simulated three types of circuits: i) an FTJ-based random access memory (FTRAM), ii) two FTJ-based neuromorphic systems, one of which emulates the spike-timing-dependent plasticity (STDP) learning rule while the other implements supervised learning of logic functions, and iii) an FTJ-based Boolean logic block, with which NAND and NOR logic are demonstrated. The influences of the FTJ parameters on the performance of these circuits were analyzed based on simulation results. Finally, we focused on the reversal of the perpendicular magnetization driven by spin-Hall-assisted STT in a three-terminal MTJ. In this scheme, two write currents are applied to generate the spin-Hall effect (SHE) and STT. Numerical simulation based on the Landau-Lifshitz-Gilbert (LLG) equation demonstrates that the incubation delay of the STT can be eliminated by a strong SHE, resulting in ultrafast magnetization switching without the need to strengthen the STT. We applied this novel write approach to the design of a magnetic flip-flop and a full-adder. Performance comparisons between the spin-Hall-assisted and conventional STT magnetic circuits are discussed based on simulation results and theoretical models.
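    A minimal macrospin sketch of the kind of Landau-Lifshitz-Gilbert (LLG) integration underlying the switching study above. It includes only a static effective field and omits the SHE and STT torque terms of the actual work; all parameter values are illustrative assumptions.

```python
# Macrospin LLG sketch: explicit-Euler integration of a single magnetization
# vector relaxing toward a static effective field. Illustrative only.
import numpy as np

GAMMA = 1.76e11   # gyromagnetic ratio (rad/(s*T))
ALPHA = 0.01      # Gilbert damping constant (dimensionless)

def llg_step(m, h_eff, dt):
    """One Euler step of dm/dt = -gamma/(1+alpha^2) [m x H + alpha m x (m x H)]."""
    prefactor = -GAMMA / (1.0 + ALPHA**2)
    mxh = np.cross(m, h_eff)
    dmdt = prefactor * (mxh + ALPHA * np.cross(m, mxh))
    m_next = m + dmdt * dt
    return m_next / np.linalg.norm(m_next)  # keep |m| = 1

m = np.array([0.01, 0.0, 1.0])
m = m / np.linalg.norm(m)                 # initial state slightly tilted from +z
h_eff = np.array([0.0, 0.0, -0.05])       # effective field (T) opposing +z
for _ in range(200000):                   # 20 ns with dt = 0.1 ps
    m = llg_step(m, h_eff, dt=1e-13)
print(m)  # magnetization precesses and damps toward the -z direction
```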

    Harnessing noise to enhance robustness vs. efficiency trade-off in machine learning

    While deep nets have achieved human-comparable accuracy in various classification tasks, they fall short significantly in terms of robustness and cost metrics. For example, tiny engineered corruptions in deep net inputs can reduce their accuracy to zero. Furthermore, deep nets also require millions of trainable parameters, resulting in significant training and inference costs. These robustness and cost challenges are well recognized today. In response, there have been a plethora of works focusing on improving either the accuracy vs. robustness trade-off or the accuracy vs. cost trade-off. However, simultaneous consideration of accuracy, robustness, and cost metrics is largely absent today, in part because far fewer works have explored the robustness vs. cost trade-off. This dissertation aims to fill this gap by focusing explicitly on the robustness vs. cost trade-off in the presence of data noise as well as hardware noise. Specifically, we explore how to harness the noise in order to enhance this trade-off. We characterize and improve robustness vs. cost trade-offs across diverse problem settings, ranging from beyond-CMOS hardware implementations of machine learning (ML) classifiers to efficient training of deep nets that are robust to multiple types of corruptions in their inputs. This dissertation can be roughly divided into two parts, one focusing on hardware noise and the other on data noise. In the first part, we start by focusing on harnessing noise in spintronic hardware implementations, where the logic gates become error-prone when operated at lower switching energy/delay. We propose techniques to shape the resulting hardware noise distribution and to efficiently compensate it at the system-level output. As a result, we observe 1000x improvement in tolerance to gate-level switching error rates, while keeping the area/energy overhead of compensation circuits to as low as 15%. These robustness enhancements further enable 3x reduction in iso-throughput energy consumption of a binary ML classifier employed for EEG-based seizure detection. Building on this work, we propose spintronic channel networks, which exploit the exponential decay of spin current to efficiently realize multi-bit dot-product computation. We employ error-prone nanomagnets as efficient stochastic slicers biased by spin currents proportional to the likelihood of the classification decision. We achieve 112x-to-22.5x and 14x-to-2.5x higher energy efficiency over conventional spin-based and 20 nm CMOS designs, respectively, when realizing 10-to-100-dimensional binary classifiers. Furthermore, we also consider the impact of hardware noise originating from process variations and readout circuits in in-memory computing implementations employing non-volatile resistive crossbar arrays. Based on our analysis, we identify design configurations achieving the highest signal-to-noise ratio (SNR), and further estimate how such robustness trades off with the array energy consumption. In the second part, we switch gears to improve the robustness vs. cost trade-off for deep nets in the presence of data noise. Specifically, we focus on the impact of adversarial perturbations in deep net inputs. We propose and validate hypotheses about the orientations of dominant subspaces of adversarial perturbations. We demonstrate how changes in the curvature of the decision boundary of deep nets affect the orientations of adversarial perturbations.
Based on these insights, we demonstrate how shaped noise can be introduced as a feature to enhance the robustness vs. cost trade-off in deep nets. Specifically, we propose shaped noise augmented processing (SNAP), a method to efficiently train deep nets that are robust to multiple types of adversarial perturbations simultaneously. SNAP prepends a deep net with a shaped noise augmentation layer whose distribution is learned along with the network parameters using any established robust training framework. Based on extensive comparisons with nine state-of-the-art (SOTA) robust training frameworks, we show that SNAP achieves the best robustness vs. training cost trade-off. In particular, it enables 4x reduction in training cost compared to the SOTA approach published in the past year. Furthermore, thanks to the computational simplicity of SNAP, it is the first technique of its kind that is scalable to large datasets such as ImageNet.
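    A minimal PyTorch sketch of the general idea behind SNAP as described above: a noise-augmentation layer with a learnable scale is prepended to a network and trained jointly with it. This is an illustrative reconstruction under stated assumptions, not the dissertation's implementation; the actual shaped-noise distribution and robust-training procedure differ.

```python
# Illustrative noise-augmentation front layer with a learnable scale,
# trained jointly with the downstream network (names are assumptions).
import torch
import torch.nn as nn

class NoiseAugmentationLayer(nn.Module):
    def __init__(self, shape):
        super().__init__()
        # learnable per-element noise scale (log-parameterized to stay positive)
        self.log_scale = nn.Parameter(torch.zeros(shape))

    def forward(self, x):
        if self.training:
            noise = torch.randn_like(x) * self.log_scale.exp()
            return x + noise  # shaped noise is injected only during training
        return x

model = nn.Sequential(
    NoiseAugmentationLayer(shape=(3, 32, 32)),  # e.g. CIFAR-sized inputs
    nn.Flatten(),
    nn.Linear(3 * 32 * 32, 10),
)
x = torch.randn(8, 3, 32, 32)
print(model(x).shape)  # torch.Size([8, 10])
```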

    Architecting Emerging Memory Technologies for Energy-Efficient Computing in Modern Processors


    Particle Physics Reference Library

    This third open access volume of the handbook series deals with accelerator physics, design, technology and operations, as well as with beam optics, dynamics and diagnostics. A joint CERN-Springer initiative, the “Particle Physics Reference Library” provides revised and updated contributions based on previously published material in the well-known Landolt-Boernstein series on particle physics, accelerators and detectors (volumes 21A, B1, B2, C), which took stock of the field approximately one decade ago. Central to this new initiative is publication under full open access.