146 research outputs found

    Automated Synthesis of Memristor Crossbar Networks

    The advancement of semiconductor device technology over the past decades has enabled the design of increasingly complex electrical and computational machines. Electronic design automation (EDA) has played a significant role in the design and implementation of transistor-based machines. However, as transistors move closer toward their physical limits, the speed-up provided by Moore's law will grind to a halt. Once again, we find ourselves on the verge of a paradigm shift in the computational sciences as newer devices pave the way for novel approaches to computing. One such device is the memristor -- a resistor with non-volatile memory. Memristors can be used as junctional switches in crossbar circuits, which consist of intersecting sets of vertical and horizontal nanowires. The major contribution of this dissertation lies in automating the design of such crossbar circuits -- doing a new kind of EDA for a new kind of computational machinery. In general, this dissertation attempts to answer the following questions: a. How can we synthesize crossbars for computing large Boolean formulas, up to 128-bit? b. How can we synthesize more compact crossbars for small Boolean formulas, up to 8-bit? c. For a given loop-free C program doing integer arithmetic, is it possible to synthesize an equivalent crossbar circuit? We have presented novel solutions to each of the above problems. Our proposed solutions resolve a number of significant bottlenecks in existing research through the use of innovative logic representation and artificial intelligence techniques. For large Boolean formulas (up to 128-bit), we have utilized Reduced Ordered Binary Decision Diagrams (ROBDDs) to automatically synthesize linearly growing crossbar circuits that compute them. This cutting-edge approach to flow-based computing has yielded state-of-the-art results. It is worth noting that this approach is scalable to n-bit Boolean formulas.
We have made significant original contributions by leveraging artificial intelligence for automatic synthesis of compact crossbar circuits. This inventive method has been expanded to encompass crossbar networks with 1D1M (1-diode-1-memristor) switches as well. The resultant circuits satisfy the tight constraints of the Feynman Grand Prize challenge and are able to perform 8-bit binary addition. A leading-edge development for end-to-end computation with flow-based crossbars has been implemented, which involves methodical translation of loop-free C programs into crossbar circuits via automated synthesis. The original contributions described in this dissertation reflect the substantial progress we have made in the area of electronic design automation for synthesis of memristor crossbar networks.

    Approximate In-memory computing on RERAMs

    Computing systems have seen tremendous growth over the past few decades in their capabilities, efficiency, and deployment use cases. This growth has been driven by progress in lithography techniques and by improvements in synthesis tools, architectures, and power management. However, there is a growing disparity between computing power and the demands on modern computing systems. The standard von Neumann architecture has separate data storage and data processing locations. Therefore, it suffers from a memory-processor communication bottleneck, which is commonly referred to as the 'memory wall'. The relatively slower progress in memory technology compared with processing units has continued to exacerbate the memory wall problem. As feature sizes in the CMOS logic family reduce further, quantum tunneling effects are becoming more prominent. Simultaneously, chip transistor density is already so high that all transistors cannot be powered up at the same time without violating temperature constraints, a phenomenon characterized as dark silicon. Coupled with this, there is also an increase in leakage currents with smaller feature sizes, resulting in a breakdown of Dennard scaling. All these challenges cannot be met without fundamental changes in current computing paradigms. One viable solution is in-memory computing, where computing and storage are performed alongside each other. A number of emerging memory fabrics such as ReRAMs, STT-RAMs, and PCM RAMs are capable of performing logic in-memory. ReRAMs possess high storage density, have extremely low power consumption, and have a low cost of fabrication. These advantages are due to the simple nature of their basic constituent elements, which allow nano-scale fabrication. We use flow-based computing on ReRAM crossbars, which exploits the natural sneak paths in those crossbars.
Another concurrent development in computing is the maturation of domains that are error resilient while being highly data and power intensive. These include machine learning, pattern recognition, computer vision, image processing, and networking. This shift in the nature of computing workloads has given weight to the idea of approximate computing, in which device efficiency is improved by sacrificing tolerable amounts of accuracy in computation. We present a mathematically rigorous foundation for the synthesis of approximate logic and its mapping to ReRAM crossbars using search-based and graphical methods.

    Automated Synthesis of Unconventional Computing Systems

    Despite decades of advancements, modern computing systems, which are based on the von Neumann architecture, still carry its shortcomings. Moore's law, which had substantially masked the effects of the inherent memory-processor bottleneck of the von Neumann architecture, has slowed down due to transistor dimensions nearing atomic sizes. On the other hand, modern computational requirements, driven by machine learning, pattern recognition, artificial intelligence, data mining, and IoT, are growing at the fastest pace ever. By their inherent nature, these applications are particularly affected by communication bottlenecks, because processing them requires a large number of simple operations involving data retrieval and storage. The need to address the problems associated with conventional computing systems at the fundamental level has given rise to several unconventional computing paradigms. In this dissertation, we have made advancements in the automated synthesis of two types of unconventional computing paradigms: in-memory computing and stochastic computing. In-memory computing circumvents the problem of limited communication bandwidth by unifying processing and storage at the same physical locations. The advent of nanoelectronic devices in the last decade has made in-memory computing an energy-, area-, and cost-effective alternative to conventional computing. We have used Binary Decision Diagrams (BDDs) for in-memory computing on memristor crossbars. Specifically, we have used Free-BDDs, a special class of binary decision diagrams, for synthesizing crossbars for flow-based in-memory computing. Stochastic computing is a re-emerging discipline with several times smaller area/power requirements as compared to conventional computing systems. It is especially suited for fault-tolerant applications such as image processing, artificial intelligence, and pattern recognition.
We have proposed a decision procedures-based iterative algorithm to synthesize Linear Finite State Machines (LFSM) for stochastically computing non-linear functions such as polynomials, exponentials, and hyperbolic functions.
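
    As a rough illustration of the stochastic computing idea mentioned in this abstract, the sketch below encodes two probabilities as random bitstreams (unipolar encoding) and multiplies them with a bitwise AND. This is the textbook introductory example, not the LFSM synthesis procedure of the dissertation; the function names, stream length, and seed are assumptions chosen for illustration.

```python
import random


def to_bitstream(p, length, rng):
    """Encode a probability p in [0, 1] as a bitstream whose fraction of 1s is about p."""
    return [1 if rng.random() < p else 0 for _ in range(length)]


def from_bitstream(bits):
    """Decode a bitstream back to a probability estimate (fraction of 1s)."""
    return sum(bits) / len(bits)


def stochastic_multiply(p, q, length=100_000, seed=0):
    """Estimate p * q by ANDing two independent bitstreams (hypothetical helper)."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    a = to_bitstream(p, length, rng)
    b = to_bitstream(q, length, rng)
    # For independent streams, P(a_i AND b_i = 1) = p * q.
    return from_bitstream([x & y for x, y in zip(a, b)])


if __name__ == "__main__":
    print(stochastic_multiply(0.5, 0.4))  # close to 0.2, with sampling noise
```

    The result is only an estimate: accuracy improves with stream length, which is the usual trade-off between precision and latency in stochastic computing.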

    Logic synthesis and testing techniques for switching nano-crossbar arrays

    Beyond CMOS, new technologies are emerging to extend electronic systems with features unavailable to silicon-based devices. Emerging technologies provide new logic and interconnection structures for computation, storage and communication that may require new design paradigms, and therefore trigger the development of a new generation of design automation tools. In the last decade, several emerging technologies have been proposed and the time has come for studying new ad-hoc techniques and tools for logic synthesis, physical design and testing. The main goal of this project is developing a complete synthesis and optimization methodology for switching nano-crossbar arrays that leads to the design and construction of an emerging nanocomputer. New models for diode, FET, and four-terminal switch based nanoarrays are developed. The proposed methodology implements logic, arithmetic, and memory elements by considering performance parameters such as area, delay, power dissipation, and reliability. With a combination of logic, arithmetic, and memory elements, a synchronous state machine (SSM), a representation of a computer, is realized. The proposed methodology targets a variety of emerging technologies including nanowire/nanotube crossbar arrays, magnetic switch-based structures, and crossbar memories. The results of this project will be a foundation of nano-crossbar based circuit design techniques and greatly contribute to the construction of emerging computers beyond CMOS. The topic of this project can be considered under the research area of "Emerging Computing Models" or "Computational Nanoelectronics", more specifically the design, modeling, and simulation of new nanoscale switches beyond CMOS.

    Design of Ternary Operations Utilizing Flow-Based Computing

    The development of algorithms and circuit designs that exploit devices able to persist multiple values will lead to alternative technologies to overcome the issues caused by the end of Dennard scaling and the slowing of Moore's law. Flow-based designs have been used to develop binary adders and multipliers. Data stored on non-volatile memristors are used to direct the flow of current through nanowires arranged in a crossbar. The algorithmic design of the flow-based crossbar is fast, compact, and efficient. In this paper, we seek to automate the discovery of flow-based designs of ternary circuits utilizing memristive crossbars.
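
    To make the flow-based idea concrete, here is a minimal sketch under simplifying assumptions: each memristor is treated as an ideal on/off switch labelled with a constant or a literal, and the output is taken to be true when current can reach the sink row wire from the source row wire through switched-on junctions. The binary XOR crossbar, the label syntax, and the function names are illustrative assumptions, not a design taken from the paper.

```python
from collections import deque


def cell_is_on(label, assignment):
    """Return True if the memristor with this label is in its low-resistance state."""
    if label == "1":
        return True
    if label == "0":
        return False
    if label.startswith("!"):
        return not assignment[label[1:]]
    return assignment[label]


def crossbar_output(crossbar, assignment, source_row=0, sink_row=None):
    """Breadth-first search over wires: current flows between a row wire and a
    column wire whenever the memristor at their junction is on."""
    n_rows, n_cols = len(crossbar), len(crossbar[0])
    if sink_row is None:
        sink_row = n_rows - 1
    start = ("r", source_row)
    seen = {start}
    queue = deque([start])
    while queue:
        kind, idx = queue.popleft()
        if kind == "r":
            neighbours = [("c", j) for j in range(n_cols)
                          if cell_is_on(crossbar[idx][j], assignment)]
        else:
            neighbours = [("r", i) for i in range(n_rows)
                          if cell_is_on(crossbar[i][idx], assignment)]
        for wire in neighbours:
            if wire not in seen:
                seen.add(wire)
                queue.append(wire)
    return ("r", sink_row) in seen


# Hypothetical 2x2 crossbar for a XOR b: column 0 conducts when a AND NOT b,
# column 1 conducts when NOT a AND b.
XOR_CROSSBAR = [
    ["a", "!a"],
    ["!b", "b"],
]

if __name__ == "__main__":
    for a in (False, True):
        for b in (False, True):
            print(a, b, crossbar_output(XOR_CROSSBAR, {"a": a, "b": b}))
```

    Because the search follows every conducting junction, it also accounts for sneak paths that wander through several rows and columns, which is the effect flow-based designs exploit.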

    Verification and Automated Synthesis of Memristor Crossbars

    The memristor is a newly synthesized circuit element correlating differences in electrical charge and magnetic flux, which effectively acts as a nonlinear resistor with memory. The small size of this element and its potential for passive state preservation have opened great opportunities for data-level parallel computation, since the functions of memory and processing can be realized on the same physical device. In this research we present an in-depth study of memristor crossbars for combinational and sequential logic. We outline the structure of the formulas they are able to produce, and hence the inherent powers and limitations of Memristive Crossbar Computing. As an improvement on previous methods of automated crossbar synthesis, a method for symbolically verifying crossbars is proposed, proven and analysed.

    Physical Realization of a Supervised Learning System Built with Organic Memristive Synapses

    Multiple modern applications of electronics call for inexpensive chips that can perform complex operations on natural data with limited energy. A vision for accomplishing this is implementing hardware neural networks, which fuse computation and memory, with low-cost organic electronics. A challenge, however, is the implementation of synapses (analog memories) composed of such materials. In this work, we introduce robust, rapidly programmable, nonvolatile organic memristive nanodevices based on electrografted redox complexes that implement synapses thanks to a wide range of accessible intermediate conductivity states. We demonstrate experimentally an elementary neural network, capable of learning functions, which combines four pairs of organic memristors as synapses and conventional electronics as neurons. Our architecture is highly resilient to issues caused by imperfect devices. It tolerates inter-device variability, and an adaptable learning rule offers immunity against asymmetries in device switching. Highly compliant with conventional fabrication processes, the system can be extended to larger computing systems capable of complex cognitive tasks, as demonstrated in complementary simulations.

    Analog Spiking Neuromorphic Circuits and Systems for Brain- and Nanotechnology-Inspired Cognitive Computing

    Human society now faces the grand challenge of satisfying the growing demand for computing power while keeping energy consumption sustainable. With CMOS technology scaling coming to an end, innovations are required to tackle this challenge in a radically different way. Inspired by the emerging understanding of the computing occurring in a brain and by nanotechnology-enabled, biologically plausible synaptic plasticity, neuromorphic computing architectures are being investigated. A neuromorphic chip that combines CMOS analog spiking neurons with nanoscale resistive random-access memory (RRAM) used as electronic synapses can provide massive neural network parallelism, high density, and online learning capability, and hence paves the way toward a promising solution for future energy-efficient real-time computing systems. However, existing silicon neuron approaches are designed to faithfully reproduce biological neuron dynamics, and hence they are incompatible with the RRAM synapses, or require extensive peripheral circuitry to modulate a synapse, and are thus deficient in learning capability. As a result, they eliminate most of the density advantages gained by the adoption of nanoscale devices, and fail to realize a functional computing system. This dissertation describes novel hardware architectures and neuron circuit designs that synergistically assemble the fundamental and significant elements for brain-inspired computing. Versatile CMOS spiking neurons are presented that combine, in compact integrated circuit modules, integrate-and-fire operation, the capability to drive dense passive RRAM synapses, dynamic biasing for adaptive power consumption, in situ spike-timing dependent plasticity (STDP), and competitive learning. Real-world pattern learning and recognition tasks using the proposed architecture were demonstrated with circuit-level simulations.
A test chip was implemented and fabricated to verify the proposed CMOS neuron and hardware architecture, and the subsequent chip measurement results confirmed the idea. The work described in this dissertation realizes a key building block for large-scale integration of spiking neural network hardware, and serves as a stepping stone toward building next-generation energy-efficient brain-inspired cognitive computing systems.

    Harnessing noise to enhance robustness vs. efficiency trade-off in machine learning

    While deep nets have achieved human-comparable accuracy in various classification tasks, they fall short significantly in terms of the robustness and cost metrics. For example, tiny engineered corruptions in deep net inputs can reduce their accuracy to zero. Furthermore, deep nets also require millions of trainable parameters, resulting in significant training and inference costs. These robustness and cost challenges are well recognized today. In response, there have been a plethora of works focusing on improving either the accuracy vs. robustness trade-off, or the accuracy vs. cost trade-off. However, simultaneous consideration of accuracy, robustness, and cost metrics is largely absent today, in part because far fewer works have explored the robustness vs. cost trade-off. This dissertation aims to fill this gap by focusing explicitly on the robustness vs. cost trade-off in the presence of data noise, as well as hardware noise. Specifically, we explore how to harness the noise in order to enhance this trade-off. We characterize and improve robustness vs. cost trade-offs across diverse problem settings, ranging from beyond-CMOS hardware implementations of machine learning (ML) classifiers to efficient training of deep nets that are robust to multiple types of corruptions in their inputs. This dissertation can be roughly divided into two parts, one focusing on hardware noise and the other on data noise. In the first part, we start by focusing on harnessing noise in spintronic hardware implementations, where the logic gates become error prone when operated at lower switching energy/delay. We propose techniques to shape the resulting hardware noise distribution and to efficiently compensate for it at the system-level output. As a result, we observe a 1000x improvement in tolerance to gate-level switching error rates, while keeping the area/energy overhead of compensation circuits as low as 15%.
These robustness enhancements further enable a 3× reduction in iso-throughput energy consumption of a binary ML classifier employed for EEG-based seizure detection. Building on this work, we propose spintronic channel networks, which use the exponential decay of spin current to efficiently realize multi-bit dot product computation. We employ error-prone nanomagnets as efficient stochastic slicers biased by spin currents proportional to the likelihood of the classification decision. We achieve 112x-to-22.5x and 14x-to-2.5x higher energy efficiency over conventional spin-based and 20 nm CMOS designs, respectively, when realizing 10-to-100-dimensional binary classifiers. Furthermore, we also consider the impact of hardware noise originating from process variations and readout circuits in in-memory computing implementations employing non-volatile resistive crossbar arrays. Based on our analysis, we identify design configurations achieving the highest signal-to-noise ratio (SNR), and further estimate how such robustness trades off with the array energy consumption. In the second part, we switch gears to improve the robustness vs. cost trade-off for deep nets in the presence of data noise. Specifically, we focus on the impact of adversarial perturbations in the deep net inputs. We propose and validate hypotheses about the orientations of dominant subspaces of adversarial perturbations. We demonstrate how changes in the curvature of the decision boundary of the deep nets affect the orientations of the adversarial perturbations. Based on these insights we demonstrate how shaped noise can be introduced as a feature to enhance the robustness vs. cost trade-off in deep nets. Specifically, we propose shaped noise augmented processing (SNAP), a method to efficiently train deep nets that are robust to multiple types of adversarial perturbations simultaneously.
SNAP prepends a deep net with a shaped noise augmentation layer whose distribution is learned along with the network parameters using any established robust training framework. Based on extensive comparisons with nine state-of-the-art (SOTA) robust training frameworks, we show that SNAP achieves the best robustness vs. training cost trade-off. In particular, it enables a 4x reduction in the training cost compared to the SOTA approach published just last year. Furthermore, thanks to the computational simplicity of SNAP, it is the first technique of its kind that is scalable to large datasets, such as ImageNet.

    Simulation and implementation of novel deep learning hardware architectures for resource constrained devices

    Corey Lammie designed mixed-signal memristive-complementary metal–oxide–semiconductor (CMOS) and field programmable gate array (FPGA) hardware architectures, which were used to reduce the power and resource requirements of Deep Learning (DL) systems, both during inference and training. Disruptive design methodologies, such as those explored in this thesis, can be used to facilitate the design of next-generation DL systems.