90 research outputs found

Automated Synthesis of Memristor Crossbar Networks

Author: Chakraborty Dwaipayan
Publication venue: 'Information Bulletin on Variable Stars (IBVS)'
Publication date: 01/01/2019

The advancement of semiconductor device technology over the past decades has enabled the design of increasingly complex electrical and computational machines. Electronic design automation (EDA) has played a significant role in the design and implementation of transistor-based machines. However, as transistors move closer toward their physical limits, the speed-up provided by Moore's law will grind to a halt. Once again, we find ourselves on the verge of a paradigm shift in the computational sciences as newer devices pave the way for novel approaches to computing. One such device is the memristor, a resistor with non-volatile memory. Memristors can be used as junctional switches in crossbar circuits, which consist of intersecting sets of vertical and horizontal nanowires. The major contribution of this dissertation lies in automating the design of such crossbar circuits: doing a new kind of EDA for a new kind of computational machinery. In general, this dissertation attempts to answer the following questions: a. How can we synthesize crossbars for computing large Boolean formulas, up to 128-bit? b. How can we synthesize more compact crossbars for small Boolean formulas, up to 8-bit? c. For a given loop-free C program doing integer arithmetic, is it possible to synthesize an equivalent crossbar circuit? We have presented novel solutions to each of the above problems. Our new, proposed solutions resolve a number of significant bottlenecks in existing research, via the usage of innovative logic representation and artificial intelligence techniques. For large Boolean formulas (up to 128-bit), we have utilized Reduced Ordered Binary Decision Diagrams (ROBDDs) to automatically synthesize linearly growing crossbar circuits that compute them. This cutting-edge approach towards flow-based computing has yielded state-of-the-art results. It is worth noting that this approach is scalable to n-bit Boolean formulas.
We have made significant original contributions by leveraging artificial intelligence for automatic synthesis of compact crossbar circuits. This inventive method has been expanded to encompass crossbar networks with 1D1M (1-diode-1-memristor) switches as well. The resultant circuits satisfy the tight constraints of the Feynman Grand Prize challenge and are able to perform 8-bit binary addition. A leading-edge development for end-to-end computation with flow-based crossbars has been implemented, which involves methodical translation of loop-free C programs into crossbar circuits via automated synthesis. The original contributions described in this dissertation reflect the substantial progress we have made in the area of electronic design automation for synthesis of memristor crossbar networks.
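The idea of flow-based computing on a crossbar can be illustrated with a small sketch. It is not the synthesis method of the dissertation; it only assumes the commonly described model in which each junction is switched on when its assigned literal is true, and the formula evaluates to true when current can flow from a source nanowire to an output nanowire. The names `crossbar_output`, `literal_is_true`, and `XOR_CROSSBAR`, as well as the hand-built 2x2 layout, are hypothetical.

```python
from collections import deque


def literal_is_true(label, assignment):
    """Return True if the junction labelled `label` is switched on.

    Labels (an assumed convention): None = always off, "1" = always on,
    "x" = on when x is true, "!x" = on when x is false.
    """
    if label is None:
        return False
    if label == "1":
        return True
    if label.startswith("!"):
        return not assignment[label[1:]]
    return assignment[label]


def crossbar_output(crossbar, assignment, source_row, output_row):
    """Check whether current can flow from source_row to output_row.

    Rows and columns are nanowires; an "on" junction connects its row
    to its column. A breadth-first search over the wires finds any path.
    """
    n_rows, n_cols = len(crossbar), len(crossbar[0])
    start = ("row", source_row)
    seen = {start}
    queue = deque([start])
    while queue:
        kind, idx = queue.popleft()
        if kind == "row":
            neighbours = [("col", c) for c in range(n_cols)
                          if literal_is_true(crossbar[idx][c], assignment)]
        else:
            neighbours = [("row", r) for r in range(n_rows)
                          if literal_is_true(crossbar[r][idx], assignment)]
        for node in neighbours:
            if node not in seen:
                seen.add(node)
                queue.append(node)
    return ("row", output_row) in seen


# Hand-built (hypothetical) crossbar for a XOR b:
# column 0 conducts when a AND NOT b, column 1 when NOT a AND b.
XOR_CROSSBAR = [
    ["a", "!a"],   # row 0: source nanowire
    ["!b", "b"],   # row 1: output nanowire
]

if __name__ == "__main__":
    for a in (False, True):
        for b in (False, True):
            out = crossbar_output(XOR_CROSSBAR, {"a": a, "b": b}, 0, 1)
            print(a, b, out)
```

In this toy layout each column acts as one product term, so the crossbar grows with the number of terms; the abstract's ROBDD-based approach is what is reported to keep that growth linear for large formulas.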

University of Central Florida (UCF): STARS (Showcase of Text, Archives, Research & Scholarship)

Approximate In-memory computing on RERAMs

Author: Khokhar Salman Anwar
Publication venue: 'Information Bulletin on Variable Stars (IBVS)'
Publication date: 01/01/2019

Computing systems have seen tremendous growth over the past few decades in their capabilities, efficiency, and deployment use cases. This growth has been driven by progress in lithography techniques and improvements in synthesis tools, architectures, and power management. However, there is a growing disparity between computing power and the demands on modern computing systems. The standard von Neumann architecture has separate data storage and data processing locations. Therefore, it suffers from a memory-processor communication bottleneck, which is commonly referred to as the 'memory wall'. The relatively slower progress in memory technology compared with processing units has continued to exacerbate the memory wall problem. As feature sizes in the CMOS logic family reduce further, quantum tunneling effects are becoming more prominent. Simultaneously, chip transistor density is already so high that all transistors cannot be powered up at the same time without violating temperature constraints, a phenomenon characterized as dark silicon. Coupled with this, there is also an increase in leakage currents with smaller feature sizes, resulting in a breakdown of Dennard scaling. All these challenges cannot be met without fundamental changes in current computing paradigms. One viable solution is in-memory computing, where computing and storage are performed alongside each other. A number of emerging memory fabrics such as ReRAMs, STT-RAMs, and PCM RAMs are capable of performing logic in-memory. ReRAMs possess high storage density, have extremely low power consumption and a low cost of fabrication. These advantages are due to the simple nature of their basic constituent elements, which allow nano-scale fabrication. We use flow-based computing on ReRAM crossbars for computing that exploits natural sneak paths in those crossbars.
Another concurrent development in computing is the maturation of domains that are error resilient while being highly data and power intensive. These include machine learning, pattern recognition, computer vision, image processing, and networking. This shift in the nature of computing workloads has given weight to the idea of approximate computing, in which device efficiency is improved by sacrificing tolerable amounts of accuracy in computation. We present a mathematically rigorous foundation for the synthesis of approximate logic and its mapping to ReRAM crossbars using search-based and graphical methods.

University of Central Florida (UCF): STARS (Showcase of Text, Archives, Research & Scholarship)

Simulation and implementation of novel deep learning hardware architectures for resource constrained devices

Author: Lammie Corey
Publication date: 01/01/2022

Corey Lammie designed mixed-signal memristive-complementary metal–oxide–semiconductor (CMOS) and field-programmable gate array (FPGA) hardware architectures, which were used to reduce the power and resource requirements of Deep Learning (DL) systems during both inference and training. Disruptive design methodologies, such as those explored in this thesis, can be used to facilitate the design of next-generation DL systems.

ResearchOnline at James Cook University

Hardware Implementation of Deep Network Accelerators Towards Healthcare and Biomedical Applications

Author: Azghadi Mostafa Rahimi
Donati Elisa
Eshraghian Jason K.
Indiveri Giacomo
Lammie Corey
Linares-Barranco Bernabe
Payvand Melika
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2020

With the advent of dedicated Deep Learning (DL) accelerators and neuromorphic processors, new opportunities are emerging for applying deep and Spiking Neural Network (SNN) algorithms to healthcare and biomedical applications at the edge. This can facilitate the advancement of the medical Internet of Things (IoT) systems and Point of Care (PoC) devices. In this paper, we provide a tutorial describing how various technologies ranging from emerging memristive devices, to established Field Programmable Gate Arrays (FPGAs), and mature Complementary Metal Oxide Semiconductor (CMOS) technology can be used to develop efficient DL accelerators to solve a wide variety of diagnostic, pattern recognition, and signal processing problems in healthcare. Furthermore, we explore how spiking neuromorphic processors can complement their DL counterparts for processing biomedical signals. After providing the required background, we unify the sparsely distributed research on neural network and neuromorphic hardware implementations as applied to the healthcare domain. In addition, we benchmark various hardware platforms by performing a biomedical electromyography (EMG) signal processing task and drawing comparisons among them in terms of inference delay and energy. Finally, we provide our analysis of the field and share a perspective on the advantages, disadvantages, challenges, and opportunities that different accelerators and neuromorphic processors introduce to healthcare and biomedical domains. This paper can serve a large audience, ranging from nanoelectronics researchers, to biomedical and healthcare practitioners in grasping the fundamental interplay between hardware, algorithms, and clinical adoption of these tools, as we shed light on the future of deep networks and spiking neuromorphic processing systems as proponents for driving biomedical circuits and systems forward. Comment: Submitted to IEEE Transactions on Biomedical Circuits and Systems (21 pages, 10 figures, 5 tables)

arXiv.org e-Print Archive

Repository for Publications and Research Data

ResearchOnline at James Cook University

Doctor of Philosophy

Author: Ardestani Ali Shafiee
Publication venue: University of Utah
Publication date: 01/01/2018

Deep Neural Networks (DNNs) are the state-of-the-art solution in a growing number of tasks including computer vision, speech recognition, and genomics. However, DNNs are computationally expensive as they are carefully trained to extract and abstract features from raw data using multiple layers of neurons with millions of parameters. In this dissertation, we primarily focus on inference, e.g., using a DNN to classify an input image. This is an operation that will be repeatedly performed on billions of devices in the datacenter, in self-driving cars, in drones, etc. We observe that DNNs spend the vast majority of their runtime performing matrix-by-vector multiplications (MVM). MVMs have two major bottlenecks: fetching the matrix and performing sum-of-product operations. To address these bottlenecks, we use in-situ computing, where the matrix is stored in programmable resistor arrays, called crossbars, and sum-of-product operations are performed using analog computing. In this dissertation, we propose two hardware units, ISAAC and Newton. In ISAAC, we show that in-situ computing designs can outperform DNN digital accelerators, if they leverage pipelining, smart encodings, and can distribute a computation in time and space, within crossbars, and across crossbars. In the ISAAC design, roughly half the chip area/power can be attributed to the analog-to-digital conversion (ADC), i.e., it remains the key design challenge in mixed-signal accelerators for deep networks. In spite of the ADC bottleneck, ISAAC is able to outperform the computational efficiency of the state-of-the-art design (DaDianNao) by 8x. In Newton, we take advantage of a number of techniques to address ADC inefficiency. These techniques exploit matrix transformations, heterogeneity, and smart mapping of computation to the analog substrate. We show that Newton can increase the efficiency of in-situ computing by an additional 2x.
Finally, we show that in-situ computing, unfortunately, cannot be easily adapted to handle training of deep networks, i.e., it is only suitable for inference of already-trained networks. By improving the efficiency of DNN inference with ISAAC and Newton, we move closer to low-cost deep learning that in turn will have societal impact through self-driving cars, assistive systems for the disabled, and precision medicine.
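The in-situ sum-of-product operation described above can be sketched numerically. This is a minimal, idealized model and not the ISAAC or Newton design: it assumes each matrix entry is stored as a cell conductance, inputs are applied as row voltages, and each column wire sums the cell currents before an ideal uniform ADC digitizes the result. The function names `crossbar_mvm` and `adc_code`, and all numeric values, are hypothetical.

```python
def crossbar_mvm(conductances, voltages):
    """Column currents of an ideal crossbar: I[j] = sum_i V[i] * G[i][j].

    Each cell contributes V * G (Ohm's law); the column wire adds the
    contributions of all cells attached to it (Kirchhoff's current law).
    """
    n_rows, n_cols = len(conductances), len(conductances[0])
    if len(voltages) != n_rows:
        raise ValueError("one input voltage is needed per crossbar row")
    currents = [0.0] * n_cols
    for i, v in enumerate(voltages):
        for j in range(n_cols):
            currents[j] += v * conductances[i][j]
    return currents


def adc_code(current, full_scale, bits):
    """Ideal uniform ADC: map [0, full_scale] onto integer codes 0 .. 2**bits - 1."""
    levels = (1 << bits) - 1
    clamped = min(max(current, 0.0), full_scale)
    return round(clamped / full_scale * levels)


if __name__ == "__main__":
    G = [[1.0, 2.0],      # conductances (arbitrary units), one row per input
         [3.0, 4.0]]
    V = [0.5, 0.25]       # input voltages (arbitrary units)
    I = crossbar_mvm(G, V)
    print(I)                                    # analog column currents
    print([adc_code(i, 2.0, 3) for i in I])     # 3-bit digitized outputs
```

The ADC step is included because the abstract attributes roughly half of the chip area and power to analog-to-digital conversion; in this sketch it is only a rounding function, with no cost model attached.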

The University of Utah: J. Willard Marriott Digital Library

Introductory Chapter: Challenges in Neuro-Memristive Circuit Design

Author: James Alex
Publication venue: 'IntechOpen'
Publication date: 27/05/2020

IntechOpen