3,521 research outputs found

    Large-Scale Optical Neural Networks based on Photoelectric Multiplication

    Full text link
    Recent success in deep neural networks has generated strong interest in hardware accelerators to improve speed and energy consumption. This paper presents a new type of photonic accelerator based on coherent detection that is scalable to large (N106N \gtrsim 10^6) networks and can be operated at high (GHz) speeds and very low (sub-aJ) energies per multiply-and-accumulate (MAC), using the massive spatial multiplexing enabled by standard free-space optical components. In contrast to previous approaches, both weights and inputs are optically encoded so that the network can be reprogrammed and trained on the fly. Simulations of the network using models for digit- and image-classification reveal a "standard quantum limit" for optical neural networks, set by photodetector shot noise. This bound, which can be as low as 50 zJ/MAC, suggests performance below the thermodynamic (Landauer) limit for digital irreversible computation is theoretically possible in this device. The proposed accelerator can implement both fully-connected and convolutional networks. We also present a scheme for back-propagation and training that can be performed in the same hardware. This architecture will enable a new class of ultra-low-energy processors for deep learning.Comment: Text: 10 pages, 5 figures, 1 table. Supplementary: 8 pages, 5, figures, 2 table

    Developing large-scale field-programmable analog arrays for rapid prototyping

    Get PDF
    Field-programmable analog arrays (FPAAs) provide a method for rapidly prototyping analog systems. While currently available FPAAs vary in architecture and interconnect design, they are often limited in size and flexibility. For FPAAs to be as useful and marketable as modern digital reconfigurable devices, new technologies must be explored to provide area efficient, accurately programmable analog circuitry that can be easily integrated into a larger digital/mixed signal system. By leveraging recent advances in floating gate transistors, a new generation of FPAAs are achievable that will dramatically advance the current state of the art in terms of size, functionality, and flexibility

    Analog signal processing on a reconfigurable platform

    Get PDF
    The Cooperative Analog/Digital Signal Processing (CADSP) research group's approach to signal processing is to see what opportunities lie in adjusting the line between what is traditionally computed in digital and what can be done in analog. By allowing more computation to be done in analog, we can take advantage of its low power, continuous domain operation, and parallel capabilities. One setback keeping Analog Signal Processing (ASP) from achieving more wide-spread use, however, is its lack of programmability. The design cycle for a typical analog system often involves several iterations of the fabrication step, which is labor intensive, time consuming, and expensive. These costs in both time and money reduce the likelihood that engineers will consider an analog solution. With CADSP's development of a reconfigurable analog platform, a Field-Programmable Analog Array (FPAA), it has become much more practical for systems to incorporate processing in the analog domain. In this Thesis, I present an entire chain of tools that allow one to design simply at the system block level and then compile that design onto analog hardware. This tool chain uses the Simulink design environment and a custom library of blocks to create analog systems. I also present several of these ASP blocks, covering a broad range of functions from matrix computation to interfacing. In addition to these tools and blocks, the most recent FPAA architectures are discussed. These include the latest RASP general-purpose FPAAs as well as an adapted version geared toward high-speed applications.M.S.Committee Chair: Hasler, Paul; Committee Member: Anderson, David; Committee Member: Ghovanloo, Maysa

    MFPA: Mixed-Signal Field Programmable Array for Energy-Aware Compressive Signal Processing

    Get PDF
    Compressive Sensing (CS) is a signal processing technique which reduces the number of samples taken per frame to decrease energy, storage, and data transmission overheads, as well as reducing time taken for data acquisition in time-critical applications. The tradeoff in such an approach is increased complexity of signal reconstruction. While several algorithms have been developed for CS signal reconstruction, hardware implementation of these algorithms is still an area of active research. Prior work has sought to utilize parallelism available in reconstruction algorithms to minimize hardware overheads; however, such approaches are limited by the underlying limitations in CMOS technology. Herein, the MFPA (Mixed-signal Field Programmable Array) approach is presented as a hybrid spin-CMOS reconfigurable fabric specifically designed for implementation of CS data sampling and signal reconstruction. The resulting fabric consists of 1) slice-organized analog blocks providing amplifiers, transistors, capacitors, and Magnetic Tunnel Junctions (MTJs) which are configurable to achieving square/square root operations required for calculating vector norms, 2) digital functional blocks which feature 6-input clockless lookup tables for computation of matrix inverse, and 3) an MRAM-based nonvolatile crossbar array for carrying out low-energy matrix-vector multiplication operations. The various functional blocks are connected via a global interconnect and spin-based analog-to-digital converters. Simulation results demonstrate significant energy and area benefits compared to equivalent CMOS digital implementations for each of the functional blocks used: this includes an 80% reduction in energy and 97% reduction in transistor count for the nonvolatile crossbar array, 80% standby power reduction and 25% reduced area footprint for the clockless lookup tables, and roughly 97% reduction in transistor count for a multiplier built using components from the analog blocks. Moreover, the proposed fabric yields 77% energy reduction compared to CMOS when used to implement CS reconstruction, in addition to latency improvements

    Large scale reconfigurable analog system design enabled through floating-gate transistors

    Get PDF
    This work is concerned with the implementation and implication of non-volatile charge storage on VLSI system design. To that end, the floating-gate pFET (fg-pFET) is considered in the context of large-scale arrays. The programming of the element in an efficient and predictable way is essential to the implementation of these systems, and is thus explored. The overhead of the control circuitry for the fg-pFET, a key scalability issue, is examined. A light-weight, trend-accurate model is absolutely necessary for VLSI system design and simulation, and is also provided. Finally, several reconfigurable and reprogrammable systems that were built are discussed.Ph.D.Committee Chair: Hasler, Paul E.; Committee Member: Anderson, David V.; Committee Member: Ayazi, Farrokh; Committee Member: Degertekin, F. Levent; Committee Member: Hunt, William D

    Optical neural networks: an introduction to a special issue by the feature editors

    Get PDF
    This feature of Applied Optics is devoted to papers on the optical implementation of neural-network models of computation. Papers are included on optoelectronic neuron array devices, optical interconnection techniques using holograms and spatial light modulators, optical associative memories, demonstrations of optoelectronic systems for learning, classification, and target recognition, and on the demonstration, analysis, and simulation of adaptive interconnections for optical neural networks using photorefractive volume holograms

    Analog Vector-Matrix Multiplier Based on Programmable Current Mirrors for Neural Network Integrated Circuits

    Get PDF
    We propose a CMOS Analog Vector-Matrix Multiplier for Deep Neural Networks, implemented in a standard single-poly 180 nm CMOS technology. The learning weights are stored in analog floating-gate memory cells embedded in current mirrors implementing the multiplication operations. We experimentally verify the analog storage capability of designed single-poly floating-gate cells, the accuracy of the multiplying function of proposed tunable current mirrors, and the effective number of bits of the analog operation. We perform system-level simulations to show that an analog deep neural network based on the proposed vector-matrix multiplier can achieve an inference accuracy comparable to digital solutions with an energy efficiency of 26.4 TOPs/J, a layer latency close to 100 mu s and an intrinsically high degree of parallelism. Our proposed design has also a cost advantage, considering that it can be implemented in a standard single-poly CMOS process flow

    High accuracy computation with linear analog optical systems: a critical study

    Get PDF
    High accuracy optical processors based on the algorithm of digital multiplication by analog convolution (DMAC) are studied for ultimate performance limitations. Variations of optical processors that perform high accuracy vector-vector inner products are studied in abstract and with specific examples. It is concluded that the use of linear analog optical processors in performing digital computations with DMAC leads to impractical requirements for the accuracy of analog optical systems and the complexity of postprocessing electronics
    corecore