404 research outputs found

    Deep in-memory computing

    Get PDF
    There is much interest in embedding data analytics into sensor-rich platforms such as wearables, biomedical devices, autonomous vehicles, robots, and Internet-of-Things to provide these with decision-making capabilities. Such platforms often need to implement machine learning (ML) algorithms under stringent energy constraints with battery-powered electronics. Especially, energy consumption in memory subsystems dominates such a system's energy efficiency. In addition, the memory access latency is a major bottleneck for overall system throughput. To address these issues in memory-intensive inference applications, this dissertation proposes deep in-memory accelerator (DIMA), which deeply embeds computation into the memory array, employing two key principles: (1) accessing and processing multiple rows of memory array at a time, and (2) embedding pitch-matched low-swing analog processing at the periphery of bitcell array. The signal-to-noise ratio (SNR) is budgeted by employing low-swing operations in both memory read and processing to exploit the application level's error immunity for aggressive energy efficiency. This dissertation first describes the system rationale underlying the DIMA's processing stages by identifying the common functional flow across a diverse set of inference algorithms. Based on the analysis, this dissertation presents a multi-functional DIMA to support four algorithms: support vector machine (SVM), template matching (TM), k-nearest neighbor (k-NN), and matched filter. The circuit and architectural level design techniques and guidelines are provided to address the challenges in achieving multi-functionality. A prototype integrated circuit (IC) of a multi-functional DIMA was fabricated with a 16 KB SRAM array in a 65 nm CMOS process. Measurement results show up to 5.6X and 5.8X energy and delay reductions leading to 31X energy delay product (EDP) reduction with negligible (<1%) accuracy degradation as compared to the conventional 8-b fixed-point digital implementation optimally designed for each algorithm. Then, DIMA also has been applied to more complex algorithms: (1) convolutional neural network (CNN), (2) sparse distributed memory (SDM), and (3) random forest (RF). System-level simulations of CNN using circuit behavioral models in a 45 nm SOI CMOS demonstrate that high probability (>0.99) of handwritten digit recognition can be achieved using the MNIST database, along with a 24.5X reduced EDP, a 5.0X reduced energy, and a 4.9X higher throughput as compared to the conventional system. The DIMA-based SDM architecture also achieves up to 25X and 12X delay and energy reductions, respectively, over conventional SDM with negligible accuracy degradation (within 0.4%) for 16X16 binary-pixel image classification. A DIMA-based RF was realized as a prototype IC with a 16 KB SRAM array in a 65 nm process. To the best of our knowledge, this is the first IC realization of an RF algorithm. The measurement results show that the prototype achieves a 6.8X lower EDP compared to a conventional design at the same accuracy (94%) for an eight-class traffic sign recognition problem. The multi-functional DIMA and extension to other algorithms naturally motivated us to consider a programmable DIMA instruction set architecture (ISA), namely MATI. This dissertation explores a synergistic combination of the instruction set, architecture and circuit design to achieve the programmability without losing DIMA's energy and throughput benefits. Employing silicon-validated energy, delay and behavioral models of deep in-memory components, we demonstrate that MATI is able to realize nine ML benchmarks while incurring negligible overhead in energy (< 0.1%), and area (4.5%), and in throughput, over a fixed four-function DIMA. In this process, MATI is able to simultaneously achieve enhancements in both energy (2.5X to 5.5X) and throughput (1.4X to 3.4X) for an overall EDP improvement of up to 12.6X over fixed-function digital architectures

    Analog Photonics Computing for Information Processing, Inference and Optimisation

    Full text link
    This review presents an overview of the current state-of-the-art in photonics computing, which leverages photons, photons coupled with matter, and optics-related technologies for effective and efficient computational purposes. It covers the history and development of photonics computing and modern analogue computing platforms and architectures, focusing on optimization tasks and neural network implementations. The authors examine special-purpose optimizers, mathematical descriptions of photonics optimizers, and their various interconnections. Disparate applications are discussed, including direct encoding, logistics, finance, phase retrieval, machine learning, neural networks, probabilistic graphical models, and image processing, among many others. The main directions of technological advancement and associated challenges in photonics computing are explored, along with an assessment of its efficiency. Finally, the paper discusses prospects and the field of optical quantum computing, providing insights into the potential applications of this technology.Comment: Invited submission by Journal of Advanced Quantum Technologies; accepted version 5/06/202

    Hardware Architectures and Implementations for Associative Memories : the Building Blocks of Hierarchically Distributed Memories

    Get PDF
    During the past several decades, the semiconductor industry has grown into a global industry with revenues around $300 billion. Intel no longer relies on only transistor scaling for higher CPU performance, but instead, focuses more on multiple cores on a single die. It has been projected that in 2016 most CMOS circuits will be manufactured with 22 nm process. The CMOS circuits will have a large number of defects. Especially when the transistor goes below sub-micron, the original deterministic circuits will start having probabilistic characteristics. Hence, it would be challenging to map traditional computational models onto probabilistic circuits, suggesting a need for fault-tolerant computational algorithms. Biologically inspired algorithms, or associative memories (AMs)—the building blocks of cortical hierarchically distributed memories (HDMs) discussed in this dissertation, exhibit a remarkable match to the nano-scale electronics, besides having great fault-tolerance ability. Research on the potential mapping of the HDM onto CMOL (hybrid CMOS/nanoelectronic circuits) nanogrids provides useful insight into the development of non-von Neumann neuromorphic architectures and semiconductor industry. In this dissertation, we investigated the implementations of AMs on different hardware platforms, including microprocessor based personal computer (PC), PC cluster, field programmable gate arrays (FPGA), CMOS, and CMOL nanogrids. We studied two types of neural associative memory models, with and without temporal information. In this research, we first decomposed the computational models into basic and common operations, such as matrix-vector inner-product and k-winners-take-all (k-WTA). We then analyzed the baseline performance/price ratio of implementing the AMs with a PC. We continued with a similar performance/price analysis of the implementations on more parallel hardware platforms, such as PC cluster and FPGA. However, the majority of the research emphasized on the implementations with all digital and mixed-signal full-custom CMOS and CMOL nanogrids. In this dissertation, we draw the conclusion that the mixed-signal CMOL nanogrids exhibit the best performance/price ratio over other hardware platforms. We also highlighted some of the trade-offs between dedicated and virtualized hardware circuits for the HDM models. A simple time-multiplexing scheme for the digital CMOS implementations can achieve comparable throughput as the mixed-signal CMOL nanogrids

    Bio-inspired Neuromorphic Computing Using Memristor Crossbar Networks

    Full text link
    Bio-inspired neuromorphic computing systems built with emerging devices such as memristors have become an active research field. Experimental demonstrations at the network-level have suggested memristor-based neuromorphic systems as a promising candidate to overcome the von-Neumann bottleneck in future computing applications. As a hardware system that offers co-location of memory and data processing, memristor-based networks represent an efficient computing platform with minimal data transfer and high parallelism. Furthermore, active utilization of the dynamic processes during resistive switching in memristors can help realize more faithful emulation of biological device and network behaviors, with the potential to process dynamic temporal inputs efficiently. In this thesis, I present experimental demonstrations of neuromorphic systems using fabricated memristor arrays as well as network-level simulation results. Models of resistive switching behavior in two types of memristor devices, conventional first-order and recently proposed second-order memristor devices, will be first introduced. Secondly, experimental demonstration of K-means clustering through unsupervised learning in a memristor network will be presented. The memristor based hardware systems achieved high classification accuracy (93.3%) on the standard IRIS data set, suggesting practical networks can be built with optimized memristor devices. Thirdly, implementation of a partial differential equation (PDE) solver in memristor arrays will be discussed. This work expands the capability of memristor-based computing hardware from ‘soft’ to ‘hard’ computing tasks, which require very high precision and accurate solutions. In general first-order memristors are suitable to perform tasks that are based on vector-matrix multiplications, ranging from K-means clustering to PDE solvers. On the other hand, utilizing internal device dynamics in second-order memristors can allow natural emulation of biological behaviors and enable network functions such as temporal data processing. An effort to explore second-order memristor devices and their network behaviors will be discussed. Finally, we propose ideas to build large-size passive memristor crossbar arrays, including fabrication approaches, guidelines of device structure, and analysis of the parasitic effects in larger arrays.PHDElectrical EngineeringUniversity of Michigan, Horace H. Rackham School of Graduate Studieshttps://deepblue.lib.umich.edu/bitstream/2027.42/147610/1/yjjeong_1.pd

    A Survey of Adaptive Resonance Theory Neural Network Models for Engineering Applications

    Full text link
    This survey samples from the ever-growing family of adaptive resonance theory (ART) neural network models used to perform the three primary machine learning modalities, namely, unsupervised, supervised and reinforcement learning. It comprises a representative list from classic to modern ART models, thereby painting a general picture of the architectures developed by researchers over the past 30 years. The learning dynamics of these ART models are briefly described, and their distinctive characteristics such as code representation, long-term memory and corresponding geometric interpretation are discussed. Useful engineering properties of ART (speed, configurability, explainability, parallelization and hardware implementation) are examined along with current challenges. Finally, a compilation of online software libraries is provided. It is expected that this overview will be helpful to new and seasoned ART researchers

    VLSI neural networks for computer vision

    Get PDF

    The 1992 4th NASA SERC Symposium on VLSI Design

    Get PDF
    Papers from the fourth annual NASA Symposium on VLSI Design, co-sponsored by the IEEE, are presented. Each year this symposium is organized by the NASA Space Engineering Research Center (SERC) at the University of Idaho and is held in conjunction with a quarterly meeting of the NASA Data System Technology Working Group (DSTWG). One task of the DSTWG is to develop new electronic technologies that will meet next generation electronic data system needs. The symposium provides insights into developments in VLSI and digital systems which can be used to increase data systems performance. The NASA SERC is proud to offer, at its fourth symposium on VLSI design, presentations by an outstanding set of individuals from national laboratories, the electronics industry, and universities. These speakers share insights into next generation advances that will serve as a basis for future VLSI design

    Neuroengineering of Clustering Algorithms

    Get PDF
    Cluster analysis can be broadly divided into multivariate data visualization, clustering algorithms, and cluster validation. This dissertation contributes neural network-based techniques to perform all three unsupervised learning tasks. Particularly, the first paper provides a comprehensive review on adaptive resonance theory (ART) models for engineering applications and provides context for the four subsequent papers. These papers are devoted to enhancements of ART-based clustering algorithms from (a) a practical perspective by exploiting the visual assessment of cluster tendency (VAT) sorting algorithm as a preprocessor for ART offline training, thus mitigating ordering effects; and (b) an engineering perspective by designing a family of multi-criteria ART models: dual vigilance fuzzy ART and distributed dual vigilance fuzzy ART (both of which are capable of detecting complex cluster structures), merge ART (aggregates partitions and lessens ordering effects in online learning), and cluster validity index vigilance in fuzzy ART (features a robust vigilance parameter selection and alleviates ordering effects in offline learning). The sixth paper consists of enhancements to data visualization using self-organizing maps (SOMs) by depicting in the reduced dimension and topology-preserving SOM grid information-theoretic similarity measures between neighboring neurons. This visualization\u27s parameters are estimated using samples selected via a single-linkage procedure, thereby generating heatmaps that portray more homogeneous within-cluster similarities and crisper between-cluster boundaries. The seventh paper presents incremental cluster validity indices (iCVIs) realized by (a) incorporating existing formulations of online computations for clusters\u27 descriptors, or (b) modifying an existing ART-based model and incrementally updating local density counts between prototypes. Moreover, this last paper provides the first comprehensive comparison of iCVIs in the computational intelligence literature --Abstract, page iv
    • …
    corecore