10,308 research outputs found

    Design and construction of a configurable full-field range imaging system for mobile robotic applications

    Get PDF
    Mobile robotic devices rely critically on extrospection sensors to determine the range to objects in the robot’s operating environment. This provides the robot with the ability both to navigate safely around obstacles and to map its environment and hence facilitate path planning and navigation. There is a requirement for a full-field range imaging system that can determine the range to any obstacle in a camera lens’ field of view accurately and in real-time. This paper details the development of a portable full-field ranging system whose bench-top version has demonstrated sub-millimetre precision. However, this precision required non-real-time acquisition rates and expensive hardware. By iterative replacement of components, a portable, modular and inexpensive version of this full-field ranger has been constructed, capable of real-time operation with some (user-defined) trade-off with precision

    Memory and information processing in neuromorphic systems

    Full text link
    A striking difference between brain-inspired neuromorphic processors and current von Neumann processors architectures is the way in which memory and processing is organized. As Information and Communication Technologies continue to address the need for increased computational power through the increase of cores within a digital processor, neuromorphic engineers and scientists can complement this need by building processor architectures where memory is distributed with the processing. In this paper we present a survey of brain-inspired processor architectures that support models of cortical networks and deep neural networks. These architectures range from serial clocked implementations of multi-neuron systems to massively parallel asynchronous ones and from purely digital systems to mixed analog/digital systems which implement more biological-like models of neurons and synapses together with a suite of adaptation and learning mechanisms analogous to the ones found in biological nervous systems. We describe the advantages of the different approaches being pursued and present the challenges that need to be addressed for building artificial neural processing systems that can display the richness of behaviors seen in biological systems.Comment: Submitted to Proceedings of IEEE, review of recently proposed neuromorphic computing platforms and system

    FPSA: A Full System Stack Solution for Reconfigurable ReRAM-based NN Accelerator Architecture

    Full text link
    Neural Network (NN) accelerators with emerging ReRAM (resistive random access memory) technologies have been investigated as one of the promising solutions to address the \textit{memory wall} challenge, due to the unique capability of \textit{processing-in-memory} within ReRAM-crossbar-based processing elements (PEs). However, the high efficiency and high density advantages of ReRAM have not been fully utilized due to the huge communication demands among PEs and the overhead of peripheral circuits. In this paper, we propose a full system stack solution, composed of a reconfigurable architecture design, Field Programmable Synapse Array (FPSA) and its software system including neural synthesizer, temporal-to-spatial mapper, and placement & routing. We highly leverage the software system to make the hardware design compact and efficient. To satisfy the high-performance communication demand, we optimize it with a reconfigurable routing architecture and the placement & routing tool. To improve the computational density, we greatly simplify the PE circuit with the spiking schema and then adopt neural synthesizer to enable the high density computation-resources to support different kinds of NN operations. In addition, we provide spiking memory blocks (SMBs) and configurable logic blocks (CLBs) in hardware and leverage the temporal-to-spatial mapper to utilize them to balance the storage and computation requirements of NN. Owing to the end-to-end software system, we can efficiently deploy existing deep neural networks to FPSA. Evaluations show that, compared to one of state-of-the-art ReRAM-based NN accelerators, PRIME, the computational density of FPSA improves by 31x; for representative NNs, its inference performance can achieve up to 1000x speedup.Comment: Accepted by ASPLOS 201

    MFPA: Mixed-Signal Field Programmable Array for Energy-Aware Compressive Signal Processing

    Get PDF
    Compressive Sensing (CS) is a signal processing technique which reduces the number of samples taken per frame to decrease energy, storage, and data transmission overheads, as well as reducing time taken for data acquisition in time-critical applications. The tradeoff in such an approach is increased complexity of signal reconstruction. While several algorithms have been developed for CS signal reconstruction, hardware implementation of these algorithms is still an area of active research. Prior work has sought to utilize parallelism available in reconstruction algorithms to minimize hardware overheads; however, such approaches are limited by the underlying limitations in CMOS technology. Herein, the MFPA (Mixed-signal Field Programmable Array) approach is presented as a hybrid spin-CMOS reconfigurable fabric specifically designed for implementation of CS data sampling and signal reconstruction. The resulting fabric consists of 1) slice-organized analog blocks providing amplifiers, transistors, capacitors, and Magnetic Tunnel Junctions (MTJs) which are configurable to achieving square/square root operations required for calculating vector norms, 2) digital functional blocks which feature 6-input clockless lookup tables for computation of matrix inverse, and 3) an MRAM-based nonvolatile crossbar array for carrying out low-energy matrix-vector multiplication operations. The various functional blocks are connected via a global interconnect and spin-based analog-to-digital converters. Simulation results demonstrate significant energy and area benefits compared to equivalent CMOS digital implementations for each of the functional blocks used: this includes an 80% reduction in energy and 97% reduction in transistor count for the nonvolatile crossbar array, 80% standby power reduction and 25% reduced area footprint for the clockless lookup tables, and roughly 97% reduction in transistor count for a multiplier built using components from the analog blocks. Moreover, the proposed fabric yields 77% energy reduction compared to CMOS when used to implement CS reconstruction, in addition to latency improvements

    A Study of FPGA Resource Utilization for Pipelined Windowed Image Computations

    Get PDF
    In image processing operations, each pixel is often treated independently and operated upon by using values of other pixels in the neighborhood. These operations are often called windowed image computations (or neighborhood operations). In this thesis, we examine the implementation of a windowed computation pipeline in an FPGA-based environment. Typically, the image is generated outside the FPGA environment (such as through a camera) and the result of the windowed computation is consumed outside the FPGA environment (for example, in a screen for display or an engine for higher level analysis). The image is typically large (over a million pixels 1000×1000 image) and the FPGA input-output (I/O) infrastructure is quite modest in comparison (typically a few hundred pins). Consequently, the image is brought into the chip a small piece (tile) at a time. We define a handshaking scheme that allows us to construct an FPGA architecture without making large assumptions about component speeds and synchronization. We define a pipeline architecture for windowed computations, including details of a stage to accommodate FPGA pin-limitation and bounded storage. We implement a design to better suit FPGAs where it ensures a smoother (stall-resistant) flow of the computation in the pipeline. Based on the architecture proposed, we have analytically predicted resource usage in the FPGA. In particular, we have shown that for an N×N image processed as n×n tiles on a z-stage windowed computation with parameter w; θ(n^2+log⁡N+log⁡z ) pins are used and θ(n^2 z) memory is used. We ran simulations that validated these predictions on two FPGAs (Artix-7 and Kintex-7) with different resources. As we had predicted, the pins and distributed memory turned out to be the most used resources. Our simulations have also shown that the operating clock speed of the design is relatively independent of the number of stages in the pipeline; this is in line with what was expected with the handshaking scheme that isolates the timing of communicating modules. Our work, although aimed at FPGAs, could also be applied to any I/O pin-limited devices and memory limited environments
    • …
    corecore