3,370 research outputs found

    FPSA: A Full System Stack Solution for Reconfigurable ReRAM-based NN Accelerator Architecture

    Full text link
    Neural Network (NN) accelerators with emerging ReRAM (resistive random access memory) technologies have been investigated as one of the promising solutions to address the \textit{memory wall} challenge, due to the unique capability of \textit{processing-in-memory} within ReRAM-crossbar-based processing elements (PEs). However, the high efficiency and high density advantages of ReRAM have not been fully utilized due to the huge communication demands among PEs and the overhead of peripheral circuits. In this paper, we propose a full system stack solution, composed of a reconfigurable architecture design, Field Programmable Synapse Array (FPSA) and its software system including neural synthesizer, temporal-to-spatial mapper, and placement & routing. We highly leverage the software system to make the hardware design compact and efficient. To satisfy the high-performance communication demand, we optimize it with a reconfigurable routing architecture and the placement & routing tool. To improve the computational density, we greatly simplify the PE circuit with the spiking schema and then adopt neural synthesizer to enable the high density computation-resources to support different kinds of NN operations. In addition, we provide spiking memory blocks (SMBs) and configurable logic blocks (CLBs) in hardware and leverage the temporal-to-spatial mapper to utilize them to balance the storage and computation requirements of NN. Owing to the end-to-end software system, we can efficiently deploy existing deep neural networks to FPSA. Evaluations show that, compared to one of state-of-the-art ReRAM-based NN accelerators, PRIME, the computational density of FPSA improves by 31x; for representative NNs, its inference performance can achieve up to 1000x speedup.Comment: Accepted by ASPLOS 201

    Error-triggered Three-Factor Learning Dynamics for Crossbar Arrays

    Get PDF
    Recent breakthroughs suggest that local, approximate gradient descent learning is compatible with Spiking Neural Networks (SNNs). Although SNNs can be scalably implemented using neuromorphic VLSI, an architecture that can learn in-situ as accurately as conventional processors is still missing. Here, we propose a subthreshold circuit architecture designed through insights obtained from machine learning and computational neuroscience that could achieve such accuracy. Using a surrogate gradient learning framework, we derive local, error-triggered learning dynamics compatible with crossbar arrays and the temporal dynamics of SNNs. The derivation reveals that circuits used for inference and training dynamics can be shared, which simplifies the circuit and suppresses the effects of fabrication mismatch. We present SPICE simulations on XFAB 180nm process, as well as large-scale simulations of the spiking neural networks on event-based benchmarks, including a gesture recognition task. Our results show that the number of updates can be reduced hundred-fold compared to the standard rule while achieving performances that are on par with the state-of-the-art

    Computer architecture for efficient algorithmic executions in real-time systems: New technology for avionics systems and advanced space vehicles

    Get PDF
    Improvements and advances in the development of computer architecture now provide innovative technology for the recasting of traditional sequential solutions into high-performance, low-cost, parallel system to increase system performance. Research conducted in development of specialized computer architecture for the algorithmic execution of an avionics system, guidance and control problem in real time is described. A comprehensive treatment of both the hardware and software structures of a customized computer which performs real-time computation of guidance commands with updated estimates of target motion and time-to-go is presented. An optimal, real-time allocation algorithm was developed which maps the algorithmic tasks onto the processing elements. This allocation is based on the critical path analysis. The final stage is the design and development of the hardware structures suitable for the efficient execution of the allocated task graph. The processing element is designed for rapid execution of the allocated tasks. Fault tolerance is a key feature of the overall architecture. Parallel numerical integration techniques, tasks definitions, and allocation algorithms are discussed. The parallel implementation is analytically verified and the experimental results are presented. The design of the data-driven computer architecture, customized for the execution of the particular algorithm, is discussed

    Optical Interconnection Networks Based on Microring Resonators

    Get PDF
    Abstract — Interconnection networks must transport an always increasing information density and connect a rising number of processing units. Electronic technologies have been able to sustain the traffic growth rate, but are getting close to their physical limits. In this context, optical interconnection networks are becoming progressively more attractive, especially because new photonic devices can be directly integrated in CMOS technology. Indeed, interest in microring resonators as switching components is rising, but their usability in full optical interconnection architectures is still limited by their physical characteristics. Indeed, differently from classical devices used for switching, switching elements based on microring resonators exhibit asymmetric power losses depending on the output ports input signals are directed to. In this paper, we study classical interconnection architectures such as crossbar, Benes and Clos networks exploiting microring resonators as building blocks. Since classical interconnection networks lack either scalability or complexity, we propose two new architectures to improve performance of microring based interconnection networks while keeping a reasonable complexity. I
    corecore