234 research outputs found

    Neuro-memristive Circuits for Edge Computing: A review

    Full text link
    The volume, veracity, variability, and velocity of data produced from the ever-increasing network of sensors connected to Internet pose challenges for power management, scalability, and sustainability of cloud computing infrastructure. Increasing the data processing capability of edge computing devices at lower power requirements can reduce several overheads for cloud computing solutions. This paper provides the review of neuromorphic CMOS-memristive architectures that can be integrated into edge computing devices. We discuss why the neuromorphic architectures are useful for edge devices and show the advantages, drawbacks and open problems in the field of neuro-memristive circuits for edge computing

    Towards Accurate and High-Speed Spiking Neuromorphic Systems with Data Quantization-Aware Deep Networks

    Full text link
    Deep Neural Networks (DNNs) have gained immense success in cognitive applications and greatly pushed today's artificial intelligence forward. The biggest challenge in executing DNNs is their extremely data-extensive computations. The computing efficiency in speed and energy is constrained when traditional computing platforms are employed in such computational hungry executions. Spiking neuromorphic computing (SNC) has been widely investigated in deep networks implementation own to their high efficiency in computation and communication. However, weights and signals of DNNs are required to be quantized when deploying the DNNs on the SNC, which results in unacceptable accuracy loss. %However, the system accuracy is limited by quantizing data directly in deep networks deployment. Previous works mainly focus on weights discretize while inter-layer signals are mainly neglected. In this work, we propose to represent DNNs with fixed integer inter-layer signals and fixed-point weights while holding good accuracy. We implement the proposed DNNs on the memristor-based SNC system as a deployment example. With 4-bit data representation, our results show that the accuracy loss can be controlled within 0.02% (2.3%) on MNIST (CIFAR-10). Compared with the 8-bit dynamic fixed-point DNNs, our system can achieve more than 9.8x speedup, 89.1% energy saving, and 30% area saving.Comment: 6 pages, 4 figure

    FPSA: A Full System Stack Solution for Reconfigurable ReRAM-based NN Accelerator Architecture

    Full text link
    Neural Network (NN) accelerators with emerging ReRAM (resistive random access memory) technologies have been investigated as one of the promising solutions to address the \textit{memory wall} challenge, due to the unique capability of \textit{processing-in-memory} within ReRAM-crossbar-based processing elements (PEs). However, the high efficiency and high density advantages of ReRAM have not been fully utilized due to the huge communication demands among PEs and the overhead of peripheral circuits. In this paper, we propose a full system stack solution, composed of a reconfigurable architecture design, Field Programmable Synapse Array (FPSA) and its software system including neural synthesizer, temporal-to-spatial mapper, and placement & routing. We highly leverage the software system to make the hardware design compact and efficient. To satisfy the high-performance communication demand, we optimize it with a reconfigurable routing architecture and the placement & routing tool. To improve the computational density, we greatly simplify the PE circuit with the spiking schema and then adopt neural synthesizer to enable the high density computation-resources to support different kinds of NN operations. In addition, we provide spiking memory blocks (SMBs) and configurable logic blocks (CLBs) in hardware and leverage the temporal-to-spatial mapper to utilize them to balance the storage and computation requirements of NN. Owing to the end-to-end software system, we can efficiently deploy existing deep neural networks to FPSA. Evaluations show that, compared to one of state-of-the-art ReRAM-based NN accelerators, PRIME, the computational density of FPSA improves by 31x; for representative NNs, its inference performance can achieve up to 1000x speedup.Comment: Accepted by ASPLOS 201

    Simulation and implementation of novel deep learning hardware architectures for resource constrained devices

    Get PDF
    Corey Lammie designed mixed signal memristive-complementary metal–oxide–semiconductor (CMOS) and field programmable gate arrays (FPGA) hardware architectures, which were used to reduce the power and resource requirements of Deep Learning (DL) systems; both during inference and training. Disruptive design methodologies, such as those explored in this thesis, can be used to facilitate the design of next-generation DL systems

    Efficient and Robust Neuromorphic Computing Design

    Get PDF
    In recent years, brain inspired neuromorphic computing system (NCS) has been intensively studied in both circuit level and architecture level. NCS has demonstrated remarkable advantages for its high-energy efficiency, extremely compact space occupation and parallel data processing. However, due to the limited hardware resources, severe IR-Drop and process variation problems for synapse crossbar, and limited synapse device resolution, it’s still a great challenge for hardware NCS design to catch up with the fast development of software deep neural networks (DNNs). This dissertation explores model compression and acceleration methods for deep neural networks to save both memory and computation resources for the hardware implementation of DNNs. Firstly, DNNs’ weights quantization work is presented to use three orthogonal methods to learn synapses with one-level precision, namely, distribution-aware quantization, quantization regularization and bias tuning, to make image classification accuracy comparable to the state-ofthe-art. And then a two-step framework named group scissor, including rank clipping and group connection deletion methods, is presented to address the problems on large synapse crossbar consuming and high routing congestion between crossbars. Results show that after applying weights quantization methods, accuracy drop can be well controlled within negligible level for MNIST and CIFAR-10 dataset, compared to an ideal system without quantization. And for the group scissor framework method, crossbar area and routing area could be reduced to 8% (at most) of original size, indicating that the hardware implementation area has been saved a lot. Furthermore, the system scalability has been improved significantly
    corecore