25,188 research outputs found

    An architecture for multi-layer feed-forward neural networks.

    Get PDF
    Feed-forward neural networks can perform classifications and generalizations that are difficult to achieve with any other known method, and their performance matches or surpasses that of the conventional methods. To utilize the potential of these networks to the fullest, however, an efficient hardware implementation is needed. In this thesis, an architecture for efficient implementation of food-forward multi-layer neural networks is introduced. The interconnection congestion problem is addressed by a multiplexing scheme, which reduces the number of physical interconnections without any loss of generality. The building blocks are mostly in current mode analog CMOS, and the connection strengths of the network are stored in a digital memory. Also included in this thesis is a performance analysis of the architecture and a study of the effects of quantization and truncation of connection strengths on network performance.Dept. of Electrical and Computer Engineering. Paper copy at Leddy Library: Theses & Major Papers - Basement, West Bldg. / Call Number: Thesis1991 .N687. Source: Masters Abstracts International, Volume: 31-01, page: 0398. Co-Supervisors: M. Ahmadi; M. Shridhar. Thesis (M.A.Sc.)--University of Windsor (Canada), 1991

    RNA: REUSABLE NEURON ARCHITECTURE FOR ON-CHIP ELECTROCARDIOGRAM CLASSIFICATION AND MACHINE LEARNING

    Get PDF
    Artificial neural networks (ANN) offer tremendous promise in classifying electrocardiogram (ECG) for detection and diagnosis of cardiovascular diseases. In this thesis, we propose a reusable neuron architecture (RNA) to enable an efficient and cost-effective ANN-based ECG processing by multiplexing the same physical neurons for both feed-forward and back-propagation stages. RNA further conserves the area and resources of the chip and reduces power dissipation by coalescing different layers of the neural network into a single layer. Moreover, the microarchitecture of each RNA neuron has been optimized to maximize the degree of hardware reusability by fusing multiple two-input multipliers and a multi-input adder into one two-input multiplier and one two-input adder. With RNA, we demonstrated a hardware implementation of a three-layer 51-30-12 artificial neural network using only thirty physical RNA neurons.A quantitative design space exploration in area, power dissipation, and speed between the proposed RNA and three other implementations representative of different reusable hardware strategies is presented and discussed. An RNA ASIC was implemented using 45nm CMOS technology and verified on a Xilinx Virtex-5 FPGA board. Compared with an equivalent software implementation in C executed on a mainstream embedded microprocessor, the RNA ASIC improves both the training speed and the energy efficiency by three orders of magnitude, respectively. The real-time and functional correctness of RNA was verified using real ECG signals from the MIT-BIH arrhythmia database

    E-PUR: An Energy-Efficient Processing Unit for Recurrent Neural Networks

    Full text link
    Recurrent Neural Networks (RNNs) are a key technology for emerging applications such as automatic speech recognition, machine translation or image description. Long Short Term Memory (LSTM) networks are the most successful RNN implementation, as they can learn long term dependencies to achieve high accuracy. Unfortunately, the recurrent nature of LSTM networks significantly constrains the amount of parallelism and, hence, multicore CPUs and many-core GPUs exhibit poor efficiency for RNN inference. In this paper, we present E-PUR, an energy-efficient processing unit tailored to the requirements of LSTM computation. The main goal of E-PUR is to support large recurrent neural networks for low-power mobile devices. E-PUR provides an efficient hardware implementation of LSTM networks that is flexible to support diverse applications. One of its main novelties is a technique that we call Maximizing Weight Locality (MWL), which improves the temporal locality of the memory accesses for fetching the synaptic weights, reducing the memory requirements by a large extent. Our experimental results show that E-PUR achieves real-time performance for different LSTM networks, while reducing energy consumption by orders of magnitude with respect to general-purpose processors and GPUs, and it requires a very small chip area. Compared to a modern mobile SoC, an NVIDIA Tegra X1, E-PUR provides an average energy reduction of 92x
    • …
    corecore