175 research outputs found
A study on hardware design for high performance artificial neural network by using FPGA and NoC
制度:新 ; 報告番号:甲3421号 ; 学位の種類:博士(工学) ; 授与年月日:2011/9/15 ; 早大学位記番号:新574
Hardware-efficient on-line learning through pipelined truncated-error backpropagation in binary-state networks
Artificial neural networks (ANNs) trained using backpropagation are powerful
learning architectures that have achieved state-of-the-art performance in
various benchmarks. Significant effort has been devoted to developing custom
silicon devices to accelerate inference in ANNs. Accelerating the training
phase, however, has attracted relatively little attention. In this paper, we
describe a hardware-efficient on-line learning technique for feedforward
multi-layer ANNs that is based on pipelined backpropagation. Learning is
performed in parallel with inference in the forward pass, removing the need for
an explicit backward pass and requiring no extra weight lookup. By using binary
state variables in the feedforward network and ternary errors in
truncated-error backpropagation, the need for any multiplications in the
forward and backward passes is removed, and memory requirements for the
pipelining are drastically reduced. Further reduction in addition operations
owing to the sparsity in the forward neural and backpropagating error signal
paths contributes to highly efficient hardware implementation. For
proof-of-concept validation, we demonstrate on-line learning of MNIST
handwritten digit classification on a Spartan 6 FPGA interfacing with an
external 1Gb DDR2 DRAM, that shows small degradation in test error performance
compared to an equivalently sized binary ANN trained off-line using standard
back-propagation and exact errors. Our results highlight an attractive synergy
between pipelined backpropagation and binary-state networks in substantially
reducing computation and memory requirements, making pipelined on-line learning
practical in deep networks.Comment: Now also consider 0/1 binary activations. Memory access statistics
reporte
Accelerating Training of Deep Neural Networks via Sparse Edge Processing
We propose a reconfigurable hardware architecture for deep neural networks
(DNNs) capable of online training and inference, which uses algorithmically
pre-determined, structured sparsity to significantly lower memory and
computational requirements. This novel architecture introduces the notion of
edge-processing to provide flexibility and combines junction pipelining and
operational parallelization to speed up training. The overall effect is to
reduce network complexity by factors up to 30x and training time by up to 35x
relative to GPUs, while maintaining high fidelity of inference results. This
has the potential to enable extensive parameter searches and development of the
largely unexplored theoretical foundation of DNNs. The architecture
automatically adapts itself to different network sizes given available hardware
resources. As proof of concept, we show results obtained for different bit
widths.Comment: Presented at the 26th International Conference on Artificial Neural
Networks (ICANN) 2017 in Alghero, Ital
Implementation Costs of Spiking versus Rate-Based ANNs
Artificial neural networks are an effective machine learning technique for a variety of data sets and domains, but exploiting the inherent parallelism in neural networks requires specialized hardware. Typically, computing the output of each neuron requires many multiplications, evaluation of a transcendental activation function, and transfer of its output to a large number of other neurons. These restrictions become more expensive when internal values are represented with increasingly higher data precision. A spiking neural network eliminates the limitations of typical rate-based neural networks by reducing neuron output and synapse weights to one-bit values, eliminating hardware multipliers, and simplifying the activation function. However, a spiking neural network requires a larger number of neurons than what is needed in a comparable rate-based network. In order to determine if the benefits of spiking neural networks outweigh the costs, we designed the smallest spiking neural network and rate-based artificial neural network that achieved 90% or comparable testing accuracy on the MNIST data set. After estimating the FPGA storage requirements for synapse values of each network, we concluded rate-based neural networks need significantly fewer bits than spiking neural networks
ETLP: Event-based Three-factor Local Plasticity for online learning with neuromorphic hardware
Neuromorphic perception with event-based sensors, asynchronous hardware and
spiking neurons is showing promising results for real-time and energy-efficient
inference in embedded systems. The next promise of brain-inspired computing is
to enable adaptation to changes at the edge with online learning. However, the
parallel and distributed architectures of neuromorphic hardware based on
co-localized compute and memory imposes locality constraints to the on-chip
learning rules. We propose in this work the Event-based Three-factor Local
Plasticity (ETLP) rule that uses (1) the pre-synaptic spike trace, (2) the
post-synaptic membrane voltage and (3) a third factor in the form of projected
labels with no error calculation, that also serve as update triggers. We apply
ETLP with feedforward and recurrent spiking neural networks on visual and
auditory event-based pattern recognition, and compare it to Back-Propagation
Through Time (BPTT) and eProp. We show a competitive performance in accuracy
with a clear advantage in the computational complexity for ETLP. We also show
that when using local plasticity, threshold adaptation in spiking neurons and a
recurrent topology are necessary to learn spatio-temporal patterns with a rich
temporal structure. Finally, we provide a proof of concept hardware
implementation of ETLP on FPGA to highlight the simplicity of its computational
primitives and how they can be mapped into neuromorphic hardware for online
learning with low-energy consumption and real-time interaction
Hardware-Accelerated neural network
This work presents a parametrizable design of a neural network on an FPGA, being trained previously in the CPU using backpropagation. We test different configurations and report delay, power and area
Deep supervised learning using local errors
Error backpropagation is a highly effective mechanism for learning
high-quality hierarchical features in deep networks. Updating the features or
weights in one layer, however, requires waiting for the propagation of error
signals from higher layers. Learning using delayed and non-local errors makes
it hard to reconcile backpropagation with the learning mechanisms observed in
biological neural networks as it requires the neurons to maintain a memory of
the input long enough until the higher-layer errors arrive. In this paper, we
propose an alternative learning mechanism where errors are generated locally in
each layer using fixed, random auxiliary classifiers. Lower layers could thus
be trained independently of higher layers and training could either proceed
layer by layer, or simultaneously in all layers using local error information.
We address biological plausibility concerns such as weight symmetry
requirements and show that the proposed learning mechanism based on fixed,
broad, and random tuning of each neuron to the classification categories
outperforms the biologically-motivated feedback alignment learning technique on
the MNIST, CIFAR10, and SVHN datasets, approaching the performance of standard
backpropagation. Our approach highlights a potential biological mechanism for
the supervised, or task-dependent, learning of feature hierarchies. In
addition, we show that it is well suited for learning deep networks in custom
hardware where it can drastically reduce memory traffic and data communication
overheads
- …