Hardware-efficient on-line learning through pipelined truncated-error backpropagation in binary-state networks
Artificial neural networks (ANNs) trained using backpropagation are powerful
learning architectures that have achieved state-of-the-art performance in
various benchmarks. Significant effort has been devoted to developing custom
silicon devices to accelerate inference in ANNs. Accelerating the training
phase, however, has attracted relatively little attention. In this paper, we
describe a hardware-efficient on-line learning technique for feedforward
multi-layer ANNs that is based on pipelined backpropagation. Learning is
performed in parallel with inference in the forward pass, removing the need for
an explicit backward pass and requiring no extra weight lookup. By using binary
state variables in the feedforward network and ternary errors in
truncated-error backpropagation, the need for any multiplications in the
forward and backward passes is removed, and memory requirements for the
pipelining are drastically reduced. Further reduction in addition operations
owing to the sparsity in the forward neural and backpropagating error signal
paths contributes to highly efficient hardware implementation. For
proof-of-concept validation, we demonstrate on-line learning of MNIST
handwritten digit classification on a Spartan 6 FPGA interfacing with an
external 1Gb DDR2 DRAM, which shows a small degradation in test error
compared to an equivalently sized binary ANN trained off-line using standard
backpropagation and exact errors. Our results highlight an attractive synergy
between pipelined backpropagation and binary-state networks in substantially
reducing computation and memory requirements, making pipelined on-line learning
practical in deep networks.
Comment: Now also considers 0/1 binary activations. Memory access statistics
reported
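A minimal sketch of why binary states and ternary errors remove multiplications from one layer. It assumes 0/1 activations (the variant mentioned in the comment), a hypothetical truncation threshold, and an error signal defined as target minus output so that the update is W += lr · outer(e, x). The function names are illustrative; the sketch covers only the per-layer arithmetic, not the pipelining or the paper's hardware datapath.

```python
import numpy as np

def forward_binary(W, x_bits):
    """Pre-activation of one layer for a 0/1 input vector.

    Since every input is 0 or 1, W @ x reduces to summing the weight columns
    whose input bit is 1: additions only, and zero inputs are skipped entirely.
    """
    active = np.flatnonzero(x_bits)  # indices of inputs equal to 1
    return W[:, active].sum(axis=1)

def binarize(a):
    """Step activation to 0/1 (threshold at 0 is an assumption)."""
    return (a > 0).astype(np.int8)

def ternarize(err, threshold):
    """Truncate a real-valued error to {-1, 0, +1}; small errors become 0."""
    return np.sign(err) * (np.abs(err) > threshold)

def update(W, x_bits, err_t, lr):
    """Apply W += lr * outer(err_t, x_bits) with additions/subtractions only.

    err_t is assumed to be a ternary (target - output) error. Only rows with a
    nonzero error and columns with an active input are touched, so sparsity in
    either signal skips work.
    """
    rows_pos = np.flatnonzero(err_t > 0)
    rows_neg = np.flatnonzero(err_t < 0)
    cols = np.flatnonzero(x_bits)
    W[np.ix_(rows_pos, cols)] += lr
    W[np.ix_(rows_neg, cols)] -= lr
    return W
```

Because the update term is ±lr or 0 and the input is 0 or 1, every weight change is a single addition or subtraction of a constant, which is what makes the scheme cheap in hardware.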
Beating the Perils of Non-Convexity: Guaranteed Training of Neural Networks using Tensor Methods
Training neural networks is a challenging non-convex optimization problem,
and backpropagation or gradient descent can get stuck in spurious local optima.
We propose a novel algorithm based on tensor decomposition for guaranteed
training of two-layer neural networks. We provide risk bounds for our proposed
method, with a polynomial sample complexity in the relevant parameters, such as
input dimension and number of neurons. While learning arbitrary target
functions is NP-hard, we provide transparent conditions on the function and the
input for learnability. Our training method is based on tensor decomposition,
which provably converges to the global optimum under a set of mild
non-degeneracy conditions. It consists of simple embarrassingly parallel linear
and multi-linear operations, and is competitive with standard stochastic
gradient descent (SGD), in terms of computational complexity. Thus, we propose
a computationally efficient method with guaranteed risk bounds for training
neural networks with one hidden layer.
Comment: The tensor decomposition analysis is expanded, and the analysis of
ridge regression is added for recovering the parameters of the last layer of
the neural network
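The abstract does not spell out the decomposition step. As a hedged illustration of the general idea, the sketch below applies the tensor power method with deflation to a synthetic, orthogonally decomposable third-order tensor T = Σ_r w_r a_r ⊗ a_r ⊗ a_r. In the paper the tensor would presumably be a moment tensor estimated from training data; here T, the component count k, and the restart and iteration counts are assumptions chosen only to keep the example self-contained.

```python
import numpy as np

def tensor_apply(T, u):
    """Contract a symmetric 3rd-order tensor on two modes: T(I, u, u)."""
    return np.einsum("ijk,j,k->i", T, u, u)

def tensor_power_method(T, k, n_iter=100, n_restarts=10, seed=0):
    """Recover (weight, component) pairs of an orthogonally decomposable
    tensor by power iteration with random restarts, followed by deflation."""
    rng = np.random.default_rng(seed)
    T = T.copy()  # deflation modifies the tensor; keep the caller's copy intact
    weights, comps = [], []
    for _ in range(k):
        best_u, best_val = None, -np.inf
        for _ in range(n_restarts):
            u = rng.normal(size=T.shape[0])
            u /= np.linalg.norm(u)
            for _ in range(n_iter):
                v = tensor_apply(T, u)
                u = v / np.linalg.norm(v)
            val = np.einsum("ijk,i,j,k->", T, u, u, u)  # T(u, u, u)
            if val > best_val:
                best_u, best_val = u, val
        weights.append(best_val)
        comps.append(best_u)
        # Deflate: subtract the recovered rank-one term before the next round.
        T -= best_val * np.einsum("i,j,k->ijk", best_u, best_u, best_u)
    return np.array(weights), np.array(comps)
```

Each restart converges to one component; keeping the restart with the largest T(u, u, u) and then subtracting that rank-one term lets the next round find a different component. The operations are all (multi-)linear contractions, and restarts are independent, which matches the "embarrassingly parallel" description.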
Improving neural networks by preventing co-adaptation of feature detectors
When a large feedforward neural network is trained on a small training set,
it typically performs poorly on held-out test data. This "overfitting" is
greatly reduced by randomly omitting half of the feature detectors on each
training case. This prevents complex co-adaptations in which a feature detector
is only helpful in the context of several other specific feature detectors.
Instead, each neuron learns to detect a feature that is generally helpful for
producing the correct answer given the combinatorially large variety of
internal contexts in which it must operate. Random "dropout" gives big
improvements on many benchmark tasks and sets new records for speech and object
recognition.
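A minimal sketch of the omission scheme described above, assuming NumPy arrays of hidden activations and a drop probability of 0.5 ("half of the feature detectors"). The abstract does not say how the network is used at test time; the test-time scaling below follows the common convention of multiplying by the keep probability so that expected activations match training, which for a drop rate of 0.5 is equivalent to halving the outgoing weights. Function names are illustrative.

```python
import numpy as np

def dropout_train(h, rng, p_drop=0.5):
    """Training pass: omit each hidden unit independently with prob p_drop.

    Returns the masked activations and the boolean keep-mask (True = kept),
    which backpropagation would reuse to zero the same units' gradients.
    """
    mask = rng.random(h.shape) >= p_drop
    return h * mask, mask

def dropout_test(h, p_drop=0.5):
    """Test pass: keep every unit but scale by the keep probability, so each
    unit's expected contribution matches what it was during training."""
    return h * (1.0 - p_drop)
```

Many modern frameworks use the equivalent "inverted dropout" variant, which divides by the keep probability during training and leaves the test-time pass unchanged.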
Robust training of recurrent neural networks to handle missing data for disease progression modeling
Disease progression modeling (DPM) using longitudinal data is a challenging
task in machine learning for healthcare that can provide clinicians with better
tools for diagnosis and monitoring of disease. Existing DPM algorithms neglect
temporal dependencies among measurements and make parametric assumptions about
biomarker trajectories. In addition, they do not model multiple biomarkers
jointly and need to align subjects' trajectories. In this paper, recurrent
neural networks (RNNs) are utilized to address these issues. However, in many
cases, longitudinal cohorts contain incomplete data, which hinders the
application of standard RNNs and requires a pre-processing step such as
imputation of the missing values. We, therefore, propose a generalized training
rule for the most widely used RNN architecture, long short-term memory (LSTM)
networks, that can handle missing values in both target and predictor
variables. This algorithm is applied to modeling the progression of
Alzheimer's disease (AD) using magnetic resonance imaging (MRI) biomarkers. The
results show that the proposed LSTM algorithm achieves a lower mean absolute
error for prediction of measurements across all considered MRI biomarkers
compared to using standard LSTM networks with data imputation or using a
regression-based DPM method. Moreover, applying linear discriminant analysis to
the biomarkers' values predicted by the proposed algorithm results in a larger
area under the receiver operating characteristic curve (AUC) for clinical
diagnosis of AD compared to the same alternatives, and the AUC is comparable to
state-of-the-art AUCs from a recent cross-sectional medical image
classification challenge. This paper shows that built-in handling of missing
values in LSTM network training paves the way for application of RNNs in
disease progression modeling.
Comment: 9 pages, 1 figure, MIDL conference
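The abstract does not give the generalized training rule itself. As a hedged illustration of one ingredient, handling missing targets, the sketch below marks missing measurements as NaN and excludes them from both the evaluation metric (MAE) and the training gradient, so only observed values drive backpropagation through time. The squared-error training loss, the NaN convention, the time-steps × biomarkers layout, and the function names are assumptions; missing predictor values, which the paper also handles, are not covered.

```python
import numpy as np

def masked_mae(pred, target):
    """Mean absolute error over observed entries only; NaN marks missing."""
    observed = ~np.isnan(target)
    if not observed.any():
        return 0.0  # assumed convention: no observed targets, no error
    return float(np.abs(pred[observed] - target[observed]).mean())

def masked_mse_grad(pred, target):
    """Gradient of 0.5 * mean squared error over observed entries w.r.t. pred.

    Missing targets contribute exactly zero gradient, so imputation is not
    needed on the target side: the network is simply not penalized there.
    """
    observed = ~np.isnan(target)
    n = max(int(observed.sum()), 1)
    grad = np.zeros_like(pred)
    grad[observed] = (pred[observed] - target[observed]) / n
    return grad
```

In a full LSTM, `masked_mse_grad` would be the error injected at each output time step before backpropagating through time; the rest of the training loop is unchanged.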
Training recurrent neural networks robust to incomplete data: application to Alzheimer's disease progression modeling
Disease progression modeling (DPM) using longitudinal data is a challenging
machine learning task. Existing DPM algorithms neglect temporal dependencies
among measurements, make parametric assumptions about biomarker trajectories,
do not model multiple biomarkers jointly, and need an alignment of subjects'
trajectories. In this paper, recurrent neural networks (RNNs) are utilized to
address these issues. However, in many cases, longitudinal cohorts contain
incomplete data, which hinders the application of standard RNNs and requires a
pre-processing step such as imputation of the missing values. Instead, we
propose a generalized training rule for the most widely used RNN architecture,
long short-term memory (LSTM) networks, that can handle both missing predictor
and target values. The proposed LSTM algorithm is applied to model the
progression of Alzheimer's disease (AD) using six volumetric magnetic resonance
imaging (MRI) biomarkers, i.e., volumes of ventricles, hippocampus, whole
brain, fusiform, middle temporal gyrus, and entorhinal cortex, and it is
compared to standard LSTM networks with data imputation and a parametric,
regression-based DPM method. The results show that the proposed algorithm
achieves a significantly lower mean absolute error (MAE) than the alternatives
(p < 0.05, Wilcoxon signed-rank test) in predicting almost all of the MRI
biomarkers. Moreover, a linear discriminant analysis (LDA) classifier applied
to the predicted biomarker values produces a significantly larger AUC for
clinical diagnosis of AD: 0.90 vs. at most 0.84 (p < 0.001, McNemar's test).
Inspection of MAE curves as a function of the amount of missing data reveals
that the proposed LSTM algorithm achieves the best performance until more than
74% of the values are missing. Finally, it is illustrated how the method can
successfully be applied to data with varying time intervals.
Comment: arXiv admin note: substantial text overlap with arXiv:1808.0550
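The abstract reports AUC for an LDA classifier applied to the predicted biomarkers. For reference, AUC equals the probability that a randomly chosen positive case scores higher than a randomly chosen negative one (ties counted as half), which is the normalized Mann-Whitney U statistic. The sketch below computes it directly from pairwise comparisons; the LDA step and McNemar's test are not reproduced, and the function name is illustrative.

```python
import numpy as np

def auc_rank(scores, labels):
    """AUC as P(score of a random positive > score of a random negative),
    counting ties as half: the normalized Mann-Whitney U statistic."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=bool)
    pos, neg = scores[labels], scores[~labels]
    if pos.size == 0 or neg.size == 0:
        raise ValueError("AUC needs at least one positive and one negative")
    diff = pos[:, None] - neg[None, :]  # all positive/negative pairs
    wins = (diff > 0).sum() + 0.5 * (diff == 0).sum()
    return float(wins / diff.size)
```

The pairwise form uses O(n_pos × n_neg) memory, which is fine for cohort-sized data; a rank-based formula scales better for large samples.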