Beating the Perils of Non-Convexity: Guaranteed Training of Neural Networks using Tensor Methods
Training neural networks is a challenging non-convex optimization problem,
and backpropagation or gradient descent can get stuck in spurious local optima.
We propose a novel algorithm based on tensor decomposition for guaranteed
training of two-layer neural networks. We provide risk bounds for our proposed
method, with a polynomial sample complexity in the relevant parameters, such as
input dimension and number of neurons. While learning arbitrary target
functions is NP-hard, we provide transparent conditions on the function and the
input for learnability. Our training method is based on tensor decomposition,
which provably converges to the global optimum, under a set of mild
non-degeneracy conditions. It consists of simple, embarrassingly parallel linear
and multi-linear operations, and is competitive with standard stochastic
gradient descent (SGD), in terms of computational complexity. Thus, we propose
a computationally efficient method with guaranteed risk bounds for training
neural networks with one hidden layer.
Comment: The tensor decomposition analysis is expanded, and the analysis of ridge regression is added for recovering the parameters of the last layer of the neural network.
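The core primitive behind the guarantees above is CP (sum-of-rank-1) tensor decomposition. As a hedged illustration of that primitive only (not the paper's full moment-estimation pipeline), the sketch below recovers the factors of a synthetic rank-2 third-order tensor via alternating least squares; all names and sizes are illustrative.

```python
import numpy as np

# Illustrative sketch: CP decomposition of an exact rank-2 symmetric
# third-order tensor via alternating least squares (ALS).
rng = np.random.default_rng(0)
d, rank = 8, 2
A_true = rng.standard_normal((d, rank))

# Build T = sum_r a_r (outer) a_r (outer) a_r.
T = np.einsum('ir,jr,kr->ijk', A_true, A_true, A_true)

def khatri_rao(B, C):
    # Column-wise Kronecker product, shape (d*d, rank).
    return np.einsum('ir,jr->ijr', B, C).reshape(-1, B.shape[1])

A = rng.standard_normal((d, rank))
B = A.copy(); C = A.copy()
for _ in range(200):
    # Solve each factor in turn against the matching mode unfolding of T.
    A = np.reshape(T, (d, -1)) @ np.linalg.pinv(khatri_rao(B, C).T)
    B = np.reshape(np.transpose(T, (1, 0, 2)), (d, -1)) @ np.linalg.pinv(khatri_rao(A, C).T)
    C = np.reshape(np.transpose(T, (2, 0, 1)), (d, -1)) @ np.linalg.pinv(khatri_rao(A, B).T)

T_hat = np.einsum('ir,jr,kr->ijk', A, B, C)
err = np.linalg.norm(T - T_hat) / np.linalg.norm(T)
print(f"relative reconstruction error: {err:.2e}")
```

Each ALS step is an ordinary linear least-squares solve, which is why this style of method parallelizes so readily.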
Learning of Dynamic Multilayer Neural networks using Parameter Free PSO
Recently, learning of continuous trajectories for the identification and/or control of dynamic systems has been attracting attention in applications of neural networks. In this paper, we focus on a parameter-free Particle Swarm Optimization called TRIBES as the training algorithm for dynamic multilayer neural networks (DMNNs). A DMNN can be approximated by a class of recurrent neural networks with one hidden layer to an arbitrary degree of accuracy. The neural network training ability of TRIBES is demonstrated through computer simulations, in which TRIBES is compared with other PSO algorithms. As a result, it is confirmed that TRIBES is efficient and practical for learning continuous trajectories using DMNNs.
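For orientation, the sketch below shows standard global-best PSO minimizing a test function. This is only the underlying primitive: TRIBES, the parameter-free variant used in the paper, additionally adapts swarm size and topology on its own, which this minimal sketch does not attempt to reproduce; all coefficients here are illustrative.

```python
import numpy as np

# Minimal global-best PSO on the sphere function (illustrative, not TRIBES).
rng = np.random.default_rng(1)

def sphere(x):
    return float(np.sum(x * x))

dim, n_particles, iters = 5, 30, 300
w, c1, c2 = 0.7, 1.5, 1.5          # inertia and acceleration coefficients

pos = rng.uniform(-5, 5, (n_particles, dim))
vel = np.zeros_like(pos)
pbest = pos.copy()
pbest_val = np.array([sphere(p) for p in pos])
gbest = pbest[np.argmin(pbest_val)].copy()

for _ in range(iters):
    r1, r2 = rng.random(pos.shape), rng.random(pos.shape)
    # Pull each particle toward its personal best and the swarm best.
    vel = w * vel + c1 * r1 * (pbest - pos) + c2 * r2 * (gbest - pos)
    pos = pos + vel
    vals = np.array([sphere(p) for p in pos])
    improved = vals < pbest_val
    pbest[improved] = pos[improved]
    pbest_val[improved] = vals[improved]
    gbest = pbest[np.argmin(pbest_val)].copy()

print(f"best value found: {sphere(gbest):.3e}")
```

In a training application, `sphere` would be replaced by the trajectory error of a DMNN whose weights are encoded in the particle position.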
Autonomous Construction of Multi Layer Perceptron Neural Networks
The construction of Multi Layer Perceptron (MLP) neural networks for classification is explored. A novel algorithm is developed, the MLP Iterative Construction Algorithm (MICA), that designs the network architecture as it trains the weights of the hidden layer nodes. The architecture can be optimized for training set classification accuracy, in which case it always achieves 100% classification accuracy, or it can be optimized for generalization. The test results for MICA compare favorably with those of backpropagation on some data sets and far surpass it on others, while requiring fewer FLOPS to train. Feature selection is enhanced by MICA because it affords the opportunity to select a different set of features to separate each pair of classes. The particular saliency metric explored is based on effective decision boundary analysis, but it is implemented without having to search for the decision boundaries, making it efficient to implement. The same saliency metric is adapted for pruning hidden layer nodes to optimize performance. The feature selection and hidden node pruning techniques are shown to decrease the number of weights in the network architecture by one half to two thirds while maintaining classification accuracy.
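The growth loop can be sketched as follows. This is a hedged stand-in for MICA, not the paper's algorithm: here each new hidden node gets random weights (where MICA trains them), and the linear output layer is re-fit by least squares after each addition until training accuracy reaches 100%.

```python
import numpy as np

# Illustrative iterative construction: add hidden nodes until the training
# set is classified perfectly. Dataset and node rule are placeholders.
rng = np.random.default_rng(2)
X = rng.standard_normal((80, 2))
y = (X[:, 0] * X[:, 1] > 0).astype(float)   # XOR-like labels, not linearly separable

W = np.empty((0, 2))
b = np.empty(0)
for n_hidden in range(1, 81):
    # Add one hidden node (random weights stand in for MICA's trained node).
    W = np.vstack([W, rng.standard_normal((1, 2))])
    b = np.append(b, rng.standard_normal())
    H = np.tanh(X @ W.T + b)                       # hidden activations
    beta, *_ = np.linalg.lstsq(H, y, rcond=None)   # re-fit output weights
    acc = np.mean((H @ beta > 0.5) == y)
    if acc == 1.0:
        break

print(f"hidden nodes used: {n_hidden}, training accuracy: {acc:.2f}")
```

Stopping instead on a validation-set criterion corresponds to the abstract's "optimized for generalization" mode.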
Scalable and Sustainable Deep Learning via Randomized Hashing
Current deep learning architectures are growing larger in order to learn from
complex datasets. These architectures require giant matrix multiplication
operations to train millions of parameters. Conversely, there is another
growing trend to bring deep learning to low-power, embedded devices. The matrix
operations, associated with both training and testing of deep networks, are
very expensive from a computational and energy standpoint. We present a novel
hashing based technique to drastically reduce the amount of computation needed
to train and test deep networks. Our approach combines recent ideas from
adaptive dropouts and randomized hashing for maximum inner product search to
select the nodes with the highest activation efficiently. Our new algorithm for
deep learning reduces the overall computational cost of forward and
back-propagation by operating on significantly fewer (sparse) nodes. As a
consequence, our algorithm uses only 5% of the total multiplications, while
keeping on average within 1% of the accuracy of the original model. A unique
property of the proposed hashing based back-propagation is that the updates are
always sparse. Due to the sparse gradient updates, our algorithm is ideally
suited for asynchronous and parallel training leading to near linear speedup
with increasing number of cores. We demonstrate the scalability and
sustainability (energy efficiency) of our proposed algorithm via rigorous
experimental evaluations on several real datasets.
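The selection mechanism can be illustrated with signed random projections (SimHash): hidden-unit weight vectors are bucketed by hash signature, and each input activates only the units whose signature matches its own. This is a hedged simplification; the paper's approach uses asymmetric LSH for maximum inner product search, whereas this cosine-similarity sketch only shows the mechanism, and all sizes are illustrative.

```python
import numpy as np

# SimHash-style sketch of hash-based sparse node selection.
rng = np.random.default_rng(3)
d, n_hidden, n_bits = 64, 512, 8
W = rng.standard_normal((n_hidden, d))        # hidden-layer weights
planes = rng.standard_normal((n_bits, d))     # shared random hyperplanes

def signature(v):
    # n_bits sign bits packed into an int: which side of each hyperplane.
    bits = (planes @ v > 0).astype(int)
    return int("".join(map(str, bits)), 2)

# Pre-bucket all hidden units by signature (redone periodically as W drifts).
buckets = {}
for i, w in enumerate(W):
    buckets.setdefault(signature(w), []).append(i)

x = rng.standard_normal(d)
active = buckets.get(signature(x), [])
# The forward pass touches only the selected (sparse) set of units.
sparse_out = np.zeros(n_hidden)
if active:
    sparse_out[active] = np.maximum(W[active] @ x, 0.0)

print(f"active units: {len(active)} / {n_hidden}")
```

Because gradients then flow only through the active units, the resulting updates are sparse, which is what makes the asynchronous parallel training described above attractive.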
Learning Graph Neural Networks with Approximate Gradient Descent
The first provably efficient algorithm for learning graph neural networks
(GNNs) with one hidden layer for node information convolution is provided in
this paper. Two types of GNNs are investigated, depending on whether labels are
attached to nodes or graphs. A comprehensive framework for designing and
analyzing convergence of GNN training algorithms is developed. The algorithm
proposed is applicable to a wide range of activation functions including ReLU,
Leaky ReLU, Sigmoid, Softplus and Swish. It is shown that the proposed algorithm
guarantees a linear convergence rate to the underlying true parameters of GNNs.
For both types of GNNs, sample complexity in terms of the number of nodes or
the number of graphs is characterized. The impact of feature dimension and GNN
structure on the convergence rate is also theoretically characterized.
Numerical experiments are further provided to validate our theoretical
analysis.
Comment: 23 pages, accepted at AAAI 202
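The model class studied (node-level labels) can be sketched as a single convolution step followed by a one-hidden-layer ReLU network: each node aggregates neighbor features, then the aggregated features pass through the hidden layer. This hedged sketch shows only the forward pass; the paper's approximate-gradient training rule is not reproduced, and all sizes are illustrative.

```python
import numpy as np

# One-hidden-layer GNN forward pass for node-level prediction (sketch).
rng = np.random.default_rng(4)
n, d, h = 6, 4, 8
A = (rng.random((n, n)) < 0.4).astype(float)
A = np.maximum(A, A.T)                    # undirected graph
np.fill_diagonal(A, 1.0)                  # self-loops keep each node's own features
D_inv = np.diag(1.0 / A.sum(1))
A_hat = D_inv @ A                         # row-normalized neighbor aggregation

X = rng.standard_normal((n, d))           # node features
W1 = rng.standard_normal((d, h)) / np.sqrt(d)
w2 = rng.standard_normal(h) / np.sqrt(h)

H = np.maximum(A_hat @ X @ W1, 0.0)       # aggregate, then ReLU hidden layer
scores = H @ w2                           # one score per node
print(scores.shape)
```

For graph-level labels (the second GNN type in the abstract), the node scores would additionally be pooled, e.g. averaged, into a single graph score.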
Hardware-efficient on-line learning through pipelined truncated-error backpropagation in binary-state networks
Artificial neural networks (ANNs) trained using backpropagation are powerful
learning architectures that have achieved state-of-the-art performance in
various benchmarks. Significant effort has been devoted to developing custom
silicon devices to accelerate inference in ANNs. Accelerating the training
phase, however, has attracted relatively little attention. In this paper, we
describe a hardware-efficient on-line learning technique for feedforward
multi-layer ANNs that is based on pipelined backpropagation. Learning is
performed in parallel with inference in the forward pass, removing the need for
an explicit backward pass and requiring no extra weight lookup. By using binary
state variables in the feedforward network and ternary errors in
truncated-error backpropagation, the need for any multiplications in the
forward and backward passes is removed, and memory requirements for the
pipelining are drastically reduced. Further reduction in addition operations
owing to the sparsity in the forward neural and backpropagating error signal
paths contributes to highly efficient hardware implementation. For
proof-of-concept validation, we demonstrate on-line learning of MNIST
handwritten digit classification on a Spartan 6 FPGA interfacing with an
external 1Gb DDR2 DRAM, which shows small degradation in test error performance
compared to an equivalently sized binary ANN trained off-line using standard
back-propagation and exact errors. Our results highlight an attractive synergy
between pipelined backpropagation and binary-state networks in substantially
reducing computation and memory requirements, making pipelined on-line learning
practical in deep networks.
Comment: Now also consider 0/1 binary activations. Memory access statistics reported.
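The two quantizations combined above can be sketched as follows: binary (0/1) hidden states in the forward pass and ternary {-1, 0, +1} truncated errors in the backward pass, so that the weight update reduces to signed additions. This is a hedged single-step illustration; the truncation threshold, layer sizes, and learning rate are placeholders, and the pipelining itself is not shown.

```python
import numpy as np

# One training step with binary states and ternary truncated errors (sketch).
rng = np.random.default_rng(5)
d_in, d_hid, d_out = 16, 32, 4
W1 = rng.standard_normal((d_in, d_hid)) * 0.1
W2 = rng.standard_normal((d_hid, d_out)) * 0.1

x = rng.standard_normal(d_in)
target = np.eye(d_out)[1]

# Forward: binary 0/1 hidden state (multiplication-free in hardware).
h = (x @ W1 > 0).astype(float)
y = h @ W2

# Backward: truncate the output error to ternary values.
err = y - target
t = 0.5 * np.mean(np.abs(err))            # illustrative truncation threshold
err_t = np.sign(err) * (np.abs(err) > t)  # values in {-1, 0, +1}

# With binary h and ternary err_t, this update needs only signed additions.
lr = 0.01
W2 -= lr * np.outer(h, err_t)

print(sorted(set(err_t.tolist())))
```

Small sub-threshold errors are simply dropped, which is the "truncated-error" part: it sparsifies the backward signal path and saves additions, at the cost of the small accuracy degradation reported above.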