86,759 research outputs found

    Beating the Perils of Non-Convexity: Guaranteed Training of Neural Networks using Tensor Methods

    Get PDF
    Training neural networks is a challenging non-convex optimization problem, and backpropagation or gradient descent can get stuck in spurious local optima. We propose a novel algorithm based on tensor decomposition for guaranteed training of two-layer neural networks. We provide risk bounds for our proposed method, with a polynomial sample complexity in the relevant parameters, such as input dimension and number of neurons. While learning arbitrary target functions is NP-hard, we provide transparent conditions on the function and the input for learnability. Our training method is based on tensor decomposition, which provably converges to the global optimum under a set of mild non-degeneracy conditions. It consists of simple, embarrassingly parallel linear and multi-linear operations, and is competitive with standard stochastic gradient descent (SGD) in terms of computational complexity. Thus, we propose a computationally efficient method with guaranteed risk bounds for training neural networks with one hidden layer.
    Comment: The tensor decomposition analysis is expanded, and the analysis of ridge regression is added for recovering the parameters of the last layer of the neural network.
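
As a rough illustration of the machinery behind such guarantees, the sketch below runs plain tensor power iteration with deflation on a synthetic, orthogonally decomposable third-order tensor. It is only a minimal stand-in for the paper's actual method (the score-function moment construction, whitening, and the ridge-regression step for the output layer are not reproduced), and every name and number in it is invented for the example.

```python
import numpy as np

def tensor_power_iteration(T, n_iters=100, tol=1e-8):
    """Estimate one component of a symmetric, orthogonally decomposable 3-tensor."""
    d = T.shape[0]
    v = np.random.randn(d)
    v /= np.linalg.norm(v)
    for _ in range(n_iters):
        u = np.einsum('ijk,j,k->i', T, v, v)   # the map v -> T(I, v, v)
        lam = np.linalg.norm(u)
        u /= lam
        if np.linalg.norm(u - v) < tol:
            return lam, u
        v = u
    return lam, v

def decompose(T, k):
    """Greedy deflation: peel off k rank-one terms lam * v (x) v (x) v."""
    T = T.copy()
    weights, components = [], []
    for _ in range(k):
        lam, v = tensor_power_iteration(T)
        weights.append(lam)
        components.append(v)
        T -= lam * np.einsum('i,j,k->ijk', v, v, v)
    return np.array(weights), np.array(components)

# toy example: a rank-3 symmetric tensor with orthonormal components
rng = np.random.default_rng(0)
d, k = 8, 3
A, _ = np.linalg.qr(rng.standard_normal((d, k)))      # orthonormal columns a_1..a_k
true_w = np.array([3.0, 2.0, 1.0])
T = sum(w * np.einsum('i,j,k->ijk', a, a, a) for w, a in zip(true_w, A.T))

w_hat, V_hat = decompose(T, k)
print("recovered weights:", np.round(np.sort(w_hat)[::-1], 3))   # ideally ~[3, 2, 1]
```

In the paper's setting, the recovered components would correspond (under the stated non-degeneracy conditions) to the first-layer weight directions, with the output layer then fitted separately.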

    Learning of Dynamic Multilayer Neural networks using Parameter Free PSO

    Get PDF
    Recently, learning of continuous trajectories for the identification and/or control of dynamic systems has been attracting attention in the application of neural networks. In this paper, we focus on a parameter-free Particle Swarm Optimization called TRIBES as the training algorithm for dynamic multilayer neural networks (DMNNs). A DMNN can be approximated by a class of recurrent neural networks with one hidden layer to an arbitrary degree of accuracy. The neural network training ability of TRIBES is demonstrated through computer simulations, in which TRIBES is compared with other PSO algorithms. As a result, it is confirmed that TRIBES is efficient and practical for learning continuous trajectories using DMNNs.
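
The abstract does not spell out the optimizer's internals, so the sketch below uses a plain global-best PSO (not TRIBES itself) to fit the weights of a tiny one-hidden-layer tanh network to a sine trajectory, purely to illustrate population-based training of network weights. TRIBES would adapt swarm size and topology on its own; the inertia and acceleration constants shown here are exactly the hand-tuned parameters it removes. All data and dimensions are made up.

```python
import numpy as np

rng = np.random.default_rng(1)

# target trajectory: y(t) = sin(t) sampled on a grid
t = np.linspace(0, 2 * np.pi, 50)[:, None]
y = np.sin(t)

n_hidden = 8
n_weights = n_hidden + n_hidden + n_hidden + 1      # W1, b1, W2, b2

def forward(w, x):
    W1 = w[:n_hidden].reshape(1, n_hidden)
    b1 = w[n_hidden:2 * n_hidden]
    W2 = w[2 * n_hidden:3 * n_hidden].reshape(n_hidden, 1)
    b2 = w[-1]
    return np.tanh(x @ W1 + b1) @ W2 + b2

def loss(w):
    return np.mean((forward(w, t) - y) ** 2)

# standard PSO hyper-parameters (the kind of tuning TRIBES eliminates)
n_particles, n_iters, inertia, c1, c2 = 30, 300, 0.72, 1.49, 1.49
pos = rng.standard_normal((n_particles, n_weights))
vel = np.zeros_like(pos)
pbest = pos.copy()
pbest_val = np.array([loss(p) for p in pos])
gbest = pbest[np.argmin(pbest_val)].copy()

for _ in range(n_iters):
    r1, r2 = rng.random((2, n_particles, 1))
    vel = inertia * vel + c1 * r1 * (pbest - pos) + c2 * r2 * (gbest - pos)
    pos += vel
    vals = np.array([loss(p) for p in pos])
    improved = vals < pbest_val
    pbest[improved], pbest_val[improved] = pos[improved], vals[improved]
    gbest = pbest[np.argmin(pbest_val)].copy()

print("final training MSE:", round(loss(gbest), 5))
```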

    Autonomous Construction of Multi Layer Perceptron Neural Networks

    Get PDF
    The construction of Multi Layer Perceptron (MLP) neural networks for classification is explored. A novel algorithm is developed, the MLP Iterative Construction Algorithm (MICA), that designs the network architecture as it trains the weights of the hidden layer nodes. The architecture can be optimized on training set classification accuracy, whereby it always achieves 100% classification accuracy, or it can be optimized for generalization. The test results for MICA compare favorably with those of backpropagation on some data sets and far surpass backpropagation on others, while requiring fewer FLOPs to train. Feature selection is enhanced by MICA because it affords the opportunity to select a different set of features to separate each pair of classes. The particular saliency metric explored is based on effective decision boundary analysis, but it is implemented without having to search for the decision boundaries, making it efficient to implement. The same saliency metric is adapted for pruning hidden layer nodes to optimize performance. The feature selection and hidden node pruning techniques are shown to decrease the number of weights in the network architecture by one half to two thirds while maintaining classification accuracy.
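
MICA's precise node-construction and saliency rules are not given in the abstract; the sketch below only illustrates the general grow-until-accurate idea by retraining a small sigmoid MLP with progressively more hidden units on a toy XOR-style problem and stopping once training-set accuracy reaches 100%. The data, learning rate, and stopping rule are invented for the example and are not the paper's algorithm.

```python
import numpy as np

rng = np.random.default_rng(2)

# toy XOR-style data: four blobs, opposite corners share a class
centers = np.array([[-1.0, -1.0], [-1.0, 1.0], [1.0, -1.0], [1.0, 1.0]])
labels = np.array([0.0, 1.0, 1.0, 0.0])
X = np.vstack([rng.normal(c, 0.15, (25, 2)) for c in centers])
y = np.repeat(labels, 25)

def train_and_score(n_hidden, epochs=5000, lr=1.0):
    """Full-batch gradient descent on a 1-hidden-layer sigmoid MLP; returns training accuracy."""
    sig = lambda z: 1.0 / (1.0 + np.exp(-z))
    W1 = rng.standard_normal((2, n_hidden)) * 0.5
    b1 = np.zeros(n_hidden)
    w2 = rng.standard_normal(n_hidden) * 0.5
    b2 = 0.0
    for _ in range(epochs):
        h = sig(X @ W1 + b1)                    # hidden activations
        p = sig(h @ w2 + b2)                    # predicted class probability
        err = p - y                             # cross-entropy gradient at the output
        dh = np.outer(err, w2) * h * (1.0 - h)  # backpropagated hidden error
        W1 -= lr * X.T @ dh / len(y)
        b1 -= lr * dh.mean(axis=0)
        w2 -= lr * h.T @ err / len(y)
        b2 -= lr * err.mean()
    p = sig(sig(X @ W1 + b1) @ w2 + b2)
    return np.mean((p > 0.5) == y)

# grow the hidden layer until the training set is classified perfectly
for n_hidden in range(1, 9):
    acc = train_and_score(n_hidden)
    print(f"hidden nodes = {n_hidden}: training accuracy = {acc:.2f}")
    if acc == 1.0:
        break
```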

    Scalable and Sustainable Deep Learning via Randomized Hashing

    Full text link
    Current deep learning architectures are growing larger in order to learn from complex datasets. These architectures require giant matrix multiplication operations to train millions of parameters. Conversely, there is another growing trend to bring deep learning to low-power, embedded devices. The matrix operations associated with both training and testing of deep networks are very expensive from a computational and energy standpoint. We present a novel hashing-based technique to drastically reduce the amount of computation needed to train and test deep networks. Our approach combines recent ideas from adaptive dropout and randomized hashing for maximum inner product search to efficiently select the nodes with the highest activations. Our new algorithm for deep learning reduces the overall computational cost of forward and back-propagation by operating on significantly fewer (sparse) nodes. As a consequence, our algorithm uses only 5% of the total multiplications, while staying on average within 1% of the accuracy of the original model. A unique property of the proposed hashing-based back-propagation is that the updates are always sparse. Due to the sparse gradient updates, our algorithm is ideally suited for asynchronous and parallel training, leading to near-linear speedup with an increasing number of cores. We demonstrate the scalability and sustainability (energy efficiency) of our proposed algorithm via rigorous experimental evaluations on several real datasets.
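
A minimal sketch of the selection step, assuming SimHash-style signed random projections rather than the asymmetric MIPS hashing the paper builds on: the hidden-layer weight vectors are hashed into buckets once, and for each input only the neurons that collide with the input's hash codes are activated. Table counts, bit widths, and dimensions below are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(3)

d, n_neurons = 64, 1024                           # input dimension, hidden-layer width
n_tables, n_bits = 8, 10                          # LSH tables and bits per hash code
W = rng.standard_normal((n_neurons, d))           # one weight vector per hidden neuron

# build the hash tables once: bucket neurons by the signs of random projections
tables = []
for _ in range(n_tables):
    planes = rng.standard_normal((n_bits, d))
    buckets = {}
    codes = W @ planes.T > 0                      # (n_neurons, n_bits) sign patterns
    for j, bits in enumerate(codes):
        key = int(np.dot(bits, 1 << np.arange(n_bits)))
        buckets.setdefault(key, []).append(j)
    tables.append((planes, buckets))

def active_set(x):
    """Neurons whose weight vectors collide with x in at least one table."""
    selected = set()
    for planes, buckets in tables:
        key = int(np.dot(planes @ x > 0, 1 << np.arange(n_bits)))
        selected.update(buckets.get(key, []))
    return sorted(selected)

# the forward pass touches only the selected (likely highest-activation) neurons
x = rng.standard_normal(d)
idx = active_set(x)
h_sparse = np.maximum(W[idx] @ x, 0.0)            # ReLU only on the active set
print(f"computed {len(idx)} of {n_neurons} hidden activations")
```

In a full training loop the backward pass would likewise touch only the active set, which is what keeps the gradient updates sparse.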

    Learning Graph Neural Networks with Approximate Gradient Descent

    Full text link
    The first provably efficient algorithm for learning graph neural networks (GNNs) with one hidden layer for node information convolution is provided in this paper. Two types of GNNs are investigated, depending on whether labels are attached to nodes or graphs. A comprehensive framework for designing and analyzing the convergence of GNN training algorithms is developed. The proposed algorithm is applicable to a wide range of activation functions, including ReLU, Leaky ReLU, Sigmoid, Softplus and Swish. It is shown that the proposed algorithm guarantees a linear convergence rate to the underlying true parameters of the GNNs. For both types of GNNs, the sample complexity in terms of the number of nodes or the number of graphs is characterized. The impact of feature dimension and GNN structure on the convergence rate is also theoretically characterized. Numerical experiments are further provided to validate our theoretical analysis.
    Comment: 23 pages, accepted at AAAI 2021.
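
To make the setting concrete, here is a minimal sketch (under assumed notation, not taken from the paper) of a one-hidden-layer GNN for node-level prediction: neighbour features are aggregated with a normalized adjacency, passed through a ReLU hidden layer, and combined by a linear output layer. A finite-difference gradient stands in for the paper's approximate gradient oracle; the graph, dimensions, and step size are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(4)

n, d, k = 6, 4, 3                                 # nodes, feature dim, hidden neurons
A = (rng.random((n, n)) < 0.4).astype(float)
A = np.maximum(A, A.T)                            # undirected edges
np.fill_diagonal(A, 1.0)                          # self-loops
A_hat = A / A.sum(axis=1, keepdims=True)          # row-normalized aggregation
X = rng.standard_normal((n, d))                   # node features

a = rng.standard_normal(k)                        # output-layer weights (kept fixed here)

def forward(W):
    Z = A_hat @ X                                 # aggregate neighbour features
    H = np.maximum(Z @ W.T, 0.0)                  # ReLU hidden layer (one of the covered activations)
    return H @ a                                  # one prediction per node

# node labels generated by a "true" hidden layer; squared loss on the learned one
W_true = rng.standard_normal((k, d))
y = forward(W_true)
loss = lambda W: 0.5 * np.mean((forward(W) - y) ** 2)

# a finite-difference gradient stands in for an approximate gradient oracle
W = rng.standard_normal((k, d))
grad = np.zeros_like(W)
eps = 1e-5
for i in range(k):
    for j in range(d):
        Wp, Wm = W.copy(), W.copy()
        Wp[i, j] += eps
        Wm[i, j] -= eps
        grad[i, j] = (loss(Wp) - loss(Wm)) / (2 * eps)

print("loss before:", round(loss(W), 4), "| after one step:", round(loss(W - 0.05 * grad), 4))
```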

    Hardware-efficient on-line learning through pipelined truncated-error backpropagation in binary-state networks

    Get PDF
    Artificial neural networks (ANNs) trained using backpropagation are powerful learning architectures that have achieved state-of-the-art performance in various benchmarks. Significant effort has been devoted to developing custom silicon devices to accelerate inference in ANNs. Accelerating the training phase, however, has attracted relatively little attention. In this paper, we describe a hardware-efficient on-line learning technique for feedforward multi-layer ANNs that is based on pipelined backpropagation. Learning is performed in parallel with inference in the forward pass, removing the need for an explicit backward pass and requiring no extra weight lookup. By using binary state variables in the feedforward network and ternary errors in truncated-error backpropagation, the need for any multiplications in the forward and backward passes is removed, and the memory requirements for pipelining are drastically reduced. A further reduction in addition operations, owing to the sparsity of the forward neural and backpropagating error signal paths, contributes to a highly efficient hardware implementation. For proof-of-concept validation, we demonstrate on-line learning of MNIST handwritten digit classification on a Spartan-6 FPGA interfacing with an external 1 Gb DDR2 DRAM, and show only a small degradation in test error performance compared to an equivalently sized binary ANN trained off-line using standard backpropagation and exact errors. Our results highlight an attractive synergy between pipelined backpropagation and binary-state networks in substantially reducing computation and memory requirements, making pipelined on-line learning practical in deep networks.
    Comment: Now also considers 0/1 binary activations; memory access statistics reported.
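
As a rough sketch of why binary states and ternary errors eliminate multiplications (not the authors' full pipelined scheme), the snippet below truncates a layer's backpropagated error to {-1, 0, +1} and forms the weight update as a signed outer product with a binary activation vector, scaled by a power-of-two learning rate that reduces to a shift in hardware. The layer sizes and threshold are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(5)

def ternarize(err, thresh):
    """Truncate an error vector to {-1, 0, +1}; small errors are dropped."""
    t = np.zeros_like(err)
    t[err > +thresh] = +1.0
    t[err < -thresh] = -1.0
    return t

n_in, n_out = 16, 8
W = rng.standard_normal((n_in, n_out)) * 0.1

x_bin = (rng.random(n_in) > 0.5).astype(float)    # binary (0/1) forward activations
err = rng.standard_normal(n_out) * 0.3            # stand-in backpropagated error
err_t = ternarize(err, thresh=0.2)                # ternary error signal

# with binary inputs and ternary errors, the outer product contains only
# -1, 0, +1, so the update is pure additions/subtractions; the power-of-two
# learning rate is just a bit shift in hardware
lr = 2 ** -6
delta = np.outer(x_bin, err_t)
W -= lr * delta

print("nonzero weight updates:", int(np.count_nonzero(delta)), "of", W.size)
```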