
    Stability of Neural Ordinary Differential Equations with Power Nonlinearities

    The article studies solutions of a system of ODEs with a special nonlinear part, which is a continuous analogue of an arbitrary recurrent neural network (neural ODEs). As the nonlinear part of this system of differential equations, we use sums of piecewise continuous functions in which each term is a power function. (These are the activation functions.) The use of power activation functions (PAF) in neural networks generalizes the well-known rectified linear units (ReLU). At present, ReLU are commonly used to increase the depth of a trained neural network. Therefore, the introduction of PAF into neural networks significantly expands the possibilities of ReLU.
    The purpose of introducing power activation functions is that they allow one to obtain verifiable Lyapunov stability conditions for solutions of the system of differential equations simulating the corresponding dynamic processes. In turn, Lyapunov stability is one of the guarantees of the adequacy of the neural network model for the process under study. In addition, global stability (or at least boundedness) of the solutions of the continuous analogue implies that the learning process of the corresponding neural network will not diverge for any training sample.

    Provably Good Solutions to the Knapsack Problem via Neural Networks of Bounded Size

    The development of a satisfying and rigorous mathematical understanding of the performance of neural networks is a major challenge in artificial intelligence. Against this background, we study the expressive power of neural networks through the example of the classical NP-hard Knapsack Problem. Our main contribution is a class of recurrent neural networks (RNNs) with rectified linear units that are iteratively applied to each item of a Knapsack instance and thereby compute optimal or provably good solution values. We show that an RNN of depth four and width depending quadratically on the profit of an optimum Knapsack solution is sufficient to find optimum Knapsack solutions. We also prove the following tradeoff between the size of an RNN and the quality of the computed Knapsack solution: for Knapsack instances consisting of $n$ items, an RNN of depth five and width $w$ computes a solution of value at least $1-\mathcal{O}(n^2/\sqrt{w})$ times the optimum solution value. Our results build upon a classical dynamic programming formulation of the Knapsack Problem as well as a careful rounding of profit values that are also at the core of the well-known fully polynomial-time approximation scheme for the Knapsack Problem. A carefully conducted computational study qualitatively supports our theoretical size bounds. Finally, we point out that our results can be generalized to many other combinatorial optimization problems that admit dynamic programming solution methods, such as various Shortest Path Problems, the Longest Common Subsequence Problem, and the Traveling Salesperson Problem. Comment: A short version of this paper appears in the proceedings of AAAI 202
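
The abstract above refers to a classical dynamic programming formulation of the Knapsack Problem. A minimal sketch of one textbook variant, indexed by profit, is given below. It is an assumption that this is the variant meant (the abstract does not spell it out), and the function name `knapsack_by_profit` is hypothetical. The sketch covers only the plain dynamic program, not the RNN construction or the rounding of profit values.

```python
def knapsack_by_profit(profits, weights, capacity):
    """Profit-indexed knapsack DP (textbook variant, assumed here).

    min_weight[q] is the smallest total weight of a subset of the items
    processed so far whose total profit is exactly q (infinity if none).
    Profits are assumed to be non-negative integers.
    """
    total = sum(profits)
    inf = float("inf")
    min_weight = [0] + [inf] * total

    # Process one item at a time, as the abstract describes for the RNN.
    for p, w in zip(profits, weights):
        # Iterate profits downwards so each item is used at most once.
        for q in range(total, p - 1, -1):
            candidate = min_weight[q - p] + w
            if candidate < min_weight[q]:
                min_weight[q] = candidate

    # The optimum is the largest profit reachable within the capacity.
    return max(q for q in range(total + 1) if min_weight[q] <= capacity)
```

The table has one entry per achievable profit value, so its size grows with the total profit; rounding profit values, as mentioned in the abstract, is the usual way to shrink such a table at the cost of solution quality.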

    ADADELTA: An Adaptive Learning Rate Method

    We present a novel per-dimension learning rate method for gradient descent called ADADELTA. The method dynamically adapts over time using only first order information and has minimal computational overhead beyond vanilla stochastic gradient descent. The method requires no manual tuning of a learning rate and appears robust to noisy gradient information, different model architecture choices, various data modalities and selection of hyperparameters. We show promising results compared to other methods on the MNIST digit classification task using a single machine and on a large scale voice dataset in a distributed cluster environment. Comment: 6 page
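
The per-dimension update summarized in the abstract can be sketched as follows. This is a minimal illustration of the commonly stated ADADELTA rule (running averages of squared gradients and squared updates, with no global learning rate), not the author's reference implementation. The function name `adadelta_minimize` and the default values of `rho` and `eps` are assumptions chosen for illustration.

```python
import math


def adadelta_minimize(grad, x0, rho=0.95, eps=1e-6, steps=100):
    """Run `steps` ADADELTA updates from x0 (a list of floats).

    grad: callable returning the gradient (list of floats) at x.
    rho:  decay rate of the running averages (assumed default).
    eps:  small constant for numerical stability (assumed default).
    """
    x = list(x0)
    avg_sq_grad = [0.0] * len(x)  # running average of squared gradients
    avg_sq_step = [0.0] * len(x)  # running average of squared updates

    for _ in range(steps):
        g = grad(x)
        for i in range(len(x)):
            # Accumulate the squared gradient for this dimension.
            avg_sq_grad[i] = rho * avg_sq_grad[i] + (1 - rho) * g[i] ** 2
            # Scale the gradient by the ratio of RMS(update) to RMS(gradient).
            rms_step = math.sqrt(avg_sq_step[i] + eps)
            rms_grad = math.sqrt(avg_sq_grad[i] + eps)
            step = -(rms_step / rms_grad) * g[i]
            # Accumulate the squared update, then apply it.
            avg_sq_step[i] = rho * avg_sq_step[i] + (1 - rho) * step ** 2
            x[i] += step
    return x
```

For example, `adadelta_minimize(lambda x: [2 * x[0]], [1.0], steps=10)` moves the iterate for f(x) = x^2 from 1.0 toward 0 in small, slowly growing steps. Each dimension keeps its own two accumulators, which is what makes the effective step size per-dimension.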