Stability of Neural Ordinary Differential Equations with Power Nonlinearities
The article presents a study of solutions of a system of ODEs with a special nonlinear part, which is a continuous analogue of an arbitrary recurrent neural network (neural ODEs). As the nonlinear part of this system of differential equations, we used sums of piecewise continuous functions, where each term is a power function. (These are the activation functions.) The use of power activation functions (PAF) in neural networks is a generalization of the well-known rectified linear units (ReLU). At present, ReLU are commonly used to increase the depth of a trained neural network. Therefore, the introduction of PAF into neural networks significantly expands the possibilities of ReLU.
Note that the purpose of introducing power activation functions is that they allow one to obtain verifiable Lyapunov stability conditions for solutions of the system of differential equations simulating the corresponding dynamic processes. In turn, Lyapunov stability is one of the guarantees of the adequacy of the neural network model for the process under study. In addition, from the global stability (or at least the boundedness) of the solutions of the continuous analogue it follows that the learning process of the corresponding neural network will not diverge for any training sample.
Provably Good Solutions to the Knapsack Problem via Neural Networks of Bounded Size
The development of a satisfying and rigorous mathematical understanding of
the performance of neural networks is a major challenge in artificial
intelligence. Against this background, we study the expressive power of neural
networks through the example of the classical NP-hard Knapsack Problem. Our
main contribution is a class of recurrent neural networks (RNNs) with rectified
linear units that are iteratively applied to each item of a Knapsack instance
and thereby compute optimal or provably good solution values. We show that an
RNN of depth four and width depending quadratically on the profit of an optimum
Knapsack solution is sufficient to find optimum Knapsack solutions. We also
prove the following tradeoff between the size of an RNN and the quality of the
computed Knapsack solution: for Knapsack instances consisting of $n$ items, an
RNN of depth five and width $w$ computes a solution of value at least
$1-\mathcal{O}(n^2/\sqrt{w})$ times the optimum solution value. Our results
build upon a classical dynamic programming formulation of the Knapsack Problem
as well as a careful rounding of profit values that are also at the core of the
well-known fully polynomial-time approximation scheme for the Knapsack Problem.
A carefully conducted computational study qualitatively supports our
theoretical size bounds. Finally, we point out that our results can be
generalized to many other combinatorial optimization problems that admit
dynamic programming solution methods, such as various Shortest Path Problems,
the Longest Common Subsequence Problem, and the Traveling Salesperson Problem.
Comment: A short version of this paper appears in the proceedings of AAAI 202
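The two classical ingredients this abstract names, a dynamic program over profit values and a rounding of profits as in the fully polynomial-time approximation scheme, can be sketched as follows. This is a minimal illustration of the textbook techniques only, not the RNN construction from the paper; the function names and the parameter `eps` are assumptions made for this example.

```python
import math


def knapsack_exact(profits, weights, capacity):
    """Exact DP over profits: f[p] is the least weight reaching profit exactly p."""
    total = sum(profits)
    f = [0] + [math.inf] * total
    for p_i, w_i in zip(profits, weights):
        # Go downwards so that every item is used at most once.
        for p in range(total, p_i - 1, -1):
            if f[p - p_i] + w_i < f[p]:
                f[p] = f[p - p_i] + w_i
    return max(p for p in range(total + 1) if f[p] <= capacity)


def knapsack_fptas(profits, weights, capacity, eps):
    """Profit rounding plus the same DP; returns a value >= (1 - eps) * optimum."""
    items = [(p, w) for p, w in zip(profits, weights) if w <= capacity and p > 0]
    if not items:
        return 0
    # Hypothetical scaling choice: the standard eps * p_max / n of the textbook FPTAS.
    scale = eps * max(p for p, _ in items) / len(items)
    rounded = [int(p / scale) for p, _ in items]
    total = sum(rounded)
    # f[q] = (least weight, real profit) of a subset with rounded profit exactly q.
    f = [(0, 0)] + [(math.inf, 0)] * total
    for (p, w), q in zip(items, rounded):
        for r in range(total, q - 1, -1):
            candidate = f[r - q][0] + w
            if candidate < f[r][0]:
                f[r] = (candidate, f[r - q][1] + p)
    # Every finite state corresponds to a real feasible subset.
    return max(real for weight, real in f if weight <= capacity)
```

With profits `[60, 100, 120]`, weights `[10, 20, 30]` and capacity 50, the exact routine needs a table of 281 entries, while the rounded version with `eps = 0.5` works on a table of 15 entries. That trade of table size against solution quality is the same kind of tradeoff the abstract states for network width.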
ADADELTA: An Adaptive Learning Rate Method
We present a novel per-dimension learning rate method for gradient descent
called ADADELTA. The method dynamically adapts over time using only first order
information and has minimal computational overhead beyond vanilla stochastic
gradient descent. The method requires no manual tuning of a learning rate and
appears robust to noisy gradient information, different model architecture
choices, various data modalities and selection of hyperparameters. We show
promising results compared to other methods on the MNIST digit classification
task using a single machine and on a large scale voice dataset in a distributed
cluster environment.
Comment: 6 page
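A per-dimension update of the kind described above can be sketched in a few lines. The sketch follows the commonly cited form of the ADADELTA rule (running averages of squared gradients and of squared updates, using first-order information only); the decay `rho = 0.95`, the constant `eps = 1e-6`, and the helper names are assumptions for illustration and are not stated in the abstract.

```python
import math


def adadelta_step(x, grad, state, rho=0.95, eps=1e-6):
    """Apply one ADADELTA update in place to the list of floats x."""
    for i, g in enumerate(grad):
        # Running average of squared gradients for this dimension.
        state["g2"][i] = rho * state["g2"][i] + (1 - rho) * g * g
        # Step = -(RMS of past updates / RMS of gradients) * gradient.
        dx = -math.sqrt(state["dx2"][i] + eps) / math.sqrt(state["g2"][i] + eps) * g
        # Running average of squared updates for this dimension.
        state["dx2"][i] = rho * state["dx2"][i] + (1 - rho) * dx * dx
        x[i] += dx
    return x


def minimize_quadratic(x0, steps):
    """Toy driver (hypothetical): minimize f(x) = sum(x_i ** 2) from x0."""
    x = list(x0)
    state = {"g2": [0.0] * len(x), "dx2": [0.0] * len(x)}
    for _ in range(steps):
        grad = [2.0 * v for v in x]  # gradient of sum(x_i ** 2)
        adadelta_step(x, grad, state)
    return x
```

No learning rate appears anywhere in the update: the ratio of the two running root-mean-square terms plays that role separately for each dimension, which is what the abstract means by requiring no manual tuning of a learning rate.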