Gradient-Free Learning Based on the Kernel and the Range Space
In this article, we show that solving the system of linear equations by
manipulating the kernel and the range space is equivalent to solving the
problem of least squares error approximation. This establishes the ground for a
gradient-free learning search when the system can be expressed in the form of a
linear matrix equation. When the nonlinear activation function is invertible,
the learning problem of a fully-connected multilayer feedforward neural network
can be easily adapted to this novel learning framework. Through a series of kernel
and range space manipulations, such network learning turns out to reduce to
solving a set of cross-coupled equations. By randomly initializing the weights,
the equations can be decoupled, and the network solution
shows relatively good learning capability on real-world data sets of small to
moderate dimensions. Based on the structural information of the matrix
equation, the network representation is found to be dependent on the number of
data samples and the output dimension.
Comment: The idea of kernel and range projection was first introduced at the
IEEE/ACIS ICIS conference, which was held in Singapore in June 2018. This
article presents a full development of the method supported by extensive
numerical results.
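The abstract's two ingredients, solving a linear matrix equation as a least-squares problem and inverting an invertible activation, can be illustrated on a single layer. This is a minimal sketch assuming one tanh layer and targets strictly inside (-1, 1); it uses the Moore-Penrose pseudoinverse as a stand-in for the authors' kernel and range space manipulations, and the data and `W_true` are hypothetical.

```python
import numpy as np


def fit_layer_gradient_free(X, Y, eps=1e-9):
    """Fit W so that tanh(X @ W) approximates Y, with no gradient steps.

    Assumption: a single tanh layer (invertible activation) and targets in
    (-1, 1). Targets are clipped away from +/-1 so arctanh stays finite.
    """
    Z = np.arctanh(np.clip(Y, -1.0 + eps, 1.0 - eps))  # pre-activation targets
    # Pseudoinverse gives the minimum-norm least-squares solution of X W = Z
    return np.linalg.pinv(X) @ Z


rng = np.random.default_rng(0)
X = rng.normal(size=(50, 4))                   # 50 samples, 4 inputs
W_true = rng.uniform(-0.5, 0.5, size=(4, 2))   # hypothetical ground-truth weights
Y = np.tanh(X @ W_true)                        # 2 outputs per sample

W = fit_layer_gradient_free(X, Y)
# The pseudoinverse route agrees with an explicit least-squares solve.
W_lstsq = np.linalg.lstsq(X, np.arctanh(Y), rcond=None)[0]
print(np.allclose(W, W_true), np.allclose(W, W_lstsq))
```

Because the activation is inverted before solving, the nonlinear fitting problem becomes a linear one, which is what makes a closed-form (gradient-free) solution possible here.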
The loss surface of deep linear networks viewed through the algebraic geometry lens
By using the viewpoint of modern computational algebraic geometry, we explore
properties of the optimization landscapes of the deep linear neural network
models. After clarifying the various definitions of "flat" minima, we show
that the geometrically flat minima, which are merely artifacts of residual
continuous symmetries of the deep linear networks, can be straightforwardly
removed by a generalized regularization. Then, we establish upper bounds
on the number of isolated stationary points of these networks with the help of
algebraic geometry. Using these upper bounds and utilizing a numerical
algebraic geometry method, we find all stationary points of modest depth and
matrix size. We show that in the presence of non-zero regularization, deep
linear networks indeed possess local minima which are not global minima.
Our computational results clarify certain aspects of the loss surfaces of deep
linear networks and provide novel insights.
Comment: 16 pages (2-columns), 5 figures
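The residual continuous symmetry mentioned above can be checked numerically: for a two-layer linear network, replacing (W1, W2) by (A W1, W2 A^{-1}) leaves the product, and hence the unregularized loss, unchanged. This sketch uses plain L2 regularization as a simple stand-in (not the paper's generalized regularization) and a hypothetical fixed parameter point; it shows that L2 breaks the rescaling direction but not rotations.

```python
import numpy as np


def loss(W1, W2, X, Y, lam=0.0):
    """Squared error of the two-layer linear network W2 @ W1, plus optional L2 penalty."""
    R = W2 @ W1 @ X - Y
    return 0.5 * np.sum(R**2) + 0.5 * lam * (np.sum(W1**2) + np.sum(W2**2))


rng = np.random.default_rng(1)
X, Y = rng.normal(size=(3, 20)), rng.normal(size=(2, 20))
W1, W2 = np.eye(3), np.ones((2, 3))  # a fixed, hypothetical parameter point

# (W1, W2) -> (A W1, W2 A^{-1}) keeps W2 @ W1 fixed for any invertible A.
A = 2.0 * np.eye(3)                           # rescaling
Q, _ = np.linalg.qr(rng.normal(size=(3, 3)))  # rotation (orthogonal matrix)
moves = {"scale": (A @ W1, W2 @ np.linalg.inv(A)), "rotate": (Q @ W1, W2 @ Q.T)}

for name, (V1, V2) in moves.items():
    gap_plain = abs(loss(V1, V2, X, Y) - loss(W1, W2, X, Y))
    gap_l2 = abs(loss(V1, V2, X, Y, lam=0.1) - loss(W1, W2, X, Y, lam=0.1))
    print(name, gap_plain, gap_l2)
```

Both moves leave the plain loss flat; L2 penalizes the rescaling but is blind to rotations, which suggests why a more general regularizer may be needed to remove every such flat direction.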
Learning Deep Stochastic Optimal Control Policies using Forward-Backward SDEs
In this paper we propose a new methodology for decision-making under
uncertainty using recent advancements in the areas of nonlinear stochastic
optimal control theory, applied mathematics, and machine learning. Grounded in
the fundamental relation between certain nonlinear partial differential
equations and forward-backward stochastic differential equations, we develop a
control framework that is scalable and applicable to general classes of
stochastic systems and decision-making problem formulations in robotics and
autonomy. The proposed deep neural network architectures for stochastic control
consist of recurrent and fully connected layers. The performance and
scalability of the aforementioned algorithm are investigated on three
nonlinear systems in simulation, with and without control constraints. We
conclude with a discussion of future directions and their implications for
robotics.
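Forward-backward SDE methods start by simulating the forward stochastic process. The sketch below shows only that building block, an Euler-Maruyama discretization of dX = b(X) dt + sigma dW with a constant diffusion matrix; it is not the paper's recurrent or fully connected architecture, and the Ornstein-Uhlenbeck drift, noise level and horizon are hypothetical choices.

```python
import numpy as np


def euler_maruyama(drift, sigma, x0, T, n_steps, n_paths, rng):
    """Simulate dX = drift(X) dt + sigma dW on [0, T] with the Euler-Maruyama scheme.

    Returns an array of shape (n_steps + 1, n_paths, dim). sigma is a constant
    (dim x dim) diffusion matrix in this sketch.
    """
    dt = T / n_steps
    x0 = np.asarray(x0, dtype=float)
    X = np.empty((n_steps + 1, n_paths, x0.size))
    X[0] = x0
    for k in range(n_steps):
        # Brownian increments have variance dt per component
        dW = rng.normal(scale=np.sqrt(dt), size=(n_paths, x0.size))
        X[k + 1] = X[k] + drift(X[k]) * dt + dW @ sigma.T
    return X


# Hypothetical example: a 2-D Ornstein-Uhlenbeck process dX = -X dt + 0.3 dW
rng = np.random.default_rng(0)
paths = euler_maruyama(lambda x: -x, 0.3 * np.eye(2), [1.0, -1.0], T=1.0,
                       n_steps=100, n_paths=5000, rng=rng)
print(paths.shape, paths[-1].mean(axis=0))  # mean should be close to exp(-1) * x0
```

In an FBSDE controller, these sampled paths would be fed to the networks that produce the value and its gradient at each time step.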
Solving Nonlinear and High-Dimensional Partial Differential Equations via Deep Learning
In this work we apply the Deep Galerkin Method (DGM) described in Sirignano
and Spiliopoulos (2018) to solve a number of partial differential equations
that arise in quantitative finance applications including option pricing,
optimal execution, mean field games, etc. The main idea behind DGM is to
represent the unknown function of interest using a deep neural network. A key
feature of this approach is the fact that, unlike other commonly used numerical
approaches such as finite difference methods, it is mesh-free. As such, it does
not suffer (as much as other numerical methods) from the curse of
dimensionality associated with high-dimensional PDEs and PDE systems. The main
goals of this paper are to elucidate the features, capabilities and limitations
of DGM by analyzing aspects of its implementation for a number of different
PDEs and PDE systems. Additionally, we present: (1) a brief overview of PDEs in
quantitative finance along with numerical methods for solving them; (2) a brief
overview of deep learning and, in particular, the notion of neural networks;
(3) a discussion of the theoretical foundations of DGM with a focus on the
justification of why this method is expected to perform well.
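The mesh-free idea behind DGM is that the training loss is a Monte Carlo average of the PDE residual, plus initial and boundary terms, at randomly sampled points. This sketch evaluates such a loss for a hypothetical heat-equation problem (not one of the paper's finance PDEs), with plain Python functions standing in for the network and central differences standing in for automatic differentiation.

```python
import numpy as np


def dgm_loss(u, n_interior, n_initial, rng, h=1e-3):
    """Monte Carlo DGM-style loss for the heat equation u_t = u_xx on
    t in [0, 1], x in [0, pi], with u(0, x) = sin(x) and u(t, 0) = u(t, pi) = 0.

    Points are sampled at random (no mesh). Derivatives are approximated by
    central differences here; DGM itself differentiates the network exactly.
    """
    t = rng.uniform(0.0, 1.0, n_interior)
    x = rng.uniform(0.0, np.pi, n_interior)
    u_t = (u(t + h, x) - u(t - h, x)) / (2 * h)
    u_xx = (u(t, x + h) - 2 * u(t, x) + u(t, x - h)) / h**2
    interior = np.mean((u_t - u_xx) ** 2)          # PDE residual term

    x0 = rng.uniform(0.0, np.pi, n_initial)
    initial = np.mean((u(np.zeros_like(x0), x0) - np.sin(x0)) ** 2)

    tb = rng.uniform(0.0, 1.0, n_initial)
    boundary = np.mean(u(tb, np.zeros_like(tb)) ** 2
                       + u(tb, np.full_like(tb, np.pi)) ** 2)
    return interior + initial + boundary


rng = np.random.default_rng(0)
exact = lambda t, x: np.exp(-t) * np.sin(x)      # true solution
wrong = lambda t, x: np.exp(-2 * t) * np.sin(x)  # violates the PDE
print(dgm_loss(exact, 2000, 500, rng), dgm_loss(wrong, 2000, 500, rng))
```

In DGM proper, `u` would be a neural network and this loss would be minimized over its parameters by stochastic gradient descent, resampling the points at every step.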
Gradient Dynamic Approach to the Tensor Complementarity Problem
A nonlinear gradient dynamic approach for solving the tensor complementarity
problem (TCP) is presented. Theoretical analysis shows that each of the defined
dynamical system models is guaranteed to converge. Computer simulation results
further confirm that the considered dynamical systems can solve the TCP.
Comment: 18 pages. arXiv admin note: text overlap with arXiv:1804.00406 by
other authors
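A gradient dynamic for the TCP can be built by writing the complementarity conditions as the zero set of a nonnegative merit function and following its negative gradient. This sketch assumes an order-3 tensor, the Fischer-Burmeister merit function and forward-Euler time stepping; the paper's actual dynamical models may differ, and the diagonal test tensor is hypothetical.

```python
import numpy as np


def tcp_gradient_flow(A, q, x0, dt=0.05, n_steps=2000, mu=1e-12):
    """Euler-integrated gradient flow dx/dt = -grad Psi(x) for the TCP
    x >= 0, F(x) = A x^2 + q >= 0, x . F(x) = 0, with an order-3 tensor A.

    Psi(x) = 0.5 * sum_i phi(x_i, F_i(x))^2 uses the Fischer-Burmeister function
    phi(a, b) = sqrt(a^2 + b^2) - a - b (mu is a tiny smoothing term); this
    merit function is an assumption, not necessarily the one in the paper.
    """
    x = np.asarray(x0, dtype=float).copy()
    for _ in range(n_steps):
        F = np.einsum("ijk,j,k->i", A, x, x) + q
        J = np.einsum("ijk,k->ij", A, x) + np.einsum("ikj,k->ij", A, x)  # dF/dx
        r = np.sqrt(x**2 + F**2 + mu)
        phi = r - x - F
        da, db = x / r - 1.0, F / r - 1.0     # partial derivatives of phi
        grad = phi * da + J.T @ (phi * db)    # chain rule for grad Psi
        x -= dt * grad
    return x


A = np.zeros((2, 2, 2))
A[0, 0, 0] = A[1, 1, 1] = 1.0                 # hypothetical diagonal tensor
q = np.array([-1.0, 2.0])
x = tcp_gradient_flow(A, q, x0=[0.5, 0.5])
print(x)  # approaches the TCP solution [1, 0]
```

Psi vanishes exactly at TCP solutions, so any trajectory that drives Psi to zero ends at a point satisfying x >= 0, F(x) >= 0 and x . F(x) = 0.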
Solving parametric PDE problems with artificial neural networks
The curse of dimensionality is commonly encountered in the numerical solution of
partial differential equations (PDEs), especially when uncertainties have to be
modeled in the equations as random coefficients. However, very often the variability
of physical quantities derived from a PDE can be captured by a few features on
the space of the coefficient fields. Based on such an observation, we propose
using a neural-network (NN) based method to parameterize the physical quantity
of interest as a function of input coefficients. The representability of such a
quantity by a neural network can be justified by viewing the neural network
as performing time evolution to find the solutions to the PDE. We further
demonstrate the simplicity and accuracy of the approach through notable
examples of PDEs in engineering and physics.
Comment: 17 pages, 4 figures, 2 tables
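The "network as time evolution" view can be made concrete: each explicit Euler step of a discretized PDE has the form u + dt * L(u), which is a residual layer, so a stack of identical layers advances the solution in time. This sketch uses a hypothetical 1-D heat equation with zero Dirichlet boundaries; it illustrates the analogy only and is not the paper's architecture.

```python
import numpy as np


def heat_layer(u, dt, dx):
    """One residual 'layer' u <- u + dt * L u, where L is the 1-D finite-difference
    Laplacian with zero Dirichlet boundary values (one explicit Euler step)."""
    padded = np.pad(u, 1)  # zero boundary values on both ends
    lap = (padded[2:] - 2.0 * padded[1:-1] + padded[:-2]) / dx**2
    return u + dt * lap


n = 50
dx = 1.0 / (n + 1)
x = np.linspace(dx, 1.0 - dx, n)   # interior grid points of [0, 1]
dt = 0.4 * dx**2                   # explicit Euler stability needs dt <= dx^2 / 2
n_layers = 650

u = np.sin(np.pi * x)
for _ in range(n_layers):          # a "deep network" of identical residual layers
    u = heat_layer(u, dt, dx)

T = n_layers * dt
# Compare with the exact solution exp(-pi^2 T) sin(pi x)
print(np.max(np.abs(u - np.exp(-np.pi**2 * T) * np.sin(np.pi * x))))
```

In a learned model the fixed Laplacian stencil would be replaced by trainable weights, which is one way to read the representability argument.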
Solving high-dimensional partial differential equations using deep learning
Developing algorithms for solving high-dimensional partial differential
equations (PDEs) has been an exceedingly difficult task for a long time, due to
the notoriously difficult problem known as the "curse of dimensionality". This
paper introduces a deep learning-based approach that can handle general
high-dimensional parabolic PDEs. To this end, the PDEs are reformulated using
backward stochastic differential equations and the gradient of the unknown
solution is approximated by neural networks, very much in the spirit of deep
reinforcement learning with the gradient acting as the policy function.
Numerical results on examples including the nonlinear Black-Scholes equation,
the Hamilton-Jacobi-Bellman equation, and the Allen-Cahn equation suggest that
the proposed algorithm is quite effective in high dimensions, in terms of both
accuracy and cost. This opens up new possibilities in economics, finance,
operational research, and physics, by considering all participating agents,
assets, resources, or particles together at the same time, instead of making ad
hoc assumptions on their inter-relationships.
Comment: 13 pages, 6 figures
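The BSDE reformulation turns the PDE into a forward recursion Y_{n+1} = Y_n - f dt + Z_n . sigma dW_n, trained so that Y_N matches the terminal condition g(X_N). This sketch assumes a hypothetical heat-type test problem with known solution, sigma = I and zero driver f, and plugs in the exact gradient for Z where the method would use neural networks; the scalar Y_0 then has a closed-form least-squares value.

```python
import numpy as np

# Hypothetical test problem: u_t + 0.5 * Laplacian(u) = 0 on [0, T] x R^d with
# u(T, x) = |x|^2, whose exact solution is u(t, x) = |x|^2 + d * (T - t).
d, T, N, M = 10, 1.0, 20, 10_000   # dimension, horizon, time steps, sample paths
dt = T / N
rng = np.random.default_rng(0)
x0 = np.zeros(d)

X = np.tile(x0, (M, 1))
stoch_integral = np.zeros(M)       # accumulates sum_n Z_n . dW_n along each path
for _ in range(N):
    dW = rng.normal(scale=np.sqrt(dt), size=(M, d))
    Z = 2.0 * X                    # exact grad u, standing in for the neural network
    stoch_integral += np.sum(Z * dW, axis=1)
    X = X + dW                     # forward SDE dX = dW

g = np.sum(X**2, axis=1)           # terminal condition g(X_T)
# Y_N = Y_0 + sum_n Z_n . dW_n (zero driver); minimizing the mean of
# (Y_N - g(X_T))^2 over the scalar Y_0 gives the sample mean below.
Y0 = np.mean(g - stoch_integral)
print(Y0, np.sum(x0**2) + d * T)   # estimate vs. exact u(0, x0)
```

In the full method both Y_0 and the per-step Z networks are trained jointly on this terminal mismatch, and Y_0 is the approximation of u(0, x0).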
Task-based End-to-end Model Learning in Stochastic Optimization
With the increasing popularity of machine learning techniques, it has become
common to see prediction algorithms operating within some larger process.
However, the criteria by which we train these algorithms often differ from the
ultimate criteria on which we evaluate them. This paper proposes an end-to-end
approach for learning probabilistic machine learning models in a manner that
directly captures the ultimate task-based objective for which they will be
used, within the context of stochastic programming. We present three
experimental evaluations of the proposed approach: a classical inventory stock
problem, a real-world electrical grid scheduling task, and a real-world energy
storage arbitrage task. We show that the proposed approach can outperform both
traditional modeling and purely black-box policy optimization approaches in
these applications.
Comment: In NIPS 2017. Code available at
https://github.com/locuslab/e2e-model-learnin
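The gap between training and task criteria is easy to see on a newsvendor-style inventory problem: the prediction that minimizes squared error (the mean) is not the order that minimizes expected cost. This sketch only illustrates that mismatch with hypothetical costs and a hypothetical demand distribution; it is not the end-to-end method, which differentiates through the stochastic program.

```python
import numpy as np


def newsvendor_cost(order, demand, c_under=10.0, c_over=1.0):
    """Average cost of ordering `order` units when demand samples are `demand`.
    Unmet demand costs c_under per unit; leftover stock costs c_over per unit."""
    return np.mean(c_under * np.maximum(demand - order, 0.0)
                   + c_over * np.maximum(order - demand, 0.0))


rng = np.random.default_rng(0)
demand = rng.lognormal(mean=3.0, sigma=0.5, size=10_000)  # hypothetical demand

# Prediction-oriented choice: the mean, which minimizes squared error.
order_mse = demand.mean()
# Task-oriented choice: the critical-ratio quantile c_under / (c_under + c_over),
# which minimizes expected newsvendor cost for this distribution.
order_task = np.quantile(demand, 10.0 / 11.0)

print(newsvendor_cost(order_mse, demand), newsvendor_cost(order_task, demand))
```

With asymmetric costs the task-optimal order sits well above the mean, so a model trained purely for predictive accuracy can be a poor input to the downstream decision.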
Neural networks catching up with finite differences in solving partial differential equations in higher dimensions
Fully connected multilayer perceptrons are used to obtain numerical
solutions of partial differential equations in various dimensions. Independent
variables are fed into the input layer, and the output is taken as the
solution's value. To train such a network, one can use the square of the
equation's residual as a cost function and minimize it with respect to the
weights by gradient descent. Following a previously developed method,
derivatives of the equation's residual along random directions in the space of
independent variables are also added to the cost function. A similar procedure
is known to produce nearly machine-precision results using fewer than 8 grid
points per dimension in the 2D case. The same effect is observed here for
higher dimensions: solutions are obtained on low-density grids but maintain
their precision over the entire region. Boundary value problems for linear and
nonlinear Poisson equations are solved inside 2-, 3-, 4-, and 5-dimensional
balls. Grids for the linear cases have 40, 159, 512, and 1536 points, and for
the nonlinear cases 64, 350, 1536, and 6528 points, respectively. In
all cases maximum error is less than , and median error is
less than . Very weak grid requirements enable neural networks
to obtain the solution of the 5D linear problem within 22 minutes, whereas the
projected solving time for finite differences on the same hardware is 50
minutes. The method is applied to a second-order equation but requires little
to no modification to solve systems or higher-order PDEs.
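The cost function described above, the squared residual plus the squared derivative of the residual along random directions, can be written out for a Poisson problem Laplacian(u) = f. This sketch assumes a one-hidden-layer tanh network with analytic derivatives, equal weighting of the two terms and one random unit direction per point; network size, sample points and f are hypothetical, and the gradient-descent training loop is omitted.

```python
import numpy as np


def laplacian(x, W1, b1, w2):
    """Laplacian of u(x) = tanh(x @ W1.T + b1) @ w2 for a batch x of shape (n, d)."""
    t = np.tanh(x @ W1.T + b1)                     # (n, h) hidden activations
    d2 = -2.0 * t * (1.0 - t**2)                   # tanh''
    return (d2 * np.sum(W1**2, axis=1)) @ w2       # sum_j w2_j tanh''(z_j) |W1_j|^2


def grad_laplacian(x, W1, b1, w2):
    """Gradient of the Laplacian above with respect to x, shape (n, d)."""
    t = np.tanh(x @ W1.T + b1)
    d3 = (1.0 - t**2) * (6.0 * t**2 - 2.0)         # tanh'''
    return (d3 * np.sum(W1**2, axis=1) * w2) @ W1


def cost(x, f, grad_f, W1, b1, w2, rng):
    """Squared residual of Laplacian(u) = f plus the squared derivative of the
    residual along one random unit direction per point (equal weights assumed)."""
    r = laplacian(x, W1, b1, w2) - f(x)
    v = rng.normal(size=x.shape)
    v /= np.linalg.norm(v, axis=1, keepdims=True)
    dr = np.sum((grad_laplacian(x, W1, b1, w2) - grad_f(x)) * v, axis=1)
    return np.mean(r**2) + np.mean(dr**2)


rng = np.random.default_rng(0)
d, h = 3, 16                                       # input dimension, hidden units
W1, b1, w2 = rng.normal(size=(h, d)), rng.normal(size=h), rng.normal(size=h)
x = rng.uniform(-1.0, 1.0, size=(200, d))          # hypothetical sample points
f = lambda x: np.ones(len(x))                      # right-hand side f = 1
grad_f = lambda x: np.zeros_like(x)
print(cost(x, f, grad_f, W1, b1, w2, rng))
```

The derivative term penalizes the residual's slope as well as its value, which is the mechanism credited above for keeping precision between sparse grid points.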
Stable Architectures for Deep Neural Networks
Deep neural networks have become invaluable tools for supervised machine
learning, e.g., classification of text or images. While often offering superior
results over traditional techniques and successfully expressing complicated
patterns in data, deep architectures are known to be challenging to design and
train such that they generalize well to new data. Important issues with deep
architectures are numerical instabilities in derivative-based learning
algorithms commonly called exploding or vanishing gradients. In this paper we
propose new forward propagation techniques inspired by systems of Ordinary
Differential Equations (ODEs) that overcome this challenge and lead to
well-posed learning problems for arbitrarily deep networks.
The backbone of our approach is our interpretation of deep learning as a
parameter estimation problem of nonlinear dynamical systems. Given this
formulation, we analyze stability and well-posedness of deep learning and use
this new understanding to develop new network architectures. We relate the
exploding and vanishing gradient phenomenon to the stability of the discrete
ODE and present several strategies for stabilizing deep learning for very deep
networks. While our new architectures restrict the solution space, several
numerical experiments show their competitiveness with state-of-the-art
networks.
Comment: 23 pages, 7 figures
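Reading a residual network as a forward-Euler discretization of an ODE suggests constraining the layer weights so the underlying ODE is stable. One construction along these lines, sketched here as an illustration rather than as the paper's exact architecture, uses antisymmetric weights shifted by a small damping term gamma, so every eigenvalue of the layer matrix has real part -gamma; depth, width, step size h and gamma are hypothetical.

```python
import numpy as np


def antisymmetric_forward(Y0, Ks, bs, h=0.1, gamma=0.01):
    """Forward-Euler propagation Y_{j+1} = Y_j + h * tanh(A_j Y_j + b_j) with
    A_j = K_j - K_j^T - gamma * I, so every eigenvalue of A_j has real part -gamma."""
    Y = Y0
    n = Y0.shape[0]
    for K, b in zip(Ks, bs):
        A = K - K.T - gamma * np.eye(n)   # antisymmetric part, slightly damped
        Y = Y + h * np.tanh(A @ Y + b)
    return Y


rng = np.random.default_rng(0)
n, depth = 4, 100                          # feature width, number of layers
Ks = [rng.normal(size=(n, n)) for _ in range(depth)]
bs = [0.1 * rng.normal(size=(n, 1)) for _ in range(depth)]
Y0 = rng.normal(size=(n, 8))               # 8 examples as columns

out = antisymmetric_forward(Y0, Ks, bs)
print(out.shape)                           # (4, 8)
eig = np.linalg.eigvals(Ks[0] - Ks[0].T - 0.01 * np.eye(n))
print(eig.real)                            # each entry is -0.01 up to rounding
```

Purely imaginary eigenvalues (before the small shift) mean the continuous dynamics neither blow up nor collapse features, which is the ODE-stability property the abstract links to exploding and vanishing gradients.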