Gradient-Free Learning Based on the Kernel and the Range Space

Authors: Li Zhengguo; Lin Zhiping; Oh Beomseok; Sun Lei; Toh Kar-Ann
Publication date: 26/10/2018

In this article, we show that solving the system of linear equations by manipulating the kernel and the range space is equivalent to solving the problem of least squares error approximation. This establishes the ground for a gradient-free learning search when the system can be expressed in the form of a linear matrix equation. When the nonlinear activation function is invertible, the learning problem of a fully-connected multilayer feedforward neural network can be easily adapted for this novel learning framework. By a series of kernel and range space manipulations, it turns out that such a network learning boils down to solving a set of cross-coupling equations. By having the weights randomly initialized, the equations can be decoupled and the network solution shows relatively good learning capability for real world data sets of small to moderate dimensions. Based on the structural information of the matrix equation, the network representation is found to be dependent on the number of data samples and the output dimension.

Comment: The idea of kernel and range projection was first introduced in the IEEE/ACIS ICIS conference which was held in Singapore in June 2018. This article presents a full development of the method supported by extensive numerical results.

arXiv.org e-Print Archive
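
The least-squares equivalence described in the abstract above can be illustrated for the simplest case, a single layer with an invertible activation. This is a minimal sketch, not the authors' multilayer algorithm: the logistic activation, the function names, and the synthetic data are all assumptions made for illustration. Inverting the activation turns the fit into a linear matrix equation, which the pseudoinverse solves in the least-squares sense without any gradient steps.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def logit(y, eps=1e-9):
    # Inverse of the logistic sigmoid; clipping keeps the logarithm finite.
    y = np.clip(y, eps, 1.0 - eps)
    return np.log(y / (1.0 - y))

def fit_gradient_free(X, Y):
    """Solve sigmoid(X @ W) ~= Y for W without any gradient steps."""
    Z = logit(Y)                   # undo the invertible activation
    return np.linalg.pinv(X) @ Z   # least-squares solution of X @ W = Z

# Synthetic data (hypothetical): 50 samples, 3 inputs, 2 outputs.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
W_true = 0.5 * rng.normal(size=(3, 2))
Y = sigmoid(X @ W_true)

W_pinv = fit_gradient_free(X, Y)
# The same solution obtained from a standard least-squares solver.
W_lstsq, *_ = np.linalg.lstsq(X, logit(Y), rcond=None)
```

Because the synthetic targets are noise-free, both solvers should recover `W_true` up to rounding; with noisy targets they would still agree with each other but only approximate the generating weights.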

The loss surface of deep linear networks viewed through the algebraic geometry lens

Authors: Chen Tianran; Hauenstein Jonathan D.; Mehta Dhagash; Tang Tingting
Publication date: 17/10/2018

By using the viewpoint of modern computational algebraic geometry, we explore properties of the optimization landscapes of the deep linear neural network models. After clarifying on the various definitions of "flat" minima, we show that the geometrically flat minima, which are merely artifacts of residual continuous symmetries of the deep linear networks, can be straightforwardly removed by a generalized $L_2$ regularization. Then, we establish upper bounds on the number of isolated stationary points of these networks with the help of algebraic geometry. Using these upper bounds and utilizing a numerical algebraic geometry method, we find all stationary points of modest depth and matrix size. We show that in the presence of the non-zero regularization, deep linear networks indeed possess local minima which are not the global minima. Our computational results clarify certain aspects of the loss surfaces of deep linear networks and provide novel insights.

Comment: 16 pages (2-columns), 5 figures

arXiv.org e-Print Archive
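
The continuous symmetry mentioned in the abstract above can be checked numerically for a two-layer linear network. The sketch below is an illustration under stated assumptions: it uses a plain squared-norm penalty rather than the generalized $L_2$ regularization of the paper, and the matrix sizes, the rescaling matrix `A`, and the penalty weight are hypothetical. Replacing `(W1, W2)` by `(A @ W1, W2 @ inv(A))` leaves the product `W2 @ W1` unchanged, so the unregularized loss is flat along that direction, while the penalty changes.

```python
import numpy as np

def loss(W1, W2, X, Y, lam=0.0):
    """Squared error of the linear network W2 @ W1 plus an optional L2 penalty."""
    residual = W2 @ W1 @ X - Y
    data_term = 0.5 * np.sum(residual ** 2)
    reg_term = 0.5 * lam * (np.sum(W1 ** 2) + np.sum(W2 ** 2))
    return data_term + reg_term

rng = np.random.default_rng(0)
X = rng.normal(size=(4, 20))   # 4 input features, 20 samples
Y = rng.normal(size=(2, 20))   # 2 output features
W1 = rng.normal(size=(3, 4))
W2 = rng.normal(size=(2, 3))

# Any invertible A acting on the hidden layer leaves W2 @ W1 unchanged.
A = np.diag([2.0, 0.5, 3.0])
W1_t = A @ W1
W2_t = W2 @ np.linalg.inv(A)

plain_before = loss(W1, W2, X, Y)
plain_after = loss(W1_t, W2_t, X, Y)
reg_before = loss(W1, W2, X, Y, lam=0.1)
reg_after = loss(W1_t, W2_t, X, Y, lam=0.1)
```

Here `plain_before` and `plain_after` should coincide up to rounding, whereas `reg_before` and `reg_after` generally differ, which is the sense in which regularization removes the flat directions.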

Learning Deep Stochastic Optimal Control Policies using Forward-Backward SDEs

Authors: Exarchos Ioannis; Pereira Marcus; Theodorou Evangelos A.; Wang Ziyi
Publication venue: 'Robotics: Science and Systems Foundation'
Publication date: 04/03/2021

In this paper we propose a new methodology for decision-making under uncertainty using recent advancements in the areas of nonlinear stochastic optimal control theory, applied mathematics, and machine learning. Grounded on the fundamental relation between certain nonlinear partial differential equations and forward-backward stochastic differential equations, we develop a control framework that is scalable and applicable to general classes of stochastic systems and decision-making problem formulations in robotics and autonomy. The proposed deep neural network architectures for stochastic control consist of recurrent and fully connected layers. The performance and scalability of the aforementioned algorithm are investigated in three non-linear systems in simulation with and without control constraints. We conclude with a discussion on future directions and their implications to robotics

arXiv.org e-Print Archive

Solving Nonlinear and High-Dimensional Partial Differential Equations via Deep Learning

Authors: Al-Aradi Ali; Correia Adolfo; Jardim Gabriel; Naiff Danilo; Saporito Yuri
Publication date: 21/11/2018

In this work we apply the Deep Galerkin Method (DGM) described in Sirignano and Spiliopoulos (2018) to solve a number of partial differential equations that arise in quantitative finance applications including option pricing, optimal execution, mean field games, etc. The main idea behind DGM is to represent the unknown function of interest using a deep neural network. A key feature of this approach is the fact that, unlike other commonly used numerical approaches such as finite difference methods, it is mesh-free. As such, it does not suffer (as much as other numerical methods) from the curse of dimensionality associated with high-dimensional PDEs and PDE systems. The main goals of this paper are to elucidate the features, capabilities and limitations of DGM by analyzing aspects of its implementation for a number of different PDEs and PDE systems. Additionally, we present: (1) a brief overview of PDEs in quantitative finance along with numerical methods for solving them; (2) a brief overview of deep learning and, in particular, the notion of neural networks; (3) a discussion of the theoretical foundations of DGM with a focus on the justification of why this method is expected to perform well

arXiv.org e-Print Archive
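
The mesh-free idea in the abstract above, evaluating a PDE residual at randomly sampled points instead of on a grid, can be sketched as follows. This is not the DGM architecture: it assumes a single hidden tanh layer so that the Laplacian has a closed form, it uses a Poisson-type residual $\Delta u - f$ as a stand-in equation, and it omits boundary and terminal terms. All names, sizes, and the source term are hypothetical.

```python
import numpy as np

def network(x, W, b, v):
    """u(x) = sum_j v_j * tanh(w_j . x + b_j); x has shape (n_points, dim)."""
    return np.tanh(x @ W + b) @ v

def network_laplacian(x, W, b, v):
    """Closed-form Laplacian of the network with respect to x."""
    t = np.tanh(x @ W + b)
    second = -2.0 * t * (1.0 - t ** 2)        # second derivative of tanh
    return second @ (v * np.sum(W ** 2, axis=0))

def residual_loss(W, b, v, source, n_points, dim, rng):
    """Mean squared residual of Laplacian(u) = source at random points."""
    x = rng.uniform(0.0, 1.0, size=(n_points, dim))   # no mesh, only samples
    residual = network_laplacian(x, W, b, v) - source(x)
    return np.mean(residual ** 2)

rng = np.random.default_rng(0)
dim, hidden = 5, 8
W = rng.normal(size=(dim, hidden))
b = rng.normal(size=hidden)
v = rng.normal(size=hidden)

# Hypothetical source term f(x) = 0 (Laplace equation).
loss_value = residual_loss(W, b, v, lambda x: np.zeros(len(x)), 256, dim, rng)
```

The number of sample points is chosen independently of the dimension, which is the practical advantage the abstract attributes to mesh-free methods. A training loop would minimize `loss_value` over `W`, `b`, and `v`, typically with automatic differentiation.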

Gradient Dynamic Approach to the Tensor Complementarity Problem

Authors: Che Maolin; Qi Liqun; Wang Xuezhong; Wei Yimin
Publication date: 10/08/2018

A nonlinear gradient dynamic approach for solving the tensor complementarity problem (TCP) is presented. Theoretical analysis shows that each of the defined dynamical system models ensures the convergence performance. The computer simulation results further substantiate that the considered dynamical system can solve the tensor complementarity problem (TCP).

Comment: 18 pages. arXiv admin note: text overlap with arXiv:1804.00406 by other authors

arXiv.org e-Print Archive

Solving parametric PDE problems with artificial neural networks

Authors: Khoo Yuehaw; Lu Jianfeng; Ying Lexing
Publication venue: 'Cambridge University Press (CUP)'
Publication date: 22/05/2018

The curse of dimensionality is commonly encountered in numerical partial differential equations (PDE), especially when uncertainties have to be modeled into the equations as random coefficients. However, very often the variability of physical quantities derived from a PDE can be captured by a few features on the space of the coefficient fields. Based on such an observation, we propose using a neural-network (NN) based method to parameterize the physical quantity of interest as a function of input coefficients. The representability of such quantity using a neural-network can be justified by viewing the neural-network as performing time evolution to find the solutions to the PDE. We further demonstrate the simplicity and accuracy of the approach through notable examples of PDEs in engineering and physics.

Comment: 17 pages, 4 figures, 2 tables

arXiv.org e-Print Archive

Solving high-dimensional partial differential equations using deep learning

Authors: E Weinan; Han Jiequn; Jentzen Arnulf
Publication venue: 'Proceedings of the National Academy of Sciences'
Publication date: 03/07/2018

Developing algorithms for solving high-dimensional partial differential equations (PDEs) has been an exceedingly difficult task for a long time, due to the notoriously difficult problem known as the "curse of dimensionality". This paper introduces a deep learning-based approach that can handle general high-dimensional parabolic PDEs. To this end, the PDEs are reformulated using backward stochastic differential equations and the gradient of the unknown solution is approximated by neural networks, very much in the spirit of deep reinforcement learning with the gradient acting as the policy function. Numerical results on examples including the nonlinear Black-Scholes equation, the Hamilton-Jacobi-Bellman equation, and the Allen-Cahn equation suggest that the proposed algorithm is quite effective in high dimensions, in terms of both accuracy and cost. This opens up new possibilities in economics, finance, operational research, and physics, by considering all participating agents, assets, resources, or particles together at the same time, instead of making ad hoc assumptions on their inter-relationships.

Comment: 13 pages, 6 figures

arXiv.org e-Print Archive
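
The backward-SDE reformulation in the abstract above can be sketched on a toy problem where the answer is known. The assumptions are substantial and deliberate: the PDE is the linear equation $u_t + \tfrac{1}{2}\Delta u = 0$ with terminal condition $g(x) = \lVert x \rVert^2$, whose solution is $u(t,x) = \lVert x \rVert^2 + d\,(T-t)$, and the exact gradient $2x$ stands in for the neural network that the paper would train. The function name and all parameters are hypothetical.

```python
import numpy as np

def terminal_mismatch(y0, grad_u, g, x0, T, n_steps, n_paths, rng):
    """Propagate Y forward along Brownian paths and compare with g at time T.

    With X a Brownian motion started at x0 and no nonlinear term, the
    discretized backward SDE is Y_{n+1} = Y_n + grad_u(t_n, X_n) . dW_n.
    """
    dt = T / n_steps
    x = np.tile(x0, (n_paths, 1))
    y = np.full(n_paths, y0, dtype=float)
    for n in range(n_steps):
        dw = rng.normal(scale=np.sqrt(dt), size=x.shape)
        y = y + np.sum(grad_u(n * dt, x) * dw, axis=1)
        x = x + dw
    return y - g(x)

d, T = 10, 1.0
x0 = np.zeros(d)
g = lambda x: np.sum(x ** 2, axis=1)
exact_grad = lambda t, x: 2.0 * x        # stand-in for the gradient network
y0_exact = float(np.sum(x0 ** 2) + d * T)

rng = np.random.default_rng(0)
mismatch_exact = terminal_mismatch(y0_exact, exact_grad, g, x0, T, 100, 2000, rng)
mismatch_wrong = terminal_mismatch(y0_exact + 2.0, exact_grad, g, x0, T, 100, 2000, rng)

loss_exact = np.mean(mismatch_exact ** 2)
loss_wrong = np.mean(mismatch_wrong ** 2)
```

The mean squared terminal mismatch plays the role of the training loss: it is small (limited by time-discretization error) when the initial value and the gradient are correct, and grows when the initial guess is off, as with `loss_wrong` here.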

Task-based End-to-end Model Learning in Stochastic Optimization

Authors: Amos Brandon; Donti Priya L.; Kolter J. Zico
Publication date: 25/04/2019

With the increasing popularity of machine learning techniques, it has become common to see prediction algorithms operating within some larger process. However, the criteria by which we train these algorithms often differ from the ultimate criteria on which we evaluate them. This paper proposes an end-to-end approach for learning probabilistic machine learning models in a manner that directly captures the ultimate task-based objective for which they will be used, within the context of stochastic programming. We present three experimental evaluations of the proposed approach: a classical inventory stock problem, a real-world electrical grid scheduling task, and a real-world energy storage arbitrage task. We show that the proposed approach can outperform both traditional modeling and purely black-box policy optimization approaches in these applications.

Comment: In NIPS 2017. Code available at https://github.com/locuslab/e2e-model-learnin

arXiv.org e-Print Archive

Neural networks catching up with finite differences in solving partial differential equations in higher dimensions

Author: Avrutskiy V. I.
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 13/12/2017

Fully connected multilayer perceptrons are used for obtaining numerical solutions of partial differential equations in various dimensions. Independent variables are fed into the input layer, and the output is considered as solution's value. To train such a network one can use square of equation's residual as a cost function and minimize it with respect to weights by gradient descent. Following previously developed method, derivatives of the equation's residual along random directions in space of independent variables are also added to cost function. Similar procedure is known to produce nearly machine precision results using less than 8 grid points per dimension for 2D case. The same effect is observed here for higher dimensions: solutions are obtained on low density grids, but maintain their precision in the entire region. Boundary value problems for linear and nonlinear Poisson equations are solved inside 2, 3, 4, and 5 dimensional balls. Grids for linear cases have 40, 159, 512 and 1536 points and for nonlinear 64, 350, 1536 and 6528 points respectively. In all cases maximum error is less than $8.8\cdot10^{-6}$, and median error is less than $2.4\cdot10^{-6}$. Very weak grid requirements enable neural networks to obtain solution of 5D linear problem within 22 minutes, whereas projected solving time for finite differences on the same hardware is 50 minutes. Method is applied to second order equation, but requires little to none modifications to solve systems or higher order PDEs.

arXiv.org e-Print Archive

Stable Architectures for Deep Neural Networks

Authors: Haber Eldad; Ruthotto Lars
Publication venue: 'IOP Publishing'
Publication date: 16/02/2019

Deep neural networks have become invaluable tools for supervised machine learning, e.g., classification of text or images. While often offering superior results over traditional techniques and successfully expressing complicated patterns in data, deep architectures are known to be challenging to design and train such that they generalize well to new data. Important issues with deep architectures are numerical instabilities in derivative-based learning algorithms commonly called exploding or vanishing gradients. In this paper we propose new forward propagation techniques inspired by systems of Ordinary Differential Equations (ODE) that overcome this challenge and lead to well-posed learning problems for arbitrarily deep networks. The backbone of our approach is our interpretation of deep learning as a parameter estimation problem of nonlinear dynamical systems. Given this formulation, we analyze stability and well-posedness of deep learning and use this new understanding to develop new network architectures. We relate the exploding and vanishing gradient phenomenon to the stability of the discrete ODE and present several strategies for stabilizing deep learning for very deep networks. While our new architectures restrict the solution space, several numerical experiments show their competitiveness with state-of-the-art networks.

Comment: 23 pages, 7 figures

arXiv.org e-Print Archive
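
The ODE view of forward propagation in the abstract above can be sketched as a forward Euler discretization, in which each residual layer is one time step. The abstract does not list the stabilizing strategies, so the antisymmetric weight parametrization used below is an assumption chosen for illustration (such matrices have purely imaginary eigenvalues). The function names, layer count, width, and step size are hypothetical.

```python
import numpy as np

def forward_euler_resnet(Y0, kernels, biases, h):
    """Y_{j+1} = Y_j + h * tanh(Y_j @ K_j + b_j), i.e. forward Euler
    applied to dY/dt = tanh(Y K(t) + b(t)) with step size h."""
    Y = Y0
    for K, b in zip(kernels, biases):
        Y = Y + h * np.tanh(Y @ K + b)
    return Y

def antisymmetric(M):
    """Project a square matrix onto its antisymmetric part, K = -K.T."""
    return 0.5 * (M - M.T)

rng = np.random.default_rng(0)
n_layers, width = 50, 4
Y0 = rng.normal(size=(8, width))              # 8 samples, 4 features
kernels = [antisymmetric(rng.normal(size=(width, width))) for _ in range(n_layers)]
biases = [np.zeros(width) for _ in range(n_layers)]

Y_out = forward_euler_resnet(Y0, kernels, biases, h=0.05)
```

In this reading the depth of the network corresponds to the integration time `n_layers * h`, so questions about exploding or vanishing signals become questions about the stability of the discretized ODE.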