
34,839 research outputs found

Numerically Approximating Parabolic PDEs using Deep Learning

Author: Sanders Julia
Publication venue: Helsingfors universitet
Publication date: 01/01/2022

In this thesis, we demonstrate the use of machine learning in numerically solving both linear and non-linear parabolic partial differential equations. By using deep learning, rather than more traditional, established numerical methods (for example, Monte Carlo sampling) to calculate numeric solutions to such problems, we can tackle even very high-dimensional problems, potentially overcoming the curse of dimensionality, which arises when the computational complexity of a problem grows exponentially with the number of dimensions. In Chapter 1, we describe the derivation of the computational problem needed to apply the deep learning method in the case of the linear Kolmogorov PDE. We start with an introduction to a few core concepts in Stochastic Analysis, particularly Stochastic Differential Equations, and define the Kolmogorov Backward Equation. We describe how the Feynman-Kac theorem means that the solution to the linear Kolmogorov PDE is a conditional expectation, and therefore how we can turn the numerical approximation of the solution of such a PDE into a minimisation problem. Chapter 2 discusses the key ideas behind the term deep learning; specifically, what a neural network is and how we can apply this to solve the minimisation problem from Chapter 1. We describe the key features of a neural network, the training process, and how parameters can be learned through gradient-descent-based optimisation. We summarise the numerical method in Algorithm 1. In Chapter 3, we implement a neural network and train it to solve a 100-dimensional linear Black-Scholes PDE with underlying geometric Brownian motion, and similarly with correlated Brownian motion. We also illustrate an example with a non-linear auxiliary Itô process: the Stochastic Lorenz Equation.
We additionally compute a solution to the geometric Brownian motion problem in 1 dimension, and compare the accuracy of the solution found by the neural network with that found by two other numerical methods, Monte Carlo sampling and finite differences, as well as with the solution found using the implicit formula for the solution. In 2 dimensions, the solution of the geometric Brownian motion problem is compared against a solution obtained by Monte Carlo sampling, which shows that the neural network approximation falls within the 99% confidence interval of the Monte Carlo estimate. We also investigate the impact of the frequency of re-sampling training data and the batch size on the rate of convergence of the neural network. Chapter 4 describes the derivation of the equivalent minimisation problem for solving a Kolmogorov PDE with non-linear coefficients, where we discretise the PDE in time, and derive an approximate Feynman-Kac representation on each time step. Chapter 5 demonstrates the method on an example of a non-linear Black-Scholes PDE and a Hamilton-Jacobi-Bellman equation. The numerical examples are based on the code by Beck et al. in their papers "Solving the Kolmogorov PDE by means of deep learning" and "Deep splitting method for parabolic PDEs", and are written in the Julia programming language, with use of the Flux library for Machine Learning in Julia. The code used to implement the method can be found at https://github.com/julia-sand/pde_appro
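
The Feynman-Kac step described in the abstract can be illustrated with a minimal sketch. This is not the thesis code (which is written in Julia with Flux); it is a plain Monte Carlo baseline in Python under stated assumptions: a 1-dimensional geometric Brownian motion with hypothetical parameters `mu` and `sigma`, and the test function phi(x) = x^2, chosen only because E[X_t^2] has a simple closed form to compare against. All function names are illustrative.

```python
import math
import random


def gbm_terminal(x0, mu, sigma, t, rng):
    """Sample X_t for dX = mu*X dt + sigma*X dW with X_0 = x0 (exact in law)."""
    w = rng.gauss(0.0, math.sqrt(t))  # W_t ~ N(0, t)
    return x0 * math.exp((mu - 0.5 * sigma**2) * t + sigma * w)


def feynman_kac_mc(phi, x0, mu, sigma, t, n_samples, seed=0):
    """Estimate u(t, x0) = E[phi(X_t)] and a 99% confidence half-width."""
    rng = random.Random(seed)
    values = [phi(gbm_terminal(x0, mu, sigma, t, rng)) for _ in range(n_samples)]
    mean = sum(values) / n_samples
    var = sum((v - mean) ** 2 for v in values) / (n_samples - 1)
    half_width = 2.576 * math.sqrt(var / n_samples)  # normal quantile for 99%
    return mean, half_width


def exact_second_moment(x0, mu, sigma, t):
    """Closed form E[X_t^2] = x0^2 * exp((2*mu + sigma^2) * t) for this process."""
    return x0**2 * math.exp((2.0 * mu + sigma**2) * t)


if __name__ == "__main__":
    # Hypothetical parameters, chosen only for illustration.
    est, hw = feynman_kac_mc(lambda x: x * x, 1.0, 0.05, 0.2, 1.0, 100_000)
    print(f"Monte Carlo estimate: {est:.4f} +/- {hw:.4f}")
    print(f"Closed form:          {exact_second_moment(1.0, 0.05, 0.2, 1.0):.4f}")
```

The deep learning approach in the abstract replaces this pointwise estimate with a network trained to minimise the expected squared error against samples of phi(X_t), so that one model covers a whole region of starting points; the sketch only shows the expectation that the network is asked to approximate.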

Helsingin yliopiston digitaalinen arkisto

Do deep neural networks have an inbuilt Occam's razor?

Authors: Louis Ard A., Mingard Chris, Rees Henry, Valle-Pérez Guillermo
Publication date: 13/04/2023

The remarkable performance of overparameterized deep neural networks (DNNs) must arise from an interplay between network architecture, training algorithms, and structure in the data. To disentangle these three components, we apply a Bayesian picture, based on the functions expressed by a DNN, to supervised learning. The prior over functions is determined by the network, and is varied by exploiting a transition between ordered and chaotic regimes. For Boolean function classification, we approximate the likelihood using the error spectrum of functions on data. When combined with the prior, this accurately predicts the posterior, measured for DNNs trained with stochastic gradient descent. This analysis reveals that structured data, combined with an intrinsic Occam's razor-like inductive bias towards (Kolmogorov) simple functions that is strong enough to counteract the exponential growth of the number of functions with complexity, is a key to the success of DNNs

arXiv.org e-Print Archive

Applying MDL to Learning Best Model Granularity

Authors: Gao Qiong, Li Ming, Vitanyi Paul
Publication date: 01/01/2000

The Minimum Description Length (MDL) principle is solidly based on a provably ideal method of inference using Kolmogorov complexity. We test how the theory behaves in practice on a general problem in model selection: that of learning the best model granularity. The performance of a model depends critically on the granularity, for example the choice of precision of the parameters. Too high a precision generally involves modeling of accidental noise, and too low a precision may lead to confusion of models that should be distinguished. This precision is often determined ad hoc. In MDL the best model is the one that most compresses a two-part code of the data set: this embodies "Occam's Razor." In two quite different experimental settings the theoretical value determined using MDL coincides with the best value found experimentally. In the first experiment the task is to recognize isolated handwritten characters in one subject's handwriting, irrespective of size and orientation. Based on a new modification of elastic matching, using multiple prototypes per character, the optimal prediction rate is predicted for the learned parameter (length of sampling interval) considered most likely by MDL, which is shown to coincide with the best value found experimentally. In the second experiment the task is to model a robot arm with two degrees of freedom using a three-layer feed-forward neural network where we need to determine the number of nodes in the hidden layer giving the best modeling performance. The optimal model (the one that extrapolates best on unseen examples) is predicted for the number of nodes in the hidden layer considered most likely by MDL, which again is found to coincide with the best value found experimentally. Comment: LaTeX, 32 pages, 5 figures. Artificial Intelligence journal, To appea
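
The two-part code idea in the abstract can be sketched on a toy granularity problem. The example below is an assumption-laden illustration, not the paper's experiments: it picks the number of bins of a histogram density (the "granularity") by minimising a crude two-part code length, where the model part charges roughly 0.5*log2(n) bits per free parameter (a common asymptotic approximation) and the data part is the negative log-likelihood. The function names and the bimodal sample are hypothetical.

```python
import math
import random


def two_part_code_length(data, k):
    """Approximate two-part code length (bits) of `data` under a k-bin histogram.

    The data part is a differential quantity (it omits a constant that depends
    only on the measurement precision), so only comparisons across k matter.
    """
    n = len(data)
    lo, hi = min(data), max(data)
    width = (hi - lo) / k
    counts = [0] * k
    for x in data:
        j = min(int((x - lo) / width), k - 1)  # clamp the maximum into the last bin
        counts[j] += 1
    # Model part: (k - 1) free bin probabilities at about 0.5*log2(n) bits each.
    model_bits = 0.5 * (k - 1) * math.log2(n)
    # Data part: negative log-likelihood under the piecewise-constant density.
    data_bits = -sum(c * math.log2(c / (n * width)) for c in counts if c > 0)
    return model_bits + data_bits


def best_granularity(data, max_bins):
    """Return the bin count in 1..max_bins with the shortest total code."""
    return min(range(1, max_bins + 1), key=lambda k: two_part_code_length(data, k))


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical bimodal sample, used only for illustration.
    sample = [rng.gauss(0.0, 1.0) for _ in range(250)]
    sample += [rng.gauss(10.0, 1.0) for _ in range(250)]
    print("bins chosen by the two-part code:", best_granularity(sample, 200))
```

Too few bins leave the data part long (the model cannot separate the two modes), while too many bins inflate the model part, which mirrors the trade-off between too low and too high precision described above.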

arXiv.org e-Print Archive

Elsevier - Publisher Connector

CWI's Institutional Repository

CERN Document Server

International Migration, Integration and Social Cohesion online publications

Quantum Generative Adversarial Networks for Learning and Loading Random Distributions

Authors: Lucchi Aurélien, Woerner Stefan, Zoufal Christa
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2019

Quantum algorithms have the potential to outperform their classical counterparts in a variety of tasks. The realization of the advantage often requires the ability to load classical data efficiently into quantum states. However, the best known methods require $\mathcal{O}\left(2^n\right)$ gates to load an exact representation of a generic data structure into an $n$-qubit state. This scaling can easily predominate the complexity of a quantum algorithm and, thereby, impair potential quantum advantage. Our work presents a hybrid quantum-classical algorithm for efficient, approximate quantum state loading. More precisely, we use quantum Generative Adversarial Networks (qGANs) to facilitate efficient learning and loading of generic probability distributions, implicitly given by data samples, into quantum states. Through the interplay of a quantum channel, such as a variational quantum circuit, and a classical neural network, the qGAN can learn a representation of the probability distribution underlying the data samples and load it into a quantum state. The loading requires $\mathcal{O}\left(\mathrm{poly}\left(n\right)\right)$ gates and can, thus, enable the use of potentially advantageous quantum algorithms, such as Quantum Amplitude Estimation. We implement the qGAN distribution learning and loading method with Qiskit and test it using a quantum simulation as well as actual quantum processors provided by the IBM Q Experience. Furthermore, we employ quantum simulation to demonstrate the use of the trained quantum channel in a quantum finance application. Comment: 14 pages, 13 figure
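
As a classical illustration of what "loading a distribution into a quantum state" targets, the sketch below bins data samples into 2^n cells and forms amplitudes sqrt(p_i), so that measuring the corresponding state would return cell i with probability p_i. It is written under assumptions: it does not use Qiskit or reproduce the qGAN training, the equal-width binning over a hypothetical interval [lo, hi] is my choice, and the function names are illustrative.

```python
import math


def empirical_distribution(samples, n_qubits, lo, hi):
    """Bin samples from [lo, hi] into 2**n_qubits equal-width cells and normalise."""
    n_states = 2**n_qubits
    counts = [0] * n_states
    for x in samples:
        if lo <= x <= hi:
            # Clamp x == hi into the last cell.
            j = min(int((x - lo) / (hi - lo) * n_states), n_states - 1)
            counts[j] += 1
    total = sum(counts)
    return [c / total for c in counts]


def target_amplitudes(probabilities):
    """Amplitudes sqrt(p_i): the squared amplitudes reproduce the distribution."""
    return [math.sqrt(p) for p in probabilities]


if __name__ == "__main__":
    # Hypothetical samples, used only for illustration.
    probs = empirical_distribution([0.0, 0.3, 0.3, 0.8], n_qubits=2, lo=0.0, hi=1.0)
    amps = target_amplitudes(probs)
    print("probabilities:", probs)
    print("amplitudes:   ", amps)
    print("norm squared: ", sum(a * a for a in amps))
```

The amplitude vector has 2^n entries, which gives some intuition for why an exact, generic loading procedure scales poorly in n and why an approximate, learned loading circuit is attractive.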

arXiv.org e-Print Archive

Repository for Publications and Research Data

CoCalc as a Learning Tool for Neural Network Simulation in the Special Course "Foundations of Mathematic Informatics"

Authors: Markova Oksana, Popel Maiia, Semerikov Serhiy
Publication date: 02/07/2018

The role of neural network modeling in the learning content of the special course "Foundations of Mathematical Informatics" was discussed. The course was developed for students of technical universities, future IT specialists, and is directed at bridging the gap between theoretical computer science and its applications: software, system and computing engineering. CoCalc was justified as a learning tool for mathematical informatics in general and neural network modeling in particular. Elements of the technique of using CoCalc in studying the topic "Neural network and pattern recognition" of the special course "Foundations of Mathematic Informatics" are shown. The program code was presented in the CoffeeScript language and implements the basic components of an artificial neural network: neurons, synaptic connections, activation functions (tangential, sigmoid, stepped) and their derivatives, methods of calculating the network's weights, etc. The features of applying the Kolmogorov-Arnold representation theorem to determine the architecture of multilayer neural networks were discussed. The implementation of the disjunctive logical element and the approximation of an arbitrary function using a three-layer neural network were given as examples. According to the simulation results, a conclusion was drawn about the limits of use of the constructed networks, within which they retain their adequacy. Framework topics for individual research on artificial neural networks are proposed. Comment: 16 pages, 3 figures, Proceedings of the 13th International Conference on ICT in Education, Research and Industrial Applications. Integration, Harmonization and Knowledge Transfer (ICTERI, 2018
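
The components listed in the abstract (activation functions, their derivatives, and a disjunctive logical element) can be sketched in a few lines. The paper's code is in CoffeeScript; this is an independent Python sketch, and the weights and bias used for the OR element are one workable choice of mine, not values taken from the paper. I also assume "tangential" refers to the hyperbolic tangent.

```python
import math


def sigmoid(z):
    """Logistic activation."""
    return 1.0 / (1.0 + math.exp(-z))


def sigmoid_prime(z):
    """Derivative of the logistic activation: s(z) * (1 - s(z))."""
    s = sigmoid(z)
    return s * (1.0 - s)


def tanh_prime(z):
    """Derivative of the hyperbolic tangent: 1 - tanh(z)^2."""
    return 1.0 - math.tanh(z) ** 2


def step(z):
    """Step (threshold) activation."""
    return 1.0 if z >= 0.0 else 0.0


def neuron(inputs, weights, bias, activation):
    """Single neuron: activation of the weighted input sum plus bias."""
    return activation(sum(w * x for w, x in zip(weights, inputs)) + bias)


def logical_or(a, b):
    """Disjunction as one step neuron; weights (1, 1) and bias -0.5 are assumed."""
    return neuron((a, b), (1.0, 1.0), -0.5, step)


if __name__ == "__main__":
    for a in (0, 1):
        for b in (0, 1):
            print(a, b, "->", int(logical_or(a, b)))
```

A single threshold neuron suffices here because OR is linearly separable; approximating an arbitrary function, as the abstract mentions, needs at least one hidden layer.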

arXiv.org e-Print Archive

Directory of Open Access Journals