Introduction to neural ordinary differential equations
This thesis aims to provide a comprehensive overview of Neural Networks (NNs) and Neural Ordinary Differential Equations (NODEs) from a mathematical standpoint, together with insights into their training methods and approximation capabilities. The first chapter covers the basics of NNs, including the mathematics of gradient descent and Universal Approximation (UA) theorems, as well as an introduction to residual NNs. The second chapter dives into the world of NODEs, which can be thought of as continuous idealisations of residual NNs. We then explore some UA theorems for NODEs and examine three different training methods: direct backpropagation, the continuous adjoint, and the adaptive checkpoint adjoint (ACA) method. Additionally, we discuss some applications of NODEs, such as image classification, and provide a PyTorch code example that trains a NODE to approximate the trajectories of the Lorenz system using ACA.
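As a rough illustration of the kind of example the thesis describes, the sketch below fits a small neural ODE to a Lorenz trajectory in PyTorch. It uses torchdiffeq's `odeint_adjoint` (a continuous-adjoint solver) as a stand-in for ACA, and the network, solver choice, and training settings are illustrative assumptions rather than the thesis's actual code.

```python
# Minimal sketch (not the thesis's code): fit a neural ODE to a Lorenz trajectory.
# torchdiffeq's continuous adjoint (odeint_adjoint) is used here as a stand-in for ACA;
# the architecture and training settings below are illustrative assumptions.
import torch
import torch.nn as nn
from torchdiffeq import odeint, odeint_adjoint

def lorenz(t, state, sigma=10.0, rho=28.0, beta=8.0 / 3.0):
    """Classical Lorenz system, used only to generate the training trajectory."""
    x, y, z = state[..., 0], state[..., 1], state[..., 2]
    return torch.stack([sigma * (y - x), x * (rho - z) - y, x * y - beta * z], dim=-1)

class ODEFunc(nn.Module):
    """Small MLP modelling the unknown vector field f(t, y)."""
    def __init__(self, hidden=64):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(3, hidden), nn.Tanh(), nn.Linear(hidden, 3))

    def forward(self, t, y):
        return self.net(y)

# Reference trajectory from the true Lorenz system.
t = torch.linspace(0.0, 2.0, 200)
y0 = torch.tensor([1.0, 1.0, 1.0])
with torch.no_grad():
    true_traj = odeint(lorenz, y0, t, method="dopri5")

# Train the neural ODE to reproduce that trajectory.
func = ODEFunc()
optimizer = torch.optim.Adam(func.parameters(), lr=1e-3)
for step in range(500):
    optimizer.zero_grad()
    pred_traj = odeint_adjoint(func, y0, t, method="dopri5")
    loss = torch.mean((pred_traj - true_traj) ** 2)
    loss.backward()
    optimizer.step()
```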
Neural Injective Functions for Multisets, Measures and Graphs via a Finite Witness Theorem
Injective multiset functions have a key role in the theoretical study of
machine learning on multisets and graphs. Yet, there remains a gap between the
provably injective multiset functions considered in theory, which typically
rely on polynomial moments, and the multiset functions used in practice, which
rely on moments of neural networks, whose injectivity on
multisets has not been studied to date.
In this paper, we bridge this gap by showing that moments of neural networks
do define injective multiset functions, provided that an analytic
non-polynomial activation is used. The number of moments required by our theory
is optimal essentially up to a multiplicative factor of two. To prove this
result, we state and prove a finite witness theorem, which is of
independent interest.
As a corollary to our main theorem, we derive new approximation results for
functions on multisets and measures, and new separation results for graph
neural networks. We also provide two negative results: (1) moments of
piecewise-linear neural networks cannot be injective multiset functions; and
(2) even when moment-based multiset functions are injective, they can never be
bi-Lipschitz.
Comment: NeurIPS 2023 camera-ready
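To make the construction concrete, the sketch below builds a permutation-invariant multiset embedding from empirical moments (here simply the mean) of a small network with an analytic non-polynomial activation (tanh), in the spirit of the abstract above. The architecture, dimensions, and names are illustrative assumptions, not the paper's implementation.

```python
# Illustrative sketch (not the paper's code): a moment-based multiset embedding.
# Each multiset {x_1, ..., x_n} of vectors is mapped to the empirical mean of the
# outputs of a network with an analytic non-polynomial activation (tanh).
# The feature dimension and architecture are arbitrary choices.
import torch
import torch.nn as nn

class NeuralMomentEmbedding(nn.Module):
    def __init__(self, in_dim=2, num_moments=16):
        super().__init__()
        self.phi = nn.Sequential(nn.Linear(in_dim, num_moments), nn.Tanh())

    def forward(self, multiset):
        # multiset: tensor of shape (n, in_dim); the order of rows must not matter.
        return self.phi(multiset).mean(dim=0)  # shape (num_moments,)

embed = NeuralMomentEmbedding()
A = torch.tensor([[0.0, 1.0], [1.0, 0.0]])
B = torch.tensor([[1.0, 0.0], [0.0, 1.0]])  # same multiset, different order
print(torch.allclose(embed(A), embed(B)))   # True: the embedding is permutation-invariant
```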
Universal Approximation Theorem and error bounds for quantum neural networks and quantum reservoirs
Universal approximation theorems are the foundations of classical neural
networks, providing theoretical guarantees that the latter are able to
approximate maps of interest. Recent results have shown that this can also be
achieved in a quantum setting, whereby classical functions can be approximated
by parameterised quantum circuits. We provide here precise error bounds for
specific classes of functions and extend these results to the interesting new
setup of randomised quantum circuits, mimicking classical reservoir neural
networks. Our results show in particular that a quantum neural network with
$\mathcal{O}(\varepsilon^{-2})$ weights and $\mathcal{O}(\lceil \log_2(\varepsilon^{-1}) \rceil)$ qubits suffices to achieve accuracy $\varepsilon > 0$
when approximating functions with integrable Fourier transform.
Comment: 20 pages, 0 figures
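As a rough illustration of the setup described above, the sketch below trains a small parameterised quantum circuit to approximate a one-dimensional classical function. PennyLane, the circuit layout, and the training loop are assumptions made for the example and do not reproduce the paper's construction or its error bounds.

```python
# Minimal sketch: a parameterised quantum circuit used as a trainable approximator
# of a classical 1-D function. The library choice and circuit layout are assumptions
# for illustration only, not the construction analysed in the paper.
import pennylane as qml
from pennylane import numpy as np

n_qubits = 2
dev = qml.device("default.qubit", wires=n_qubits)

@qml.qnode(dev)
def circuit(x, weights):
    # Encode the input as rotation angles, then apply trainable entangling layers.
    for w in range(n_qubits):
        qml.RX(x, wires=w)
    qml.StronglyEntanglingLayers(weights, wires=range(n_qubits))
    return qml.expval(qml.PauliZ(0))

def target(x):
    return np.sin(x)  # function to approximate

# Random initial weights; fit by gradient descent on a squared-error cost.
shape = qml.StronglyEntanglingLayers.shape(n_layers=2, n_wires=n_qubits)
weights = np.random.random(size=shape, requires_grad=True)
xs = np.linspace(-np.pi, np.pi, 20)

def cost(weights):
    return sum((circuit(x, weights) - target(x)) ** 2 for x in xs) / len(xs)

opt = qml.GradientDescentOptimizer(stepsize=0.1)
for _ in range(100):
    weights = opt.step(cost, weights)
```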
Universal Approximation Property of Random Neural Networks
In this paper, we study random neural networks which are single-hidden-layer
feedforward neural networks whose weights and biases are randomly initialized.
After this random initialization, only the linear readout needs to be trained,
which can be performed efficiently, e.g., by the least squares method. By
viewing random neural networks as Banach space-valued random variables, we
prove a universal approximation theorem within a large class of Bochner spaces.
Hereby, the corresponding Banach space can be significantly more general than
the space of continuous functions over a compact subset of a Euclidean space,
namely, e.g., an $L^p$-space or a Sobolev space, where the latter includes the
approximation of the derivatives. Moreover, we derive approximation rates and
an explicit algorithm to learn a deterministic function by a random neural
network. In addition, we provide a full error analysis and study when random
neural networks overcome the curse of dimensionality in the sense that the
training costs scale at most polynomially in the input and output dimension.
Furthermore, we show in two numerical examples the empirical advantages of
random neural networks compared to fully trained deterministic neural networks.
Comment: 64 pages, 3 figures
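The sketch below illustrates the setup the abstract describes: a single-hidden-layer network with randomly initialised, frozen weights and biases, whose linear readout alone is fitted by least squares. The target function, sizes, and activation are illustrative choices, not the paper's experiments.

```python
# Minimal sketch of a random neural network: a random, frozen hidden layer whose
# linear readout is fitted by least squares. Target function and sizes are
# illustrative choices only.
import numpy as np

rng = np.random.default_rng(0)

# Random, untrained hidden layer.
d_in, n_hidden = 1, 200
W = rng.normal(size=(n_hidden, d_in))
b = rng.normal(size=n_hidden)

def features(x):
    """Hidden-layer activations for inputs x of shape (n, d_in)."""
    return np.tanh(x @ W.T + b)

# Target function and training data.
def f(x):
    return np.sin(3 * x) + 0.5 * x

x_train = rng.uniform(-2, 2, size=(500, d_in))
y_train = f(x_train).ravel()

# Fit only the linear readout by least squares.
H = features(x_train)
readout, *_ = np.linalg.lstsq(H, y_train, rcond=None)

# Evaluate on fresh points.
x_test = np.linspace(-2, 2, 200).reshape(-1, 1)
y_pred = features(x_test) @ readout
print("max abs error:", np.max(np.abs(y_pred - f(x_test).ravel())))
```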