12 research outputs found

    Mathematical Introduction to Deep Learning: Methods, Implementations, and Theory

    Full text link
    This book aims to provide an introduction to the topic of deep learning algorithms. We review essential components of deep learning algorithms in full mathematical detail including different artificial neural network (ANN) architectures (such as fully-connected feedforward ANNs, convolutional ANNs, recurrent ANNs, residual ANNs, and ANNs with batch normalization) and different optimization algorithms (such as the basic stochastic gradient descent (SGD) method, accelerated methods, and adaptive methods). We also cover several theoretical aspects of deep learning algorithms such as approximation capacities of ANNs (including a calculus for ANNs), optimization theory (including Kurdyka-Łojasiewicz inequalities), and generalization errors. In the last part of the book some deep learning approximation methods for PDEs are reviewed including physics-informed neural networks (PINNs) and deep Galerkin methods. We hope that this book will be useful for students and scientists who do not yet have any background in deep learning at all and would like to gain a solid foundation as well as for practitioners who would like to obtain a firmer mathematical understanding of the objects and methods considered in deep learning. Comment: 601 pages, 36 figures, 45 source codes
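    As a concrete illustration of the kind of algorithm treated in the book, the minimal NumPy sketch below trains a small fully-connected feedforward ANN with plain SGD on synthetic data. It is written for this listing rather than taken from the book's 45 source codes, and the architecture, step size, batch size, and data are illustrative assumptions.

```python
# Minimal sketch (not one of the book's source codes): plain SGD training of a
# small fully-connected feedforward ANN on synthetic data, using NumPy only.
import numpy as np

rng = np.random.default_rng(0)

# Synthetic regression data: y = sin(x) plus noise.
X = rng.uniform(-3, 3, size=(256, 1))
y = np.sin(X) + 0.1 * rng.standard_normal(X.shape)

# One hidden layer with ReLU activation.
W1 = rng.standard_normal((1, 32)) * 0.5
b1 = np.zeros(32)
W2 = rng.standard_normal((32, 1)) * 0.5
b2 = np.zeros(1)

lr, batch = 0.05, 32
for step in range(2000):
    idx = rng.choice(len(X), batch, replace=False)    # mini-batch sub-sampling
    xb, yb = X[idx], y[idx]

    # Forward pass.
    h = np.maximum(xb @ W1 + b1, 0.0)
    pred = h @ W2 + b2
    err = pred - yb

    # Backward pass: gradients of the mean squared error.
    g_pred = 2.0 * err / batch
    gW2, gb2 = h.T @ g_pred, g_pred.sum(0)
    g_h = (g_pred @ W2.T) * (h > 0)
    gW1, gb1 = xb.T @ g_h, g_h.sum(0)

    # Basic SGD update.
    W1 -= lr * gW1; b1 -= lr * gb1
    W2 -= lr * gW2; b2 -= lr * gb2

print("final mini-batch MSE:", float((err ** 2).mean()))
```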

    Complete Stability of Neural Networks With Extended Memristors

    Get PDF
    The article considers a large class of delayed neural networks (NNs) with extended memristors obeying the Stanford model. This is a widely used and popular model that accurately describes the switching dynamics of real nonvolatile memristor devices implemented in nanotechnology. The article studies, via the Lyapunov method, complete stability (CS), i.e., convergence of trajectories in the presence of multiple equilibrium points (EPs), for delayed NNs with Stanford memristors. The obtained conditions for CS are robust with respect to variations of the interconnections and they hold for any value of the concentrated delay. Moreover, they can be checked either numerically, via a linear matrix inequality (LMI), or analytically, via the concept of Lyapunov diagonally stable (LDS) matrices. The conditions ensure that at the end of the transient, capacitor voltages and NN power vanish. In turn, this leads to advantages in terms of power consumption. This notwithstanding, the nonvolatile memristors can retain the result of computation in accordance with the in-memory computing principle. The results are verified and illustrated via numerical simulations. From a methodological viewpoint, the article faces new challenges in proving CS since, due to the presence of nonvolatile memristors, the NNs possess a continuum of nonisolated EPs. Also, for physical reasons, the memristor state variables are constrained to lie in some given intervals, so that the dynamics of the NNs need to be modeled via a class of differential inclusions named differential variational inequalities.
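    To make the LDS condition above concrete, the sketch below numerically searches for a positive diagonal matrix D such that A^T D + D A is negative definite for a given interconnection matrix A, which is the Lyapunov-diagonal-stability property the abstract refers to. It is a crude optimization-based stand-in for the article's LMI test; the example matrix A and the search strategy are illustrative assumptions.

```python
# Numerical sketch of a Lyapunov diagonal stability (LDS) check: search for a
# positive diagonal D making A^T D + D A negative definite. This is an
# illustration, not the article's LMI formulation.
import numpy as np
from scipy.optimize import minimize

def lds_certificate(A, n_restarts=5, seed=0):
    """Return (diagonal of D, largest eigenvalue of A^T D + D A); a negative
    eigenvalue numerically certifies that A is Lyapunov diagonally stable."""
    rng = np.random.default_rng(seed)
    n = A.shape[0]

    def objective(log_d):
        D = np.diag(np.exp(log_d))          # exp keeps the diagonal positive
        M = A.T @ D + D @ A                 # symmetric because D is diagonal
        return np.linalg.eigvalsh(M).max()

    best = None
    for _ in range(n_restarts):
        res = minimize(objective, rng.standard_normal(n), method="Nelder-Mead")
        if best is None or res.fun < best.fun:
            best = res
    return np.exp(best.x), best.fun

A = np.array([[-2.0, 0.4],
              [ 0.3, -1.5]])                # illustrative interconnection matrix
d, lam_max = lds_certificate(A)
print("diagonal of D:", d)
print("largest eigenvalue of A^T D + D A:", lam_max)   # negative => LDS certified
```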

    Inertial and Second-order Optimization Algorithms for Training Neural Networks

    Get PDF
    Neural network models have become highly popular during the last decade due to their efficiency in various applications. These are very large parametric models whose parameters must be set for each specific task. This crucial process of choosing the parameters, known as training, is done using large datasets. Due to the large amount of data and the size of the neural networks, the training phase is very expensive in terms of computational time and resources. From a mathematical point of view, training a neural network means solving a large-scale optimization problem. More specifically, it involves the minimization of a sum of functions. The large-scale nature of the optimization problem highly restrains the types of algorithms available to minimize this sum of functions. In this context, standard algorithms almost exclusively rely on inexact gradients through the backpropagation method and mini-batch sub-sampling. As a result, first-order methods such as stochastic gradient descent (SGD) remain the most used ones to train neural networks. Additionally, the function to minimize is non-convex and possibly non-differentiable, resulting in limited convergence guarantees for these methods. In this thesis, we focus on building new algorithms exploiting second-order information only by means of noisy first-order automatic differentiation. Starting from a dynamical system (an ordinary differential equation), we build INNA, an inertial and Newtonian algorithm. By analyzing the dynamical system and INNA together, we prove the convergence of the algorithm to the critical points of the function to minimize. Then, we show that the limit is actually a local minimum with overwhelming probability. Finally, we introduce Step-Tuned SGD, which automatically adjusts the step-sizes of SGD. It does so by cleverly modifying the mini-batch sub-sampling, allowing for an efficient discretization of second-order information. We prove the almost sure convergence of Step-Tuned SGD to critical points and provide rates of convergence. All the theoretical results are backed by promising numerical experiments on deep learning problems.
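    For readers unfamiliar with inertial methods, the sketch below shows a generic heavy-ball (momentum) stochastic update on a toy quadratic objective. It illustrates what "inertial" means in this context but is not the thesis' INNA or Step-Tuned SGD; the objective, noise model, step size, and momentum coefficient are assumptions.

```python
# Illustrative inertial (heavy-ball) stochastic update on a toy quadratic.
# Not the INNA or Step-Tuned SGD algorithms from the thesis.
import numpy as np

rng = np.random.default_rng(0)

def noisy_grad(theta):
    # Gradient of f(theta) = 0.5 * ||theta||^2, plus noise standing in for
    # mini-batch sub-sampling error.
    return theta + 0.1 * rng.standard_normal(theta.shape)

theta = rng.standard_normal(10)
velocity = np.zeros_like(theta)
step, momentum = 0.1, 0.9

for k in range(500):
    g = noisy_grad(theta)
    velocity = momentum * velocity - step * g   # inertia keeps past directions
    theta = theta + velocity

print("||theta|| after 500 steps:", np.linalg.norm(theta))
```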

    Momentum-Net: Fast and convergent iterative neural network for inverse problems

    Full text link
    Iterative neural networks (INNs) are rapidly gaining attention for solving inverse problems in imaging, image processing, and computer vision. INNs combine regression NNs and an iterative model-based image reconstruction (MBIR) algorithm, often leading to both good generalization capability and reconstruction quality that outperforms existing MBIR optimization models. This paper proposes the first fast and convergent INN architecture, Momentum-Net, by generalizing a block-wise MBIR algorithm that uses momentum and majorizers with regression NNs. For fast MBIR, Momentum-Net uses momentum terms in extrapolation modules and noniterative MBIR modules at each iteration by using majorizers, where each iteration of Momentum-Net consists of three core modules: image refining, extrapolation, and MBIR. Momentum-Net guarantees convergence to a fixed-point for general differentiable (non)convex MBIR functions (or data-fit terms) and convex feasible sets, under two asymptotic conditions. To consider data-fit variations across training and testing samples, we also propose a regularization parameter selection scheme based on the "spectral spread" of majorization matrices. Numerical experiments for light-field photography using a focal stack and sparse-view computed tomography demonstrate that, given identical regression NN architectures, Momentum-Net significantly improves MBIR speed and accuracy over several existing INNs; it significantly improves reconstruction quality compared to a state-of-the-art MBIR method in each application. Comment: 28 pages, 13 figures, 3 algorithms, 4 tables, submitted revision to IEEE T-PAMI
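    A structural sketch of how the three modules named above might be arranged in code is given below. The interfaces refine_nn, data_fit_grad, and majorizer_inverse are hypothetical placeholders, and the way the refined image enters the MBIR step is an illustrative simplification rather than the authors' exact formulation.

```python
# Structural sketch (assumed interfaces, not the authors' code) of a
# Momentum-Net-style iteration: image refining, extrapolation, and a
# noniterative majorizer-based MBIR step.
import numpy as np

def momentum_net_iterations(x0, refine_nn, data_fit_grad, majorizer_inverse,
                            n_iters=50, rho=0.5):
    """x0: initial image; refine_nn(x): learned refining module;
    data_fit_grad(x): gradient of the data-fit term; majorizer_inverse(g):
    applies M^{-1} for a majorizer M, making the MBIR step noniterative."""
    x_prev = x0.copy()
    x = x0.copy()
    for _ in range(n_iters):
        z = refine_nn(x)                        # 1) image refining (NN)
        x_bar = x + rho * (x - x_prev)          # 2) extrapolation (momentum)
        x_prev = x
        # 3) noniterative MBIR-style step: one majorizer-preconditioned
        #    correction of x_bar using the data-fit gradient plus a pull
        #    toward the refined image z (weighting is illustrative).
        x = x_bar - majorizer_inverse(data_fit_grad(x_bar) + (x_bar - z))
    return x

# Toy usage with trivial stand-ins (identity refiner, quadratic data fit).
y = np.ones(16)
out = momentum_net_iterations(
    np.zeros(16),
    refine_nn=lambda x: x,                      # identity "refiner"
    data_fit_grad=lambda x: x - y,              # gradient of 0.5 * ||x - y||^2
    majorizer_inverse=lambda g: 0.5 * g,        # M = 2I majorizes the Hessian I
    n_iters=100,
)
print("recovered y:", np.allclose(out, y, atol=1e-3))
```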

    Adaptiveness, Asynchrony, and Resource Efficiency in Parallel Stochastic Gradient Descent

    Get PDF
    Accelerated digitalization and sensor deployment in society in recent years pose critical challenges for the associated data processing and analysis infrastructure to scale, and the field of big data, targeting methods for storing, processing, and revealing patterns in huge data sets, has surged. Artificial Intelligence (AI) models are used diligently in standard Big Data pipelines due to their tremendous success across various data analysis tasks; however, the exponential growth in Volume, Variety, and Velocity of Big Data (known as its three V's) in recent years requires corresponding complexity in the AI models that analyze it, as well as in the Machine Learning (ML) processes required to train them. In order to cope, parallelism in ML is standard nowadays, with the aim to better utilize contemporary computing infrastructure, whether it be shared-memory multi-core CPUs or vast connected networks of IoT devices engaging in Federated Learning (FL). Stochastic Gradient Descent (SGD) serves as the backbone of many of the most popular ML methods, including in particular Deep Learning. However, SGD has inherently sequential semantics and is not trivially parallelizable without imposing strict synchronization, with associated bottlenecks. Asynchronous SGD (AsyncSGD), which relaxes the original semantics, has gained significant interest in recent years due to promising results that show speedup in certain contexts. However, the relaxed semantics that asynchrony entails give rise to fundamental questions regarding AsyncSGD, relating particularly to its stability and convergence rate in practical applications. This thesis explores vital knowledge gaps of AsyncSGD and contributes in particular to: theoretical frameworks, formalizing several key notions related to the impact of asynchrony on convergence and guiding future development of AsyncSGD implementations; and analytical results, namely asymptotic convergence bounds under realistic assumptions. Moreover, several technical solutions are proposed, targeting in particular: stability, by reducing the number of non-converging executions and the associated wasted energy; speedup, by improving convergence time and reliability with instance-based adaptiveness; and elasticity, achieving resource-efficiency by avoiding over-parallelism and thereby improving stability and saving computing resources. The proposed methods are evaluated on several standard DL benchmarking applications and compared to relevant baselines from previous literature. Key results include: (i) persistent speedup compared to baselines, (ii) increased stability and reduced risk of non-converging executions, and (iii) reduction in the overall memory footprint (up to 17%) as well as in the consumed computing resources (up to 67%). In addition, along with this thesis, an open-source implementation is published that connects high-level ML operations with asynchronous implementations using fine-grained memory operations, supporting future research on efficient adaptation of AsyncSGD for practical applications.
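    As a conceptual illustration of the relaxed semantics of AsyncSGD discussed above, the sketch below lets several threads update a shared parameter vector without synchronizing with one another, so each gradient may be computed on stale parameters. It is not the thesis' open-source implementation; the quadratic objective, step size, and thread count are assumptions, and Python's global interpreter lock means this shows the semantics rather than a real parallel speedup.

```python
# Conceptual AsyncSGD sketch: workers read and update shared parameters with
# no mutual synchronization, so updates may use stale snapshots.
import threading
import numpy as np

rng = np.random.default_rng(0)
theta = np.zeros(20)                       # shared parameters, updated in place
target = rng.standard_normal(20)
step = 0.01

def worker(n_updates, seed):
    local_rng = np.random.default_rng(seed)
    for _ in range(n_updates):
        snapshot = theta.copy()            # possibly stale read
        grad = snapshot - target + 0.1 * local_rng.standard_normal(20)
        theta[:] -= step * grad            # in-place update, no lock

threads = [threading.Thread(target=worker, args=(2000, s)) for s in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print("distance to optimum:", np.linalg.norm(theta - target))
```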

    Asymptotics of stochastic learning in structured networks

    Get PDF

    Fast optimization methods for machine learning, and game-theoretic models of cultural evolution

    Get PDF
    This thesis has two parts. In the first part, we explore fast stochastic optimization methods for machine learning. Mathematical optimization is a backbone of modern machine learning. Most machine learning problems require optimizing some objective function that measures how well a model matches a data set, with the intention of drawing patterns and making decisions on new unseen data. The success of optimization algorithms in solving these problems is critical to the success of machine learning, and has enabled the research community to explore more complex machine learning problems that require bigger models and larger datasets. Stochastic gradient descent (SGD) has become the standard optimization routine in machine learning, and in particular in deep neural networks, due to its impressive performance across a wide variety of tasks and models. SGD, however, can often be slow for neural networks with many layers and typically requires careful user oversight for setting hyperparameters properly. While innovations such as batch normalization and skip connections have helped alleviate some of these issues, why such innovations are required eludes full understanding, and it is worthwhile to gain deeper theoretical insights into these problems and to consider more advanced optimization methods specifically tailored towards training large complex models. In this part of the thesis, we review and analyze some of the recent progress made in this direction, and develop new optimization algorithms that are provably fast, significantly easier to train, and require less user oversight. Then, we discuss the theory of quantized networks, which use low-precision weights to compress and accelerate neural networks, and when and why they are trainable. Finally, we discuss some recent results on how the convergence of SGD is affected by the architecture of neural nets, and we show using theoretical analysis that wide networks train faster than narrow nets, and deeper networks train slower than shallow nets, an effect often observed in practice. In the second part of the thesis, we study the evolution of cultural norms in human societies using game-theoretic models, drawing from research in cross-cultural psychology. Understanding human behavior and modeling how cultural norms evolve in different human societies is vital for designing policies and avoiding conflicts around the world. In this part, we explore ways to use computational game-theoretic techniques, and in particular evolutionary game-theoretic (EGT) models, to gain insight into why different human societies have different norms and behaviors. We first describe an evolutionary game-theoretic model to study how norms change in a society, based on the idea that different strengths of norms in societies translate to different game-theoretic interaction structures and incentives. We identify conditions that determine when societies change their existing norms, when they are resistant to such change, and how this depends on the strength of norms in a society. Next, we extend this study to analyze the evolutionary relationship between the tendency to conform and how quickly a population reacts when conditions make a change in norm desirable. Our analysis identifies conditions under which a tipping point is reached in a population, causing norms to change rapidly. Next, we study conditions that affect the existence of group-biased behavior among humans (i.e., favoring others from the same group and being hostile towards others from different groups). Using an evolutionary game-theoretic model, we show that out-group hostility is dramatically reduced by mobility. Technological and societal advances over the past centuries have greatly increased the degree to which humans change physical locations, and our results show that in highly mobile societies, one's choice of action is more likely to depend on which individual one is interacting with, rather than on the group to which the individual belongs.
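    To make the evolutionary game-theoretic setting concrete, the sketch below runs discrete-time replicator dynamics on a two-norm coordination game. Replicator dynamics is a standard EGT tool rather than the thesis' specific model, and the payoff matrix and initial population shares are assumptions; with this starting point the old norm takes over even though coordinating on the new norm pays more, illustrating the kind of resistance to change discussed above.

```python
# Illustrative discrete-time replicator dynamics for a two-norm coordination
# game (standard EGT tool, not the thesis' model).
import numpy as np

# Payoff matrix: rows = focal strategy, columns = opponent strategy.
# Strategy 0 = old norm, strategy 1 = new norm; coordinating on the new
# norm pays slightly more.
payoffs = np.array([[2.0, 0.0],
                    [0.0, 2.5]])

x = np.array([0.8, 0.2])     # population shares: 80% still follow the old norm
for generation in range(200):
    fitness = payoffs @ x                  # expected payoff of each strategy
    avg_fitness = x @ fitness              # population-average payoff
    x = x * fitness / avg_fitness          # replicator update

# Despite the higher payoff for the new norm, this initial split stays in the
# old norm's basin of attraction, so the old norm persists.
print("final shares (old norm, new norm):", x)
```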

    Control in moving interfaces and deep learning

    Full text link
    Unpublished doctoral thesis defended at the Universidad Autónoma de Madrid, Facultad de Ciencias, Departamento de Matemáticas. Date of defense: 14-05-2021. This thesis has received funding from the European Union’s Horizon 2020 research and innovation programme under the Marie Sklodowska-Curie grant agreement No. 765579-ConFlex.