116 research outputs found

    When Deep Learning Meets Polyhedral Theory: A Survey

    Full text link
    In the past decade, deep learning became the prevalent methodology for predictive modeling thanks to the remarkable accuracy of deep neural networks in tasks such as computer vision and natural language processing. Meanwhile, the structure of neural networks converged back to simpler representations based on piecewise constant and piecewise linear functions such as the Rectified Linear Unit (ReLU), which became the most commonly used type of activation function in neural networks. That made certain types of network structure \unicode{x2014}such as the typical fully-connected feedforward neural network\unicode{x2014} amenable to analysis through polyhedral theory and to the application of methodologies such as Linear Programming (LP) and Mixed-Integer Linear Programming (MILP) for a variety of purposes. In this paper, we survey the main topics emerging from this fast-paced area of work, which bring a fresh perspective to understanding neural networks in more detail as well as to applying linear optimization techniques to train, verify, and reduce the size of such networks

    Computational Optimizations for Machine Learning

    Get PDF
    The present book contains the 10 articles finally accepted for publication in the Special Issue “Computational Optimizations for Machine Learning” of the MDPI journal Mathematics, which cover a wide range of topics connected to the theory and applications of machine learning, neural networks and artificial intelligence. These topics include, among others, various types of machine learning classes, such as supervised, unsupervised and reinforcement learning, deep neural networks, convolutional neural networks, GANs, decision trees, linear regression, SVM, K-means clustering, Q-learning, temporal difference, deep adversarial networks and more. It is hoped that the book will be interesting and useful to those developing mathematical algorithms and applications in the domain of artificial intelligence and machine learning as well as for those having the appropriate mathematical background and willing to become familiar with recent advances of machine learning computational optimization mathematics, which has nowadays permeated into almost all sectors of human life and activity

    Landscape connectivity and dropout stability of SGD solutions for over-parameterized neural networks

    Get PDF
    The optimization of multilayer neural networks typically leads to a solution with zero training error, yet the landscape can exhibit spurious local minima and the minima can be disconnected. In this paper, we shed light on this phenomenon: we show that the combination of stochastic gradient descent (SGD) and over-parameterization makes the landscape of multilayer neural networks approximately connected and thus more favorable to optimization. More specifically, we prove that SGD solutions are connected via a piecewise linear path, and the increase in loss along this path vanishes as the number of neurons grows large. This result is a consequence of the fact that the parameters found by SGD are increasingly dropout stable as the network becomes wider. We show that, if we remove part of the neurons (and suitably rescale the remaining ones), the change in loss is independent of the total number of neurons, and it depends only on how many neurons are left. Our results exhibit a mild dependence on the input dimension: they are dimension-free for two-layer networks and depend linearly on the dimension for multilayer networks. We validate our theoretical findings with numerical experiments for different architectures and classification tasks

    Deep Learning for Inverse Problems: Performance Characterizations, Learning Algorithms, and Applications

    Get PDF
    Deep learning models have witnessed immense empirical success over the last decade. However, in spite of their widespread adoption, a profound understanding of the generalization behaviour of these over-parameterized architectures is still missing. In this thesis, we provide one such way via a data-dependent characterizations of the generalization capability of deep neural networks based data representations. In particular, by building on the algorithmic robustness framework, we offer a generalisation error bound that encapsulates key ingredients associated with the learning problem such as the complexity of the data space, the cardinality of the training set, and the Lipschitz properties of a deep neural network. We then specialize our analysis to a specific class of model based regression problems, namely the inverse problems. These problems often come with well defined forward operators that map variables of interest to the observations. It is therefore natural to ask whether such knowledge of the forward operator can be exploited in deep learning approaches increasingly used to solve inverse problems. We offer a generalisation error bound that -- apart from the other factors -- depends on the Jacobian of the composition of the forward operator with the neural network. Motivated by our analysis, we then propose a `plug-and-play' regulariser that leverages the knowledge of the forward map to improve the generalization of the network. We likewise also provide a method allowing us to tightly upper bound the norms of the Jacobians of the relevant operators that is much more {computationally} efficient than existing ones. We demonstrate the efficacy of our model-aware regularised deep learning algorithms against other state-of-the-art approaches on inverse problems involving various sub-sampling operators such as those used in classical compressed sensing setup and inverse problems that are of interest in the biomedical imaging setup

    Landscape Connectivity and Dropout Stability of SGD Solutions for Over-parameterized Neural Networks

    Get PDF
    The optimization of multilayer neural networks typically leads to a solution with zero training error, yet the landscape can exhibit spurious local minima and the minima can be disconnected. In this paper, we shed light on this phenomenon: we show that the combination of stochastic gradient descent (SGD) and over-parameterization makes the landscape of multilayer neural networks approximately connected and thus more favorable to optimization. More specifically, we prove that SGD solutions are connected via a piecewise linear path, and the increase in loss along this path vanishes as the number of neurons grows large. This result is a consequence of the fact that the parameters found by SGD are increasingly dropout stable as the network becomes wider. We show that, if we remove part of the neurons (and suitably rescale the remaining ones), the change in loss is independent of the total number of neurons, and it depends only on how many neurons are left. Our results exhibit a mild dependence on the input dimension: they are dimension-free for two-layer networks and depend linearly on the dimension for multilayer networks. We validate our theoretical findings with numerical experiments for different architectures and classification tasks.Comment: Proceedings of the 37th International Conference on Machine Learning (ICML
    • …
    corecore