
    Probabilistic Best Subset Selection via Gradient-Based Optimization

    In high-dimensional statistics, variable selection is an optimization problem aiming to recover the latent sparse pattern from all possible covariate combinations. In this paper, we propose a novel optimization method to solve the exact L_0-regularized regression problem (a.k.a. best subset selection). We reformulate the optimization problem from a discrete space to a continuous one via probabilistic reparameterization. Within the framework of stochastic gradient descent, we propose a family of unbiased gradient estimators to optimize the L_0-regularized objective and a variational lower bound. Within this family, we identify the estimator with a non-vanishing signal-to-noise ratio and uniformly minimum variance. Theoretically, we study the general conditions under which the method is guaranteed to converge to the ground truth in expectation. In a wide variety of synthetic and semi-synthetic data sets, the proposed method outperforms existing variable selection methods that are based on penalized regression and mixed-integer optimization, in both sparse pattern recovery and out-of-sample prediction. Our method can find the true regression model from thousands of covariates in a couple of seconds.
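    As a rough illustration of the probabilistic reparameterization described above (a minimal sketch, not the paper's estimator family or its minimum-variance construction; the synthetic data, the penalty weight `lam`, and the plain score-function gradient with a leave-one-out baseline are my own assumptions), one can relax the binary inclusion indicators to independent Bernoulli variables and run stochastic gradient descent on the expected L_0-penalized least-squares objective:

```python
import numpy as np

rng = np.random.default_rng(0)

def fit_l0(X, y, lam=1.0, lr=1e-3, steps=2000, mc=8):
    """Jointly learn coefficients beta and Bernoulli inclusion probabilities pi."""
    n, p = X.shape
    beta = np.zeros(p)
    logit = np.zeros(p)                                # logits of the inclusion probabilities pi
    for _ in range(steps):
        pi = 1.0 / (1.0 + np.exp(-logit))
        g_beta = np.zeros(p)
        g_logit = np.zeros(p)
        fs, zs = [], []
        for _ in range(mc):
            z = (rng.random(p) < pi).astype(float)     # sample a sparse pattern
            resid = y - X @ (beta * z)
            g_beta += -2.0 * z * (X.T @ resid) / mc    # pathwise gradient w.r.t. beta
            fs.append(resid @ resid + lam * z.sum())   # L_0-penalized loss at this sample
            zs.append(z)
        fs = np.array(fs)
        for i, z in enumerate(zs):
            base = (fs.sum() - fs[i]) / (mc - 1)       # leave-one-out baseline keeps the estimator unbiased
            g_logit += (fs[i] - base) * (z - pi) / mc  # score-function gradient w.r.t. the logits
        beta -= lr * g_beta
        logit -= lr * g_logit
    return beta, 1.0 / (1.0 + np.exp(-logit))

# Hypothetical usage on synthetic data with 3 active covariates out of 50.
X = rng.normal(size=(200, 50))
beta_true = np.zeros(50); beta_true[:3] = [2.0, -1.5, 1.0]
y = X @ beta_true + 0.1 * rng.normal(size=200)
beta_hat, pi_hat = fit_l0(X, y, lam=2.0)
print("selected covariates:", np.where(pi_hat > 0.5)[0])
```

    In such a scheme the variance of the gradient estimator largely determines how quickly the inclusion probabilities concentrate, which is why the abstract's emphasis on a non-vanishing signal-to-noise ratio and minimum variance matters; the sketch above only uses a simple leave-one-out baseline.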

    What to Do When Your Discrete Optimization Is the Size of a Neural Network?

    Oftentimes, machine learning applications using neural networks involve solving discrete optimization problems, such as in pruning, parameter-isolation-based continual learning, and the training of binary networks. These discrete problems are combinatorial in nature and not amenable to gradient-based optimization. Additionally, classical approaches used in discrete settings do not scale well to large neural networks, forcing scientists and empiricists to rely on alternative methods. Among these, two main distinct sources of top-down information can be used to lead the model to good solutions: (1) extrapolating gradient information from points outside of the solution set, and (2) comparing evaluations between members of a subset of the valid solutions. We take continuation path (CP) methods to represent using purely the former and Monte Carlo (MC) methods to represent the latter, while also noting that some hybrid methods combine the two. The main goal of this work is to compare both approaches. For that purpose, we first overview the two classes while also discussing some of their drawbacks analytically. Then, in the experimental section, we compare their performance, starting with smaller microworld experiments, which allow more fine-grained control of problem variables, and gradually moving towards larger problems, including neural network regression and neural network pruning for image classification, where we additionally compare against magnitude-based pruning. Comment: Submitted to JML
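    To make the CP/MC distinction concrete, here is a small illustrative sketch (my own toy problem, not the paper's experiments or exact algorithms): both approaches search for a binary mask minimizing a quadratic objective, one by annealing a sigmoid relaxation (gradient information from outside the solution set), the other by comparing Monte Carlo evaluations of sampled masks.

```python
import numpy as np

rng = np.random.default_rng(1)
d = 20
A = rng.normal(size=(30, d))
m_true = (rng.random(d) < 0.3).astype(float)
b = A @ m_true

f = lambda m: float(np.sum((A @ m - b) ** 2))     # objective over binary masks
sigmoid = lambda t: 1.0 / (1.0 + np.exp(-t))

def continuation_path(steps=3000, lr=1e-2):
    """Gradient descent on a temperature-annealed sigmoid relaxation of the mask."""
    theta = np.zeros(d)
    for k in range(steps):
        T = max(1.0 - k / steps, 0.05)             # anneal the temperature toward 0
        m = sigmoid(theta / T)                     # soft mask on the continuation path
        grad_m = 2.0 * A.T @ (A @ m - b)
        theta -= lr * grad_m * m * (1.0 - m) / T   # chain rule through the relaxation
    return (sigmoid(theta) > 0.5).astype(float)

def monte_carlo(steps=3000, lr=1e-2, samples=8):
    """Score-function updates computed from Bernoulli samples of the mask."""
    theta = np.zeros(d)
    for _ in range(steps):
        p = sigmoid(theta)
        zs = [(rng.random(d) < p).astype(float) for _ in range(samples)]
        fs = np.array([f(z) for z in zs])
        g = np.zeros(d)
        for i, z in enumerate(zs):
            base = (fs.sum() - fs[i]) / (samples - 1)   # leave-one-out baseline
            g += (fs[i] - base) * (z - p) / samples
        theta -= lr * g
    return (sigmoid(theta) > 0.5).astype(float)

print("CP objective:", f(continuation_path()), "| MC objective:", f(monte_carlo()))
```

    The sketch mirrors the paper's dichotomy only loosely: the CP variant relies on gradients of a relaxed surrogate, while the MC variant only ever evaluates valid binary solutions.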

    Compositionality, stability and robustness in probabilistic machine learning

    Probability theory plays an integral part in the field of machine learning. Its use has been advocated by many [MacKay, 2002; Jaynes, 2003], as it allows for the quantification of uncertainty and the incorporation of prior knowledge by simply applying the rules of probability [Kolmogorov, 1950]. While probabilistic machine learning was originally restricted to simple models, the advent of new computational technologies, such as automatic differentiation, and advances in approximate inference, such as Variational Inference [Blei et al., 2017], have made it more viable in complex settings. Despite this progress, many challenges remain in its application to real-world tasks. Among them are questions about the ability of probabilistic models to model complex tasks and their reliability both during training and in the face of unexpected data perturbations. These issues can be addressed by examining three properties of such models: compositionality, stability and robustness. Hence, this thesis explores these three key properties and their application to probabilistic models, while validating their importance on a range of applications.

    The first contribution in this thesis studies compositionality. Compositionality enables the construction of complex and expressive probabilistic models from simple components. This broadens the range of phenomena that one can model and provides the modeller with a wide array of modelling options. This thesis examines this property through the lens of Gaussian processes [Rasmussen and Williams, 2006] and proposes a generic compositional Gaussian process model to address the problem of multi-task learning in the non-linear setting.

    Additionally, this thesis contributes two methods addressing the issue of stability. Stability determines the reliability of inference algorithms in the presence of noise; more stable training procedures lead to faster, more reliable inferences, especially for complex models. The two proposed methods aim at stabilising stochastic gradient estimation in Variational Inference using the method of control variates [Owen, 2013].

    Finally, the last contribution of this thesis considers robustness. Robust machine learning methods are unaffected by unaccounted-for phenomena in the data, which makes such methods essential when deploying machine learning on real-world datasets. This thesis examines the problem of robust inference in sequential probabilistic models by combining the ideas of Generalised Bayesian Inference [Bissiri et al., 2016] and Sequential Monte Carlo sampling [Doucet and Johansen, 2011].
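    The control-variate idea for stabilising stochastic gradients in Variational Inference can be illustrated with a small sketch (my own toy model and leave-one-out baseline, not the thesis's proposed estimators): a score-function ELBO gradient whose variance is reduced by subtracting a baseline that leaves the estimator unbiased.

```python
import numpy as np

rng = np.random.default_rng(2)

def log_joint(z):
    # Toy unnormalized target: N(0,1) prior times a Gaussian "likelihood" centred at 1.5.
    return -0.5 * z**2 - 0.5 * (z - 1.5) ** 2

def elbo_grad(mu, log_sigma, n_samples=16, use_cv=True):
    """Score-function ELBO gradient for q = N(mu, sigma^2), with an optional baseline."""
    sigma = np.exp(log_sigma)
    z = rng.normal(mu, sigma, size=n_samples)
    log_q = -0.5 * ((z - mu) / sigma) ** 2 - log_sigma - 0.5 * np.log(2 * np.pi)
    fz = log_joint(z) - log_q                        # ELBO integrand at each sample
    score_mu = (z - mu) / sigma**2                   # d log q / d mu
    score_ls = ((z - mu) / sigma) ** 2 - 1.0         # d log q / d log_sigma
    if use_cv:
        base = (fz.sum() - fz) / (n_samples - 1)     # leave-one-out baseline (control variate)
    else:
        base = 0.0
    return np.mean((fz - base) * score_mu), np.mean((fz - base) * score_ls)

# Simple stochastic gradient ascent on the ELBO; the true posterior here is N(0.75, 0.5).
mu, log_sigma = 0.0, 0.0
for _ in range(2000):
    g_mu, g_ls = elbo_grad(mu, log_sigma)
    mu += 1e-2 * g_mu
    log_sigma += 1e-2 * g_ls
print("q approx: mean %.3f, std %.3f" % (mu, np.exp(log_sigma)))
```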

    University of Wollongong Undergraduate Calendar 1999


    Proceedings of ICMMB2014
