
    Ensemble Kalman filter for neural network based one-shot inversion

    We study the use of novel machine learning techniques for inverse problems. Our approach replaces the complex forward model with a neural network, which is trained simultaneously, in a one-shot sense, while estimating the unknown parameters from data, i.e. the neural network is trained only for the unknown parameter. By establishing a link to the Bayesian approach to inverse problems, an algorithmic framework is developed which ensures the feasibility of the parameter estimate with respect to the forward model. We propose an efficient, derivative-free optimization method based on variants of ensemble Kalman inversion. Numerical experiments show that the ensemble Kalman filter for neural network based one-shot inversion is a promising direction combining optimization and machine learning techniques for inverse problems.
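
    As a rough illustration of the derivative-free building block the abstract refers to, the sketch below implements one generic ensemble Kalman inversion update in NumPy. The function name, array shapes, and the use of a plain callable in place of the paper's trained neural-network surrogate are all assumptions for illustration, not the authors' implementation.

```python
import numpy as np

def eki_step(ensemble, forward, y, noise_cov):
    """One generic ensemble Kalman inversion (EKI) update.

    ensemble : (J, d) array, J parameter estimates of dimension d
    forward  : callable u -> (m,) prediction; a stand-in for the
               paper's neural-network surrogate (hypothetical here)
    y        : (m,) observed data
    noise_cov: (m, m) observation-noise covariance
    """
    U = ensemble
    W = np.stack([forward(u) for u in U])     # (J, m) ensemble predictions
    du = U - U.mean(axis=0)                   # parameter deviations
    dw = W - W.mean(axis=0)                   # prediction deviations
    J = U.shape[0]
    C_uw = du.T @ dw / J                      # (d, m) cross-covariance
    C_ww = dw.T @ dw / J                      # (m, m) prediction covariance
    # Kalman-type correction: no gradients of `forward` are needed
    increments = C_uw @ np.linalg.solve(C_ww + noise_cov, (y - W).T)
    return U + increments.T
```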

    Efficient Covariance Matrix Update for Variable Metric Evolution Strategies

    Randomized direct search algorithms for continuous domains, such as Evolution Strategies, are basic tools in machine learning. They are especially needed when the gradient of an objective function (e.g., loss, energy, or reward function) cannot be computed or estimated efficiently. Application areas include supervised and reinforcement learning as well as model selection. These randomized search strategies often rely on normally distributed additive variations of candidate solutions. In order to efficiently search in non-separable and ill-conditioned landscapes, the covariance matrix of the normal distribution must be adapted, amounting to a variable metric method. Consequently, Covariance Matrix Adaptation (CMA) is considered state-of-the-art in Evolution Strategies. In order to sample from the normal distribution, the adapted covariance matrix needs to be decomposed, requiring in general Θ(n^3) operations, where n is the search space dimension. We propose a new update mechanism which can replace a rank-one covariance matrix update and the computationally expensive decomposition of the covariance matrix. The newly developed update rule reduces the computational complexity of the rank-one covariance matrix adaptation to Θ(n^2) without resorting to outdated distributions. We derive new versions of the elitist Covariance Matrix Adaptation Evolution Strategy (CMA-ES) and the multi-objective CMA-ES. These algorithms are equivalent to the original procedures except that the update step for the variable metric distribution scales better in the problem dimension. We also introduce a simplified variant of the non-elitist CMA-ES with the incremental covariance matrix update and investigate its performance. Apart from the reduced time complexity of the distribution update, the algebraic computations involved in all new algorithms are simpler compared to the original versions. The new update rule improves the performance of the CMA-ES for large-scale machine learning problems in which the objective function can be evaluated fast.
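
    The key trick behind such Θ(n^2) schemes is to update a factor A of the covariance (C = A Aᵀ) directly, so sampling never requires a fresh decomposition. The sketch below gives one standard rank-one factor update of this kind; the exact constants and the function name are illustrative assumptions, not quoted from the paper.

```python
import numpy as np

def factor_rank_one_update(A, z, alpha, beta):
    """Given C = A @ A.T, return A_new such that
    A_new @ A_new.T == alpha * C + beta * np.outer(v, v), with v = A @ z.

    Runs in O(n^2), avoiding an O(n^3) re-decomposition of C.
    One standard formulation of a rank-one factor update (illustrative,
    not quoted from the paper). Requires z != 0 and alpha > 0.
    """
    v = A @ z
    s = z @ z                                   # squared norm of z
    t = np.sqrt(1.0 + (beta / alpha) * s)
    c = (np.sqrt(alpha) / s) * (t - 1.0)
    return np.sqrt(alpha) * A + c * np.outer(v, z)
```

    Offspring can then be sampled as x = m + sigma * (A @ rng.standard_normal(n)), with no decomposition step in the sampling loop.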

    Convergence of Sparse Variational Inference in Gaussian Processes Regression

    Gaussian processes are distributions over functions that are versatile and mathematically convenient priors in Bayesian modelling. However, their use is often impeded for data with large numbers of observations, N, due to the cubic (in N) cost of matrix operations used in exact inference. Many solutions have been proposed that rely on M ≪ N inducing variables to form an approximation at a cost of O(NM^2). While the computational cost appears linear in N, the true complexity depends on how M must scale with N to ensure a certain quality of the approximation. In this work, we investigate upper and lower bounds on how M needs to grow with N to ensure high-quality approximations. We show that we can make the KL divergence between the approximate model and the exact posterior arbitrarily small for a Gaussian-noise regression model with M ≪ N. Specifically, for the popular squared exponential kernel and D-dimensional Gaussian distributed covariates, M = O((log N)^D) suffices, and a method with an overall computational cost of O(N (log N)^{2D} (log log N)^2) can be used to perform inference.
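
    For concreteness, the sketch below evaluates a collapsed variational lower bound for Gaussian-noise sparse GP regression at the advertised O(NM^2) cost, never forming an N × N matrix. This is a generic Titsias-style bound of the kind the paper's analysis covers; the function name and the jitter constant are illustrative choices.

```python
import numpy as np

def sparse_gp_elbo(Kmm, Knm, knn_diag, y, sigma2):
    """Collapsed variational bound for sparse GP regression.

    Kmm     : (M, M) kernel matrix on inducing inputs
    Knm     : (N, M) kernel matrix between data and inducing inputs
    knn_diag: (N,) diagonal of the data kernel matrix
    y       : (N,) targets (zero prior mean assumed)
    sigma2  : Gaussian noise variance

    Cost is O(N M^2); no N x N matrix is ever formed. The 1e-8 jitter
    is an illustrative numerical safeguard.
    """
    N, M = Knm.shape
    L = np.linalg.cholesky(Kmm + 1e-8 * np.eye(M))
    A = np.linalg.solve(L, Knm.T) / np.sqrt(sigma2)        # (M, N)
    B = np.eye(M) + A @ A.T                                # (M, M)
    LB = np.linalg.cholesky(B)
    c = np.linalg.solve(LB, A @ y) / np.sqrt(sigma2)       # (M,)
    log_det = N * np.log(2.0 * np.pi * sigma2) + 2.0 * np.log(np.diag(LB)).sum()
    quad = y @ y / sigma2 - c @ c
    trace = knn_diag.sum() / sigma2 - (A * A).sum()        # tr(Knn - Qnn) / sigma2
    return -0.5 * (log_det + quad + trace)
```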

    RandomBoost: Simplified Multi-class Boosting through Randomization

    We propose a novel boosting approach to multi-class classification problems in which, in essence, multiple classes are distinguished by a set of random projection matrices. The approach uses random projections to alleviate the proliferation of binary classifiers typically required to perform multi-class classification. The result is a multi-class classifier with a single vector-valued parameter, irrespective of the number of classes involved. Two variants of this approach are proposed. The first method randomly projects the original data into new spaces, while the second randomly projects the outputs of learned weak classifiers. These methods are not only conceptually simple but also effective and easy to implement. A series of experiments on synthetic, machine learning, and visual recognition data sets demonstrates that our proposed methods compare favorably to existing multi-class boosting algorithms in terms of both convergence rate and classification accuracy.
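
    To make the "single vector-valued parameter" idea concrete, here is a toy sketch in which each class is assigned a fixed random projection and one shared weight vector scores the projected features. The shapes, names, and scoring rule are illustrative assumptions; this is not the paper's boosting procedure.

```python
import numpy as np

rng = np.random.default_rng(0)

def make_class_projections(n_classes, d_in, d_proj):
    # One fixed random projection matrix per class (hypothetical sizes).
    return rng.standard_normal((n_classes, d_proj, d_in)) / np.sqrt(d_proj)

def class_scores(P, w, x):
    """Score every class with a single shared weight vector w:
    class k's score is <w, P[k] @ x>. Adding a class adds only a
    fixed random matrix, never a new learned parameter vector."""
    return np.einsum('kpd,d->kp', P, x) @ w

# Example: 10 classes, 64 input features, 16-dimensional projections.
P = make_class_projections(10, 64, 16)
w = rng.standard_normal(16)                  # the single learned parameter
x = rng.standard_normal(64)
prediction = int(np.argmax(class_scores(P, w, x)))
```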

    Computational Bayesian Methods Applied to Complex Problems in Bio and Astro Statistics

    In this dissertation we apply computational Bayesian methods to three distinct problems. In the first chapter, we address the issue of unrealistic covariance matrices used to estimate collision probabilities. We model covariance matrices with a Bayesian Normal-Inverse-Wishart model, which we fit with Gibbs sampling. In the second chapter, we determine the sample sizes necessary to achieve a particular interval width and establish non-inferiority in the analysis of prevalences using two fallible tests; to this end, we use a third-order asymptotic approximation. In the third chapter, we synthesize evidence across multiple domains from measurements taken longitudinally over time, featuring a substantial amount of structurally missing data, and fit the model with Hamiltonian Monte Carlo in a simulation study to analyze how estimates of a parameter of interest change across sample sizes.
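
    As a generic illustration of the first chapter's approach, the sketch below runs a textbook Gibbs sampler for a multivariate normal likelihood with a normal prior on the mean and an inverse-Wishart prior on the covariance. The model, priors, and names are stand-ins, not the dissertation's exact formulation.

```python
import numpy as np
from scipy.stats import invwishart, multivariate_normal

def gibbs_normal_iw(X, mu0, Lambda0_inv, nu0, Psi0, n_iter=1000):
    """Gibbs sampler alternating between the mean and the covariance
    of a multivariate normal (generic sketch; assumes d >= 2).

    X          : (n, d) data matrix
    mu0        : (d,) prior mean;  Lambda0_inv: (d, d) prior precision
    nu0, Psi0  : inverse-Wishart degrees of freedom and scale matrix
    """
    n, d = X.shape
    xbar = X.mean(axis=0)
    mu = xbar.copy()
    draws = []
    for _ in range(n_iter):
        # Sample the covariance given the current mean.
        R = X - mu
        Sigma = invwishart.rvs(df=nu0 + n, scale=Psi0 + R.T @ R)
        # Sample the mean given the current covariance.
        Sigma_inv = np.linalg.inv(Sigma)
        V = np.linalg.inv(Lambda0_inv + n * Sigma_inv)
        m = V @ (Lambda0_inv @ mu0 + n * Sigma_inv @ xbar)
        mu = multivariate_normal.rvs(mean=m, cov=V)
        draws.append((mu, Sigma))
    return draws
```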

    Advances in System Identification: Gaussian Regression and Robot Inverse Dynamics Learning

    Nonparametric Gaussian regression models are powerful tools for supervised learning problems. Recently they have been introduced in the field of system identification as an alternative to the classical parametric models used in prediction error methods. The focus of this thesis is the analysis and extension of linear Gaussian regression models and their application to the identification of the inverse dynamics of robotic platforms. When Gaussian processes are applied to linear system identification, according to the Bayesian paradigm the impulse response is modeled a priori with a Gaussian distribution encoding the desired structural properties of the dynamical system (e.g. smoothness, BIBO stability, sparsity, etc.). The inference on the impulse response estimate is obtained through the posterior distribution, which combines the information of the a priori distribution with the information given by the data. The Bayesian framework naturally allows adapting the model class and its complexity while also accounting for uncertainty and noise, thus providing a robust means of trading bias against variance. On the other hand, one disadvantage of these nonparametric methods is that, because they directly identify the impulse response of the predictor model, they do not guarantee the stability of the forward model. These general advantages and disadvantages inspired the research in this manuscript.

    A COMPARISON BETWEEN GAUSSIAN REGRESSION AND PARAMETRIC PEM. The term of comparison for these Gaussian regression models is the classical parametric technique. In addition to an analysis of the two approaches in terms of error in fitting the impulse response estimates, we are interested in comparing the confidence intervals around these estimates. A new definition of the confidence intervals is proposed in order to pave the way for a fair comparison between the two approaches. Numerical simulations show that the Bayesian estimates have higher prediction performance.

    ONLINE GAUSSIAN REGRESSION. In an online system identification setting, new data become available at given time steps and real-time estimation requirements have to be satisfied. The goal is to compute the model estimate with low, fixed computational complexity and reduced memory storage. We develop a tailored Bayesian procedure which iteratively updates the quantities needed to compute the marginal likelihood and the impulse response estimate, and which estimates the hyperparameters by computing only one iteration of a suitable optimization algorithm to maximize the marginal likelihood. Both quasi-Newton methods and the EM algorithm are adopted as optimization algorithms. When time-varying systems are considered, the property of "forgetting the past data" is required. Accordingly, we propose two schemes: a temporal window which slides over the data, and a forgetting factor, a variable that exponentially decreases the weight of old data. In particular, we consider the forgetting factor either as a fixed constant or as a variable to be estimated; a sketch of this scheme is given after this abstract. The proposed nonparametric procedures show satisfactory performance compared to the batch algorithm and outperform the classical parametric approaches both in computational time and in adherence of the model estimate to the true one.

    ENFORCING MODEL STABILITY IN NONPARAMETRIC GAUSSIAN REGRESSION. The main idea of the Bayesian approach is to frame linear system identification as predictor estimation in an infinite-dimensional space with the aid of regularization techniques. This approach is based on prediction error minimization and can guarantee the identification of stable predictors. Unfortunately, the stability of the predictors does not in general guarantee the stability of the impulse response of the forward model. Various techniques are proposed that successfully guarantee the stability of this model.

    ONLINE SEMIPARAMETRIC LEARNING FOR INVERSE DYNAMICS MODELING. Dynamic models can be obtained from the first principles of mechanics, using so-called Rigid Body Dynamics. This approach results in a parametric model in which the values of physically meaningful parameters must be provided in order to complete the fixed structure of the model. Alternatively, nonparametric Gaussian regression modeling can be employed, extrapolating the dynamics directly from the experimental data without making unrealistic approximations of the physical system (e.g. assuming linear friction models, ignoring the dynamics of the hydraulic actuators, etc.). Nevertheless, nonparametric models deteriorate in performance when predicting unseen data that are not in the "neighbourhood" of the training dataset. In order to exploit the advantages of both techniques, semiparametric models which combine the parametric and the nonparametric models are analyzed.
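
    To make the online scheme concrete, here is a minimal sketch of a recursive Gaussian-regression update with an exponential forgetting factor, keeping the per-step cost fixed. The class name, the fixed-hyperparameter setting, and the regressor-vector interface are illustrative assumptions; the thesis additionally re-estimates hyperparameters online with one marginal-likelihood optimization step per time instant.

```python
import numpy as np

class OnlineGaussianRegression:
    """Recursive Bayesian linear regression with exponential forgetting.

    prior_cov : (d, d) prior covariance on the impulse response
                (e.g. a kernel-based regularization matrix)
    noise_var : measurement noise variance
    forgetting: factor < 1 that exponentially down-weights old data
    """
    def __init__(self, prior_cov, noise_var, forgetting=0.99):
        d = prior_cov.shape[0]
        self.P_inv = np.linalg.inv(prior_cov)   # prior precision
        self.noise_var = noise_var
        self.lam = forgetting
        self.Phi = np.zeros((d, d))             # discounted sum of phi @ phi.T
        self.b = np.zeros(d)                    # discounted sum of phi * y

    def update(self, phi, y):
        # O(d^2) per sample, independent of how much data has been seen.
        self.Phi = self.lam * self.Phi + np.outer(phi, phi)
        self.b = self.lam * self.b + phi * y

    def estimate(self):
        # Posterior mean of the impulse response given current statistics.
        return np.linalg.solve(self.Phi / self.noise_var + self.P_inv,
                               self.b / self.noise_var)
```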