    Analytic solution and stationary phase approximation for the Bayesian lasso and elastic net

    The lasso and elastic net linear regression models impose a double-exponential prior distribution on the model parameters to achieve regression shrinkage and variable selection, allowing the inference of robust models from large data sets. However, there has been limited success in deriving estimates for the full posterior distribution of regression coefficients in these models, due to the need to evaluate analytically intractable partition function integrals. Here, the Fourier transform is used to express these integrals as complex-valued oscillatory integrals over "regression frequencies". This results in an analytic expansion and stationary phase approximation for the partition functions of the Bayesian lasso and elastic net, where the non-differentiability of the double-exponential prior has so far eluded such an approach. Use of this approximation leads to highly accurate numerical estimates for the expectation values and marginal posterior distributions of the regression coefficients, and allows for Bayesian inference of much higher dimensional models than previously possible. (Comment: 11 pages, 3 figures, plus appendices of 29 pages with 3 supplementary figures.)
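    To fix ideas, the partition function in question is the normalizing constant of the lasso-type posterior. The display below is an illustrative sketch in generic notation (not necessarily the paper's), showing the intractable integral and the standard Fourier representation of the double-exponential factor that makes an oscillatory-integral rewriting possible; the elastic net would add a further quadratic term \lambda_2\lVert\beta\rVert_2^2 to the exponent.

```latex
% Illustrative notation, not necessarily the paper's.
% Bayesian lasso posterior and its partition function Z:
p(\beta \mid y, X) = \frac{1}{Z}
  \exp\!\Big( -\tfrac{1}{2\sigma^2}\lVert y - X\beta \rVert_2^2
              - \lambda \lVert \beta \rVert_1 \Big),
\qquad
Z = \int_{\mathbb{R}^p}
  \exp\!\Big( -\tfrac{1}{2\sigma^2}\lVert y - X\beta \rVert_2^2
              - \lambda \lVert \beta \rVert_1 \Big)\, d\beta .

% Fourier representation of each non-differentiable double-exponential factor,
% which converts Z into an oscillatory integral over "regression frequencies" \omega_j:
e^{-\lambda |\beta_j|}
  = \frac{1}{\pi} \int_{-\infty}^{\infty}
    \frac{\lambda}{\lambda^2 + \omega_j^2}\; e^{\, i \omega_j \beta_j}\, d\omega_j .
```

    Substituting the second display into the first for every coordinate is the kind of rewriting that allows an analytic expansion and a stationary phase approximation to be attempted despite the non-differentiability of the prior at \beta_j = 0.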

    MIBEN: Robust Multiple Imputation with the Bayesian Elastic Net

    Correctly specifying the imputation model when conducting multiple imputation remains one of the most significant challenges in missing data analysis. This dissertation introduces a robust multiple imputation technique, Multiple Imputation with the Bayesian Elastic Net (MIBEN), as a remedy for this difficulty. A Monte Carlo simulation study was conducted to assess the performance of the MIBEN technique and compare it to several state-of-the-art multiple imputation methods.
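    MIBEN itself is a Bayesian method whose implementation is not shown here; as a loose, hypothetical analogue of using an elastic-net regression as the imputation model inside an iterative multiple-imputation loop, one could sketch the following with scikit-learn (all settings are illustrative assumptions, not the MIBEN algorithm).

```python
# Hypothetical sketch only: an iterative multiple-imputation loop with an
# elastic-net imputation model, as a loose frequentist analogue of MIBEN
# (this is NOT the MIBEN algorithm; all settings are illustrative).
import numpy as np
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer
from sklearn.linear_model import ElasticNet

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))
X[rng.random(X.shape) < 0.15] = np.nan          # ~15% of values missing at random

completed = []
for m in range(5):                               # five completed data sets
    imputer = IterativeImputer(
        estimator=ElasticNet(alpha=0.1, l1_ratio=0.5),
        imputation_order="random",               # vary the order so imputations differ
        sample_posterior=False,                  # a Bayesian model would draw from the posterior here
        max_iter=20,
        random_state=m,
    )
    completed.append(imputer.fit_transform(X))

# Analyses would then be run on each completed data set and pooled
# (e.g. via Rubin's rules); that step is omitted from this sketch.
print(len(completed), completed[0].shape)
```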

    Sparse Modeling for Image and Vision Processing

    In recent years, a large amount of multi-disciplinary research has been conducted on sparse models and their applications. In statistics and machine learning, the sparsity principle is used to perform model selection---that is, automatically selecting a simple model among a large collection of them. In signal processing, sparse coding consists of representing data with linear combinations of a few dictionary elements. Subsequently, the corresponding tools have been widely adopted by several scientific communities such as neuroscience, bioinformatics, and computer vision. The goal of this monograph is to offer a self-contained view of sparse modeling for visual recognition and image processing. More specifically, we focus on applications where the dictionary is learned and adapted to data, yielding a compact representation that has been successful in various contexts. (Comment: 205 pages; to appear in Foundations and Trends in Computer Graphics and Vision.)
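    As a minimal illustration of the sparse-coding idea described above (representing data as linear combinations of a few learned dictionary atoms), a hypothetical scikit-learn sketch might look like this; the data and parameter choices are assumptions for demonstration only.

```python
# Minimal dictionary learning / sparse coding sketch (illustrative only).
import numpy as np
from sklearn.decomposition import MiniBatchDictionaryLearning, sparse_encode

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 64))        # e.g. 500 vectorised 8x8 image patches

# Learn an overcomplete dictionary adapted to the data.
dico = MiniBatchDictionaryLearning(
    n_components=100,                 # number of dictionary atoms
    alpha=1.0,                        # sparsity penalty on the codes
    batch_size=32,
    random_state=0,
)
D = dico.fit(X).components_           # shape (100, 64)

# Sparse codes: each patch is approximated by a few atoms.
codes = sparse_encode(X, D, algorithm="lasso_lars", alpha=1.0)
print(codes.shape, float(np.mean(codes != 0)))   # fraction of nonzero coefficients
```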

    L1 methods for shrinkage and correlation

    This dissertation explores the use of the L1 norm in two statistical problems: multiple linear regression and diagnostic checking in time series. In recent years L1 shrinkage methods have become popular in linear regression as they can achieve simultaneous variable selection and parameter estimation. Their objective functions contain a least squares term and an L1 penalty term, which can produce sparse solutions (Fan and Li, 2001). The least absolute shrinkage and selection operator (Lasso) was the first L1-penalized method proposed and has been widely used in practice, but the Lasso estimator has noticeable bias and is inconsistent for variable selection. Zou (2006) proposed the adaptive Lasso and proved its oracle properties under some regularity conditions. We investigate the performance of adaptive Lasso by applying it to the problem of multiple undocumented change-point detection in climate data. Artificial factors such as relocation of weather stations, recalibration of measurement instruments and city growth can cause abrupt mean shifts in historical temperature data. These changes do not reflect the true atmospheric evolution and are unfortunately often undocumented. It is imperative to locate these abrupt mean shifts so that the raw data can be adjusted to display only the true atmospheric evolution. We build a linear model which accounts for long-term temperature change (global warming) through a linear trend and in which p = n (the number of variables equals the number of observations). We apply adaptive Lasso to estimate the underlying sparse model, allowing the trend parameter to be unpenalized in the objective function. The Bayesian Information Criterion (BIC) and the CM criterion (Caussinus and Mestre, 2004) are used to select the final model, and multivariate t simultaneous confidence intervals post-select the change-points detected by adaptive Lasso to attenuate overestimation. Because the oracle properties of adaptive Lasso are obtained under the condition of linear independence between predictor variables, it should be used with caution, since multicollinearity is not uncommon in real data sets. Zou and Hastie (2005) proposed the elastic net, whose objective function involves both L1 and L2 penalties, and claimed its superiority over Lasso in prediction. This procedure can identify a sparse model thanks to the L1 penalty and can tackle multicollinearity thanks to the L2 penalty. Although Lasso and elastic net are favored over ordinary least squares and ridge regression because of their capacity for variable selection, in the presence of multicollinearity ridge regression can outperform both Lasso and elastic net in prediction. The salient point is that no regression method dominates in all cases (Fan and Li, 2001; Zou, 2006; Zou and Hastie, 2005). One major flaw of both Lasso and elastic net is the unnecessary bias introduced by constraining all parameters to be penalized by the same norm. In this dissertation we propose a general and flexible framework for variable selection and estimation in linear regression. Our objective function automatically allows each parameter to be left unpenalized or to be penalized by the L1 norm, the L2 norm, or both, based on parameter significance and variable correlation. The resulting estimator not only identifies the correct set of significant variables with large probability but also has smaller bias for the nonzero parameters. Our procedure is a combinatorial optimization problem which can be solved by exhaustive search or by a genetic algorithm (as a surrogate, to reduce computation time). Since the aim is a descriptive model, BIC is chosen as the model selection criterion.
    Another application of the L1 norm considered in this dissertation is portmanteau testing in time series. The first step in time series regression is to determine whether significant serial correlation is present. If initial investigations indicate significant serial correlation, the second step is to fit an autoregressive moving average (ARMA) process to parameterize the correlation function. Portmanteau tests are commonly used to detect serial correlation or to assess the goodness-of-fit of the ARMA model in these two steps. For small samples the commonly employed Ljung-Box portmanteau test (Ljung and Box, 1978) can have low power, so it is beneficial to have a more powerful small-sample test for detecting significant correlation. We develop such a test based on the Cauchy estimator of correlation: while the usual sample correlation is estimated through the L2 norm, the Cauchy estimator is based on the L1 norm. Asymptotic properties of the test statistic are obtained, and the test compares very favorably with the Box-Pierce/Ljung-Box statistics in detecting autoregressive alternatives.
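    To make the adaptive-Lasso change-point idea concrete, here is a hypothetical sketch under simplifying assumptions (synthetic data, ridge-based initial weights, all coefficients penalized uniformly); it is not the dissertation's procedure, which leaves the trend unpenalized and selects models with BIC and the CM criterion.

```python
# Hypothetical adaptive-Lasso sketch for mean-shift (change-point) detection
# in a temperature-like series with a linear trend; illustrative only.
import numpy as np
from sklearn.linear_model import Ridge, Lasso

rng = np.random.default_rng(1)
n = 120
t = np.arange(n)
y = 0.01 * t + rng.normal(scale=0.5, size=n)
y[70:] += 1.5                                   # one undocumented mean shift at t = 70

# Design: linear trend plus one step indicator per candidate change time (p ~ n).
steps = (t[:, None] >= np.arange(1, n)[None, :]).astype(float)
X = np.column_stack([t / n, steps])

# Stage 1: ridge fit gives initial estimates for the adaptive weights w_j = 1/|b_j|.
b0 = Ridge(alpha=1.0).fit(X, y).coef_
w = 1.0 / (np.abs(b0) + 1e-6)

# Stage 2: weighted Lasso, implemented by rescaling the columns by 1/w_j.
Xw = X / w
fit = Lasso(alpha=0.05, max_iter=50_000).fit(Xw, y)
beta = fit.coef_ / w

shifts = np.nonzero(np.abs(beta[1:]) > 1e-8)[0] + 1   # candidate change times
print("estimated change-points:", shifts)
```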

    A novel prestack sparse azimuthal AVO inversion

    In this paper we demonstrate a new algorithm for sparse prestack azimuthal AVO inversion. A novel Euclidean prior model is developed that at once respects sparseness in the layered earth and smoothness in the model of reflectivity. Recognizing that methods of artificial intelligence and Bayesian computation are playing an ever-increasing role in augmenting the interpretation and analysis of geophysical data, we derive a generalized matrix-variate model of reflectivity in terms of orthogonal basis functions, subject to sparse constraints. This supports a direct application of machine learning methods, in a way that can be mapped back onto the physical principles known to govern reflection seismology. As a demonstration we present an application of these methods to the Marcellus shale. Attributes extracted using the azimuthal inversion are clustered using an unsupervised learning algorithm, and the clusters are interpreted in the context of the Ruger model of azimuthal AVO.
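    The "sparse yet smooth" regularisation described above can be illustrated, very loosely, by a generic linear inversion that combines an L1 penalty on the model with a quadratic smoothness penalty; the sketch below folds the quadratic term into an augmented least-squares system and is an assumption-laden toy, not the paper's Euclidean prior or its azimuthal AVO formulation.

```python
# Toy sparse-and-smooth inversion: min ||d - G m||^2 + l2*||D m||^2 + l1*||m||_1,
# solved as a Lasso on the augmented system [G; sqrt(l2) D] m ~ [d; 0]. Illustrative only.
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
n_data, n_model = 80, 60
G = rng.normal(size=(n_data, n_model))           # stand-in forward operator
m_true = np.zeros(n_model)
m_true[[10, 25, 40]] = [1.0, -0.7, 0.5]          # sparse "reflectivity" spikes
d = G @ m_true + rng.normal(scale=0.05, size=n_data)

# First-difference operator encouraging smoothness between adjacent model cells.
D = np.diff(np.eye(n_model), axis=0)

l1, l2 = 0.02, 0.5
G_aug = np.vstack([G, np.sqrt(l2) * D])
d_aug = np.concatenate([d, np.zeros(D.shape[0])])

fit = Lasso(alpha=l1, fit_intercept=False, max_iter=100_000).fit(G_aug, d_aug)
m_hat = fit.coef_
print("nonzero model cells:", np.nonzero(np.abs(m_hat) > 1e-6)[0])
```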

    Getting the most from medical VOC data using Bayesian feature learning

    The metabolic processes in the body naturally produce a diverse set of Volatile Organic Compounds (VOCs), which are excreted in breath, urine, stool and other biological samples. The VOCs produced are odorous and influenced by disease, meaning olfaction can provide information on a person’s disease state. A variety of instruments exist for performing “artificial olfaction”: measuring a sample, such as patient breath, and producing a high-dimensional output representing the odour. Such instruments may be paired with machine learning techniques to identify properties of interest, such as the presence of a given disease. Research shows good disease-predictive ability of artificial olfaction instrumentation. However, the statistical methods employed are typically off-the-shelf and do not take advantage of prior knowledge of the structure of the high-dimensional data. Since sample sizes are also typically small, this can lead to suboptimal results due to a poorly-learned model. In this thesis we explore ways to get more out of artificial olfaction data. We perform statistical analyses in a medical setting, investigating disease diagnosis from breath, urine and vaginal swab measurements, and illustrating both successful identification and failure cases. We then introduce two new latent variable models constructed for dimension reduction of artificial olfaction data, but which are widely applicable. These models place a Gaussian Process (GP) prior on the mapping from latent variables to observations. Specifying a covariance function for the GP prior is an intuitive way for a user to describe their prior knowledge of the data covariance structure. We also enable an approximate posterior and marginal likelihood to be computed, and introduce a sparse variant. Both models have been made available in the R package stpca hosted at https://github.com/JimSkinner/stpca. In experiments with artificial olfaction data, these models outperform standard feature learning methods in a predictive pipeline.
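    One way to see why a covariance function is "an intuitive way to describe prior knowledge of the data covariance structure" is to build a GP kernel over the sensor/feature axis and draw loading vectors from it: neighbouring sensor channels are then encouraged to load similarly. The numpy sketch below is a hypothetical illustration of that idea only, not the stpca model itself.

```python
# Illustration: an RBF covariance over feature (sensor-channel) positions yields
# smooth "loading" vectors when used as a Gaussian prior. Not the stpca model.
import numpy as np

def rbf_kernel(x, lengthscale=5.0, variance=1.0, jitter=1e-8):
    """Squared-exponential covariance matrix over 1-D feature positions."""
    d2 = (x[:, None] - x[None, :]) ** 2
    return variance * np.exp(-0.5 * d2 / lengthscale**2) + jitter * np.eye(len(x))

rng = np.random.default_rng(0)
positions = np.arange(100.0)            # e.g. 100 ordered sensor channels
K = rbf_kernel(positions, lengthscale=8.0)

# Draw a few loading vectors w ~ N(0, K): smooth across neighbouring channels,
# in contrast with the independent (white-noise) prior implicit in ordinary PCA.
L = np.linalg.cholesky(K)
smooth_loadings = L @ rng.normal(size=(100, 3))
white_loadings = rng.normal(size=(100, 3))
print(smooth_loadings.shape, white_loadings.shape)
```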

    Methods of Uncertainty Quantification for Physical Parameters

    Uncertainty Quantification (UQ) is an umbrella term referring to a broad class of methods which typically involve the combination of computational modeling, experimental data and expert knowledge to study a physical system. A parameter, in the usual statistical sense, is said to be physical if it has a meaningful interpretation with respect to the physical system. Physical parameters can be viewed as inherent properties of a physical process and have a corresponding true value. Statistical inference for physical parameters is a challenging problem in UQ due to the inadequacy of the computer model. In this thesis, we provide a comprehensive overview of the existing relevant UQ methodology. The computer model is often time-consuming, proprietary or classified, and therefore a cheap-to-evaluate emulator is needed. When the input space is large, Gaussian process (GP) emulation may be infeasible, and the predominant local approximate GP (LA-GP) framework is too slow for prediction when MCMC is used for posterior sampling. We propose two modifications to the LA-GP framework which can be used to construct a cheap-to-evaluate emulator for the computer model, offering the user a simple and flexible time-for-memory exchange. When the field data consist of measurements across a set of experiments, it is common for a set of computer model inputs to represent measurements of a physical component, recorded with error. When this structure is present, we propose a new metric for identifying overfitting and a related regularization prior distribution, and we show that these lead to improved inference for compressibility parameters of tantalum. We propose an approximate Bayesian framework, referred to as modularization, which is shown to be useful for exploring dependencies between physical and nuisance parameters with respect to the inadequacy of the computer model and the available prior information. Finally, we discuss a cross-validation framework, modified to account for spatial (or temporal) structure, and show that it can aid in the construction of empirical Bayes priors for the model discrepancy. This CV framework can be coupled with modularization to assess the sensitivity of physical parameters to discrepancy-related modeling choices.
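    As a small, generic illustration of the emulator-plus-MCMC workflow referred to above (not the thesis's LA-GP modifications or modularization scheme), one might fit a GP surrogate to a handful of expensive simulator runs and then sample a physical parameter's posterior against the cheap surrogate; everything in the sketch below, including the toy "computer model", is an assumption for demonstration.

```python
# Toy illustration: GP emulator of an "expensive" computer model, then
# random-walk Metropolis for a single physical parameter theta against the
# cheap emulator. Not the thesis's LA-GP or modularization machinery.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

rng = np.random.default_rng(0)

def computer_model(theta):                  # stand-in for the expensive simulator
    return np.sin(3.0 * theta) + 0.5 * theta

# Small design of simulator runs -> cheap-to-evaluate GP emulator.
design = np.linspace(0.0, 2.0, 12)[:, None]
runs = computer_model(design).ravel()
gp = GaussianProcessRegressor(kernel=RBF(length_scale=0.5), alpha=1e-6,
                              normalize_y=True).fit(design, runs)

# One noisy field observation of the physical system.
theta_true, obs_sd = 1.2, 0.05
y_obs = computer_model(theta_true) + rng.normal(scale=obs_sd)

def log_post(theta):                        # flat prior on [0, 2], Gaussian likelihood
    if not 0.0 <= theta <= 2.0:
        return -np.inf
    pred = gp.predict(np.array([[theta]]))[0]
    return -0.5 * ((y_obs - pred) / obs_sd) ** 2

# Metropolis sampling uses the emulator, never the expensive simulator.
theta, draws = 1.0, []
for _ in range(5000):
    proposal = theta + rng.normal(scale=0.1)
    if np.log(rng.random()) < log_post(proposal) - log_post(theta):
        theta = proposal
    draws.append(theta)
print("posterior mean of theta:", np.mean(draws[1000:]))
```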