31 research outputs found

    Bayesian variable selection and the Swendsen-Wang algorithm

    Get PDF
    The need to explore model uncertainty in linear regression models with many predictors has motivated improvements in Markov chain Monte Carlo sampling algorithms for Bayesian variable selection. Traditional sampling algorithms for Bayesian variable selection may perform poorly when there are severe multicollinearities amongst the predictors. In this paper we describe a new sampling method based on an analogy with the Swendsen-Wang algorithm for the Ising model, and which can give substantial improvements over traditional sampling schemes in the presence of multicollinearity. In linear regression with a given set of potential predictors we can index different possible models by a binary parameter vector which indicates which of the predictors are included or excluded. By thinking of the posterior distribution of this parameter as a binary spatial field, we can approximate the posterior distribution by an Ising model and then apply a modified Swendsen-Wang algorithm for sampling from the posterior where dependence among parameters is reduced by conditioning on auxiliary variables. Performance of the method is described for both simulated and real data

    Fast and Accurate Algorithm for Eye Localization for Gaze Tracking in Low Resolution Images

    Full text link
    Iris centre localization in low-resolution visible images is a challenging problem in computer vision community due to noise, shadows, occlusions, pose variations, eye blinks, etc. This paper proposes an efficient method for determining iris centre in low-resolution images in the visible spectrum. Even low-cost consumer-grade webcams can be used for gaze tracking without any additional hardware. A two-stage algorithm is proposed for iris centre localization. The proposed method uses geometrical characteristics of the eye. In the first stage, a fast convolution based approach is used for obtaining the coarse location of iris centre (IC). The IC location is further refined in the second stage using boundary tracing and ellipse fitting. The algorithm has been evaluated in public databases like BioID, Gi4E and is found to outperform the state of the art methods.Comment: 12 pages, 10 figures, IET Computer Vision, 201

    Regularization in regression: comparing Bayesian and frequentist methods in a poorly informative situation

    Full text link
    Using a collection of simulated an real benchmarks, we compare Bayesian and frequentist regularization approaches under a low informative constraint when the number of variables is almost equal to the number of observations on simulated and real datasets. This comparison includes new global noninformative approaches for Bayesian variable selection built on Zellner's g-priors that are similar to Liang et al. (2008). The interest of those calibration-free proposals is discussed. The numerical experiments we present highlight the appeal of Bayesian regularization methods, when compared with non-Bayesian alternatives. They dominate frequentist methods in the sense that they provide smaller prediction errors while selecting the most relevant variables in a parsimonious way

    A Bayesian Heteroscedastic GLM with Application to fMRI Data with Motion Spikes

    Full text link
    We propose a voxel-wise general linear model with autoregressive noise and heteroscedastic noise innovations (GLMH) for analyzing functional magnetic resonance imaging (fMRI) data. The model is analyzed from a Bayesian perspective and has the benefit of automatically down-weighting time points close to motion spikes in a data-driven manner. We develop a highly efficient Markov Chain Monte Carlo (MCMC) algorithm that allows for Bayesian variable selection among the regressors to model both the mean (i.e., the design matrix) and variance. This makes it possible to include a broad range of explanatory variables in both the mean and variance (e.g., time trends, activation stimuli, head motion parameters and their temporal derivatives), and to compute the posterior probability of inclusion from the MCMC output. Variable selection is also applied to the lags in the autoregressive noise process, making it possible to infer the lag order from the data simultaneously with all other model parameters. We use both simulated data and real fMRI data from OpenfMRI to illustrate the importance of proper modeling of heteroscedasticity in fMRI data analysis. Our results show that the GLMH tends to detect more brain activity, compared to its homoscedastic counterpart, by allowing the variance to change over time depending on the degree of head motion

    Bayesian variable selection for high dimensional generalized linear models: convergence rates of the fitted densities

    Full text link
    Bayesian variable selection has gained much empirical success recently in a variety of applications when the number KK of explanatory variables (x1,...,xK)(x_1,...,x_K) is possibly much larger than the sample size nn. For generalized linear models, if most of the xjx_j's have very small effects on the response yy, we show that it is possible to use Bayesian variable selection to reduce overfitting caused by the curse of dimensionality KnK\gg n. In this approach a suitable prior can be used to choose a few out of the many xjx_j's to model yy, so that the posterior will propose probability densities pp that are ``often close'' to the true density pp^* in some sense. The closeness can be described by a Hellinger distance between pp and pp^* that scales at a power very close to n1/2n^{-1/2}, which is the ``finite-dimensional rate'' corresponding to a low-dimensional situation. These findings extend some recent work of Jiang [Technical Report 05-02 (2005) Dept. Statistics, Northwestern Univ.] on consistency of Bayesian variable selection for binary classification.Comment: Published in at http://dx.doi.org/10.1214/009053607000000019 the Annals of Statistics (http://www.imstat.org/aos/) by the Institute of Mathematical Statistics (http://www.imstat.org

    Integrating biological knowledge into variable selection : an empirical Bayes approach with an application in cancer biology

    Get PDF
    Background: An important question in the analysis of biochemical data is that of identifying subsets of molecular variables that may jointly influence a biological response. Statistical variable selection methods have been widely used for this purpose. In many settings, it may be important to incorporate ancillary biological information concerning the variables of interest. Pathway and network maps are one example of a source of such information. However, although ancillary information is increasingly available, it is not always clear how it should be used nor how it should be weighted in relation to primary data. Results: We put forward an approach in which biological knowledge is incorporated using informative prior distributions over variable subsets, with prior information selected and weighted in an automated, objective manner using an empirical Bayes formulation. We employ continuous, linear models with interaction terms and exploit biochemically-motivated sparsity constraints to permit exact inference. We show an example of priors for pathway- and network-based information and illustrate our proposed method on both synthetic response data and by an application to cancer drug response data. Comparisons are also made to alternative Bayesian and frequentist penalised-likelihood methods for incorporating network-based information. Conclusions: The empirical Bayes method proposed here can aid prior elicitation for Bayesian variable selection studies and help to guard against mis-specification of priors. Empirical Bayes, together with the proposed pathway-based priors, results in an approach with a competitive variable selection performance. In addition, the overall procedure is fast, deterministic, and has very few user-set parameters, yet is capable of capturing interplay between molecular players. The approach presented is general and readily applicable in any setting with multiple sources of biological prior knowledge

    Computational Efficiency in Bayesian Model and Variable Selection

    Get PDF
    Large scale Bayesian model averaging and variable selection exercises present, despite the great increase in desktop computing power, considerable computational challenges. Due to the large scale it is impossible to evaluate all possible models and estimates of posterior probabilities are instead obtained from stochastic (MCMC) schemes designed to converge on the posterior distribution over the model space. While this frees us from the requirement of evaluating all possible models the computational effort is still substantial and efficient implementation is vital. Efficient implementation is concerned with two issues: the efficiency of the MCMC algorithm itself and efficient computation of the quantities needed to obtain a draw from the MCMC algorithm. We evaluate several different MCMC algorithms and find that relatively simple algorithms with local moves perform competitively except possibly when the data is highly collinear. For the second aspect, efficient computation within the sampler, we focus on the important case of linear models where the computations essentially reduce to least squares calculations. Least squares solvers that update a previous model estimate are appealing when the MCMC algorithm makes local moves and we find that the Cholesky update is both fast and accurate.Bayesian Model Averaging; Sweep operator; Cholesky decomposition; QR decomposition; Swendsen-Wang algorithm

    Computational Efficiency in Bayesian Model and Variable Selection

    Get PDF
    This paper is concerned with the efficient implementation of Bayesian model averaging (BMA) and Bayesian variable selection, when the number of candidate variables and models is large, and estimation of posterior model probabilities must be based on a subset of the models. Efficient implementation is concerned with two issues, the efficiency of the MCMC algorithm itself and efficient computation of the quantities needed to obtain a draw from the MCMC algorithm. For the first aspect, it is desirable that the chain moves well and quickly through the model space and takes draws from regions with high probabilities. In this context there is a natural trade-off between local moves, which make use of the current parameter values to propose plausible values for model parameters, and more global transitions, which potentially allow exploration of the distribution of interest in fewer steps, but where each step is more computationally intensive. We assess the convergence properties of simple samplers based on local moves and some recently proposed algorithms intended to improve on the basic samplers. For the second aspect, efficient computation within the sampler, we focus on the important case of linear models where the computations essentially reduce to least squares calculations. When the chain makes local moves, adding or dropping a variable, substantial gains in efficiency can be made by updating the previous least squares solution.
    corecore