3,582 research outputs found

    Accelerated Structure-Aware Sparse Bayesian Learning for 3D Electrical Impedance Tomography

    Get PDF

    Self-Validated Ensemble Modelling

    Get PDF
    An important objective when performing designed experiments is to build models that predict future performance of a system in study; e.g. predict future yields of a bio-process used to manufacture therapeutic proteins. Because experimentation is costly experimental designs are structured to be efficient in terms of the number of trials while providing substantial information about the behavior of the physical system. The strategy to build accurate predictive models in larger data sets is to partition the data into a training set, used to fit the model, and a validation set to access prediction performance. Models are selected that have the lowest prediction error on the validation set. However, designed experiments are usually small in sample size and have a fixed structure which precludes partitioning of any kind; the entire set must be used for training. Contemporary methods use information criteria like the AICc or BIC with model algorithms such as Forward Selection or Lasso to select candidate models. These surrogate prediction measures often produce models with poor prediction performance relative to models selected using a validation procedure such ascross validation. This approach also uses a single fit from a model algorithm which we show to be insufficient. We propose a novel approach that allows the original data set to function as both a training set and a validation set. We accomplish this auto-validation strategy by employing a unique fractionally re-weighted bootstrapping technique. The weighting scheme is structured to induce anti-correlation between the original set and the auto-validation copy. We randomly assign new fractional weights using the bootstrap algorithm and fit a predictive model. This procedure is iterated many times producing a new model each time. The final model is the average of these models. We refer to this new methodology as Self-Validated Ensemble Modeling (SVEM). In this dissertation we investigate the performance of the SVEM algorithm across various scenarios: different model selection algorithms, different designs with varying sample sizes, model noise levels, and sparsity. This investigation shows that SVEM outperforms contemporary one-shot model selection approaches

    Fully Bayesian Logistic Regression with Hyper-Lasso Priors for High-dimensional Feature Selection

    Full text link
    High-dimensional feature selection arises in many areas of modern science. For example, in genomic research we want to find the genes that can be used to separate tissues of different classes (e.g. cancer and normal) from tens of thousands of genes that are active (expressed) in certain tissue cells. To this end, we wish to fit regression and classification models with a large number of features (also called variables, predictors). In the past decade, penalized likelihood methods for fitting regression models based on hyper-LASSO penalization have received increasing attention in the literature. However, fully Bayesian methods that use Markov chain Monte Carlo (MCMC) are still in lack of development in the literature. In this paper we introduce an MCMC (fully Bayesian) method for learning severely multi-modal posteriors of logistic regression models based on hyper-LASSO priors (non-convex penalties). Our MCMC algorithm uses Hamiltonian Monte Carlo in a restricted Gibbs sampling framework; we call our method Bayesian logistic regression with hyper-LASSO (BLRHL) priors. We have used simulation studies and real data analysis to demonstrate the superior performance of hyper-LASSO priors, and to investigate the issues of choosing heaviness and scale of hyper-LASSO priors.Comment: 33 pages. arXiv admin note: substantial text overlap with arXiv:1308.469
    • …
    corecore