Search CORE

Variable selection for BART: An application to gene regulation

Author: Bleich Justin
George Edward I.
Jensen Shane T.
Kapelner Adam
Publication venue: Institute of Mathematical Statistics
Publication date: 01/01/2014

We consider the task of discovering gene regulatory networks, which are defined as sets of genes and the corresponding transcription factors that regulate their expression levels. This can be viewed as a variable selection problem, potentially a high-dimensional one. Variable selection is especially challenging in high-dimensional settings, where it is difficult to detect subtle individual effects and interactions between predictors. Bayesian Additive Regression Trees [BART, Ann. Appl. Stat. 4 (2010) 266-298] provide a novel nonparametric alternative to parametric regression approaches such as the lasso or stepwise regression, especially when the number of relevant predictors is sparse relative to the total number of available predictors and the underlying relationships are nonlinear. We develop a principled permutation-based inferential approach for determining when the effect of a selected predictor is likely to be real. Going further, we adapt the BART procedure to incorporate prior information about variable importance. We present simulations demonstrating that our method compares favorably to existing parametric and nonparametric procedures in a variety of data settings. To demonstrate the potential of our approach in a biological context, we apply it to the task of inferring the gene regulatory network in yeast (Saccharomyces cerevisiae). Compared with other variable selection methods, our BART-based procedure is best able to recover the subset of covariates with the largest signal. The methods developed in this work are readily available in the R package bartMachine.

Comment: Published at http://dx.doi.org/10.1214/14-AOAS755 in the Annals of Applied Statistics (http://www.imstat.org/aoas/) by the Institute of Mathematical Statistics (http://www.imstat.org).
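A permutation rule of the kind described above (permute the response, recompute every predictor's importance, and keep a predictor only if its observed importance beats an upper quantile of its own null distribution) can be sketched as follows. This is a minimal sketch under stated assumptions, not bartMachine's implementation: fitting BART is out of scope, so the hypothetical `score_fn` argument stands in for BART's per-variable importance (assumed here to be inclusion proportions that sum to 1), and the demo swaps in normalized absolute correlations, which only capture linear effects. The names `permutation_select` and `abs_corr_proportions` are illustrative.

```python
import math
import random


def abs_corr_proportions(X, y):
    """Stand-in importance score (NOT BART): |Pearson correlation| of each
    column with y, normalized to sum to 1 like inclusion proportions."""
    n = len(y)
    my = sum(y) / n
    sy = math.sqrt(sum((v - my) ** 2 for v in y))
    scores = []
    for k in range(len(X[0])):
        col = [row[k] for row in X]
        mx = sum(col) / n
        sx = math.sqrt(sum((v - mx) ** 2 for v in col))
        cov = sum((a - mx) * (b - my) for a, b in zip(col, y))
        scores.append(abs(cov) / (sx * sy) if sx > 0 and sy > 0 else 0.0)
    total = sum(scores)
    if total == 0:
        return [1.0 / len(scores)] * len(scores)
    return [s / total for s in scores]


def permutation_select(X, y, score_fn, n_perm=100, alpha=0.05, seed=0):
    """Keep predictor k if its observed score exceeds the (1 - alpha)
    empirical quantile of its own null scores, obtained by re-running
    score_fn on randomly permuted copies of y."""
    rng = random.Random(seed)
    observed = score_fn(X, y)
    null = [[] for _ in observed]
    for _ in range(n_perm):
        y_perm = list(y)
        rng.shuffle(y_perm)  # destroys any association between X and y
        for k, s in enumerate(score_fn(X, y_perm)):
            null[k].append(s)
    selected = []
    for k, obs in enumerate(observed):
        dist = sorted(null[k])
        # ceil((1 - alpha) * n_perm)-th order statistic, as a 0-based index
        idx = min(len(dist) - 1, max(0, math.ceil((1 - alpha) * len(dist)) - 1))
        if obs > dist[idx]:
            selected.append(k)
    return selected


# Toy data: only column 0 drives the response.
rng = random.Random(1)
X = [[rng.gauss(0, 1) for _ in range(5)] for _ in range(200)]
y = [3 * row[0] + rng.gauss(0, 1) for row in X]
selected = permutation_select(X, y, abs_corr_proportions)
print(selected)
```

Each predictor is compared against its own null distribution, so a predictor that tends to score highly even under permutation needs a correspondingly higher observed score to be kept.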

arXiv.org e-Print Archive

CiteSeerX

ScholarlyCommons@Penn

Doubly Robust Inference when Combining Probability and Non-probability Samples with High-dimensional Data

Author: Bang
Berger
Berger
Bethlehem
Breidt
Brewer
Brookhart
Buchanan
Cao
Chen
Chen
Chen
Chernozhukov
Chipperfield
Conti
De
Deville
DiSogra
Elliott
Fan
Fan
Farrell
Friedman
Fuller
Gao
Grafström
Han
Hunter
Hájek
Johnson
Kang
Keiding
Kim
Kim
Kott
Kott
Lee
McConville
Meng
O’Muircheartaigh
Patrick
Rivers
Rosenbaum
Shao
Shortreed
Stuart
Stuart
Tillé
Tsiatis
Valliant
Publication date: 23/08/2019

Non-probability samples have become increasingly popular in survey statistics but may suffer from selection biases that limit the generalizability of results to the target population. We consider integrating a non-probability sample with a probability sample that provides high-dimensional, representative covariate information about the target population. We propose a two-step approach for variable selection and finite population inference. In the first step, we use penalized estimating equations with folded-concave penalties to select important variables for both the sampling score (the probability of selection into the non-probability sample) and the outcome model. We show that the penalized estimating equation approach enjoys the selection consistency property for general probability samples. The main technical hurdle arises from the possible dependence of the sample under the finite population framework. To overcome this challenge, we construct martingales, which enable us to apply the Bernstein concentration inequality for martingales. In the second step, we focus on a doubly robust estimator of the finite population mean and re-estimate the nuisance model parameters by minimizing the asymptotic squared bias of the doubly robust estimator. This estimation strategy mitigates possible first-step selection errors and renders the doubly robust estimator root-n consistent if either the sampling probability model or the outcome model is correctly specified.
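Doubly robust estimators in this setting typically add an inverse-probability-weighted residual term from the non-probability sample A to a design-weighted prediction term from the probability sample B. The sketch below computes one common form of such an estimator, assuming the estimated sampling scores pi(x_i), outcome predictions m(x_i), design weights d_i, and a known population size N are already available. The function name `dr_mean` is hypothetical, and the paper's penalized first step and bias-minimizing re-estimation are not reproduced; its exact estimator may differ (for example, in how N is handled).

```python
def dr_mean(y_A, m_A, pi_A, m_B, d_B, N):
    """One common doubly robust form of a finite-population mean:

        mu_DR = (1/N) * sum_{i in A} (y_i - m(x_i)) / pi(x_i)
              + (1/N) * sum_{i in B} d_i * m(x_i)

    y_A, m_A, pi_A: outcomes, outcome-model predictions, and estimated
        sampling scores for units in the non-probability sample A.
    m_B, d_B: outcome-model predictions and design weights for units in
        the probability sample B (covariates observed, outcome not).
    N: finite-population size (assumed known here).
    """
    if not (len(y_A) == len(m_A) == len(pi_A)):
        raise ValueError("y_A, m_A and pi_A must have the same length")
    if len(m_B) != len(d_B):
        raise ValueError("m_B and d_B must have the same length")
    if any(p <= 0 or p > 1 for p in pi_A):
        raise ValueError("sampling scores must lie in (0, 1]")
    # Residual term corrects bias in m using inverse sampling-score weights.
    residual_term = sum((y - m) / p for y, m, p in zip(y_A, m_A, pi_A))
    # Prediction term projects the outcome model onto the population via B.
    prediction_term = sum(d * m for m, d in zip(m_B, d_B))
    return (residual_term + prediction_term) / N


print(dr_mean([3.0, 5.0], [2.0, 4.0], [0.5, 0.25], [1.0, 3.0], [2.0, 2.0], N=4))
```

The structure explains the double robustness: if the outcome model is correct, the residuals average out near zero; if the sampling scores are correct, the weighted residuals correct the bias of a misspecified outcome model.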

arXiv.org e-Print Archive

Digital Repository @ Iowa State University (ISU)

Crossref

NSL-BLRL: Efficient Cache Warmup for Sampled Processor Simulation

Author: De Bosschere Koen
Eeckhout Lieven
Hellebaut Filip
Van Ertvelde Luk
Publication venue: IEEE Computer Society
Publication date: 01/01/2006

Ghent University Academic Bibliography

Bayesian shrinkage in mixture-of-experts models: identifying robust determinants of class membership

Author: Zens Gregor
Publication venue: Springer Science and Business Media LLC
Publication date: 13/02/2019

A method for implicit variable selection in mixture-of-experts frameworks is proposed. We introduce a prior structure in which information is taken from a set of independent covariates. Robust class membership predictors are identified using a normal-gamma prior. The resulting model setup is used in a finite mixture of Bernoulli distributions to find homogeneous clusters of women in Mozambique based on their information sources on HIV. Fully Bayesian inference is carried out via the implementation of a Gibbs sampler.
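The abstract does not spell out the prior. For orientation only, one widely used parameterization of the normal-gamma shrinkage prior on a generic coefficient $\beta_j$ is given below, with the gamma distribution in shape-rate form; the symbols $\psi_j$, $\vartheta$ and $\lambda$ are assumed notation, not necessarily the paper's.

```latex
\beta_j \mid \psi_j \sim \mathcal{N}(0, \psi_j), \qquad
\psi_j \sim \mathcal{G}\!\left(\vartheta, \tfrac{\vartheta \lambda}{2}\right)
\quad\Longrightarrow\quad
\mathbb{E}[\psi_j] = \operatorname{Var}(\beta_j) = \frac{2}{\lambda}.
```

Smaller values of $\vartheta$ place more prior mass near zero while leaving heavier tails, which lets weak predictors be shrunk strongly while strong ones largely escape shrinkage.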

Elektronische Publikationen der Wirtschaftsuniversität Wien

Bayesian Inference under Cluster Sampling with Probability Proportional to Size

Author: Little RJA
Meeden G
Reiter JP
Wolter KM
Publication venue: Wiley
Publication date: 02/10/2017

Cluster sampling is common in survey practice, and the corresponding inference has been predominantly design-based. We develop a Bayesian framework for cluster sampling and account for the design effect in the outcome modeling. We consider a two-stage cluster sampling design in which clusters are first selected with probability proportional to cluster size, and units are then randomly sampled within the selected clusters. Challenges arise when the sizes of nonsampled clusters are unknown. We propose nonparametric and parametric Bayesian approaches for predicting the unknown cluster sizes, performing this inference jointly with the model for the survey outcome. Simulation studies show that the integrated Bayesian approach outperforms classical methods, yielding efficiency gains. We use Stan for computing and apply the proposal to the Fragile Families and Child Wellbeing study as an illustration of complex survey inference in health surveys.
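The two-stage design described above can be sketched on the sampling side only; the Bayesian models for the unknown cluster sizes and the outcome are not reproduced. This sketch assumes all cluster sizes are known at design time, uses systematic PPS as the first-stage scheme (one of several without-replacement PPS methods, swapped in for illustration), and draws a simple random sample of a fixed m units per selected cluster. The function names are hypothetical.

```python
import random
from itertools import accumulate


def pps_inclusion_probs(sizes, n):
    """First-stage inclusion probabilities n * M_i / sum(M) for a
    without-replacement PPS design; every value must be <= 1."""
    total = sum(sizes)
    probs = [n * m_i / total for m_i in sizes]
    if any(p > 1 for p in probs):
        raise ValueError("n * M_i / sum(M) exceeds 1 for some cluster")
    return probs


def systematic_pps(sizes, n, rng):
    """Systematic PPS: lay clusters end to end on [0, total), take n
    equally spaced points after a random start, and select the cluster
    each point falls in. Cluster i is hit with probability n * M_i / total."""
    pps_inclusion_probs(sizes, n)  # validates the design
    total = sum(sizes)
    step = total / n
    start = rng.random() * step
    cum = list(accumulate(sizes))
    selected, i = [], 0
    for k in range(n):
        point = start + k * step
        # advance to the first cluster whose interval [cum[i-1], cum[i]) holds point
        while i < len(cum) - 1 and cum[i] <= point:
            i += 1
        selected.append(i)
    return selected


def two_stage_sample(sizes, n, m, seed=0):
    """Stage 1: n clusters by systematic PPS. Stage 2: simple random sample
    of m units (without replacement) inside each selected cluster."""
    rng = random.Random(seed)
    clusters = systematic_pps(sizes, n, rng)
    return {c: sorted(rng.sample(range(sizes[c]), m)) for c in clusters}


sizes = [10, 20, 30, 40]
print(pps_inclusion_probs(sizes, 2))
print(two_stage_sample(sizes, n=2, m=5, seed=3))
```

With a fixed m no larger than any cluster, every unit's overall inclusion probability is (n M_i / sum(M)) * (m / M_i) = n m / sum(M), so the design is self-weighting; this is why the sizes of the selected clusters carry information about the design.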

arXiv.org e-Print Archive

Crossref

Deep Blue Documents at the University of Michigan