Bayesian Model Selection in Complex Linear Systems, as Illustrated in Genetic Association Studies
Motivated by examples from genetic association studies, this paper considers
the model selection problem in a general complex linear model system and in a
Bayesian framework. We discuss formulating model selection problems and
incorporating context-dependent a priori information through different
levels of prior specifications. We also derive analytic Bayes factors and their
approximations to facilitate model selection and discuss their theoretical and
computational properties. We demonstrate our Bayesian approach based on an
implemented Markov Chain Monte Carlo (MCMC) algorithm in simulations and a real
data application of mapping tissue-specific eQTLs. Our novel results on Bayes
factors provide a general framework to perform efficient model comparisons in
complex linear model systems.
Variable Screening for High Dimensional Time Series
Variable selection is a widely studied problem in high dimensional
statistics, primarily since estimating the precise relationship between the
covariates and the response is of great importance in many scientific
disciplines. However, most of the theory and methods developed towards this goal
for the linear model invoke the assumption of iid sub-Gaussian covariates and
errors. This paper analyzes the theoretical properties of Sure Independence
Screening (SIS) (Fan and Lv [J. R. Stat. Soc. Ser. B Stat. Methodol. 70 (2008)
849-911]) for high dimensional linear models with dependent and/or heavy tailed
covariates and errors. We also introduce a generalized least squares screening
(GLSS) procedure which utilizes the serial correlation present in the data. By
utilizing this serial correlation when estimating our marginal effects, GLSS is
shown to outperform SIS in many cases. For both procedures we prove sure
screening properties, which depend on the moment conditions, and the strength
of dependence in the error and covariate processes, amongst other factors.
Additionally, combining these screening procedures with the adaptive Lasso is
analyzed. Dependence is quantified by functional dependence measures (Wu [Proc.
Natl. Acad. Sci. USA 102 (2005) 14150-14154]), and the results rely on the use
of Nagaev-type and exponential inequalities for dependent random variables. We
also conduct simulations to demonstrate the finite sample performance of these
procedures, and include a real data application of forecasting the US inflation
rate. Comment: Published in the Electronic Journal of Statistics
(https://projecteuclid.org/euclid.ejs/1519700498)
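
As a rough illustration of the marginal screening idea behind SIS, the sketch below ranks covariates by absolute sample correlation with the response and keeps the top d. It is a minimal toy version under stated assumptions: the data are simulated iid Gaussian (not the dependent or heavy-tailed setting the paper studies), the function names `abs_corr` and `sis_screen` and the choice d = 5 are hypothetical, and the GLSS procedure is not implemented.

```python
import math
import random


def abs_corr(x, y):
    """Absolute sample (Pearson) correlation between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cx = [v - mx for v in x]
    cy = [v - my for v in y]
    denom = math.sqrt(sum(v * v for v in cx) * sum(v * v for v in cy))
    if denom == 0.0:
        return 0.0  # constant column: treat as uninformative
    return abs(sum(a * b for a, b in zip(cx, cy))) / denom


def sis_screen(X, y, d):
    """Keep the d covariates with the largest marginal correlation with y.

    X is a list of rows; the return value is a sorted list of column indices.
    """
    p = len(X[0])
    scores = [abs_corr([row[j] for row in X], y) for j in range(p)]
    ranked = sorted(range(p), key=lambda j: scores[j], reverse=True)
    return sorted(ranked[:d])


# Toy data (assumption): only columns 2 and 7 enter the true model.
rng = random.Random(0)
n, p = 200, 50
X = [[rng.gauss(0, 1) for _ in range(p)] for _ in range(n)]
y = [3.0 * row[2] - 2.0 * row[7] + rng.gauss(0, 1) for row in X]

selected = sis_screen(X, y, d=5)
print(selected)
```

Screening of this kind is usually followed by a second-stage method on the retained covariates; the abstract mentions the adaptive Lasso for that role.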
Extensions of stability selection using subsamples of observations and covariates
We introduce extensions of stability selection, a method to stabilise
variable selection methods introduced by Meinshausen and Bühlmann (J R Stat
Soc 72:417-473, 2010). We propose to apply a base selection method repeatedly
to random observation subsamples and covariate subsets under scrutiny, and to
select covariates based on their selection frequency. We analyse the effects
and benefits of these extensions. Our analysis generalizes the theoretical
results of Meinshausen and Bühlmann (J R Stat Soc 72:417-473, 2010) from the
case of half-samples to subsamples of arbitrary size. We study, in a
theoretical manner, the effect of taking random covariate subsets using a
simplified score model. Finally, we validate these extensions in numerical
experiments on both synthetic and real datasets, and compare the obtained
results in detail to the original stability selection method. Comment: accepted for publication in Statistics and Computing
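
The procedure described in this abstract (repeatedly applying a base selection method to random observation subsamples and covariate subsets, then selecting by frequency) might be sketched as follows. This is an illustrative sketch, not the authors' implementation: the base selector (top-k absolute marginal correlation), the normalisation of the selection frequency by the number of times a covariate was drawn, and all parameter names and defaults are assumptions made here.

```python
import math
import random


def abs_corr(x, y):
    """Absolute sample (Pearson) correlation between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cx = [v - mx for v in x]
    cy = [v - my for v in y]
    denom = math.sqrt(sum(v * v for v in cx) * sum(v * v for v in cy))
    return 0.0 if denom == 0.0 else abs(sum(a * b for a, b in zip(cx, cy))) / denom


def base_select(X, y, cols, k):
    """Hypothetical base selector: the k columns in `cols` most correlated with y."""
    ranked = sorted(cols, key=lambda j: abs_corr([r[j] for r in X], y), reverse=True)
    return ranked[:k]


def stability_selection(X, y, n_rounds=100, obs_frac=0.5, cov_frac=0.5,
                        k=3, threshold=0.7, seed=0):
    """Return (stable covariates, per-covariate selection frequencies)."""
    rng = random.Random(seed)
    n, p = len(y), len(X[0])
    n_sub = max(2, int(obs_frac * n))   # observation subsample size
    p_sub = max(1, int(cov_frac * p))   # covariate subset size
    drawn = [0] * p    # rounds in which covariate j was in the subset
    chosen = [0] * p   # rounds in which covariate j was selected
    for _ in range(n_rounds):
        rows = rng.sample(range(n), n_sub)
        cols = rng.sample(range(p), p_sub)
        Xs = [X[i] for i in rows]
        ys = [y[i] for i in rows]
        for j in cols:
            drawn[j] += 1
        for j in base_select(Xs, ys, cols, k):
            chosen[j] += 1
    # Assumption: frequency is conditional on the covariate having been drawn.
    freq = [chosen[j] / drawn[j] if drawn[j] else 0.0 for j in range(p)]
    stable = [j for j in range(p) if freq[j] >= threshold]
    return stable, freq


# Toy data (assumption): only columns 1 and 4 enter the true model.
rng = random.Random(1)
n, p = 200, 20
X = [[rng.gauss(0, 1) for _ in range(p)] for _ in range(n)]
y = [3.0 * row[1] - 2.0 * row[4] + rng.gauss(0, 1) for row in X]

stable, freq = stability_selection(X, y)
print(stable)
```

Setting `obs_frac=0.5` and `cov_frac=1.0` corresponds to the half-sample setting without covariate subsetting, which the abstract attributes to the original method; other values exercise the extensions.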