Penalized Likelihood and Bayesian Function Selection in Regression Models
Challenging research in various fields has driven a wide range of
methodological advances in variable selection for regression models with
high-dimensional predictors. In comparison, selection of nonlinear functions in
models with additive predictors has been considered only more recently. Several
competing proposals were developed at about the same time and often do not
refer to one another. This article provides a state-of-the-art review of
function selection, focusing on penalized likelihood and Bayesian concepts and
relating the various approaches to each other in a unified framework. In an
empirical comparison that also includes boosting, we evaluate several methods
on simulated and real data, thereby providing some guidance on their
performance in practice.
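One penalized-likelihood route to function selection represents each candidate nonlinear function by a block of basis coefficients and penalizes the Euclidean norm of each block, so whole functions drop out at once (a group lasso). The sketch below is a minimal illustration under stated assumptions, not one of the methods evaluated in the article: the cubic polynomial basis, the sqrt(p_j) group weights, the penalty value lam=0.1 and the helper names group_lasso, group_soft_threshold and cubic_basis are all hypothetical choices made for this example.

```python
import numpy as np


def group_soft_threshold(block, threshold):
    """Proximal operator of threshold * ||block||_2.

    Shrinks the whole block toward zero and sets it exactly to zero
    when its Euclidean norm does not exceed the threshold.
    """
    norm = np.linalg.norm(block)
    if norm <= threshold:
        return np.zeros_like(block)
    return (1.0 - threshold / norm) * block


def group_lasso(X_blocks, y, lam, n_iter=500):
    """Proximal gradient descent for the group lasso objective

        (1 / (2n)) * ||y - sum_j X_j beta_j||^2 + lam * sum_j sqrt(p_j) * ||beta_j||_2,

    where X_j (n x p_j) holds the basis functions of candidate term j.
    Returns the list of fitted coefficient blocks beta_j.
    """
    X = np.hstack(X_blocks)
    n = X.shape[0]
    splits = np.cumsum([B.shape[1] for B in X_blocks])[:-1]
    step = n / np.linalg.norm(X, ord=2) ** 2  # 1 / Lipschitz constant of the gradient
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        z = beta - step * (X.T @ (X @ beta - y) / n)  # gradient step on the loss
        beta = np.concatenate([
            group_soft_threshold(b, step * lam * np.sqrt(len(b)))
            for b in np.split(z, splits)
        ])
    return np.split(beta, splits)


def cubic_basis(x):
    """Centered cubic polynomial basis, orthonormalized so that B.T @ B / n = I."""
    B = np.column_stack([x, x**2, x**3])
    B = B - B.mean(axis=0)
    Q, _ = np.linalg.qr(B)
    return Q * np.sqrt(len(x))


# Toy example: only the first of three covariates has an effect.
rng = np.random.default_rng(0)
n = 500
x = rng.uniform(size=(n, 3))
y = np.sin(2 * np.pi * x[:, 0]) + rng.normal(scale=0.1, size=n)
y = y - y.mean()
blocks = group_lasso([cubic_basis(x[:, j]) for j in range(3)], y, lam=0.1)
print([bool(np.any(b != 0)) for b in blocks])  # which terms were selected
```

Orthonormalizing each block keeps the penalty comparable across terms and makes the block-wise soft-thresholding step behave like an exact selection rule for that block.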
Spike-and-Slab Priors for Function Selection in Structured Additive Regression Models
Structured additive regression provides a general framework for complex
Gaussian and non-Gaussian regression models, with predictors comprising
arbitrary combinations of nonlinear functions and surfaces, spatial effects,
varying coefficients, random effects and further regression terms. The large
flexibility of structured additive regression makes function selection a
challenging and important task, aiming at (1) selecting the relevant
covariates, (2) choosing an appropriate and parsimonious representation of the
impact of covariates on the predictor and (3) determining the required
interactions. We propose a spike-and-slab prior structure for function
selection that allows single coefficients as well as blocks of coefficients
representing specific model terms to be included or excluded. A novel
multiplicative parameter expansion is required to obtain good mixing and
convergence properties in a Markov chain Monte Carlo simulation approach and is
shown to induce desirable shrinkage properties. In simulation studies and with
(real) benchmark classification data, we investigate sensitivity to
hyperparameter settings and compare performance to competitors. The flexibility
and applicability of our approach are demonstrated in an additive piecewise
exponential model with time-varying effects for right-censored survival times
of intensive care patients with sepsis. Geoadditive and additive mixed logit
model applications are discussed in an extensive appendix.
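To make the idea of a spike-and-slab prior with a multiplicative parameter expansion concrete, the sketch below draws coefficient blocks from a prior of that general shape: each term's block is written as beta = alpha * xi, where the scalar alpha carries the spike-or-slab switch for the whole term and xi distributes it over the individual coefficients. This is only loosely modeled on a normal-mixture-of-inverse-gammas construction; the exact prior in the paper may differ, the inclusion probability w is held fixed here, and the hyperparameter values (w, v0, a_tau, b_tau) and the function name sample_term_prior are hypothetical.

```python
import numpy as np


def sample_term_prior(d, rng, w=0.5, v0=2.5e-4, a_tau=5.0, b_tau=25.0):
    """Draw one coefficient block beta (length d) for a single model term.

    beta = alpha * xi  (multiplicative parameter expansion):
      gamma = 1 (slab) with probability w, otherwise v0 (spike)
      tau2  ~ InvGamma(a_tau, b_tau)
      alpha ~ N(0, gamma * tau2)   scalar that switches the whole term on/off
      xi_k  ~ N(m_k, 1), m_k = +1 or -1 with equal probability
    Returns the block and a flag telling whether the slab was drawn.
    """
    slab = rng.random() < w
    gamma = 1.0 if slab else v0
    tau2 = 1.0 / rng.gamma(a_tau, 1.0 / b_tau)  # 1 / Gamma(shape, rate=b) is InvGamma
    alpha = rng.normal(0.0, np.sqrt(gamma * tau2))
    m = rng.choice([-1.0, 1.0], size=d)
    xi = rng.normal(m, 1.0)
    return alpha * xi, slab


rng = np.random.default_rng(1)
draws = [sample_term_prior(8, rng) for _ in range(4000)]
spike_size = np.mean([np.linalg.norm(b) for b, s in draws if not s])
slab_size = np.mean([np.linalg.norm(b) for b, s in draws if s])
print(f"mean ||beta|| under spike: {spike_size:.3f}, under slab: {slab_size:.3f}")
```

Because the spike variance is tiny, terms drawn from the spike have blocks shrunk close to zero as a whole, while slab draws leave the block essentially unpenalized; in an MCMC sampler the same decomposition lets alpha move a whole block at once.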
Variable selection for BART: An application to gene regulation
We consider the task of discovering gene regulatory networks, which are
defined as sets of genes together with the transcription factors that
regulate their expression levels. This can be viewed as a variable selection
problem, potentially with high dimensionality. Variable selection is especially
challenging in high-dimensional settings, where it is difficult to detect
subtle individual effects and interactions between predictors. Bayesian
Additive Regression Trees [BART, Ann. Appl. Stat. 4 (2010) 266-298] provides a
novel nonparametric alternative to parametric regression approaches, such as
the lasso or stepwise regression, especially when the number of relevant
predictors is sparse relative to the total number of available predictors and
the fundamental relationships are nonlinear. We develop a principled
permutation-based inferential approach for determining when the effect of a
selected predictor is likely to be real. Going further, we adapt the BART
procedure to incorporate prior information about variable importance.
We present simulations demonstrating that our method compares favorably to
existing parametric and nonparametric procedures in a variety of data settings.
To demonstrate the potential of our approach in a biological context, we apply
it to the task of inferring the gene regulatory network in yeast (Saccharomyces
cerevisiae). We find that our BART-based procedure is best able to recover the
subset of covariates with the largest signal compared to other variable
selection methods. The methods developed in this work are readily available in
the R package bartMachine.
Comment: Published at http://dx.doi.org/10.1214/14-AOAS755 in the Annals of
Applied Statistics (http://www.imstat.org/aoas/) by the Institute of
Mathematical Statistics (http://www.imstat.org).
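The permutation idea can be sketched generically: compute an importance score for every predictor on the real response, recompute it many times on randomly permuted responses to build a per-predictor null distribution, and keep predictors whose observed score exceeds a high null quantile. In the abstract's setting the scores are BART variable inclusion proportions; fitting BART is out of scope here, so this sketch swaps in a cheap stand-in score (normalized absolute correlation, which like inclusion proportions sums to one). The function names importance_scores and permutation_select and the choices n_perm=200 and alpha=0.05 are hypothetical, and this is not the bartMachine implementation.

```python
import numpy as np


def importance_scores(X, y):
    """Hypothetical stand-in for BART variable inclusion proportions:
    absolute correlation of each column with y, normalized to sum to 1."""
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    corr = np.abs(Xc.T @ yc) / (np.linalg.norm(Xc, axis=0) * np.linalg.norm(yc))
    return corr / corr.sum()


def permutation_select(X, y, n_perm=100, alpha=0.05, rng=None):
    """Select predictors whose observed score exceeds the (1 - alpha) quantile
    of that predictor's own null distribution, built by rescoring on permuted y."""
    rng = np.random.default_rng() if rng is None else rng
    observed = importance_scores(X, y)
    null = np.array([importance_scores(X, rng.permutation(y)) for _ in range(n_perm)])
    cutoffs = np.quantile(null, 1.0 - alpha, axis=0)
    return np.flatnonzero(observed > cutoffs)


# Toy example: only predictors 0 and 3 affect the response.
rng = np.random.default_rng(42)
X = rng.normal(size=(200, 20))
y = 2.0 * X[:, 0] - 1.5 * X[:, 3] + rng.normal(size=200)
selected = permutation_select(X, y, n_perm=200, rng=rng)
print("selected predictors:", selected)
```

Permuting y breaks any real association while keeping the predictors' joint structure, so the null quantiles reflect how large a score can get by chance; with many noise predictors and alpha=0.05, an occasional false positive is still expected.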
The advantage of being slow: the quasi-neutral contact process
According to the competitive exclusion principle, in a finite ecosystem,
extinction occurs naturally when two or more species compete for the same
resources. An important question that arises is: when coexistence is not
possible, which mechanisms confer an advantage to a given species against the
other(s)? In general, it is expected that the species with the higher
reproductive/death ratio will win the competition, but other mechanisms, such
as asymmetry in interspecific competition or unequal diffusion rates, have been
found to change this scenario dramatically. In this work, we examine
competitive advantage in the context of quasi-neutral population models,
including stochastic models with spatial structure as well as macroscopic
(mean-field) descriptions. We employ a two-species contact process in which the
"biological clock" of one species runs slower than that of the other species by
a constant factor. Our results provide new insights into how stochasticity and
competition interact to determine extinction in finite spatial systems. We find
that, in relatively small systems, a species with a slower biological clock has
an advantage when resources are limited, winning the competition against a
species with a faster clock. Periodic or stochastic environmental variations
also favor the slower species, even in much larger systems.
Comment: Reviewed extended version
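A non-spatial, well-mixed simplification of the two-species setup can be simulated with the event-selection step of the Gillespie algorithm: species B has the same creation/death ratio as species A, but both of its rates are multiplied by a factor alpha < 1, so its clock runs slower. This is a minimal sketch under hypothetical choices (lam=2.0, equal initial populations of a quarter of the sites each, alpha=0.5, 40 sites, the function name run_until_extinction); it omits the spatial structure and environmental variation studied in the work and is not the authors' model code. It only estimates how often each species outlasts the other.

```python
import numpy as np


def run_until_extinction(n_sites, alpha, lam=2.0, rng=None):
    """Simulate a well-mixed two-species contact process until one species dies out.

    Species A: death rate 1, offspring-creation rate lam per individual.
    Species B: both rates multiplied by alpha, i.e. a clock slowed by the
    factor alpha (alpha < 1), with the same creation/death ratio as A.
    A creation event targets a uniformly random site and only succeeds if that
    site is empty, so only successful creations are included in the rates.
    Returns "A" or "B", the species still present when the other goes extinct.
    """
    rng = np.random.default_rng() if rng is None else rng
    n_a = n_b = n_sites // 4  # hypothetical initial condition: equal numbers
    while n_a > 0 and n_b > 0:
        empty_frac = (n_sites - n_a - n_b) / n_sites
        r_a_death = float(n_a)
        r_a_birth = lam * n_a * empty_frac
        r_b_death = alpha * n_b
        r_b_birth = alpha * lam * n_b * empty_frac
        total = r_a_death + r_a_birth + r_b_death + r_b_birth
        # Pick the next event with probability proportional to its rate
        # (waiting times are not needed to decide which species survives).
        u = rng.random() * total
        if u < r_a_death:
            n_a -= 1
        elif u < r_a_death + r_a_birth:
            n_a += 1
        elif u < r_a_death + r_a_birth + r_b_death:
            n_b -= 1
        else:
            n_b += 1
    return "A" if n_a > 0 else "B"


rng = np.random.default_rng(0)
runs = 300
slow_wins = sum(run_until_extinction(40, alpha=0.5, rng=rng) == "B" for _ in range(runs))
print(f"slower species B survived in {slow_wins} of {runs} runs")
```

Dropping the waiting times is safe here because only the order of events matters for which species goes extinct first; recording times would require sampling exponential intervals with rate total.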
Speech Enhancement Using An MMSE Spectral Amplitude Estimator Based On A Modulation Domain Kalman Filter With A Gamma Prior
In this paper, we propose a minimum mean square error (MMSE) estimator of clean speech spectral amplitudes that uses a Kalman filter to model the temporal dynamics of the spectral amplitudes in the modulation domain. Using a two-parameter Gamma distribution to model the prior distribution of the speech spectral amplitudes, we derive closed-form expressions for the posterior mean and variance of the spectral amplitudes, as well as for the associated update step of the Kalman filter. The performance of the proposed algorithm is evaluated on the TIMIT core test set using the perceptual evaluation of speech quality (PESQ) measure and the segmental SNR measure, and it is shown to give a consistent improvement over a wide range of SNRs compared with competing algorithms.
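The predict/update structure of a modulation-domain Kalman filter can be illustrated for a single frequency bin, tracking the amplitude from frame to frame. This sketch deliberately swaps in a plain Gaussian model: a first-order autoregressive state with Gaussian process and observation noise, so the update is the textbook Kalman step rather than the closed-form Gamma-prior update derived in the paper. The AR(1) coefficient phi, the noise variances q and r, and the function name kalman_track_amplitudes are hypothetical. The synthetic "amplitudes" are zero-mean and can be negative, which real spectral amplitudes cannot; a Gamma prior, being supported on positive values, respects that constraint, while this Gaussian sketch does not.

```python
import numpy as np


def kalman_track_amplitudes(obs, phi, q, r, a0=0.0, p0=1.0):
    """Scalar Kalman filter over frames for one frequency bin.

    State model:       a_t = phi * a_{t-1} + w_t,  w_t ~ N(0, q)
    Observation model: y_t = a_t + v_t,            v_t ~ N(0, r)
    Returns the posterior means and variances of a_t for every frame.
    """
    means, variances = [], []
    a, p = a0, p0
    for y in obs:
        # Predict step: propagate the previous posterior through the AR(1) model.
        a_pred = phi * a
        p_pred = phi * phi * p + q
        # Update step: Gaussian posterior given the new noisy observation.
        k = p_pred / (p_pred + r)  # Kalman gain
        a = a_pred + k * (y - a_pred)
        p = (1.0 - k) * p_pred
        means.append(a)
        variances.append(p)
    return np.array(means), np.array(variances)


# Synthetic example: an AR(1) "amplitude" track observed in white noise.
rng = np.random.default_rng(0)
phi, q, r, n_frames = 0.95, 0.1, 1.0, 500
true_amp = np.zeros(n_frames)
for t in range(1, n_frames):
    true_amp[t] = phi * true_amp[t - 1] + rng.normal(scale=np.sqrt(q))
noisy = true_amp + rng.normal(scale=np.sqrt(r), size=n_frames)
means, variances = kalman_track_amplitudes(noisy, phi, q, r)
mse_noisy = np.mean((noisy - true_amp) ** 2)
mse_filtered = np.mean((means - true_amp) ** 2)
print(f"MSE of raw observations: {mse_noisy:.3f}, after filtering: {mse_filtered:.3f}")
```

The posterior variance returned alongside the mean is what a non-Gaussian prior changes: under a Gamma prior both moments must be recomputed in closed form at each update rather than taken from the linear-Gaussian formulas above.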
- …