Bayesian variable selection with shrinking and diffusing priors
We consider a Bayesian approach to variable selection in the presence of high
dimensional covariates based on a hierarchical model that places prior
distributions on the regression coefficients as well as on the model space. We
adopt the well-known spike and slab Gaussian priors with a distinct feature,
that is, the prior variances depend on the sample size through which
appropriate shrinkage can be achieved. We show the strong selection consistency
of the proposed method in the sense that the posterior probability of the true
model converges to one even when the number of covariates grows nearly
exponentially with the sample size. This is arguably the strongest selection
consistency result that has been available in the Bayesian variable selection
literature; yet the proposed method can be carried out through posterior
sampling with a simple Gibbs sampler. Furthermore, we argue that the proposed
method is asymptotically similar to model selection with the $L_0$ penalty. We
also demonstrate through empirical work the fine performance of the proposed
approach relative to some state-of-the-art alternatives.
Comment: Published at http://dx.doi.org/10.1214/14-AOS1207 in the Annals of
Statistics (http://www.imstat.org/aos/) by the Institute of Mathematical
Statistics (http://www.imstat.org
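To make the role of sample-size-dependent prior variances concrete, the following minimal sketch computes the conditional probability that a single coefficient is assigned to the slab rather than the spike, which is the kind of indicator update a Gibbs sampler for a spike and slab model performs. The function names, the prior inclusion probability `q`, and the rates in `illustrative_variances` (spike variance 1/n, slab variance log n) are assumptions chosen for illustration; they are not taken from the paper.

```python
import math


def log_normal_pdf(x, var):
    """Log density of a zero-mean Gaussian with variance `var` at x."""
    return -0.5 * math.log(2.0 * math.pi * var) - 0.5 * x * x / var


def inclusion_probability(beta, q, spike_var, slab_var):
    """P(indicator = 1 | beta) for a two-component Gaussian spike and slab prior.

    q is the prior probability that the coefficient belongs to the slab.
    """
    log_slab = math.log(q) + log_normal_pdf(beta, slab_var)
    log_spike = math.log(1.0 - q) + log_normal_pdf(beta, spike_var)
    diff = log_spike - log_slab
    if diff > 700.0:  # guard against overflow in exp; the probability is ~0
        return 0.0
    return 1.0 / (1.0 + math.exp(diff))


def illustrative_variances(n):
    """Hypothetical rates (an assumption, not the paper's): the spike
    shrinks and the slab diffuses as the sample size n > 1 grows."""
    return 1.0 / n, math.log(n)


spike_var, slab_var = illustrative_variances(1000)
for beta in (0.0, 0.05, 1.0):
    print(beta, inclusion_probability(beta, 0.1, spike_var, slab_var))
```

Under these assumed rates the spike narrows and the slab widens as n grows, so a coefficient near zero is assigned to the spike with high probability while a clearly nonzero one is assigned to the slab.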
Penalized maximum likelihood for multivariate Gaussian mixture
In this paper, we first consider the parameter estimation of a multivariate
random process distribution using a multivariate Gaussian mixture law. The labels
of the mixture are allowed to follow a general probability law, which makes it
possible to model a temporal structure of the process under study. We
generalize the case of univariate Gaussian mixture in [Ridolfi99] to show that
the likelihood is unbounded and goes to infinity when one of the covariance
matrices approaches the singular boundary of the set of nonnegative definite
matrices. We characterize the parameter set of these singularities. As a
solution to this degeneracy problem, we show that penalizing the
likelihood with an Inverse Wishart prior on the covariance matrices results in a
penalized, or maximum a posteriori, criterion that is bounded. Then, the
existence of positive definite matrices optimizing this criterion can be
guaranteed. We also show that with a modified EM procedure or with a Bayesian
sampling scheme, we can constrain covariance matrices to belong to a particular
subclass of covariance matrices. Finally, we study degeneracies in the source
separation problem where the characterization of parameter singularity set is
more complex. We show, however, that an Inverse Wishart prior on the covariance
matrices eliminates the degeneracies in this case too.
Comment: Presented at MaxEnt01. To appear in Bayesian Inference and Maximum
Entropy Methods, B. Fry (Ed.), AIP Proceedings. 11 pages, 3 Postscript figure
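The degeneracy described in this abstract can be illustrated in the univariate case, where the Inverse Wishart prior reduces to an inverse-gamma prior on each variance. The sketch below is only an illustration under that simplification: the data, the weights, and the hyperparameters `shape` and `scale` are hypothetical, and the penalty is the unnormalized log prior density. It centers one component on a data point and lets its variance shrink, so the plain log-likelihood keeps growing while the penalized criterion falls.

```python
import math


def log_normal_pdf(x, mean, var):
    """Log density of a Gaussian with the given mean and variance at x."""
    return -0.5 * math.log(2.0 * math.pi * var) - 0.5 * (x - mean) ** 2 / var


def mixture_loglik(data, weights, means, variances):
    """Log-likelihood of a univariate Gaussian mixture (log-sum-exp per point)."""
    total = 0.0
    for x in data:
        terms = [math.log(w) + log_normal_pdf(x, m, v)
                 for w, m, v in zip(weights, means, variances)]
        top = max(terms)
        total += top + math.log(sum(math.exp(t - top) for t in terms))
    return total


def log_inverse_gamma_penalty(var, shape, scale):
    """Unnormalized log inverse-gamma density, the 1-D Inverse Wishart analogue."""
    return -(shape + 1.0) * math.log(var) - scale / var


def penalized_loglik(data, weights, means, variances, shape=1.0, scale=1.0):
    """Log-likelihood plus the log prior penalty on every component variance."""
    penalty = sum(log_inverse_gamma_penalty(v, shape, scale) for v in variances)
    return mixture_loglik(data, weights, means, variances) + penalty


data = [0.0, 1.0, 2.0, 3.0]       # hypothetical observations
weights = [0.5, 0.5]
means = [0.0, 1.5]                # first component sits on a data point
for v in (1.0, 1e-2, 1e-6, 1e-12):
    variances = [v, 1.0]
    print(v,
          mixture_loglik(data, weights, means, variances),
          penalized_loglik(data, weights, means, variances))
```

The `-scale / var` term dominates as the variance approaches zero, which is what keeps the penalized criterion bounded above in this one-dimensional setting.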
Sparse cointegration
Cointegration analysis is used to estimate the long-run equilibrium relations
between several time series. The coefficients of these long-run equilibrium
relations are the cointegrating vectors. In this paper, we provide a sparse
estimator of the cointegrating vectors. The estimation technique is sparse in
the sense that some elements of the cointegrating vectors will be estimated as
zero. For this purpose, we combine a penalized estimation procedure for vector
autoregressive models with sparse reduced rank regression. The sparse
cointegration procedure achieves a higher estimation accuracy than the
traditional Johansen cointegration approach in settings where the true
cointegrating vectors have a sparse structure, and/or when the sample size is
low compared to the number of time series. We also discuss a criterion to
determine the cointegration rank and we illustrate its good performance in
several simulation settings. In a first empirical application we investigate
whether the expectations hypothesis of the term structure of interest rates,
implying sparse cointegrating vectors, holds in practice. In a second empirical
application we show that forecast performance in high-dimensional systems can
be improved by sparsely estimating the cointegration relations.
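The abstract does not spell out the penalty, so the following is only a sketch under the assumption of an L1-type (lasso) penalty, for which soft-thresholding is the standard coordinate-wise update. It shows how such a penalty sets small entries of a vector exactly to zero, which is the sense of "sparse" used above. The function names and the example vector are hypothetical, and this is not the full estimation procedure.

```python
def soft_threshold(value, penalty):
    """Shrink `value` toward zero by `penalty`; return exactly 0.0 inside the band."""
    if value > penalty:
        return value - penalty
    if value < -penalty:
        return value + penalty
    return 0.0


def sparsify(vector, penalty):
    """Apply soft-thresholding to every entry of a candidate vector."""
    return [soft_threshold(v, penalty) for v in vector]


# Hypothetical unpenalized estimate of one cointegrating vector.
candidate = [1.5, -0.25, 0.75, -2.0]
print(sparsify(candidate, 0.5))
```

A larger penalty zeroes more entries, so in practice the penalty level would be chosen by a data-driven criterion.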
Spike and slab variable selection: Frequentist and Bayesian strategies
Variable selection in the linear regression model takes many apparent faces
from both frequentist and Bayesian standpoints. In this paper we introduce a
variable selection method referred to as a rescaled spike and slab model. We
study the importance of prior hierarchical specifications and draw connections
to frequentist generalized ridge regression estimation. Specifically, we study
the usefulness of continuous bimodal priors to model hypervariance parameters,
and the effect scaling has on the posterior mean through its relationship to
penalization. Several model selection strategies, some frequentist and some
Bayesian in nature, are developed and studied theoretically. We demonstrate the
importance of selective shrinkage for effective variable selection in terms of
risk misclassification, and show this is achieved using the posterior from a
rescaled spike and slab model. We also show how to verify a procedure's ability
to reduce model uncertainty in finite samples using a specialized forward
selection strategy. Using this tool, we illustrate the effectiveness of
rescaled spike and slab models in reducing model uncertainty.
Comment: Published at http://dx.doi.org/10.1214/009053604000001147 in the
Annals of Statistics (http://www.imstat.org/aos/) by the Institute of
Mathematical Statistics (http://www.imstat.org
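The link between hypervariances and generalized ridge regression mentioned in this abstract can be sketched for the simplest case. If each coefficient has a zero-mean Gaussian prior with its own hypervariance, the conditional posterior mean is a generalized ridge estimate with penalty equal to the noise variance divided by that hypervariance. The sketch assumes a design with orthonormal columns, so the estimate reduces to coordinate-wise shrinkage of the least squares coefficients; the function names and numbers are hypothetical.

```python
def penalties_from_hypervariances(noise_var, hypervariances):
    """Ridge penalty for each coefficient: noise variance over prior variance."""
    return [noise_var / g for g in hypervariances]


def generalized_ridge_orthonormal(ols_coefficients, penalties):
    """Generalized ridge estimate when the design has orthonormal columns.

    With X'X = I, (X'X + diag(penalties))^{-1} X'y reduces to dividing each
    least squares coefficient by (1 + penalty).
    """
    return [b / (1.0 + lam) for b, lam in zip(ols_coefficients, penalties)]


# Two coefficients with the same least squares estimate but very different
# hypervariances: one large (slab-like), one small (spike-like).
ols = [2.0, 2.0]
penalties = penalties_from_hypervariances(1.0, [100.0, 0.01])
print(generalized_ridge_orthonormal(ols, penalties))
```

The first coefficient is left almost untouched while the second is shrunk nearly to zero, which is the selective shrinkage behavior the abstract refers to.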
Interpretable Low-Rank Document Representations with Label-Dependent Sparsity Patterns
In the context of document classification, where the label tags of the documents
in a corpus are readily known, an opportunity lies in using label
information to learn document representation spaces with better discriminative
properties. To this end, this paper proposes applying Variational Bayesian
Supervised Nonnegative Matrix Factorization (supervised vbNMF) with a
label-driven sparsity structure on the coefficients to learn
discriminative nonsubtractive latent semantic components occurring in TF-IDF
document representations. The constraints make the pursued components
occur frequently in only a small set of labels, making it
possible to yield document representations with distinctive label-specific
sparse activation patterns. A simple measure of quality of this kind of
sparsity structure, dubbed inter-label sparsity, is introduced and
experimentally brought into tight connection with classification performance.
Representing a great practical convenience, inter-label sparsity is shown to be
easily controlled in supervised vbNMF by a single parameter.