Penalized Likelihood and Bayesian Function Selection in Regression Models
Challenging research in various fields has driven a wide range of
methodological advances in variable selection for regression models with
high-dimensional predictors. In comparison, selection of nonlinear functions in
models with additive predictors has been considered only more recently. Several
competing suggestions have been developed at about the same time and often do
not refer to each other. This article provides a state-of-the-art review on
function selection, focusing on penalized likelihood and Bayesian concepts,
relating various approaches to each other in a unified framework. In an
empirical comparison, also including boosting, we evaluate several methods
through applications to simulated and real data, thereby providing some
guidance on their performance in practice.
A Bayesian information criterion for singular models
We consider approximate Bayesian model choice for model selection problems
that involve models whose Fisher-information matrices may fail to be invertible
along other competing submodels. Such singular models do not obey the
regularity conditions underlying the derivation of Schwarz's Bayesian
information criterion (BIC) and the penalty structure in BIC generally does not
reflect the frequentist large-sample behavior of their marginal likelihood.
While large-sample theory for the marginal likelihood of singular models has
been developed recently, the resulting approximations depend on the true
parameter value and lead to a paradox of circular reasoning. Guided by examples
such as determining the number of components of mixture models, the number of
factors in latent factor models or the rank in reduced-rank regression, we
propose a resolution to this paradox and give a practical extension of BIC for
singular model selection problems.
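For orientation, the contrast drawn in this abstract can be written out. The first line is the large-sample approximation behind Schwarz's BIC for a regular model with $d$ parameters and $n$ observations. The second is the known large-sample form of the log marginal likelihood for a singular model, where the learning coefficient $\lambda$ and its multiplicity $m$ depend on the true parameter value. The notation ($L(M)$, $\ell$, $\hat\theta$, $d$, $n$, $\lambda$, $m$) is assumed here for illustration and is not taken from the abstract. The criterion proposed in the article is not reproduced.

```latex
\begin{align*}
\text{regular:}\quad  & \log L(M) = \ell(\hat\theta) - \frac{d}{2}\,\log n + O_p(1),\\
\text{singular:}\quad & \log L(M) = \ell(\hat\theta) - \lambda \log n + (m-1)\log\log n + O_p(1).
\end{align*}
```

In the regular case $\lambda = d/2$ and $m = 1$, so the second line reduces to the first.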
Uncertainty Quantification in Bayesian Reduced-Rank Sparse Regressions
Reduced-rank regression recognises the possibility of a rank-deficient matrix
of coefficients, which is particularly useful when the data is
high-dimensional. We propose a novel Bayesian model for estimating the rank of
the coefficient matrix, which obviates the need for post-processing
steps and allows for uncertainty quantification. Our method employs a mixture
prior on the regression coefficient matrix along with a global-local shrinkage
prior on its low-rank decomposition. Then, we rely on the Signal Adaptive
Variable Selector to perform sparsification, and define two novel tools, the
Posterior Inclusion Probability uncertainty index and the Relevance Index. The
validity of the method is assessed in a simulation study, then its advantages
and usefulness are shown in real-data applications on the chemical composition
of tobacco and on the photometry of galaxies.
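As a reminder of the setup this abstract refers to, a reduced-rank regression can be written as below. The symbols ($Y$, $X$, $C$, $A$, $B$, $E$, $n$, $p$, $q$, $r$) are notational assumptions made here, not the authors' notation. The priors are described only in words in the abstract and are not sketched.

```latex
\begin{align*}
Y &= XC + E, \qquad Y \in \mathbb{R}^{n \times q},\; X \in \mathbb{R}^{n \times p},\; C \in \mathbb{R}^{p \times q},\\
C &= AB^{\top}, \qquad A \in \mathbb{R}^{p \times r},\; B \in \mathbb{R}^{q \times r},\; r = \operatorname{rank}(C) \le \min(p, q).
\end{align*}
```

Estimating $r$ is the rank-selection problem the abstract addresses. Sparsity refers to zero entries or rows in the factors of the decomposition.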
Very High Dimensional Semiparametric Models
Very high dimensional semiparametric models play a major role in many areas, in particular in signal detection problems when sparse signals or sparse events are hidden among high dimensional noise. Concrete examples are genomic studies in biostatistics or imaging problems. In a broad context, all kinds of statistical inference and model selection problems were discussed for high dimensional data.
Bayesian Markov-Switching Tensor Regression for Time-Varying Networks
Modeling time series of multilayer network data is challenging due to the peculiar characteristics of real-world networks, such as sparsity and abrupt structural changes. Moreover, the impact of external factors on the network edges is highly heterogeneous due to edge- and time-specific effects. Capturing all these features results in a very high-dimensional inference problem. A novel tensor-on-tensor regression model is proposed, which integrates zero-inflated logistic regression to deal with the sparsity, and Markov-switching coefficients to account for structural changes. A tensor representation and decomposition of the regression coefficients are used to tackle the high dimensionality and account for the heterogeneous impact of the covariate tensor across the response variables. The inference is performed following a Bayesian approach, and an efficient Gibbs sampler is developed for posterior approximation. Applied to financial and email networks, our methodology detects different connectivity regimes and uncovers the role of covariates in the edge-formation process, both of which are relevant in risk and resource management. Code is available on GitHub. Supplementary materials for this article are available online.