2,315 research outputs found

    Penalized Likelihood and Bayesian Function Selection in Regression Models

    Challenging research in various fields has driven a wide range of methodological advances in variable selection for regression models with high-dimensional predictors. In comparison, selection of nonlinear functions in models with additive predictors has been considered only more recently. Several competing suggestions have been developed at about the same time and often do not refer to each other. This article provides a state-of-the-art review on function selection, focusing on penalized likelihood and Bayesian concepts, relating various approaches to each other in a unified framework. In an empirical comparison, also including boosting, we evaluate several methods through applications to simulated and real data, thereby providing some guidance on their performance in practice.
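
    As a generic illustration of the setting (notation is ours, not the article's): with an additive predictor, function selection amounts to deciding which component functions enter the model, which penalized-likelihood approaches encode through penalties that can shrink whole functions to zero,

        \eta_i = \beta_0 + \sum_{j=1}^{p} f_j(x_{ij}), \qquad
        \ell_{\mathrm{pen}}(\boldsymbol\beta) = \ell(\boldsymbol\beta) - \sum_{j=1}^{p} \lambda_j \, \mathrm{pen}(f_j),

    where each f_j is expanded in a basis with coefficient vector \beta_j and \mathrm{pen}(f_j) may be, for example, a group-type penalty \lVert \beta_j \rVert_2 that removes f_j entirely once its coefficients are shrunk to zero; Bayesian analogues replace the penalty with spike-and-slab priors on the functions.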

    A Bayesian information criterion for singular models

    We consider approximate Bayesian model choice for model selection problems that involve models whose Fisher-information matrices may fail to be invertible along other competing submodels. Such singular models do not obey the regularity conditions underlying the derivation of Schwarz's Bayesian information criterion (BIC) and the penalty structure in BIC generally does not reflect the frequentist large-sample behavior of their marginal likelihood. While large-sample theory for the marginal likelihood of singular models has been developed recently, the resulting approximations depend on the true parameter value and lead to a paradox of circular reasoning. Guided by examples such as determining the number of components of mixture models, the number of factors in latent factor models or the rank in reduced-rank regression, we propose a resolution to this paradox and give a practical extension of BIC for singular model selection problems.
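
    For orientation, these are standard results rather than this paper's contribution: Schwarz's BIC rests on a Laplace approximation that requires a regular model, while singular learning theory gives a different large-sample expansion of the log marginal likelihood, roughly

        \mathrm{BIC} = \ell_n(\hat\theta) - \tfrac{d}{2} \log n, \qquad
        \log m_n(Y) = \ell_n(\hat\theta) - \lambda \log n + (m - 1) \log\log n + O_p(1),

    where d is the model dimension, \lambda is the learning coefficient (real log-canonical threshold) and m its multiplicity; because \lambda and m depend on the true parameter, plugging them in directly leads to the circularity the abstract mentions.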

    Uncertainty Quantification in Bayesian Reduced-Rank Sparse Regressions

    Reduced-rank regression recognises the possibility of a rank-deficient matrix of coefficients, which is particularly useful when the data is high-dimensional. We propose a novel Bayesian model for estimating the rank of the coefficient matrix, which obviates the need for post-processing steps and allows for uncertainty quantification. Our method employs a mixture prior on the regression coefficient matrix along with a global-local shrinkage prior on its low-rank decomposition. Then, we rely on the Signal Adaptive Variable Selector to perform sparsification, and define two novel tools, the Posterior Inclusion Probability uncertainty index and the Relevance Index. The validity of the method is assessed in a simulation study, then its advantages and usefulness are shown in real-data applications on the chemical composition of tobacco and on the photometry of galaxies.
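
    For context, in generic reduced-rank notation (not the authors' exact formulation), the object of interest is a coefficient matrix written in factorized form,

        Y = X C + E, \qquad C = B A^{\top}, \quad B \in \mathbb{R}^{p \times r}, \; A \in \mathbb{R}^{q \times r}, \; r \ll \min(p, q),

    so that inferring the effective rank amounts to deciding how many columns of B and A carry signal; a global-local shrinkage prior of the schematic form b_{jk} \sim \mathcal{N}(0, \tau \psi_{jk}) pulls redundant columns towards zero, while the mixture prior and the Signal Adaptive Variable Selector mentioned in the abstract handle rank selection and sparsification.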

    Very High Dimensional Semiparametric Models

    Very high dimensional semiparametric models play a major role in many areas, in particular in signal detection problems where sparse signals or sparse events are hidden among high-dimensional noise. Concrete examples are genomic studies in biostatistics and imaging problems. In a broad context, all kinds of statistical inference and model selection problems are discussed for high-dimensional data.
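
    A canonical instance of the sparse-signal-detection setting mentioned above (a standard formulation, not specific to this work) is the sparse normal-means problem,

        Y_i = \mu_i + \varepsilon_i, \quad \varepsilon_i \sim \mathcal{N}(0, 1), \quad i = 1, \dots, n, \qquad
        H_0: \mu_i = 0 \ \text{for all } i \quad \text{vs.} \quad H_1: \text{a fraction } n^{-\beta} \text{ of the } \mu_i \text{ are nonzero},

    where detection of the sparse signal is possible only when the nonzero means are large enough relative to the sparsity exponent \beta.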

    Bayesian Markov-Switching Tensor Regression for Time-Varying Networks

    Modeling time series of multilayer network data is challenging due to the peculiar characteristics of real-world networks, such as sparsity and abrupt structural changes. Moreover, the impact of external factors on the network edges is highly heterogeneous due to edge- and time-specific effects. Capturing all these features results in a very high-dimensional inference problem. A novel tensor-on-tensor regression model is proposed, which integrates zero-inflated logistic regression to deal with the sparsity, and Markov-switching coefficients to account for structural changes. A tensor representation and decomposition of the regression coefficients are used to tackle the high-dimensionality and account for the heterogeneous impact of the covariate tensor across the response variables. The inference is performed following a Bayesian approach, and an efficient Gibbs sampler is developed for posterior approximation. Our methodology applied to financial and email networks detects different connectivity regimes and uncovers the role of covariates in the edge-formation process, which are relevant in risk and resource management. Code is available on GitHub. Supplementary materials for this article are available online.
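
    A schematic of the model class being described (notation is hypothetical and details differ in the paper): each edge indicator is driven by a zero-inflated logistic link whose coefficient tensor switches with a hidden Markov chain and admits a low-rank CP/PARAFAC-type decomposition,

        \Pr(Y_{ij,t} = 1 \mid s_t) = (1 - \rho_{s_t}) \, \mathrm{logit}^{-1}\!\big( \langle \mathcal{G}_{s_t}, \mathcal{X}_t \rangle_{ij} \big), \qquad
        \mathcal{G}_{s} = \sum_{r=1}^{R} \gamma^{(r)}_{1,s} \circ \gamma^{(r)}_{2,s} \circ \gamma^{(r)}_{3,s},

    with s_t the latent regime at time t, \rho_s the zero-inflation probability, and the CP factors \gamma^{(r)} keeping the number of parameters manageable; a Gibbs sampler alternates between updating the regime path and the factor coefficients.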