Leave-One-Out Cross-Validation for Bayesian Model Comparison in Large Data
Recently, new methods for model assessment, based on subsampling and
posterior approximations, have been proposed for scaling leave-one-out
cross-validation (LOO) to large datasets. Although these methods work well for
estimating predictive performance for individual models, they are less powerful
in model comparison. We propose an efficient method for estimating differences
in predictive performance by combining fast approximate LOO surrogates with
exact LOO subsampling using the difference estimator, and supply proofs
regarding scaling characteristics. The resulting approach can be orders of
magnitude more efficient than previous approaches, as well as being better
suited to model comparison.
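As a rough illustration of the difference estimator mentioned in this abstract, the sketch below combines a cheap surrogate value for every observation with exact values on a small random subsample. It is a minimal sketch under stated assumptions, not the authors' implementation: the function name, its arguments, and the choice of simple random sampling without replacement are all hypothetical. For model comparison, the inputs would be pointwise differences in log predictive density between two models.

```python
import random


def difference_estimator(surrogate, exact_fn, m, rng):
    """Estimate a population total from surrogates plus a subsample correction.

    surrogate : list of cheap approximate pointwise values (hypothetical input)
    exact_fn  : callable returning the exact pointwise value for index i
    m         : subsample size (simple random sampling without replacement,
                an assumption made for this sketch)
    rng       : random.Random instance, passed in for reproducibility
    """
    n = len(surrogate)
    sample = rng.sample(range(n), m)
    # Exact-minus-surrogate error, observed only on the subsample.
    correction = sum(exact_fn(i) - surrogate[i] for i in sample)
    # Surrogate total, corrected by the scaled-up subsample error.
    return sum(surrogate) + (n / m) * correction


if __name__ == "__main__":
    exact = [-1.0, -0.5, -2.0, -1.5, -0.7, -1.1]
    # A surrogate with a constant bias: the correction removes it entirely.
    surrogate = [v + 0.25 for v in exact]
    estimate = difference_estimator(surrogate, lambda i: exact[i], 3, random.Random(0))
    print(estimate, sum(exact))
```

The closer the surrogate tracks the exact values, the smaller the variance of the correction term, which is why a good approximation allows a small subsample.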
W-kernel and essential subspace for frequentist evaluation of Bayesian estimators
The posterior covariance matrix W defined by the log-likelihood of each
observation plays important roles both in sensitivity analysis and in the
frequentist evaluation of Bayesian estimators. This study focuses on the
matrix W and its principal space; we term the latter the essential subspace.
First, it is shown that they appear in various statistical settings, such as
the evaluation of posterior sensitivity, assessment of frequentist
uncertainty from posterior samples, and stochastic expansion of the loss; a key
tool for treating frequentist properties is the recently proposed Bayesian
infinitesimal jackknife approximation (Giordano and Broderick (2023)). In the
following part, we show that the matrix W can be interpreted as a reproducing
kernel, which is named the W-kernel. Using the W-kernel, the essential subspace is
expressed as a principal space given by kernel PCA. A relation to the
Fisher kernel and neural tangent kernel is established, which elucidates the
connection to classical asymptotic theory; it also leads to a sort of
Bayesian-frequentist duality. Finally, two applications, selection of a
representative set of observations and dimensional reduction in the approximate
bootstrap, are discussed. In the former, incomplete Cholesky decomposition is
introduced as an efficient method to compute the essential subspace. In the
latter, different implementations of the approximate bootstrap for posterior
means are compared.
Comment: 48 pages, 10 figures. Revised and enlarged version of ISM Research
Memorandum No. 122
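To make the matrix W from this abstract concrete, the sketch below computes it from posterior draws as the covariance, across draws, of the per-observation log-likelihoods. This is an illustrative reading of the abstract's definition, not the paper's code: the function name, the input layout, and the use of the unbiased (S - 1) divisor are assumptions.

```python
def w_matrix(log_lik):
    """Posterior covariance of per-observation log-likelihoods.

    log_lik[s][i] is the log-likelihood of observation i under posterior
    draw s (hypothetical input layout). Returns an n-by-n list of lists.
    The (S - 1) divisor is an assumption made for this sketch.
    """
    n_draws = len(log_lik)
    n_obs = len(log_lik[0])
    # Posterior mean of each observation's log-likelihood.
    means = [sum(row[i] for row in log_lik) / n_draws for i in range(n_obs)]
    w = [[0.0] * n_obs for _ in range(n_obs)]
    for row in log_lik:
        dev = [row[i] - means[i] for i in range(n_obs)]
        for i in range(n_obs):
            for j in range(n_obs):
                w[i][j] += dev[i] * dev[j] / (n_draws - 1)
    return w


if __name__ == "__main__":
    draws = [[0.0, 0.0], [2.0, 4.0]]
    print(w_matrix(draws))
```

The leading eigenvectors of such a matrix would span the principal space the abstract calls the essential subspace; computing them is left out of this sketch.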
High-dimensional modeling of spatial and spatio-temporal conditional extremes using INLA and the SPDE approach
The conditional extremes framework allows for event-based stochastic modeling
of dependent extremes, and has recently been extended to spatial and
spatio-temporal settings. After standardizing the marginal distributions and
applying an appropriate linear normalization, certain non-stationary Gaussian
processes can be used as asymptotically-motivated models for the process
conditioned on threshold exceedances at a fixed reference location and time. In
this work, we adopt a Bayesian perspective by implementing estimation through
the integrated nested Laplace approximation (INLA), allowing for novel and
flexible semi-parametric specifications of the Gaussian mean function. By using
Gauss-Markov approximations of the Matérn covariance function (known as the
Stochastic Partial Differential Equation approach) at a latent stage of the
model, likelihood-based inference becomes feasible even with thousands of
observed locations. We explain how constraints on the spatial and
spatio-temporal Gaussian processes, arising from the conditioning mechanism,
can be implemented through the latent variable approach without losing the
computationally convenient Markov property. We discuss tools for the comparison
of models via their posterior distributions, and illustrate the flexibility of
the approach with gridded Red Sea surface temperature data at over 6,000
observed locations. Posterior sampling is exploited to study the probability
distribution of cluster functionals of spatial and spatio-temporal extreme
episodes.
MCMC methods for inference in a mathematical model of pulmonary circulation
This study performs parameter inference in a partial differential equations system of pulmonary circulation. We use a fluid dynamics network model that takes selected parameter values and mimics the behaviour of the pulmonary haemodynamics under normal physiological and pathological conditions. This is of medical interest as it enables tracking the progression of pulmonary hypertension. We show how we make the fluids model tractable by reducing the parameter dimension from a 55D to a 5D problem. The Delayed Rejection Adaptive Metropolis algorithm, coupled with constrained non-linear optimization, is successfully used to learn the parameter values and quantify the uncertainty in the parameter estimates. To accommodate different magnitudes of the parameter values, we introduce an improved parameter scaling technique in the Delayed Rejection Adaptive Metropolis algorithm. Formal convergence diagnostics are employed to check for convergence of the Markov chains. Additionally, we perform model selection using different information criteria, including the Watanabe-Akaike Information Criterion.
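For readers unfamiliar with the Watanabe-Akaike criterion named in this abstract, the sketch below shows one common way to compute it from MCMC output. It is a generic illustration, not the study's code: the function name and input layout are hypothetical, and the penalty uses the posterior variance of the pointwise log-likelihood, which is one of several variants in use.

```python
import math
from statistics import variance


def waic(log_lik):
    """WAIC on the deviance scale from pointwise log-likelihoods.

    log_lik[s][i] is the log-likelihood of observation i under posterior
    draw s (hypothetical layout; at least two draws are needed).
    Returns (waic, lppd, p_waic).
    """
    n_draws = len(log_lik)
    n_obs = len(log_lik[0])
    lppd = 0.0
    p_waic = 0.0
    for i in range(n_obs):
        col = [log_lik[s][i] for s in range(n_draws)]
        # Log of the posterior-mean likelihood, via log-sum-exp for stability.
        mx = max(col)
        lppd += mx + math.log(sum(math.exp(v - mx) for v in col) / n_draws)
        # Effective-parameter penalty: variance of the log-likelihood over draws.
        p_waic += variance(col)
    return -2.0 * (lppd - p_waic), lppd, p_waic


if __name__ == "__main__":
    draws = [[-1.0, -2.0], [-1.2, -1.8], [-0.9, -2.1]]
    print(waic(draws))
```

Lower values indicate better estimated out-of-sample predictive fit when models are compared on the same data.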
A flexible Bayesian tool for CoDa mixed models: logistic-normal distribution with Dirichlet covariance
Compositional Data Analysis (CoDa) has gained popularity in recent years.
This type of data consists of values from disjoint categories that sum up to a
constant. Both Dirichlet regression and logistic-normal regression have become
popular as CoDa analysis methods. However, fitting this kind of multivariate
model presents challenges, especially when structured random effects are
included in the model, such as temporal or spatial effects.
To overcome these challenges, we propose the logistic-normal Dirichlet Model
(LNDM). We seamlessly incorporate this approach into the R-INLA package,
facilitating model fitting and model prediction within the framework of Latent
Gaussian Models (LGMs). Moreover, we explore metrics such as the Deviance Information
Criterion (DIC), the Watanabe-Akaike information criterion (WAIC), and
the cross-validation measure conditional predictive ordinate (CPO) for model
selection in R-INLA for CoDa.
Illustrating LNDM through a simple simulated example and with an ecological
case study on Arabidopsis thaliana in the Iberian Peninsula, we underscore its
potential as an effective tool for managing CoDa and large CoDa databases.
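Logistic-normal models place a Gaussian distribution on log-ratio coordinates of a composition. The sketch below shows the additive log-ratio transform and its inverse as one standard way to move between the simplex and unconstrained coordinates. The abstract does not state which log-ratio transform the LNDM uses, so the choice here, along with the function names, is an assumption for illustration only.

```python
import math


def alr(composition):
    """Additive log-ratio transform, using the last part as the reference.

    composition: strictly positive parts summing to a constant.
    Returns D - 1 unconstrained coordinates for a D-part composition.
    """
    ref = composition[-1]
    return [math.log(x / ref) for x in composition[:-1]]


def alr_inverse(coords):
    """Map unconstrained coordinates back to a composition summing to 1."""
    # The reference part corresponds to a coordinate of 0, i.e. exp(0) = 1.
    expd = [math.exp(y) for y in coords] + [1.0]
    total = sum(expd)
    return [e / total for e in expd]


if __name__ == "__main__":
    parts = [0.2, 0.3, 0.5]
    coords = alr(parts)
    print(coords, alr_inverse(coords))
```

A Gaussian model fitted to the coordinates returned by `alr` induces a logistic-normal distribution on the original composition.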