Non-Gaussian bivariate modelling with application to atmospheric trace-gas inversion
Atmospheric trace-gas inversion is the procedure by which the sources and
sinks of a trace gas are identified from observations of its mole fraction at
isolated locations in space and time. This is inherently a spatio-temporal
bivariate inversion problem, since the mole-fraction field evolves in space and
time and the flux is also spatio-temporally distributed. Further, the bivariate
model is likely to be non-Gaussian since the flux field is rarely Gaussian.
Here, we use conditioning to construct a non-Gaussian bivariate model, and we
describe some of its properties through auto- and cross-cumulant functions. A
bivariate non-Gaussian, specifically trans-Gaussian, model is then achieved
through the use of Box--Cox transformations, and we facilitate Bayesian
inference by approximating the likelihood in a hierarchical framework.
Trace-gas inversion, especially at high spatial resolution, is frequently
highly sensitive to prior specification. Therefore, unlike conventional
approaches, we assimilate trace-gas inventory information with the
observational data at the parameter layer, thus shifting prior sensitivity from
the inventory itself to its spatial characteristics (e.g., its spatial length
scale). We demonstrate the approach in controlled-experiment studies of methane
inversion, using fluxes extracted from inventories of the UK and Ireland and of
Northern Australia.
Comment: 45 pages, 7 figures
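As a concrete illustration of the trans-Gaussian idea, the sketch below (not the paper's code; the lognormal "flux" sample is a stand-in for a non-Gaussian flux field) applies a maximum-likelihood Box--Cox transform and checks that skewness shrinks:

```python
# Minimal sketch of a Box-Cox transform mapping a positive, right-skewed
# sample toward Gaussianity. The lognormal draw is an illustrative stand-in
# for a flux field; this is not the paper's bivariate model.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
flux = rng.lognormal(mean=0.0, sigma=1.0, size=5000)

# scipy estimates the Box-Cox exponent lambda by maximum likelihood.
transformed, lam = stats.boxcox(flux)

# Skewness should shrink markedly after the transform.
print(stats.skew(flux), stats.skew(transformed), lam)
```

In the paper the transform is embedded in a hierarchical bivariate model rather than applied marginally as here.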
Highly efficient Bayesian joint inversion for receiver-based data and its application to lithospheric structure beneath the southern Korean Peninsula
With the deployment of extensive seismic arrays, systematic and efficient parameter and uncertainty estimation is of increasing importance and can provide reliable, regional models for crustal and upper-mantle structure. We present an efficient Bayesian method for the joint inversion of surface-wave dispersion and receiver-function data that combines trans-dimensional (trans-D) model selection in an optimization phase with subsequent rigorous parameter uncertainty estimation. Parameter and uncertainty estimation depend strongly on the chosen parametrization, such that meaningful regional comparison requires quantitative model selection that can be carried out efficiently at several sites. While significant progress has been made for model selection (e.g. trans-D inference) at individual sites, the lack of efficiency can prohibit application to large data volumes or cause questionable results due to lack of convergence. Studies that address large numbers of data sets have mostly ignored model selection in favour of more efficient/simple estimation techniques (i.e. focusing on uncertainty estimation but employing ad hoc model choices). Our approach consists of a two-phase inversion that combines trans-D optimization to select the most probable parametrization with subsequent Bayesian sampling for uncertainty estimation given that parametrization. The trans-D optimization is implemented here by replacing the likelihood function with the Bayesian information criterion (BIC). The BIC provides constraints on model complexity that facilitate the search for an optimal parametrization. Parallel tempering (PT) is applied as an optimization algorithm. After optimization, the optimal model choice is identified by the minimum BIC value from all PT chains. Uncertainty estimation is then carried out in fixed dimension. Data errors are estimated as part of the inference problem by a combination of empirical and hierarchical estimation. Data covariance matrices are estimated from data residuals (the difference between prediction and observation) and periodically updated. In addition, a scaling factor for the covariance matrix magnitude is estimated as part of the inversion. The inversion is applied to both simulated and observed data that consist of phase- and group-velocity dispersion curves (Rayleigh wave), and receiver functions. The simulation results show that model complexity and important features are well estimated by the fixed-dimensional posterior probability density. Observed data for stations in different tectonic regions of the southern Korean Peninsula are considered. The results are consistent with published results, but important features are better constrained than in previous regularized inversions and are more consistent across the stations. For example, resolution of crustal and Moho interfaces, and absolute values and gradients of velocities in the lower crust and upper mantle, are better constrained.
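The role the BIC plays in the optimization phase can be seen on a toy problem. The sketch below (a hypothetical stand-in for the trans-D search over layered-Earth parametrizations, not the paper's inversion) scores polynomial fits of increasing complexity and selects the minimum-BIC model:

```python
# Hedged toy sketch: BIC-based model choice, standing in for trans-D
# selection of model complexity. Data are generated from a quadratic.
import numpy as np

rng = np.random.default_rng(1)
x = np.linspace(0.0, 1.0, 200)
y = 1.0 + 2.0 * x - 3.0 * x**2 + rng.normal(scale=0.05, size=x.size)

def bic(k, n, rss):
    # Gaussian-likelihood BIC up to an additive constant:
    # n * ln(rss / n) + k * ln(n); the k*ln(n) term penalizes complexity.
    return n * np.log(rss / n) + k * np.log(n)

scores = {}
for degree in range(1, 7):                       # candidate model complexities
    coeffs = np.polyfit(x, y, degree)
    rss = np.sum((np.polyval(coeffs, x) - y) ** 2)
    scores[degree] = bic(degree + 1, x.size, rss)

best = min(scores, key=scores.get)   # minimum BIC picks a parsimonious model
print(best, scores[best])
```

In the paper this criterion replaces the likelihood inside a parallel-tempering search over layered parametrizations, rather than a fixed list of candidates as here.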
Hyperparameter Estimation in Bayesian MAP Estimation: Parameterizations and Consistency
The Bayesian formulation of inverse problems is attractive for three primary
reasons: it provides a clear modelling framework, a means for uncertainty
quantification, and principled learning of hyperparameters. The
posterior distribution may be explored by sampling methods, but for many
problems it is computationally infeasible to do so. In this situation maximum a
posteriori (MAP) estimators are often sought. Whilst these are relatively cheap
to compute, and have an attractive variational formulation, a key drawback is
their lack of invariance under change of parameterization. This is a
particularly significant issue when hierarchical priors are employed to learn
hyperparameters. In this paper we study the effect of the choice of
parameterization on MAP estimators when a conditionally Gaussian hierarchical
prior distribution is employed. Specifically we consider the centred
parameterization, the natural parameterization in which the unknown state is
solved for directly, and the noncentred parameterization, which works with a
whitened Gaussian as the unknown state variable, and arises when considering
dimension-robust MCMC algorithms; MAP estimation is well-defined in the
nonparametric setting only for the noncentred parameterization. However, we
show that MAP estimates based on the noncentred parameterization are not
consistent as estimators of hyperparameters; conversely, we show that limits of
finite-dimensional centred MAP estimators are consistent as the dimension tends
to infinity. We also consider empirical Bayesian hyperparameter estimation,
show consistency of these estimates, and demonstrate that they are more robust
with respect to noise than centred MAP estimates. An underpinning concept
throughout is that hyperparameters may only be recovered up to measure
equivalence, a well-known phenomenon in the context of the Ornstein-Uhlenbeck
process.
Comment: 36 pages, 8 figures
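The two parameterizations can be sketched for the simplest conditionally Gaussian prior, u | sigma ~ N(0, sigma^2 I). In the toy code below (illustrative only, not the paper's setting), the centred version samples the state u directly, while the noncentred version samples a whitened variable w ~ N(0, I) and sets u = sigma * w:

```python
# Hedged sketch of centred vs. noncentred parameterizations for a
# conditionally Gaussian hierarchical prior u | sigma ~ N(0, sigma^2 I).
import numpy as np

rng = np.random.default_rng(2)
sigma = 1.7            # hyperparameter (prior amplitude)
n = 100_000

# Centred: the state u is sampled directly from its conditional prior,
# so the sampled variable depends on the hyperparameter.
u_centred = rng.normal(loc=0.0, scale=sigma, size=n)

# Noncentred: work with a whitened variable w ~ N(0, I) and set u = sigma * w,
# so the sampled variable is a priori independent of the hyperparameter.
w = rng.normal(size=n)
u_noncentred = sigma * w

# Both parameterizations induce the same marginal distribution for u.
print(u_centred.std(), u_noncentred.std())
```

The paper's point is that although the two induce the same prior, MAP estimation behaves very differently under each: only the noncentred form is well defined nonparametrically, yet its hyperparameter estimates are inconsistent.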
Bayesian Coronal Seismology
In contrast to the situation in a laboratory, the study of the solar
atmosphere has to be pursued without direct access to the physical conditions
of interest. Information is therefore incomplete and uncertain and inference
methods need to be employed to diagnose the physical conditions and processes.
One such method, solar atmospheric seismology, makes use of observed and
theoretically predicted properties of waves to infer plasma and magnetic field
properties. A recent development in solar atmospheric seismology is
the use of inversion and model comparison methods based on Bayesian analysis.
In this paper, the philosophy and methodology of Bayesian analysis are first
explained. Then, we provide an account of what has been achieved so far from
the application of these techniques to solar atmospheric seismology and a
prospect of possible future extensions.
Comment: 19 pages, accepted in Advances in Space Research
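As a toy illustration of the Bayesian inversion step (all numbers and the forward model below are hypothetical, not taken from the paper), one can infer a characteristic speed from a single observed oscillation period via a grid posterior:

```python
# Hedged toy example of seismological inversion: a single observed wave
# period constrains an unknown speed through an assumed forward model
# P = L / v. All values here are illustrative.
import numpy as np

L = 100.0                      # loop length (illustrative units)
P_obs, noise = 5.0, 0.3        # observed period and its Gaussian uncertainty

v_grid = np.linspace(5.0, 50.0, 2000)       # uniform prior over the speed
P_model = L / v_grid                        # forward-modelled period
likelihood = np.exp(-0.5 * ((P_obs - P_model) / noise) ** 2)

dv = v_grid[1] - v_grid[0]
posterior = likelihood / (likelihood.sum() * dv)   # normalize on the grid

v_map = v_grid[np.argmax(posterior)]    # posterior mode, close to L / P_obs
print(v_map)
```

Real applications replace this one-parameter grid with physical wave models and add Bayesian model comparison between competing mechanisms, as the paper reviews.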
Hierarchical Gaussian process mixtures for regression
As a result of their good performance in practice and their desirable analytical properties, Gaussian process regression models are attracting increasing interest in statistics, engineering and other fields. However, two major problems arise when the model is applied to a large data-set with repeated measurements. One stems from the systematic heterogeneity among the different replications, and the other is the requirement to invert a covariance matrix which is involved in the implementation of the model. The dimension of this matrix equals the sample size of the training data-set. In this paper, a Gaussian process mixture model for regression is proposed for dealing with the above two problems, and a hybrid Markov chain Monte Carlo (MCMC) algorithm is used for its implementation. Application to a real data-set is reported.
Accelerating Bayesian hierarchical clustering of time series data with a randomised algorithm
We live in an era of abundant data. This has necessitated the development of new and innovative statistical algorithms to get the most from experimental data. For example, faster algorithms make practical the analysis of larger genomic data sets, allowing us to extend the utility of cutting-edge statistical methods. We present a randomised algorithm that accelerates the clustering of time series data using the Bayesian Hierarchical Clustering (BHC) statistical method. BHC is a general method for clustering any discretely sampled time series data. In this paper we focus on a particular application to microarray gene expression data. We define and analyse the randomised algorithm, before presenting results on both synthetic and real biological data sets. We show that the randomised algorithm leads to substantial gains in speed with minimal loss in clustering quality. The randomised time series BHC algorithm is available as part of the R package BHC, which is available for download from Bioconductor (version 2.10 and above) via http://bioconductor.org/packages/2.10/bioc/html/BHC.html. We have also made available a set of R scripts which can be used to reproduce the analyses carried out in this paper; these are available from https://sites.google.com/site/randomisedbhc/.
Bayesian analysis of hierarchical multi-fidelity codes
This paper deals with the Gaussian process based approximation of a code
which can be run at different levels of accuracy. This method, which is a
particular case of co-kriging, allows us to improve a surrogate model of a
complex computer code using fast approximations of it. In particular, we focus
on the case of a large number of code levels on the one hand and on a Bayesian
approach when we have two levels on the other hand. The main results of this
paper are a new approach to estimate the model parameters which provides a
closed form expression for an important parameter of the model (the scale
factor), a reduction of the numerical complexity by simplifying the covariance
matrix inversion, and a new Bayesian modelling that gives an explicit
representation of the joint distribution of the parameters and that is not
computationally expensive. A thermodynamic example is used to illustrate the
comparison between 2-level and 3-level co-kriging.
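The multi-fidelity structure behind 2-level co-kriging can be sketched in a few lines. The toy code below (illustrative functions and names, not the paper's code) uses the autoregressive relation z_hi(x) = rho * z_lo(x) + delta(x) and estimates the scale factor rho by least squares; the paper instead derives a closed-form estimate within a full Gaussian-process (co-kriging) treatment:

```python
# Hedged toy sketch of the AR(1) structure underlying 2-level co-kriging:
# z_hi(x) = rho * z_lo(x) + delta(x). Functions below are illustrative.
import numpy as np

def cheap(x):        # fast, low-accuracy code level (assumed form)
    return np.sin(8.0 * x)

def expensive(x):    # slow, high-accuracy level: scale * cheap + linear discrepancy
    return 2.0 * cheap(x) + (x - 0.5)

x_hi = np.linspace(0.0, 1.0, 12)    # few expensive runs, nested in the cheap design
z_lo, z_hi = cheap(x_hi), expensive(x_hi)

# Regress the high-fidelity output on the low-fidelity output plus a linear trend.
A = np.column_stack([z_lo, x_hi, np.ones_like(x_hi)])
coef, _, _, _ = np.linalg.lstsq(A, z_hi, rcond=None)
rho = coef[0]
print(rho)  # -> 2.0 (the discrepancy here is exactly linear, so rho is recovered)
```

In full co-kriging, the discrepancy delta(x) is itself modelled as a Gaussian process rather than a fixed linear trend, which is what makes the joint surrogate and its uncertainty available.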