Improving Inference of Gaussian Mixtures Using Auxiliary Variables
Expanding a lower-dimensional problem to a higher-dimensional space and then
projecting back is often beneficial. This article rigorously investigates this
perspective in the context of finite mixture models, namely how to improve
inference for mixture models by using auxiliary variables. Despite the large
literature on mixture models and several empirical examples, there is no
previous work that gives general theoretical justification for including
auxiliary variables in mixture models, even for special cases. We provide a
theoretical basis for comparing inference for multivariate mixture models with
the corresponding inference for marginal univariate mixture models. Analytical
results for several special cases are established. We show that the probability
of correctly allocating mixture memberships and the information number for the
means of the primary outcome in a bivariate model with two Gaussian mixtures
are generally larger than those in each univariate model. Simulations under a
range of scenarios, including misspecified models, are conducted to examine the
improvement. The method is illustrated by two real applications in ecology and
causal inference.
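The allocation claim above can be illustrated with a small simulation. This is a minimal sketch, not the article's method: it assumes a two-component Gaussian mixture with known parameters, equal mixing weights, identity covariance, and hypothetical component means, and it compares the Bayes allocation rule using only the primary outcome against the rule using the primary outcome plus one auxiliary variable. The function name `simulate_allocation_accuracy` and all parameter values are illustrative.

```python
import numpy as np


def simulate_allocation_accuracy(n=20000, seed=0):
    """Compare correct-allocation rates for a univariate and a bivariate
    two-component Gaussian mixture with known parameters.

    Assumptions (all hypothetical): equal mixing weights, identity
    covariance in both components, means mu0 and mu1 chosen below.
    """
    rng = np.random.default_rng(seed)
    z = rng.random(n) < 0.5               # True -> component 1
    mu0 = np.array([0.0, 0.0])            # component 0 means (primary, auxiliary)
    mu1 = np.array([1.0, 1.5])            # component 1 means (primary, auxiliary)
    x = rng.standard_normal((n, 2)) + np.where(z[:, None], mu1, mu0)

    def allocate(data, m0, m1):
        # With equal weights and identity covariance, the log-likelihood
        # ratio is linear: (x - midpoint) . (m1 - m0).
        llr = (data - (m0 + m1) / 2) @ (m1 - m0)
        return llr > 0

    acc_uni = np.mean(allocate(x[:, :1], mu0[:1], mu1[:1]) == z)
    acc_bi = np.mean(allocate(x, mu0, mu1) == z)
    return acc_uni, acc_bi


if __name__ == "__main__":
    acc_uni, acc_bi = simulate_allocation_accuracy()
    print(f"univariate allocation accuracy: {acc_uni:.3f}")
    print(f"bivariate allocation accuracy:  {acc_bi:.3f}")
```

Under these assumptions the theoretical accuracies are Phi(0.5), about 0.69, for the univariate rule and Phi(sqrt(3.25)/2), about 0.82, for the bivariate rule, so the simulated rates should land near those values. The sketch uses true parameters; the article's comparison concerns estimated models, which this example does not reproduce.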
Constrained probability distributions of correlation functions
Context: Two-point correlation functions are used throughout cosmology as a
measure for the statistics of random fields. When used in Bayesian parameter
estimation, their likelihood function is usually replaced by a Gaussian
approximation. However, this has been shown to be insufficient.
Aims: For the case of Gaussian random fields, we search for an exact
probability distribution of correlation functions, which could improve the
accuracy of future data analyses.
Methods: We use a fully analytic approach, first expanding the random field
in its Fourier modes, and then calculating the characteristic function.
Finally, we derive the probability distribution function using integration by
residues. We use a numerical implementation of the full analytic formula to
discuss the behaviour of this function.
Results: We derive the univariate and bivariate probability distribution
function of the correlation functions of a Gaussian random field, and outline
how higher joint distributions could be calculated. We give the results in the
form of mode expansions, but in one special case we also find a closed-form
expression. We calculate the moments of the distribution and, in the univariate
case, we discuss the Edgeworth expansion approximation. We also comment on the
difficulties in a fast and exact numerical implementation of our results, and
on possible future applications.
Comment: 13 pages, 5 figures, updated to match version published in A&A
(slightly expanded Sects. 5.3 and 6)
Pair-copula constructions of multiple dependence
Building on the work of Bedford, Cooke and Joe, we show how multivariate data, which exhibit complex patterns of dependence in the tails, can be modelled using a cascade of pair-copulae, acting on two variables at a time. We use the pair-copula decomposition of a general multivariate distribution and propose a method to perform inference. The model construction is hierarchical in nature, the various levels corresponding to the incorporation of more variables in the conditioning sets, using pair-copulae as simple building blocks. Pair-copula decomposed models also represent a very flexible way to construct higher-dimensional copulae. We apply the methodology to a financial data set. Our approach represents the first step towards the development of an unsupervised algorithm that explores the space of possible pair-copula models and that can also be applied to huge data sets automatically.
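As an illustration of the decomposition described above, one standard way to write a three-dimensional density as a cascade of pair-copulae is sketched below. The notation (f_i for marginal densities, F_i for marginal distribution functions, c for pair-copula densities) is assumed here and is not taken from the abstract; other orderings of the variables give equally valid decompositions.

```latex
f(x_1, x_2, x_3)
  = f_1(x_1)\, f_2(x_2)\, f_3(x_3)
    \cdot c_{12}\{F_1(x_1), F_2(x_2)\}
    \cdot c_{23}\{F_2(x_2), F_3(x_3)\}
    \cdot c_{13|2}\{F(x_1 \mid x_2), F(x_3 \mid x_2)\}
```

The first two copula terms act on unconditional pairs, while the last acts on a pair conditioned on x_2, which matches the hierarchical structure with growing conditioning sets. In general c_{13|2} may itself depend on the value of x_2; treating it as free of x_2 is a common simplifying assumption.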
Non-Gaussian Geostatistical Modeling using (skew) t Processes
We propose a new model for regression and dependence analysis when addressing
spatial data with possibly heavy tails and an asymmetric marginal distribution.
We first propose a stationary process with marginals obtained through scale
mixing of a Gaussian process with an inverse square root process with Gamma
marginals. We then generalize this construction by considering a skew-Gaussian
process, thus obtaining a process with skew-t marginal distributions. For the
proposed (skew) t process we study the second-order and geometrical
properties and, in the t case, we provide analytic expressions for the
bivariate distribution. In an extensive simulation study, we investigate the
use of the weighted pairwise likelihood as a method of estimation for the t
process. Moreover, we compare the performance of the optimal linear predictor of
the t process versus the optimal Gaussian predictor. Finally, the
effectiveness of our methodology is illustrated by analyzing a georeferenced
dataset on maximum temperatures in Australia.
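The scale-mixing construction mentioned above can be written out at the level of a single location. This is a sketch under assumed notation (G for a standard Gaussian process, W for a positive process with Gamma marginals, nu for the degrees of freedom); the shape and rate parametrization is the usual one that yields Student t marginals and may differ from the paper's.

```latex
Y(s) = W(s)^{-1/2}\, G(s), \qquad
G(s) \sim \mathcal{N}(0, 1), \qquad
W(s) \sim \mathrm{Gamma}\!\left(\tfrac{\nu}{2}, \tfrac{\nu}{2}\right)
\ \text{(shape, rate)}, \qquad W \perp G
\;\;\Longrightarrow\;\;
Y(s) \sim t_{\nu}, \qquad
\operatorname{Var}\{Y(s)\} = \frac{\nu}{\nu - 2} \ \ (\nu > 2)
```

Replacing G by a skew-Gaussian process in the same construction is what the abstract describes as leading to skew-t marginals.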
Multivariate type G Matérn stochastic partial differential equation random fields
For many applications with multivariate data, random field models capturing
departures from Gaussianity within realisations are appropriate. For this
reason, we formulate a new class of multivariate non-Gaussian models based on
systems of stochastic partial differential equations with additive type G noise
whose marginal covariance functions are of Matérn type. We consider four
increasingly flexible constructions of the noise, where the first two are
similar to existing copula-based models. In contrast to these, the latter two
constructions can model non-Gaussian spatial data without replicates.
Computationally efficient methods for likelihood-based parameter estimation and
probabilistic prediction are proposed, and the flexibility of the suggested
models is illustrated by numerical examples and two statistical applications.
Detecting spatial patterns with the cumulant function. Part I: The theory
In climate studies, detecting spatial patterns that largely deviate from the
sample mean remains a statistical challenge. Although Principal
Component Analysis (PCA), or equivalently an Empirical Orthogonal Function
(EOF) decomposition, is often applied for this purpose, it can only provide
meaningful results if the underlying multivariate distribution is Gaussian.
Indeed, PCA is based on optimizing second-order moment quantities, and the
covariance matrix can only capture the full dependence structure for
multivariate Gaussian vectors. Whenever the application at hand cannot satisfy
this normality hypothesis (e.g. precipitation data), alternatives and/or
improvements to PCA have to be developed and studied. To go beyond this
second-order statistics constraint that limits the applicability of PCA, we
take advantage of the cumulant function, which can produce higher-order moment
information. This cumulant function, well-known in the statistical literature,
allows us to propose a new, simple and fast procedure to identify spatial
patterns for non-Gaussian data. Our algorithm consists in maximizing the
cumulant function. To illustrate our approach, we implement it on three
families of multivariate random vectors for which explicit computations are
obtained. In addition, we show that our algorithm
corresponds to selecting the directions along which projected data display the
largest spread over the marginal probability density tails.
Comment: 9 pages, 3 figures
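To make the link between the cumulant function and PCA concrete, the definition and its Gaussian special case are sketched below. The notation (X for the random vector, u for a direction, kappa_k for the k-th cumulant) is assumed here; the exact objective and constraints used in the paper may differ from this generic form.

```latex
K_X(u) = \log \mathbb{E}\!\left[\exp\!\left(u^{\top} X\right)\right]
       = \sum_{k \ge 1} \frac{\kappa_k\!\left(u^{\top} X\right)}{k!},
\qquad
X \sim \mathcal{N}(\mu, \Sigma)
\;\Longrightarrow\;
K_X(u) = u^{\top}\mu + \tfrac{1}{2}\, u^{\top} \Sigma\, u
```

In the Gaussian case all cumulants beyond the second vanish, so for centered data maximizing K_X over directions of fixed norm recovers the leading eigenvector of the covariance matrix, as PCA does. For non-Gaussian data the higher-order terms contribute, which is what lets the cumulant function pick up tail behaviour that second-order moments miss.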