
    Improving Inference of Gaussian Mixtures Using Auxiliary Variables

    Expanding a lower-dimensional problem to a higher-dimensional space and then projecting back is often beneficial. This article rigorously investigates this perspective in the context of finite mixture models, namely how to improve inference for mixture models by using auxiliary variables. Despite the large literature on mixture models and several empirical examples, no previous work has given a general theoretical justification for including auxiliary variables in mixture models, even for special cases. We provide a theoretical basis for comparing inference for multivariate mixture models with the corresponding inference for marginal univariate mixture models. Analytical results for several special cases are established. We show that the probability of correctly allocating mixture memberships and the information number for the means of the primary outcome in a bivariate model with two Gaussian mixtures are generally larger than those in each univariate model. Simulations under a range of scenarios, including misspecified models, are conducted to examine the improvement. The method is illustrated by two real applications in ecology and causal inference.
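    A small numerical sketch of why an auxiliary variable can help allocation, under assumptions that are not taken from the article: two equally weighted Gaussian components with known means, unit variances, and an auxiliary variable that is conditionally independent of the primary outcome within each component. In that special case the Bayes allocation rule is correct with probability Phi(delta / 2), where delta is the Mahalanobis distance between the component means, so any auxiliary coordinate whose means differ increases delta. The function names and parameter values are hypothetical.

```python
import math
import random


def normal_cdf(x: float) -> float:
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def bayes_correct_rate(mu_a, mu_b, sd):
    """Correct-allocation probability of the Bayes rule for two equally weighted
    components with known means and a shared diagonal covariance (sd per
    coordinate). Assumption: equal weights and equal covariances, so the
    probability is Phi(delta / 2) with delta the Mahalanobis distance."""
    delta = math.sqrt(sum(((a - b) / s) ** 2 for a, b, s in zip(mu_a, mu_b, sd)))
    return normal_cdf(delta / 2.0)


def simulate_correct_rate(mu_a, mu_b, sd, n, rng):
    """Monte Carlo check: draw labelled points and allocate each one to the
    component with the larger likelihood (equal weights, known parameters),
    which here means the smaller standardized distance."""
    correct = 0
    for _ in range(n):
        label = rng.random() < 0.5  # True means the point comes from component b
        mu = mu_b if label else mu_a
        x = [rng.gauss(m, s) for m, s in zip(mu, sd)]
        d_a = sum(((xi - m) / s) ** 2 for xi, m, s in zip(x, mu_a, sd))
        d_b = sum(((xi - m) / s) ** 2 for xi, m, s in zip(x, mu_b, sd))
        correct += (d_b < d_a) == label
    return correct / n


# Primary outcome alone: hypothetical means 0 and 1, unit variance.
uni = bayes_correct_rate([0.0], [1.0], [1.0])
# Add a hypothetical auxiliary outcome whose component means differ by 1.5.
biv = bayes_correct_rate([0.0, 0.0], [1.0, 1.5], [1.0, 1.0])
print(round(uni, 3), round(biv, 3))
```

    With these assumed means the primary outcome alone allocates about 69% of points correctly, and adding the auxiliary outcome raises this to about 82%; `simulate_correct_rate` gives a rough Monte Carlo check of the closed form. The article's results cover more general settings, including misspecified models, which this sketch does not attempt.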

    Constrained probability distributions of correlation functions

    Context: Two-point correlation functions are used throughout cosmology as a measure for the statistics of random fields. When used in Bayesian parameter estimation, their likelihood function is usually replaced by a Gaussian approximation. However, this has been shown to be insufficient. Aims: For the case of Gaussian random fields, we search for an exact probability distribution of correlation functions, which could improve the accuracy of future data analyses. Methods: We use a fully analytic approach, first expanding the random field in its Fourier modes, and then calculating the characteristic function. Finally, we derive the probability distribution function using integration by residues. We use a numerical implementation of the full analytic formula to discuss the behaviour of this function. Results: We derive the univariate and bivariate probability distribution function of the correlation functions of a Gaussian random field, and outline how higher joint distributions could be calculated. We give the results in the form of mode expansions, but in one special case we also find a closed-form expression. We calculate the moments of the distribution and, in the univariate case, we discuss the Edgeworth expansion approximation. We also comment on the difficulties of a fast and exact numerical implementation of our results, and on possible future applications.
    Comment: 13 pages, 5 figures, updated to match version published in A&A (slightly expanded Sects. 5.3 and 6).
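    A toy illustration of the characteristic-function route, under simplifying assumptions that are not the article's full setting: independent complex Gaussian Fourier modes with E|a_k|^2 = P_k, and a correlation estimate at zero separation taken (up to an assumed normalization) as the sum of |a_k|^2. Each |a_k|^2 is then exponential with mean P_k, the characteristic function is the product of 1 / (1 - i P_k t), and summing the residues at its simple poles t = -i / P_k gives a closed-form mode expansion of the PDF, valid for distinct, positive P_k. At non-zero separation the weights P_k cos(k r) can be negative, which places poles in both half-planes; that case is not handled here. The power values and function names are hypothetical.

```python
import math
import random


def zero_lag_pdf(x: float, power: list[float]) -> float:
    """PDF of S = sum_k P_k * E_k, E_k ~ Exp(1) i.i.d., for distinct P_k > 0.

    The characteristic function prod_k 1 / (1 - i P_k t) has simple poles at
    t = -i / P_k; the residue at each pole contributes one exponential term."""
    if x < 0:
        return 0.0
    total = 0.0
    for k, pk in enumerate(power):
        weight = 1.0
        for j, pj in enumerate(power):
            if j != k:
                weight *= pk / (pk - pj)
        total += weight * math.exp(-x / pk) / pk
    return total


def sample_zero_lag(power: list[float], rng: random.Random) -> float:
    """Draw independent complex Gaussian modes with E|a_k|^2 = P_k and return
    sum_k |a_k|^2 (toy zero-lag estimate, normalization assumed)."""
    s = 0.0
    for p in power:
        re = rng.gauss(0.0, math.sqrt(p / 2.0))
        im = rng.gauss(0.0, math.sqrt(p / 2.0))
        s += re * re + im * im
    return s


power = [1.0, 0.6, 0.3]  # hypothetical power per mode (must be distinct)
h = 0.001
grid = [i * h for i in range(40001)]
pdf = [zero_lag_pdf(x, power) for x in grid]
# Trapezoidal rule for the total mass and the mean of the analytic density.
mass = h * (sum(pdf) - 0.5 * (pdf[0] + pdf[-1]))
mean = h * (sum(x * f for x, f in zip(grid, pdf)) - 0.5 * grid[-1] * pdf[-1])
rng = random.Random(1)
mc_mean = sum(sample_zero_lag(power, rng) for _ in range(50000)) / 50000
print(mass, mean, mc_mean)
```

    The integrated density should have mass close to 1 and mean close to the sum of the P_k (1.9 here), and the simulated Fourier modes should reproduce that mean up to Monte Carlo error. The clearly skewed shape of this density is the kind of departure from a Gaussian likelihood the abstract refers to.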

    Pair-copula constructions of multiple dependence

    Building on the work of Bedford, Cooke and Joe, we show how multivariate data that exhibit complex patterns of dependence in the tails can be modelled using a cascade of pair-copulae, acting on two variables at a time. We use the pair-copula decomposition of a general multivariate distribution and propose a method to perform inference. The model construction is hierarchical in nature, the various levels corresponding to the incorporation of more variables in the conditioning sets, using pair-copulae as simple building blocks. Pair-copula decomposed models also represent a very flexible way to construct higher-dimensional copulae. We apply the methodology to a financial data set. Our approach represents the first step towards developing an unsupervised algorithm that explores the space of possible pair-copula models and that can also be applied automatically to huge data sets.
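    To make the cascade concrete, here is a minimal three-dimensional D-vine (order 1-2-3) in which every pair-copula is assumed to be Gaussian: the density is c12(u1, u2) * c23(u2, u3) * c13|2(F(u1|u2), F(u3|u2)), where the conditional distribution functions come from the h-function of the conditioning pair. The Gaussian choice, the vine order and the parameter values are illustrative assumptions; the appeal of the construction is that each factor could be a different bivariate family. With all-Gaussian pairs the vine must reproduce a trivariate Gaussian copula whose (1,3) correlation is rho13 = rho13|2 * sqrt((1 - rho12^2)(1 - rho23^2)) + rho12 * rho23, which the code uses as a check.

```python
import math
from statistics import NormalDist

STD = NormalDist()  # standard normal: provides cdf and inv_cdf


def gauss_pair_density(u: float, v: float, rho: float) -> float:
    """Bivariate Gaussian copula density c(u, v; rho)."""
    x, y = STD.inv_cdf(u), STD.inv_cdf(v)
    r2 = 1.0 - rho * rho
    return math.exp(-(rho * rho * (x * x + y * y) - 2.0 * rho * x * y) / (2.0 * r2)) / math.sqrt(r2)


def h_func(u: float, v: float, rho: float) -> float:
    """Conditional cdf F(u | v) = dC(u, v)/dv for the Gaussian pair-copula."""
    x, y = STD.inv_cdf(u), STD.inv_cdf(v)
    return STD.cdf((x - rho * y) / math.sqrt(1.0 - rho * rho))


def dvine3_density(u1, u2, u3, rho12, rho23, rho13_2):
    """Three-dimensional D-vine copula density with Gaussian pair-copulas:
    tree 1 holds (1,2) and (2,3); tree 2 holds (1,3 | 2)."""
    return (gauss_pair_density(u1, u2, rho12)
            * gauss_pair_density(u2, u3, rho23)
            * gauss_pair_density(h_func(u1, u2, rho12), h_func(u3, u2, rho23), rho13_2))


def gauss3_copula_density(u, r12, r13, r23):
    """Trivariate Gaussian copula density, used only to check the vine."""
    z = [STD.inv_cdf(ui) for ui in u]
    det = 1.0 + 2.0 * r12 * r13 * r23 - r12**2 - r13**2 - r23**2
    # Entries of the inverse correlation matrix (cofactors divided by det).
    i00, i11, i22 = (1 - r23**2) / det, (1 - r13**2) / det, (1 - r12**2) / det
    i01 = (r13 * r23 - r12) / det
    i02 = (r12 * r23 - r13) / det
    i12 = (r12 * r13 - r23) / det
    q = ((i00 - 1) * z[0]**2 + (i11 - 1) * z[1]**2 + (i22 - 1) * z[2]**2
         + 2 * (i01 * z[0] * z[1] + i02 * z[0] * z[2] + i12 * z[1] * z[2]))
    return math.exp(-0.5 * q) / math.sqrt(det)


rho12, rho23, rho13_2 = 0.5, -0.3, 0.4  # hypothetical pair-copula parameters
rho13 = rho13_2 * math.sqrt((1 - rho12**2) * (1 - rho23**2)) + rho12 * rho23
u = (0.2, 0.7, 0.45)
vine = dvine3_density(*u, rho12, rho23, rho13_2)
direct = gauss3_copula_density(u, rho12, rho13, rho23)
print(vine, direct)
```

    A log-likelihood is then the sum of log densities over observations (with margins already transformed to uniforms), which a numerical optimizer could maximize; this sketch does not reproduce the article's inference procedure.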

    Non-Gaussian Geostatistical Modeling using (skew) t Processes

    We propose a new model for regression and dependence analysis when addressing spatial data with possibly heavy tails and an asymmetric marginal distribution. We first propose a stationary process with t marginals obtained through scale mixing of a Gaussian process with an inverse square root process with Gamma marginals. We then generalize this construction by considering a skew-Gaussian process, thus obtaining a process with skew-t marginal distributions. For the proposed (skew) t process we study the second-order and geometrical properties, and in the t case we provide analytic expressions for the bivariate distribution. In an extensive simulation study, we investigate the use of the weighted pairwise likelihood as a method of estimation for the t process. Moreover, we compare the performance of the optimal linear predictor of the t process with that of the optimal Gaussian predictor. Finally, the effectiveness of our methodology is illustrated by analyzing a georeferenced dataset on maximum temperatures in Australia.
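    A minimal sketch of the scale-mixing construction for the t case only (the skew-t extension is not attempted). The Gamma-marginal process is assumed here to be built as W(s) = (1/nu) * sum of G_i(s)^2 over nu independent copies of a unit-variance Gaussian process, which gives Gamma(shape nu/2, rate nu/2) marginals for integer nu; Y(s) = X(s) / sqrt(W(s)), with X an independent Gaussian process, then has Student t marginals with nu degrees of freedom. The exponential correlation, the sites and nu = 5 are hypothetical choices.

```python
import math
import random


def exp_cov_cholesky(coords, scale):
    """Lower-triangular Cholesky factor of the exponential correlation matrix
    exp(-|s_i - s_j| / scale) on 1-D coordinates (assumed covariance model)."""
    n = len(coords)
    cov = [[math.exp(-abs(a - b) / scale) for b in coords] for a in coords]
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = cov[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    return L


def gaussian_process(L, rng):
    """One zero-mean, unit-variance Gaussian process draw: L @ z, z i.i.d. N(0, 1)."""
    z = [rng.gauss(0.0, 1.0) for _ in range(len(L))]
    return [sum(L[i][k] * z[k] for k in range(i + 1)) for i in range(len(L))]


def t_process(L, nu, rng):
    """Scale mixture Y(s) = X(s) / sqrt(W(s)) with W built from nu squared
    Gaussian processes, so each Y(s) is marginally Student t with nu d.o.f."""
    x = gaussian_process(L, rng)
    w = [0.0] * len(L)
    for _ in range(nu):
        g = gaussian_process(L, rng)
        w = [wi + gi * gi / nu for wi, gi in zip(w, g)]
    return [xi / math.sqrt(wi) for xi, wi in zip(x, w)]


rng = random.Random(7)
L = exp_cov_cholesky([0.0, 0.5, 1.3], scale=1.0)  # hypothetical sites and range
draws = [t_process(L, nu=5, rng=rng) for _ in range(10000)]
values = [v for d in draws for v in d]
tail = sum(abs(v) > 2.0 for v in values) / len(values)
print(tail)
```

    For nu = 5 roughly 10% of the simulated values should exceed 2 in absolute value, about twice the Gaussian rate of roughly 4.6%, which is the heavy-tail behaviour the t process is meant to capture.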

    Multivariate type G Matérn stochastic partial differential equation random fields

    For many applications with multivariate data, random field models capturing departures from Gaussianity within realisations are appropriate. For this reason, we formulate a new class of multivariate non-Gaussian models based on systems of stochastic partial differential equations with additive type G noise whose marginal covariance functions are of Matérn type. We consider four increasingly flexible constructions of the noise, where the first two are similar to existing copula-based models. In contrast to these, the latter two constructions can model non-Gaussian spatial data without replicates. Computationally efficient methods for likelihood-based parameter estimation and probabilistic prediction are proposed, and the flexibility of the suggested models is illustrated by numerical examples and two statistical applications.
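    A univariate, one-dimensional sketch of the underlying idea (the article's models are multivariate, and its discretization is not reproduced here): the operator (kappa^2 - d^2/ds^2) on (0, 1) is discretized with finite differences and zero boundary values, and Gaussian white noise is replaced by variance-gamma increments dL_i = sqrt(V_i) * Z_i with V_i ~ Gamma(shape h/nu, scale nu), one example of type G noise. Because E[V_i] = h matches the variance of Gaussian white noise on a cell of width h, the second-order structure is unchanged, which is why such fields keep a Matérn-type covariance (smoothness 3/2 in one dimension, ignoring boundary effects), while small nu produces spiky, non-Gaussian realizations. The grid size, kappa and nu are hypothetical.

```python
import math
import random


def vg_noise(n, h, nu, rng):
    """Variance-gamma (type G) noise on cells of width h:
    dL_i = sqrt(V_i) * Z_i with V_i ~ Gamma(shape=h/nu, scale=nu), so E[V_i] = h."""
    return [math.sqrt(rng.gammavariate(h / nu, nu)) * rng.gauss(0.0, 1.0) for _ in range(n)]


def solve_tridiagonal(lower, diag, upper, rhs):
    """Thomas algorithm for a tridiagonal system (lower[0], upper[-1] unused)."""
    n = len(diag)
    c = [0.0] * n
    d = [0.0] * n
    c[0] = upper[0] / diag[0]
    d[0] = rhs[0] / diag[0]
    for i in range(1, n):
        m = diag[i] - lower[i] * c[i - 1]
        c[i] = upper[i] / m if i < n - 1 else 0.0
        d[i] = (rhs[i] - lower[i] * d[i - 1]) / m
    x = [0.0] * n
    x[-1] = d[-1]
    for i in range(n - 2, -1, -1):
        x[i] = d[i] - c[i] * x[i + 1]
    return x


def spde_field(n, kappa, nu, rng):
    """Finite-difference solution of (kappa^2 - d^2/ds^2) X = dL on (0, 1)
    with X = 0 at both ends; returns the right-hand side and the field at
    the n interior nodes."""
    h = 1.0 / (n + 1)
    rhs = [e / h for e in vg_noise(n, h, nu, rng)]  # cell-averaged noise density
    off = -1.0 / (h * h)
    diag = [kappa * kappa + 2.0 / (h * h)] * n
    return rhs, solve_tridiagonal([off] * n, diag, [off] * n, rhs)


rng = random.Random(11)
rhs, x = spde_field(100, kappa=5.0, nu=0.5, rng=rng)
print(max(abs(v) for v in x))
```

    Replacing the variance-gamma draw by a Gaussian one with variance h recovers the usual Gaussian Matérn approximation, which makes the role of the noise distribution easy to compare.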

    Detecting spatial patterns with the cumulant function. Part I: The theory

    In climate studies, detecting spatial patterns that deviate strongly from the sample mean remains a statistical challenge. Although Principal Component Analysis (PCA), or equivalently an Empirical Orthogonal Function (EOF) decomposition, is often applied for this purpose, it can only provide meaningful results if the underlying multivariate distribution is Gaussian. Indeed, PCA is based on optimizing second-order moment quantities, and the covariance matrix can only capture the full dependence structure for multivariate Gaussian vectors. Whenever the application at hand cannot satisfy this normality hypothesis (e.g. precipitation data), alternatives and/or improvements to PCA have to be developed and studied. To go beyond this second-order statistics constraint that limits the applicability of PCA, we take advantage of the cumulant function, which captures higher-order moment information. This cumulant function, well known in the statistical literature, allows us to propose a new, simple and fast procedure to identify spatial patterns for non-Gaussian data. Our algorithm consists of maximizing the cumulant function. To illustrate our approach, we implement it on three families of multivariate random vectors for which explicit computations are obtained. In addition, we show that our algorithm corresponds to selecting the directions along which projected data display the largest spread over the marginal probability density tails.
    Comment: 9 pages, 3 figures
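    The direction search can be sketched in two dimensions with a grid over unit vectors u, maximizing the empirical cumulant generating function K(u) = log((1/n) * sum_i exp(t * <u, x_i - x_bar>)). The scale t, the grid search and the synthetic data are assumptions of this sketch rather than the article's exact procedure. For Gaussian data K(u) = (t^2 / 2) * u' Sigma u, so the maximizer is the leading principal component; with skewed data the higher-order cumulants enter and the sign of u matters.

```python
import math
import random


def centre(data):
    """Subtract the sample mean from every observation."""
    n = len(data)
    mean = [sum(x[k] for x in data) / n for k in range(len(data[0]))]
    return [[xk - mk for xk, mk in zip(x, mean)] for x in data]


def empirical_cgf(centred, u, t):
    """log of the sample mean of exp(t * <u, x_i - x_bar>)."""
    n = len(centred)
    return math.log(math.fsum(
        math.exp(t * sum(uk * xk for uk, xk in zip(u, x))) for x in centred) / n)


def best_direction_2d(centred, t=1.0, n_angles=180):
    """Grid search over unit vectors (cos a, sin a), a in [0, 2*pi), for the
    direction maximizing the empirical cumulant function."""
    best = None
    for i in range(n_angles):
        a = 2.0 * math.pi * i / n_angles
        u = (math.cos(a), math.sin(a))
        value = empirical_cgf(centred, u, t)
        if best is None or value > best[0]:
            best = (value, u)
    return best


rng = random.Random(3)
# Hypothetical data: a skewed "spike" coordinate (3 with prob. 0.1, else 0)
# and a Gaussian coordinate with the same variance (0.81).
data = [(3.0 if rng.random() < 0.1 else 0.0, rng.gauss(0.0, 0.9)) for _ in range(5000)]
value, u = best_direction_2d(centre(data), t=1.0)
print(value, u)
```

    In this synthetic example both coordinates have the same variance, so PCA has no preferred axis, yet the maximizer should point close to the positive first axis, where the right tail of the spike coordinate is heavier.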