    A hierarchical Bayesian approach to record linkage and population size problems

    We propose and illustrate a hierarchical Bayesian approach for matching statistical records observed on different occasions. We show how this model can be profitably adopted both in record linkage problems and in capture-recapture setups, where the size of a finite population is the real object of interest. There are at least two important differences between the proposed model-based approach and current practice in record linkage. First, the statistical model is built on the actually observed categorical variables, and no reduction of the available information to 0-1 comparisons takes place. Second, the hierarchical structure of the model allows a two-way propagation of uncertainty between the parameter estimation step and the matching procedure, so that no plug-in estimates are used and the correct uncertainty is accounted for both in estimating the population size and in performing the record linkage. We illustrate and motivate our proposal through a real data example and simulations. Comment: Published in the Annals of Applied Statistics (http://www.imstat.org/aoas/) by the Institute of Mathematical Statistics (http://www.imstat.org), http://dx.doi.org/10.1214/10-AOAS447.
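
    As a toy illustration of the first point, the sketch below (not the authors' model) contrasts 0-1 agreement indicators with a likelihood ratio built directly on the observed categorical values, via a simple hit-miss distortion model; the field, the frequencies, and the distortion probability beta are all illustrative assumptions.

```python
# Minimal sketch: match evidence from observed categorical values rather
# than from a binary agree/disagree comparison. Hit-miss model: an observed
# value equals the entity's true value with prob 1-beta, otherwise it is a
# fresh draw from the population frequencies.
from collections import Counter

def field_match_ratio(a, b, freqs, beta=0.05):
    """Likelihood ratio P(a,b | same entity) / P(a,b | different entities)
    for one categorical field under the hit-miss model."""
    def p_obs_given_true(x, v):
        return (1.0 - beta) * (x == v) + beta * freqs.get(x, 0.0)
    num = sum(fv * p_obs_given_true(a, v) * p_obs_given_true(b, v)
              for v, fv in freqs.items())
    den = freqs.get(a, 0.0) * freqs.get(b, 0.0)
    return num / den if den > 0 else float("inf")

# Toy population frequencies for a "surname" field.
pop = ["smith"] * 50 + ["jones"] * 30 + ["zhu"] * 20
freqs = {k: v / len(pop) for k, v in Counter(pop).items()}

print(field_match_ratio("smith", "smith", freqs))  # common value: modest ratio
print(field_match_ratio("zhu", "zhu", freqs))      # rare value: much larger ratio
```

    Agreement on a rare value carries much more evidence of a match than agreement on a common one, which is exactly the information a 0-1 comparison vector discards.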

    Approximate Bayesian inference in semiparametric copula models

    We describe a simple method for making inference on a functional of a multivariate distribution. The method is based on a copula representation of the multivariate distribution and on the properties of an approximate Bayesian Monte Carlo algorithm, in which proposed values of the functional of interest are weighted by their empirical likelihood. This method is particularly useful when the "true" likelihood function associated with the working model is too costly to evaluate or when the working model is only partially specified. Comment: 27 pages, 18 figures.
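
    The sketch below gives a minimal version of the weighting idea for the simplest possible functional, a univariate mean: prior draws of the functional are weighted by their empirical likelihood. The normal data and the prior are illustrative assumptions, not the paper's semiparametric copula setup.

```python
# Minimal sketch: weight prior draws of a functional (here E[X]) by the
# empirical likelihood of the corresponding estimating equation.
import numpy as np
from scipy.optimize import brentq

def log_el_mean(x, theta):
    """Log empirical-likelihood ratio for H: E[X] = theta. Solves for the
    Lagrange multiplier lam in sum_i (x_i - theta)/(1 + lam*(x_i - theta)) = 0."""
    z = x - theta
    if z.min() >= 0 or z.max() <= 0:       # theta outside the convex hull
        return -np.inf
    lo = (-1.0 + 1e-10) / z.max()          # keep 1 + lam*z_i > 0 for all i
    hi = (-1.0 + 1e-10) / z.min()
    lam = brentq(lambda l: np.sum(z / (1.0 + l * z)), lo, hi)
    return -np.sum(np.log1p(lam * z))

rng = np.random.default_rng(0)
x = rng.normal(1.0, 1.0, size=200)          # observed data

thetas = rng.normal(0.0, 3.0, size=5000)    # draws from an assumed prior
logw = np.array([log_el_mean(x, t) for t in thetas])
w = np.exp(logw - logw.max())
w /= w.sum()
print("posterior mean of theta:", np.sum(w * thetas))  # close to x.mean()
```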

    Approximate Integrated Likelihood via ABC methods

    We propose a novel use of a recent computational tool for Bayesian inference, namely the Approximate Bayesian Computation (ABC) methodology. ABC is a way to handle models for which the likelihood function may be intractable, unavailable, or too costly to evaluate; in particular, we consider the problem of eliminating the nuisance parameters from a complex statistical model in order to produce a likelihood function depending on the quantity of interest only. Given a proper prior for the entire parameter vector, we propose to approximate the integrated likelihood by the ratio of kernel estimators of the marginal posterior and prior for the quantity of interest. We present several examples. Comment: 28 pages, 8 figures.
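
    A minimal sketch of the ratio-of-kernel-estimators idea follows, for a normal model with the mean as the quantity of interest and the standard deviation as nuisance parameter; the priors, summaries, and acceptance tolerance are illustrative assumptions.

```python
# Minimal sketch: integrated likelihood of the parameter of interest
# approximated as (ABC posterior KDE) / (prior KDE), up to a constant.
import numpy as np
from scipy.stats import gaussian_kde

rng = np.random.default_rng(1)
n = 100
x_obs = rng.normal(2.0, 1.5, size=n)
s_obs = np.array([x_obs.mean(), x_obs.std(ddof=1)])

M = 200_000
mu = rng.normal(0.0, 5.0, size=M)        # prior on the quantity of interest
sigma = rng.uniform(0.1, 5.0, size=M)    # prior on the nuisance parameter

# One simulated summary pair per prior draw (exact sampling distributions
# of the mean and sd under the normal model).
s_sim = np.column_stack([
    rng.normal(mu, sigma / np.sqrt(n)),
    sigma * np.sqrt(rng.chisquare(n - 1, size=M) / (n - 1)),
])
dist = np.linalg.norm(s_sim - s_obs, axis=1)
accepted = mu[dist < np.quantile(dist, 0.005)]   # ABC acceptance step

post_kde = gaussian_kde(accepted)
prior_kde = gaussian_kde(mu)
grid = np.linspace(0.0, 4.0, 81)
int_lik = post_kde(grid) / prior_kde(grid)   # integrated likelihood, up to a constant
print(grid[np.argmax(int_lik)])              # should sit near x_obs.mean()
```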

    Objective Bayesian analysis for the multivariate skew-t model

    We perform a Bayesian analysis of the p-variate skew-t model, providing a new parameterization, a set of non-informative priors, and a sampler specifically designed to explore the posterior density of the model parameters. Extensions, such as the multivariate regression model with skewed errors and the stochastic frontier model, are easily accommodated. A novelty introduced in the paper is the extension of the bivariate skew-normal model of Liseo & Parisi (2013) to a more realistic p-variate skew-t model. We also introduce the R package mvst, which estimates the multivariate skew-t model.
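
    The sketch below (Python rather than the mvst package) draws from a p-variate skew-t via a standard stochastic representation: a skew-normal vector, obtained by conditioning a (p+1)-variate Gaussian on the sign of its first coordinate, divided by the square root of an independent chi-square over its degrees of freedom. The parameter values are illustrative assumptions.

```python
# Minimal sketch: sample a p-variate skew-t via its standard stochastic
# representation (skew-normal over the square root of chi2_nu / nu).
import numpy as np

def rmvskewt(n, xi, Omega, delta, nu, rng):
    """xi: location (p,); Omega: scale/correlation matrix (p,p);
    delta: skewness vector (p,) such that the implied (p+1)-dim
    correlation matrix below is positive definite; nu: degrees of freedom."""
    p = len(xi)
    Omega_star = np.block([[np.ones((1, 1)), delta[None, :]],
                           [delta[:, None], Omega]])
    L = np.linalg.cholesky(Omega_star)
    u = rng.standard_normal((n, p + 1)) @ L.T
    z = np.where(u[:, :1] > 0, u[:, 1:], -u[:, 1:])   # skew-normal part
    w = rng.chisquare(nu, size=(n, 1)) / nu
    return xi + z / np.sqrt(w)

rng = np.random.default_rng(2)
xi = np.zeros(2)
Omega = np.array([[1.0, 0.3], [0.3, 1.0]])
delta = np.array([0.7, 0.2])
draws = rmvskewt(10_000, xi, Omega, delta, nu=5, rng=rng)
print(draws.mean(axis=0))   # positive shift induced by the skewness
```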

    Bayesian inference for the multivariate skew-normal model: a Population Monte Carlo approach

    Frequentist and likelihood methods of inference based on the multivariate skew-normal model encounter several technical difficulties. In spite of the popularity of this class of densities, there are no broadly satisfactory solutions for estimation and testing problems. We propose a general population Monte Carlo algorithm which 1) exploits the latent-structure stochastic representation of skew-normal random variables to provide a full Bayesian analysis of the model and 2) accounts for the presence of constraints in the parameter space. The proposed approach can be described as weakly informative, since the prior distribution approximates the actual reference prior for the shape parameter vector. Results are compared with existing classical solutions, and the practical implementation of the algorithm is illustrated via a simulation study and a real data example. A generalization to the matrix-variate regression model with skew-normal errors is also presented.
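
    A minimal generic population Monte Carlo loop is sketched below on a toy one-dimensional bimodal target, to show the iterated importance-sampling scheme (sample from an adapted kernel mixture, weight by target over proposal, resample); the target, kernel family, and tuning are illustrative assumptions, and none of the skew-normal specifics are reproduced.

```python
# Minimal generic population Monte Carlo (PMC) sketch on a toy target.
import numpy as np
from scipy.stats import norm

def log_target(x):      # toy bimodal "posterior"
    return np.logaddexp(norm.logpdf(x, -2.0, 0.5), norm.logpdf(x, 3.0, 1.0))

rng = np.random.default_rng(3)
N, T = 1000, 10
particles = rng.normal(0.0, 5.0, size=N)
logq = norm.logpdf(particles, 0.0, 5.0)    # density of the initial proposal

for t in range(T):
    logw = log_target(particles) - logq
    w = np.exp(logw - logw.max())
    w /= w.sum()
    # Adapt the kernel scale to the weighted spread of the population.
    m = np.sum(w * particles)
    scale = np.sqrt(np.sum(w * (particles - m) ** 2)) + 1e-3
    # Resample and move: the new proposal is a mixture of N Gaussian kernels.
    centers = particles[rng.choice(N, size=N, p=w)]
    particles = centers + rng.normal(0.0, scale, size=N)
    # Importance density of each new particle under that kernel mixture.
    logq = np.logaddexp.reduce(
        norm.logpdf(particles[:, None], centers[None, :], scale), axis=1
    ) - np.log(N)

w = np.exp(log_target(particles) - logq)
print(np.sum(w * particles) / np.sum(w))   # weighted mean over both modes (~0.5)
```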

    Comment on Article by Berger, Bernardo, and Sun

    Discussion of "Overall Objective Priors" by James O. Berger, Jose M. Bernardo, and Dongchu Sun [arXiv:1504.02689]. Comment: Published in Bayesian Analysis (http://projecteuclid.org/euclid.ba) by the International Society for Bayesian Analysis (http://bayesian.org/), http://dx.doi.org/10.1214/14-BA938.

    Computational aspects of Bayesian spectral density estimation

    Gaussian time-series models are often specified through their spectral density. Such models present several computational challenges, in particular because of the non-sparse nature of the covariance matrix. We derive a fast approximation of the likelihood for such models. We propose to sample from the approximate posterior (that is, the prior times the approximate likelihood), and then to recover the exact posterior through importance sampling. We show that the variance of the importance sampling weights vanishes as the sample size goes to infinity. We explain why the approximate posterior may typically be multi-modal, and we derive a Sequential Monte Carlo sampler based on an annealing sequence in order to sample from that target distribution. Performance of the overall approach is evaluated on simulated and real datasets. In addition, for one real-world dataset, we provide some numerical evidence that a Bayesian approach to semi-parametric estimation of spectral density may provide more reasonable results than its Frequentist counterparts.
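
    As a minimal sketch of the kind of fast likelihood approximation involved, the code below computes the classical Whittle approximation from the periodogram for a model specified by its spectral density; the paper's exact approximation and its importance-sampling correction are not reproduced, and the AR(1) example is an illustrative assumption.

```python
# Minimal sketch: Whittle approximate log-likelihood from the periodogram.
import numpy as np

def whittle_loglik(x, spec_dens):
    """Whittle log-likelihood of a zero-mean series x for a spectral density
    function spec_dens(lam), evaluated at the positive Fourier frequencies."""
    n = len(x)
    k = np.arange(1, (n - 1) // 2 + 1)
    lam = 2.0 * np.pi * k / n
    I = np.abs(np.fft.fft(x)[k]) ** 2 / (2.0 * np.pi * n)   # periodogram
    f = spec_dens(lam)
    return -np.sum(np.log(f) + I / f)

# AR(1) spectral density: f(lam) = sigma^2 / (2*pi*|1 - phi*e^{-i*lam}|^2).
phi, sigma = 0.7, 1.0
f_ar1 = lambda lam: sigma**2 / (2*np.pi*np.abs(1 - phi*np.exp(-1j*lam))**2)
f_wrong = lambda lam: sigma**2 / (2*np.pi*np.abs(1 - 0.2*np.exp(-1j*lam))**2)

rng = np.random.default_rng(4)
e = rng.normal(size=2000)
x = np.empty_like(e)
x[0] = e[0]
for t in range(1, len(e)):
    x[t] = phi * x[t - 1] + e[t]
print(whittle_loglik(x, f_ar1))    # true model: higher log-likelihood
print(whittle_loglik(x, f_wrong))  # misspecified phi: lower
```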

    Bayesian nonparametric estimation of the spectral density of a long or intermediate memory Gaussian process

    A stationary Gaussian process is said to be long-range dependent (resp., anti-persistent) if its spectral density f(λ) can be written as f(λ) = |λ|^{-2d} g(|λ|), where 0 < d < 1/2 (resp., -1/2 < d < 0) and g is continuous and positive. We propose a novel Bayesian nonparametric approach for the estimation of the spectral density of such processes. We prove posterior consistency for both d and g, under appropriate conditions on the prior distribution. We establish the rate of convergence for a general class of priors and apply our results to the family of fractionally exponential priors. Our approach is based on the true likelihood and does not resort to Whittle's approximation. Comment: Published in the Annals of Statistics (http://www.imstat.org/aos/) by the Institute of Mathematical Statistics (http://www.imstat.org), http://dx.doi.org/10.1214/11-AOS955.
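
    The fractionally exponential (FEXP) family mentioned above can be written as f(λ) = |λ|^{-2d} exp(Σ_k a_k cos(kλ)); the short sketch below evaluates this form for illustrative coefficient values (the prior on d and on the coefficients is not reproduced).

```python
# Minimal sketch: evaluate a FEXP-form spectral density
# f(lam) = |lam|^{-2d} * exp(sum_k a_k cos(k*lam)).
import numpy as np

def fexp_spec_dens(lam, d, a):
    lam = np.asarray(lam, dtype=float)
    g = np.exp(sum(ak * np.cos(k * lam) for k, ak in enumerate(a)))
    return np.abs(lam) ** (-2.0 * d) * g

lam = np.linspace(1e-3, np.pi, 500)
f_long = fexp_spec_dens(lam, d=0.3, a=[0.0, 0.5, -0.2])    # long memory
f_anti = fexp_spec_dens(lam, d=-0.3, a=[0.0, 0.5, -0.2])   # anti-persistent
print(f_long[:3])   # diverges as lam -> 0 when 0 < d < 1/2
print(f_anti[:3])   # vanishes as lam -> 0 when -1/2 < d < 0
```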

    A unified framework for de-duplication and population size estimation (with Discussion)

    Data de-duplication is the process of finding records in one or more datasets that belong to the same entity. In this paper we tackle the de-duplication process via a latent entity model, where the observed data are perturbed versions of a set of key variables drawn from a finite population of N different entities. The main novelty of our approach is to treat the population size N as an unknown model parameter. As a result, one salient feature of the proposed method is the capability of the model to account for de-duplication uncertainty in the population size estimate. As by-products of our approach, we illustrate the relationship between de-duplication problems and capture-recapture models, and we obtain a more adequate prior distribution on the linkage structure. Moreover, we propose a novel simulation algorithm for the posterior distribution of the matching configuration, based on marginalizing the key variables at the population level. We apply our approach to two synthetic data sets comprising German names. In addition, we illustrate a real data application, matching records from two lists of victims killed in the recent Syrian conflict.
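
    A minimal generative sketch of the latent-entity view follows: records are noisy copies of a key variable for a finite population of N entities, with both the linkage structure and N unknown to the analyst. The category set, distortion probability, and sizes are illustrative assumptions; the prior on the linkage structure and the posterior sampler are not reproduced.

```python
# Minimal generative sketch of the latent entity model for de-duplication.
import numpy as np

rng = np.random.default_rng(5)
N = 6                                    # true (unknown) population size
categories = ["mueller", "schmidt", "fischer", "weber", "wagner"]
theta = rng.dirichlet(np.ones(len(categories)))   # population frequencies

# Latent entities: one key value each.
entities = rng.choice(len(categories), size=N, p=theta)

# Observed file: each record points to a latent entity (the matching
# configuration) and is distorted with probability beta.
n_records, beta = 10, 0.1
links = rng.integers(0, N, size=n_records)        # latent linkage structure
distort = rng.random(n_records) < beta
obs = np.where(distort,
               rng.choice(len(categories), size=n_records, p=theta),
               entities[links])

for r, (l, v) in enumerate(zip(links, obs)):
    print(f"record {r}: entity {l}, observed '{categories[v]}'")
# Records sharing an entity index are duplicates; inference reverses this
# process, treating both `links` and N itself as unknown.
```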

    An extension of the Unified Skew-Normal family of distributions and application to Bayesian binary regression

    We consider the general problem of Bayesian binary regression with a large number of covariates. We introduce a new class of distributions, the Perturbed Unified Skew Normal (PSUN), which generalizes the SUN class, and we show that the new class is conjugate to any binary regression model, provided that the link function can be expressed as a scale mixture of Gaussian densities. We discuss the probit and logistic cases in detail. The proposed methodology, based on a straightforward Gibbs sampler, can always be applied; in particular, in the p > n case it shows better performance, both in terms of mixing and of accuracy, compared to existing methods.
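
    For the probit case, the conjugacy rests on the latent-Gaussian (scale-mixture) representation of the link; the sketch below implements the classical Albert-Chib Gibbs sampler built on that representation. This is not the PSUN sampler itself, and the N(0, tau^2 I) prior is an illustrative assumption.

```python
# Minimal sketch: Albert-Chib data-augmentation Gibbs sampler for
# Bayesian probit regression with a Gaussian prior on the coefficients.
import numpy as np
from scipy.stats import truncnorm

rng = np.random.default_rng(6)
n, p, tau = 200, 3, 10.0
X = rng.normal(size=(n, p))
beta_true = np.array([1.0, -2.0, 0.5])
y = (X @ beta_true + rng.normal(size=n) > 0).astype(float)

# Posterior covariance of beta given the latent Gaussian utilities z.
V = np.linalg.inv(X.T @ X + np.eye(p) / tau**2)
L = np.linalg.cholesky(V)

beta = np.zeros(p)
draws = []
for it in range(2000):
    # 1) z_i | beta, y_i: N(x_i'beta, 1) truncated to be positive if
    #    y_i = 1 and negative otherwise (standardized bounds for truncnorm).
    m = X @ beta
    lo = np.where(y == 1, -m, -np.inf)
    hi = np.where(y == 1, np.inf, -m)
    z = m + truncnorm.rvs(lo, hi, size=n, random_state=rng)
    # 2) beta | z: Gaussian with mean V X'z and covariance V.
    beta = V @ (X.T @ z) + L @ rng.standard_normal(p)
    if it >= 500:                      # discard burn-in
        draws.append(beta)
print(np.mean(draws, axis=0))          # roughly recovers beta_true
```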