116 research outputs found
A hierarchical Bayesian approach to record linkage and population size problems
We propose and illustrate a hierarchical Bayesian approach for matching
statistical records observed on different occasions. We show how this model can
be profitably adopted both in record linkage problems and in capture-recapture
setups, where the size of a finite population is the real object of interest.
There are at least two important differences between the proposed model-based
approach and the current practice in record linkage. First, the statistical
model is built up on the actually observed categorical variables and no
reduction (to 0-1 comparisons) of the available information takes place.
Second, the hierarchical structure of the model allows a two-way propagation of
the uncertainty between the parameter estimation step and the matching
procedure so that no plug-in estimates are used and the correct uncertainty is
accounted for both in estimating the population size and in performing the
record linkage. We illustrate and motivate our proposal through a real data
example and simulations.
Comment: Published at http://dx.doi.org/10.1214/10-AOAS447 in the Annals of
Applied Statistics (http://www.imstat.org/aoas/) by the Institute of
Mathematical Statistics (http://www.imstat.org).
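As a concrete aside, the measurement mechanism that model-based linkage of categorical key variables rests on can be simulated in a few lines. The Python sketch below assumes a simple "hit-miss" style distortion with an illustrative 5% error rate; the function name, toy population, and list sizes are assumptions for illustration, not the paper's actual model.

```python
import numpy as np

rng = np.random.default_rng(0)

def distort(true_vals, n_cats, alpha, rng):
    """Hit-miss distortion: each observed category equals the true one with
    probability 1 - alpha, otherwise it is replaced by a uniform draw."""
    noise = rng.integers(0, n_cats, size=true_vals.shape)
    miss = rng.random(true_vals.shape) < alpha
    return np.where(miss, noise, true_vals)

# Toy population of 50 entities with one categorical key variable (10 levels).
n_cats = 10
pop = rng.integers(0, n_cats, size=50)

# Two occasions sample overlapping subsets and record noisy key variables;
# records of the same entity then agree far more often than chance, which is
# the signal a hierarchical linkage model can exploit without reducing the
# data to 0-1 comparisons.
idx_a = rng.choice(50, size=30, replace=False)
idx_b = rng.choice(50, size=30, replace=False)
list_a = distort(pop[idx_a], n_cats, alpha=0.05, rng=rng)
list_b = distort(pop[idx_b], n_cats, alpha=0.05, rng=rng)
```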
Approximate Bayesian inference in semiparametric copula models
We describe a simple method for making inference on a functional of a
multivariate distribution. The method is based on a copula representation of
the multivariate distribution and on the properties of an Approximate
Bayesian Monte Carlo algorithm, where the proposed values of the
functional of interest are weighted by their empirical likelihood. This
method is particularly useful when the "true" likelihood function associated
with the working model is too costly to evaluate or when the working model is
only partially specified.
Comment: 27 pages, 18 figures.
Approximate Integrated Likelihood via ABC methods
We propose a novel use of a recent new computational tool for Bayesian
inference, namely the Approximate Bayesian Computation (ABC) methodology. ABC
is a way to handle models for which the likelihood function may be intractable
or even unavailable and/or too costly to evaluate; in particular, we consider
the problem of eliminating the nuisance parameters from a complex statistical
model in order to produce a likelihood function depending on the quantity of
interest only. Given a proper prior for the entire vector parameter, we propose
to approximate the integrated likelihood by the ratio of kernel estimators of
the marginal posterior and prior for the quantity of interest. We present
several examples.
Comment: 28 pages, 8 figures.
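As a toy illustration of the kernel-ratio idea — approximate the integrated likelihood of the quantity of interest by dividing a kernel estimate of its marginal posterior by one of its marginal prior — the Python sketch below uses an assumed normal model with unknown mean (the quantity of interest) and standard deviation (the nuisance), sample summaries as ABC statistics, and an arbitrary 1% acceptance rule:

```python
import numpy as np
from scipy.stats import gaussian_kde

rng = np.random.default_rng(0)

# Toy model: y_i ~ N(mu, sigma^2), with psi = mu of interest, sigma nuisance.
y_obs = rng.normal(1.0, 2.0, size=50)
s_obs = np.array([y_obs.mean(), y_obs.std()])

# Draw the full parameter vector from a proper prior.
n = 100_000
mu = rng.normal(0.0, 5.0, size=n)
sigma = rng.uniform(0.1, 5.0, size=n)

# Simulate summary statistics for each prior draw.
y_sim = rng.normal(mu[:, None], sigma[:, None], size=(n, 50))
s_sim = np.stack([y_sim.mean(axis=1), y_sim.std(axis=1)], axis=1)

# ABC step: keep the 1% of draws whose summaries are closest to the data.
dist = np.linalg.norm(s_sim - s_obs, axis=1)
keep = dist < np.quantile(dist, 0.01)

# Integrated likelihood of mu: ratio of kernel density estimates of the
# marginal posterior (accepted draws) and the marginal prior (all draws).
kde_post = gaussian_kde(mu[keep])
kde_prior = gaussian_kde(mu)
grid = np.linspace(-3, 5, 81)
int_lik = kde_post(grid) / kde_prior(grid)
# int_lik peaks near the sample mean of y_obs, as it should for this model.
```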
Objective Bayesian analysis for the multivariate skew-t model
We perform a Bayesian analysis of the p-variate skew-t model, providing a new
parameterization, a set of non-informative priors and a sampler specifically
designed to explore the posterior density of the model parameters. Extensions,
such as the multivariate regression model with skewed errors and the stochastic
frontiers model, are easily accommodated. A novelty introduced in the paper is
given by the extension of the bivariate skew-normal model given in Liseo &
Parisi (2013) to a more realistic p-variate skew-t model. We also introduce the
R package mvst, which allows estimation of the multivariate skew-t model.
Bayesian inference for the multivariate skew-normal model: a Population Monte Carlo approach
Frequentist and likelihood methods of inference based on the multivariate
skew-normal model encounter several technical difficulties. In
spite of the popularity of this class of densities, there are no broadly
satisfactory solutions for estimation and testing problems. A general
population Monte Carlo algorithm is proposed which: 1) exploits the latent
structure stochastic representation of skew-normal random variables to provide
a full Bayesian analysis of the model and 2) accounts for the presence of
constraints in the parameter space. The proposed approach can be regarded as
weakly informative, since the prior distribution approximates the actual
reference prior for the shape parameter vector. Results are compared with the
existing classical solutions and the practical implementation of the algorithm
is illustrated via a simulation study and a real data example. A generalization
to the matrix variate regression model with skew-normal error is also
presented.
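The population Monte Carlo recipe itself — draw from an adaptive proposal, compute importance weights against the target, and refit the proposal from the weighted sample — fits in a few lines. The sketch below runs generic Gaussian-proposal PMC on a toy truncated-normal target standing in for a constrained parameter space; it illustrates the algorithm family, not the paper's latent-structure sampler:

```python
import numpy as np

rng = np.random.default_rng(0)

def log_target(x):
    """Unnormalized log density of N(1, 1) truncated to x > 0, standing in
    for a posterior on a constrained parameter space."""
    return np.where(x > 0, -0.5 * (x - 1.0) ** 2, -np.inf)

# Population Monte Carlo: importance sampling with a proposal that is
# refitted, at each iteration, to the weighted draws.
m, s = 0.0, 5.0                       # initial Gaussian proposal
for _ in range(10):
    x = rng.normal(m, s, size=5_000)
    log_prop = -0.5 * ((x - m) / s) ** 2 - np.log(s)
    log_w = log_target(x) - log_prop
    w = np.exp(log_w - log_w.max())
    w /= w.sum()
    m = np.sum(w * x)                 # weighted moment matching
    s = np.sqrt(np.sum(w * (x - m) ** 2))

# m, s now approximate the mean (~1.29) and sd (~0.79) of the target.
```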
Comment on Article by Berger, Bernardo, and Sun
Discussion of Overall Objective Priors by James O. Berger, Jose M. Bernardo,
Dongchu Sun [arXiv:1504.02689].
Comment: Published at http://dx.doi.org/10.1214/14-BA938 in Bayesian Analysis
(http://projecteuclid.org/euclid.ba) by the International Society for
Bayesian Analysis (http://bayesian.org/).
Computational aspects of Bayesian spectral density estimation
Gaussian time-series models are often specified through their spectral
density. Such models present several computational challenges, in particular
because of the non-sparse nature of the covariance matrix. We derive a fast
approximation of the likelihood for such models. We propose to sample from the
approximate posterior (that is, the prior times the approximate likelihood),
and then to recover the exact posterior through importance sampling. We show
that the variance of the importance sampling weights vanishes as the sample
size goes to infinity. We explain why the approximate posterior may typically
be multi-modal, and we derive a Sequential Monte Carlo sampler based on an
annealing sequence in order to sample from that target distribution.
Performance of the overall approach is evaluated on simulated and real
datasets. In addition, for one real world dataset, we provide some numerical
evidence that a Bayesian approach to semi-parametric estimation of spectral
density may provide more reasonable results than its frequentist counterparts.
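As a caricature of this approximate-then-correct strategy, the Python sketch below uses an AR(1) model: the Whittle (periodogram) likelihood plays the role of the fast spectral-density-based approximation, and an importance-sampling correction with the exact Gaussian likelihood recovers the exact posterior. The grid-based posterior, flat prior, and AR(1) choice are simplifying assumptions; the paper's annealed SMC sampler is not reproduced here:

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulate an AR(1) series; its spectral density is
# f(lam) = 1 / (2*pi*(1 - 2*phi*cos(lam) + phi**2)) for unit innovations.
n, phi_true = 512, 0.6
x = np.zeros(n)
for t in range(1, n):
    x[t] = phi_true * x[t - 1] + rng.normal()

# Periodogram at the positive Fourier frequencies.
lam = 2 * np.pi * np.arange(1, n // 2) / n
I = np.abs(np.fft.fft(x)[1:n // 2]) ** 2 / (2 * np.pi * n)

def whittle_loglik(phi):
    """Fast approximate likelihood: no dense covariance matrix is needed."""
    f = 1.0 / (2 * np.pi * (1 - 2 * phi * np.cos(lam) + phi ** 2))
    return -np.sum(np.log(f) + I / f)

def exact_loglik(phi):
    """Exact stationary Gaussian AR(1) log-likelihood."""
    ll = -0.5 * (np.log(2 * np.pi / (1 - phi ** 2)) + x[0] ** 2 * (1 - phi ** 2))
    r = x[1:] - phi * x[:-1]
    return ll - 0.5 * (np.sum(r ** 2) + (n - 1) * np.log(2 * np.pi))

# Approximate posterior on a grid (flat prior), then an importance-sampling
# correction whose weights are exact/approximate likelihood ratios.
grid = np.linspace(0.3, 0.9, 61)
la = np.array([whittle_loglik(p) for p in grid])
p_approx = np.exp(la - la.max()); p_approx /= p_approx.sum()
log_w = np.array([exact_loglik(p) for p in grid]) - la
w = p_approx * np.exp(log_w - log_w.max()); w /= w.sum()
post_mean = np.sum(w * grid)     # exact posterior mean of phi on the grid
```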
Bayesian nonparametric estimation of the spectral density of a long or intermediate memory Gaussian process
A stationary Gaussian process is said to be long-range dependent (resp.,
anti-persistent) if its spectral density can be written as
f(lambda) = |lambda|^(-2d) g(|lambda|), where 0 < d < 1/2 (resp., -1/2 < d < 0),
and g is continuous and positive. We propose a novel Bayesian nonparametric
approach for the estimation of the spectral density of such processes. We prove
posterior consistency for both d and g, under appropriate conditions on the
prior distribution. We establish the rate of convergence for a general class of
priors and apply our results to the family of fractionally exponential priors.
Our approach is based on the true likelihood and does not resort to Whittle's
approximation.
Comment: Published at http://dx.doi.org/10.1214/11-AOS955 in the Annals of
Statistics (http://www.imstat.org/aos/) by the Institute of Mathematical
Statistics (http://www.imstat.org).
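A spectral density of the long- or intermediate-memory form f(lambda) = |lambda|^(-2d) g(|lambda|) is straightforward to evaluate numerically. The Python sketch below pairs the |lambda|^(-2d) factor with a fractionally exponential (FEXP) short-memory part g, related to the prior family mentioned in the abstract; the coefficient values are illustrative assumptions:

```python
import numpy as np

def spectral_density(lam, d, theta):
    """f(lam) = |lam|^(-2d) * g(|lam|), with a fractionally exponential
    short-memory part: log g(lam) = sum_j theta_j * cos(j * lam)."""
    lam = np.abs(np.asarray(lam, dtype=float))
    j = np.arange(1, len(theta) + 1)
    log_g = np.cos(np.outer(lam, j)) @ np.asarray(theta)
    return lam ** (-2 * d) * np.exp(log_g)

lam = np.linspace(0.01, np.pi, 200)
theta = np.array([0.5, -0.2])                          # illustrative values
f_long = spectral_density(lam, d=0.3, theta=theta)     # 0 < d < 1/2
f_anti = spectral_density(lam, d=-0.3, theta=theta)    # -1/2 < d < 0
# Long memory: f blows up at the origin; anti-persistence: f vanishes there.
```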
A unified framework for de-duplication and population size estimation (with Discussion)
Data de-duplication is the process of finding records, in one or more datasets,
that belong to the same entity. In this paper we tackle the de-duplication
process via a latent entity model, where the observed data are perturbed
versions of a set of key variables drawn from a finite population of different
entities. The main novelty of our approach is to consider the population size
as an unknown model parameter. As a result, one salient feature of the proposed
method is the capability of the model to account for the de-duplication
uncertainty in the population size estimation. As by-products of our approach,
we illustrate the relationships between de-duplication problems and
capture-recapture models, and we obtain a more adequate prior distribution on
the linkage structure. Moreover, we propose a novel simulation algorithm for
the posterior distribution of the matching configuration, based on the
marginalization of the key variables at the population level. We apply our
approach to two synthetic data sets comprising German names. In addition, we
illustrate a real data application, matching records from two lists reporting
victims killed in the recent Syrian conflict.
An extension of the Unified Skew-Normal family of distributions and application to Bayesian binary regression
We consider the general problem of Bayesian binary regression with a large
number of covariates. We introduce a new class of distributions, the Perturbed
Unified Skew Normal (PSUN), which generalizes the SUN class and we show that
the new class is conjugate to any binary regression model, provided that the
link function may be expressed as a scale mixture of Gaussian densities. We
discuss in detail the probit and logistic cases. The proposed methodology,
based on a straightforward Gibbs sampler algorithm, can always be applied. In
particular, in the p > n case, it shows better performance, in terms of both
mixing and accuracy, than existing methods.
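For the probit link with a Gaussian prior, conjugate data augmentation of this general kind goes back to the Albert and Chib (1993) Gibbs sampler, which the Python sketch below implements on synthetic data. This is the classical probit sampler, not the paper's conjugate scheme, and the dimensions, prior variance, and chain length are illustrative:

```python
import numpy as np
from scipy.stats import truncnorm

rng = np.random.default_rng(0)

# Synthetic probit data: y_i = 1{x_i' beta + e_i > 0}, e_i ~ N(0, 1).
n, p = 200, 3
X = rng.normal(size=(n, p))
beta_true = np.array([1.0, -0.5, 0.0])
y = (X @ beta_true + rng.normal(size=n) > 0).astype(int)

# Albert-Chib Gibbs sampler with a N(0, tau2 * I) prior on beta.
tau2 = 10.0
V = np.linalg.inv(X.T @ X + np.eye(p) / tau2)   # cov of beta | z (fixed)
L = np.linalg.cholesky(V)
beta, draws = np.zeros(p), []
for it in range(1500):
    # 1) Latent utilities z_i ~ N(x_i' beta, 1), truncated by the sign of y_i.
    m = X @ beta
    lo = np.where(y == 1, -m, -np.inf)
    hi = np.where(y == 1, np.inf, -m)
    z = m + truncnorm.rvs(lo, hi, random_state=rng)
    # 2) beta | z is exactly Gaussian: the conjugate update.
    beta = V @ (X.T @ z) + L @ rng.normal(size=p)
    if it >= 300:
        draws.append(beta)

post_mean = np.mean(draws, axis=0)   # should sit near beta_true
```

For the logistic link, the same scheme applies once the link is written as a scale mixture of Gaussians, which is exactly the condition quoted in the abstract.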