Semi-Parametric Empirical Best Prediction for small area estimation of unemployment indicators
The Italian National Institute for Statistics regularly provides estimates of
unemployment indicators using data from the Labor Force Survey. However, direct
estimates of unemployment incidence cannot be released for Local Labor Market
Areas. These are unplanned domains defined as clusters of municipalities; many
are out-of-sample areas and the majority are characterized by a small sample
size, which renders direct estimates inadequate. The Empirical Best Predictor
represents an appropriate, model-based, alternative. However, for non-Gaussian
responses, its computation and the computation of the analytic approximation to
its Mean Squared Error require the solution of (possibly) multiple integrals
that generally lack a closed form. To solve this issue, Monte Carlo methods
and the parametric bootstrap are common choices, although their computational
burden is non-trivial. In this paper, we propose a
Semi-Parametric Empirical Best Predictor for a (possibly) non-linear mixed
effect model by leaving the distribution of the area-specific random effects
unspecified and estimating it from the observed data. This approach is known to
lead to a discrete mixing distribution which helps avoid unverifiable
parametric assumptions and heavy integral approximations. We also derive a
second-order, bias-corrected, analytic approximation to the corresponding Mean
Squared Error. Finite sample properties of the proposed approach are tested via
a large scale simulation study. Furthermore, the proposal is applied to
unit-level data from the 2012 Italian Labor Force Survey to estimate
unemployment incidence for 611 Local Labor Market Areas using auxiliary
information from administrative registers and the 2011 Census.
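The key computational point of the abstract, that a discrete mixing distribution turns the intractable integral into a finite sum, can be sketched as follows. This is a minimal illustration under a unit-level logistic model with made-up mass points and coefficients, not the paper's implementation:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical mass points and probabilities for the area-specific random
# effect, as one might obtain from a non-parametric (NPML-type) fit.
mass_points = np.array([-1.0, 0.0, 1.2])
masses = np.array([0.3, 0.5, 0.2])

def ebp_incidence(y, x, beta):
    """Semi-parametric EBP of incidence for one sampled area (sketch)."""
    # Area likelihood at each mass point under a logistic unit-level model
    eta = x @ beta[:, None] + mass_points[None, :]          # shape (n, K)
    p = 1.0 / (1.0 + np.exp(-eta))
    loglik = (y[:, None] * np.log(p)
              + (1 - y[:, None]) * np.log1p(-p)).sum(axis=0)
    # Posterior weights over mass points: a finite sum replaces the integral
    w = masses * np.exp(loglik - loglik.max())
    w = w / w.sum()
    # EBP: posterior-weighted average of the model-implied incidence
    return float((w * p.mean(axis=0)).sum())

x = np.column_stack([np.ones(20), rng.normal(size=20)])     # toy covariates
beta = np.array([-0.5, 0.3])                                # toy fixed effects
y = rng.binomial(1, 0.2, size=20).astype(float)
print(ebp_incidence(y, x, beta))
```

For an out-of-sample area (no data, hence no likelihood update), the posterior weights reduce to the estimated masses themselves.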
Dynamic Compressive Sensing of Time-Varying Signals via Approximate Message Passing
In this work the dynamic compressive sensing (CS) problem of recovering
sparse, correlated, time-varying signals from sub-Nyquist, non-adaptive, linear
measurements is explored from a Bayesian perspective. While a handful of
Bayesian dynamic CS algorithms have been proposed in the literature, the
ability to perform inference on high-dimensional problems in a
computationally efficient manner remains elusive. In response, we propose a
probabilistic dynamic CS signal model that captures both amplitude and support
correlation structure, and describe an approximate message passing algorithm
that performs soft signal estimation and support detection with a computational
complexity that is linear in all problem dimensions. The algorithm, DCS-AMP,
can perform either causal filtering or non-causal smoothing, and is capable of
learning model parameters adaptively from the data through an
expectation-maximization learning procedure. We provide numerical evidence that
DCS-AMP performs within 3 dB of oracle bounds on synthetic data under a variety
of operating conditions. We further describe the result of applying DCS-AMP to
two real dynamic CS datasets, as well as a frequency estimation task, to
bolster our claim that DCS-AMP is capable of offering state-of-the-art
performance and speed on real-world high-dimensional problems.Comment: 32 pages, 7 figure
Bayesian shrinkage in mixture-of-experts models: identifying robust determinants of class membership
A method for implicit variable selection in mixture-of-experts frameworks is proposed.
We introduce a prior structure where information is taken from a set of independent
covariates. Robust class membership predictors are identified using a normal gamma
prior. The resulting model setup is used in a finite mixture of Bernoulli distributions
to find homogeneous clusters of women in Mozambique based on their information
sources on HIV. Fully Bayesian inference is carried out via the implementation of a
Gibbs sampler.
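The shrinkage behaviour of the normal-gamma prior can be illustrated with prior draws. This is a hedged sketch with illustrative hyperparameters, not the paper's hierarchical specification:

```python
import numpy as np

rng = np.random.default_rng(2)

def normal_gamma_draws(shape, rate, size):
    """Draw coefficients from a normal-gamma prior:
    psi_j ~ Gamma(shape, rate), beta_j | psi_j ~ N(0, psi_j)."""
    psi = rng.gamma(shape, 1.0 / rate, size=size)   # numpy parameterizes by scale = 1/rate
    return rng.normal(0.0, np.sqrt(psi))

heavy = normal_gamma_draws(0.1, 1.0, 100_000)   # small shape: aggressive shrinkage
light = normal_gamma_draws(5.0, 1.0, 100_000)   # large shape: close to a plain Gaussian

# The small-shape prior places far more mass near zero, which is what pulls
# coefficients of irrelevant class-membership predictors towards zero.
print(np.mean(np.abs(heavy) < 0.05), np.mean(np.abs(light) < 0.05))
```

In the Gibbs sampler the local variances psi_j are updated from their full conditionals, so predictors with little support in the data keep small psi_j and hence near-zero coefficients.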
Adaptive Langevin Sampler for Separation of t-Distribution Modelled Astrophysical Maps
We propose to model the image differentials of astrophysical source maps by
Student's t-distribution and to use them in the Bayesian source separation
method as priors. We introduce an efficient Markov Chain Monte Carlo (MCMC)
sampling scheme to unmix the astrophysical sources and describe the derivation
details. In this scheme, we use the Langevin stochastic equation for
transitions, which enables parallel drawing of random samples from the
posterior, and reduces the computation time significantly (by two orders of
magnitude). In addition, Student's t-distribution parameters are updated
throughout the iterations. The results on astrophysical source separation are
assessed with two performance criteria defined in the pixel and the frequency
domains.
Comment: 12 pages, 6 figures
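The Langevin transition referred to above uses the gradient of the log-posterior as a drift, with a Metropolis accept/reject step to correct the discretization. A minimal Metropolis-adjusted Langevin (MALA) sketch on a stand-in Gaussian target, not the paper's source-separation posterior, looks like this:

```python
import numpy as np

rng = np.random.default_rng(3)

def grad_logp(x):
    """Gradient of the target log-density; a standard normal as a stand-in."""
    return -x

def mala_step(x, eps):
    """One Metropolis-adjusted Langevin transition."""
    drift = lambda v: v + 0.5 * eps**2 * grad_logp(v)
    prop = drift(x) + eps * rng.normal(size=x.shape)
    # log q(a | b) for the Gaussian Langevin proposal (constants cancel)
    logq = lambda a, b: -np.sum((a - drift(b)) ** 2) / (2 * eps**2)
    log_alpha = (-0.5 * np.sum(prop**2) + 0.5 * np.sum(x**2)
                 + logq(x, prop) - logq(prop, x))
    return prop if np.log(rng.uniform()) < log_alpha else x

x = np.zeros(5)
draws = np.empty((5000, 5))
for t in range(5000):
    x = mala_step(x, eps=0.9)
    draws[t] = x
print(draws[500:].mean(), draws[500:].var())
```

Because the drift and the proposal noise act componentwise, the random draws for all pixels can be generated in parallel, which is the source of the speed-up described in the abstract.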
Statistical unfolding of elementary particle spectra: Empirical Bayes estimation and bias-corrected uncertainty quantification
We consider the high energy physics unfolding problem where the goal is to
estimate the spectrum of elementary particles given observations distorted by
the limited resolution of a particle detector. This important statistical
inverse problem arising in data analysis at the Large Hadron Collider at CERN
consists in estimating the intensity function of an indirectly observed Poisson
point process. Unfolding typically proceeds in two steps: one first produces a
regularized point estimate of the unknown intensity and then uses the
variability of this estimator to form frequentist confidence intervals that
quantify the uncertainty of the solution. In this paper, we propose forming the
point estimate using empirical Bayes estimation which enables a data-driven
choice of the regularization strength through marginal maximum likelihood
estimation. Observing that neither Bayesian credible intervals nor standard
bootstrap confidence intervals succeed in achieving good frequentist coverage
in this problem due to the inherent bias of the regularized point estimate, we
introduce an iteratively bias-corrected bootstrap technique for constructing
improved confidence intervals. We show using simulations that this enables us
to achieve nearly nominal frequentist coverage with only a modest increase in
interval length. The proposed methodology is applied to unfolding the boson
invariant mass spectrum as measured in the CMS experiment at the Large Hadron
Collider.
Comment: Published at http://dx.doi.org/10.1214/15-AOAS857 in the Annals of
Applied Statistics (http://www.imstat.org/aoas/) by the Institute of
Mathematical Statistics (http://www.imstat.org). arXiv admin note:
substantial text overlap with arXiv:1401.827
Mixed effect quantile and M-quantile regression for spatial data
Observed data are frequently characterized by spatial dependence; that is, the observed values can be influenced by geographical position. In such a context it is possible to assume that the values observed in a given area are similar to those recorded in neighboring areas. Such data are frequently referred to as spatial data, and they are commonly met in epidemiological, environmental and social studies; for a discussion see Haining (1990). Spatial data can be multilevel, with samples being composed of lower level units (population, buildings) nested within higher level units (census tracts, municipalities, regions) in a geographical area.
Green and Richardson (2002) proposed a general approach to modelling spatial data based on finite mixtures with spatial constraints, where the prior probabilities are modelled through a Markov Random Field (MRF) via a Potts representation (Kindermann and Snell, 1999; Strauss, 1977). This model was defined in a Bayesian context, assuming that the interaction parameter of the Potts model is fixed over the entire analyzed region. Geman and Geman (1984) showed that this class of processes can be modelled by an MRF and, as proved by the Hammersley-Clifford theorem, modelling the process through an MRF is equivalent to using a Gibbs distribution for the membership vector. In other words, the spatial dependence between component indicators is captured by a Gibbs distribution, using a representation similar to the Potts model discussed by Strauss (1977).
In this work, a Gibbs distribution with a component-specific intercept and a constant interaction parameter, as in Green and Richardson (2002), is proposed to model the effect of neighboring areas.
This formulation provides a parameter specific to each component together with a constant spatial dependence over the whole area, extending to quantile and M-quantile regression the approach proposed by Alfò et al. (2009), who suggested letting both the intercept and the interaction parameters depend on the mixture component, thereby allowing for different prior probabilities and a varying strength of spatial dependence.
In the current dissertation, we propose to adopt this prior distribution to define a finite mixture of quantile regression models (FMQRSP) and a finite mixture of M-quantile regression models (FMMQSP) for spatial data.
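The Gibbs/Potts prior described above has a simple full conditional for a single area's component indicator: its membership probabilities are a softmax of the component intercepts plus the interaction parameter times the number of matching neighbours. A minimal sketch with illustrative parameter values (not the dissertation's fitted model):

```python
import numpy as np

def potts_conditional(neighbour_labels, alpha, psi):
    """Full conditional P(z_i = k | neighbours) under a Gibbs/Potts prior with
    component-specific intercepts alpha_k and a constant interaction psi."""
    K = alpha.size
    matches = np.array([(neighbour_labels == k).sum() for k in range(K)])
    logits = alpha + psi * matches
    p = np.exp(logits - logits.max())      # numerically stabilised softmax
    return p / p.sum()

alpha = np.array([0.0, 0.2, -0.1])         # illustrative component intercepts
psi = 1.0                                   # constant spatial interaction
# An area whose neighbours mostly belong to component 0 is pulled towards it.
print(potts_conditional(np.array([0, 0, 2, 0]), alpha, psi))
```

Setting psi = 0 recovers an ordinary (non-spatial) finite mixture, which makes the role of the constant interaction parameter explicit.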
A stochastic algorithm for probabilistic independent component analysis
The decomposition of a sample of images on a relevant subspace is a recurrent
problem in many different fields from Computer Vision to medical image
analysis. We propose in this paper a new learning principle and implementation
of the generative decomposition model generally known as noisy ICA (for
independent component analysis) based on the SAEM algorithm, which is a
versatile stochastic approximation of the standard EM algorithm. We demonstrate
the applicability of the method on a large range of decomposition models and
illustrate the developments with experimental results on various data sets.
Comment: Published at http://dx.doi.org/10.1214/11-AOAS499 in the Annals of
Applied Statistics (http://www.imstat.org/aoas/) by the Institute of
Mathematical Statistics (http://www.imstat.org).
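SAEM replaces the exact E-step of EM with a simulation of the latent variables followed by a stochastic-approximation update of the sufficient statistics. The following toy sketch applies the scheme to a deliberately simple latent Gaussian model (not the noisy ICA model of the paper), where the simulation step has a closed-form posterior:

```python
import numpy as np

rng = np.random.default_rng(5)

# Toy latent model: x_i ~ N(mu, 1) latent, y_i = x_i + eps_i with eps ~ N(0, 1).
mu_true, n = 2.0, 500
y = rng.normal(mu_true, 1.0, n) + rng.normal(0.0, 1.0, n)

mu, S = 0.0, 0.0
for k in range(1, 201):
    # Simulation step: draw latents from their exact posterior N((y + mu)/2, 1/2)
    x_sim = rng.normal((y + mu) / 2.0, np.sqrt(0.5))
    # Stochastic-approximation step on the sufficient statistic, gamma_k = 1/k
    S += (x_sim.mean() - S) / k
    # Maximisation step: for this model the update is simply mu = S
    mu = S
print(mu)
```

In noisy ICA the simulation step draws the hidden sources (e.g. via MCMC) and the M-step updates the mixing matrix and noise parameters, but the same simulate/average/maximise loop applies.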