
    Semi-Parametric Empirical Best Prediction for small area estimation of unemployment indicators

    The Italian National Institute for Statistics regularly provides estimates of unemployment indicators using data from the Labor Force Survey. However, direct estimates of unemployment incidence cannot be released for Local Labor Market Areas. These are unplanned domains defined as clusters of municipalities; many are out-of-sample areas, and most are characterized by a small sample size, which renders direct estimates inadequate. The Empirical Best Predictor represents an appropriate, model-based alternative. However, for non-Gaussian responses, its computation and the computation of the analytic approximation to its Mean Squared Error require the solution of (possibly) multiple integrals that generally have no closed form. To solve the issue, Monte Carlo methods and parametric bootstrap are common choices, even though the computational burden is non-trivial. In this paper, we propose a Semi-Parametric Empirical Best Predictor for a (possibly) non-linear mixed effect model, obtained by leaving the distribution of the area-specific random effects unspecified and estimating it from the observed data. This approach is known to lead to a discrete mixing distribution, which helps avoid unverifiable parametric assumptions and heavy integral approximations. We also derive a second-order, bias-corrected, analytic approximation to the corresponding Mean Squared Error. Finite sample properties of the proposed approach are tested via a large-scale simulation study. Furthermore, the proposal is applied to unit-level data from the 2012 Italian Labor Force Survey to estimate unemployment incidence for 611 Local Labor Market Areas using auxiliary information from administrative registers and the 2011 Census.
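
    The core computational idea, replacing an intractable integral over the random-effect distribution with a finite sum over estimated mass points, can be sketched in a few lines. The sketch below fixes the mass-point locations on a grid and estimates only the masses by EM (the paper also estimates locations), using simulated area-level binomial counts; all dimensions and values are illustrative, not the authors' code.

        import numpy as np
        from scipy.special import expit

        rng = np.random.default_rng(0)
        A, n = 50, 20                           # areas, units sampled per area
        u = rng.choice([-1.0, 1.0], size=A)     # true latent area effects
        y = rng.binomial(n, expit(u))           # per-area counts of "unemployed"

        grid = np.linspace(-3, 3, 15)           # fixed mass-point locations
        pi = np.full(grid.size, 1 / grid.size)  # mixing masses, updated by EM

        for _ in range(500):                    # EM for the NPML-style masses
            p = expit(grid)[None, :]            # incidence at each mass point
            ll = y[:, None] * np.log(p) + (n - y)[:, None] * np.log1p(-p)
            w = pi * np.exp(ll - ll.max(axis=1, keepdims=True))
            w /= w.sum(axis=1, keepdims=True)   # posterior weight of each mass point, per area
            pi = w.mean(axis=0)

        # EBP of area incidence: a posterior-weighted finite sum, no integral needed
        ebp = w @ expit(grid)
        print(np.round(ebp[:5], 3))

    The discrete mixing distribution is what makes the predictor a weighted sum: once the masses are estimated, the EBP and its MSE approximation only require summing over the mass points.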

    Dynamic Compressive Sensing of Time-Varying Signals via Approximate Message Passing

    In this work, the dynamic compressive sensing (CS) problem of recovering sparse, correlated, time-varying signals from sub-Nyquist, non-adaptive, linear measurements is explored from a Bayesian perspective. While a handful of Bayesian dynamic CS algorithms have previously been proposed in the literature, the ability to perform inference on high-dimensional problems in a computationally efficient manner remains elusive. In response, we propose a probabilistic dynamic CS signal model that captures both amplitude and support correlation structure, and describe an approximate message passing algorithm that performs soft signal estimation and support detection with a computational complexity that is linear in all problem dimensions. The algorithm, DCS-AMP, can perform either causal filtering or non-causal smoothing, and is capable of learning model parameters adaptively from the data through an expectation-maximization learning procedure. We provide numerical evidence that DCS-AMP performs within 3 dB of oracle bounds on synthetic data under a variety of operating conditions. We further describe the results of applying DCS-AMP to two real dynamic CS datasets, as well as a frequency estimation task, to bolster our claim that DCS-AMP is capable of offering state-of-the-art performance and speed on real-world high-dimensional problems.
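
    For context, the sketch below runs the basic approximate message passing iteration with a soft-threshold denoiser on a single time step; DCS-AMP itself couples time steps through the support and amplitude Markov structure and learns its parameters by EM. The measurement setup and the threshold rule here are illustrative assumptions, not the paper's algorithm.

        import numpy as np

        rng = np.random.default_rng(1)
        N, M, K = 512, 128, 16                  # signal length, measurements, sparsity
        x = np.zeros(N)
        x[rng.choice(N, K, replace=False)] = rng.standard_normal(K)
        A = rng.standard_normal((M, N)) / np.sqrt(M)
        y = A @ x + 0.01 * rng.standard_normal(M)

        soft = lambda v, t: np.sign(v) * np.maximum(np.abs(v) - t, 0.0)
        xhat, z = np.zeros(N), y.copy()
        for _ in range(30):
            tau = np.linalg.norm(z) / np.sqrt(M)   # effective noise level this iteration
            r = xhat + A.T @ z                     # pseudo-data passed to the denoiser
            xhat_new = soft(r, tau)
            # Onsager correction keeps the effective noise approximately Gaussian
            b = (xhat_new != 0).sum() / M
            z = y - A @ xhat_new + b * z
            xhat = xhat_new

        print("NMSE (dB):", 10 * np.log10(np.sum((xhat - x)**2) / np.sum(x**2)))

    The per-iteration cost is dominated by the two matrix-vector products, which is what makes the complexity linear in all problem dimensions.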

    Bayesian shrinkage in mixture-of-experts models: identifying robust determinants of class membership

    A method for implicit variable selection in mixture-of-experts frameworks is proposed. We introduce a prior structure where information is taken from a set of independent covariates. Robust class membership predictors are identified using a normal-gamma prior. The resulting model setup is used in a finite mixture of Bernoulli distributions to find homogeneous clusters of women in Mozambique based on their information sources on HIV. Fully Bayesian inference is carried out via the implementation of a Gibbs sampler.
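
    In the standard normal-gamma hierarchy (Griffin and Brown, 2010), each coefficient has beta_j | psi_j ~ N(0, psi_j) with psi_j ~ Gamma(lam, rate 1/(2*gamma2)); a small lam shrinks weak coefficients hard toward zero while leaving strong ones nearly untouched, which is what makes the retained predictors "robust". A minimal sketch of the conditional variance update inside a Gibbs sweep, assuming this standard parametrization (hyperparameter and coefficient values are illustrative):

        import numpy as np
        from scipy.stats import geninvgauss

        rng = np.random.default_rng(2)
        lam, gamma2 = 0.1, 1.0               # shrinkage strength, global scale
        beta = np.array([0.01, 0.05, 2.5])   # current draws of three coefficients

        def sample_psi(beta_j):
            # psi_j | beta_j ~ GIG(p = lam - 1/2, a = beta_j**2, b = 1/gamma2),
            # mapped to scipy's geninvgauss(p, sqrt(a*b)) with scale sqrt(a/b)
            p, a, b = lam - 0.5, beta_j**2 + 1e-12, 1.0 / gamma2
            return geninvgauss.rvs(p, np.sqrt(a * b), scale=np.sqrt(a / b),
                                   random_state=rng)

        psi = np.array([sample_psi(b) for b in beta])
        print(np.round(psi, 4))   # tiny variances for weak betas -> heavy shrinkage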

    Adaptive Langevin Sampler for Separation of t-Distribution Modelled Astrophysical Maps

    We propose to model the image differentials of astrophysical source maps by Student's t-distribution and to use them in the Bayesian source separation method as priors. We introduce an efficient Markov Chain Monte Carlo (MCMC) sampling scheme to unmix the astrophysical sources and describe the derivation details. In this scheme, we use the Langevin stochastic equation for transitions, which enables parallel drawing of random samples from the posterior and reduces the computation time significantly (by two orders of magnitude). In addition, Student's t-distribution parameters are updated throughout the iterations. The results on astrophysical source separation are assessed with two performance criteria defined in the pixel and the frequency domains.
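
    The parallelism comes from the Langevin transition itself: every coordinate is moved simultaneously along the gradient of the log density, plus Gaussian noise. A minimal sketch, assuming an unadjusted discretization and a plain Student's t target (the paper embeds the t-model for image differentials inside a full source-separation posterior; step size and degrees of freedom are illustrative):

        import numpy as np

        rng = np.random.default_rng(3)
        nu, eps = 5.0, 0.05                    # degrees of freedom, step size

        def grad_log_t(x):
            # d/dx log t_nu(x) = -(nu + 1) * x / (nu + x**2)
            return -(nu + 1) * x / (nu + x**2)

        x = rng.standard_normal(10_000)        # all "pixels", updated in parallel
        for _ in range(2_000):                 # unadjusted Langevin iterations
            x = (x + 0.5 * eps * grad_log_t(x)
                 + np.sqrt(eps) * rng.standard_normal(x.shape))

        # heavier tails than Gaussian (kurtosis 3); t_5 has kurtosis 9
        print("sample kurtosis:", np.mean(x**4) / np.mean(x**2)**2)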

    Statistical unfolding of elementary particle spectra: Empirical Bayes estimation and bias-corrected uncertainty quantification

    We consider the high energy physics unfolding problem, where the goal is to estimate the spectrum of elementary particles given observations distorted by the limited resolution of a particle detector. This important statistical inverse problem, arising in data analysis at the Large Hadron Collider at CERN, consists of estimating the intensity function of an indirectly observed Poisson point process. Unfolding typically proceeds in two steps: one first produces a regularized point estimate of the unknown intensity and then uses the variability of this estimator to form frequentist confidence intervals that quantify the uncertainty of the solution. In this paper, we propose forming the point estimate using empirical Bayes estimation, which enables a data-driven choice of the regularization strength through marginal maximum likelihood estimation. Observing that neither Bayesian credible intervals nor standard bootstrap confidence intervals succeed in achieving good frequentist coverage in this problem, due to the inherent bias of the regularized point estimate, we introduce an iteratively bias-corrected bootstrap technique for constructing improved confidence intervals. We show using simulations that this enables us to achieve nearly nominal frequentist coverage with only a modest increase in interval length. The proposed methodology is applied to unfolding the Z boson invariant mass spectrum as measured in the CMS experiment at the Large Hadron Collider.
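
    The following is a deliberately simple, one-parameter caricature of the ingredient being iterated: estimate the estimator's bias by resampling under the current estimate, subtract it, and repeat with the corrected value as the new plug-in. The target here is the biased maximum likelihood estimator of a variance; the unfolding setting replaces it with a regularized Poisson intensity estimate.

        import numpy as np

        rng = np.random.default_rng(4)
        data = rng.normal(0.0, 2.0, size=30)
        mle = lambda x: np.mean((x - x.mean())**2)   # biased: E = (n-1)/n * sigma^2

        theta = mle(data)
        for _ in range(3):                           # bias-correction iterations
            boot = np.array([mle(rng.normal(0.0, np.sqrt(theta), data.size))
                             for _ in range(4000)])  # parametric bootstrap at theta
            bias = boot.mean() - theta               # estimated bias at the current value
            theta = mle(data) - bias                 # corrected estimate, then iterate

        print("MLE:", round(mle(data), 3),
              " bias-corrected:", round(theta, 3), " (truth: 4.0)")

    For this toy problem the iteration converges to the usual unbiased variance estimator; in the unfolding problem the same scheme is used to recenter the bootstrap intervals so that their frequentist coverage approaches the nominal level.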

    Mixed effect quantile and M-quantile regression for spatial data

    Observed data are frequently characterized by spatial dependence; that is, the observed values can be influenced by their "geographical" position. In such a context it is possible to assume that the values observed in a given area are similar to those recorded in neighboring areas. Such data are usually referred to as spatial data, and they frequently arise in epidemiological, environmental and social studies; for a discussion, see Haining (1990). Spatial data can be multilevel, with samples composed of lower-level units (population, buildings) nested within higher-level units (census tracts, municipalities, regions) in a geographical area. Green and Richardson (2002) proposed a general approach to modelling spatial data based on finite mixtures with spatial constraints, where the prior probabilities are modelled through a Markov Random Field (MRF) via a Potts representation (Kindermann and Snell, 1999; Strauss, 1977). This model was defined in a Bayesian context, assuming that the interaction parameter of the Potts model is fixed over the entire analyzed region. Geman and Geman (1984) showed that this class of processes can be modelled by an MRF. As proved by the Hammersley-Clifford theorem, modelling the process through an MRF is equivalent to using a Gibbs distribution for the membership vector. In other words, the spatial dependence between component indicators is captured by a Gibbs distribution, using a representation similar to the Potts model discussed by Strauss (1977). In this work, a Gibbs distribution with a component-specific intercept and a constant interaction parameter, as in Green and Richardson (2002), is proposed to model the effect of neighboring areas; a sketch of the resulting label update is given below. This formulation allows for a parameter specific to each component and a constant spatial dependence over the whole area, extending to quantile and M-quantile regression the approach proposed by Alfò et al. (2009), who suggested letting both the intercept and the interaction parameter depend on the mixture component, allowing for different prior probabilities and varying strength of spatial dependence. In the current dissertation, we propose to adopt this prior distribution to define a finite mixture of quantile regression models (FMQRSP) and a finite mixture of M-quantile regression models (FMMQSP) for spatial data.
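
    A minimal sketch of the Gibbs-distribution prior on component labels: each area's membership probability gets a component-specific intercept delta_k plus a constant interaction psi times the number of neighbours sharing that label (a Potts-type term, as in Green and Richardson, 2002). The single-site Gibbs update below combines this prior with a stand-in per-component data log-likelihood; the two-component square grid and all parameter values are illustrative.

        import numpy as np

        rng = np.random.default_rng(5)
        S, K = 10, 2                            # grid side, mixture components
        delta, psi = np.array([0.0, 0.3]), 0.8  # intercepts, spatial interaction
        z = rng.integers(K, size=(S, S))        # current component labels
        loglik = rng.normal(size=(S, S, K))     # stand-in data log-likelihoods

        def gibbs_sweep(z):
            for i in range(S):
                for j in range(S):
                    nb = [z[x, y] for x, y in ((i-1, j), (i+1, j), (i, j-1), (i, j+1))
                          if 0 <= x < S and 0 <= y < S]
                    same = np.array([sum(n == k for n in nb) for k in range(K)])
                    # prior (intercept + Potts term) plus likelihood, then normalize
                    logp = delta + psi * same + loglik[i, j]
                    p = np.exp(logp - logp.max()); p /= p.sum()
                    z[i, j] = rng.choice(K, p=p)
            return z

        z = gibbs_sweep(z)
        print(z[:3])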

    A stochastic algorithm for probabilistic independent component analysis

    The decomposition of a sample of images on a relevant subspace is a recurrent problem in many different fields, from computer vision to medical image analysis. We propose in this paper a new learning principle and implementation of the generative decomposition model generally known as noisy ICA (independent component analysis), based on the SAEM algorithm, which is a versatile stochastic approximation of the standard EM algorithm. We demonstrate the applicability of the method on a large range of decomposition models and illustrate the developments with experimental results on various data sets.
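
    A minimal sketch of the SAEM mechanics the method relies on: instead of computing the E-step expectation, draw the latent variables from their current conditional, then blend the resulting sufficient statistic into a running average with a decreasing step size before maximizing. The toy model below (observations are a latent Gaussian plus noise, and only the latent mean is estimated) is chosen so the simulation step is exact; it illustrates the algorithm, not the noisy-ICA model itself.

        import numpy as np

        rng = np.random.default_rng(6)
        sigma2 = 0.5                        # known noise variance
        y = rng.normal(2.0, 1.0, 500) + rng.normal(0, np.sqrt(sigma2), 500)

        mu, s = 0.0, 0.0                    # parameter and running sufficient statistic
        for k in range(1, 300):
            # S-step: sample z | y, mu (the conditional is Gaussian in this toy model)
            post_var = 1 / (1 + 1 / sigma2)
            post_mean = post_var * (mu + y / sigma2)
            z = post_mean + np.sqrt(post_var) * rng.standard_normal(y.size)
            # SA-step: stochastic approximation of the sufficient statistic
            gamma = 1.0 / k
            s = s + gamma * (z.mean() - s)
            # M-step: maximize the complete-data likelihood given s
            mu = s

        print("estimated latent mean:", round(mu, 3), " (truth: 2.0)")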