
Optimal variance estimation without estimating the mean function

Authors: Ma Yanyuan, Tong Tiejun, Wang Yuedong
Publication venue: Bernoulli Society for Mathematical Statistics and Probability
Publication date: 01/01/2013

We study the least squares estimator in the residual variance estimation context. We show that the mean squared differences of paired observations are asymptotically normally distributed. We further establish that, by regressing the mean squared differences of these paired observations on the squared distances between paired covariates via a simple least squares procedure, the resulting variance estimator is not only asymptotically normal and root-$n$ consistent, but also reaches the optimal bound in terms of estimation variance. We also demonstrate the advantage of the least squares estimator over existing methods in terms of second-order asymptotic properties. Comment: Published at http://dx.doi.org/10.3150/12-BEJ432 in Bernoulli (http://isi.cbs.nl/bernoulli/) by the International Statistical Institute/Bernoulli Society (http://isi.cbs.nl/BS/bshome.htm
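A minimal sketch of the difference-based idea in this abstract, under simplifying assumptions that do not come from the paper: a scalar covariate on the equally spaced grid $x_i = i/n$, pairs formed only at lags $k = 1, \dots, m$, and each lag weighted by its number of pairs (the function name `ls_variance_estimate` and the tuning parameter `m` are hypothetical). Half the mean squared difference at lag $k$ has expectation $\sigma^2$ plus a term roughly proportional to $(k/n)^2$ when the mean is smooth, so the intercept of the fitted line at zero squared distance estimates $\sigma^2$ without ever estimating the mean function.

```python
import math
import random


def ls_variance_estimate(y, m):
    """Hypothetical difference-based least squares estimator of sigma^2.

    Assumes y[i] = f(x_i) + noise with equally spaced x_i = (i + 1) / n and a
    smooth f. For each lag k = 1..m:
        s_k = sum_i (y[i+k] - y[i])**2 / (2 * (n - k))   # half mean squared difference
        d_k = (k / n)**2                                  # squared covariate distance
    s_k ~ beta0 + beta1 * d_k is fitted by weighted least squares with weights
    n - k (pairs per lag); the intercept beta0 is the variance estimate.
    """
    n = len(y)
    if not 1 <= m < n:
        raise ValueError("need 1 <= m < len(y)")
    s, d, w = [], [], []
    for k in range(1, m + 1):
        sq = sum((y[i + k] - y[i]) ** 2 for i in range(n - k))
        s.append(sq / (2 * (n - k)))
        d.append((k / n) ** 2)
        w.append(n - k)
    total = sum(w)
    d_bar = sum(wk * dk for wk, dk in zip(w, d)) / total
    s_bar = sum(wk * sk for wk, sk in zip(w, s)) / total
    sxx = sum(wk * (dk - d_bar) ** 2 for wk, dk in zip(w, d))
    sxy = sum(wk * (dk - d_bar) * (sk - s_bar) for wk, dk, sk in zip(w, d, s))
    slope = sxy / sxx if sxx > 0 else 0.0  # m == 1 leaves a single point: no slope
    return s_bar - slope * d_bar  # fitted value at zero squared distance


if __name__ == "__main__":
    random.seed(0)
    n = 2000
    y = [math.sin(2 * math.pi * (i + 1) / n) + random.gauss(0.0, 0.5) for i in range(n)]
    # expected to land near the true noise variance 0.5**2 = 0.25
    print(ls_variance_estimate(y, m=20))
```

Fitting a line in $d_k$ rather than averaging the $s_k$ directly is what removes the bias contributed by the unknown mean; with `m = 1` the sketch reduces to the classical first-difference estimator.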

arXiv.org e-Print Archive

CiteSeerX

Semi-Parametric Empirical Best Prediction for small area estimation of unemployment indicators

Authors: Alfò Marco, Marino Maria Francesca, Ranalli Maria Giovanna, Salvati Nicola
Publication date: 22/08/2018

The Italian National Institute for Statistics regularly provides estimates of unemployment indicators using data from the Labor Force Survey. However, direct estimates of unemployment incidence cannot be released for Local Labor Market Areas. These are unplanned domains defined as clusters of municipalities; many are out-of-sample areas and most have small sample sizes, which renders direct estimates inadequate. The Empirical Best Predictor is an appropriate model-based alternative. However, for non-Gaussian responses, computing it and the analytic approximation to its Mean Squared Error requires solving (possibly) multiple integrals that generally have no closed form. Monte Carlo methods and the parametric bootstrap are common remedies, although their computational burden is nontrivial. In this paper, we propose a Semi-Parametric Empirical Best Predictor for a (possibly) non-linear mixed effect model that leaves the distribution of the area-specific random effects unspecified and estimates it from the observed data. This approach is known to lead to a discrete mixing distribution, which helps avoid unverifiable parametric assumptions and heavy integral approximations. We also derive a second-order, bias-corrected, analytic approximation to the corresponding Mean Squared Error. Finite sample properties of the proposed approach are tested via a large-scale simulation study. Furthermore, the proposal is applied to unit-level data from the 2012 Italian Labor Force Survey to estimate unemployment incidence for 611 Local Labor Market Areas using auxiliary information from administrative registers and the 2011 Census.
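With a discrete mixing distribution, the integrals over the area random effect collapse into finite sums over mass points. The sketch below illustrates only that point, for a hypothetical logistic mixed model (unemployed or not) whose fixed-effect linear predictors, mass points and masses are assumed to have been estimated already; the estimation step and the paper's MSE approximation are not shown, and all function names are hypothetical. The predicted area proportion combines observed responses for sampled units with posterior-weighted probabilities for non-sampled units.

```python
import math


def logistic(z):
    # numerically stable inverse logit
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def posterior_masses(y, eta, mass_points, masses):
    """Posterior probability of each mass point for one area.

    y: 0/1 responses of the sampled units in the area (may be empty)
    eta: fixed-effect linear predictors x'beta for those units
    mass_points, masses: discrete random-effect distribution (assumed estimated)
    """
    log_post = []
    for u, pi in zip(mass_points, masses):
        ll = math.log(pi)
        for yi, e in zip(y, eta):
            p = logistic(e + u)
            ll += math.log(p) if yi == 1 else math.log1p(-p)
        log_post.append(ll)
    top = max(log_post)  # subtract the maximum before exponentiating, for stability
    w = [math.exp(lp - top) for lp in log_post]
    total = sum(w)
    return [wk / total for wk in w]


def sp_ebp_area_rate(y_sample, eta_sample, eta_nonsample, mass_points, masses):
    """Predicted area proportion under the discrete mixing distribution.

    For an out-of-sample area, y_sample is empty and the posterior equals the masses.
    """
    post = posterior_masses(y_sample, eta_sample, mass_points, masses)
    predicted = float(sum(y_sample))
    for e in eta_nonsample:
        # E[y | area data] is a finite sum over mass points: no numerical integration
        predicted += sum(pk * logistic(e + u) for pk, u in zip(post, mass_points))
    return predicted / (len(y_sample) + len(eta_nonsample))
```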

arXiv.org e-Print Archive

Archivio della Ricerca - Università di Pisa

Archivio della ricerca - Università di Roma La Sapienza

Integration of survey data and big observational data for finite population inference using mass imputation

Authors: Kim Jae Kwang, Yang Shu
Publication date: 08/07/2018

Multiple data sources are becoming increasingly available for statistical analyses in the era of big data. As an important example in finite-population inference, we consider an imputation approach to combining a probability sample with big observational data. Unlike the usual imputation for missing data analysis, we create imputed values for all elements in the probability sample. Such mass imputation is attractive in the context of survey data integration (Kim and Rao, 2012). We extend mass imputation as a tool for integrating survey data with big non-survey data. The mass imputation methods and their statistical properties are presented. The matching estimator of Rivers (2007) is also covered as a special case. Variance estimation with mass-imputed data is discussed. The simulation results demonstrate that the proposed estimators outperform existing competitors in terms of robustness and efficiency.
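The core of mass imputation can be sketched as follows, assuming a scalar covariate observed in both sources and the outcome observed only in the big non-survey data. Each probability-sample unit receives the outcome of its nearest neighbour in the big data (nearest-neighbour matching, closest in spirit to the matching estimator mentioned in the abstract), and the population mean is then estimated with the survey design weights. The function name and the Hájek-type weighted mean are illustrative choices rather than the paper's specific estimator, and variance estimation is not shown.

```python
def nn_mass_imputation_mean(sample_x, sample_w, big_x, big_y):
    """Estimate a finite-population mean by nearest-neighbour mass imputation.

    sample_x, sample_w: covariate and design weight of each probability-sample unit
                        (the outcome is not observed there)
    big_x, big_y: covariate and outcome of each unit in the big data source
    """
    if not big_x:
        raise ValueError("big data source is empty")
    imputed = []
    for x in sample_x:
        # index of the big-data unit with the closest covariate (first one on ties)
        j = min(range(len(big_x)), key=lambda idx: abs(big_x[idx] - x))
        imputed.append(big_y[j])
    # design-weighted (Hajek-type) mean of the imputed outcomes
    return sum(w * y for w, y in zip(sample_w, imputed)) / sum(sample_w)
```

Every sample element is imputed, which is what distinguishes this from imputing only missing items; swapping the nearest-neighbour step for a fitted regression gives a model-based variant of the same idea.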

arXiv.org e-Print Archive

Digital Repository @ Iowa State University (ISU)

Maximum L$q$-likelihood estimation

Authors: Ferrari Davide, Yang Yuhong
Publication venue: Institute of Mathematical Statistics
Publication date: 01/01/2010

In this paper, the maximum L$q$-likelihood estimator (ML$q$E), a new parameter estimator based on nonextensive entropy [Kibernetika 3 (1967) 30--35], is introduced. The properties of the ML$q$E are studied via asymptotic analysis and computer simulations. The behavior of the ML$q$E is characterized by the degree of distortion $q$ applied to the assumed model. When $q$ is properly chosen for small and moderate sample sizes, the ML$q$E can successfully trade bias for precision, resulting in a substantial reduction of the mean squared error. When the sample size is large and $q$ tends to 1, a necessary and sufficient condition to ensure proper asymptotic normality and efficiency of the ML$q$E is established. Comment: Published at http://dx.doi.org/10.1214/09-AOS687 in the Annals of Statistics (http://www.imstat.org/aos/) by the Institute of Mathematical Statistics (http://www.imstat.org
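For reference, the L$q$ function replaces the logarithm in the likelihood,

$$L_q(u) = \frac{u^{1-q} - 1}{1 - q} \quad (q \ne 1), \qquad L_1(u) = \log u,$$

and the ML$q$E maximizes $\sum_i L_q\{f(x_i;\theta)\}$. Setting the derivative to zero gives $\sum_i f(x_i;\theta)^{1-q}\,\partial_\theta \log f(x_i;\theta) = 0$, a likelihood equation in which each observation is weighted by its density raised to $1-q$. The sketch below solves it for one assumed case only, a normal mean with known $\sigma$, by fixed-point iteration; the function names are hypothetical. With $q < 1$, observations that are unlikely under the current fit are downweighted.

```python
import math


def lq(u, q):
    """L_q(u) = (u**(1 - q) - 1) / (1 - q), with the limit log(u) at q = 1."""
    if q == 1:
        return math.log(u)
    return (u ** (1 - q) - 1) / (1 - q)


def mlqe_normal_mean(x, q, sigma=1.0, tol=1e-10, max_iter=500):
    """Hypothetical ML_qE of a normal mean with known sigma.

    Solves sum_i w_i * (x_i - mu) = 0 with w_i = f(x_i; mu)**(1 - q) by
    fixed-point iteration from the sample mean. The normalising constant of f
    cancels in the weighted average, so only the exponent is kept.
    """
    mu = sum(x) / len(x)
    for _ in range(max_iter):
        w = [math.exp(-(1 - q) * (xi - mu) ** 2 / (2 * sigma ** 2)) for xi in x]
        new_mu = sum(wi * xi for wi, xi in zip(w, x)) / sum(w)
        if abs(new_mu - mu) < tol:
            return new_mu
        mu = new_mu
    return mu
```

At $q = 1$ all weights equal one and the estimate is the ordinary sample mean (the MLE); moving $q$ below 1 trades some efficiency for resistance to outlying observations.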

arXiv.org e-Print Archive

CiteSeerX

Probabilistic Inference from Arbitrary Uncertainty using Mixtures of Factorized Generalized Gaussians

Authors: Garrido M. C., Lopez-de-Teruel P. E., Ruiz A.
Publication venue: AI Access Foundation
Publication date: 18/05/2011

This paper presents a general and efficient framework for probabilistic inference and learning from arbitrary uncertain information. It exploits the computational properties of finite mixture models, conjugate families and factorization. Both the joint probability density of the variables and the likelihood function of the (objective or subjective) observation are approximated by a special mixture model, in such a way that any desired conditional distribution can be obtained directly without numerical integration. We have developed an extended version of the expectation maximization (EM) algorithm to estimate the parameters of mixture models from uncertain training examples (indirect observations). As a consequence, any piece of exact or uncertain information about both input and output values is handled consistently in the inference and learning stages. This ability, extremely useful in certain situations, is not found in most alternative methods. The proposed framework is formally justified from standard probabilistic principles, and illustrative examples are provided in the fields of nonparametric pattern classification, nonlinear regression and pattern completion. Finally, experiments on a real application and comparative results over standard databases provide empirical evidence of the utility of the method in a wide range of applications.
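The property that conditionals of such a mixture need no numerical integration can be illustrated with a simplified sketch. It swaps in ordinary Gaussians for the paper's generalized Gaussians and uses only two variables (both are assumptions): because each component factorizes, conditioning on the input merely reweights the components by how well each explains it, and the conditional mean of the output is a weighted sum of component means. Component parameters are assumed to be already fitted (the paper's extended EM step is not shown), and the function names are hypothetical.

```python
import math


def normal_pdf(v, mean, sd):
    z = (v - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def conditional_mean_y(x, components):
    """E[y | x] under a mixture of factorized components over (x, y).

    components: list of (weight, mean_x, sd_x, mean_y) tuples; each component is
    N(x; mean_x, sd_x) times a density for y with mean mean_y. Then p(y | x) is
    again a mixture over the same components, with weights proportional to
    weight * N(x; mean_x, sd_x), so the conditional mean is a finite weighted sum.
    """
    resp = [pi * normal_pdf(x, mx, sx) for pi, mx, sx, _ in components]
    total = sum(resp)
    return sum(r * my for r, (_, _, _, my) in zip(resp, components)) / total
```

Used as a regressor, a handful of components already yields a smooth nonlinear curve $x \mapsto E[y \mid x]$, which is the kind of nonlinear regression use the abstract mentions.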

arXiv.org e-Print Archive

Crossref