Optimal variance estimation without estimating the mean function
We study the least squares estimator in the residual variance estimation
context. We show that the mean squared differences of paired observations are
asymptotically normally distributed. We further establish that, by regressing
the mean squared differences of these paired observations on the squared
distances between paired covariates via a simple least squares procedure, the
resulting variance estimator is not only asymptotically normal and root-n
consistent, but also reaches the optimal bound in terms of estimation variance.
We also demonstrate the advantage of the least squares estimator in comparison
with existing methods in terms of the second-order asymptotic properties.
Comment: Published at http://dx.doi.org/10.3150/12-BEJ432 in Bernoulli
(http://isi.cbs.nl/bernoulli/) by the International Statistical
Institute/Bernoulli Society (http://isi.cbs.nl/BS/bshome.htm).
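The regression described in this abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes an equally spaced design on [0, 1] and an unweighted least squares fit, neither of which the abstract specifies, and the function name `ls_variance_estimate` and the parameter `max_lag` are hypothetical. The intercept of the fitted line is taken as the variance estimate.

```python
import math
import random


def ls_variance_estimate(y, max_lag):
    """Difference-based least squares estimate of the residual variance.

    Assumption: y[i] was observed at x[i] = i / n (equally spaced design).
    """
    n = len(y)
    d = []  # squared distances between paired covariates
    s = []  # half mean squared differences of paired observations
    for k in range(1, max_lag + 1):
        sq_diffs = [(y[i + k] - y[i]) ** 2 for i in range(n - k)]
        s.append(sum(sq_diffs) / (2 * (n - k)))
        d.append((k / n) ** 2)
    # Ordinary least squares fit of s on d; the intercept estimates sigma^2.
    d_bar = sum(d) / len(d)
    s_bar = sum(s) / len(s)
    sxy = sum((dk - d_bar) * (sk - s_bar) for dk, sk in zip(d, s))
    sxx = sum((dk - d_bar) ** 2 for dk in d)
    slope = sxy / sxx
    return s_bar - slope * d_bar


# Simulated data: smooth mean function plus Gaussian noise with sigma = 0.5.
random.seed(0)
n, sigma = 2000, 0.5
y = [math.sin(2 * math.pi * i / n) + random.gauss(0, sigma) for i in range(n)]
sigma2_hat = ls_variance_estimate(y, max_lag=10)
```

In this simulation the true variance is 0.25, so `sigma2_hat` should land near that value without the mean function ever being fitted.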
Semi-Parametric Empirical Best Prediction for small area estimation of unemployment indicators
The Italian National Institute for Statistics regularly provides estimates of
unemployment indicators using data from the Labor Force Survey. However, direct
estimates of unemployment incidence cannot be released for Local Labor Market
Areas. These are unplanned domains defined as clusters of municipalities; many
are out-of-sample areas and the majority are characterized by a small sample
size, which renders direct estimates inadequate. The Empirical Best Predictor
represents an appropriate model-based alternative. However, for non-Gaussian
responses, its computation and the computation of the analytic approximation to
its Mean Squared Error require the solution of (possibly) multiple integrals
that generally do not have a closed form. To address this, Monte Carlo
methods and the parametric bootstrap are common choices, even though their
computational burden is non-trivial. In this paper, we propose a
Semi-Parametric Empirical Best Predictor for a (possibly) non-linear mixed
effect model by leaving the distribution of the area-specific random effects
unspecified and estimating it from the observed data. This approach is known to
lead to a discrete mixing distribution which helps avoid unverifiable
parametric assumptions and heavy integral approximations. We also derive a
second-order, bias-corrected, analytic approximation to the corresponding Mean
Squared Error. Finite sample properties of the proposed approach are tested via
a large scale simulation study. Furthermore, the proposal is applied to
unit-level data from the 2012 Italian Labor Force Survey to estimate
unemployment incidence for 611 Local Labor Market Areas using auxiliary
information from administrative registers and the 2011 Census.
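To illustrate why a discrete mixing distribution avoids integral approximations, here is a minimal sketch for a binary response with a logistic link. It is not the paper's estimator: it assumes the mass points (`locations`) and their probabilities (`masses`) have already been estimated, that all units in an area share one fixed-effect linear predictor `eta_fixed`, and all function and variable names are hypothetical. With a finite set of mass points, the integral over the random effect becomes a finite sum.

```python
import math


def expit(t):
    """Inverse logit."""
    return 1.0 / (1.0 + math.exp(-t))


def posterior_weights(successes, trials, eta_fixed, locations, masses):
    """Posterior probability of each mass point given one area's data."""
    log_post = []
    for xi, pi in zip(locations, masses):
        p = expit(eta_fixed + xi)
        # Binomial log-likelihood up to a constant that cancels on normalizing.
        log_lik = successes * math.log(p) + (trials - successes) * math.log(1 - p)
        log_post.append(math.log(pi) + log_lik)
    top = max(log_post)  # subtract the maximum for numerical stability
    unnorm = [math.exp(lp - top) for lp in log_post]
    total = sum(unnorm)
    return [u / total for u in unnorm]


def predict_probability(successes, trials, eta_fixed, locations, masses):
    """Area-level predicted probability as a finite sum over mass points."""
    w = posterior_weights(successes, trials, eta_fixed, locations, masses)
    return sum(wk * expit(eta_fixed + xi) for wk, xi in zip(w, locations))


# Hypothetical three-point mixing distribution and fixed-effect predictor.
locations = [-1.0, 0.0, 1.0]
masses = [0.3, 0.4, 0.3]
eta_fixed = -2.0

low = predict_probability(1, 30, eta_fixed, locations, masses)
high = predict_probability(10, 30, eta_fixed, locations, masses)
# An out-of-sample area (no observations) falls back on the prior masses.
prior_only = posterior_weights(0, 0, eta_fixed, locations, masses)
```

For an area with no sampled units the posterior weights equal the estimated masses, so the prediction relies on the fixed effects and the mixing distribution alone.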
Integration of survey data and big observational data for finite population inference using mass imputation
Multiple data sources are becoming increasingly available for statistical
analyses in the era of big data. As an important example in finite-population
inference, we consider an imputation approach to combining a probability sample
with big observational data. Unlike the usual imputation for missing data
analysis, we create imputed values for all elements in the probability
sample. Such mass imputation is attractive in the context of survey data
integration (Kim and Rao, 2012). We extend mass imputation as a tool for data
integration of survey data and big non-survey data. The mass imputation methods
and their statistical properties are presented. The matching estimator of
Rivers (2007) is also covered as a special case. Variance estimation with
mass-imputed data is discussed. The simulation results demonstrate that the
proposed estimators outperform existing competitors in terms of robustness and
efficiency.
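The idea of mass imputation can be sketched as follows. This is an illustration under stated assumptions rather than the paper's procedure: it assumes the big observational data contain both a covariate and the study variable, the probability sample contains only the covariate and design weights, the imputation model is a simple linear regression, and the population mean is estimated by a weighted mean of the imputed values. All names are hypothetical.

```python
def fit_simple_linear(x, y):
    """Ordinary least squares fit of y on x; returns (intercept, slope)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    sxx = sum((a - x_bar) ** 2 for a in x)
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


def mass_imputation_mean(big_x, big_y, sample_x, sample_weights):
    """Train on the big data, impute every sample element, weight by design."""
    intercept, slope = fit_simple_linear(big_x, big_y)
    imputed = [intercept + slope * x for x in sample_x]
    weighted_total = sum(w * v for w, v in zip(sample_weights, imputed))
    return weighted_total / sum(sample_weights)


# Toy data: the big data follow y = 1 + 2x exactly.
big_x = [0.0, 1.0, 2.0, 3.0, 4.0]
big_y = [1.0, 3.0, 5.0, 7.0, 9.0]
sample_x = [1.0, 2.0, 3.0]
sample_weights = [10.0, 20.0, 10.0]

estimate = mass_imputation_mean(big_x, big_y, sample_x, sample_weights)
```

Here the imputed values are 3, 5 and 7, so the weighted mean is (30 + 100 + 70) / 40 = 5. The design weights come from the probability sample, while the study variable comes only from the big data.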
Maximum Lq-likelihood estimation
In this paper, the maximum Lq-likelihood estimator (MLqE), a new
parameter estimator based on nonextensive entropy [Kibernetika 3 (1967) 30--35]
is introduced. The properties of the MLqE are studied via asymptotic analysis
and computer simulations. The behavior of the MLqE is characterized by the
degree of distortion q applied to the assumed model. When q is properly
chosen for small and moderate sample sizes, the MLqE can successfully trade
bias for precision, resulting in a substantial reduction of the mean squared
error. When the sample size is large and q tends to 1, a necessary and
sufficient condition to ensure a proper asymptotic normality and efficiency of
MLqE is established.
Comment: Published at http://dx.doi.org/10.1214/09-AOS687 in the Annals of
Statistics (http://www.imstat.org/aos/) by the Institute of Mathematical
Statistics (http://www.imstat.org).
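A small sketch of the estimator's mechanics, under stated assumptions. The Lq function replaces the logarithm in the likelihood with Lq(u) = (u^(1-q) - 1) / (1 - q), which reduces to log(u) as q tends to 1. The example below is restricted to the mean of a normal model with known standard deviation and uses a simple fixed-point iteration; the function names, the starting value and the iteration count are hypothetical choices, and convergence of this iteration is not guaranteed in general.

```python
import math


def lq(u, q):
    """Lq function: (u**(1-q) - 1) / (1-q), equal to log(u) at q = 1."""
    if q == 1.0:
        return math.log(u)
    return (u ** (1.0 - q) - 1.0) / (1.0 - q)


def normal_pdf(x, mu, sigma):
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def mlqe_normal_mean(data, q, sigma=1.0, iterations=100):
    """Fixed-point iteration for the Lq estimating equation of a normal mean.

    Each observation is weighted by f(x; mu)**(1 - q); q = 1 gives equal
    weights and therefore the ordinary sample mean.
    """
    mu = sorted(data)[len(data) // 2]  # start from a central order statistic
    for _ in range(iterations):
        weights = [normal_pdf(x, mu, sigma) ** (1.0 - q) for x in data]
        mu = sum(w * x for w, x in zip(weights, data)) / sum(weights)
    return mu


data = [0.1, -0.2, 0.05, 0.3, -0.15, 10.0]
mle = mlqe_normal_mean(data, q=1.0)    # ordinary sample mean
mlqe = mlqe_normal_mean(data, q=0.8)   # down-weights low-density points
```

With q below 1, observations that are unlikely under the current fit receive small weights, which is one way to see how the distortion parameter changes the estimator's behavior.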
Probabilistic Inference from Arbitrary Uncertainty using Mixtures of Factorized Generalized Gaussians
This paper presents a general and efficient framework for probabilistic
inference and learning from arbitrary uncertain information. It exploits the
calculation properties of finite mixture models, conjugate families and
factorization. Both the joint probability density of the variables and the
likelihood function of the (objective or subjective) observation are
approximated by a special mixture model, in such a way that any desired
conditional distribution can be directly obtained without numerical
integration. We have developed an extended version of the expectation
maximization (EM) algorithm to estimate the parameters of mixture models from
uncertain training examples (indirect observations). As a consequence, any
piece of exact or uncertain information about both input and output values is
consistently handled in the inference and learning stages. This ability,
extremely useful in certain situations, is not found in most alternative
methods. The proposed framework is formally justified from standard
probabilistic principles, and illustrative examples are provided in the fields
of nonparametric pattern classification, nonlinear regression and pattern
completion. Finally, experiments on a real application and comparative results
over standard databases provide empirical evidence of the utility of the method
in a wide range of applications.
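The claim that conditional distributions follow without numerical integration can be illustrated with a simplified sketch. It assumes ordinary Gaussian components rather than the generalized Gaussians of the paper, a single exactly observed input x, and one output y. Because each component factorizes over the variables, conditioning on x only reweights the components. All names and parameter values are hypothetical.

```python
import math


def normal_pdf(x, mean, sd):
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def conditional_weights(x, components):
    """Posterior component weights given an exact observation of x.

    components: list of (prior_weight, mean_x, sd_x, mean_y) tuples.
    """
    unnorm = [w * normal_pdf(x, mx, sx) for w, mx, sx, _ in components]
    total = sum(unnorm)
    return [u / total for u in unnorm]


def conditional_mean(x, components):
    """E[y | x] as a reweighted average of the component means of y."""
    weights = conditional_weights(x, components)
    return sum(wk * my for wk, (_, _, _, my) in zip(weights, components))


# Two hypothetical factorized components.
components = [
    (0.5, -1.0, 1.0, 0.0),
    (0.5, 1.0, 1.0, 4.0),
]
at_middle = conditional_mean(0.0, components)  # components equally likely
at_right = conditional_mean(3.0, components)   # second component dominates
```

At x = 0 both components are equally plausible, so the prediction is the midpoint of the two output means; at x = 3 nearly all weight moves to the second component.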