Bias in parametric estimation: reduction and useful side-effects
The bias of an estimator is defined as the difference of its expected value
from the parameter to be estimated, where the expectation is with respect to
the model. Loosely speaking, small bias reflects the desire that if an
experiment is repeated indefinitely then the average of all the resultant
estimates will be close to the parameter value that is estimated. The current
paper is a review of the still-expanding repository of methods that have been
developed to reduce bias in the estimation of parametric models. The review
provides a unifying framework where all those methods are seen as attempts to
approximate the solution of a simple estimating equation. Of particular focus
is the maximum likelihood estimator, which despite being asymptotically
unbiased under the usual regularity conditions, has finite-sample bias that can
result in significant loss of performance of standard inferential procedures.
An informal comparison of the methods reveals some useful practical
side-effects in the estimation of popular models, including: i)
shrinkage of the estimators in binomial and multinomial regression models that
guarantees finiteness even in cases of data separation where the maximum
likelihood estimator is infinite, and ii) inferential benefits for models that
require the estimation of dispersion or precision parameters
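
The definition of bias given in this abstract can be illustrated with a small Monte Carlo sketch. It is not taken from the paper: it assumes a normal sample with unknown mean, uses the maximum likelihood estimator of the variance (dividing by n), and compares the simulated bias with the known value -sigma^2/n. The function name `mc_bias_of_variance_mle` and its default settings are hypothetical.

```python
import random


def mc_bias_of_variance_mle(n, sigma2, reps=20000, seed=0):
    """Monte Carlo estimate of E[sigma2_hat] - sigma2 for the normal-variance MLE."""
    rng = random.Random(seed)
    sigma = sigma2 ** 0.5
    total = 0.0
    for _ in range(reps):
        sample = [rng.gauss(0.0, sigma) for _ in range(n)]
        mean = sum(sample) / n
        # Dividing by n (not n - 1) gives the maximum likelihood estimator.
        total += sum((x - mean) ** 2 for x in sample) / n
    # Average of the estimates over repeated experiments, minus the true value.
    return total / reps - sigma2


if __name__ == "__main__":
    n, sigma2 = 5, 4.0
    print("simulated bias:", mc_bias_of_variance_mle(n, sigma2))
    print("exact bias    :", -sigma2 / n)
```

The simulated value should sit close to the exact one, which shrinks at rate 1/n: the estimator is asymptotically unbiased but noticeably biased in small samples.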
On the Properties of Simulation-based Estimators in High Dimensions
Considering the increasing size of available data, the need for statistical
methods that control the finite sample bias is growing. This is mainly due to
the frequent settings where the number of variables is large and allowed to
increase with the sample size, causing standard inferential procedures to incur
a significant loss of performance. Moreover, the complexity of
statistical models is also increasing thereby entailing important computational
challenges in constructing new estimators or in implementing classical ones. A
trade-off between numerical complexity and statistical properties is often
accepted. However, numerically efficient estimators that are altogether
unbiased, consistent and asymptotically normal in high dimensional problems
would generally be ideal. In this paper, we set a general framework from which
such estimators can easily be derived for wide classes of models. This
framework is based on the concepts that underlie simulation-based estimation
methods such as indirect inference. The approach allows various extensions
compared to previous results as it is adapted to possibly inconsistent
estimators and is applicable to discrete models and/or models with a large
number of parameters. We consider an algorithm, namely the Iterative Bootstrap
(IB), to efficiently compute simulation-based estimators by showing its
convergence properties. Within this framework we also prove the properties of
simulation-based estimators, more specifically the unbiasedness, consistency
and asymptotic normality when the number of parameters is allowed to increase
with the sample size. Therefore, an important implication of the proposed
approach is that it makes it possible to obtain unbiased estimators in finite samples.
Finally, we study this approach when applied to three common models, namely
logistic regression, negative binomial regression and lasso regression
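
A minimal sketch of an iterative-bootstrap style correction follows. The abstract does not state the update rule, so the form used here is an assumption: theta_{k+1} = theta_k + pi_hat - (average of the initial estimator over samples simulated under theta_k). The toy setting (normal variance with the biased ML estimator as the initial estimator) and the names `variance_mle` and `iterative_bootstrap` are hypothetical and chosen only to keep the example short.

```python
import math
import random


def variance_mle(sample):
    """Initial (biased) estimator: ML estimate of a normal variance."""
    n = len(sample)
    mean = sum(sample) / n
    return sum((x - mean) ** 2 for x in sample) / n


def iterative_bootstrap(observed, n_sim=500, max_iter=100, tol=1e-10, seed=1):
    n = len(observed)
    pi_hat = variance_mle(observed)
    rng = random.Random(seed)
    # Draw the simulation noise once and reuse it at every iteration, so the
    # update is a deterministic map in theta (assumed design choice).
    noise = [[rng.gauss(0.0, 1.0) for _ in range(n)] for _ in range(n_sim)]
    theta = pi_hat  # start from the initial estimate
    for _ in range(max_iter):
        scale = math.sqrt(theta)
        # Average of the initial estimator over samples simulated under theta.
        sim_mean = sum(variance_mle([scale * z for z in row]) for row in noise) / n_sim
        new_theta = theta + (pi_hat - sim_mean)
        if abs(new_theta - theta) < tol:
            return new_theta
        theta = new_theta
    return theta


if __name__ == "__main__":
    data_rng = random.Random(0)
    data = [data_rng.gauss(0.0, 2.0) for _ in range(10)]
    print("initial estimate  :", variance_mle(data))
    print("corrected estimate:", iterative_bootstrap(data))
```

At convergence the simulated average of the initial estimator matches its value on the observed data. In this toy case that inflates the ML variance estimate by roughly n/(n-1).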
Bias of Maximum-Likelihood estimates in logistic and Cox regression models: A comparative simulation study
Parameter estimates of logistic and Cox regression models are biased for finite samples. In a simulation study we investigated for both models the behaviour of the bias in relation to sample size and further parameters. In the case of a dichotomous explanatory variable x, the magnitude of the bias is strongly influenced by the baseline risk defined by the constants of the models and the risk resulting for the high risk group. To conduct a direct comparison of the bias of the two models, analyses were based on the same simulated data. Overall, the bias of the two models appears to be similar; however, the Cox model has less bias in situations where the baseline risk is high
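
A simulation of this kind can be sketched for the logistic case only. It relies on the fact that, with an intercept and a single dichotomous covariate, the ML slope estimate equals the sample log odds ratio of the 2x2 table. The design below (equal group sizes, averaging only over finite estimates, the function names) is an assumption for illustration and not the study's protocol.

```python
import math
import random


def log_odds_ratio(events0, n0, events1, n1):
    """ML slope estimate from a 2x2 table, or None when it is not finite."""
    a, b = events1, n1 - events1  # high-risk group: events, non-events
    c, d = events0, n0 - events0  # baseline group: events, non-events
    if 0 in (a, b, c, d):
        return None  # empty cell: the estimate is infinite or undefined
    return math.log((a * d) / (b * c))


def simulate_bias(n_per_group, p0, p1, reps=2000, seed=0):
    """Return (bias over finite estimates, fraction of non-finite estimates)."""
    rng = random.Random(seed)
    true_beta = math.log(p1 / (1 - p1)) - math.log(p0 / (1 - p0))
    finite = []
    for _ in range(reps):
        e0 = sum(rng.random() < p0 for _ in range(n_per_group))
        e1 = sum(rng.random() < p1 for _ in range(n_per_group))
        est = log_odds_ratio(e0, n_per_group, e1, n_per_group)
        if est is not None:
            finite.append(est)
    frac_nonfinite = 1 - len(finite) / reps
    bias = sum(finite) / len(finite) - true_beta if finite else float("nan")
    return bias, frac_nonfinite


if __name__ == "__main__":
    for n in (10, 30, 100, 500):
        print(n, simulate_bias(n, p0=0.2, p1=0.5))
```

Varying `p0` (the baseline risk) and `n_per_group` shows how both quantities drive the bias. The second returned value reports how often the estimate fails to be finite, which matters when the sample is small or the baseline risk is extreme.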
Measurement error caused by spatial misalignment in environmental epidemiology
Copyright @ 2009 Gryparis et al - Published by Oxford University Press. In many environmental epidemiology studies, the locations and/or times of exposure measurements and health assessments do not match. In such settings, health effects analyses often use the predictions from an exposure model as a covariate in a regression model. Such exposure predictions contain some measurement error as the predicted values do not equal the true exposures. We provide a framework for spatial measurement error modeling, showing that smoothing induces a Berkson-type measurement error with nondiagonal error structure. From this viewpoint, we review the existing approaches to estimation in a linear regression health model, including direct use of the spatial predictions and exposure simulation, and explore some modified approaches, including Bayesian models and out-of-sample regression calibration, motivated by measurement error principles. We then extend this work to the generalized linear model framework for health outcomes. Based on analytical considerations and simulation results, we compare the performance of all these approaches under several spatial models for exposure. Our comparisons underscore several important points. First, exposure simulation can perform very poorly under certain realistic scenarios. Second, the relative performance of the different methods depends on the nature of the underlying exposure surface. Third, traditional measurement error concepts can help to explain the relative practical performance of the different methods. We apply the methods to data on the association between levels of particulate matter and birth weight in the greater Boston area. This research was supported by NIEHS grants ES012044 (AG, BAC), ES009825 (JS, BAC), ES007142 (CJP), and ES000002 (CJP), and EPA grant R-832416 (JS, BAC)
Semi-Parametric Empirical Best Prediction for small area estimation of unemployment indicators
The Italian National Institute for Statistics regularly provides estimates of
unemployment indicators using data from the Labor Force Survey. However, direct
estimates of unemployment incidence cannot be released for Local Labor Market
Areas. These are unplanned domains defined as clusters of municipalities; many
are out-of-sample areas and the majority is characterized by a small sample
size, which renders direct estimates inadequate. The Empirical Best Predictor
represents an appropriate, model-based, alternative. However, for non-Gaussian
responses, its computation and the computation of the analytic approximation to
its Mean Squared Error require the solution of (possibly) multiple integrals
that generally do not have a closed form. To solve the issue, Monte Carlo
methods and parametric bootstrap are common choices, even though the
computational burden is nontrivial. In this paper, we propose a
Semi-Parametric Empirical Best Predictor for a (possibly) non-linear mixed
effect model by leaving the distribution of the area-specific random effects
unspecified and estimating it from the observed data. This approach is known to
lead to a discrete mixing distribution which helps avoid unverifiable
parametric assumptions and heavy integral approximations. We also derive a
second-order, bias-corrected, analytic approximation to the corresponding Mean
Squared Error. Finite sample properties of the proposed approach are tested via
a large scale simulation study. Furthermore, the proposal is applied to
unit-level data from the 2012 Italian Labor Force Survey to estimate
unemployment incidence for 611 Local Labor Market Areas using auxiliary
information from administrative registers and the 2011 Census
Confidence intervals of prediction accuracy measures for multivariable prediction models based on the bootstrap-based optimism correction methods
In assessing prediction accuracy of multivariable prediction models, optimism
corrections are essential for preventing biased results. However, in most
published papers of clinical prediction models, the point estimates of the
prediction accuracy measures are corrected by adequate bootstrap-based
correction methods, but their confidence intervals are not corrected, e.g.,
DeLong's confidence interval is usually used for assessing the C-statistic.
These naive methods do not adjust for the optimism bias and do not account for
statistical variability in the estimation of parameters in the prediction
models. Therefore, their coverage probabilities of the true value of the
prediction accuracy measure can be seriously below the nominal level (e.g.,
95%). In this article, we provide two generic bootstrap methods, namely (1)
location-shifted bootstrap confidence intervals and (2) two-stage bootstrap
confidence intervals, that can be generally applied to the bootstrap-based
optimism correction methods, i.e., Harrell's bias correction, 0.632, and
0.632+ methods. In addition, they can be widely applied to various methods for
prediction model development involving modern shrinkage methods such as the
ridge and lasso regressions. Through numerical evaluations by simulations, the
proposed confidence intervals showed favourable coverage performances. Besides,
the current standard practices based on the optimism-uncorrected methods showed
serious undercoverage properties. To avoid erroneous results, the
optimism-uncorrected confidence intervals should not be used in practice, and
the adjusted methods are recommended instead. We also developed the R package
predboot for implementing these methods (https://github.com/nomahi/predboot).
The effectiveness of the proposed methods is illustrated via applications to
the GUSTO-I clinical trial
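
The point-estimate side of Harrell's bootstrap optimism correction can be sketched as below. This is not the predboot package and not the confidence-interval methods proposed in the paper. The "model development" step is a deliberately simple, hypothetical stand-in (pick the single predictor and sign with the best apparent C-statistic), and all function names are assumptions.

```python
import random


def c_statistic(scores, outcomes):
    """Proportion of (event, non-event) pairs ranked concordantly; ties count 0.5."""
    pos = [s for s, y in zip(scores, outcomes) if y == 1]
    neg = [s for s, y in zip(scores, outcomes) if y == 0]
    if not pos or not neg:
        return None
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def predict(model, X):
    j, sign = model
    return [sign * row[j] for row in X]


def fit(X, y):
    """Toy model development: choose the (predictor, sign) pair with the best C."""
    candidates = [(j, s) for j in range(len(X[0])) for s in (1, -1)]
    return max(candidates, key=lambda m: c_statistic(predict(m, X), y))


def harrell_corrected_c(X, y, n_boot=200, seed=0):
    """Return (apparent C, mean optimism, optimism-corrected C)."""
    if len(set(y)) < 2:
        raise ValueError("both outcome classes are required")
    rng = random.Random(seed)
    n = len(y)
    apparent = c_statistic(predict(fit(X, y), X), y)
    optimism = []
    while len(optimism) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        Xb, yb = [X[i] for i in idx], [y[i] for i in idx]
        if len(set(yb)) < 2:
            continue  # C is undefined without both classes; redraw
        model = fit(Xb, yb)  # repeat the whole development step
        c_boot = c_statistic(predict(model, Xb), yb)  # performance on bootstrap sample
        c_orig = c_statistic(predict(model, X), y)    # performance on original data
        optimism.append(c_boot - c_orig)
    mean_optimism = sum(optimism) / n_boot
    return apparent, mean_optimism, apparent - mean_optimism


if __name__ == "__main__":
    rng = random.Random(1)
    X = [[rng.gauss(0.0, 1.0) for _ in range(5)] for _ in range(40)]  # pure noise
    y = [i % 2 for i in range(40)]
    print(harrell_corrected_c(X, y))
```

With pure-noise predictors the apparent C-statistic exceeds 0.5 only through selection, which is the optimism that the correction aims to remove from the point estimate.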