
    Bias in parametric estimation: reduction and useful side-effects

    The bias of an estimator is defined as the difference between its expected value and the parameter to be estimated, where the expectation is taken with respect to the model. Loosely speaking, small bias reflects the desire that, if an experiment were repeated indefinitely, the average of all the resulting estimates would be close to the parameter value being estimated. The current paper is a review of the still-expanding repository of methods that have been developed to reduce bias in the estimation of parametric models. The review provides a unifying framework in which all those methods are seen as attempts to approximate the solution of a simple estimating equation. Of particular focus is the maximum likelihood estimator, which, despite being asymptotically unbiased under the usual regularity conditions, has a finite-sample bias that can result in a significant loss of performance of standard inferential procedures. An informal comparison of the methods reveals some useful practical side-effects in the estimation of popular models, including: i) shrinkage of the estimators in binomial and multinomial regression models, which guarantees finiteness even in cases of data separation where the maximum likelihood estimate is infinite, and ii) inferential benefits for models that require the estimation of dispersion or precision parameters.
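    For concreteness, here is a minimal statement of the quantities involved, in our own notation rather than the paper's; the second display is the classical first-order ("corrective") adjustment, one of the approaches treated in this literature.

```latex
% Bias of an estimator \hat{\theta} of \theta; the expectation is taken under the model:
\[
  B(\theta) \;=\; \mathrm{E}_{\theta}\bigl[\hat{\theta}\bigr] - \theta .
\]
% For the maximum likelihood estimator, B(\theta) = b(\theta)/n + O(n^{-2}) under the
% usual regularity conditions, and the classical corrective estimator subtracts the
% estimated leading term:
\[
  \hat{\theta}_{\mathrm{BC}} \;=\; \hat{\theta}_{\mathrm{ML}} \;-\; \frac{b\bigl(\hat{\theta}_{\mathrm{ML}}\bigr)}{n} .
\]
```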

    On the Properties of Simulation-based Estimators in High Dimensions

    Considering the increasing size of available data, the need for statistical methods that control the finite-sample bias is growing. This is mainly due to the frequent settings where the number of variables is large and allowed to increase with the sample size, causing standard inferential procedures to incur a significant loss of performance. Moreover, the complexity of statistical models is also increasing, thereby entailing important computational challenges in constructing new estimators or in implementing classical ones. A trade-off between numerical complexity and statistical properties is often accepted. However, numerically efficient estimators that are altogether unbiased, consistent and asymptotically normal in high-dimensional problems would generally be ideal. In this paper, we set out a general framework from which such estimators can easily be derived for wide classes of models. This framework is based on the concepts that underlie simulation-based estimation methods such as indirect inference. The approach allows various extensions compared to previous results, as it is adapted to possibly inconsistent estimators and is applicable to discrete models and/or models with a large number of parameters. We consider the Iterative Bootstrap (IB) algorithm for efficiently computing simulation-based estimators and establish its convergence properties. Within this framework we also prove the properties of simulation-based estimators, more specifically their unbiasedness, consistency and asymptotic normality when the number of parameters is allowed to increase with the sample size. Therefore, an important implication of the proposed approach is that it yields unbiased estimators in finite samples. Finally, we study this approach when applied to three common models, namely logistic regression, negative binomial regression and lasso regression.
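    A minimal sketch of the Iterative Bootstrap recursion described above; `estimate(data)` and `simulate(theta, rng)` are hypothetical placeholders for the user's initial estimator and model simulator, and the update rule is the standard IB step rather than any particular implementation from the paper.

```python
import numpy as np

def iterative_bootstrap(data, estimate, simulate, H=50, n_iter=20, seed=0):
    """Iteratively bias-correct `estimate` by repeated simulation at the current value."""
    rng = np.random.default_rng(seed)
    pi_hat = np.asarray(estimate(data), dtype=float)   # initial (possibly biased) estimate
    theta = pi_hat.copy()                              # IB starting value
    for _ in range(n_iter):
        # Average of the same estimator applied to H data sets simulated at theta
        pi_sim = np.mean(
            [np.asarray(estimate(simulate(theta, rng)), dtype=float) for _ in range(H)],
            axis=0,
        )
        # IB update: shift theta by the gap between the observed-data estimate
        # and its simulated counterpart (an implicit, iterative bias correction)
        theta = theta + (pi_hat - pi_sim)
    return theta
```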

    Bias of Maximum-Likelihood estimates in logistic and Cox regression models: A comparative simulation study

    Parameter estimates of logistic and Cox regression models are biased in finite samples. In a simulation study we investigated, for both models, the behaviour of the bias in relation to sample size and other parameters. In the case of a dichotomous explanatory variable x, the magnitude of the bias is strongly influenced by the baseline risk, defined by the constants of the models, and by the resulting risk in the high-risk group. To allow a direct comparison of the bias of the two models, the analyses were based on the same simulated data. Overall, the bias of the two models appears to be similar; however, the Cox model has less bias in situations where the baseline risk is high.
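    A minimal sketch of this kind of bias simulation, restricted to the logistic model; the sample size, baseline risk and odds ratio below are illustrative values, not the settings used in the study.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
n, n_rep = 50, 2000
beta0, beta1 = np.log(0.1 / 0.9), np.log(3.0)      # baseline risk 10%, odds ratio 3

estimates = []
for _ in range(n_rep):
    x = rng.integers(0, 2, size=n)                 # dichotomous explanatory variable
    p = 1.0 / (1.0 + np.exp(-(beta0 + beta1 * x)))
    y = rng.binomial(1, p)
    try:
        fit = sm.Logit(y, sm.add_constant(x)).fit(disp=0)
        estimates.append(fit.params[1])
    except Exception:                              # skip separated or degenerate samples
        continue

print("mean estimate:", np.mean(estimates),
      "true value:", beta1,
      "empirical bias:", np.mean(estimates) - beta1)
```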

    Measurement error caused by spatial misalignment in environmental epidemiology

    In many environmental epidemiology studies, the locations and/or times of exposure measurements and health assessments do not match. In such settings, health effects analyses often use the predictions from an exposure model as a covariate in a regression model. Such exposure predictions contain some measurement error, as the predicted values do not equal the true exposures. We provide a framework for spatial measurement error modeling, showing that smoothing induces a Berkson-type measurement error with nondiagonal error structure. From this viewpoint, we review the existing approaches to estimation in a linear regression health model, including direct use of the spatial predictions and exposure simulation, and explore some modified approaches, including Bayesian models and out-of-sample regression calibration, motivated by measurement error principles. We then extend this work to the generalized linear model framework for health outcomes. Based on analytical considerations and simulation results, we compare the performance of all these approaches under several spatial models for exposure. Our comparisons underscore several important points. First, exposure simulation can perform very poorly under certain realistic scenarios. Second, the relative performance of the different methods depends on the nature of the underlying exposure surface. Third, traditional measurement error concepts can help to explain the relative practical performance of the different methods. We apply the methods to data on the association between levels of particulate matter and birth weight in the greater Boston area. This research was supported by NIEHS grants ES012044 (AG, BAC), ES009825 (JS, BAC), ES007142 (CJP), and ES000002 (CJP), and EPA grant R-832416 (JS, BAC).
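    A toy illustration (not the paper's spatial model) of why the error type matters: with a Berkson-type surrogate, the true exposure scatters around the prediction and the naive slope in a linear health model is roughly unbiased, whereas a classical-type surrogate attenuates it. All numbers are illustrative.

```python
import numpy as np

rng = np.random.default_rng(2)
n, b = 5000, 0.5                                   # sample size, true health effect

# Berkson-type error: the smoothed prediction w is the surrogate, and x = w + u
w_berk = rng.normal(0.0, 1.0, n)
x_berk = w_berk + rng.normal(0.0, 0.7, n)
y_berk = 1.0 + b * x_berk + rng.normal(0.0, 1.0, n)

# Classical error: the surrogate scatters around the truth, w = x + u
x_cls = rng.normal(0.0, 1.0, n)
w_cls = x_cls + rng.normal(0.0, 0.7, n)
y_cls = 1.0 + b * x_cls + rng.normal(0.0, 1.0, n)

def naive_slope(w, y):
    """Slope from regressing the health outcome on the error-prone surrogate."""
    return np.polyfit(w, y, 1)[0]

print("Berkson-type surrogate, naive slope  :", naive_slope(w_berk, y_berk))   # close to b
print("classical-type surrogate, naive slope:", naive_slope(w_cls, y_cls))     # attenuated
```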

    Semi-Parametric Empirical Best Prediction for small area estimation of unemployment indicators

    The Italian National Institute for Statistics regularly provides estimates of unemployment indicators using data from the Labor Force Survey. However, direct estimates of unemployment incidence cannot be released for Local Labor Market Areas. These are unplanned domains defined as clusters of municipalities; many are out-of-sample areas and most are characterized by small sample sizes, which renders direct estimates inadequate. The Empirical Best Predictor represents an appropriate model-based alternative. However, for non-Gaussian responses, its computation and the computation of the analytic approximation to its Mean Squared Error require the solution of (possibly) multiple integrals that generally do not have a closed form. To solve the issue, Monte Carlo methods and the parametric bootstrap are common choices, even though the computational burden is non-trivial. In this paper, we propose a Semi-Parametric Empirical Best Predictor for a (possibly) non-linear mixed effect model by leaving the distribution of the area-specific random effects unspecified and estimating it from the observed data. This approach is known to lead to a discrete mixing distribution, which helps avoid unverifiable parametric assumptions and heavy integral approximations. We also derive a second-order, bias-corrected, analytic approximation to the corresponding Mean Squared Error. Finite-sample properties of the proposed approach are tested via a large-scale simulation study. Furthermore, the proposal is applied to unit-level data from the 2012 Italian Labor Force Survey to estimate unemployment incidence for 611 Local Labor Market Areas using auxiliary information from administrative registers and the 2011 Census.
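    Schematically, the reason a discrete mixing distribution avoids heavy integral approximation is that the predictor reduces to a finite weighted sum; the display below uses our own notation, not the paper's.

```latex
% With estimated mass points \tau_1,\dots,\tau_K and masses \pi_1,\dots,\pi_K for the
% area-specific random effect, the predictor for area i is a posterior expectation
% that collapses to a finite sum, so no Monte Carlo or quadrature is needed:
\[
  \tilde{\theta}_i \;=\; \mathrm{E}\bigl[\theta_i \mid y_i\bigr]
  \;=\;
  \frac{\sum_{k=1}^{K} \pi_k \,\theta_i(\tau_k)\, f(y_i \mid \tau_k)}
       {\sum_{k=1}^{K} \pi_k \, f(y_i \mid \tau_k)} .
\]
```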

    Confidence intervals of prediction accuracy measures for multivariable prediction models based on the bootstrap-based optimism correction methods

    In assessing the prediction accuracy of multivariable prediction models, optimism corrections are essential for preventing biased results. However, in most published papers on clinical prediction models, the point estimates of the prediction accuracy measures are corrected by adequate bootstrap-based correction methods, but their confidence intervals are not corrected; e.g., DeLong's confidence interval is usually used for the C-statistic. These naive methods do not adjust for the optimism bias and do not account for statistical variability in the estimation of the parameters of the prediction models. Therefore, their coverage probabilities for the true value of the prediction accuracy measure can be seriously below the nominal level (e.g., 95%). In this article, we provide two generic bootstrap methods, namely (1) location-shifted bootstrap confidence intervals and (2) two-stage bootstrap confidence intervals, that can be generally applied to the bootstrap-based optimism correction methods, i.e., Harrell's bias correction and the 0.632 and 0.632+ methods. In addition, they can be widely applied to various methods for prediction model development involving modern shrinkage methods such as ridge and lasso regression. In numerical evaluations by simulation, the proposed confidence intervals showed favourable coverage performance, whereas the current standard practice based on optimism-uncorrected methods showed serious undercoverage. To avoid erroneous results, the optimism-uncorrected confidence intervals should not be used in practice, and the adjusted methods are recommended instead. We also developed the R package predboot for implementing these methods (https://github.com/nomahi/predboot). The effectiveness of the proposed methods is illustrated via applications to the GUSTO-I clinical trial.
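    A schematic sketch of the bootstrap optimism correction for the C-statistic and of a shift-by-optimism interval; this is our simplified reading of the ideas described above, not the predboot implementation, and the interval construction below is only an assumption about how a location-shifted interval could be formed.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

def optimism_corrected_auc(X, y, B=200, seed=0):
    """Harrell-style optimism correction of the apparent AUC, with a naive
    percentile interval shifted by the estimated optimism (illustrative only)."""
    rng = np.random.default_rng(seed)
    model = LogisticRegression(max_iter=1000).fit(X, y)
    apparent = roc_auc_score(y, model.predict_proba(X)[:, 1])

    optimisms, boot_apparent = [], []
    n = len(y)
    for _ in range(B):
        idx = rng.integers(0, n, n)                          # bootstrap resample
        m_b = LogisticRegression(max_iter=1000).fit(X[idx], y[idx])
        auc_boot = roc_auc_score(y[idx], m_b.predict_proba(X[idx])[:, 1])
        auc_orig = roc_auc_score(y, m_b.predict_proba(X)[:, 1])
        optimisms.append(auc_boot - auc_orig)                # optimism of this refit
        boot_apparent.append(auc_boot)

    optimism = np.mean(optimisms)
    corrected = apparent - optimism                          # optimism-corrected point estimate
    # Naive percentile interval for the apparent AUC, shifted by the estimated optimism
    lo, hi = np.percentile(boot_apparent, [2.5, 97.5]) - optimism
    return corrected, (lo, hi)
```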