
    The analysis of missing data in public use survey databases: a survey of statistical methods.

    Missing data is very common in survey research. However, few guidelines currently exist regarding the diagnosis of and remedy for missing data in survey research. The goal of this thesis was to investigate the properties and effects of three selected missing data techniques (listwise deletion, hot deck imputation, and multiple imputation) via a simulation study, and to apply the three methods to address the missing race problem in a real data set extracted from the National Hospital Discharge Survey. The results of this study showed that multiple imputation and hot deck imputation provided more reliable parameter estimates than did listwise deletion. A similar outcome was observed with respect to the standard errors of the parameter estimates, with multiple imputation and hot deck imputation producing parameter estimates with smaller standard errors. Multiple imputation outperformed hot deck imputation by yielding larger significance levels for variables with missing data and by reflecting the uncertainty associated with the missing values. In summary, our study showed that employing an appropriate imputation technique to handle missing data in public use surveys is better than ignoring it.
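    The three techniques can be sketched compactly. A minimal Python illustration follows, assuming a small hospital-discharge-style data frame; the column names, the simple random hot deck, and the use of scikit-learn's IterativeImputer as a stand-in for a full multiple-imputation routine are assumptions for illustration, not the thesis's exact procedures.

```python
# Illustrative sketch of the three techniques compared above; column names,
# the random hot deck, and IterativeImputer as a multiple-imputation stand-in
# are assumptions, not the thesis's exact procedures.
import numpy as np
import pandas as pd
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer

rng = np.random.default_rng(0)
df = pd.DataFrame({
    "age": rng.normal(45, 12, 500),
    "length_of_stay": rng.normal(5, 2, 500),
    "race": rng.choice([0.0, 1.0], 500),   # numerically coded for simplicity
})
df.loc[rng.choice(500, 100, replace=False), "race"] = np.nan  # inject missingness

# 1) Listwise deletion: drop every record with any missing value.
listwise = df.dropna()

# 2) Random hot deck: fill each hole with an observed value from a donor record.
hot_deck = df.copy()
donors = hot_deck["race"].dropna().to_numpy()
missing = hot_deck["race"].isna()
hot_deck.loc[missing, "race"] = rng.choice(donors, missing.sum())

# 3) Multiple imputation: create m completed data sets, to be analysed
#    separately and pooled (a real analysis would round the imputed race codes).
completed_sets = []
for m in range(5):
    imp = IterativeImputer(sample_posterior=True, random_state=m)
    completed_sets.append(pd.DataFrame(imp.fit_transform(df), columns=df.columns))
```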

    Evidence for a multiple imputation approach to MNAR mechanisms

    Missing data is a common problem for researchers. Before one can determine the best method for handling missing data, one must first examine why the data are missing; that is, one must identify the missingness mechanism. Failure to discern an ignorable from a nonignorable missingness mechanism can greatly influence parameter estimates and standard errors and introduce other biases into statistical analyses. This study examined the efficiency and accuracy of the MNAR multivariate imputation by chained equations (miceMNAR) model proposed by Galimard et al. (2016). By applying their method to a real dataset (the 2018 National Survey of Children’s Health), the efficacy of the miceMNAR model was examined. Imputations and parameter estimates using the miceMNAR method were compared to more commonly used methods for handling missing data: complete case analysis and multivariate imputation using chained equations (MICE). Overall, the miceMNAR approach provided very large standard error estimates compared to both complete case analysis and MICE and had difficulty providing accurate parameter estimates under MNAR conditions. Further research on the miceMNAR method is recommended before applying it to real data with potential MNAR mechanisms. The results from this study will help inform researchers about potential best practices for dealing with missing data when the mechanism is unknown.
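    Comparing standard errors across imputed data sets rests on Rubin's rules for pooling the m per-imputation estimates. A minimal Python sketch of that pooling step is below; the formula and term names and the use of scikit-learn's chained-equations imputer as a MICE stand-in are assumptions (miceMNAR itself is an R package and is not reproduced here).

```python
# Rubin's rules: pool a coefficient estimated on m imputed data sets.
# The formula/term names and IterativeImputer as a MICE stand-in are assumptions.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer

def pool(estimates, variances):
    """Combine per-imputation estimates and variances with Rubin's rules."""
    m = len(estimates)
    qbar = np.mean(estimates)              # pooled point estimate
    w = np.mean(variances)                 # average within-imputation variance
    b = np.var(estimates, ddof=1)          # between-imputation variance
    total = w + (1 + 1 / m) * b            # total variance of the pooled estimate
    return qbar, np.sqrt(total)

def mi_coefficient(df, formula, term, m=20):
    """Impute m times, refit the model each time, and pool one coefficient."""
    ests, variances = [], []
    for k in range(m):
        imp = IterativeImputer(sample_posterior=True, random_state=k)
        completed = pd.DataFrame(imp.fit_transform(df), columns=df.columns)
        fit = smf.ols(formula, data=completed).fit()
        ests.append(fit.params[term])
        variances.append(fit.bse[term] ** 2)
    return pool(ests, variances)

# Usage (hypothetical column names):
# estimate, se = mi_coefficient(children_df, "health ~ income + insurance", "income")
```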

    Type I Error Rates For A One Factor Within-Subjects Design With Missing Values

    Missing data are a common problem in educational research. A promising technique, which can be implemented in SAS PROC MIXED and is therefore widely available, is to use maximum likelihood to estimate model parameters and to base hypothesis tests on these estimates. However, it is not clear which test statistic in PROC MIXED performs better with missing data. The performance of the Hotelling-Lawley-McKeon and Kenward-Roger omnibus test statistics on the means for a single-factor within-subjects ANOVA is compared. The results indicate that the Kenward-Roger statistic performed better in terms of keeping the Type I error rate close to the nominal alpha level.
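    The likelihood-based approach keeps subjects who are missing some, but not all, repeated measures. A rough Python analogue on simulated data is sketched below; note that statsmodels provides neither the Kenward-Roger nor the Hotelling-Lawley-McKeon correction, so this only illustrates the maximum-likelihood fit on long-format data.

```python
# One-factor within-subjects design with missing cells: reshape to long format
# and fit a mixed model by maximum likelihood, so subjects missing some
# conditions still contribute. Statsmodels has no Kenward-Roger correction,
# so this is only an illustration of the likelihood-based fit.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(1)
n_subj, n_cond = 40, 4
wide = pd.DataFrame(
    rng.normal(size=(n_subj, n_cond)) + rng.normal(size=(n_subj, 1)),
    columns=[f"c{j}" for j in range(n_cond)],
)
wide.iloc[rng.choice(n_subj, 8, replace=False), 2] = np.nan   # some missing cells

long_df = (wide.reset_index()
               .rename(columns={"index": "subject"})
               .melt(id_vars="subject", var_name="condition", value_name="y")
               .dropna())                                     # drop cells, not subjects

model = smf.mixedlm("y ~ C(condition)", data=long_df, groups="subject")
result = model.fit(reml=False)    # maximum likelihood estimation
print(result.summary())
```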

    Evaluation of Modern Missing Data Handling Methods for Coefficient Alpha

    When assessing a characteristic or trait using a multiple-item measure, the quality of that measure can be assessed by examining its reliability. To avoid requiring multiple time points, reliability can be represented by internal consistency, which is most commonly calculated using Cronbach’s coefficient alpha. Almost every study involving human participants produces missing data: although complete data were expected to be collected, some values are missing. Missing data can follow different patterns and result from different mechanisms. One traditional way to deal with missing data is listwise deletion, in which every observation with at least one missing value is discarded. Modern missing data techniques include multiple imputation and maximum likelihood estimation, which use the observed data to account for the missing values and thereby make use of the whole sample. The present study sought to examine the effect of missing data on coefficient alpha under certain conditions and to compare multiple imputation with listwise deletion in their effectiveness at handling missing data across those conditions. The results indicated that coefficient alpha is sensitive to numerous factors in the presence of missing data, such as reliability level, sample size, missing data percentage, and missing data mechanism. As expected, there was little difference between listwise deletion and multiple imputation when data were missing completely at random, but multiple imputation performed better when data were missing at random and missing not at random. While listwise deletion always underestimated the true reliability, multiple imputation only underestimated the true reliability when data were missing not at random.
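    Coefficient alpha itself is straightforward to compute, so the listwise-versus-imputation comparison can be sketched directly. In the Python sketch below the items are simulated, and scikit-learn's iterative imputer stands in for a full multiple-imputation routine; both are assumptions for illustration only.

```python
# Cronbach's alpha under listwise deletion versus a chained-equations imputation
# (a single imputation standing in for full multiple imputation, for illustration).
import numpy as np
import pandas as pd
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer

def cronbach_alpha(items: pd.DataFrame) -> float:
    """alpha = k/(k-1) * (1 - sum of item variances / variance of the total score)."""
    k = items.shape[1]
    item_variances = items.var(axis=0, ddof=1).sum()
    total_variance = items.sum(axis=1).var(ddof=1)
    return k / (k - 1) * (1 - item_variances / total_variance)

rng = np.random.default_rng(2)
true_score = rng.normal(size=(300, 1))
items = pd.DataFrame(true_score + rng.normal(scale=1.0, size=(300, 6)),
                     columns=[f"item{j}" for j in range(6)])
items_missing = items.mask(rng.random(items.shape) < 0.15)   # ~15% MCAR missingness

alpha_listwise = cronbach_alpha(items_missing.dropna())
imputer = IterativeImputer(sample_posterior=True, random_state=0)
alpha_imputed = cronbach_alpha(
    pd.DataFrame(imputer.fit_transform(items_missing), columns=items_missing.columns))
print(f"listwise: {alpha_listwise:.3f}  imputed: {alpha_imputed:.3f}")
```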

    Application of some missing data techniques in estimating missing data in high blood pressure covariates

    Cases recorded with high blood pressure are a major concern in both public and private hospitals. Adequate provision of health information for patients with high blood pressure in Eastern Cape hospitals hinges largely on the outcome of statistical analyses. The usual statistical methodologies become inadequate when patients’ information stored in the hospital database is incomplete. From time to time, new methods are developed to address the problem of missing data. High blood pressure is linked to many diseases, such as hypertension, cardiovascular disease, kidney disease, and stroke. In this study, we developed a new method for addressing the problem of missing data and assessed the models used for estimating missing values in terms of minimum error (using RMSE, MAE, and SE), goodness of fit (using R² and adjusted R²), and p-values. The study compared six approaches: original data (OD), listwise deletion (LD), mean imputation (MEI), mean above (MA), mean above and below (MAB), and two-step nearest neighbour (2-NN). The comparison was performed using the original data set, with missing values at 5%, 10%, 20%, and 30% simulated on Framingham risk scores under MCAR and MAR mechanisms on BMI values, given some assumptions. Five performance indicators were used to describe minimum error and goodness of fit for all the methods. The results showed that 2-NN is the best replacement method at lower levels (5% and 10%) of missing values, while MA and MEI performed best at higher levels (15% and 20%). All comparisons were based on estimates closest to those of the original data, where no value was missing. Under MAR, 2-NN performed better than LD, MA, MAB, and MEI at the 5%, 10%, and 20% levels of missing data in terms of absolute difference in p-value from the original data.
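    The evaluation design (delete known values, impute them back, and score the imputations against the originals) can be sketched briefly. In the Python sketch below, scikit-learn's SimpleImputer and KNNImputer are generic stand-ins for the MEI and 2-NN methods, and the column names are assumed.

```python
# Score imputation methods by deleting known BMI values, imputing them back,
# and comparing against the originals (RMSE and MAE), mirroring the design above.
# SimpleImputer and KNNImputer are generic stand-ins for MEI and 2-NN;
# the column names are assumed.
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer, KNNImputer

rng = np.random.default_rng(3)
full = pd.DataFrame({
    "age": rng.normal(50, 10, 400),
    "sbp": rng.normal(135, 15, 400),   # systolic blood pressure
    "bmi": rng.normal(27, 4, 400),
})

def score(imputer, amputed, holes):
    filled = pd.DataFrame(imputer.fit_transform(amputed), columns=full.columns)
    err = filled.loc[holes, "bmi"] - full.loc[holes, "bmi"]
    return np.sqrt((err ** 2).mean()), err.abs().mean()      # RMSE, MAE

for rate in (0.05, 0.10, 0.20, 0.30):
    holes = rng.random(len(full)) < rate                     # MCAR deletion of BMI
    amputed = full.copy()
    amputed.loc[holes, "bmi"] = np.nan
    print(rate,
          "mean imputation:", score(SimpleImputer(strategy="mean"), amputed, holes),
          "2-NN:", score(KNNImputer(n_neighbors=2), amputed, holes))
```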

    Handling of Missing Values in Static and Dynamic Data Sets

    This thesis contributes by, first, conducting a comparative study of traditional and modern classification methods, highlighting the differences in their performance. Second, an algorithm to enhance the prediction of values to be used for data imputation with nonlinear models is presented. Third, a novel model-selection algorithm to enhance prediction performance in the presence of missing data is presented. It includes an overview of nonlinear model selection with complete data and provides summary descriptions of the Box-Tidwell and fractional polynomial methods for model selection. In particular, it focuses on the fractional polynomial method for nonlinear modelling in cases of missing data. An analysis example is presented to illustrate the performance of this method.
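    The fractional polynomial step amounts to scanning a small set of candidate power transforms and keeping the best-fitting one. A minimal first-degree (FP1) Python sketch on simulated complete data follows; the power set is the conventional one, and the sketch deliberately leaves out the missing-data handling that the thesis develops.

```python
# First-degree fractional polynomial (FP1) selection: try each power in the
# conventional set, fit y on x**p (log x for p == 0), and keep the best fit.
# Data and names are assumed; the missing-data handling developed in the thesis
# is not reproduced here.
import numpy as np
import statsmodels.api as sm

FP_POWERS = (-2, -1, -0.5, 0, 0.5, 1, 2, 3)

def fp1_transform(x, p):
    return np.log(x) if p == 0 else x ** p

def select_fp1(x, y):
    """Return (best_power, fitted_model), maximising the log-likelihood."""
    best = None
    for p in FP_POWERS:
        X = sm.add_constant(fp1_transform(x, p))
        fit = sm.OLS(y, X).fit()
        if best is None or fit.llf > best[1].llf:
            best = (p, fit)
    return best

rng = np.random.default_rng(4)
x = rng.uniform(0.5, 5.0, 300)                  # positive, so log/negative powers work
y = 2.0 / x + rng.normal(scale=0.3, size=300)   # true relationship uses power -1
power, model = select_fp1(x, y)
print("selected power:", power)
```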

    Analysis of Incomplete Survey Data - Multiple Imputation via Bayesian Bootstrap Predictive Mean Matching

    Multiple Imputation (MI) is a general-purpose approach to imputing partially incomplete data. The proposed method - Bayesian Bootstrap Predictive Mean Matching - is an MI variant that incorporates the robustifying properties of a nearest-neighbour technique (Predictive Mean Matching) into MI.
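    A schematic reading of the method: draw Bayesian-bootstrap (Dirichlet) weights over the complete cases, fit a weighted regression, and impute each missing value with the observed value of the donor whose predicted mean is closest. The Python sketch below is only that schematic reading, with all variable names assumed.

```python
# Schematic Bayesian Bootstrap Predictive Mean Matching for one incomplete column:
# 1) draw Dirichlet(1, ..., 1) weights over the complete cases (Bayesian bootstrap),
# 2) fit a weighted regression of y on x,
# 3) for each missing y, take the observed case with the closest predicted mean
#    and borrow its observed value as the imputation.
import numpy as np

def bb_pmm_impute(x, y, rng):
    obs = ~np.isnan(y)
    X_obs = np.column_stack([np.ones(obs.sum()), x[obs]])
    w = rng.dirichlet(np.ones(obs.sum()))                  # Bayesian bootstrap weights
    beta = np.linalg.solve(X_obs.T @ (w[:, None] * X_obs),
                           X_obs.T @ (w * y[obs]))         # weighted least squares
    pred_obs = X_obs @ beta
    X_mis = np.column_stack([np.ones((~obs).sum()), x[~obs]])
    pred_mis = X_mis @ beta
    donors = np.abs(pred_obs[None, :] - pred_mis[:, None]).argmin(axis=1)
    imputed = y.copy()
    imputed[~obs] = y[obs][donors]                         # borrow the donors' values
    return imputed

rng = np.random.default_rng(5)
x = rng.normal(size=200)
y = 1.5 * x + rng.normal(size=200)
y[rng.random(200) < 0.25] = np.nan
completed = bb_pmm_impute(x, y, rng)
```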

    Ignoring the Non-ignorables? Missingness and Missing Positions

    Missing or incomplete data on actors’ positions can cause significant problems in political analysis. Research on missing values suggests the use of multiple imputation methods rather than case deletion, but few studies have yet considered the non-ignorable problem - positions that are hidden for strategic purposes. We examine this problem and discuss the advantages and drawbacks of (i) multiple imputation as implemented in AMELIA; (ii) a computationally easy but, in the context of spatial modelling, straightforward measure of indifference; and (iii) a conditional averaging algorithm, LDM, which seeks to reasonably fix actors’ positions in the policy space pre- and post-imputation. The analysis suggests that actors biased by the status quo strategically hide their more supportive positions. Although none of the existing methods - which produce quite different results - is perfectly suited for imputing hidden positions, LDM has the highest hit rate for the conjectured more supportive position.

    The case for the use of multiple imputation missing data methods in stochastic frontier analysis with illustration using English local highway data

    Multiple Imputation (MI) methods have been widely applied in economic applications as a robust statistical way to incorporate data where some observations have missing values for some variables. However, in Stochastic Frontier Analysis (SFA), application of these techniques has been sparse, and the case for such models has not received attention in the appropriate academic literature. This paper fills this gap and explores the robust properties of MI within the stochastic frontier context. From a methodological perspective, we depart from the standard MI literature by demonstrating, conceptually and through simulation, that it is not appropriate to use imputations of the dependent variable within SFA modelling, although imputations can be useful for predicting the values of missing explanatory variables. Fundamentally, this is because efficiency analysis involves decomposing a residual into noise and inefficiency, so any imputation of the dependent variable would amount to imputing efficiency based on some concept of average inefficiency in the sample. A further contribution, which we discuss and illustrate for the first time in the SFA literature, is that using auxiliary variables (outside of those contained in the SFA model) can enhance the imputations of missing values. Our empirical example neatly articulates that often the source of missing data is only a sub-set of the components comprising a part of a composite (or complex) measure, and that the other parts that are observed are very useful in predicting the value.
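    The practical upshot for a data-preparation pipeline is to impute missing explanatory variables with the help of auxiliary columns, never to impute the dependent variable, and to drop observations whose dependent variable is missing before frontier estimation. The Python sketch below illustrates only that imputation step; the column names and the chained-equations imputer are assumptions, and the SFA estimation itself is left to a dedicated tool.

```python
# Imputation step for an SFA data set: auxiliary variables help impute missing
# regressors, the dependent variable is never imputed, and rows whose dependent
# variable is missing are dropped before frontier estimation.
# All column names are assumed; the SFA fit itself is not shown.
import pandas as pd
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer

def prepare_sfa_data(df: pd.DataFrame) -> pd.DataFrame:
    regressors = ["road_length", "traffic", "urban_share"]
    auxiliary = ["budget", "population"]          # outside the SFA model, aid imputation
    dependent = "maintenance_cost"

    df = df.dropna(subset=[dependent]).copy()     # never impute the dependent variable
    cols = regressors + auxiliary
    imputer = IterativeImputer(sample_posterior=True, random_state=0)
    df[cols] = imputer.fit_transform(df[cols])    # auxiliaries sharpen the imputations
    return df[[dependent] + regressors]           # hand off to the frontier estimator
```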