A note on multiple imputation for method of moments estimation

Author: Kim Jae Kwang
Yang Shu
Publication venue: 'Oxford University Press (OUP)'
Publication date: 27/08/2015

Multiple imputation is a popular imputation method for general-purpose estimation. Rubin (1987) provided an easily applicable formula for variance estimation under multiple imputation. However, the validity of multiple imputation inference requires the congeniality condition of Meng (1994), which is not necessarily satisfied for method of moments estimation. This paper presents the asymptotic bias of Rubin's variance estimator when the method of moments estimator is used as a complete-sample estimator in the multiple imputation procedure. A new variance estimator based on over-imputation is proposed to provide asymptotically valid inference for method of moments estimation. Comment: 8 pages, 0 figures
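
For orientation, the combining formula attributed to Rubin (1987) in the abstract above can be sketched as follows. This is the standard textbook form of Rubin's rules, not the over-imputation estimator the paper proposes; the function name `rubin_pool` and its argument names are illustrative assumptions.

```python
from statistics import mean, variance


def rubin_pool(estimates, within_variances):
    """Pool m completed-data results with Rubin's rules.

    estimates        : point estimate from each of the m imputed data sets
    within_variances : variance estimate from each of the m imputed data sets
    Returns (pooled_estimate, total_variance).
    """
    m = len(estimates)
    if m < 2 or m != len(within_variances):
        raise ValueError("need at least two imputations with matching lengths")

    q_bar = mean(estimates)          # pooled point estimate
    w_bar = mean(within_variances)   # average within-imputation variance
    b = variance(estimates)          # between-imputation variance (m - 1 denominator)

    # Total variance: within part plus inflated between part.
    total = w_bar + (1.0 + 1.0 / m) * b
    return q_bar, total
```

The abstract's point is that this total variance can be asymptotically biased when the complete-sample estimator is a method of moments estimator, so the sketch shows the estimator under study rather than a recommended fix.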

arXiv.org e-Print Archive

Digital Repository @ Iowa State University (ISU)

Crossref

Multiple Imputation Ensembles (MIE) for dealing with missing data

Author: A Farhangfar
AM Sefidian
B Schölkopf
C Cortes
CT Tran
DA Newman
DB Rubin
DH Wolpert
EL Silva-Ramírez
GE Batista
GJ van der Heijden
H Gao
IH Witten
J Demšar
J Honaker
J Scheffer
JA Sterne
JL Schafer
JR Quinlan
K Abayomi
KM Ting
L Breiman
L Rokach
M Fichman
M Khalilia
M Spratt
MA Klebanoff
MJ Azur
NJ Horton
PJ García-Laencina
PJ Kelly
PN Tan
RJ Little
S García
S Van Buuren
SS Chae
SS Choi
U Garciarena
V Vapnik
X Chen
Y Dong
Y Freund
Y He
Z Che
Z Liu
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/05/2020

Missing data is a significant issue in many real-world datasets, yet there are no robust methods for dealing with it appropriately. In this paper, we propose a robust approach to dealing with missing data in classification problems: Multiple Imputation Ensembles (MIE). Our method integrates two approaches, multiple imputation and ensemble methods, and compares two types of ensembles: bagging and stacking. We also propose a robust experimental set-up using 20 benchmark datasets from the UCI machine learning repository. For each dataset, we introduce increasing amounts of data Missing Completely at Random (MCAR). First, we use a number of single and multiple imputation methods to recover the missing values and then ensemble a number of different classifiers built on the imputed data. We assess the quality of the imputation by using dissimilarity measures. We also evaluate the MIE performance by comparing classification accuracy on the complete and imputed data. Furthermore, we use the accuracy of simple imputation as a benchmark for comparison. We find that our proposed approach combining multiple imputation with ensemble techniques outperforms others, particularly as the amount of missing data increases.
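
The experimental set-up described in this abstract (introducing data Missing Completely at Random, imputing, then scoring the imputation) can be illustrated with a minimal sketch. It covers only a simple-imputation baseline, not MIE itself; column-mean imputation and RMSE as the dissimilarity measure are assumptions chosen for brevity, and all function names are hypothetical.

```python
import random
from math import sqrt


def make_mcar(rows, rate, rng):
    """Copy rows, setting each cell to None independently with probability rate."""
    return [[None if rng.random() < rate else v for v in row] for row in rows]


def mean_impute(rows):
    """Fill each missing cell with the mean of the observed values in its column."""
    n_cols = len(rows[0])
    means = []
    for j in range(n_cols):
        observed = [row[j] for row in rows if row[j] is not None]
        # Fall back to 0.0 if a column has no observed values at all.
        means.append(sum(observed) / len(observed) if observed else 0.0)
    return [[means[j] if v is None else v for j, v in enumerate(row)] for row in rows]


def imputation_rmse(truth, masked, imputed):
    """Root mean squared error over the cells that were masked out."""
    squared = [
        (t - i) ** 2
        for t_row, m_row, i_row in zip(truth, masked, imputed)
        for t, m, i in zip(t_row, m_row, i_row)
        if m is None
    ]
    return sqrt(sum(squared) / len(squared)) if squared else 0.0


if __name__ == "__main__":
    rng = random.Random(0)
    data = [[rng.gauss(0.0, 1.0), rng.gauss(5.0, 2.0)] for _ in range(200)]
    for rate in (0.1, 0.3, 0.5):  # increasing amounts of missing data
        masked = make_mcar(data, rate, rng)
        imputed = mean_impute(masked)
        print(rate, round(imputation_rmse(data, masked, imputed), 3))
```

Because the mask is drawn independently of the data values, the missingness is MCAR by construction; a multiple-imputation method would replace `mean_impute` with several stochastic draws per missing cell.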

Crossref

University of East Anglia digital repository

Multiple Imputation Using Gaussian Copulas

Author: Bojinov Iavor
Hollenbach Florian M.
Metternich Nils W.
Minhas Shahryar
Volfovsky Alexander
Ward Michael D.
Publication date: 01/01/2018

Missing observations are pervasive throughout empirical research, especially in the social sciences. Despite multiple approaches to dealing adequately with missing data, many scholars still fail to address this vital issue. In this paper, we present a simple-to-use method for generating multiple imputations using a Gaussian copula. The Gaussian copula for multiple imputation (Hoff, 2007) allows scholars to attain estimation results that have good coverage and small bias. The use of copulas to model the dependence among variables will enable researchers to construct valid joint distributions of the data, even without knowledge of the actual underlying marginal distributions. Multiple imputations are then generated by drawing observations from the resulting posterior joint distribution and replacing the missing values. Using simulated and observational data from published social science research, we compare imputation via Gaussian copulas with two other widely used imputation methods: MICE and Amelia II. Our results suggest that the Gaussian copula approach has a slightly smaller bias, higher coverage rates, and narrower confidence intervals compared to the other methods. This is especially true when the variables with missing data are not normally distributed. These results, combined with theoretical guarantees and ease of use, suggest that the approach examined provides an attractive alternative for applied researchers undertaking multiple imputations.
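
The idea that a copula models dependence without knowledge of the marginal distributions can be illustrated with a rank-based normal-scores transform. The sketch below shows only that marginal step and is an assumption-laden simplification: it is not the posterior sampling procedure of Hoff (2007) or of this paper, ties are broken by input order, and the function names are hypothetical.

```python
from statistics import NormalDist


def normal_scores(values):
    """Map observations to standard-normal scores using their empirical ranks.

    Only the ordering of the values matters, so the result does not depend
    on the shape of the original marginal distribution.
    """
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])  # ties keep input order
    std_normal = NormalDist()
    scores = [0.0] * n
    for rank, i in enumerate(order, start=1):
        # rank / (n + 1) keeps the probability strictly inside (0, 1).
        scores[i] = std_normal.inv_cdf(rank / (n + 1))
    return scores


def pearson(x, y):
    """Plain Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5
```

Correlating the transformed scores of two variables gives a dependence measure on the latent Gaussian scale; a heavily skewed variable and a monotone function of it have perfectly correlated scores even though their raw Pearson correlation is below one.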

arXiv.org e-Print Archive

UCL Discovery

MissForest - nonparametric missing value imputation for mixed-type data

Author: D. J. Stekhoven
Harley
Kurgan
Latal
LITTLE
Oba
P. Buhlmann
Smit
Troyanskaya
van Buuren
Wille
Wu
Publication venue: 'Oxford University Press (OUP)'
Publication date: 27/09/2011

Modern data acquisition based on high-throughput technology often faces the problem of missing data. Algorithms commonly used in the analysis of such large-scale data often depend on a complete set. Missing value imputation offers a solution to this problem. However, the majority of available imputation methods are restricted to one type of variable only: continuous or categorical. For mixed-type data, the different types are usually handled separately. Therefore, these methods ignore possible relations between variable types. We propose a nonparametric method which can cope with different types of variables simultaneously. We compare several state-of-the-art methods for the imputation of missing values. We propose and evaluate an iterative imputation method (missForest) based on a random forest. By averaging over many unpruned classification or regression trees, random forest intrinsically constitutes a multiple imputation scheme. Using the built-in out-of-bag error estimates of random forest, we are able to estimate the imputation error without the need for a test set. Evaluation is performed on multiple data sets coming from a diverse selection of biological fields with artificially introduced missing values ranging from 10% to 30%. We show that missForest can successfully handle missing values, particularly in data sets including different types of variables. In our comparative study, missForest outperforms other methods of imputation, especially in data settings where complex interactions and nonlinear relations are suspected. The out-of-bag imputation error estimates of missForest prove to be adequate in all settings. Additionally, missForest exhibits attractive computational efficiency and can cope with high-dimensional data. Comment: Submitted to Oxford Journal's Bioinformatics on 3rd of May 201

arXiv.org e-Print Archive

ETHzürich Repository for Publications and Research Data

Crossref

A Comparison of Price Imputation Methods under Large Samples and Different Levels of Censoring

Author: Lopez Jose Antonio

Consumer/Household Economics, Demand and Price Analysis, Research Methods/Statistical Methods, imputation methods, multiple imputation, censored prices, protein demand, elasticities

Research Papers in Economics