Multiple Imputation Ensembles (MIE) for dealing with missing data

A Farhangfar; AM Sefidian; B Schölkopf; C Cortes; CT Tran; DA Newman; DB Rubin; DB Rubin; DH Wolpert; EL Silva-Ramírez; GE Batista; GJ van der Heijden; H Gao; IH Witten; J Demšar; J Honaker; J Honaker; J Scheffer; JA Sterne; JL Schafer; JL Schafer; JR Quinlan; K Abayomi; KM Ting; L Breiman; L Breiman; L Rokach; M Fichman; M Khalilia; M Spratt; MA Klebanoff; MJ Azur; NJ Horton; PJ García-Laencina; PJ Kelly; PN Tan; RJ Little; S García; S Van Buuren; S Van Buuren; SS Chae; SS Choi; U Garciarena; V Vapnik; X Chen; Y Dong; Y Freund; Y He; Z Che; Z Liu

Publication date: 1 May 2020
Publisher: Springer Science and Business Media LLC

Abstract

Missing data is a significant issue in many real-world datasets, yet there are no robust methods for dealing with it appropriately. In this paper, we propose a robust approach to dealing with missing data in classification problems: Multiple Imputation Ensembles (MIE). Our method integrates two approaches, multiple imputation and ensemble methods, and compares two types of ensembles: bagging and stacking. We also propose a robust experimental set-up using 20 benchmark datasets from the UCI machine learning repository. For each dataset, we introduce increasing amounts of data Missing Completely at Random (MCAR). First, we use a number of single and multiple imputation methods to recover the missing values, and then we ensemble a number of different classifiers built on the imputed data. We assess the quality of the imputation using dissimilarity measures. We also evaluate the performance of MIE by comparing classification accuracy on the complete and imputed data. Furthermore, we use the accuracy of simple imputation as a benchmark for comparison. We find that our proposed approach, combining multiple imputation with ensemble techniques, outperforms the others, particularly as the amount of missing data increases.
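
The pipeline described in the abstract (inject MCAR missingness, impute several times, train one classifier per imputed dataset, combine the classifiers) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not the authors' implementation: the synthetic two-class data, the random hot-deck imputer, the nearest-centroid classifier, the majority vote (a simple bagging-style combination; stacking is not shown), and all function names are hypothetical stand-ins.

```python
import random
from collections import Counter

def make_mcar(rows, rate, rng):
    """Blank each value independently with probability `rate` (MCAR)."""
    return [[None if rng.random() < rate else v for v in row] for row in rows]

def impute_once(rows, rng):
    """One stochastic imputation: fill each gap with a randomly chosen
    observed value from the same column (illustrative hot-deck imputer)."""
    n_cols = len(rows[0])
    observed = [[r[j] for r in rows if r[j] is not None] for j in range(n_cols)]
    return [[v if v is not None else (rng.choice(observed[j]) if observed[j] else 0.0)
             for j, v in enumerate(r)] for r in rows]

def fit_centroids(rows, labels):
    """Nearest-centroid classifier: store the per-class feature means."""
    sums, counts = {}, Counter(labels)
    for row, y in zip(rows, labels):
        acc = sums.setdefault(y, [0.0] * len(row))
        for j, v in enumerate(row):
            acc[j] += v
    return {y: [s / counts[y] for s in acc] for y, acc in sums.items()}

def predict(centroids, row):
    """Return the class whose centroid is closest in squared Euclidean distance."""
    return min(centroids,
               key=lambda y: sum((c - v) ** 2 for c, v in zip(centroids[y], row)))

def mie_predict(models, row):
    """Combine the per-imputation classifiers by majority vote."""
    votes = Counter(predict(m, row) for m in models)
    return votes.most_common(1)[0][0]

def sample(n_per_class, rng):
    """Synthetic data: two Gaussian classes centred at (0, 0) and (5, 5)."""
    rows, labels = [], []
    for y, centre in ((0, 0.0), (1, 5.0)):
        for _ in range(n_per_class):
            rows.append([rng.gauss(centre, 1.0), rng.gauss(centre, 1.0)])
            labels.append(y)
    return rows, labels

rng = random.Random(0)
train_x, train_y = sample(50, rng)
test_x, test_y = sample(50, rng)

# Remove 30% of the training values completely at random.
incomplete = make_mcar(train_x, rate=0.3, rng=rng)

# Five imputed copies of the training set, one classifier per copy.
models = [fit_centroids(impute_once(incomplete, rng), train_y) for _ in range(5)]

# Evaluate the ensemble on complete test data.
accuracy = sum(mie_predict(models, x) == y
               for x, y in zip(test_x, test_y)) / len(test_y)
print(f"MIE accuracy on complete test data: {accuracy:.2f}")
```

Raising `rate` and comparing against a single imputation (one model instead of five) mirrors, in a very reduced form, the comparison the abstract describes between multiple-imputation ensembles and simple imputation.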

Available Versions

University of East Anglia digital repository

oai:ueaeprints.uea.ac.uk:74916
