Decision Trees for the Imputation of Categorical Data

Bankhofer, Udo; Joenssen, Dieter William; Rockel, Tobias

Authors: Udo Bankhofer, Dieter William Joenssen, Tobias Rockel
Publication date: 11 April 2017
Publisher: KIT Scientific Publishing, Karlsruhe

Abstract

In principle, any prediction model can be used to resolve the problem of missing data via imputation. In the field of machine learning, the decision tree is a well-known type of prediction model. However, the literature on how suitable decision trees are for imputation remains scant. The aim of this paper is therefore to analyze the imputation quality of decision trees. Furthermore, we present a way to conduct stochastic imputation using decision trees. We ran a simulation study to compare the deterministic and stochastic decision tree imputation approaches with each other and with other imputation methods. The study uses real datasets and various missing data settings, and it considers three different quality criteria. The results indicate that the choice of imputation method should be based on the intended analysis.
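To make the distinction between deterministic and stochastic imputation concrete, the following is a minimal sketch and not the authors' implementation. It assumes that a deterministic imputation fills in the most frequent category of the leaf an incomplete record falls into, while a stochastic imputation draws a category at random according to the relative frequencies in that leaf. The tree is reduced to a single split on one predictor, and the function names and the toy data (`smoker`, `risk`) are hypothetical.

```python
import random
from collections import Counter, defaultdict


def fit_leaves(rows, predictor, target):
    """Group complete rows by one categorical predictor (a depth-one tree)
    and count the target categories observed in each leaf."""
    leaves = defaultdict(Counter)
    for row in rows:
        if row[target] is not None:
            leaves[row[predictor]][row[target]] += 1
    return leaves


def impute_deterministic(leaf):
    """Return the most frequent category in the leaf."""
    return leaf.most_common(1)[0][0]


def impute_stochastic(leaf, rng):
    """Draw a category with probability equal to its relative frequency in the leaf."""
    categories = list(leaf)
    weights = [leaf[c] for c in categories]
    return rng.choices(categories, weights=weights, k=1)[0]


# Hypothetical toy data; None marks a missing value in the target variable.
rows = [
    {"smoker": "yes", "risk": "high"},
    {"smoker": "yes", "risk": "high"},
    {"smoker": "yes", "risk": "low"},
    {"smoker": "no", "risk": "low"},
    {"smoker": "no", "risk": "low"},
    {"smoker": "yes", "risk": None},
]

leaves = fit_leaves(rows, "smoker", "risk")
rng = random.Random(0)  # fixed seed so the random draw is reproducible

for row in rows:
    if row["risk"] is None:
        leaf = leaves[row["smoker"]]
        deterministic_value = impute_deterministic(leaf)
        stochastic_value = impute_stochastic(leaf, rng)
```

Under these assumptions, the deterministic variant always imputes the same category for records in the same leaf, whereas the stochastic variant reproduces the category distribution within the leaf across many imputed records.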

Available Versions

KITopen

oai:EVASTAR-Karlsruhe.de:10000...

Last time updated on 07/05/2019