Effects of Data Imputation Methods on Data Missingness in Data Mining

Abstract

The purpose of this paper is to study the effectiveness of data imputation methods in dealing with data missingness in the data mining phase of Knowledge Discovery in Databases (KDD). Applying data mining techniques without careful consideration of missing data can lead to biased results and skewed conclusions. This research explores the impact of data missingness at various levels in KDD models that employ neural networks as the primary data mining algorithm. Four of the most commonly used data imputation methods (Case Deletion, Mean Substitution, Regression Imputation, and Multiple Imputation) were evaluated using Root Mean Square (RMS) values, ANOVA testing, t-tests, and Tukey's Honestly Significant Difference test to assess differences in performance between various knowledge discovery and neural network models, both in the presence and absence of missing data.
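As a rough illustration of two of the four methods named above, the following minimal sketch applies Case Deletion and Mean Substitution to a small table and computes an RMS error. It is not the paper's implementation: the toy data, the function names, and the use of `None` to mark a missing value are all assumptions made for this example, and the RMS error here compares imputed values against known true values rather than neural network outputs.

```python
import math


def case_deletion(rows):
    """Drop every row that contains at least one missing value (None)."""
    return [row for row in rows if None not in row]


def mean_substitution(rows):
    """Replace each missing value with the mean of the observed values in its column."""
    n_cols = len(rows[0])
    means = []
    for j in range(n_cols):
        observed = [row[j] for row in rows if row[j] is not None]
        means.append(sum(observed) / len(observed))
    return [[means[j] if v is None else v for j, v in enumerate(row)] for row in rows]


def rms_error(predicted, actual):
    """Root mean square of the differences between two equal-length sequences."""
    squared = [(p - a) ** 2 for p, a in zip(predicted, actual)]
    return math.sqrt(sum(squared) / len(squared))


# Hypothetical toy data: the second column of row 1 is removed to simulate missingness.
complete = [[1.0, 2.0], [2.0, 4.0], [3.0, 6.0], [4.0, 8.0]]
with_missing = [[1.0, 2.0], [2.0, None], [3.0, 6.0], [4.0, 8.0]]

deleted = case_deletion(with_missing)      # the incomplete row is discarded
imputed = mean_substitution(with_missing)  # the gap is filled with the column mean

# RMS error of the imputed second column against the known true values.
err = rms_error([row[1] for row in imputed], [row[1] for row in complete])
print(len(deleted), imputed[1][1], err)
```

Case Deletion shrinks the data set, while Mean Substitution keeps every row at the cost of an imputation error on the filled-in value; this is the kind of trade-off that an RMS-based comparison is meant to quantify.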
