Analysis of Machine Learning Based Imputation of Missing Data
Data analysis and classification can be affected by the presence of missing data
in datasets. Missing data is usually handled with either deletion-based or
simple imputation-based methods, which respectively reduce the number of data
records or impute inaccurate values derived from the mean or median. A
significant improvement can be achieved if missing values are imputed more
accurately and at lower computational cost.
In this work, a flow for the analysis of machine learning-based algorithms for
missing data imputation is proposed. The K-nearest neighbors (KNN) and
Sequential KNN (SKNN) algorithms are used to impute missing values in
datasets. Datasets whose missing values were handled with a statistical
deletion approach (list-wise deletion, LD) and with the ML-based imputation
methods (KNN and SKNN) are then tested and compared using different ML
classifiers (Support Vector Machine, SVM, and Decision Tree, DT) to evaluate the
effectiveness of the imputed data. The algorithms are compared in terms of
classification accuracy, and the results show that the ML-based imputation
method SKNN outperforms both the LD-based approach and the KNN method at
handling missing data in almost every dataset with both classification
algorithms (SVM and DT).
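
The three ways of handling missing values compared above can be illustrated with a minimal Python sketch. It is not the authors' implementation. The function names (listwise_delete, knn_impute, sknn_impute), the distance normalisation, and the toy data are all hypothetical. The sequential variant follows the common description of SKNN, which the abstract does not spell out: records with the fewest missing values are imputed first, and each imputed record then joins the pool of candidate neighbours. Missing values are represented as None, and the classifier comparison step (SVM and DT) is not shown.

```python
import math


def listwise_delete(rows):
    """List-wise deletion: drop every record with at least one missing value."""
    return [r for r in rows if None not in r]


def _distance(a, b):
    """Euclidean distance over the features observed in both records.

    Dividing by the number of shared features is an assumption of this
    sketch; it keeps records with different numbers of gaps comparable.
    """
    shared = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    if not shared:
        return math.inf
    return math.sqrt(sum((x - y) ** 2 for x, y in shared) / len(shared))


def _fill(row, pool, k):
    """Fill the gaps in `row` with the mean of its k nearest records in `pool`.

    Assumes `pool` contains at least one complete record.
    """
    neighbours = sorted(pool, key=lambda p: _distance(row, p))[:k]
    return [
        v if v is not None else sum(n[j] for n in neighbours) / len(neighbours)
        for j, v in enumerate(row)
    ]


def knn_impute(rows, k=2):
    """Plain KNN: neighbours always come from the originally complete records."""
    complete = listwise_delete(rows)
    return [_fill(r, complete, k) if None in r else list(r) for r in rows]


def sknn_impute(rows, k=2):
    """Sequential KNN: impute the records with the fewest gaps first and
    add each imputed record to the pool used for the remaining ones."""
    pool = listwise_delete(rows)
    result = {i: list(r) for i, r in enumerate(rows) if None not in r}
    incomplete = [i for i, r in enumerate(rows) if None in r]
    for i in sorted(incomplete, key=lambda i: rows[i].count(None)):
        filled = _fill(rows[i], pool, k)
        pool.append(filled)  # later records may now use this one as a neighbour
        result[i] = filled
    return [result[i] for i in range(len(rows))]


# Toy data (hypothetical): three complete records and two with gaps.
rows = [
    [1.0, 2.0, 3.0],
    [2.0, 3.0, 4.0],
    [10.0, 11.0, 12.0],
    [1.5, None, 3.5],
    [9.0, None, None],
]

print(len(listwise_delete(rows)), "records survive list-wise deletion")
print(knn_impute(rows))
print(sknn_impute(rows))
```

List-wise deletion shrinks the dataset, whereas both imputation variants keep every record. In a comparison like the one described, each resulting dataset would then be passed to the same classifiers and scored on accuracy.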