Analysis of Machine Learning Based Imputation of Missing Data

Authors: Achraf Jabeur Telmoudi, Muhammad Saad Amin, Muhammad Yasir Latif, Nasir Ali Shah, Syed Tahir Hussain Rizvi
Publication venue: Taylor & Francis
Publication date: 01/01/2023

Data analysis and classification can be affected by missing values in datasets. Missing data are usually handled with either deletion-based or imputation-based methods, which respectively reduce the number of data records or impute inaccurate values computed from the mean or median. A significant improvement is possible if missing values are imputed more accurately at a lower computational cost. In this work, a workflow for analyzing machine learning (ML)-based algorithms for missing data imputation is proposed. The K-nearest neighbors (KNN) and Sequential KNN (SKNN) algorithms are used to impute missing values in the datasets. Data in which missing values are handled with a statistical deletion approach, list-wise deletion (LD), or with the ML-based imputation methods (KNN and SKNN) are then tested and compared using two ML classifiers, Support Vector Machine (SVM) and Decision Tree (DT), to evaluate the effectiveness of the imputed data. The algorithms are compared in terms of accuracy, and the results show that the ML-based imputation method SKNN outperforms both the LD-based approach and the KNN method in handling missing data in almost every dataset, with both classification algorithms (SVM and DT).
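The abstract contrasts three ways of handling missing values: list-wise deletion (LD), KNN imputation and sequential KNN (SKNN). The plain-Python sketch below illustrates the three strategies on a toy dataset. It is not the authors' implementation, and several details are assumptions: the function names are hypothetical, distances are Euclidean over the features observed in the incomplete record, plain KNN draws neighbors only from the originally complete records, and SKNN follows the commonly described scheme of imputing incomplete records from fewest to most missing values with inverse-distance weights, reusing each imputed record as a neighbor for later ones. The classifier evaluation step (SVM and DT) is omitted.

```python
import math
from typing import List, Optional

Row = List[Optional[float]]  # None marks a missing value


def _is_complete(row: Row) -> bool:
    return all(v is not None for v in row)


def listwise_deletion(data: List[Row]) -> List[Row]:
    """List-wise deletion: drop every record with at least one missing value."""
    return [list(row) for row in data if _is_complete(row)]


def _distance(target: Row, donor: Row) -> float:
    """Euclidean distance over the features observed in `target` only (assumption)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(target, donor) if x is not None))


def _impute_row(target: Row, donors: List[Row], k: int, weighted: bool) -> Row:
    """Fill the missing entries of `target` from its k nearest donor rows."""
    if not donors:
        raise ValueError("no complete rows available as neighbors")
    nearest = sorted(donors, key=lambda d: _distance(target, d))[:k]
    if weighted:
        # Inverse-distance weights; the small epsilon guards against zero distance.
        weights = [1.0 / (_distance(target, d) + 1e-9) for d in nearest]
    else:
        weights = [1.0] * len(nearest)  # plain (unweighted) mean
    total = sum(weights)
    filled = []
    for j, value in enumerate(target):
        if value is None:
            value = sum(w * d[j] for w, d in zip(weights, nearest)) / total
        filled.append(value)
    return filled


def knn_impute(data: List[Row], k: int = 3) -> List[Row]:
    """KNN imputation: neighbors come only from the originally complete rows."""
    donors = listwise_deletion(data)
    return [list(row) if _is_complete(row) else _impute_row(row, donors, k, weighted=False)
            for row in data]


def sequential_knn_impute(data: List[Row], k: int = 3) -> List[Row]:
    """Sequential KNN: impute rows in order of increasing missingness, reusing
    each imputed row as a donor for the rows that follow."""
    donors = listwise_deletion(data)
    result = [list(row) for row in data]
    pending = [i for i, row in enumerate(data) if not _is_complete(row)]
    pending.sort(key=lambda i: sum(v is None for v in data[i]))
    for i in pending:
        result[i] = _impute_row(data[i], donors, k, weighted=True)
        donors.append(result[i])  # the newly imputed row can now serve as a neighbor
    return result


if __name__ == "__main__":
    # Toy data (made up for illustration): two incomplete records.
    data = [
        [1.0, 2.0, 3.0],
        [2.0, 3.0, 4.0],
        [3.0, 4.0, 5.0],
        [10.0, 11.0, 12.0],
        [1.5, None, 3.5],
        [None, None, 3.6],
    ]
    print("LD  :", listwise_deletion(data))
    print("KNN :", knn_impute(data, k=2))
    print("SKNN:", sequential_knn_impute(data, k=2))
```

With `k=2` on the toy data, LD simply discards both incomplete records. Plain KNN can only draw on the four originally complete records, whereas SKNN fills the last record using the freshly imputed fifth record as its closest neighbor, which is the property that lets SKNN exploit more of the available data.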

PORTO@iris (Publications Open Repository TOrino - Politecnico di Torino)