Cross Validation Machine Learning Model Predicts More Accurate: A Comparative Study of Heart Disease Using Linear Regression, Support Vector Machine, K Neighbors and Random Forest Models

Abstract

This primary research paper focuses on the use of cross-validation, in which each iteration uses a uniquely structured test fold with the aim of optimizing model performance, and on combining weak learners to improve final model accuracy. In a typical machine learning workflow, data are either divided into a training set and a test set (a 70%/30% split) or evaluated with cross-validation for training and evaluation purposes. This study transforms the original heart disease classification datasets and performs a comparative cross-validation analysis using LR, support vector machine (SVM), k-nearest neighbors (KNN), and random forest (RF) models. The objective is to identify the average prediction accuracy of each model and then recommend a model by comparing cross-validated and non-cross-validated approaches, with cross-validation increasing accuracy by 15%. A comparison of the models' accuracy scores shows that the logistic regression and k-nearest neighbor models achieved the highest accuracy among the four models, at 81%. The random forest model attained an F1 score of 95%, the highest score reported. These findings can be further corroborated with learning curve validation. Conversely, the linear regression model exhibited the lowest accuracy among the four machine learning models, at 84%.
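The abstract contrasts a single 70/30 train/test split with cross-validation, in which every fold serves as the test set exactly once and the fold scores are averaged; it also reports F1 alongside accuracy. The sketch below illustrates both evaluation schemes and the F1 score in plain Python. It is not the authors' code: the two-class toy data from the hypothetical `make_toy_data` stands in for the heart disease datasets, a from-scratch k-nearest neighbors classifier stands in for all four models, and every function name here is illustrative.

```python
import random
from statistics import mean


def make_toy_data(n=200, seed=0):
    """Hypothetical toy data standing in for a heart disease table.

    Each row is ((x1, x2), label); class 1 is centred at (2, 2) and class 0
    at (0, 0). Rows are shuffled so contiguous folds mix both classes.
    """
    rng = random.Random(seed)
    data = []
    for i in range(n):
        label = i % 2
        centre = 2.0 if label else 0.0
        features = (rng.gauss(centre, 1.0), rng.gauss(centre, 1.0))
        data.append((features, label))
    rng.shuffle(data)
    return data


def knn_predict(train, x, k=5):
    """Majority vote among the k training rows closest to x (squared Euclidean distance)."""
    nearest = sorted(train, key=lambda row: sum((a - b) ** 2 for a, b in zip(row[0], x)))[:k]
    positives = sum(label for _, label in nearest)
    return 1 if 2 * positives > k else 0


def f1_score(y_true, y_pred):
    """F1 = 2*TP / (2*TP + FP + FN), the harmonic mean of precision and recall."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    return 0.0 if tp == 0 else 2 * tp / (2 * tp + fp + fn)


def evaluate(train, test, k=5):
    """Return (accuracy, F1) of the KNN stand-in model on the test rows."""
    y_true = [y for _, y in test]
    y_pred = [knn_predict(train, x, k) for x, _ in test]
    acc = sum(t == p for t, p in zip(y_true, y_pred)) / len(test)
    return acc, f1_score(y_true, y_pred)


def holdout(data, test_fraction=0.3, k=5):
    """Single train/test split: first 70% of rows for training, last 30% for testing."""
    cut = len(data) - round(len(data) * test_fraction)
    return evaluate(data[:cut], data[cut:], k)


def cross_validate(data, n_folds=5, k=5):
    """n_folds-fold cross-validation: each fold is the test set once and the
    model is trained on the remaining folds; returns per-fold accuracies."""
    fold_size = len(data) // n_folds
    scores = []
    for i in range(n_folds):
        start = i * fold_size
        end = len(data) if i == n_folds - 1 else start + fold_size
        acc, _ = evaluate(data[:start] + data[end:], data[start:end], k)
        scores.append(acc)
    return scores


if __name__ == "__main__":
    data = make_toy_data()
    acc, f1 = holdout(data)
    print(f"70/30 holdout: accuracy {acc:.2f}, F1 {f1:.2f}")
    scores = cross_validate(data)
    print(f"5-fold CV accuracies {[round(s, 2) for s in scores]}, mean {mean(scores):.2f}")
```

The holdout estimate depends on which 30% of rows happen to land in the test set, whereas the cross-validated mean uses every row for testing once. In practice, scikit-learn's `train_test_split` (with `test_size=0.3`) and `cross_val_score` (with `cv=5`) perform the same splitting and averaging for logistic regression, SVM, KNN, and random forest estimators; the stdlib version only makes each step explicit.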

International Journal of Advanced Computer Technology

Last time updated on 15/05/2024
