15 research outputs found
An empirical study on machine learning algorithms for heart disease prediction
In recent years, machine learning has attained higher precision and accuracy in classifying clinical heart disease datasets. However, the literature shows that the quality of the heart disease features used to train a model has a significant impact on the outcome of the predictive model. This study therefore explores how the quality of heart disease features affects the performance of machine learning models for heart disease prediction, employing recursive feature elimination with cross-validation (RFECV). The study also identifies the heart disease features that have a significant effect on model output. The dataset for experimentation is obtained from the University of California Irvine (UCI) machine learning repository. The experiment is implemented using a support vector machine (SVM), logistic regression (LR), decision tree (DT), and random forest (RF), and the performance of the four models is compared. The results indicate that feature quality significantly affects model performance. Overall, RF outperforms the other algorithms, achieving a predictive accuracy of 99.7%.
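As a rough illustration of the RFECV step described in this abstract, the sketch below runs scikit-learn's `RFECV` with a random forest as the ranking estimator. The data is synthetic (`make_classification`), and the feature count, fold count, and forest size are arbitrary assumptions rather than the study's settings.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import RFECV
from sklearn.model_selection import StratifiedKFold

# Synthetic stand-in data (assumption): NOT the UCI heart disease dataset.
X, y = make_classification(
    n_samples=300, n_features=13, n_informative=5, n_redundant=3, random_state=0
)

# RFECV drops the least important feature at each step and uses
# cross-validated accuracy to decide how many features to keep.
selector = RFECV(
    estimator=RandomForestClassifier(n_estimators=25, random_state=0),
    step=1,
    cv=StratifiedKFold(n_splits=5),
    scoring="accuracy",
)
selector.fit(X, y)

print("number of features kept:", selector.n_features_)
print("selection mask:", selector.support_)

# Reduced feature matrix that a downstream classifier would be trained on.
X_reduced = selector.transform(X)
```

The same selector could wrap any estimator that exposes `coef_` or `feature_importances_`; random forest is used here only because the abstract reports it as the best-performing model.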
A hybrid approach to medical decision-making: diagnosis of heart disease with machine-learning model
Heart disease is one of the most widespread and deadliest diseases in the world. In this study, we propose a hybrid model for heart disease prediction that combines random forest and support vector machine. Random forest is used for iterative feature elimination, selecting the heart disease features that improve the predictive outcome of the support vector machine. The proposed model is evaluated on a test set, and the experimental results indicate that the hybrid model performs better than an individual random forest or support vector machine. Overall, we have developed a more accurate and computationally efficient model for heart disease prediction, with an accuracy of 98.3%. A further experiment analyzes the effect of the regularization parameter (C) and gamma on the performance of the support vector machine. The results reveal that the support vector machine is very sensitive to C and gamma.
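One possible reading of this hybrid design is sketched below: a random forest drives recursive feature elimination, and an RBF-kernel SVM is then tuned over C and gamma on the selected features. The synthetic data, the number of features kept, and the grid values are all assumptions for illustration; the abstract does not state the authors' settings.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import RFE
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in data (assumption): NOT the heart disease dataset.
X, y = make_classification(n_samples=300, n_features=13, n_informative=5, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Step 1: random-forest-driven feature elimination, fitted on training data only.
# Keeping 6 features is an arbitrary choice for this sketch.
rfe = RFE(RandomForestClassifier(n_estimators=25, random_state=0), n_features_to_select=6)
X_train_sel = rfe.fit_transform(X_train, y_train)
X_test_sel = rfe.transform(X_test)

# Step 2: tune the SVM's C and gamma on the selected features.
# (Selection is done once outside the grid search to keep the sketch fast.)
svm = Pipeline([("scale", StandardScaler()), ("svm", SVC(kernel="rbf"))])
param_grid = {"svm__C": [0.1, 1, 10, 100], "svm__gamma": [0.001, 0.01, 0.1, 1]}
search = GridSearchCV(svm, param_grid, cv=5, scoring="accuracy")
search.fit(X_train_sel, y_train)

test_accuracy = search.score(X_test_sel, y_test)
print("best C and gamma:", search.best_params_)
print("held-out accuracy:", round(test_accuracy, 3))
```

Inspecting `search.cv_results_["mean_test_score"]` across the grid is one way to see how strongly accuracy varies with C and gamma.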
Exploring the performance of feature selection method using breast cancer dataset
Breast cancer is the most common type of cancer, occurring mostly in females. In recent years, many researchers have devoted themselves to automating the diagnosis of breast cancer by developing different machine learning models. However, the quality and quantity of features in a breast cancer diagnostic dataset have a significant effect on the accuracy and efficiency of the predictive model. Feature selection is an effective method for reducing dimensionality and improving the accuracy of a predictive model. Its purpose is to determine the features required for training a model and to remove irrelevant and duplicate features, where a duplicate feature is one that is highly correlated with another feature. The objective of this study is to conduct experimental research on three feature selection methods for breast cancer prediction. Sequential, embedded, and chi-square feature selection are implemented using a breast cancer diagnostic dataset, and their performance is compared on a test set. The experimental results show that sequential feature selection outperforms chi-square (χ²) statistics and embedded feature selection, achieving an accuracy of 98.3%.
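The three families of feature selection named in this abstract can be compared along the following lines. This is a minimal sketch using scikit-learn's bundled breast cancer data, which is assumed (not confirmed by the abstract) to resemble the study's dataset; the number of features kept (`K`), the classifier, and the embedded estimator are illustrative choices.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import (
    SelectFromModel,
    SelectKBest,
    SequentialFeatureSelector,
    chi2,
)
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MinMaxScaler

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

K = 5  # number of features to keep (assumption)

def make_classifier():
    return LogisticRegression(max_iter=1000)

selectors = {
    # Wrapper method: greedily adds the feature that most improves CV accuracy.
    "sequential": SequentialFeatureSelector(
        make_classifier(), n_features_to_select=K, direction="forward", cv=3
    ),
    # Filter method: ranks features by the chi-square statistic against the label.
    "chi-square": SelectKBest(chi2, k=K),
    # Embedded method: keeps the K features with the highest forest importances.
    "embedded": SelectFromModel(
        RandomForestClassifier(n_estimators=50, random_state=0),
        max_features=K,
        threshold=-np.inf,
    ),
}

scores = {}
for name, selector in selectors.items():
    # MinMaxScaler keeps features non-negative, which chi2 requires.
    model = make_pipeline(MinMaxScaler(), selector, make_classifier())
    model.fit(X_train, y_train)
    scores[name] = model.score(X_test, y_test)

for name, accuracy in scores.items():
    print(f"{name}: held-out accuracy = {accuracy:.3f}")
```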
Early prediction of chronic heart disease with recursive feature elimination and supervised learning techniques
Chronic heart disease (CHD) is a common complication among patients in the cardiological intensive care unit, often resulting in poor prognosis and high mortality. Early prediction of CHD can reduce mortality by preventing the disease from becoming severe. This study evaluated the efficacy of recursive feature elimination for predicting CHD using supervised learning techniques. The study employed the 1190-record Cleveland Hungarian CHD dataset. Different supervised learning techniques (support vector machine, decision tree, k-nearest neighbor, Naive Bayes, stochastic gradient descent, adaptive boosting, and multilayer perceptron) were used to study the efficacy of recursive feature elimination. Chest pain type, sex, blood sugar level, angina, depression, and slope were associated with CHD occurrence. The accuracy of the k-nearest neighbor and decision tree models was 89.91% on the feature-selected dataset, indicating good predictive ability. Ultimately, the support vector machine and logistic regression with the selected features exhibited good discriminatory ability for early prediction of CHD. Thus, recursive feature elimination is a good approach for developing a model with higher accuracy to predict CHD.
Scalability and performance of decision tree for cardiovascular disease prediction
As one of the most common types of disease, cardiovascular disease is a serious health concern worldwide. Early detection is crucial for successful treatment and improved survival rates. The decision tree is a robust classifier for predicting the risk of cardiovascular disease and for gaining insights that assist in making clinical decisions. However, selecting a better model for cardiovascular disease can be challenging due to scalability issues. Hence, this study examines the scalability and performance of decision trees for cardiovascular disease prediction. Performance was evaluated using a confusion matrix, cross-validation score, model complexity, and training score for varying sizes of training samples. The experiment showed that the decision tree model was 88.8% accurate in predicting the presence or absence of cardiovascular disease. Therefore, implementing a decision tree is beneficial for the prediction and early detection of heart disease events in patients.
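The evaluation described in this abstract (training and cross-validation scores at varying training sizes, a confusion matrix, and model complexity) might be set up roughly as follows. The data is synthetic and the tree depth, fold count, and size grid are assumptions, not the study's configuration.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import learning_curve, train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data (assumption): NOT the cardiovascular dataset.
X, y = make_classification(n_samples=600, n_features=11, n_informative=5, random_state=0)
tree = DecisionTreeClassifier(max_depth=4, random_state=0)

# Training and cross-validation accuracy for increasing training-set sizes.
train_sizes, train_scores, cv_scores = learning_curve(
    tree, X, y, train_sizes=np.linspace(0.1, 1.0, 5), cv=5, scoring="accuracy"
)
for n, tr, cv in zip(train_sizes, train_scores.mean(axis=1), cv_scores.mean(axis=1)):
    print(f"n={n:4d}  training score={tr:.3f}  cross-validation score={cv:.3f}")

# Confusion matrix on a held-out split, plus simple model-complexity measures.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)
tree.fit(X_train, y_train)
cm = confusion_matrix(y_test, tree.predict(X_test))
print("confusion matrix:\n", cm)
print("node count:", tree.tree_.node_count, "depth:", tree.get_depth())
```

A widening gap between the training and cross-validation scores as the tree grows deeper is the usual sign of overfitting in this kind of plot.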
An opinionated sentiment analysis using a rule-based method
The categorization of opinions into positive, negative, or neutral facilitates information gathering, pinpointing individual weaknesses, and streamlining the decision-making process. Precision in opinion classification enables decision-makers to extract valuable insights, make well-informed decisions, and execute suitable actions. Sentiment analysis is language-specific because each language has its own distinct morphological structure. This study implemented a rule-based sentiment analysis approach for Kafi-noonoo opinionated texts, using a system suited to smaller datasets that operates on a predefined set of rules. The rule-based mechanism identifies sentiment terms from a dedicated file, calculates the overall polarity of a given sentence by applying the rules, and categorizes the sentence as positive, negative, or neutral. Although the analysis used a modest sample of 1,500 words sourced from Facebook and music reviews, it yielded satisfactory results. Performance was evaluated using precision, recall, and F-measure, giving positive word scores of 91%, 86%, and 88.4%, and negative word scores of 80%, 75%, and 77%, respectively.
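A rule-based polarity scorer of the kind described in this abstract can be sketched in a few lines. The lexicon below is a hypothetical English placeholder (the study used Kafi-noonoo terms loaded from a file), and the single negation rule is an assumption about what such a rule set might contain. The `f_measure` helper shows how the reported F-measures follow from the reported precision and recall.

```python
# Hypothetical placeholder lexicon (assumption): the study's Kafi-noonoo
# sentiment terms are not given in the abstract.
POSITIVE = {"good", "great", "love"}
NEGATIVE = {"bad", "poor", "hate"}
NEGATORS = {"not", "never"}

def sentence_polarity(sentence: str) -> str:
    """Sum term polarities; a negator flips the next sentiment term."""
    score = 0
    negate = False
    for token in sentence.lower().split():
        word = token.strip(".,!?;:")
        if word in NEGATORS:
            negate = True
            continue
        value = 1 if word in POSITIVE else -1 if word in NEGATIVE else 0
        if value:
            score += -value if negate else value
            negate = False
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"

def f_measure(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

print(sentence_polarity("The song is great!"))
print(sentence_polarity("This is not good."))
print(sentence_polarity("I heard it yesterday."))

# Consistency check against the figures reported in the abstract.
print(f"positive F-measure: {f_measure(0.91, 0.86):.3f}")
print(f"negative F-measure: {f_measure(0.80, 0.75):.3f}")
```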
Consistency, local stability, and approximation of Shapash explanation
Consistency, scalability, and local stability properties ensure that a model or method produces reliable and predictable outcomes. Shapash helps users understand how a model makes its decisions. With machine learning (ML) systems, healthcare experts can identify individuals at higher risk and implement interventions to reduce the occurrence and severity of disease. ML has achieved high prediction accuracy, although that accuracy depends on the quality and quantity of the data used for training. Despite the wide application and high accuracy of ML for disease prediction, an explanation of the predictive outcome is much more important to healthcare professionals, patients, and even developers. However, most ML systems do not explain their outcomes. To address this explainability issue, various techniques such as local interpretable model-agnostic explanations (LIME) and Shapley additive explanations (SHAP) have been proposed in recent years. Furthermore, the consistency, local stability, and approximation of these explanations remain a research topic in ML. This study investigated the consistency, stability, and approximation of LIME and SHAP in predicting heart disease (HD). The results suggest that LIME and SHAP generated similar explanations (distance=0.35), compared to the active coalition of variables (ACV) explanation (distance=0.43).
Amharic event text classification from social media using hybrid deep learning
This study aims to develop a hybrid deep-learning model for detecting and classifying Amharic text. Various natural language applications, such as information extraction, event extraction, conversation, and text summarization, require automatic event classification. However, existing studies have focused on classification, giving little attention to preprocessing and feature extraction techniques. To address this problem, this work proposes a hybridized deep learning-based event classification model for Amharic social media text. The model uses word-to-vector (Word2vec) word embeddings to capture semantic and syntactic representations. A convolutional neural network (CNN) is used to extract short-length text features. Additionally, bidirectional long short-term memory (Bi-LSTM) is used to extract features from long Amharic sentences and to classify events based on their classes. The dataset used for training and testing consists of 6,740 labeled Amharic sentences collected from social media. The results show an accuracy of 94.8% in detecting and classifying Amharic text events.
