6 research outputs found

    SMOTE Technique for Handling Data Imbalance in Stroke Detection Using the Random Forest Algorithm

    Stroke is a disease that can cause paralysis and even death. In 2022, there were 12.2 million new stroke cases, bringing the total number of people living with stroke to 101.4 million. Given these figures, a technique capable of detecting the disease is needed, and machine learning is one approach that can be used for stroke detection. Unfortunately, the data available for stroke detection exhibits class imbalance, which can distort the accuracy obtained when detecting stroke. The random forest algorithm is therefore combined with the SMOTE method to handle the class imbalance. The outputs are accuracy, precision, recall, and F1-score values: random forest without SMOTE achieves 0.98, 0.69, 0.51, and 0.51, whereas random forest with SMOTE achieves 0.91, 0.92, 0.91, and 0.91, respectively. Precision, recall, and F1-score increase significantly.
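
The gap between the 0.98 accuracy and the 0.51 recall reported without SMOTE is typical of imbalanced data. The sketch below illustrates the effect on hypothetical labels (98 non-stroke, 2 stroke) with a model that always predicts the majority class. It assumes macro-averaged precision, recall, and F1; the abstract does not state which averaging was used, and the function names are illustrative only.

```python
def per_class_scores(y_true, y_pred, label):
    """Precision, recall, and F1 for one class, treating it as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == label and p == label)
    fp = sum(1 for t, p in pairs if t != label and p == label)
    fn = sum(1 for t, p in pairs if t == label and p != label)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return precision, recall, f1


def macro_report(y_true, y_pred):
    """Accuracy plus macro-averaged precision, recall, and F1."""
    labels = sorted(set(y_true))
    scores = [per_class_scores(y_true, y_pred, label) for label in labels]
    accuracy = sum(1 for t, p in zip(y_true, y_pred) if t == p) / len(y_true)
    macro = tuple(sum(s[i] for s in scores) / len(scores) for i in range(3))
    return (accuracy,) + macro


# Hypothetical data (not from the paper): 98 non-stroke (0) and 2 stroke (1) cases.
y_true = [0] * 98 + [1] * 2
# A degenerate model that predicts "no stroke" for every patient.
y_pred = [0] * 100

accuracy, precision, recall, f1 = macro_report(y_true, y_pred)
print(f"accuracy={accuracy:.2f} precision={precision:.2f} "
      f"recall={recall:.2f} f1={f1:.2f}")
```

On this toy data the accuracy is 0.98 even though no stroke case is detected, while the macro recall is only 0.50. This is why precision, recall, and F1-score are reported alongside accuracy.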

    SYSTEMATIC LITERATURE REVIEW OF THE CLASS IMBALANCE CHALLENGES IN MACHINE LEARNING

    The significant growth of data poses its own challenges in terms of storing, managing, and analyzing the available data. Untreated and unanalyzed data can only provide limited benefits to its owner. In many cases, the data we analyze is imbalanced. An example of natural data imbalance is in detecting financial fraud, where the number of non-fraudulent transactions is usually much higher than fraudulent ones. This imbalance issue can affect the accuracy and performance of machine learning classification models. Many machine learning classification models tend to learn more general patterns in the majority class. As a result, the model may overlook patterns that exist in the minority class. Various research has been conducted to address the problem of imbalanced data. The objective of this systematic literature review is to provide the latest developments regarding the cases, methods used, and evaluation techniques in handling imbalanced data. This research successfully identifies new methods and is expected to provide more choices for researchers so that imbalanced data can be properly handled, and classification models can produce unbiased, accurate, and consistent results.

    Analysing an Imbalanced Stroke Prediction Dataset Using Machine Learning Techniques

    A stroke is a medical condition characterized by the rupture of blood vessels within the brain, which can lead to brain damage. Various symptoms may be exhibited when the brain's supply of blood and essential nutrients is disrupted. The main objective of this study is to forecast the possibility of a brain stroke at an early stage using Machine Learning (ML) and Deep Learning (DL). Timely detection of the various warning signs of a stroke can significantly reduce its severity. This paper performs a comprehensive analysis of features to enhance stroke prediction effectiveness. A reliable dataset for stroke prediction is taken from the Kaggle website to gauge the effectiveness of the proposed algorithm. The dataset has a class imbalance problem, which means the total number of negative samples is higher than the total number of positive samples. The results are reported based on a balanced dataset created using oversampling techniques. The proposed work uses SMOTE and ADASYN to handle the imbalance problem for better evaluation metrics. Additionally, the hybrid Neural Network and Random Forest (NN-RF) model, using the dataset balanced by ADASYN oversampling, achieves the highest F1-score of 75% compared to the original unbalanced dataset and other benchmarking algorithms. The proposed algorithm with balanced data utilizing hybrid NN-RF achieves an accuracy of 84%. Advanced ML techniques coupled with thorough data analysis enhance stroke prediction. This study underscores the significance of data-driven methodologies, resulting in improved accuracy and comprehension of stroke risk factors. Applying these methodologies to medical fields can enhance patient care and public health outcomes. By integrating our discoveries, we can enhance the efficiency and effectiveness of the public health system.
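
The oversampling idea behind SMOTE can be sketched in a few lines: each synthetic minority sample is placed at a random point on the segment between a minority sample and one of its nearest minority neighbours. The code below is a minimal illustration under stated assumptions, not the implementation used in the paper: the function name `smote_sketch`, the parameter `k`, and the toy feature vectors are all hypothetical. ADASYN follows a similar interpolation scheme but generates more samples around minority points that are harder to classify, which this sketch does not attempt.

```python
import random


def smote_sketch(minority, n_synthetic, k=3, seed=0):
    """Generate synthetic minority samples by interpolating between neighbours.

    minority:    list of feature tuples from the minority class (at least 2).
    n_synthetic: number of synthetic samples to create.
    k:           number of nearest minority neighbours to choose from.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples to interpolate")
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_synthetic):
        base_index = rng.randrange(len(minority))
        base = minority[base_index]
        # Rank the other minority samples by squared Euclidean distance to base.
        others = [p for i, p in enumerate(minority) if i != base_index]
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)))
        neighbour = rng.choice(others[:k])
        # Pick a random point on the segment between base and neighbour.
        gap = rng.random()
        synthetic.append(
            tuple(a + gap * (b - a) for a, b in zip(base, neighbour))
        )
    return synthetic


# Hypothetical two-feature minority samples (illustrative values only).
minority_samples = [(67.0, 228.7), (80.0, 105.9), (74.0, 70.1), (69.0, 94.4)]
new_samples = smote_sketch(minority_samples, n_synthetic=6, k=2)
balanced_minority = minority_samples + new_samples
print(len(balanced_minority), "minority samples after oversampling")
```

Because every synthetic point lies between two existing minority samples, the new data stays inside the region the minority class already occupies. Oversampling of this kind is normally applied to the training split only, so that evaluation still reflects the original class distribution.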

    Computer vision and machine learning for medical image analysis: recent advances, challenges, and way forward.

    Recent developments in deep learning and deep convolutional neural networks have significantly advanced the field of computer vision (CV) and image analysis and understanding. Complex tasks such as classifying and segmenting medical images and localising and recognising objects of interest have become much less challenging. This progress has the potential to accelerate research and deployment of a multitude of medical applications that utilise CV. However, in reality, there are limited practical examples being physically deployed into front-line health facilities. In this paper, we examine the current state of the art in CV as applied to the medical domain. We discuss the main challenges in CV and intelligent data-driven medical applications and suggest future directions to accelerate research, development, and deployment of CV applications in health practices. First, we critically review existing literature in the CV domain that addresses complex vision tasks, including: medical image classification; shape and object recognition from images; and medical segmentation. Second, we present an in-depth discussion of the various challenges that are considered barriers to accelerating research, development, and deployment of intelligent CV methods in real-life medical applications and hospitals. Finally, we conclude by discussing future directions.

    Methods to Improve the Prediction Accuracy and Performance of Ensemble Models

    The application of ensemble predictive models has been an important research area in predicting medical diagnostics, engineering diagnostics, and other related smart devices and related technologies. Most of the current predictive models are complex and not reliable despite numerous efforts in the past by the research community. The performance accuracy of the predictive models has not always been realised due to many factors such as complexity and class imbalance. Therefore there is a need to improve the predictive accuracy of current ensemble models and to enhance their applications and reliability and non-visual predictive tools. The research work presented in this thesis has adopted a pragmatic phased approach to propose and develop new ensemble models using multiple methods, and validated the methods through rigorous testing and implementation in different phases. The first phase comprises empirical investigations on standalone and ensemble algorithms that were carried out to ascertain their performance effects on complexity and simplicity of the classifiers. The second phase comprises an improved ensemble model based on the integration of Extended Kalman Filter (EKF), Radial Basis Function Network (RBFN) and AdaBoost algorithms. The third phase comprises an extended model based on early stop concepts, the AdaBoost algorithm, and statistical performance of the training samples to minimize overfitting of the proposed model. The fourth phase comprises an enhanced analytical multivariate logistic regression predictive model developed to minimize the complexity and improve the prediction accuracy of the logistic regression model. To facilitate the practical application of the proposed models, an ensemble non-invasive analytical tool is proposed and developed. The tool bridges the gap between theoretical concepts and practical application of theories to predict breast cancer survivability.
    The empirical findings suggested that: (1) increasing the complexity and topology of algorithms does not necessarily lead to better algorithmic performance, (2) boosting by resampling performs slightly better than boosting by reweighting, (3) the proposed ensemble EKF-RBFN-AdaBoost model achieved better prediction accuracy than several established ensemble models, (4) the proposed early stopped model converges faster and minimizes overfitting better compared with other models, (5) the proposed multivariate logistic regression concept minimizes model complexity, and (6) the proposed analytical non-invasive tool performed comparatively better than many of the benchmark analytical tools used in predicting breast cancer and diabetes. The research contributions to ensemble practice are: (1) the integration and development of EKF, RBFN and AdaBoost algorithms as an ensemble model, (2) the development and validation of an ensemble model based on early stop concepts, AdaBoost, and statistical concepts of the training samples, (3) the development and validation of a predictive logistic regression model based on breast cancer, and (4) the development and validation of non-invasive breast cancer analytic tools based on the predictive models proposed and developed in this thesis. To validate the prediction accuracy of ensemble models, the proposed models were applied in modelling breast cancer survivability and diabetes diagnostic tasks. In comparison with other established models, the simulation results of the models showed improved predictive accuracy. The research outlines the benefits of the proposed models, whilst proposing new directions for future work that could further extend and improve the models discussed in this thesis.