21 research outputs found

    Deep Neural Network for Stroke Prediction

    In 2019, the World Health Organization (WHO) placed stroke among the ten leading causes of death. The Ministry of Health classifies stroke as a catastrophic disease because of its broad economic and social impact. Information technology therefore has a role to play in predicting stroke for early prevention and care. Analyzing data with imbalanced classes leads to inaccurate stroke predictions. This study compares three oversampling techniques to obtain a better prediction model. The class-balanced data were tested using three Deep Neural Network (DNN) architectures, optimizing several parameters: the optimizer, the learning rate, and the number of epochs. The best results were obtained with the SMOTETomek oversampling technique and a DNN architecture with five hidden layers, the Adam optimizer, a learning rate of 0.001, and 500 epochs. The accuracy, precision, recall, and F1 scores were 0.96, 0.9614, 0.9608, and 0.9611, respectively.

    Predicting class-imbalanced business risk using resampling, regularization, and model ensembling algorithms

    We aim to develop and improve imbalanced business risk modeling by jointly using proper evaluation criteria, resampling, cross-validation, classifier regularization, and ensembling techniques. The Area Under the Receiver Operating Characteristic Curve (AUC of ROC) is used for model comparison based on 10-fold cross-validation. Two undersampling strategies, random undersampling (RUS) and cluster centroid undersampling (CCUS), as well as two oversampling methods, random oversampling (ROS) and the Synthetic Minority Oversampling Technique (SMOTE), are applied. Three highly interpretable classifiers are implemented: logistic regression without regularization (LR), L1-regularized LR (L1LR), and decision tree (DT). Two ensembling techniques, Bagging and Boosting, are applied to the DT classifier for further model improvement. The results show that Boosting on DT using data oversampled to 50% positives via SMOTE is the optimal model, achieving an AUC, recall, and F1 score of 0.8633, 0.9260, and 0.8907, respectively.

    Comparison of Sampling Methods for Predicting Wine Quality Based on Physicochemical Properties

    Using the physicochemical properties of wine to predict quality has been done in numerous studies. Given the nature of these properties, the data is inherently skewed. Previous works have focused on a handful of sampling techniques to balance the data. This research compares multiple sampling techniques for predicting the target with limited data. For this purpose, an ensemble model is used to evaluate the different techniques. This research found no evidence that any specific oversampling method improves a random forest classifier on a multi-class problem.

    Comparing the performance of oversampling techniques in combination with a clustering algorithm for imbalanced learning

    Dissertation presented as the partial requirement for obtaining a Master's degree in Information Management, specialization in Knowledge Management and Business Intelligence.

    Imbalanced datasets in supervised learning are considered an ongoing challenge for standard algorithms, since these are designed to handle balanced class distributions and perform poorly when applied to imbalanced problems. Many methods have been developed to address this specific problem, but the more general approach to achieving a balanced class distribution is data-level modification rather than algorithm modification. Although class imbalance is responsible for significant losses of performance in standard classifiers across many types of problems, another important aspect to consider is the small disjuncts problem. It is therefore important to consider solutions that account not only for the between-class imbalance (the imbalance between the two classes) but also for the within-class imbalance (the imbalance between the sub-clusters of each class), and to oversample the dataset by rectifying both types of imbalance simultaneously. Cluster-based oversampling has been shown to be a robust solution that takes both problems into consideration. This work sets out to study the effect and impact of combining different existing oversampling methods with a clustering-based approach. Empirical results of extensive experiments show that combining different oversampling techniques with the k-means clustering algorithm (K-Means Oversampling) improves upon the classification results obtained from the oversampling techniques alone, with no prior clustering step.