37 research outputs found
CLASSIFICATION OF CORN PLANT DISEASES USING VARIOUS CONVOLUTIONAL NEURAL NETWORK
Based on data from the East Java Badan Pusat Statistik (BPS) in 2020, corn production in 2019 decreased by 622,403 tons. The decline was caused by diseases that attack corn plants, which can be identified from the physical appearance of the corn leaves. This study aims to determine which of three architectures, AlexNet, LeNet, and MobileNet, performs best in detecting maize plant diseases. The dataset used in this study came from Kaggle and contains 4,188 images divided into four classes: Common Rust, Gray Leaf Spot, Blight, and Healthy. Agricultural experts from Bantul confirmed the appearance of each corn disease class. Preprocessing was carried out to balance the number of images in each class, leaving 4,000 images, which were divided into training and testing data at a ratio of 80:20. The experimental results show that the MobileNet architecture performs better than AlexNet and LeNet, with an accuracy of 83.37%, an average precision of 0.8337, and a g-mean of 0.8298. These results were validated by agricultural experts in Bantul Regency and by experienced corn farmers.
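The abstract reports a g-mean alongside accuracy without defining it. A common multiclass definition, assumed here rather than confirmed by the study, is the geometric mean of the per-class recalls. The sketch below computes it from a confusion matrix; the class names come from the abstract, but the counts are hypothetical (200 test images per class, i.e. the 800 images an 80:20 split of 4,000 would leave for testing).

```python
from math import prod

def per_class_recall(confusion):
    """Recall of each class; rows are true classes, columns are predicted classes."""
    return [row[i] / sum(row) for i, row in enumerate(confusion)]

def accuracy(confusion):
    correct = sum(confusion[i][i] for i in range(len(confusion)))
    return correct / sum(sum(row) for row in confusion)

def g_mean(confusion):
    """Geometric mean of the per-class recalls (assumed multiclass definition)."""
    recalls = per_class_recall(confusion)
    return prod(recalls) ** (1 / len(recalls))

classes = ["Common Rust", "Gray Leaf Spot", "Blight", "Healthy"]
# Hypothetical counts: 200 test images per class (rows = true, columns = predicted)
cm = [
    [180, 10, 8, 2],
    [15, 150, 30, 5],
    [10, 25, 160, 5],
    [1, 2, 3, 194],
]

for name, r in zip(classes, per_class_recall(cm)):
    print(f"{name}: recall {r:.2f}")
print(f"accuracy {accuracy(cm):.4f}, g-mean {g_mean(cm):.4f}")
```

Under this definition, with a balanced test set, accuracy equals the arithmetic mean of the per-class recalls and the geometric mean can never exceed it, so a g-mean slightly below accuracy indicates that recall is somewhat uneven across classes.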
The Effect of Class Imbalance Handling on Datasets Toward Classification Algorithm Performance
Class imbalance is a condition in which the minority class contains fewer samples than the majority class. Its effect is that minority-class samples tend to be misclassified, which degrades classification performance. Approaches to class imbalance include data-level methods, algorithm-level methods, and cost-sensitive learning; at the data level, a common method is sampling. In this study, the ADASYN, SMOTE, and SMOTE-ENN sampling methods were combined with the AdaBoost, K-Nearest Neighbor, and Random Forest classification algorithms to determine how handling class imbalance in a dataset affects classification performance. Tests were carried out on five datasets, and the evaluation criteria were accuracy, precision, true positive rate, true negative rate, and g-mean. The combination of ADASYN and Random Forest gave the best results, performing 5% to 10% better than the other models.
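SMOTE, ADASYN, and SMOTE-ENN all build on the same idea: synthesize new minority-class points by interpolating between a minority sample and one of its nearest minority-class neighbours. The plain-Python sketch below shows only that basic SMOTE interpolation step. ADASYN additionally assigns more synthetic samples to minority points surrounded by majority-class neighbours, and SMOTE-ENN follows SMOTE with Edited Nearest Neighbours cleaning; neither refinement is implemented here. The function name `smote` and the toy points are hypothetical.

```python
import random
from math import dist

def smote(minority, n_new, k=5, seed=0):
    """Create n_new synthetic points from a list of minority-class feature tuples."""
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # k nearest neighbours of `base` among the other minority samples
        others = [p for j, p in enumerate(minority) if j != i]
        neighbours = sorted(others, key=lambda p: dist(base, p))[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> neighbour, in [0, 1)
        synthetic.append(tuple(b + gap * (n - b) for b, n in zip(base, neighbour)))
    return synthetic

minority = [(0.0, 0.0), (1.0, 1.0), (2.0, 2.0), (3.0, 3.0)]
new_points = smote(minority, n_new=6, k=2)
print(new_points)
```

Because every synthetic point lies on a segment between two real minority points, the toy points above all stay on the line y = x. In practice only the training split should be oversampled, so that synthetic points never leak into the test data.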
Investigating The Effectiveness of Various Convolutional Neural Network Model Architectures for Skin Cancer Melanoma Classification
Melanoma is one of the most dangerous types of skin cancer. Since 2018, the number of skin cancer cases in the US has increased and exceeded 100,000. Melanoma is the third most common cancer in Indonesia, after womb cancer and breast cancer. Biopsy, the standard method for detecting melanoma, is costly and time-consuming. The purpose of this research is to build and compare melanoma skin cancer detection models using various Convolutional Neural Network (CNN) architectures: VGG-16, LeNet, Xception, and MobileNet. The dataset consists of 9,605 images divided into benign and malignant classes. The data were augmented to increase their quantity, then used to train the four CNN models, which were evaluated using a confusion matrix. The Xception model achieved the best accuracy and the lowest loss, with 93% accuracy, 19% loss, 93% precision, 93.5% recall, and a 93% F1-score. By comparison, VGG-16 achieved 90% accuracy and 27% loss, LeNet 89.7% accuracy and 28% loss, and MobileNet 90.8% accuracy and 22.5% loss.
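The abstract evaluates the models with a confusion matrix and reports precision, recall, and F1-score. For a binary benign/malignant task these follow directly from the four confusion-matrix counts, as in the sketch below; treating "malignant" as the positive class and the counts themselves are assumptions for illustration, not figures from the study.

```python
def binary_metrics(tp, fp, fn, tn):
    """Accuracy, precision, recall and F1 with 'malignant' as the positive class."""
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F1 is the harmonic mean of precision and recall
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}

# Hypothetical test-set counts
metrics = binary_metrics(tp=90, fp=10, fn=5, tn=95)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Swapping which class counts as positive changes precision and recall, which is why reports for more than one class usually list per-class values or an average.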
APPLYING THE ENSEMBLE METHOD TO IMPROVE CLASSIFICATION ALGORITHM PERFORMANCE ON IMBALANCED DATASETS
In data mining, researchers often overlook whether the class distribution in a dataset is balanced. This can cause serious difficulties for classification algorithms, because in theory most classifiers assume a relatively balanced distribution, so an imbalanced dataset leaves a classification algorithm performing below its potential. This study therefore applies an ensemble method with adaptive boosting to address the problem. The test results show that the ensemble method with adaptive boosting improves classification performance. Naive Bayes with Adaptive Boosting achieved an accuracy of 91.98%, a sensitivity of 91.98%, a specificity of 96.49%, and a g-mean of 94.21%. Support Vector Machine with Adaptive Boosting achieved an accuracy of 91.52%, a sensitivity of 91.52%, a specificity of 96.29%, and a g-mean of 93.88%. Decision Tree with Adaptive Boosting achieved an accuracy of 94.37%, a sensitivity of 94.37%, a specificity of 97.73%, and a g-mean of 96.03%. These results indicate that an ensemble method with Adaptive Boosting can be a solution for improving algorithm performance on imbalanced datasets.
Keywords: adaptive boosting, data mining, ensemble, class imbalance, classification
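The study pairs adaptive boosting (AdaBoost) with Naive Bayes, SVM, and Decision Tree classifiers. The core reweighting loop can be sketched in plain Python as below, using one-feature threshold rules (decision stumps) as the weak learner instead of the study's classifiers; the stumps, the function names, and the toy data are assumptions for illustration. Labels are encoded as -1 and +1.

```python
from math import exp, log

def stump_predict(x, j, thr, sign):
    """Predict `sign` when feature j is at most thr, otherwise the opposite label."""
    return sign if x[j] <= thr else -sign

def fit_stump(X, y, w):
    """Return (weighted error, feature, threshold, sign) of the best stump."""
    best = None
    for j in range(len(X[0])):
        for thr in sorted({x[j] for x in X}):
            for sign in (1, -1):
                err = sum(wi for x, yi, wi in zip(X, y, w)
                          if stump_predict(x, j, thr, sign) != yi)
                if best is None or err < best[0]:
                    best = (err, j, thr, sign)
    return best

def adaboost(X, y, rounds=10):
    """Discrete AdaBoost with decision stumps; y must contain -1/+1 labels."""
    n = len(X)
    w = [1.0 / n] * n
    ensemble = []
    for _ in range(rounds):
        err, j, thr, sign = fit_stump(X, y, w)
        if err >= 0.5:
            break  # the weak learner is no better than chance
        err = max(err, 1e-10)  # avoid division by zero for a perfect stump
        alpha = 0.5 * log((1 - err) / err)
        ensemble.append((alpha, j, thr, sign))
        # Misclassified samples gain weight, correctly classified ones lose weight
        w = [wi * exp(-alpha * yi * stump_predict(x, j, thr, sign))
             for x, yi, wi in zip(X, y, w)]
        total = sum(w)
        w = [wi / total for wi in w]
    return ensemble

def adaboost_predict(ensemble, x):
    score = sum(alpha * stump_predict(x, j, thr, sign) for alpha, j, thr, sign in ensemble)
    return 1 if score >= 0 else -1

# Hypothetical one-feature data that no single threshold separates
X = [[1], [2], [3], [4], [5], [6]]
y = [1, 1, -1, -1, 1, 1]
model = adaboost(X, y, rounds=3)
print([adaboost_predict(model, x) for x in X])
```

On this toy data the best single stump still misclassifies two of the six points, but three boosting rounds combine three stumps into an ensemble that classifies all of them correctly.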
Sentiment Analysis of Twitter Users Toward Internet Provider Services Using the Support Vector Machine Algorithm
Social media is currently a communication medium that Indonesians frequently use to express their opinions. One of the most widely used platforms is Twitter, which provides a great deal of information through tweets, and the information written in them contains data that can be processed. This study uses text mining techniques, applying the Support Vector Machine (SVM) algorithm to classify Twitter users' sentiment toward the Biznet internet service. The kernels used are the linear kernel and the RBF kernel. Testing was carried out in three scenarios: scenario 1 used 800 data points, scenario 2 used 900, and scenario 3 used 1,000, each split into 90% training data and 10% testing data. Based on the tests with the linear and RBF kernels, the following conclusions can be drawn. The SVM algorithm with either the linear or the RBF kernel achieves relatively similar accuracy, precision, and recall, so both kernels can be used effectively to determine the sentiment of Biznet internet users. In addition, across the three test scenarios with different amounts of data, SVM performance was consistent with both the RBF and the linear kernel.
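Before an SVM can use tweets, they must be turned into numeric vectors; TF-IDF weighting is a common choice, although the abstract does not say which representation was used. The plain-Python sketch below builds TF-IDF vectors (with a smoothed IDF of the form ln((1 + n) / (1 + df)) + 1, as in scikit-learn's default), evaluates the linear and RBF kernels on them, and computes the 90:10 train/test sizes of the three scenarios. The example tweets and `gamma=1.0` are hypothetical; the SVM training step itself would typically come from a library, for example scikit-learn's `SVC(kernel="linear")` or `SVC(kernel="rbf")`.

```python
import math
from collections import Counter

def tfidf(docs):
    """TF-IDF vectors (dict term -> weight) for tokenised documents, smoothed IDF."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    idf = {t: math.log((1 + n) / (1 + d)) + 1 for t, d in df.items()}
    return [{t: c / len(doc) * idf[t] for t, c in Counter(doc).items()} for doc in docs]

def linear_kernel(a, b):
    """Dot product of two sparse vectors."""
    return sum(w * b.get(t, 0.0) for t, w in a.items())

def rbf_kernel(a, b, gamma=1.0):
    """exp(-gamma * squared Euclidean distance); equals 1.0 for identical vectors."""
    terms = set(a) | set(b)
    sq_dist = sum((a.get(t, 0.0) - b.get(t, 0.0)) ** 2 for t in terms)
    return math.exp(-gamma * sq_dist)

# Hypothetical tweets, lower-cased and split on whitespace
tweets = [
    "biznet connection is fast and stable",
    "biznet is down again today",
    "stable and fast internet from biznet",
]
vectors = tfidf([t.lower().split() for t in tweets])
print(linear_kernel(vectors[0], vectors[2]), rbf_kernel(vectors[0], vectors[2]))

# 90:10 split sizes (train, test) for the three scenarios in the abstract
splits = {n: (n * 9 // 10, n - n * 9 // 10) for n in (800, 900, 1000)}
print(splits)
```

The linear kernel scores two tweets by their shared weighted terms, while the RBF kernel turns their distance into a similarity between 0 and 1; the two positive tweets above score higher with each other than either does with the complaint.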
 
Evaluation of the Decision Tree Model for Air Condition Classification on the Global Air Pollution Dataset
Air pollution is an urgent global environmental problem, with significant impacts on public health and ecosystem stability. This research aims to develop an air quality classification model using the Global Air Pollution dataset from Kaggle, which consists of 23,463 rows and 12 features, including important variables such as the Air Quality Index (AQI), PM2.5, NO2, and O3. Decision Tree, Random Forest, and Support Vector Machine (SVM) algorithms were applied to perform classification, with a focus on hyperparameter tuning to increase model accuracy. The results show that the Decision Tree gives the best result, with an accuracy of 99.89% after hyperparameter tuning using Grid Search. The SVM model's accuracy improved from 94.89% to 99.32%, while Random Forest recorded an accuracy of 96.87% with no significant improvement after tuning. Feature importance analysis identified PM2.5 and AQI as the dominant factors influencing air quality, with PM2.5 having the highest importance value of 0.93. This research confirms that machine learning can be an effective tool for classifying air pollution. It is hoped that integrating this model into a real-time air quality monitoring system can support more responsive and precise decisions in dealing with air pollution problems.
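Decision Tree classifiers choose each split by minimising an impurity measure; Gini impurity is the default criterion in scikit-learn's `DecisionTreeClassifier`, though the abstract does not say which criterion or library was used. The plain-Python sketch below finds the best threshold on a single feature. The PM2.5 readings and AQI category labels are hypothetical, not taken from the Global Air Pollution dataset.

```python
from collections import Counter

def gini(labels):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def best_threshold(values, labels):
    """Return (weighted child impurity, threshold) of the best split on one feature."""
    pairs = sorted(zip(values, labels))
    best = (float("inf"), None)
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no threshold separates equal values
        thr = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [lab for _, lab in pairs[:i]]
        right = [lab for _, lab in pairs[i:]]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(pairs)
        if score < best[0]:
            best = (score, thr)
    return best

pm25 = [5, 12, 20, 35, 60, 80, 150, 200]
category = ["Good", "Good", "Good", "Moderate", "Moderate",
            "Unhealthy", "Unhealthy", "Unhealthy"]
print(gini(category))
score, thr = best_threshold(pm25, category)
print(f"split at PM2.5 <= {thr}, weighted Gini {score:.3f}")
```

A tree's impurity-based feature importance, such as the 0.93 reported for PM2.5, is typically the total weighted impurity decrease contributed by that feature's splits, normalised so that all importances sum to 1.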
Decision Tree-Based Balanced Bagging Model on Imbalanced Class Datasets
Classification algorithms are used very frequently to meet human needs, but previous research has often encountered obstacles when using them. One of the most common problems is imbalanced datasets. This study therefore proposes an ensemble method to address it; one well-known ensemble algorithm is bagging, and balanced bagging is implemented to improve its capability. The study compares three classification models on five datasets with different imbalance ratios (IR). The models are evaluated using balanced accuracy, geometric mean, and area under the curve (AUC). The first model performs classification with a Decision Tree (without bagging), the second with a Decision Tree with bagging, and the third with a Decision Tree with balanced bagging. Applying bagging and balanced bagging to the Decision Tree classifier improves balanced accuracy, geometric mean, and AUC. Overall, the Decision Tree + Balanced Bagging model performs best on all the datasets used.
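Balanced bagging differs from ordinary bagging in how each bootstrap sample (bag) is drawn: instead of sampling the whole dataset, it draws the same number of examples from every class, so each Decision Tree in the ensemble is trained on balanced data before their predictions are combined by voting. The plain-Python sketch below shows one common variant, in which every class is resampled with replacement down to the size of the smallest class, together with balanced accuracy (the mean of per-class recall). The labels form a hypothetical dataset with an imbalance ratio of 9; the imbalanced-learn library provides a ready-made `BalancedBaggingClassifier`, whose exact sampling scheme may differ from this sketch.

```python
import random
from collections import Counter, defaultdict

def balanced_bootstrap(labels, rng):
    """Indices of one balanced bag: each class is drawn with replacement
    as many times as the smallest class has members."""
    by_class = defaultdict(list)
    for i, lab in enumerate(labels):
        by_class[lab].append(i)
    size = min(len(idx) for idx in by_class.values())
    bag = []
    for idx in by_class.values():
        bag.extend(rng.choices(idx, k=size))
    return bag

def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recall, so the majority class cannot dominate."""
    recalls = []
    for c in sorted(set(y_true)):
        idx = [i for i, t in enumerate(y_true) if t == c]
        recalls.append(sum(y_pred[i] == c for i in idx) / len(idx))
    return sum(recalls) / len(recalls)

labels = [0] * 90 + [1] * 10  # hypothetical dataset, imbalance ratio 90/10 = 9
rng = random.Random(42)
bags = [balanced_bootstrap(labels, rng) for _ in range(5)]
print([Counter(labels[i] for i in bag) for bag in bags])

# A model that always predicts the majority class
always_majority = [0] * len(labels)
plain_accuracy = sum(p == t for p, t in zip(always_majority, labels)) / len(labels)
print(plain_accuracy, balanced_accuracy(labels, always_majority))
```

The last line shows why the study reports balanced accuracy: a model that ignores the minority class scores 0.9 on plain accuracy but only 0.5 on balanced accuracy.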
Stock Price Time Series Data Forecasting Using the Light Gradient Boosting Machine (LightGBM) Model
In stock investment, price fluctuations, the rises and falls of stock prices, are common. Because of these fluctuations, many novice investors are afraid to trade stocks. On the other hand, stocks are a type of investment that can be relied upon during disasters or economic turmoil, such as the Covid-19 pandemic that emerged in 2019. For investors to estimate stock price fluctuations, forecasting is necessary. This study builds a stock price forecasting model using the Light Gradient Boosting Machine (LightGBM) algorithm, which offers high accuracy and efficiency. The LightGBM ensemble model is used to forecast the stock price time series, with its hyperparameters optimized using Grid Search Cross Validation (GSCV). The study also compares LightGBM with other algorithms to determine which model forecasts stock price data best. Models are evaluated with the RMSE metric by comparing the original (testing) data with the predicted values. The experimental results show that the LightGBM model can compete with and outperform boosting-based forecasting models such as XGBoost, AdaBoost, and CatBoost. All models are compared on the same dataset so that the comparison is fair. Future research should pay attention to the data during preprocessing because it contains many outliers. It should also include exogenous (external) variables, which are influenced by many parties.
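Tree-based models such as LightGBM do not model time directly, so a price series is usually reframed as supervised learning: the previous few closing prices become the features for predicting the next one. The plain-Python sketch below builds such lag features, splits the data chronologically (a random split would let the model train on the future), and computes RMSE for a naive last-value baseline, a useful reference point for any model's RMSE. The window length of 3, the 80:20 split, and the prices are hypothetical; the abstract does not state which features or split the study used.

```python
from math import sqrt

def make_lag_features(series, n_lags):
    """Pair each value with the n_lags values that precede it."""
    X, y = [], []
    for t in range(n_lags, len(series)):
        X.append(series[t - n_lags:t])
        y.append(series[t])
    return X, y

def rmse(actual, predicted):
    """Root mean squared error between two equal-length sequences."""
    return sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))

# Hypothetical daily closing prices
prices = [100, 102, 101, 105, 107, 106, 110, 112, 111, 115]
X, y = make_lag_features(prices, n_lags=3)

# Chronological split: every test target comes after every training target
split = int(len(X) * 0.8)
X_train, X_test = X[:split], X[split:]
y_train, y_test = y[:split], y[split:]

# Naive baseline: tomorrow's close equals today's close
naive_pred = [window[-1] for window in X_test]
print(f"naive RMSE: {rmse(y_test, naive_pred):.3f}")
```

The training pairs could then be fitted with a regressor such as `lightgbm.LGBMRegressor`; when tuning it with scikit-learn's `GridSearchCV`, passing a `TimeSeriesSplit` object as `cv` keeps every validation fold later in time than its training data.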
Stacking Algorithm for Heart Disease Classification on Imbalanced Class Datasets
Based on data from the 2018 Basic Health Research survey (Riset Kesehatan Dasar, Riskesdas), the incidence of heart and blood vessel disease has been increasing year after year. At least 15 out of every 1,000 people, or about 2,784,064 individuals in Indonesia, suffer from heart disease. Data mining is a field that can provide a solution as a tool for the early detection of heart disease. Most previous studies used a single classifier, which raises a new problem when the disease dataset is class-imbalanced: the imbalance can make a single classifier's performance suboptimal. This study therefore uses an ensemble (meta-learning) method. The tests show that the stacking algorithm achieves better accuracy, TPR, TNR, G-Mean, and AUC than the single classifiers. With these improvements, this study is expected to serve as a reference for developing systems that support and maximize the success of early heart disease detection using data mining.
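The abstract compares stacking with single classifiers on TPR, TNR, G-Mean, and AUC. The plain-Python sketch below shows how these metrics can be computed from a classifier's predicted probabilities: AUC as the probability that a randomly chosen positive case is scored above a randomly chosen negative one (ties count half), and G-Mean as the square root of TPR × TNR at a decision threshold. The scores, labels, and the 0.5 threshold are hypothetical; the stacking model itself is not reproduced here.

```python
from math import sqrt

def auc(scores, labels):
    """ROC AUC via pairwise comparison of positive and negative scores."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def tpr_tnr_gmean(scores, labels, threshold=0.5):
    """TPR, TNR and their geometric mean at a fixed decision threshold."""
    preds = [1 if s >= threshold else 0 for s in scores]
    tp = sum(p == 1 and y == 1 for p, y in zip(preds, labels))
    fn = sum(p == 0 and y == 1 for p, y in zip(preds, labels))
    tn = sum(p == 0 and y == 0 for p, y in zip(preds, labels))
    fp = sum(p == 1 and y == 0 for p, y in zip(preds, labels))
    tpr, tnr = tp / (tp + fn), tn / (tn + fp)
    return tpr, tnr, sqrt(tpr * tnr)

# Hypothetical predicted probabilities of heart disease (1 = disease)
scores = [0.9, 0.8, 0.35, 0.6, 0.3, 0.2, 0.1]
labels = [1, 1, 1, 0, 0, 0, 0]
print(auc(scores, labels), tpr_tnr_gmean(scores, labels))
```

Unlike TPR, TNR, and G-Mean, AUC does not depend on the chosen threshold, which is one reason it is often reported for imbalanced data.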
COMPARISON OF ENSEMBLE METHODS FOR DECISION TREE MODELS IN CLASSIFYING E. COLI BACTERIA
Certain strains of Escherichia coli (E. coli) can cause serious illness, so identifying dangerous strains with high accuracy is a priority for public health and food safety. However, traditional machine learning methods such as Decision Trees are often not robust enough to handle the complexity of biological data. This research systematically evaluates seven ensemble methods, namely AdaBoost, Gradient Boosting, XGBoost, LightGBM, Random Forest, Bagging, and Stacking, on a dataset of 336 E. coli samples with eight biological features. The models are evaluated on accuracy, precision, recall, and F1 score, with parameter optimization to obtain the best results. XGBoost performs best, with accuracy, recall, and F1 score of 88% and precision of 87%, outperforming the other methods. A strength of this research is its comprehensive, simultaneous comparison of multiple ensemble methods, combined with confusion matrix-based evaluation to verify the results. The ensemble approach also proved more effective at handling complex data patterns and reducing bias in bacterial strain classification. These findings contribute a practical framework for improving laboratory diagnostics and public health surveillance with machine learning-based solutions that are faster, more reliable, and applicable in both industrial and clinical environments. This research expands understanding of the potential of ensemble methods in microbiological data classification and provides new directions for modern diagnostic technology.
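Gradient Boosting, XGBoost, and LightGBM share one core loop: start from a constant prediction, then repeatedly fit a small tree to the negative gradient of the loss and add a scaled copy of it to the model. The plain-Python sketch below shows that loop for binary classification with log loss, using depth-1 trees (stumps) and a plain gradient step. XGBoost additionally uses second-order (Hessian) information and regularisation, and LightGBM uses histogram-based split finding; neither is included. The learning rate, number of rounds, and toy data are hypothetical, not the study's tuned parameters.

```python
from math import exp, log

def sigmoid(z):
    return 1.0 / (1.0 + exp(-z))

def fit_regression_stump(X, r):
    """Least-squares stump over all features: split at one threshold and
    predict the mean residual on each side."""
    best = None
    for j in range(len(X[0])):
        # the largest value is skipped because it would leave the right side empty
        for thr in sorted({row[j] for row in X})[:-1]:
            left = [ri for row, ri in zip(X, r) if row[j] <= thr]
            right = [ri for row, ri in zip(X, r) if row[j] > thr]
            lm, rm = sum(left) / len(left), sum(right) / len(right)
            sse = sum((ri - lm) ** 2 for ri in left) + sum((ri - rm) ** 2 for ri in right)
            if best is None or sse < best[0]:
                best = (sse, j, thr, lm, rm)
    _, j, thr, lm, rm = best
    return lambda row: lm if row[j] <= thr else rm

def gradient_boost(X, y, rounds=30, lr=0.5):
    """Binary gradient boosting on log loss; y holds 0/1 labels, both classes present."""
    p0 = sum(y) / len(y)
    base = log(p0 / (1 - p0))  # initial prediction: log-odds of the positive class
    stumps = []
    F = [base] * len(X)
    for _ in range(rounds):
        # Negative gradient of log loss with respect to the log-odds
        residuals = [yi - sigmoid(fi) for yi, fi in zip(y, F)]
        stump = fit_regression_stump(X, residuals)
        stumps.append(stump)
        F = [fi + lr * stump(row) for fi, row in zip(F, X)]
    return lambda row: sigmoid(base + lr * sum(s(row) for s in stumps))

# Hypothetical one-feature data with 0/1 labels
X = [[1], [2], [3], [4], [5], [6]]
y = [0, 0, 0, 1, 1, 1]
model = gradient_boost(X, y, rounds=30, lr=0.5)
print([round(model(row), 3) for row in X])
```

With `rounds=0` the model returns the base rate (0.5 here); each additional round pushes the predicted probabilities of the two groups further apart.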