Jurnal Online Informatika
Improving with Hybrid Feature Selection in Software Defect Prediction
Software defect prediction (SDP) identifies defective software modules, which is a challenge in software development. This research addresses problems that occur in Particle Swarm Optimization (PSO), namely noisy attributes, high-dimensional data, and premature convergence, and aims to improve PSO performance with a hybrid feature selection technique. The technique combines filter and wrapper approaches using Chi-Square (CS), Correlation-Based Feature Selection (CFS), and Forward Selection (FS). Feature selection was chosen because it is widely used for these problems and has been shown to reduce data dimensionality and to eliminate uncorrelated attributes that introduce noise. The Naive Bayes algorithm is used as the classifier to determine the most likely class. Performance is evaluated with AUC, and significance is tested at an alpha of 0.050. The hybrid feature selection technique significantly improves PSO performance, with a significance value of 0.00342. For comparison, the significance values of the other combinations are 0.02535 for FS PSO, 0.00180 for CFS FS PSO, and 0.01186 for CS FS PSO. The method in this study contributes to improving PSO in the SDP domain by significantly increasing the AUC value. This study therefore highlights the potential of hybrid feature selection to improve PSO performance in SDP.
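The CFS criterion named above scores a feature subset by how strongly its features correlate with the class and how weakly they correlate with each other. The sketch below, assuming numeric features and a 0/1 defect label, pairs that merit with a greedy forward search. The function names and toy data are hypothetical, and the search is driven by the CFS merit rather than by a classifier, so it omits the chi-square filter, the Naive Bayes wrapper, and the PSO stage of the study. It illustrates the idea rather than reproducing the authors' pipeline.

```python
import math
from itertools import combinations


def pearson(xs, ys):
    """Pearson correlation; returns 0.0 when either variable is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return 0.0 if sx == 0 or sy == 0 else cov / (sx * sy)


def cfs_merit(subset, features, label):
    """CFS merit: k * mean|r_cf| / sqrt(k + k(k-1) * mean|r_ff|)."""
    k = len(subset)
    r_cf = sum(abs(pearson(features[f], label)) for f in subset) / k
    pairs = list(combinations(subset, 2))
    r_ff = (
        sum(abs(pearson(features[a], features[b])) for a, b in pairs) / len(pairs)
        if pairs
        else 0.0
    )
    return k * r_cf / math.sqrt(k + k * (k - 1) * r_ff)


def forward_cfs(features, label, tol=1e-9):
    """Greedy forward search: add the feature that most raises the merit,
    stopping when no candidate improves it by more than tol."""
    selected, best = [], 0.0
    remaining = list(range(len(features)))
    while remaining:
        candidate = max(remaining, key=lambda f: cfs_merit(selected + [f], features, label))
        merit = cfs_merit(selected + [candidate], features, label)
        if merit <= best + tol:
            break
        selected.append(candidate)
        remaining.remove(candidate)
        best = merit
    return selected
```

With a label-correlated feature, an exact duplicate of it, and an uncorrelated feature, the search keeps only the first one: the duplicate adds no merit (it is redundant), and the uncorrelated feature lowers it.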
Water Level Time Series Forecasting Using TCN Study Case in Surabaya
Climate change is causing water levels to rise, leading to harmful effects such as tidal flooding in coastal areas. Surabaya, the capital of East Java Province in Indonesia, is particularly vulnerable because of its low-lying location. According to the Meteorological, Climatological, and Geophysical Agency (BMKG), tidal flooding occurs annually in Surabaya as a result of rising water levels, highlighting the urgent need for water level forecasting models to mitigate these impacts. In this study, we apply the Temporal Convolutional Network (TCN) machine learning model to water level forecasting using data from a sea level monitoring station in Surabaya. We trained TCN models for 14-day forecasts under three training-data scenarios of 3, 6, and 8 months; the 8-month scenario yielded the best results. We then used the 8-month training data to produce 1-, 3-, 7-, and 14-day forecasts with TCN, Transformer, and Recurrent Neural Network (RNN) models. TCN consistently outperformed the other models, particularly in 1-day forecasting, with a coefficient of determination (R²) of 0.9950 and an RMSE of 0.0487.
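The building block that distinguishes a TCN is the dilated causal convolution: each output depends only on the current and earlier time steps, and stacking layers with growing dilations widens the history the model can see. The sketch below shows that operation for a single channel in plain Python. The kernel weights, the dilation schedule 1, 2, 4, 8, and the assumption of two convolutions per residual block are illustrative choices, not the configuration used in the study.

```python
def causal_dilated_conv(x, weights, dilation=1):
    """Single-channel dilated causal convolution.

    y[t] = sum_i weights[i] * x[t - i * dilation]; samples before t = 0 are
    treated as zeros (equivalent to left zero-padding), so y[t] never uses
    future values and the output has the same length as the input.
    """
    y = []
    for t in range(len(x)):
        acc = 0.0
        for i, w in enumerate(weights):
            j = t - i * dilation
            if j >= 0:
                acc += w * x[j]
        y.append(acc)
    return y


def receptive_field(kernel_size, dilations, convs_per_block=2):
    """Number of time steps (including the current one) visible to the top layer,
    assuming each residual block holds convs_per_block convolutions with the same dilation."""
    return 1 + convs_per_block * (kernel_size - 1) * sum(dilations)
```

For example, with kernel size 3 and dilations 1, 2, 4, 8, `receptive_field` gives 61 time steps, so the number of blocks and the dilation schedule determine how much past water level history a single forecast can draw on.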
Modeling Face Detection Application Using Convolutional Neural Network and Face-API for Effective and Efficient Online Attendance Tracking
The COVID-19 pandemic emergency has ended, but it brought a new way of life, and education, like every other aspect of life, has changed. During the pandemic, many governments closed in-person classes and moved teaching online to prevent infection. Online classes raised several problems, one of which was tracking student attendance to ensure that every student was present. Teachers needed extra effort for this because they had to call students one by one, which wastes time and sometimes misses students who are actually attending. To make attendance tracking in online classes effective, efficient, accurate, and less time-consuming for teachers, we propose a face detection model that combines face-api.js and a CNN to detect and recognize students' faces, so that teachers can track attendance simply by uploading a screenshot of the online meeting application. We evaluated the model for accuracy and speed. With 3 images of each student's face as training data, the model recognized faces with 100% accuracy in 41.65 seconds, faster than calling students one by one, which takes about 3 to 5 minutes when there are many students. Future research could focus on improving detection of students' faces under different brightness, contrast, and saturation, since students join online classes from different places and under different conditions.
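Recognition in pipelines like this one typically compares a face descriptor (an embedding vector produced by a CNN) against reference descriptors for each enrolled student. The sketch below assumes the descriptors have already been extracted (face-api.js, for example, produces 128-dimensional descriptors). It averages each student's reference descriptors and marks a detected face as that student when the nearest average lies within a Euclidean distance threshold. The 0.6 threshold, the helper names, and the toy 2-D vectors are assumptions for illustration, not the authors' code.

```python
import math


def mean_vector(vectors):
    """Element-wise mean of equally sized vectors."""
    return [sum(values) / len(vectors) for values in zip(*vectors)]


def build_gallery(reference_descriptors):
    """Map each student to the mean of that student's reference descriptors."""
    return {name: mean_vector(vectors) for name, vectors in reference_descriptors.items()}


def match_face(descriptor, gallery, threshold=0.6):
    """Return the closest student's name, or None if nobody is within the threshold."""
    name, distance = min(
        ((n, math.dist(descriptor, ref)) for n, ref in gallery.items()),
        key=lambda item: item[1],
    )
    return name if distance <= threshold else None


def mark_attendance(detected_descriptors, gallery, threshold=0.6):
    """Collect the students recognized among the faces detected in one screenshot."""
    present = set()
    for descriptor in detected_descriptors:
        name = match_face(descriptor, gallery, threshold)
        if name is not None:
            present.add(name)
    return present
```

Students in the gallery who are missing from the returned set would be marked absent. Faces that match nobody (for example, a guest in the meeting) are ignored rather than forced onto the nearest student.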
Deep Learning Based LSTM Model for Predicting the Number of Passengers for Public Transport Bus Operators
Public bus transportation systems have low reliability and a limited ability to predict passenger numbers. Bus operators' predictions of passenger numbers are still inaccurate, which prevents them from implementing effective solutions. We propose a deep learning prediction model based on LSTM to predict passenger numbers for 4 public bus operators (Go Bus, New Zealand Bus, Pavlovich, and Ritchies), evaluated with MSLE, MAPE, and SMAPE while varying the number of epochs, the batch size, and the number of neurons. The dataset is a CSV performance report of Auckland Transport (AT) New Zealand metro bus patronage from 1 January 2019 to 31 July 2023. The best prediction model, with the lowest evaluation values and a relatively fast training time, was obtained with 60 epochs, a batch size of 16, and 32 neurons. The prediction results on the training and testing data improved as the model was tuned. The proposed model forecasts the next 12 months for all 4 operators simultaneously, and the predicted fluctuations occur at the same times. There is a strong negative correlation between New Zealand Bus and Pavlovich, and a strong positive correlation between Go Bus and both Ritchies and Pavlovich. The predictions for New Zealand Bus are less closely related to, and less dependent on, those for Go Bus, Pavlovich, and Ritchies. The proposed prediction model can serve as a basis for operator policies and strategies to deal with passenger fluctuations and for the development of new prediction models.
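The three error measures used to compare the LSTM configurations can be written directly from their standard definitions. The sketch below assumes plain lists of actual and predicted passenger counts. Note that MAPE is undefined when an actual value is zero, and that SMAPE has several variants in the literature (the one below divides by the mean of the absolute actual and predicted values), so the exact formulas used in the study may differ.

```python
import math


def msle(actual, predicted):
    """Mean squared logarithmic error; assumes non-negative values."""
    return sum((math.log1p(a) - math.log1p(p)) ** 2 for a, p in zip(actual, predicted)) / len(actual)


def mape(actual, predicted):
    """Mean absolute percentage error, in percent; undefined if any actual value is 0."""
    return 100 * sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)


def smape(actual, predicted):
    """Symmetric MAPE, in percent; a pair where both values are 0 contributes 0."""
    total = 0.0
    for a, p in zip(actual, predicted):
        denom = (abs(a) + abs(p)) / 2
        total += 0.0 if denom == 0 else abs(p - a) / denom
    return 100 * total / len(actual)
```

For example, with actual counts [100, 200] and predictions [110, 180], each prediction is off by 10%, so `mape` returns 10 (up to floating-point rounding). Lower is better for all three measures.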
A Comparison of Ryu and Pox Controllers: A Parallel Implementation
Software-Defined Networking (SDN) controllers have limited capacity to handle the large volumes of data generated by switches, which can slow down their performance. This study applies parallel programming methods, namely threading, multiprocessing, and MPI, to improve controller performance when handling a large number of switches, and evaluates memory usage, CPU consumption, and execution time. The results show that Ryu outperforms POX with faster execution and lower CPU utilization, while POX uses less memory despite its higher CPU utilization. The parallel approach proves advantageous, since it improves the efficiency of both controllers. Overall, Ryu's speed and resource efficiency may make it the more strategic choice over POX in the long run. Taking into account the specific needs and requirements of a given system, this research offers guidance on selecting the most suitable controller for handling large numbers of switches efficiently.
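The comparison above rests on running the controller's event handling in parallel while recording time and memory. The sketch below is a generic Python harness, not Ryu or POX code: `handle_packet_in` is a hypothetical stand-in for a controller's packet-in handler, the events are fake (switch id, source MAC, destination MAC) tuples, and only the threading variant is shown. It uses `time.perf_counter` for execution time and `tracemalloc` for peak Python memory; CPU utilization would need an external tool and is omitted.

```python
import time
import tracemalloc
from concurrent.futures import ThreadPoolExecutor


def handle_packet_in(event):
    """Hypothetical handler: derive a flow rule (switch, match, output port) for one event."""
    switch_id, src_mac, dst_mac = event
    out_port = sum(dst_mac.encode()) % 4 + 1  # toy forwarding decision
    return (switch_id, (src_mac, dst_mac), out_port)


def run_sequential(events):
    return [handle_packet_in(e) for e in events]


def run_threaded(events, workers=4):
    # pool.map preserves input order, so results are comparable with the sequential run
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(handle_packet_in, events))


def measure(runner, events):
    """Return (results, elapsed seconds, peak traced memory in bytes)."""
    tracemalloc.start()
    start = time.perf_counter()
    results = runner(events)
    elapsed = time.perf_counter() - start
    _, peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return results, elapsed, peak
```

Because of CPython's global interpreter lock, threads mainly help when handlers wait on I/O. CPU-bound handlers generally need multiprocessing (for example `concurrent.futures.ProcessPoolExecutor`) or MPI to run truly in parallel, which is one reason to benchmark all three approaches.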
CatBoost Optimization Using Recursive Feature Elimination
CatBoost is a powerful machine learning algorithm for both classification and regression. Many studies focus on its applications, but few address how to enhance its performance, especially with Recursive Feature Elimination (RFE) as the feature selection method. This study examines CatBoost optimization for regression tasks using RFE for feature selection in combination with several regression algorithms. In preprocessing, an Isolation Forest algorithm is used to identify and remove outliers from the dataset. The experiment compares the performance of the CatBoost regression model with and without RFE feature selection. The results indicate that CatBoost with RFE, which selects features using Random Forests, performs better than the baseline model without feature selection. CatBoost-RFE outperformed the baseline with notable gains of over 48.6% in training time, 8.2% in RMSE, and 1.3% in R². It also achieved better prediction accuracy than AdaBoost, Gradient Boosting, XGBoost, and artificial neural networks (ANN). The improved CatBoost model has substantial implications for predicting exhaust temperature in a coal-fired power plant.
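Recursive Feature Elimination repeatedly fits a model on the surviving features, ranks them by importance, and drops the weakest until the target count remains. The sketch below keeps that loop generic through an `importance_fn` callback. The study ranks features with Random Forest importances; to stay dependency-free, the example plugs in absolute Pearson correlation with the target as a hypothetical stand-in, and it omits the Isolation Forest and CatBoost stages. Feature names and data are invented.

```python
import math


def abs_pearson(xs, ys):
    """|Pearson r|; returns 0.0 when either variable is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return 0.0 if sx == 0 or sy == 0 else abs(cov / (sx * sy))


def correlation_importance(subset, target):
    """Stand-in importance: |Pearson r| between each feature and the target."""
    return {name: abs_pearson(values, target) for name, values in subset.items()}


def rfe(columns, target, n_features, importance_fn=correlation_importance):
    """Eliminate one feature per round, re-scoring the survivors each time.

    Returns (selected feature names, ranking), where rank 1 means kept and
    higher ranks mean the feature was eliminated earlier.
    """
    selected = list(columns)
    ranking = {}
    while len(selected) > n_features:
        importances = importance_fn({name: columns[name] for name in selected}, target)
        weakest = min(selected, key=lambda name: importances[name])
        ranking[weakest] = len(selected) - n_features + 1
        selected.remove(weakest)
    for name in selected:
        ranking[name] = 1
    return selected, ranking
```

With a correlation stand-in the ranking never changes between rounds. The recursion only matters when the importance depends on which features remain, as with a Random Forest refit, which is what scikit-learn's `sklearn.feature_selection.RFE` does with any estimator exposing importances or coefficients.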
Cassava Diseases Classification using EfficientNet Model with Imbalance Data Handling
This research addresses the urgent need to classify cassava leaf images into five classes: Cassava Bacterial Blight (CBB), Cassava Brown Streak Disease (CBSD), Cassava Green Mottle (CGM), Cassava Mosaic Disease (CMD), and Healthy. The study proposes EfficientNet, a lightweight deep learning architecture, to classify cassava diseases from leaf images. However, the datasets available for this task are all imbalanced, which makes classification difficult. To address this imbalance, the authors compared several imbalance handling methods commonly used in image classification, namely SMOTE (Synthetic Minority Oversampling Technique), basic augmentation, and neural style transfer, applied before the data are fed into EfficientNet. Without imbalance handling, EfficientNet reaches an F1-score of 78%, with most images misclassified into the majority class. Adding SMOTE raises the F1-score to 82%, showing that oversampling improves model performance. Conversely, basic and deep learning-based data augmentation lower the F1-score to 74% and 65%, respectively, although they produce a more balanced distribution of true positives across disease classes. The findings suggest that SMOTE outperforms the other methods in handling imbalanced data.
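SMOTE, the method that performed best here, creates a synthetic minority sample by picking a minority point, choosing one of its k nearest minority neighbours, and interpolating a random fraction of the way between them. The sketch below implements that rule on plain feature vectors. Applying it to images, as in the study, assumes each image has first been flattened or embedded into a vector; the seed, k, and toy points are illustrative assumptions.

```python
import math
import random


def smote(minority, n_new, k=2, seed=0):
    """Generate n_new synthetic samples by interpolating between minority neighbours.

    Requires at least two minority samples.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        others = [p for j, p in enumerate(minority) if j != i]
        neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # uniform in [0, 1)
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic
```

Every synthetic point lies on a segment between two real minority samples, so SMOTE fills in the minority region rather than copying existing images. For production use, the imbalanced-learn package provides a tested implementation as `imblearn.over_sampling.SMOTE`.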
Machine Learning Monitoring Model for Fertilization and Irrigation to Support Sustainable Cassava Production: Systematic Literature Review
Current agronomic monitoring of fertilizer and irrigation requirements has several drawbacks: it is manual and time-consuming, fertilizer and water may be overused, cassava plantations are large, and human resources are scarce. Efforts to raise cassava yields above 40 tons per hectare include monitoring fertilization approaches or treatments, as well as water stress or drought, using UAVs and deep learning. The novel aspect of this research is a monitoring model for irrigation and fertilization to support sustainable cassava production. The study emphasizes the use of Unmanned Aerial Vehicle (UAV) imagery to evaluate the irrigation and fertilization status of cassava crops. The UAV imagery is processed by building an orthomosaic, labeling, extracting features, and building a Convolutional Neural Network (CNN) model. The outputs are then analyzed to determine water and fertilizer requirements. The review provides important new information on the application of UAV technology, multispectral imaging, and thermal imaging; the vegetation indices covered include the Soil-Adjusted Vegetation Index (SAVI), Leaf Color Index (LCI), Leaf Area Index (LAI), Normalized Difference Water Index (NDWI), Normalized Difference Red Edge Index (NDRE), and Green Normalized Difference Vegetation Index (GNDVI).
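Most of the indices listed above are simple band ratios computed per pixel from multispectral reflectance. The sketch below gives common textbook forms of four of them, assuming reflectance values in [0, 1]. NDWI is shown in the green/NIR form of McFeeters (another variant uses NIR and SWIR), SAVI uses the usual soil factor L = 0.5, and LCI and LAI are omitted because their definitions vary or require models rather than a single band formula. The reviewed studies may use different variants.

```python
def savi(nir, red, soil_factor=0.5):
    """Soil-Adjusted Vegetation Index: (NIR - Red)(1 + L) / (NIR + Red + L)."""
    return (nir - red) * (1 + soil_factor) / (nir + red + soil_factor)


def ndre(nir, red_edge):
    """Normalized Difference Red Edge index: (NIR - RedEdge) / (NIR + RedEdge)."""
    return (nir - red_edge) / (nir + red_edge)


def gndvi(nir, green):
    """Green Normalized Difference Vegetation Index: (NIR - Green) / (NIR + Green)."""
    return (nir - green) / (nir + green)


def ndwi(green, nir):
    """Normalized Difference Water Index, McFeeters form: (Green - NIR) / (Green + NIR)."""
    return (green - nir) / (green + nir)
```

Applied pixel by pixel to an orthomosaic's bands, these functions yield index maps that can serve as CNN inputs or as features for judging water and nutrient status.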
Enhancing Lung Disease Classification through K-Means Clustering, Chan-Vese Segmentation, and Canny Edge Detection on X-Ray Segmented Images
The lungs are vital organs of the human body. Besides their role in the respiratory system, they also play a part in the human circulatory system. Supporting examinations can help medical workers determine a diagnosis; a lung examination is usually complemented by a chest X-ray, which allows clinicians to see the lungs directly and assess the severity of their condition. With current technological advances, image analysis has become easy to carry out. Digital image processing methods can extract information from images that supports diagnosis in healthcare. Image segmentation divides a digital image into several segments or subgroups based on the characteristics of its pixels. In this study, K-Means clustering is applied to the segmentation results of X-ray images of three lung diseases: COVID-19, tuberculosis, and pneumonia. The segmentation methods used are the Chan-Vese method and the Canny edge detection method. Applying K-Means clustering to the Chan-Vese and Canny edge-based segmentation results achieves an accuracy of 80%.
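K-Means groups feature vectors by alternately assigning each vector to its nearest centroid and moving each centroid to the mean of its members (Lloyd's algorithm). The sketch below applies it to hypothetical 2-D feature vectors, for instance summary statistics extracted from Chan-Vese or Canny outputs. Using the first k points as initial centroids is a simplification for reproducibility; practical implementations usually use random or k-means++ initialization.

```python
import math


def kmeans(points, k, max_iter=100):
    """Lloyd's algorithm; returns (cluster label per point, centroids)."""
    centroids = [list(p) for p in points[:k]]  # simple deterministic initialization
    labels = None
    for _ in range(max_iter):
        # Assignment step: nearest centroid by Euclidean distance
        new_labels = [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points]
        if new_labels == labels:
            break  # converged: assignments no longer change
        labels = new_labels
        # Update step: move each non-empty cluster's centroid to its members' mean
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return labels, centroids
```

Cluster ids are arbitrary, so computing an accuracy such as the 80% reported above requires mapping each cluster to a disease class, for example by majority vote against the ground-truth labels.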
Data Balancing Techniques Using the PCA-KMeans and ADASYN for Possible Stroke Disease Cases
Imbalanced data occurs when classes are unequally distributed between positive and negative cases. In healthcare, the majority class typically consists of healthy patient data, while the minority class contains sick patient data. This imbalance can cause wrong predictions for the minority class because the model tends to predict the majority class. In this study, we use a deep neural network with focal loss, which handles class imbalance during training. To balance the data, we use a PCA-KMeans combination to shrink the dataset and ADASYN to generate additional synthetic samples for the minority class. The research question is how well these two techniques improve model performance, especially in classifying minority cases. Without data balancing, the best model is the mild model, with an accuracy of 84%, a class 0 F1-score of 86%, and a class 1 F1-score of 82%. With PCA-KMeans balancing, the best model is the moderate model, with an accuracy of 89%, a class 0 F1-score of 91%, and a class 1 F1-score of 85%. With ADASYN balancing, the best model is the extreme model, with an accuracy of 95%, a class 0 F1-score of 96%, and a class 1 F1-score of 96%. Of the three settings, the best model is obtained with ADASYN balancing and the extreme model, with an accuracy of 95% and a class 0 F1-score of 93%.
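ADASYN differs from plain SMOTE in how many synthetic samples it creates around each minority point: points whose k nearest neighbours are mostly majority-class (the harder cases) receive more. The sketch below computes only that allocation step, assuming numeric feature vectors, 1 as the minority label, and β = 1 (full balance). The generation step would then interpolate between minority neighbours in the SMOTE manner. Names and data are illustrative, not the study's code.

```python
import math


def adasyn_allocation(X, y, minority_label=1, k=5, beta=1.0):
    """Return {minority index: number of synthetic samples to generate around it}."""
    minority = [i for i, label in enumerate(y) if label == minority_label]
    n_majority = len(y) - len(minority)
    total_to_generate = (n_majority - len(minority)) * beta  # G in the ADASYN paper
    ratios = []
    for i in minority:
        others = [j for j in range(len(X)) if j != i]
        neighbours = sorted(others, key=lambda j: math.dist(X[i], X[j]))[:k]
        majority_neighbours = sum(1 for j in neighbours if y[j] != minority_label)
        ratios.append(majority_neighbours / k)  # difficulty ratio r_i
    total_ratio = sum(ratios)
    if total_ratio == 0:
        return {i: 0 for i in minority}  # no minority point borders the majority class
    return {i: round(r / total_ratio * total_to_generate) for i, r in zip(minority, ratios)}
```

Because the counts are rounded, they may not sum exactly to G. The imbalanced-learn package provides a complete implementation as `imblearn.over_sampling.ADASYN`.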