PROXIES Jurnal Informatika
PREDICTING FLIGHT DELAY USING RANDOM FOREST ALGORITHM, XGBOOST ALGORITHM, AND STACKING ENSEMBLE METHOD
Flight delays are problematic for both passengers and airlines. As flight traffic volume increases, punctuality becomes more important because it significantly influences passenger satisfaction and airlines' financial performance. Many studies have predicted these delays with machine learning algorithms, and some found that combining more than one algorithm can improve the prediction results. This research therefore compares the machine learning ensemble methods bagging, boosting, and stacking for flight delay prediction, with the objective of finding the best-performing ensemble method. The dataset is ‘Flight Status Prediction’ from Kaggle, which is cleaned and modified in preprocessing steps. The dataset is then fitted to each ensemble model: the Random Forest algorithm as the bagging method, the Extreme Gradient Boosting (XGBoost) algorithm as the boosting method, and a stacking method that combines both algorithms with Random Forest as the first learner. The results are evaluated on accuracy, recall, and precision, and are obtained under two different dimensionality reduction methods: feature selection and principal component analysis (PCA). The XGBoost model performs best at predicting flight delays, with a mean accuracy above 95% under both dimensionality reduction methods, while the Stacking Ensemble method performs worst, with a mean accuracy below 92% under both.
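The three evaluation metrics named in this abstract can be illustrated with a minimal sketch. The labels below are hypothetical (1 = delayed, 0 = on time) and are not taken from the study; the helper name `binary_metrics` is also an assumption made for illustration.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Return accuracy, precision and recall for binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = len(pairs) - tp - fp - fn
    accuracy = (tp + tn) / len(pairs)
    # Guard against division by zero when a class is never predicted or absent.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall}


# Hypothetical ground truth and predictions, not data from the paper.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]
metrics = binary_metrics(y_true, y_pred)
print(metrics)
```

Reporting precision and recall alongside accuracy matters for delay data, where one class is typically much rarer than the other and accuracy alone can look high for a model that rarely predicts a delay.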
OPTIMIZING CNNS FOR FACE RECOGNITION: COMPARING WELL-KNOWN ARCHITECTURES WITH CUSTOM MODELS
This project aims to find which CNN architecture performs best for face recognition. Face recognition is useful for identifying a person whose facial data is already in a database, so a system can store someone’s facial data and use it later for other purposes. This research uses photo data in three test settings. First, 20% of the whole dataset is used for testing. Second, one type of photo is removed from training and used for testing. Third, 100 new photos of subjects facing forward while wearing glasses are used for testing. These settings show the accuracy rates the system can achieve. CNN algorithms show strong results in face recognition research, and the well-known CNN architectures VGG16 and AlexNet tend to give high accuracy according to many studies, so this research uses two well-known CNNs as its architectures. The results show that the well-known CNN architectures give better accuracy than the custom-made one.
COMPARISON BUS PASSENGER COUNTING AND GENDER DETECTION USING YOLOV8, FASTER R-CNN, AND MASK R-CNN ALGORITHM
Bus passenger data are crucial for bus agents, especially in Indonesia. With this data, bus agents can identify the traffic on each bus route. Many researchers have built systems that count and detect public transportation passengers with different algorithms. Many report that You Only Look Once (YOLO) performs best on object detection problems similar to the one in this research, while Convolutional Neural Network algorithms are also competitive at object detection. This research investigates three algorithms, You Only Look Once version 8 (YOLOv8), Faster Region-based Convolutional Neural Network (Faster R-CNN), and Mask Region-based Convolutional Neural Network (Mask R-CNN), for counting bus passengers and detecting each passenger’s gender. To find the best-performing of these three algorithms, the study uses a dataset of 408 photos of bus passengers. The research aims to analyze the resulting bus passenger data in a way that could reduce the misalignment, and to determine the best algorithm for this case.
COMPARISON K-NN REGRESSION AND SVR MSE IN 5 MOST ACTIVE STOCK BASED ON HISTORICAL DATA
This project addresses the challenge of accurate stock price prediction by comparing the prediction error of two popular machine learning algorithms, K-Nearest Neighbors Regressor (KNN-R) and Support Vector Regressor (SVR), on historical stock data. Building on previous research on stock prediction, the study evaluates the error in predicting future stock prices from features such as historical price and volume. The SVR is evaluated for its ability to fit a regression model that minimizes prediction errors. KNN Regression was hypothesized to outperform SVR in accuracy. The models are trained and tested across multiple stock datasets, and the results indicate that SVR consistently achieves superior predictive performance compared to KNN Regression. These findings highlight the importance of selecting the right algorithm for stock price prediction.
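As a rough illustration of the error measure this abstract compares, the sketch below implements a one-feature K-NN regressor and the mean squared error (MSE). The prices, the single "previous close" feature, and the function names are hypothetical assumptions, not the study's data or code.

```python
def knn_predict(train_x, train_y, query, k=3):
    """Average the targets of the k training points closest to `query`."""
    by_distance = sorted(zip(train_x, train_y), key=lambda pair: abs(pair[0] - query))
    nearest = by_distance[:k]
    return sum(y for _, y in nearest) / len(nearest)


def mean_squared_error(actual, predicted):
    """Mean of squared differences between actual and predicted values."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


# Hypothetical data: feature = previous close, target = next close.
train_x = [10.0, 11.0, 12.0, 13.0, 14.0]
train_y = [11.0, 12.0, 13.0, 14.0, 15.0]
test_x = [11.5, 13.5]
test_y = [12.0, 15.0]

predictions = [knn_predict(train_x, train_y, x, k=2) for x in test_x]
mse = mean_squared_error(test_y, predictions)
print(predictions, mse)
```

A lower MSE means predictions sit closer to the actual prices, so comparing two regressors on the same test split reduces to comparing this single number per stock.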
SOCIAL NETWORK ANALYSIS OF HUAWEI COMMUNITY PLATFORM: IDENTIFYING KEY MEMBERS AND COMMUNITY STRUCTURES THROUGH CENTRALITY MEASURES, COMMUNITY DETECTION, AND CLUSTERING ALGORITHMS
This research applies Social Network Analysis (SNA) algorithms to find the most influential people or communities on Huawei's pages on the social media platforms Facebook, Twitter, and Instagram. The social network data, titled "Huawei Social Network Data", is taken from Kaggle. Currently, many of the most effective marketing approaches run across three or more social media platforms, and large companies like Huawei do this because their reach is worldwide. However, online marketing cannot be done haphazardly, because that wastes time and money. One factor that determines whether marketing is effective is the target market, so this research aims to find a good target market by using SNA. The research uses SNA algorithms such as centrality measures, community detection, and clustering, among others. The data has 1000 columns and 1000 rows, which represent nodes and edges. The nodes are labeled with people's names, and a value of 1 indicates an edge. This analysis can identify people or communities with high potential to buy or even subscribe to Huawei products.
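One of the simplest centrality measures, degree centrality, can be sketched as follows. The abstract does not say which centrality measures were used, so this is only an illustrative assumption; the edge list and member names are hypothetical.

```python
from collections import defaultdict


def degree_centrality(edges):
    """Fraction of the other nodes each node is directly connected to."""
    neighbors = defaultdict(set)
    for a, b in edges:
        neighbors[a].add(b)
        neighbors[b].add(a)
    n = len(neighbors)
    if n < 2:
        return {node: 0.0 for node in neighbors}
    return {node: len(adj) / (n - 1) for node, adj in neighbors.items()}


# Hypothetical undirected friendship edges between community members.
edges = [("Ana", "Budi"), ("Ana", "Citra"), ("Ana", "Dewi"), ("Budi", "Citra")]
centrality = degree_centrality(edges)
most_central = max(centrality, key=centrality.get)
print(most_central, centrality)
```

Members with the highest score are connected to the largest share of the network, which is one way to shortlist candidate key members before applying community detection or clustering.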
CLASSIFICATION OF MATERNAL HEALTH RISKS USING BOOSTING TECHNIQUE
Maternal health is an important concern because it directly affects the survival and well-being of future generations. In many developing nations, maternal mortality remains a serious problem despite advances in medical science. Recognizing potential risks in pregnancy is essential for the well-being of both mother and newborn. AI models have shown promising results in various clinical applications, including disease diagnosis, treatment planning, and patient monitoring. A pregnancy can be categorized into three risk levels according to the chance of complications: low, moderate, and high. The risk is classified based on age, systolic and diastolic blood pressure, blood sugar, body temperature, and heart rate. This paper applies several boosting techniques, Boosted Random Forest, XGBoost, and CatBoost, to classify maternal health risk. Classifying maternal risk should help minimize the occurrence of maternal death, which in turn supports the continuation of humanity.
PREDICTING EMPLOYEE ATTRITION USING TABNET
High employee turnover may threaten the stability and productivity of a company's working environment. It is also more costly than retaining existing employees. The key to solving this problem is predicting employee attrition. Most previous studies used tree-based models such as Random Forest or simple deep learning models such as the Multi-Layer Perceptron. This project trains a TabNet model to predict employee attrition and compares its performance on accuracy, precision, recall, and F1 score against both a Multi-Layer Perceptron model and a Random Forest model. The study anticipated that TabNet would produce results comparable to the other models; however, TabNet performed worse than both the Random Forest and Multi-Layer Perceptron models. The Random Forest model performs best on all key metrics, followed closely by the Multi-Layer Perceptron model. The results indicate that tree-based algorithms appear to produce better outputs for predicting employee attrition in structured datasets. The findings of this research offer valuable insights for businesses aiming to improve employee retention strategies.
PERFORMANCE OF SYNTHETIC MINORITY OVER-SAMPLING TECHNIQUE ON SUPPORT VECTOR MACHINE AND K-NEAREST NEIGHBOR FOR SENTIMENT ANALYSIS OF METAVERSE IN INDONESIA
The metaverse is one of the most discussed topics on the social media platform Twitter in Indonesia. Views of it in Indonesian society can be both positive and negative, hence the need for sentiment analysis. However, building a sentiment classification model on imbalanced data reduces performance. For this reason, the Synthetic Minority Over-sampling Technique (SMOTE) is applied with the Support Vector Machine and K-Nearest Neighbor algorithms. The results show that SMOTE can improve the accuracy of both algorithms.
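The core idea of synthetic minority over-sampling is to create new minority-class points by interpolating between an existing minority point and one of its nearest minority neighbors. The sketch below is a simplified illustration under assumptions: the two-dimensional points stand in for vectorized tweets, and the function name and parameters are hypothetical rather than the study's implementation.

```python
import random


def smote(minority, n_new, k=2, seed=0):
    """Generate n_new synthetic points by interpolating between a minority
    point and one of its k nearest minority-class neighbors."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        others = [p for p in minority if p is not base]
        # Sort the remaining minority points by squared Euclidean distance.
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)))
        neighbor = rng.choice(others[:k])
        gap = rng.random()  # position along the segment base -> neighbor
        synthetic.append(tuple(a + gap * (b - a) for a, b in zip(base, neighbor)))
    return synthetic


# Hypothetical minority-class feature vectors (e.g. two text features).
minority = [(1.0, 1.0), (1.2, 0.9), (0.8, 1.1), (1.1, 1.3)]
new_points = smote(minority, n_new=3)
print(new_points)
```

Because every synthetic point lies on a segment between two real minority points, the minority class grows without simply duplicating samples. Over-sampling is normally applied to the training split only, so the test set keeps its original class balance.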
USABILITY ANALYSIS OF STABLE DIFFUSION-BASED GENERATIVE MODEL FOR ENRICHING BATIK BAKARAN PATTERN SYNTHESIS
The rapid development of technology today helps in various fields of work, and batik is one field that can benefit. This research uses Deep Learning to process images of batik patterns and typical Bakaran batik patterns with a generative model, Stable Diffusion, aiming to produce better and more detailed batik pattern images while preserving the original patterns. The research uses only datasets of batik pattern images and typical Bakaran batik patterns. The image data is first augmented by inverting the image, resizing it to 512x512, randomly rotating it, applying a random horizontal flip, and inverting it again. Pre-training on the image data is used to find the right parameters and conditions for the training process. The results show that Stable Diffusion version 1.4 and version 2.1 both perform well in processing and creating batik pattern images and typical Bakaran batik patterns. The images generated by the two versions were scored using Inception Score and CLIP Score. On CLIP Score, version 1.4 scores higher than version 2.1, for the same reason as on Inception Score: the images produced by version 1.4 are more abstract. Of the two versions, version 1.4 is therefore the one used to process batik patterns and typical Bakaran batik patterns. It shows excellent performance in processing batik pattern images, and its results are good, abstract batik patterns in accordance with the characteristics of Bakaran batik.
DIABETES PREDICTION USING SUPPORT VECTOR MACHINE AND GRADIENT DESCENT ALGORITHM
Diabetes is a health problem that can be deadly if not treated early. Early prediction of diabetes can prevent many health problems in a person's life. Using machine learning algorithms to predict whether a person has diabetes may be the best solution to this problem. A Support Vector Machine can classify a data point as positive or negative, so it can help predict whether a person has diabetes from that person's diabetes factors. Using these factors, the Support Vector Machine assigns a value to the person's data point and decides on which side of the margin it lies, positive or negative. The dataset contains 2000 patients with diabetes factors such as Pregnancies, Glucose, BloodPressure, SkinThickness, Insulin, BMI, Diabetes, and Age, with Outcome as the label to be predicted. The final results show that the Support Vector Machine is a good approach for predicting diabetes in patients. It achieved a high accuracy score of 70%, which means the model will most likely predict 7 out of 10 patients correctly.
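The abstract does not describe how gradient descent is combined with the Support Vector Machine. One common pairing, shown here purely as an assumption, is a linear SVM trained by sub-gradient descent on the regularized hinge loss. The two standardized features, the tiny dataset, and the hyperparameters are hypothetical.

```python
def train_linear_svm(samples, labels, lr=0.01, reg=0.01, epochs=200):
    """Fit weights w and bias b by sub-gradient descent on the hinge loss.
    Labels must be +1 (positive) or -1 (negative)."""
    w = [0.0] * len(samples[0])
    b = 0.0
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            margin = y * (sum(wi * xi for wi, xi in zip(w, x)) + b)
            if margin >= 1:
                # Correct side with enough margin: only shrink the weights.
                w = [wi - lr * reg * wi for wi in w]
            else:
                # Inside the margin or misclassified: also step toward y * x.
                w = [wi - lr * (reg * wi - y * xi) for wi, xi in zip(w, x)]
                b += lr * y
    return w, b


def predict(w, b, x):
    """Return +1 or -1 depending on the side of the decision boundary."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b >= 0 else -1


# Hypothetical standardized features (e.g. glucose, BMI); +1 = diabetic.
samples = [(-1.0, -0.8), (-0.7, -1.1), (-1.2, -0.5),
           (0.9, 1.0), (1.1, 0.7), (0.8, 1.2)]
labels = [-1, -1, -1, 1, 1, 1]

w, b = train_linear_svm(samples, labels)
correct = sum(predict(w, b, x) == y for x, y in zip(samples, labels))
accuracy = correct / len(samples)
print(w, b, accuracy)
```

Accuracy here is the share of correct predictions, so a score of 0.7 on a held-out set corresponds to the "7 out of 10 patients" reading in the abstract. On real data the features would need scaling first, since the hinge-loss updates are sensitive to feature magnitude.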