169 research outputs found

    Improvement of the Accuracy of Prediction Using Unsupervised Discretization Method: Educational Data Set Case Study

    This paper compares the efficacy of unsupervised and supervised discretization methods for educational data from a blended learning environment. A Naïve Bayes classifier was trained on each discretized data set, and a comparative analysis of the prediction models was conducted. The research goal was to transform numeric features into maximally independent discrete values with minimum loss of information and reduced classification error. The proposed unsupervised discretization method is based on the histogram distribution and an oversampling technique. The main contribution of this research is an improvement in prediction accuracy using the unsupervised discretization method, which reduces the effect of ignoring the class feature for the educational data set.
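To make the idea of unsupervised discretization concrete, here is a minimal sketch of plain equal-width binning, which assigns bins from the value range alone and never looks at the class label. It is not the histogram-plus-oversampling method the abstract proposes; the function name `equal_width_bins` and the sample scores are hypothetical.

```python
def equal_width_bins(values, n_bins):
    """Map each numeric value to a bin index in [0, n_bins - 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # All values identical: everything falls into a single bin.
        return [0 for _ in values]
    width = (hi - lo) / n_bins
    # The maximum value would otherwise land in bin n_bins, so clamp it
    # into the last valid bin.
    return [min(int((v - lo) / width), n_bins - 1) for v in values]


# Hypothetical course scores, used only for illustration.
scores = [35.0, 48.5, 52.0, 67.0, 71.5, 88.0, 95.0]
bins = equal_width_bins(scores, 3)
print(bins)
```

Because the bin edges depend only on the minimum and maximum, a supervised method that uses class labels to place cut points may split the same feature quite differently.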

    Multiclass Prediction Model for Student Grade Prediction Using Machine Learning

    This work was supported in part by the Ministry of Higher Education through the Fundamental Research Scheme under Grant FRGS/1/2018/ICT04/UTM/01/1, in part by the Specific Research Project (SPEV) at the Faculty of Informatics and Management, University of Hradec Kralove, Czech Republic, under Grant 2102-2021, in part by the Universiti Teknologi Malaysia (UTM) under Research University Grant Vot-20H04, and in part by the Malaysia Research University Network (MRUN) under Grant Vot 4L876. Predictive analytics applications have become an urgent need in higher educational institutions. Predictive analytics uses advanced analytics, including machine learning, to derive high-quality performance and meaningful information for all education levels. It is widely known that student grades are one of the key performance indicators that can help educators monitor students' academic performance. During the past decade, researchers have proposed many variants of machine learning techniques in education domains. However, handling imbalanced datasets remains a severe challenge for improving the prediction of student grades. Therefore, this paper presents a comprehensive analysis of machine learning techniques to predict final student grades in first-semester courses by improving predictive accuracy. Two modules are highlighted in this paper. First, we compare the accuracy of six well-known machine learning techniques, namely Decision Tree (J48), Support Vector Machine (SVM), Naïve Bayes (NB), K-Nearest Neighbor (kNN), Logistic Regression (LR) and Random Forest (RF), using a real dataset of 1282 students' course grades. Second, we propose a multiclass prediction model to reduce the overfitting and misclassification caused by imbalanced multi-classification, based on the Synthetic Minority Oversampling Technique (SMOTE) with two feature selection methods. The results show that the proposed model integrated with RF gives a significant improvement, with the highest f-measure of 99.5%. The proposed model shows comparable and promising results that can enhance prediction performance for imbalanced multi-classification in student grade prediction. Funders: Science and Technology Development Fund (STDF); Ministry of Higher Education & Scientific Research (MHESR), FRGS/1/2018/ICT04/UTM/01/1; Specific Research Project (SPEV) at the Faculty of Informatics and Management, University of Hradec Kralove, Czech Republic, 2102-2021; Universiti Teknologi Malaysia (UTM), Vot-20H04; Malaysia Research University Network (MRUN), 4L876.
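Several abstracts in this listing rely on SMOTE, so a rough sketch of its core step may help: each synthetic sample is placed at a random point on the line segment between a minority-class sample and one of its nearest minority-class neighbours. This is a simplified illustration in plain Python, not the implementation used in any of the listed papers; `smote_sketch`, its parameters and the sample points are hypothetical.

```python
import random


def smote_sketch(minority, n_new, k=2, seed=0):
    """Create n_new synthetic points by interpolating between minority samples.

    Assumes `minority` holds at least two equal-length numeric tuples.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Rank the other minority samples by squared Euclidean distance to base.
        others = [p for p in minority if p is not base]
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)))
        neighbour = rng.choice(others[:k])
        # Pick a random position on the segment from base towards neighbour.
        gap = rng.random()
        synthetic.append(tuple(a + gap * (b - a) for a, b in zip(base, neighbour)))
    return synthetic


# Hypothetical two-feature minority-class samples.
minority = [(1.0, 2.0), (1.5, 1.8), (2.0, 2.5), (1.2, 2.2)]
new_points = smote_sketch(minority, n_new=4)
print(new_points)
```

Since every synthetic point is a convex combination of two real minority samples, the new points stay inside the region the minority class already occupies. Oversampling is normally applied to the training split only, so that synthetic points do not leak into the evaluation data.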

    Application of classification models to predict students’ academic performance using classifiers ensemble and synthetic minority over sampling techniques

    The demand for data-driven decision making has led to the application of data mining in the educational sector and other disciplines. Improving the performance of data mining models has been identified as an interesting area of research globally. Higher educational institutions keep a large amount of student data, but these data are rarely used effectively in decision-making or policy-making processes. This research attempts to enhance the performance of data mining models that predict students' academic performance, using a stacking classifiers ensemble and the synthetic minority over-sampling technique. Three classifier models, J48, IBK and SMO, were trained and tested on a data set of 206 students built from previous academic performance records of Federal University Dutse, Nigeria. The WEKA 3.9.1 data mining tool was used to predict final-year students' class of degree at undergraduate level, with Unified Tertiary Matriculation Examination results, Senior Secondary Certificate Examination results and first-year Cumulative Grade Point Average as model inputs. The results showed that, on the training dataset after class balancing, the stacking classifiers ensemble model outperformed the other three classifier models in both accuracy (96.7949%) and RMSE (0.1098), suggesting that the stacking classifiers ensemble is the best model in the context of this research. Keywords: Educational Data Mining, J48, SMO, IBK, Stacking Classifiers Ensemble

    IMPROVING STUDENTS PERFORMANCE PREDICTION USING MACHINE LEARNING AND SYNTHETIC MINORITY OVERSAMPLING TECHNIQUE

    Supervised classification is the most common task performed by machine learning. Educators are increasingly concerned about rising evidence of student academic failure in university education. This study therefore presents a supervised machine learning classification strategy using an actual dataset of 44 students with fourteen attributes covering three previous academic years. We propose features that capture the relationship among three main subjects in the course: calculus, mathematical analysis, and control system. The objective of this study is to identify student failure in the control system subject, and to help improve student performance, using the Multilayer Perceptron (MLP) algorithm. The dataset is unbalanced, which causes overfitting of the results. The Synthetic Minority Oversampling Technique (SMOTE) was applied with the Weka tool to obtain a balanced dataset. Several standard metrics were used to evaluate the classifier results. The best results were obtained after applying SMOTE, with an accuracy of 76.9%.

    Predicting engineering student success using machine learning

    Recent years have seen an increase in the number of students from diverse backgrounds enrolling in South African universities, presenting many challenges. Some students struggle with their academic choices, and universities struggle to understand and address the individual needs of such a diverse student base. Fortunately, vast amounts of student information have been collected and stored, giving researchers in educational data mining an opportunity to derive useful insights from this data to help both the universities and students. This research aims to identify factors that contribute to the success or failure of a student, and then to predict the student's future performance at enrolment. Using data pre-processing techniques, the experiments identify the most significant success factors from the data available at enrolment time. The most significant factors can then be used to identify students who may need extra support, and the nature of those factors can help determine the kind of support needed. This study implemented and evaluated the effectiveness of the most commonly used and newer machine learning algorithms in predicting student performance on a sample of 1366 engineering students. The results show varying degrees of success in predicting student performance, and it is hoped that these findings will guide the selection of machine learning algorithms for future studies.

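    Penerapan SMOTE untuk Mengatasi Imbalance Class dalam Klasifikasi Television Advertisement Performance Rating Menggunakan Artificial Neural Network [English: Applying SMOTE to Address Class Imbalance in Television Advertisement Performance Rating Classification Using an Artificial Neural Network]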

    In real-world data, there are many situations in which one class has far fewer instances than the other classes. This condition is known as the imbalanced dataset (class imbalance) problem. As a result, classification performance usually degrades in some data mining applications. In this study, the TV advertisement performance rating dataset was found to have a severe class imbalance: instances with a high rating are far fewer than instances with low and medium ratings. An over-sampling method is therefore needed to address the imbalance. One method that can be used is the Synthetic Minority Over-sampling Technique (SMOTE). To validate the effectiveness of the proposed model, two experimental scenarios were run: first, the ANN algorithm was used directly for modeling without considering class imbalance; second, SMOTE over-sampling was applied to increase the number of instances until the dataset was balanced. The experimental results show that ANN+SMOTE achieved an accuracy of 87.06%, compared with 86.35% for ANN alone. Applying SMOTE was shown to address the data imbalance problem and to yield better classification results.

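    IMPLEMENTASI SMOTE UNTUK MENGATASI IMBALANCED DATA PADA SENTIMEN ANALISIS SENTIMEN HOTEL DI NUSA TENGGARA BARAT DENGAN MENGGUNAKAN ALGORITMA SVM [English: Implementing SMOTE to Address Imbalanced Data in Sentiment Analysis of Hotels in West Nusa Tenggara Using the SVM Algorithm]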

    Digital platforms that connect tourism stakeholders in Indonesia are now widely used, especially for lodging services, and dozens of inns offer a variety of facilities. Advances in machine learning have prompted many researchers to study sentiment analysis, which can be applied to the growing tourism industry. Many tourists have difficulty finding a hotel or inn that suits their needs. One way to choose is to read reviews from previous visitors, but the sheer number of reviews can itself be confusing. Sentiment analysis is an evaluation that determines a person's sentiments, emotions, expressions, and attitudes, usually from a dataset processed with machine learning. This research analyzes the Support Vector Machine (SVM) algorithm, trained with Sequential Minimal Optimization (SMO), combined with the Synthetic Minority Over-Sampling Technique (SMOTE) to classify a sentiment analysis dataset of reviews from hotel visitors in West Nusa Tenggara, collected from the Traveloka site using Scrapy. By applying an imbalanced dataset handling method, the SVM classification model is expected to be more accurate and better able to handle bias in the classification results. The SVM algorithm without SMOTE achieved an accuracy of 87.62%, and the SVM algorithm with SMOTE achieved an accuracy of 87.99%. Keywords: bias, imbalance dataset, SVM, SMOTE

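    Teknik SMOTE Sebagai Solusi Imbalance Class dalam Model Deteksi Intrusi DDoS dengan Metode PCA-Random Forest [English: SMOTE as a Solution to Class Imbalance in a DDoS Intrusion Detection Model Using the PCA-Random Forest Method]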

    Information system security is a factor that must be considered. Information system security is able to detect attacks that occur on information systems, one of which is the DDoS attack. DDoS matters because it can generate a large volume of threats that disrupt the system. DDoS attacks worldwide are increasing by 6% every year. To address this, we conducted research using a machine learning approach. The datasets used are CICDDoS 2017 and CICDDoS 2019 from the University of New Brunswick. To produce good data, SMOTE is applied to overcome class imbalance, and feature selection with PCA yields 15 selected features. Modeling is then done using the Random Forest Classifier. The results of this study are 99.94% accuracy, 99.90% precision, 99.97% recall, and 99.94% f1-score. From these results, it can be concluded that the PCA-Random Forest technique can detect DDoS attacks well. Keywords: DDoS, SMOTE, PCA-Random Forest
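The f1-score is the harmonic mean of precision and recall, so the figures reported in the abstract can be cross-checked against each other. The sketch below does only that arithmetic; the function name `f1_from_precision_recall` is hypothetical, and the inputs are the rounded percentages from the abstract, not the underlying confusion-matrix counts.

```python
def f1_from_precision_recall(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# Rounded values as reported in the abstract: 99.90% precision, 99.97% recall.
f1 = f1_from_precision_recall(0.9990, 0.9997)
print(f"{f1:.4%}")
```

The result comes out at roughly 99.93%, which agrees with the reported 99.94% to within the rounding of the two inputs.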

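    SYSTEMATIC LITERATURE REVIEW PREDIKSI KINERJA SISWA: TREN PENELITIAN, METODE, DATASET, DAN ATRIBUT [English: Systematic Literature Review of Student Performance Prediction: Research Trends, Methods, Datasets, and Attributes]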

    Student performance prediction has been studied extensively by researchers worldwide, particularly in education and data mining. Such studies invariably rely on data mining methods, especially classification. Research on student performance is widespread because it uses educational datasets, which are large and in part not yet well processed. Within this body of research, several problems remain: research trends, methods, datasets, and attributes in predicting student graduation have not been well identified. Existing trend analyses are insufficient to gauge the popularity of student performance prediction research, and identification of the most frequently used and best-performing methods, datasets, and attributes remains weak. This systematic literature review addresses that gap by identifying research trends in student performance prediction and the methods, datasets, and attributes used. The review yields an analysis of research trends by publication year and by country, together with an analysis of the most frequently used and best-performing methods, datasets, and attributes in student performance prediction. Keywords: Systematic Literature Review, Student Performance, Data Mining, Educational Dataset, Research Trends