
    IMPLEMENTATION OF GAIN RATIO AND K-NEAREST NEIGHBOR FOR CLASSIFICATION OF STUDENT PERFORMANCE

    Predicting student performance is useful for identifying weak students and providing support to those who face difficulties. However, the approaches used by educators have not been effective enough at identifying the factors that affect student performance. The main predictor is the student's academic score, but that alone is not sufficient. Educators therefore apply Educational Data Mining (EDM) to predict student performance. K-Nearest Neighbor is often used to classify student performance because of its simplicity, but it performs poorly on high-dimensional feature sets. To overcome this weakness, the Gain Ratio is used to reduce the feature dimensionality. The experiment was carried out 10 times, with k values from 1 to 10, on the student performance dataset. These experiments yielded an average accuracy of 74.068 with K-Nearest Neighbor alone and 75.105 with Gain Ratio and K-Nearest Neighbor combined. The results show that the Gain Ratio is able to reduce the high feature dimensionality that weakens K-Nearest Neighbor, so combining Gain Ratio with K-Nearest Neighbor improves the accuracy of student performance classification compared to K-Nearest Neighbor alone.
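The Gain Ratio score the abstract relies on can be sketched in a few lines. This is a generic illustration, not the paper's code; the toy features (`studytime`, `shoesize`) and labels are invented for the example, and real use would score every dataset feature and keep the top-ranked ones before running K-Nearest Neighbor.

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy of a list of labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def gain_ratio(feature, labels):
    """Gain Ratio = Information Gain / Split Information for one categorical feature."""
    n = len(labels)
    groups = {}                      # partition the labels by feature value
    for f, y in zip(feature, labels):
        groups.setdefault(f, []).append(y)
    cond = sum(len(g) / n * entropy(g) for g in groups.values())
    info_gain = entropy(labels) - cond
    split_info = entropy(feature)    # entropy of the feature's own value distribution
    return info_gain / split_info if split_info > 0 else 0.0

# invented toy data: 'studytime' separates pass/fail perfectly, 'shoesize' not at all
labels    = ["pass", "pass", "fail", "fail", "pass", "fail"]
studytime = ["high", "high", "low",  "low",  "high", "low"]
shoesize  = ["39",   "42",   "39",   "42",   "41",   "41"]

print(gain_ratio(studytime, labels))  # 1.0 -- perfectly informative
print(gain_ratio(shoesize, labels))   # ~0 -- uninformative, drop before k-NN
```

Features scoring near zero would be dropped, shrinking the dimensionality that K-Nearest Neighbor struggles with.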

    IMPLEMENTATION OF K-NEAREST NEIGHBOR AND GINI INDEX METHOD IN CLASSIFICATION OF STUDENT PERFORMANCE

    Predicting student academic performance is one of the important applications of data mining in education. However, existing work is not sufficient to identify which factors affect student performance. Information on academic scores or learning progress alone is not enough to predict student performance and to help students and educators improve learning and teaching. K-Nearest Neighbor is a simple method for classifying student performance, but it has problems with high feature dimensionality. To solve this, the Gini Index feature-selection method is used to reduce the feature dimensions. Several experiments were conducted to obtain an optimal configuration and produce accurate classifications. Across 10 experiments with k values from 1 to 10 on the student performance dataset, the K-Nearest Neighbor method alone achieved an average accuracy of 74.068, while K-Nearest Neighbor combined with the Gini Index achieved an average accuracy of 76.516. From these results it can be concluded that the Gini Index is able to overcome the problem of high feature dimensionality in K-Nearest Neighbor, so applying K-Nearest Neighbor with the Gini Index improves the accuracy of student performance classification over K-Nearest Neighbor alone.
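Gini-Index feature selection ranks features by how much they reduce Gini impurity when the data is partitioned by their values. A minimal sketch of that scoring, with invented toy features (`absences`, `random_f`) rather than the paper's dataset:

```python
from collections import Counter

def gini(labels):
    """Gini impurity of a list of labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def gini_gain(feature, labels):
    """Impurity reduction from partitioning the labels by one categorical feature."""
    n = len(labels)
    groups = {}
    for f, y in zip(feature, labels):
        groups.setdefault(f, []).append(y)
    weighted = sum(len(g) / n * gini(g) for g in groups.values())
    return gini(labels) - weighted

# invented toy data: 'absences' is informative, 'random_f' is noise
labels   = ["pass", "pass", "fail", "fail", "pass", "fail"]
absences = ["few",  "few",  "many", "many", "few",  "many"]
random_f = ["a",    "b",    "a",    "b",    "a",    "b"]

scores = {"absences": gini_gain(absences, labels),
          "random_f": gini_gain(random_f, labels)}
# keep only the single best-ranked feature for this toy example
keep = [name for name, s in sorted(scores.items(), key=lambda kv: -kv[1])][:1]
print(keep)  # ['absences']
```

K-Nearest Neighbor would then be run on the kept features only, which is the dimensionality reduction the abstract credits for the accuracy gain.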

    APPLICATION OF THE K-NEAREST NEIGHBOR AND INFORMATION GAIN METHODS IN THE CLASSIFICATION OF STUDENT PERFORMANCE

    Education is a very important concern in the development of a country. One way to raise the quality of education is to predict student academic performance. The current approach is ineffective because evaluation is based solely on the educator's assessment of information about student learning progress. That information alone is not enough to form indicators for evaluating student performance and helping students and educators improve learning and teaching. K-Nearest Neighbor is an effective method for classifying student performance, but it has problems with large vector dimensions. This study aims to predict students' academic performance using the K-Nearest Neighbor algorithm with the Information Gain feature-selection method to reduce the vector dimensionality. Several experiments were conducted to obtain an optimal configuration and produce accurate classifications. Across 10 experiments with k values from 1 to 10 on the student performance dataset, the K-Nearest Neighbor method alone achieved an average accuracy of 74.068, while K-Nearest Neighbor combined with Information Gain achieved an average accuracy of 76.553. From these results it can be concluded that Information Gain can reduce the vector dimensionality, so applying K-Nearest Neighbor with Information Gain improves the accuracy of student performance classification over K-Nearest Neighbor alone.
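Information Gain is the unnormalized numerator of the Gain Ratio: the entropy reduction from partitioning the labels by a feature. A generic sketch (toy features `failures` and `student_id` are invented, not from the paper) that also shows why Information Gain is biased toward many-valued features, the bias that Gain Ratio corrects:

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy of a list of labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def info_gain(feature, labels):
    """Information Gain: entropy reduction from partitioning by one feature."""
    n = len(labels)
    groups = {}
    for f, y in zip(feature, labels):
        groups.setdefault(f, []).append(y)
    return entropy(labels) - sum(len(g) / n * entropy(g) for g in groups.values())

labels     = ["pass", "fail", "pass", "fail"]
failures   = ["0", "2", "0", "1"]        # genuinely informative
student_id = ["s1", "s2", "s3", "s4"]    # unique per row: spuriously "perfect"

print(info_gain(failures, labels))    # 1.0
print(info_gain(student_id, labels))  # also 1.0 -- the many-valued-feature bias
```

On real data, features would be ranked by this score and only the top-ranked ones fed to K-Nearest Neighbor.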

    COMPARISON OF THE DECISION TREE, NAIVE BAYES AND K-NEAREST NEIGHBOR METHODS IN THE CLASSIFICATION OF STUDENT PERFORMANCE

    In education, student performance is an important concern. Achieving good, high-quality student performance requires analyzing and evaluating the factors that influence it. Evaluation is still based only on the educator's assessment of information about student learning progress, which is not effective because such information is not enough to form indicators for evaluating student performance and helping students and educators improve learning and teaching. Previous studies have been conducted, but it was not yet known which method is best for classifying student performance. In this study, the Decision Tree, Naive Bayes and K-Nearest Neighbor methods were compared on the student performance dataset. The Decision Tree method achieved an accuracy of 78.85, Naive Bayes 77.69, and K-Nearest Neighbor 79.31. The comparison shows that K-Nearest Neighbor obtains the highest accuracy; it is concluded that the K-Nearest Neighbor method performs better than the Decision Tree and Naive Bayes methods.
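The winning classifier in this comparison, K-Nearest Neighbor, reduces to a majority vote among the k closest training points. A minimal sketch with invented toy features (grade average, absences); the real study used the full student performance dataset:

```python
from collections import Counter
from math import dist

def knn_predict(train_X, train_y, x, k=3):
    """Majority vote among the k nearest training points (Euclidean distance)."""
    neighbours = sorted(zip(train_X, train_y), key=lambda p: dist(p[0], x))[:k]
    return Counter(y for _, y in neighbours).most_common(1)[0][0]

# invented toy features: (grade_average, absences) -> performance label
train_X = [(85, 1), (90, 0), (88, 2), (55, 10), (60, 12), (50, 9)]
train_y = ["good", "good", "good", "poor", "poor", "poor"]

print(knn_predict(train_X, train_y, (87, 1)))   # good
print(knn_predict(train_X, train_y, (52, 11)))  # poor
```

A comparison like the paper's would fit a decision tree and a naive Bayes model on the same split and report held-out accuracy for each.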

    A similarity-based Bayesian mixture-of-experts model

    We present a new nonparametric mixture-of-experts model for multivariate regression problems, inspired by the probabilistic k-nearest neighbors algorithm. Using a conditionally specified model, predictions for out-of-sample inputs are based on similarities to each observed data point, yielding predictive distributions represented by Gaussian mixtures. Posterior inference is performed on the parameters of the mixture components as well as the distance metric using a mean-field variational Bayes algorithm accompanied by a stochastic gradient-based optimization procedure. The proposed method is especially advantageous in settings where the inputs are of relatively high dimension compared to the data size, where input-output relationships are complex, and where predictive distributions may be skewed or multimodal. Computational studies on two synthetic datasets and one dataset comprising dose statistics of radiation therapy treatment plans show that our mixture-of-experts method performs similarly to or better than a conditional Dirichlet process mixture model, both in terms of validation metrics and on visual inspection.
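The core idea, a predictive distribution that is a Gaussian mixture with one component per training point, weighted by similarity to the query input, can be caricatured in one dimension. This is a crude illustrative sketch only: it fixes the bandwidth and component variance by hand instead of learning them by variational inference as the paper does, and all data below are invented.

```python
from math import exp, pi, sqrt

def predictive_density(x, y, data, bandwidth=1.0, sigma=0.5):
    """Similarity-weighted Gaussian mixture predictive density p(y | x):
    each training pair (x_i, y_i) contributes a Gaussian component centred
    at y_i, weighted by a normalised kernel similarity between x and x_i."""
    weights = [exp(-((x - xi) ** 2) / (2 * bandwidth ** 2)) for xi, _ in data]
    total = sum(weights)
    weights = [w / total for w in weights]
    return sum(
        w * exp(-((y - yi) ** 2) / (2 * sigma ** 2)) / (sqrt(2 * pi) * sigma)
        for w, (_, yi) in zip(weights, data)
    )

# invented 1-D regression pairs (x_i, y_i)
data = [(0.0, 0.0), (1.0, 1.0), (2.0, 4.0), (3.0, 9.0)]

# near x = 1, mass concentrates around the outputs of nearby points (y ~ 1)
assert predictive_density(1.0, 1.0, data) > predictive_density(1.0, 9.0, data)
print(predictive_density(1.0, 1.0, data))
```

Because the components sit at observed outputs, the mixture can be skewed or multimodal, which is exactly the flexibility the abstract highlights.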

    Classification of Customers' Food Product Quantity Needs Using K-Means Clustering with Genetic Algorithm Optimization of Initial Cluster Centers

    PT. Harum Bakery is a company in Yogyakarta engaged in the production and distribution of bakery products. Each customer needs an irregular quantity of bread, while bread keeps for only two days. Bread more than two days old is replaced with fresh bread by the distributor, which causes losses for the company. This study applies data mining to classify customers' food product quantity needs using k-means clustering with genetic-algorithm optimization of the initial cluster centers. The study used 210 records of product sales over three weeks. The data were processed with a preprocessing stage followed by a classification stage; preprocessing included data transformation and k-means clustering. Clustering with the optimization proved more effective, as 200 of the 210 records qualified for the classification stage. Testing achieved a best accuracy of 58.50%, and five-fold cross-validation achieved an average accuracy of 50.58%, which is 2.51% higher than KNN without preprocessing.
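K-means is sensitive to its initial centers, and the abstract's genetic-algorithm optimization addresses exactly that. A minimal 1-D sketch of the combination, not the paper's implementation: candidate solutions are tuples of data indices used as initial centers, fitness is the k-means inertia reached from them, and the GA operators (one-point crossover, random-index mutation) and all data are invented for illustration.

```python
import random

def inertia(data, centers):
    """Sum of squared distances from each point to its nearest center."""
    return sum(min((p - c) ** 2 for c in centers) for p in data)

def kmeans(data, centers, iters=10):
    """Plain 1-D Lloyd's algorithm starting from the given initial centers."""
    centers = list(centers)
    for _ in range(iters):
        labels = [min(range(len(centers)), key=lambda j: (p - centers[j]) ** 2)
                  for p in data]
        for j in range(len(centers)):
            members = [p for p, l in zip(data, labels) if l == j]
            if members:
                centers[j] = sum(members) / len(members)
    return centers

def ga_initial_centers(data, k, pop=20, gens=30, seed=0):
    """GA over index tuples; fitness = inertia of k-means run from those points."""
    rng = random.Random(seed)
    def fitness(idx):
        return inertia(data, kmeans(data, [data[i] for i in idx]))
    population = [tuple(rng.sample(range(len(data)), k)) for _ in range(pop)]
    for _ in range(gens):
        population.sort(key=fitness)
        parents = population[: pop // 2]       # selection: keep the fitter half
        children = []
        while len(children) < pop - len(parents):
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, k)          # one-point crossover
            child = list(dict.fromkeys(a[:cut] + b[cut:]))
            while len(child) < k:              # mutation / repair duplicates
                child.append(rng.randrange(len(data)))
                child = list(dict.fromkeys(child))
            children.append(tuple(child))
        population = parents + children
    best = min(population, key=fitness)
    return [data[i] for i in best]

# two obvious 1-D clusters around 1 and 10
data = [1.0, 1.2, 0.8, 1.1, 10.0, 10.3, 9.7, 10.1]
centers = kmeans(data, ga_initial_centers(data, k=2))
print(sorted(round(c, 2) for c in centers))
```

In the paper's pipeline, the cluster assignments produced this way feed the subsequent classification stage.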

    Efficient model selection for probabilistic K nearest neighbour classification

    Probabilistic K-nearest neighbour (PKNN) classification has been introduced to improve on the original K-nearest neighbour (KNN) classification algorithm by explicitly modelling the uncertainty in the classification of each feature vector. However, an issue common to both KNN and PKNN is selecting the optimal number of neighbours, K. The contribution of this paper is to incorporate the uncertainty in K into the decision making, and consequently to provide improved classification through Bayesian model averaging. Indeed, assessing the uncertainty in K can be viewed as a statistical model selection problem, one of the most important technical issues in statistics and machine learning. In this paper, we develop a new functional approximation algorithm to reconstruct the density of the model order without relying on time-consuming Monte Carlo simulations. In addition, the algorithm avoids cross-validation by adopting a Bayesian framework. The performance of the proposed approaches is evaluated on several real experimental datasets.
    Science Foundation Ireland; MKE (The Ministry of Knowledge Economy), Korea
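Model averaging over K can be sketched in a simplified form: instead of the paper's functional approximation to the posterior over K, this toy version weights each K by its leave-one-out accuracy (a crude stand-in for model evidence, and contrary to the paper's goal of avoiding cross-validation) and then averages the per-K class probabilities. All data and weighting choices here are invented for illustration.

```python
from collections import Counter
from math import dist

def knn_class_probs(train, x, k, classes):
    """Class proportions among the k nearest neighbours of x."""
    neighbours = sorted(train, key=lambda p: dist(p[0], x))[:k]
    counts = Counter(y for _, y in neighbours)
    return {c: counts.get(c, 0) / k for c in classes}

def loo_score(train, k, classes):
    """Leave-one-out accuracy of k-NN: a crude stand-in for model evidence."""
    hits = 0
    for i, (x, y) in enumerate(train):
        rest = train[:i] + train[i + 1:]
        probs = knn_class_probs(rest, x, k, classes)
        hits += max(probs, key=probs.get) == y
    return hits / len(train)

def bma_predict(train, x, ks):
    """Average class probabilities across K, weighted by each K's LOO score."""
    classes = sorted({y for _, y in train})
    weights = {k: loo_score(train, k, classes) for k in ks}
    total = sum(weights.values()) or 1.0
    mixed = {c: sum(w * knn_class_probs(train, x, k, classes)[c]
                    for k, w in weights.items()) / total for c in classes}
    return max(mixed, key=mixed.get), mixed

# invented 2-D toy data: two well-separated classes
train = [((1.0, 1.0), "a"), ((1.2, 0.9), "a"), ((0.9, 1.1), "a"),
         ((4.0, 4.0), "b"), ((4.2, 3.9), "b"), ((3.8, 4.1), "b")]
label, probs = bma_predict(train, (1.1, 1.0), ks=[1, 3, 5])
print(label)  # a
```

Poorly performing values of K (here k=5, which always outvotes the held-out point's class on this tiny set) receive weight zero, so they contribute nothing to the averaged prediction, which is the intuition behind averaging over the uncertainty in K.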