
    Enhance the Accuracy of k-Nearest Neighbor (k-NN) for Unbalanced Class Data Using Synthetic Minority Oversampling Technique (SMOTE) and Gain Ratio (GR)

    k-Nearest Neighbor (k-NN) achieves very good accuracy on data whose classes are distributed almost evenly, but on data with an unequal class distribution its accuracy is generally lower. In addition, k-NN does not differentiate the information carried by each attribute, so every attribute has an equal influence in determining the class of new data; it is therefore important to select attributes that apply generally to the data before the classification process. To overcome these problems, we propose a framework that uses the Synthetic Minority Oversampling Technique (SMOTE) to address the class distribution problem and Gain Ratio (GR) to perform attribute selection, producing a new dataset with a balanced class distribution and attributes that carry significant class information. The datasets used in this study include E-Coli and Glass Identification. For objective results, 10-fold cross-validation is used as the evaluation method, with k values from 1 to 10. The results show that SMOTE and GR increase the accuracy of k-NN: the largest increase, 18.5%, occurred on the Glass Identification dataset, and the smallest, 11.4%, on the E-Coli dataset. Overall, the proposed method performs better, although its precision, recall, and F1-score are not better than those of the original k-NN on the E-Coli dataset. Across all datasets, precision improved by 41.0%, recall by 43.4%, and F1-score by 41.5%.
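
    The two preprocessing steps named in this abstract can be sketched as follows. This is a minimal, standard-library illustration, not the authors' code: `smote` shows the usual SMOTE interpolation between a minority sample and one of its k nearest minority neighbours, and `gain_ratio` shows the C4.5 gain ratio (information gain divided by split information) for a discrete attribute. The function names, the `seed` parameter, and the assumption that continuous attributes (as in E-Coli and Glass) are discretized before computing the gain ratio are all hypothetical choices made for this sketch.

```python
import math
import random
from collections import Counter


def smote(minority, n_new, k=5, seed=0):
    """SMOTE-style oversampling: create n_new synthetic minority samples.

    Each synthetic point is placed at a random position on the line segment
    between a randomly chosen minority sample and one of its k nearest
    minority-class neighbours. Requires at least two minority samples.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # k nearest neighbours of `base` within the minority class (itself excluded)
        neighbours = sorted(
            (p for j, p in enumerate(minority) if j != i),
            key=lambda p: math.dist(base, p),
        )[:k]
        nb = rng.choice(neighbours)
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append([b + gap * (n - b) for b, n in zip(base, nb)])
    return synthetic


def entropy(items):
    """Shannon entropy (bits) of a sequence of discrete values."""
    n = len(items)
    return -sum((c / n) * math.log2(c / n) for c in Counter(items).values())


def gain_ratio(values, labels):
    """C4.5 gain ratio of a discrete attribute: information gain / split information."""
    n = len(labels)
    groups = {}
    for v, y in zip(values, labels):
        groups.setdefault(v, []).append(y)
    conditional = sum(len(g) / n * entropy(g) for g in groups.values())
    gain = entropy(labels) - conditional
    split_info = entropy(values)  # entropy of the attribute's own value distribution
    return gain / split_info if split_info > 0 else 0.0
```

    In a pipeline of the kind described, one would oversample each minority class until the class counts are roughly equal, rank the attributes by `gain_ratio`, keep the highest-ranked ones, and then run k-NN under 10-fold cross-validation for k = 1 to 10.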

    Adaptive kNN using Expected Accuracy for Classification of Geo-Spatial Data

    The k-Nearest Neighbor (kNN) classification approach is conceptually simple yet widely applied, since it often performs well in practical applications. However, using a global constant k does not always provide an optimal solution, e.g., for datasets with an irregular density distribution of data points. This paper proposes an adaptive kNN classifier in which k is chosen dynamically for each instance (point) to be classified, such that the expected accuracy of the classification is maximized. We define the expected accuracy as the accuracy on a set of structurally similar observations. An arbitrary similarity function can be used to find these observations; we introduce and evaluate several such functions. For the evaluation, we use five different classification tasks based on geo-spatial data, each consisting of (tens of) thousands of items. We demonstrate that the presented expected-accuracy measures can be a good estimator of kNN performance, and that the proposed adaptive kNN classifier outperforms common kNN and previously introduced adaptive kNN algorithms. We also show that the range of considered k values can be significantly reduced to speed up the algorithm without a negative influence on classification accuracy.
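
    The per-instance choice of k can be sketched as below. This is a simplified illustration under stated assumptions, not the paper's algorithm: the "structurally similar observations" are assumed here to be the m training points closest to the query (a hypothetical stand-in for the paper's similarity functions), and their expected accuracy for a given k is estimated by leave-one-out kNN. The names `adaptive_knn`, `expected_accuracy`, `m`, and `k_range` are illustrative only.

```python
import math
from collections import Counter


def knn_predict(train, query, k, exclude=None):
    """Plain majority-vote kNN over (features, label) pairs, optionally skipping one index."""
    ranked = sorted(
        (i for i in range(len(train)) if i != exclude),
        key=lambda i: math.dist(train[i][0], query),
    )
    votes = Counter(train[i][1] for i in ranked[:k])
    return votes.most_common(1)[0][0]


def expected_accuracy(train, similar_idx, k):
    """Leave-one-out kNN accuracy on the observations judged similar to the query."""
    hits = sum(
        knn_predict(train, train[i][0], k, exclude=i) == train[i][1]
        for i in similar_idx
    )
    return hits / len(similar_idx)


def adaptive_knn(train, query, k_range=range(1, 11), m=10):
    """Pick, for this query only, the k that maximises expected accuracy, then classify."""
    # Hypothetical similarity function: the m training points nearest to the query.
    similar_idx = sorted(
        range(len(train)), key=lambda i: math.dist(train[i][0], query)
    )[:m]
    # max() returns the first maximum, so ties resolve to the smallest k in k_range.
    best_k = max(k_range, key=lambda k: expected_accuracy(train, similar_idx, k))
    return knn_predict(train, query, best_k), best_k
```

    Narrowing `k_range` directly reduces the number of leave-one-out passes per query, which mirrors the abstract's point that restricting the considered k values speeds up the method.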

    Improving the Accuracy of K-Nearest Neighbor on Air Pollution Standard Index Data for Pekanbaru City

    kNN is a popular method because it is easy to exploit, generalizes well, is easy to understand, adapts to complex feature spaces, and is intuitive, attractive, effective, flexible, easy to apply, simple, and reasonably accurate. However, kNN has several weaknesses. It gives the same weight to every attribute, so irrelevant attributes affect the similarity between data points as much as relevant ones. Another problem is that kNN selects the class of the nearest neighbors by majority vote, a scheme that ignores how similar each nearest neighbor actually is and allows both tied majorities and outliers being chosen as nearest neighbors. These problems can cause misclassification and therefore low accuracy. In this study, we improve the accuracy of kNN in classifying the Air Pollution Standard Index (Index Standar Pencemaran Udara) data for Pekanbaru using attribute weighting and local mean. The results show that the proposed method increases accuracy by 2.42%, with an average accuracy of 97.09%.
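
    Combining attribute weighting with a local mean can be sketched as follows: distances are computed with a per-attribute weight, and the query is assigned to the class whose local mean (the mean of that class's k nearest members) is closest. The abstract does not say how the weights are derived, so this sketch assumes they are supplied as a hypothetical list `w`; `local_mean_knn` is an illustrative name, not the authors' implementation.

```python
import math
from collections import defaultdict


def weighted_dist(a, b, w):
    """Attribute-weighted Euclidean distance; w holds one non-negative weight per attribute."""
    return math.sqrt(sum(wi * (x - y) ** 2 for x, y, wi in zip(a, b, w)))


def local_mean_knn(X, y, query, k, w):
    """Assign query to the class whose local mean (of its k nearest members) is closest."""
    by_class = defaultdict(list)
    for xi, yi in zip(X, y):
        by_class[yi].append(xi)
    best_label, best_d = None, float("inf")
    for label, points in by_class.items():
        # k nearest members of this class under the weighted distance
        nearest = sorted(points, key=lambda p: weighted_dist(p, query, w))[:k]
        mean = [sum(col) / len(nearest) for col in zip(*nearest)]
        d = weighted_dist(mean, query, w)
        if d < best_d:
            best_label, best_d = label, d
    return best_label
```

    Because each class contributes one local mean rather than a set of votes, tied majorities cannot occur in the usual sense, and a single outlying neighbor is averaged with the class's other k - 1 nearest members instead of casting a full vote.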

    Performance Comparison of k-Nearest Neighbor and Local Mean Distance k-Nearest Neighbor on Covid-19 Image Data

    Corona Virus Disease 2019 (Covid-19) is a global pandemic that has caused many losses worldwide. A valid Covid-19 diagnosis takes a considerable amount of time, and the result is not fully accurate. One way to improve accuracy is to use image classification. k-Nearest Neighbor (kNN) is one of the most widely used classification techniques for this task, but kNN still has weaknesses. To address them, kNN is modified by adding a local mean and distance weighting; this kNN variant is known as Local Mean Distance Weight k-Nearest Neighbor (LMDWkNN). This study therefore compares the performance of the two algorithms for image classification on Covid-19 images. Performance is measured by accuracy, precision, and recall, and the test results show that LMDWkNN performs better than kNN. LMDWkNN achieves average improvements of 3.5% in accuracy, 2.89% in precision, and 3.54% in recall. Nevertheless, kNN can still perform equally well, since its performance depends heavily on the value of k used.
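
    The difference between plain kNN and a local-mean, distance-weighted variant can be sketched as below. The abstract does not give the exact LMDWkNN formula, so this assumes one plausible formulation: for each class, take its k nearest members, weight each by the inverse of its distance to the query, form the weighted local mean, and pick the class whose weighted mean is nearest. The images are assumed to be already converted to numeric feature vectors; the functions `knn` and `lmdw_knn` and the constant `EPS` are hypothetical names.

```python
import math
from collections import Counter, defaultdict

EPS = 1e-9  # guards against division by zero when a neighbour coincides with the query


def knn(X, y, q, k):
    """Plain kNN: majority vote among the k nearest samples of any class."""
    order = sorted(range(len(X)), key=lambda i: math.dist(X[i], q))[:k]
    return Counter(y[i] for i in order).most_common(1)[0][0]


def lmdw_knn(X, y, q, k):
    """Assumed LMDWkNN: nearest class by inverse-distance-weighted local mean."""
    by_class = defaultdict(list)
    for xi, yi in zip(X, y):
        by_class[yi].append(xi)
    best, best_d = None, math.inf
    for label, pts in by_class.items():
        near = sorted(pts, key=lambda p: math.dist(p, q))[:k]
        w = [1.0 / (math.dist(p, q) + EPS) for p in near]  # closer members weigh more
        mean = [sum(wi * p[j] for wi, p in zip(w, near)) / sum(w) for j in range(len(q))]
        d = math.dist(mean, q)
        if d < best_d:
            best, best_d = label, d
    return best
```

    With k = 3 and a query very close to a single "pos" sample but near two more distant "neg" samples, `knn` follows the two-to-one vote and returns "neg", while `lmdw_knn` returns "pos" because the "pos" local mean is much closer; this illustrates why the plain kNN result depends so strongly on k.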

    Machine Learning and Job Posting Classification: A Comparative Study

    In this paper, we investigate several machine learning classifiers (Multinomial Naive Bayes, Support Vector Machine, Decision Tree, K Nearest Neighbors, and Random Forest) on a text classification problem. The data we used contain real and fake job posts. We cleaned and pre-processed the data, then applied TF-IDF for feature extraction. We then implemented, trained, and evaluated each classifier using precision, recall, F-measure, and accuracy, and summarized and compared the results of each classifier with those of the others.
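
    The TF-IDF step and one of the compared classifiers (K Nearest Neighbors) can be sketched with the standard library alone. This assumes the unsmoothed variant idf(t) = ln(N / df(t)), whitespace tokenization, and cosine similarity between TF-IDF vectors; the paper's exact preprocessing and library settings are not stated, so `fit_idf`, `knn_text`, and `prf` are illustrative helpers rather than the authors' code.

```python
import math
from collections import Counter


def tokenize(text):
    return text.lower().split()


def fit_idf(docs):
    """idf(t) = ln(N / df(t)); one common TF-IDF variant (others add smoothing)."""
    n = len(docs)
    df = Counter(t for d in docs for t in set(tokenize(d)))
    return {t: math.log(n / c) for t, c in df.items()}


def tfidf(doc, idf):
    """Sparse TF-IDF vector as a dict; terms unseen during fitting are dropped."""
    tf = Counter(tokenize(doc))
    return {t: c * idf[t] for t, c in tf.items() if t in idf}


def cosine(u, v):
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def knn_text(train_docs, train_labels, query, k=1):
    """Label a query post by majority vote of its k most cosine-similar training posts."""
    idf = fit_idf(train_docs)
    vecs = [tfidf(d, idf) for d in train_docs]
    qv = tfidf(query, idf)
    ranked = sorted(range(len(vecs)), key=lambda i: cosine(qv, vecs[i]), reverse=True)[:k]
    return Counter(train_labels[i] for i in ranked).most_common(1)[0][0]


def prf(y_true, y_pred, positive):
    """Precision, recall and F1 for one class treated as positive."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1
```

    The same TF-IDF vectors can be fed to the other classifiers in the comparison; only the model that consumes them changes, which is what makes the per-classifier results comparable.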