Enhance the Accuracy of k-Nearest Neighbor (k-NN) for Unbalanced Class Data Using Synthetic Minority Oversampling Technique (SMOTE) and Gain Ratio (GR)
k-Nearest Neighbor (k-NN) achieves very good accuracy on data with a roughly balanced class distribution, but on data whose class distribution is imbalanced its accuracy is generally lower. In addition, k-NN does not treat the classes separately, meaning every class has an equal influence on the label assigned to a new instance, so it is important to rebalance the classes before classification. To overcome this problem, we propose a framework that uses the Synthetic Minority Oversampling Technique (SMOTE) to address the class-distribution problem and the Gain Ratio (GR) to perform attribute selection, producing a new dataset with a balanced class distribution and only the attributes most informative about the class. The E-Coli and Glass Identification datasets were used in this study. For objective results, 10-fold cross-validation is used as the evaluation method, with k values from 1 to 10. The results show that SMOTE and GR can increase the accuracy of the k-NN method; the largest improvement occurred on the Glass Identification dataset, with an accuracy gain of 18.5%, and the smallest on the E-Coli dataset, with a gain of 11.4%. Overall, the proposed method performs better, although its precision, recall, and F1-score do not exceed those of the original k-NN on the E-Coli dataset. Across all datasets, precision improves by 41.0%, recall by 43.4%, and F1-score by 41.5%.
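As a minimal sketch of the oversampling step described above (not the authors' implementation; the Gain Ratio attribute-selection stage is omitted and all names here are illustrative), SMOTE generates a synthetic minority sample by interpolating between a minority instance and one of its k nearest minority-class neighbours:

```python
import numpy as np

def smote_oversample(X_min, n_new, k=5, rng=None):
    """Generate n_new synthetic samples from minority-class rows X_min."""
    if rng is None:
        rng = np.random.default_rng(0)
    # Pairwise distances among minority samples only.
    d = np.linalg.norm(X_min[:, None, :] - X_min[None, :, :], axis=-1)
    # Indices of the k nearest minority neighbours (column 0 is the point itself).
    neighbours = np.argsort(d, axis=1)[:, 1:k + 1]
    synth = []
    for _ in range(n_new):
        i = rng.integers(len(X_min))                          # random minority sample
        j = neighbours[i, rng.integers(neighbours.shape[1])]  # one of its neighbours
        gap = rng.random()                                    # interpolation factor in [0, 1)
        synth.append(X_min[i] + gap * (X_min[j] - X_min[i]))
    return np.array(synth)
```

The rebalanced set (majority rows plus original and synthetic minority rows) would then feed the attribute-selection and k-NN stages of the proposed pipeline.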
Adaptive kNN using Expected Accuracy for Classification of Geo-Spatial Data
The k-Nearest Neighbor (kNN) classification approach is conceptually simple, yet widely applied since it often performs well in practical applications. However, using a global constant k does not always provide an optimal solution, e.g., for datasets with an irregular density distribution of data points. This paper proposes an adaptive kNN classifier where k is chosen dynamically for each instance (point) to be classified, such that the expected accuracy of classification is maximized. We define the expected accuracy as the accuracy of a set of structurally similar observations. An arbitrary similarity function can be used to find these observations. We introduce and evaluate different similarity functions. For the evaluation, we use five different classification tasks based on geo-spatial data. Each classification task consists of (tens of) thousands of items. We demonstrate that the presented expected-accuracy measures can be a good estimator of kNN performance, and that the proposed adaptive kNN classifier outperforms common kNN and previously introduced adaptive kNN algorithms. Also, we show that the range of considered k can be significantly reduced to speed up the algorithm without negative influence on classification accuracy.
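A simplified sketch of the idea (not the paper's algorithm; the similarity function here is plain Euclidean distance and all names and parameters are illustrative): for each candidate k, estimate the expected accuracy as the leave-one-out kNN accuracy over the m training points most similar to the query, then classify the query with the best-scoring k.

```python
import numpy as np

def adaptive_knn_predict(X, y, x_q, k_grid=(1, 3, 5, 7), m=20):
    """Pick k per query by maximizing accuracy on similar training points."""
    d_q = np.linalg.norm(X - x_q, axis=1)
    similar = np.argsort(d_q)[:m]            # structurally similar observations
    D = np.linalg.norm(X[:, None] - X[None], axis=-1)  # pairwise train distances
    best_k, best_acc = k_grid[0], -1.0
    for k in k_grid:
        hits = 0
        for i in similar:
            nn = np.argsort(D[i])[1:k + 1]   # leave-one-out neighbours of point i
            hits += np.bincount(y[nn]).argmax() == y[i]
        acc = hits / len(similar)            # "expected accuracy" for this k
        if acc > best_acc:
            best_k, best_acc = k, acc
    nn = np.argsort(d_q)[:best_k]            # classify query with the chosen k
    return np.bincount(y[nn]).argmax()
```

The full pairwise-distance and leave-one-out loops are O(n²); the paper's observation that the candidate range of k can be shrunk is what makes this practical at scale.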
Improving the Accuracy of k-Nearest Neighbor on Air Pollution Standard Index Data for the City of Pekanbaru
kNN is a popular method because it is easy to apply, generalizes well, is easy to understand, adapts to complex feature spaces, and is intuitive, effective, flexible, and simple, with reasonably good accuracy. However, kNN has several weaknesses. Among them, it gives every attribute the same weight, so irrelevant attributes influence the similarity between instances as much as relevant ones do. Another problem is that the nearest neighbors are combined by majority vote, a scheme that ignores how similar each neighbor is to the query, allows ties between classes, and lets outliers be selected as nearest neighbors. These problems can cause misclassification and hence low accuracy. This study improves the accuracy of kNN in classifying Air Pollution Standard Index data for Pekanbaru by using attribute weighting and the local mean. The results show that the proposed method improves accuracy by 2.42%, for an average accuracy of 97.09%.
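The two fixes can be sketched together (an illustrative toy, with hand-set weights rather than the weighting scheme derived in the study): distances use per-attribute weights, so irrelevant attributes contribute little, and each class is represented by the local mean of its k nearest members, so a single outlier neighbor cannot decide the vote.

```python
import numpy as np

def weighted_local_mean_knn(X, y, x_q, w, k=3):
    """Classify x_q by the nearest per-class local mean under weights w."""
    # Attribute-weighted Euclidean distance to every training point.
    d = np.sqrt(((X - x_q) ** 2 * w).sum(axis=1))
    pred, best = None, np.inf
    for c in np.unique(y):
        idx = np.where(y == c)[0]
        nearest = idx[np.argsort(d[idx])[:k]]   # k nearest members of class c
        mean_c = X[nearest].mean(axis=0)        # local mean vector of class c
        dist_c = np.sqrt(((mean_c - x_q) ** 2 * w).sum())
        if dist_c < best:
            pred, best = c, dist_c
    return pred
```

Averaging k neighbors per class before measuring distance is what damps the influence of an outlier that happens to sit close to the query.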
Performance Comparison of k-Nearest Neighbor and Local Mean Distance k-Nearest Neighbor on Covid-19 Image Data
Corona Virus Disease 2019 (Covid-19) is a global pandemic that has caused losses worldwide. A valid Covid-19 diagnosis takes considerable time, and the results are not fully accurate. One way to improve accuracy is image classification. k-Nearest Neighbor (kNN) is one of the most widely used classification techniques for this task, but it still has weaknesses. To address them, kNN is modified by adding a local mean and distance weighting, a variant known as Local Mean Distance Weight k-Nearest Neighbor (LMDWkNN). This study therefore compares the performance of the two algorithms for image classification on Covid-19 images. Performance is measured by accuracy, precision, and recall; the tests show that LMDWkNN outperforms kNN, with average improvements of 3.5% in accuracy, 2.89% in precision, and 3.54% in recall. Even so, kNN can still perform equally well, although its performance depends heavily on the value of k used.
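A quick baseline comparison in the same spirit can be run with off-the-shelf scikit-learn (note the caveats: `weights="distance"` is inverse-distance vote weighting, a related but simpler variant than the paper's LMDWkNN, and the 8x8 digits dataset stands in for the Covid-19 images, which are not available here):

```python
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Small stand-in image dataset (8x8 digit images as flat feature vectors).
X, y = load_digits(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=42)

scores = {}
for name, weights in [("plain kNN", "uniform"),
                      ("distance-weighted kNN", "distance")]:
    clf = KNeighborsClassifier(n_neighbors=5, weights=weights)
    scores[name] = clf.fit(X_tr, y_tr).score(X_te, y_te)
print(scores)
```

On easy data both variants score similarly, which mirrors the paper's closing remark that plain kNN can match the variant when k is well chosen.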
Machine Learning and Job Posting Classification: A Comparative Study
In this paper, we investigated several machine learning classifiers, namely Multinomial Naive Bayes, Support Vector Machine, Decision Tree, K-Nearest Neighbors, and Random Forest, on a text classification problem. The data we used contains real and fake job posts. We cleaned and pre-processed our data, then applied TF-IDF for feature extraction. After implementing the classifiers, we trained and evaluated them. The evaluation metrics used are precision, recall, F-measure, and accuracy. For each classifier, the results were summarized and compared with the others.
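The pipeline shape described above can be sketched in a few lines of scikit-learn (a toy sketch only: the corpus below is invented to stand in for the real/fake job-post data, and training-set accuracy replaces the paper's held-out precision/recall/F-measure evaluation):

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.svm import LinearSVC
from sklearn.tree import DecisionTreeClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.ensemble import RandomForestClassifier

# Tiny invented corpus standing in for the real/fake job-post dataset.
texts = [
    "remote data entry no experience pay upfront fee today",
    "earn thousands weekly from home send bank details now",
    "urgent hiring wire transfer registration fee required",
    "make money fast no interview just pay to start",
    "software engineer backend python postgres on-site interviews",
    "registered nurse full time benefits hospital references required",
    "accountant cpa required quarterly reporting office based",
    "teacher certified mathematics public school background check",
]
labels = [1, 1, 1, 1, 0, 0, 0, 0]  # 1 = fake post, 0 = real post

X = TfidfVectorizer().fit_transform(texts)  # TF-IDF feature extraction

classifiers = {
    "MultinomialNB": MultinomialNB(),
    "SVM": LinearSVC(),
    "DecisionTree": DecisionTreeClassifier(random_state=0),
    "kNN": KNeighborsClassifier(n_neighbors=3),
    "RandomForest": RandomForestClassifier(random_state=0),
}
# Fit each classifier and record its training-set accuracy as a smoke test.
results = {name: clf.fit(X, labels).score(X, labels)
           for name, clf in classifiers.items()}
print(results)
```

With real data, each model would instead be scored on a held-out split with precision, recall, and F-measure, as the paper does.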