
    Klasifikasi Teks dengan Menggunakan Improved K-Nearest Neighbor Algorithm

    ABSTRACT: Classification is the process of grouping data into predefined classes according to their degree of similarity. It can also be applied to text documents, making it easier to assign entire documents to a given category. Among the many ways to perform classification, the K-Nearest Neighbor method is popular because it is easy to implement. Behind that simplicity, however, K-Nearest Neighbor has a weakness on document collections with an uneven class distribution: as the value of k grows, the larger classes increasingly dominate the smaller ones. The Improved K-Nearest Neighbor method is therefore used to overcome this weakness. Precision, recall, and F1-measure are used to evaluate the performance of K-Nearest Neighbor and Improved K-Nearest Neighbor. The results show that the Improved K-Nearest Neighbor method eliminates the domination effect of the largest category across various distributions of training documents. Keywords: classification, K-Nearest Neighbor, Improved K-Nearest Neighbor
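
    As context for the comparison described in this abstract, below is a minimal sketch of a plain K-Nearest Neighbor text classifier evaluated with precision, recall, and F1 using scikit-learn. The toy corpus, labels, and parameters are illustrative assumptions, and the paper's Improved K-Nearest Neighbor correction for large-class dominance is not reproduced here.

```python
# Minimal k-NN text classification baseline with precision/recall/F1 evaluation.
# Corpus and labels are placeholders; the paper's Improved k-NN variant (which
# corrects for dominance of large categories) is NOT implemented here.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import precision_recall_fscore_support

docs = [
    "stock markets rallied after the earnings report",
    "the team won the championship final last night",
    "central bank raised interest rates again",
    "the striker scored twice in the second half",
    "investors worry about inflation and bond yields",
    "the coach praised the defense after the match",
]
labels = ["finance", "sport", "finance", "sport", "finance", "sport"]

X = TfidfVectorizer().fit_transform(docs)           # TF-IDF document vectors
X_tr, X_te, y_tr, y_te = train_test_split(X, labels, test_size=0.33, random_state=0)

knn = KNeighborsClassifier(n_neighbors=3, metric="cosine")  # cosine distance is common for text
knn.fit(X_tr, y_tr)
pred = knn.predict(X_te)

p, r, f1, _ = precision_recall_fscore_support(y_te, pred, average="macro", zero_division=0)
print(f"precision={p:.2f} recall={r:.2f} F1={f1:.2f}")
```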

    Improved primary vertex finding for collider detectors

    Primary vertex finding for collider experiments is studied. The efficiency and precision of finding interaction vertices can be improved by advanced clustering and classification methods, such as agglomerative clustering with fast pairwise nearest neighbor search, followed by Gaussian mixture model or k-means clustering. Comment: 12 pages, 10 figures, submitted to Nucl. Instrum. Meth.
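
    A rough sketch of the two-stage idea described above, applied to synthetic 1D track z-positions: agglomerative clustering seeds the vertex candidates, and a Gaussian mixture model refines them. The data, distance threshold, and other parameters are illustrative and not taken from the paper.

```python
# Two-stage sketch: agglomerative clustering of track z-positions to seed vertex
# candidates, then Gaussian mixture refinement (k-means would also work).
# Synthetic data; all parameters are illustrative, not from the paper.
import numpy as np
from sklearn.cluster import AgglomerativeClustering
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
# Three vertices along the beam axis, each producing smeared track z0 values (mm).
true_z = np.array([-35.0, 2.0, 41.0])
z = np.concatenate([v + rng.normal(0.0, 0.8, size=30) for v in true_z]).reshape(-1, 1)

# Stage 1: agglomerative clustering with a distance cut to form seed clusters.
seed = AgglomerativeClustering(n_clusters=None, distance_threshold=5.0, linkage="single")
seed_labels = seed.fit_predict(z)
n_vertices = len(np.unique(seed_labels))

# Stage 2: refine the seeds with a Gaussian mixture model.
gmm = GaussianMixture(n_components=n_vertices, random_state=0).fit(z)
print("estimated vertex z positions (mm):", np.sort(gmm.means_.ravel()))
```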

    Rotationally Invariant Image Representation for Viewing Direction Classification in Cryo-EM

    We introduce a new rotationally invariant viewing angle classification method for identifying, among a large number of Cryo-EM projection images, similar views without prior knowledge of the molecule. Our rotationally invariant features are based on the bispectrum. Each image is denoised and compressed using steerable principal component analysis (PCA) such that rotating an image is equivalent to phase shifting the expansion coefficients. This allows us to extend the theory of the bispectrum of 1D periodic signals to 2D images. The randomized PCA algorithm is then used to efficiently reduce the dimensionality of the bispectrum coefficients, enabling fast computation of the similarity between any pair of images. The nearest neighbors provide an initial classification of similar viewing angles, so rotational alignment is performed only between images and their nearest neighbors. The initial nearest neighbor classification and alignment are further improved by a new classification method called vector diffusion maps. Our pipeline for viewing angle classification and alignment is experimentally shown to be faster and more accurate than reference-free alignment with rotationally invariant K-means clustering, MSA/MRA 2D classification, and their modern approximations.
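
    The rotation-invariance argument can be illustrated with a small numerical check: if an in-plane rotation by angle alpha multiplies the coefficient of angular frequency k by exp(i*k*alpha), then the bispectrum b(k1, k2) = c(k1) c(k2) conj(c(k1+k2)) is unchanged because the phases cancel. The coefficients below are random placeholders, not real steerable-PCA expansions of cryo-EM images.

```python
# Sketch of the rotation-invariance idea: rotating an image by alpha multiplies
# each steerable-PCA coefficient c[k] (angular frequency k) by exp(i*k*alpha),
# and the bispectrum b(k1, k2) = c[k1] * c[k2] * conj(c[k1 + k2]) is unchanged
# because the phases k1*alpha + k2*alpha - (k1 + k2)*alpha cancel.
# Coefficients here are random placeholders, not real cryo-EM expansions.
import numpy as np

rng = np.random.default_rng(1)
K = 8                                                     # max angular frequency (illustrative)
c = rng.normal(size=K + 1) + 1j * rng.normal(size=K + 1)  # c[k] for k = 0..K

def bispectrum(c):
    K = len(c) - 1
    b = []
    for k1 in range(K + 1):
        for k2 in range(K + 1 - k1):                      # keep k1 + k2 within range
            b.append(c[k1] * c[k2] * np.conj(c[k1 + k2]))
    return np.array(b)

alpha = 0.7                                               # arbitrary in-plane rotation angle
c_rot = c * np.exp(1j * np.arange(K + 1) * alpha)         # phase-shifted (rotated) coefficients

print(np.allclose(bispectrum(c), bispectrum(c_rot)))      # True: features are rotation-invariant
```

    In the pipeline described above, these bispectrum coefficients would then be compressed with randomized PCA before the nearest-neighbor search.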

    Mapping growing stock volume and forest live biomass: a case study of the Polissya region of Ukraine

    Forest inventory and biomass mapping are important tasks that require inputs from multiple data sources. In this paper we implement two methods for the Ukrainian region of Polissya: random forest (RF) for tree species prediction and k-nearest neighbors (k-NN) for growing stock volume and biomass mapping. We examined the suitability of the five-band RapidEye satellite image to predict the distribution of six tree species. The accuracy of RF is quite high: ~99% for the forest/non-forest mask and 89% for tree species prediction. Our results demonstrate that including elevation as a predictor variable in the RF model improved the performance of tree species classification. We evaluated different distance metrics for the k-NN method, including Euclidean and Mahalanobis distance, most similar neighbor (MSN), gradient nearest neighbor, and independent component analysis. MSN with four nearest neighbors (k = 4) is the most precise (according to the root-mean-square deviation) for predicting forest attributes across the study area. The k-NN method allowed us to estimate growing stock volume with an accuracy of 3 m3 ha−1 and live biomass with an accuracy of about 2 t ha−1 over the study area.
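
    The workflow splits into a classification step and an imputation step; the sketch below mirrors that split with scikit-learn on synthetic data. Band values, species codes, and hyperparameters are placeholders, and plain Euclidean k-NN with k = 4 stands in for the most-similar-neighbor (MSN) distance used in the study.

```python
# Illustrative sketch of the two mapping steps: random forest on spectral bands
# plus elevation for tree-species classification, and k-nearest neighbors for
# growing stock volume. Synthetic data; band values, species codes, and all
# hyperparameters are placeholders, and the MSN distance variant is not shown
# (plain Euclidean k-NN with k = 4 is used instead).
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.neighbors import KNeighborsRegressor

rng = np.random.default_rng(0)
n = 300
bands = rng.uniform(0.0, 1.0, size=(n, 5))           # five RapidEye-like reflectance bands
elevation = rng.uniform(120.0, 220.0, size=(n, 1))   # elevation (m) as an extra predictor
X = np.hstack([bands, elevation])

species = rng.integers(0, 6, size=n)                 # six tree-species classes (coded 0..5)
volume = 150 + 400 * bands[:, 3] + rng.normal(0, 20, size=n)   # growing stock volume, m3/ha

rf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, species)
knn = KNeighborsRegressor(n_neighbors=4).fit(X, volume)        # k = 4 as in the study

pixel = X[:1]                                        # one "pixel" to predict
print("predicted species class:", rf.predict(pixel)[0])
print("predicted volume (m3/ha):", round(float(knn.predict(pixel)[0]), 1))
```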