98 research outputs found
k-Nearest Neighbour Classifiers: 2nd Edition (with Python examples)
Perhaps the most straightforward classifier in the arsenal of machine
learning techniques is the Nearest Neighbour Classifier -- classification is
achieved by identifying the nearest neighbours to a query example and using
those neighbours to determine the class of the query. This approach to
classification is of renewed importance because the poor run-time
performance it once suffered is much less of a problem with the
computational power now available. This paper presents an overview of
techniques for Nearest Neighbour classification, focusing on: mechanisms for
assessing similarity (distance), computational issues in identifying nearest
neighbours, and mechanisms for reducing the dimension of the data.
This paper is the second edition of a paper previously published as a
technical report. Sections on similarity measures for time series, retrieval
speed-up and intrinsic dimensionality have been added. An Appendix is
included providing access to Python code for the key methods.
Comment: 22 pages, 15 figures: an updated edition of an older tutorial on kNN
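The paper's own Python code is in its appendix; as a quick orientation, the
sketch below is a minimal, generic k-NN classifier using Euclidean distance
and majority voting. It is an illustrative reconstruction, not the paper's
code, and the function and variable names are the sketch's own.

    import numpy as np
    from collections import Counter

    def knn_predict(X_train, y_train, query, k=3):
        # Euclidean distance from the query to every training example.
        dists = np.linalg.norm(X_train - query, axis=1)
        # Indices of the k nearest neighbours.
        nearest = np.argsort(dists)[:k]
        # Majority vote over the neighbours' class labels.
        return Counter(y_train[nearest]).most_common(1)[0][0]

    X = np.array([[0.0, 0.0], [0.1, 0.2], [1.0, 1.0], [0.9, 1.1]])
    y = np.array([0, 0, 1, 1])
    print(knn_predict(X, y, np.array([0.8, 0.9]), k=3))  # -> 1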
A Bonferroni Mean Based Fuzzy K Nearest Centroid Neighbor Classifier
K-nearest neighbor (KNN) is an effective nonparametric classifier that determines the neighbors of a point based only on distance proximity. The classification performance of KNN is disadvantaged by the presence of outliers in small sample size datasets, and it deteriorates further on datasets with class imbalance. We propose a local Bonferroni Mean based Fuzzy K-Nearest Centroid Neighbor (BM-FKNCN) classifier that assigns the class label of a query sample based on the nearest local centroid mean vector, to better represent the underlying statistics of the dataset. The proposed classifier is robust towards outliers because the Nearest Centroid Neighborhood (NCN) concept also considers the spatial distribution and symmetrical placement of the neighbors. The proposed classifier can also overcome class domination by majority-class neighbors in imbalanced datasets because it averages all the centroid vectors from each class to adequately represent the distribution of the classes. The BM-FKNCN classifier is tested on datasets from the Knowledge Extraction based on Evolutionary Learning (KEEL) repository and benchmarked against classification results from the KNN, Fuzzy-KNN (FKNN), BM-FKNN and FKNCN classifiers. The experimental results show that BM-FKNCN achieves the highest overall average classification accuracy, 89.86%, of the five classifiers compared
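The abstract gives no pseudocode; the following is a hedged sketch of the
greedy Nearest Centroid Neighborhood (NCN) selection rule that the NCN
concept is commonly implemented with, in which each successive neighbour is
chosen so that the centroid of all neighbours picked so far stays as close
as possible to the query. The fuzzy membership and Bonferroni-mean
aggregation layers of the full BM-FKNCN classifier are omitted here.

    import numpy as np

    def ncn_neighbors(X, query, k=3):
        # Greedy NCN selection: each new neighbour minimises the distance
        # between the query and the centroid of the neighbours chosen so far.
        remaining = list(range(len(X)))
        chosen = []
        for _ in range(k):
            best, best_dist = None, np.inf
            for i in remaining:
                centroid = X[chosen + [i]].mean(axis=0)
                d = np.linalg.norm(centroid - query)
                if d < best_dist:
                    best, best_dist = i, d
            chosen.append(best)
            remaining.remove(best)
        return chosen

Unlike plain k-NN, the selected set need not be the k closest points; the
spatial-distribution constraint is what gives the method its robustness to
outliers.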
Pattern recognition techniques applied to rust classification in steel structures
The life and performance of a steel structure depend directly on the preparation of its surface. The restoration of steel structures such as bridges, ships and storage tanks still relies mainly on manual surface inspection methods combined with surface preparation technologies; completing a surface restoration therefore involves long project durations, high costs, and practices that are hazardous to both workers and the environment.
Advances in surface preparation technology make it essential to develop techniques that allow patch restoration of corroded steel structures in practice.
This thesis addresses the problem of classifying rusted steel surfaces. Various pattern recognition methods are studied for classifying steel surfaces less subjectively, from the perspective of corrosion over time. Our primary contribution is to show that, given appropriate features extracted from the steel surfaces, artificial neural network pattern recognition methods can classify rusted steel surfaces reliably, with reduced subjectivity, and are suitable for automation. The results provide important information about classification methods for the analysis of rusted steel surfaces
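The abstract does not specify the thesis's feature set or network
architecture; the sketch below only illustrates the general pipeline it
describes, using hypothetical feature vectors (e.g. colour and texture
statistics) and a small scikit-learn multilayer perceptron as a stand-in
for the thesis's own classifier.

    import numpy as np
    from sklearn.neural_network import MLPClassifier
    from sklearn.model_selection import train_test_split

    # Hypothetical stand-in data: each row is a feature vector extracted
    # from a steel-surface image; labels are corrosion grades. In practice
    # both would come from the inspection imagery.
    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 8))
    y = rng.integers(0, 3, size=200)  # e.g. three corrosion grades

    X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
    clf = MLPClassifier(hidden_layer_sizes=(16,), max_iter=1000,
                        random_state=0)
    clf.fit(X_tr, y_tr)
    print("held-out accuracy:", clf.score(X_te, y_te))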
Estimating Mixture Entropy with Pairwise Distances
Mixture distributions arise in many parametric and non-parametric settings --
for example, in Gaussian mixture models and in non-parametric estimation. It is
often necessary to compute the entropy of a mixture, but, in most cases, this
quantity has no closed-form expression, making some form of approximation
necessary. We propose a family of estimators based on a pairwise distance
function between mixture components, and show that this estimator class has
many attractive properties. For many distributions of interest, the proposed
estimators are efficient to compute, differentiable in the mixture parameters,
and become exact when the mixture components are clustered. We prove this
family includes lower and upper bounds on the mixture entropy. The Chernoff
α-divergence gives a lower bound when chosen as the distance function,
with the Bhattacharyya distance providing the tightest lower bound for
components that are symmetric and members of a location family. The
Kullback-Leibler divergence gives an upper bound when used as the distance
function. We provide closed-form expressions of these bounds for mixtures of
Gaussians, and discuss their applications to the estimation of mutual
information. Using numeric simulations, we then demonstrate that our bounds
are significantly tighter than well-known existing bounds. This estimator
class is
very useful in optimization problems involving maximization/minimization of
entropy and mutual information, such as MaxEnt and rate-distortion problems.
Comment: Corrects several errata in the published version, in particular in
Section V (bounds on mutual information)
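As a concrete illustration of the pairwise-distance idea, the sketch below
computes the KL-based upper bound and the Bhattacharyya-based lower bound on
the entropy of a one-dimensional Gaussian mixture, using the closed-form KL
and Bhattacharyya expressions for Gaussians. The estimator form
H_D = sum_i w_i H(p_i) - sum_i w_i ln sum_j w_j exp(-D(p_i, p_j)) follows
the paper's construction, but treat the code as an illustrative sketch
rather than the authors' reference implementation.

    import numpy as np

    def gauss_entropy(s):
        # Differential entropy of N(mu, s^2).
        return 0.5 * np.log(2 * np.pi * np.e * s**2)

    def kl(m1, s1, m2, s2):
        # Closed-form KL divergence between two 1-D Gaussians.
        return np.log(s2 / s1) + (s1**2 + (m1 - m2)**2) / (2 * s2**2) - 0.5

    def bhattacharyya(m1, s1, m2, s2):
        # Closed-form Bhattacharyya distance between two 1-D Gaussians.
        return ((m1 - m2)**2 / (4 * (s1**2 + s2**2))
                + 0.5 * np.log((s1**2 + s2**2) / (2 * s1 * s2)))

    def pairwise_bound(w, mu, sigma, dist):
        # H_D = sum_i w_i H(p_i) - sum_i w_i ln sum_j w_j exp(-D_ij)
        n = len(w)
        h = sum(w[i] * gauss_entropy(sigma[i]) for i in range(n))
        for i in range(n):
            inner = sum(w[j] * np.exp(-dist(mu[i], sigma[i],
                                            mu[j], sigma[j]))
                        for j in range(n))
            h -= w[i] * np.log(inner)
        return h

    w, mu, sigma = [0.5, 0.5], [0.0, 3.0], [1.0, 1.0]
    print("lower bound:", pairwise_bound(w, mu, sigma, bhattacharyya))
    print("upper bound:", pairwise_bound(w, mu, sigma, kl))

When the two components coincide, every pairwise distance is zero and both
bounds reduce to the component entropy, matching the exactness-under-
clustering property stated in the abstract.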
Comparing and Contrasting Clustering Analysis Methods: K-means and Vector in Partition
This paper examines the similarities and differences between two methods of exploratory cluster analysis, K-means and Vector in Partition. K-means, the traditional clustering approach, has limitations when clustering complex datasets, specifically datasets whose variables are multidimensional vectors. This is the gap the Vector in Partition (VIP) algorithm aims to fill. A novel approach for clustering multidimensional datasets of both continuous and categorical data, the VIP algorithm has preliminary results that support its ability to correctly cluster simulated datasets of genetic factors: gene expression, DNA methylation, and single nucleotide polymorphisms. After explaining both the K-means algorithm and the VIP algorithm, an example of simulated genetic data containing multidimensional vector variables will be analyzed with both algorithms. The results will then be summarized using accuracy, sensitivity, and specificity, highlighting the benefits and limitations of each clustering method
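The abstract does not describe VIP in enough detail to sketch, but the
baseline it compares against is standard; below is a minimal Lloyd's-
algorithm implementation of K-means, the conventional method the paper
contrasts with VIP. Names and the toy data are the sketch's own.

    import numpy as np

    def kmeans(X, k, iters=100, seed=0):
        rng = np.random.default_rng(seed)
        # Initialise centroids from k distinct random data points.
        centroids = X[rng.choice(len(X), size=k, replace=False)]
        for _ in range(iters):
            # Assign every point to its nearest centroid.
            labels = np.argmin(
                np.linalg.norm(X[:, None, :] - centroids[None, :, :],
                               axis=2), axis=1)
            # Recompute each centroid as the mean of its assigned points.
            new = np.array([X[labels == j].mean(axis=0)
                            if np.any(labels == j) else centroids[j]
                            for j in range(k)])
            if np.allclose(new, centroids):
                break
            centroids = new
        return labels, centroids

    # Two well-separated blobs; K-means should recover the split.
    X = np.vstack([np.random.randn(50, 2), np.random.randn(50, 2) + 4])
    labels, centroids = kmeans(X, k=2)
    print(centroids)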
- …