1,286 research outputs found

    k-Nearest Neighbour Classifiers: 2nd Edition (with Python examples)

    Perhaps the most straightforward classifier in the arsenal of machine learning techniques is the Nearest Neighbour Classifier -- classification is achieved by identifying the nearest neighbours to a query example and using those neighbours to determine the class of the query. This approach to classification is of particular importance because issues of poor run-time performance are not such a problem these days with the computational power that is available. This paper presents an overview of techniques for Nearest Neighbour classification, focusing on mechanisms for assessing similarity (distance), computational issues in identifying nearest neighbours, and mechanisms for reducing the dimension of the data. This paper is the second edition of a paper previously published as a technical report. Sections on similarity measures for time-series, retrieval speed-up and intrinsic dimensionality have been added. An Appendix is included providing access to Python code for the key methods. Comment: 22 pages, 15 figures: An updated edition of an older tutorial on kN
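The nearest-neighbour idea summarised in this abstract can be illustrated with a minimal sketch. This is not the code from the paper's appendix; the function names `euclidean` and `knn_predict`, the toy data, and the choice of Euclidean distance with majority voting are all assumptions made for illustration.

```python
import math
from collections import Counter


def euclidean(a, b):
    """Euclidean distance between two equal-length numeric vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_predict(train_X, train_y, query, k=3):
    """Return the majority class among the k training points nearest to query."""
    # Rank every training example by its distance to the query (brute force).
    ranked = sorted(zip(train_X, train_y),
                    key=lambda pair: euclidean(pair[0], query))
    # Count the class labels of the k closest examples and take the winner.
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Toy data (hypothetical): two well-separated clusters labelled "a" and "b".
train_X = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1),
           (5.0, 5.0), (5.2, 4.9), (4.8, 5.1)]
train_y = ["a", "a", "a", "b", "b", "b"]

prediction_near_a = knn_predict(train_X, train_y, (1.1, 1.0), k=3)
prediction_near_b = knn_predict(train_X, train_y, (5.1, 5.0), k=3)
```

The brute-force ranking compares the query against every training example, which is the run-time cost the abstract refers to; the retrieval speed-up techniques it mentions aim to avoid that full scan.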

    Implementation and Scalability Analysis of Balancing Domain Decomposition Methods

    In this paper we present a detailed description of a high-performance distributed-memory implementation of balancing domain decomposition preconditioning techniques. This coverage provides a pool of implementation hints and considerations that can be very useful for scientists who are willing to tackle large-scale distributed-memory machines using these methods. In addition, the paper includes a comprehensive performance and scalability study of the resulting codes when they are applied to the solution of the Poisson problem on a large-scale multicore-based distributed-memory machine with up to 4096 cores. Well-known theoretical results guarantee the optimality (algorithmic scalability) of these preconditioning techniques for weak scaling scenarios, as they are able to keep the condition number of the preconditioned operator bounded by a constant with fixed load per core and increasing number of cores. The experimental study presented in the paper complements this mathematical analysis and answers how far these methods can go in the number of cores and the scale of the problem while still remaining within reasonable ranges of efficiency on current distributed-memory machines. Besides, for those scenarios where poor scalability is expected, the study precisely identifies, quantifies and justifies the main sources of inefficiency.

    Fuzzy rough and evolutionary approaches to instance selection


    Instance selection of linear complexity for big data

    Over recent decades, database sizes have grown considerably. Larger sizes present new challenges, because machine learning algorithms are not prepared to process such large volumes of information. Instance selection methods can alleviate this problem when the size of the data set is medium to large. However, even these methods face similar problems with very large-to-massive data sets. In this paper, two new algorithms with linear complexity for instance selection purposes are presented. Both algorithms use locality-sensitive hashing to find similarities between instances. While the complexity of conventional methods (usually quadratic, O(n^2), or log-linear, O(n log n)) means that they are unable to process large-sized data sets, the new proposal shows competitive results in terms of accuracy. Even more remarkably, it shortens execution time, as the proposal manages to reduce complexity and make it linear with respect to the data set size. The new proposal has been compared with some of the best known instance selection methods for testing and has also been evaluated on large data sets (up to a million instances). Supported by the Research Projects TIN 2011-24046 and TIN 2015-67534-P from the Spanish Ministry of Economy and Competitiveness.
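To make the idea of hashing-based instance selection concrete, here is a rough sketch of one way a single linear pass could work. It is not either of the two algorithms from the paper: the random-projection hash, the rule of keeping the first instance seen per bucket and class, and the names `make_hash` and `lsh_select` are assumptions chosen only for illustration.

```python
import random


def make_hash(dim, bucket_width, rng):
    """Build one random-projection hash: floor((a . x + b) / bucket_width)."""
    a = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    b = rng.uniform(0.0, bucket_width)

    def h(x):
        projection = sum(ai * xi for ai, xi in zip(a, x)) + b
        return int(projection // bucket_width)

    return h


def lsh_select(X, y, n_hashes=4, bucket_width=1.0, seed=0):
    """Return indices of retained instances: the first one per (bucket, class)."""
    rng = random.Random(seed)
    hashes = [make_hash(len(X[0]), bucket_width, rng) for _ in range(n_hashes)]
    seen = set()
    kept = []
    # Single pass: each instance is hashed once, so the cost grows linearly
    # with the number of instances for a fixed number of hash functions.
    for i, (x, label) in enumerate(zip(X, y)):
        key = (tuple(h(x) for h in hashes), label)
        if key not in seen:
            seen.add(key)
            kept.append(i)
    return kept


# Toy data (hypothetical): 50 copies of one point per class.
X = [(0.0, 0.0)] * 50 + [(10.0, 10.0)] * 50
y = [0] * 50 + [1] * 50
kept = lsh_select(X, y)
```

Because nearby instances tend to fall in the same bucket, no pairwise distance computation is needed, which is what avoids the quadratic cost mentioned in the abstract.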

    Biometric Authentication using Nonparametric Methods

    Physiological and behavioral traits are employed to develop biometric authentication systems. The proposed work deals with the authentication of iris and signature based on minimum variance criteria. The iris patterns are preprocessed based on the area of the connected components. The segmented image used for authentication consists of the region with large variations in the gray level values. The image region is split into quadtree components. The components with minimum variance are determined from the training samples. Hu moments are applied to the components. The summation of moment values corresponding to the minimum variance components is provided as the input vector to k-means and fuzzy k-means classifiers. The best performance was obtained for the MMU database consisting of 45 subjects. The number of subjects with zero False Rejection Rate [FRR] was 44 and the number of subjects with zero False Acceptance Rate [FAR] was 45. This paper addresses the computational load reduction in off-line signature verification based on minimal features using k-means, fuzzy k-means, k-nn, fuzzy k-nn and novel average-max approaches. An FRR of 8.13% and an FAR of 10% were achieved using the k-nn classifier. The signature is a biometric in which variation between genuine samples is naturally expected: certain parts of a genuine signature vary from one instance to another. The system aims to be simple, fast and robust, using fewer features than state-of-the-art works. Comment: 20 page

    The Hybrid Dynamic Prototype Construction and Parameter Optimization with Genetic Algorithm for Support Vector Machine

    The optimized hybrid artificial intelligence model is a potential tool to deal with construction engineering and management problems. Support vector machine (SVM) has achieved excellent performance in a wide variety of applications. Nevertheless, how to effectively reduce the training complexity for SVM is still a serious challenge. In this paper, a novel order-independent approach for instance selection, called the dynamic condensed nearest neighbor (DCNN) rule, is proposed to adaptively construct prototypes in the training dataset and to reduce the redundant or noisy instances in a classification process for the SVM. Furthermore, a hybrid model based on the genetic algorithm (GA) is proposed to simultaneously optimize the prototype construction and the SVM kernel parameter settings to enhance the classification accuracy. Several UCI benchmark datasets are considered to compare the proposed hybrid GA-DCNN-SVM approach with the previously published GA-based method. The experimental results illustrate that the proposed hybrid model outperforms the existing method and effectively improves the classification performance for the SVM.
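For background on prototype selection, the sketch below shows the classic condensed nearest neighbour idea (Hart's rule), which is order-dependent. It is not the DCNN rule proposed in this paper, whose details the abstract does not give; the function names and toy data are hypothetical.

```python
def nearest_label(store_X, store_y, x):
    """Label of the stored prototype closest to x (squared Euclidean distance)."""
    best = min(range(len(store_X)),
               key=lambda i: sum((a - b) ** 2 for a, b in zip(store_X[i], x)))
    return store_y[best]


def condensed_nn(X, y):
    """Classic condensing: keep only instances the current store misclassifies."""
    keep = [0]  # Seed the store with the first instance (hence order-dependent).
    changed = True
    while changed:
        changed = False
        for i in range(len(X)):
            if i in keep:
                continue
            store_X = [X[j] for j in keep]
            store_y = [y[j] for j in keep]
            # Add an instance only if the current prototypes get it wrong.
            if nearest_label(store_X, store_y, X[i]) != y[i]:
                keep.append(i)
                changed = True
    return keep


# Toy data (hypothetical): two compact, well-separated classes.
X = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1),
     (5.0, 5.0), (5.1, 5.0), (5.0, 5.1)]
y = [0, 0, 0, 1, 1, 1]
prototypes = condensed_nn(X, y)
```

On this toy set one prototype per class suffices, so a classifier trained on the retained subset sees two instances instead of six; reducing the training set in this way is the motivation the abstract gives for instance selection before SVM training.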