176 research outputs found

    k-Nearest Neighbour Classifiers: 2nd Edition (with Python examples)

    Perhaps the most straightforward classifier in the arsenal of machine learning techniques is the Nearest Neighbour Classifier -- classification is achieved by identifying the nearest neighbours to a query example and using those neighbours to determine the class of the query. This approach to classification is of particular importance because poor run-time performance is not such a problem these days, given the computational power that is available. This paper presents an overview of techniques for Nearest Neighbour classification, focusing on: mechanisms for assessing similarity (distance), computational issues in identifying nearest neighbours, and mechanisms for reducing the dimension of the data. This paper is the second edition of a paper previously published as a technical report. Sections on similarity measures for time-series, retrieval speed-up and intrinsic dimensionality have been added. An Appendix is included providing access to Python code for the key methods. Comment: 22 pages, 15 figures; an updated edition of an older tutorial on kNN.
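
    As a concrete illustration of the rule described above, the following minimal Python sketch classifies a query point by majority vote among its k nearest training points. It illustrates the general idea only, not the code from the paper's Appendix, and knn_predict is a hypothetical function name.

        import numpy as np
        from collections import Counter

        def knn_predict(X_train, y_train, x_query, k=3):
            """Classify x_query by majority vote among its k nearest training points."""
            # Euclidean distance from the query to every training example
            dists = np.linalg.norm(X_train - x_query, axis=1)
            # Indices of the k closest training examples
            nearest = np.argsort(dists)[:k]
            # Majority vote over the neighbours' labels
            return Counter(y_train[nearest].tolist()).most_common(1)[0][0]

    For example, with X_train = np.array([[0, 0], [1, 1], [5, 5]]), y_train = np.array([0, 0, 1]) and x_query = np.array([0.2, 0.2]), the call knn_predict(X_train, y_train, x_query, k=1) returns 0.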

    Extensions to rank-based prototype selection in k-Nearest Neighbour classification

    The k-nearest neighbour rule is commonly considered for classification tasks given its straightforward implementation and good performance in many applications. However, its efficiency represents an obstacle in real-case scenarios because classification requires computing a distance to every single prototype in the training set. Prototype Selection (PS) is a typical approach to alleviate this problem; it focuses on reducing the size of the training set by selecting the most interesting prototypes. In this context, rank methods have been postulated as a good solution: following some heuristics, these methods order the prototypes according to their relevance to the classification task, and this ordering is then used to select the most relevant ones. This work presents a significant improvement of existing rank methods by proposing two extensions: (i) greater robustness against noise at the label level, obtained by considering the parameter ‘k’ of the classification in the selection process; and (ii) a new parameter-free rule to select the prototypes once they have been ordered. Experiments performed in different scenarios and datasets demonstrate the benefit of these extensions. It is also shown empirically that the full new approach is competitive with existing PS algorithms. This work is supported by the Spanish Ministry HISPAMUS project TIN2017-86576-R, partially funded by the EU.
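
    To make the general idea of rank-based prototype selection concrete, here is a small Python sketch that scores prototypes with a simple, hypothetical relevance heuristic and then keeps a fixed fraction of the ranking. It does not reproduce the paper's rank methods, its noise-robust use of k, or its parameter-free selection rule; rank_prototypes and select_top_fraction are illustrative names.

        import numpy as np

        def rank_prototypes(X, y, k=3):
            """Score each prototype by how often it appears among the k nearest
            same-class neighbours of another training point (toy heuristic)."""
            n = len(X)
            scores = np.zeros(n)
            for i in range(n):
                dists = np.linalg.norm(X - X[i], axis=1)
                dists[i] = np.inf                     # exclude the point itself
                for j in np.argsort(dists)[:k]:
                    if y[j] == y[i]:
                        scores[j] += 1                # j supports classifying i correctly
            return np.argsort(scores)[::-1]           # indices, most relevant first

        def select_top_fraction(order, fraction=0.2):
            """Keep a fixed share of the ranked prototypes (unlike the paper's
            parameter-free rule, which removes this parameter)."""
            return order[:max(1, int(len(order) * fraction))]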

    k-Nearest Neighbour Classifiers - A Tutorial

    Perhaps the most straightforward classifier in the arsenal of Machine Learning techniques is the Nearest Neighbour Classifier – classification is achieved by identifying the nearest neighbours to a query example and using those neighbours to determine the class of the query. This approach to classification is of particular importance because poor run-time performance is not such a problem these days, given the computational power that is available. This paper presents an overview of techniques for Nearest Neighbour classification, focusing on: mechanisms for assessing similarity (distance), computational issues in identifying nearest neighbours, and mechanisms for reducing the dimension of the data. This paper is the second edition of a paper previously published as a technical report. Sections on similarity measures for time-series, retrieval speed-up and intrinsic dimensionality have been added. An Appendix is included providing access to Python code for the key methods.
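
    One common way to realise the dimension-reduction step mentioned above is to project the data onto a few principal components before the neighbour search. The scikit-learn sketch below is an illustration of that idea only, not the tutorial's own code; the numbers of components and neighbours are arbitrary.

        from sklearn.decomposition import PCA
        from sklearn.neighbors import KNeighborsClassifier
        from sklearn.pipeline import make_pipeline

        # Project onto 10 principal components, then classify by 5 nearest neighbours.
        model = make_pipeline(PCA(n_components=10), KNeighborsClassifier(n_neighbors=5))
        # model.fit(X_train, y_train); predictions = model.predict(X_test)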

    Space Object Identification Using Feature Space Trajectory Neural Networks

    The Feature Space Trajectory Neural Network (FSTNN) is a simple yet powerful pattern recognition tool developed by Neiberg and Casasent for use in an Automatic Target Recognition System. Since the FSTNN was developed, it has been used on various problems including speaker identification and space object identification. However, in these types of problems, the test set represents time series data rather than an independent set of points. Since the distance metric of the standard FSTNN treats each test point independently without regard to its position in the sequence, the FSTNN can yield less than optimal results in these problems. Two methods for incorporating sequence information into the FSTNN algorithm are presented. These methods, Dynamic Time Warping (DTW) and Uniform Time Warping (UTW), are described and compared to the standard FSTNN performance on the space object identification problem. Both reduce error induced by improper synchronization of the test and training sequences and make the FSTNN more generally applicable to a wide variety of pattern recognition problems. They incorporate sequencing information by synchronizing the test and training trajectories. DTW accomplishes this 'on-the-fly' as the sequence progresses while UTW uniformly compensates for temporal differences across the trajectories. These algorithms improve the maximum probability of false alarm (PFA) of the standard FSTNN by an average of 10.18% and 27.69%, respectively, although UTW is less consistent in its results. A metric for determining the saliency of the features in an FSTNN is also presented and demonstrated.
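
    For reference, the core of Dynamic Time Warping is a simple dynamic programme over the two sequences. The sketch below computes a textbook DTW distance between two 1-D sequences; it is a generic illustration, not the FSTNN-specific formulation evaluated in the paper.

        import numpy as np

        def dtw_distance(a, b):
            """Textbook DTW distance between two 1-D sequences a and b."""
            n, m = len(a), len(b)
            D = np.full((n + 1, m + 1), np.inf)
            D[0, 0] = 0.0
            for i in range(1, n + 1):
                for j in range(1, m + 1):
                    cost = abs(a[i - 1] - b[j - 1])
                    # Cheapest way to align the two prefixes ending at i and j
                    D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
            return D[n, m]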

    A survey on pre-processing techniques: relevant issues in the context of environmental data mining

    One of the important issues in all types of data analysis, whether statistical data analysis, machine learning, data mining, data science or any other form of data-driven modeling, is data quality. The more complex the reality to be analyzed, the higher the risk of getting low-quality data. Unfortunately, real data often contain noise, uncertainty, errors, redundancies or even irrelevant information. Useless models will be obtained when they are built over incorrect or incomplete data. As a consequence, the quality of decisions made over these models also depends on data quality. This is why pre-processing is one of the most critical steps of data analysis in any of its forms. However, pre-processing has not been properly systematized yet, and little research focuses on this. This paper presents a survey of the most popular pre-processing steps required in environmental data analysis, together with a proposal to systematize them. Rather than providing technical details on specific pre-processing techniques, the paper focuses on providing general ideas to a non-expert user, who, after reading them, can decide which technique is the most suitable for his or her problem.
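
    As a small illustration of the kind of pre-processing the survey discusses, such as handling missing values and differing feature scales, the scikit-learn snippet below chains imputation and standardisation ahead of any downstream model. The specific choices are illustrative, not recommendations from the paper.

        from sklearn.impute import SimpleImputer
        from sklearn.pipeline import Pipeline
        from sklearn.preprocessing import StandardScaler

        # Replace missing values with column means, then standardise each feature.
        preprocess = Pipeline([
            ("impute", SimpleImputer(strategy="mean")),
            ("scale", StandardScaler()),
        ])
        # X_clean = preprocess.fit_transform(X_raw)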

    A new enhancement of the k-NN algorithm by using an optimization technique

    Among Machine Learning (ML) algorithms, the k-nearest neighbour (KNN) classifier is one of the most common choices for data classification research, including the classification of diseases and faults. This matters when the training dataset changes frequently, since with most methods it would be expensive to construct a new classifier every time this happens; KNN can be used effectively in such settings because it does not require a classifier to be constructed in advance. KNN also offers ease of use and can be applied across a broad spectrum of problems. Here, a novel KNN classification approach is put forward that uses the Bayesian Optimization Algorithm (BOA). The aim is to make classification more accurate by adjusting the nearest-neighbour value k and the distance-based similarity measure using information about the dataset structure. The findings of experimental work on datasets from the University of California Irvine (UCI) repository generally show improved performance compared with conventional KNN and greater reliability, without a significant cost in run time.
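
    A rough sketch of tuning k by Bayesian optimisation, using scikit-optimize's gp_minimize as a stand-in for the paper's BOA; the paper also adapts the distance-based similarity measure, which is omitted here, and the dataset and search range are arbitrary.

        from skopt import gp_minimize              # scikit-optimize, assumed installed
        from skopt.space import Integer
        from sklearn.datasets import load_iris
        from sklearn.model_selection import cross_val_score
        from sklearn.neighbors import KNeighborsClassifier

        X, y = load_iris(return_X_y=True)

        def objective(params):
            (k,) = params
            clf = KNeighborsClassifier(n_neighbors=int(k))
            # Negate accuracy because gp_minimize minimises its objective
            return -cross_val_score(clf, X, y, cv=5).mean()

        result = gp_minimize(objective, [Integer(1, 30)], n_calls=20, random_state=0)
        print("best k:", result.x[0], "cross-validated accuracy:", -result.fun)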

    Constructing Geometries for Group Control: Methods for Reasoning about Social Behaviors

    Social behaviors in groups have been the subject of hundreds of studies in a variety of research disciplines, including biology, physics, and robotics. In particular, flocking behaviors (commonly exhibited by birds and fish) are widely considered archetypal social behaviors and are due, in part, to the local interactions among the individuals and the environment. Despite the large number of investigations, and although a significant fraction of them provide algorithmic descriptions of flocking models, incompleteness and imprecision are readily identifiable in these algorithms, their inputs, and the validation of the models. This has led to a limited understanding of the group-level behaviors. Through two case studies and a detailed meta-study of the literature, this dissertation shows that studying individual behaviors alone is not adequate for understanding the behaviors displayed by the group. To highlight the limitations of studying only the individuals, this dissertation introduces a set of tools that, together, unify many of the existing microscopic approaches. A meta-study of the literature using these tools reveals that there are many small differences and ambiguities in the flocking scenarios studied by different researchers and domains; unfortunately, these differences are of considerable significance. To address this issue, this dissertation exploits the predictable nature of the group's behaviors in order to control a given group and thus gain a fuller understanding of the collective. From the current literature, it is clear that the environment is an important determinant of the resulting collective behaviors. This dissertation presents a method for reasoning about the effects the geometry of an environment has on individuals that exhibit collective behaviors, in order to control them. This work formalizes the problem of controlling such groups by changing the environment in which the group operates and shows this problem to be PSPACE-hard. A general methodology and basic framework are presented to address this problem. The proposed approach is general in that it is agnostic to the individuals' behaviors and to the geometric representation of the environment, allowing a large variety of groups, desired behaviors, and environmental constraints to be considered. The results from both the simulations and over 80 robot trials show that (1) the solution can automatically generate environments for reliably controlling various groups and (2) the solution can apply to other application domains, such as multi-agent formation planning for shepherding and piloting applications.
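
    For readers unfamiliar with flocking models, the toy update below shows the kind of purely local interaction rules (cohesion, alignment, separation) that produce the group-level behaviors discussed above. It is a generic boids-style illustration with arbitrary weights, not a model analysed in the dissertation.

        import numpy as np

        def flock_step(pos, vel, radius=1.0, w_coh=0.01, w_ali=0.05, w_sep=0.1, dt=0.1):
            """One boids-style update driven only by each agent's local neighbourhood."""
            new_vel = vel.copy()
            for i in range(len(pos)):
                d = np.linalg.norm(pos - pos[i], axis=1)
                mask = (d > 0) & (d < radius)              # local neighbours, excluding self
                if mask.any():
                    coh = pos[mask].mean(axis=0) - pos[i]  # steer toward neighbours' centre
                    ali = vel[mask].mean(axis=0) - vel[i]  # match neighbours' heading
                    sep = (pos[i] - pos[mask]).sum(axis=0) # move away from crowding
                    new_vel[i] += w_coh * coh + w_ali * ali + w_sep * sep
            return pos + dt * new_vel, new_vel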

    Estudio de métodos de selección de instancias

    This thesis presents a study of instance selection techniques, analysing the state of the art and developing new methods to cover areas that had not received due attention until now. The first two chapters present new instance selection methods for regression, a topic that has received little study in the literature to date. The third chapter studies how combining instance selection algorithms for regression offers better results than the individual methods on their own. The last chapter presents a novel idea: the use of locality-sensitive hash functions to design two new instance selection algorithms for classification. The advantage of this solution is that both algorithms have linear complexity. The results of this thesis have been published in four articles in first-quartile JCR journals. Funded by the Ministerio de Economía, Industria y Competitividad, the Junta de Castilla y León and the Fondo Europeo para el Desarrollo Regional, projects TIN 2011-24046, TIN 2015-67534-P (MINECO/FEDER) and BU085P17 (JCyL/FEDER).
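
    To illustrate how locality-sensitive hashing can yield a linear-time instance selection step, the Python sketch below hashes every instance once with random hyperplanes and keeps one representative per (bucket, class). It conveys the idea only; it is not either of the thesis's two algorithms, and all names and parameters are illustrative.

        import numpy as np

        def lsh_instance_selection(X, y, n_planes=8, seed=0):
            """Keep one instance per (LSH bucket, class) using random-hyperplane hashing."""
            rng = np.random.default_rng(seed)
            planes = rng.normal(size=(X.shape[1], n_planes))
            # The sign pattern of the projections gives each instance an integer bucket id
            codes = (X @ planes > 0).astype(int)
            buckets = codes @ (1 << np.arange(n_planes))
            keep, seen = [], set()
            for i, (b, label) in enumerate(zip(buckets, y)):
                if (b, label) not in seen:                # first instance in this bucket/class
                    seen.add((b, label))
                    keep.append(i)
            return np.array(keep)                         # indices of the selected instances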