
    k-Nearest Neighbour Classifiers: 2nd Edition (with Python examples)

    Perhaps the most straightforward classifier in the arsenal of machine learning techniques is the Nearest Neighbour Classifier -- classification is achieved by identifying the nearest neighbours to a query example and using those neighbours to determine the class of the query. This approach to classification is of particular importance because the poor run-time performance that once counted against it is much less of a problem with the computational power now available. This paper presents an overview of techniques for Nearest Neighbour classification, focusing on mechanisms for assessing similarity (distance), computational issues in identifying nearest neighbours, and mechanisms for reducing the dimension of the data. This paper is the second edition of a paper previously published as a technical report. Sections on similarity measures for time-series, retrieval speed-up and intrinsic dimensionality have been added. An Appendix is included providing access to Python code for the key methods.
    Comment: 22 pages, 15 figures. An updated edition of an older tutorial on kNN.
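
    To make the basic idea concrete, here is a minimal k-NN sketch in plain Python -- an illustration in the spirit of the tutorial, not the code from the paper's Appendix: Euclidean distance plus a majority vote over the k nearest training examples.

```python
from collections import Counter
import math

def knn_predict(X_train, y_train, query, k=3):
    # rank training examples by distance to the query and keep the k nearest
    neighbours = sorted(zip(X_train, y_train),
                        key=lambda xy: math.dist(xy[0], query))[:k]
    # majority vote among the k nearest neighbours
    return Counter(label for _, label in neighbours).most_common(1)[0][0]

X = [(1.0, 1.0), (1.2, 0.8), (6.0, 6.0), (5.8, 6.2)]
y = ["a", "a", "b", "b"]
print(knn_predict(X, y, (5.5, 5.9), k=3))  # -> "b"
```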

    Ptolemaic Indexing

    This paper discusses a new family of bounds for use in similarity search, related to those used in metric indexing but based on Ptolemy's inequality rather than the metric axioms. Ptolemy's inequality holds for the well-known Euclidean distance, and is also shown here to hold for quadratic form metrics in general, with Mahalanobis distance as an important special case. The inequality is examined empirically on both synthetic and real-world data sets, and is also found to hold approximately, with a very low degree of error, for important distances such as the angular pseudometric and several Lp norms. Indexing experiments demonstrate substantially increased filtering power compared to existing triangular methods. It is also shown that combining Ptolemaic and triangular filtering can lead to better results than using either approach on its own.
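
    To illustrate the kind of bound involved: given a pivot pair (p1, p2), Ptolemy's inequality implies the lower bound d(q,o) >= |d(q,p1)*d(o,p2) - d(q,p2)*d(o,p1)| / d(p1,p2), which can discard a candidate o without computing d(q,o), just as the familiar triangle bound |d(q,p) - d(o,p)| does. A small self-contained sketch with made-up points and pivots (not the paper's experimental setup):

```python
import math

def triangle_lb(d_qp, d_op):
    # classic metric lower bound on d(q, o) from a single pivot p
    return abs(d_qp - d_op)

def ptolemaic_lb(d_qp1, d_qp2, d_op1, d_op2, d_p1p2):
    # lower bound on d(q, o) implied by Ptolemy's inequality for a pivot pair
    return abs(d_qp1 * d_op2 - d_qp2 * d_op1) / d_p1p2

q, o = (0.0, 0.0), (3.0, 4.0)
p1, p2 = (1.0, 0.0), (0.0, 2.0)
d = math.dist

lb_tri = max(triangle_lb(d(q, p1), d(o, p1)),
             triangle_lb(d(q, p2), d(o, p2)))
lb_pto = ptolemaic_lb(d(q, p1), d(q, p2), d(o, p1), d(o, p2), d(p1, p2))
print(lb_tri, lb_pto, d(q, o))  # neither lower bound exceeds the true distance
```

    Which bound is tighter depends on the pivots; the paper's experiments measure the filtering power of each family, and of their combination, on real indexes.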

    Study and testing of the Falconn++ algorithm

    The problem of nearest-neighbour search (NNS) is one of the most studied topics in data analysis, with applications in numerous fields, e.g., recommendation systems, web search, machine learning, and computer vision. Several efficient solutions exist for NNS in low dimensions, but to overcome the "curse of dimensionality" and build efficient solutions in high dimensions, research has turned to approximate approaches. This thesis analyses Falconn++, a locality-sensitive filtering algorithm for approximate nearest-neighbour search under angular distance. The strength of Falconn++ lies in a filtering mechanism based on the theory of concomitants of extreme order statistics (CEOs), which yields higher-quality candidates than previous hashing-based solutions.
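
    As a rough illustration of CEOs-style filtering -- a deliberately simplified sketch, not Falconn++ itself (no multi-probing across tables, no bucket scaling) -- each point can be assigned to the random direction it is most aligned with, each bucket then keeps only the points with the largest projections, and a query probes the buckets of its own closest directions:

```python
import numpy as np

rng = np.random.default_rng(0)
d, n, D, BUCKET = 32, 1000, 64, 25   # dimension, points, directions, bucket cap

X = rng.normal(size=(n, d))
X /= np.linalg.norm(X, axis=1, keepdims=True)    # unit vectors (angular distance)
R = rng.normal(size=(D, d))                      # random projection directions

proj = X @ R.T                                   # (n, D) inner products
buckets = [[] for _ in range(D)]
for idx in range(n):
    h = int(np.argmax(proj[idx]))                # direction this point is most aligned with
    buckets[h].append((proj[idx, h], idx))
# filtering step: keep only the points with the largest projections per bucket
buckets = [sorted(b, reverse=True)[:BUCKET] for b in buckets]

q = rng.normal(size=d)
q /= np.linalg.norm(q)
top_dirs = np.argsort(R @ q)[-3:]                # probe the query's closest directions
cands = {idx for h in top_dirs for _, idx in buckets[h]}
best = max(cands, key=lambda i: float(X[i] @ q)) # exact re-ranking of candidates
print(best, float(X[best] @ q))
```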

    k-Nearest Neighbour Classifiers - A Tutorial

    Perhaps the most straightforward classifier in the arsenal of machine learning techniques is the Nearest Neighbour Classifier -- classification is achieved by identifying the nearest neighbours to a query example and using those neighbours to determine the class of the query. This approach to classification is of particular importance because the poor run-time performance that once counted against it is much less of a problem with the computational power now available. This paper presents an overview of techniques for Nearest Neighbour classification, focusing on mechanisms for assessing similarity (distance), computational issues in identifying nearest neighbours, and mechanisms for reducing the dimension of the data. This paper is the second edition of a paper previously published as a technical report. Sections on similarity measures for time-series, retrieval speed-up and intrinsic dimensionality have been added. An Appendix is included providing access to Python code for the key methods.
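
    One of the added topics, similarity measures for time-series, is easy to illustrate: dynamic time warping (DTW) is a standard such measure that a nearest-neighbour classifier can use in place of Euclidean distance. The sketch below is the textbook O(nm) recurrence, not the paper's Appendix code.

```python
def dtw(s, t):
    # dynamic time warping distance between two numeric sequences
    n, m = len(s), len(t)
    INF = float("inf")
    D = [[INF] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(s[i - 1] - t[j - 1])
            # best of diagonal match, insertion, and deletion moves
            D[i][j] = cost + min(D[i - 1][j - 1], D[i - 1][j], D[i][j - 1])
    return D[n][m]

print(dtw([0, 1, 2, 1, 0], [0, 0, 1, 2, 1, 0]))  # -> 0.0: same shape, shifted
```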

    Influence of local carrying capacity restrictions on stochastic predator-prey models

    We study a stochastic lattice predator-prey system by means of Monte Carlo simulations that do not impose any restrictions on the number of particles per site, and discuss the similarities and differences of our results with those obtained for site-restricted model variants. In accord with the classic Lotka-Volterra mean-field description, both species always coexist in two dimensions. Yet competing activity fronts generate complex, correlated spatio-temporal structures. As a consequence, finite systems display transient erratic population oscillations with characteristic frequencies that are renormalized by fluctuations. For large reaction rates, when the processes are rendered more local, these oscillations are suppressed. In contrast with site-restricted predator-prey models, we observe species coexistence also in one dimension. In addition, we report results on the steady-state prey age distribution.
    Comment: LaTeX, IOP style, 17 pages, 9 figures included; related movies available at http://www.phys.vt.edu/~tauber/PredatorPrey/movies
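
    A minimal Monte Carlo sketch of a lattice Lotka-Volterra model without site restrictions conveys the flavour of such simulations; the rates, lattice size, and update scheme below are illustrative assumptions, not the paper's parameters.

```python
import random

L = 64             # linear lattice size (illustrative)
SIGMA = 0.5        # prey reproduction rate, A -> A + A
MU = 0.5           # predator death rate, B -> 0
LAM = 0.5          # predation rate, B + A -> B + B

# per-site particle counts; no restriction on occupancy
prey = [[1] * L for _ in range(L)]
pred = [[1] * L for _ in range(L)]

def neighbour(i, j):
    di, dj = random.choice([(1, 0), (-1, 0), (0, 1), (0, -1)])
    return (i + di) % L, (j + dj) % L    # periodic boundaries

def mc_sweep():
    for _ in range(L * L):
        i, j = random.randrange(L), random.randrange(L)
        if prey[i][j] and random.random() < SIGMA:   # prey offspring on a neighbour site
            ni, nj = neighbour(i, j)
            prey[ni][nj] += 1
        if pred[i][j] and random.random() < MU:      # predator death
            pred[i][j] -= 1
        if pred[i][j]:                               # predation on a neighbour site
            ni, nj = neighbour(i, j)
            if prey[ni][nj] and random.random() < LAM:
                prey[ni][nj] -= 1
                pred[ni][nj] += 1

for t in range(200):
    mc_sweep()   # tracking the totals over t shows the transient oscillations
print(sum(map(sum, prey)), sum(map(sum, pred)))
```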

    One-Class Classification: Taxonomy of Study and Review of Techniques

    One-class classification (OCC) algorithms aim to build classification models when the negative class is either absent, poorly sampled, or not well defined. This unique situation constrains the learning of efficient classifiers by defining the class boundary using knowledge of the positive class alone. The OCC problem has been considered and applied under many research themes, such as outlier/novelty detection and concept learning. In this paper we present a unified view of the general problem of OCC through a taxonomy of study for OCC problems, based on the availability of training data, the algorithms used, and the application domains. We further delve into each category of the proposed taxonomy and present a comprehensive literature review of OCC algorithms, techniques and methodologies, with a focus on their significance, limitations and applications. We conclude by discussing some open research problems in the field of OCC and presenting our vision for future research.
    Comment: 24 pages + 11 pages of references, 8 figures
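
    As a concrete instance of one family of techniques such a survey covers, a one-class SVM learns a boundary around the positive class alone; the sketch below uses scikit-learn's OneClassSVM on synthetic data (the parameters are illustrative, and this is not a method proposed by the paper):

```python
import numpy as np
from sklearn.svm import OneClassSVM

rng = np.random.default_rng(0)
X_train = rng.normal(0.0, 1.0, size=(200, 2))            # positive class only
X_test = np.vstack([rng.normal(0.0, 1.0, size=(10, 2)),  # more positives
                    rng.normal(6.0, 1.0, size=(10, 2))]) # novelties

# nu bounds the fraction of training points treated as outliers
clf = OneClassSVM(kernel="rbf", gamma=0.1, nu=0.05).fit(X_train)
print(clf.predict(X_test))  # +1 = inlier, -1 = outlier/novelty
```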

    Diamond Dicing

    In OLAP, analysts often select an interesting sample of the data. For example, an analyst might focus on products bringing revenues of at least 100 000 dollars, or on shops having sales greater than 400 000 dollars. However, current systems do not allow the application of both of these thresholds simultaneously, selecting products and shops that satisfy both. For such purposes, we introduce the diamond cube operator, filling a gap among existing data warehouse operations. Because of the interaction between dimensions, the computation of diamond cubes is challenging. We compare and test various algorithms on large data sets of more than 100 million facts. We find that while it is possible to implement diamonds in SQL, it is inefficient. Indeed, our custom implementation can be a hundred times faster than popular database engines (including a row-store and a column-store).
    Comment: 29 pages
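
    The fixpoint character of the operator is easy to see in a sketch: deleting a product can push a shop below its threshold and vice versa, so pruning must iterate until stable. The data and thresholds below are made up for illustration; this is not the paper's optimized implementation.

```python
from collections import defaultdict

facts = [("p1", "s1", 120_000), ("p1", "s2", 30_000),
         ("p2", "s1", 500_000), ("p3", "s2", 90_000)]
K_PRODUCT, K_SHOP = 100_000, 400_000

def diamond(facts, k_prod, k_shop):
    facts = list(facts)
    while True:
        prod_sum, shop_sum = defaultdict(int), defaultdict(int)
        for p, s, v in facts:
            prod_sum[p] += v
            shop_sum[s] += v
        # keep only facts whose product AND shop both meet their thresholds
        kept = [(p, s, v) for p, s, v in facts
                if prod_sum[p] >= k_prod and shop_sum[s] >= k_shop]
        if len(kept) == len(facts):   # fixed point: nothing more to prune
            return kept
        facts = kept

print(diamond(facts, K_PRODUCT, K_SHOP))
# -> [('p1', 's1', 120000), ('p2', 's1', 500000)]
```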