299 research outputs found

    k-Nearest Neighbour Classifiers: 2nd Edition (with Python examples)

    Get PDF
    Perhaps the most straightforward classifier in the arsenal or machine learning techniques is the Nearest Neighbour Classifier -- classification is achieved by identifying the nearest neighbours to a query example and using those neighbours to determine the class of the query. This approach to classification is of particular importance because issues of poor run-time performance is not such a problem these days with the computational power that is available. This paper presents an overview of techniques for Nearest Neighbour classification focusing on; mechanisms for assessing similarity (distance), computational issues in identifying nearest neighbours and mechanisms for reducing the dimension of the data. This paper is the second edition of a paper previously published as a technical report. Sections on similarity measures for time-series, retrieval speed-up and intrinsic dimensionality have been added. An Appendix is included providing access to Python code for the key methods.Comment: 22 pages, 15 figures: An updated edition of an older tutorial on kN

    Scalable Object-Class Search via Sparse Retrieval Models and Approximate Ranking

    Get PDF
    In this paper we address the problem of object-class retrieval in large image data sets: given a small set of training examples defining a visual category, the objective is to efficiently retrieve images of the same class from a large database. We propose two contrasting retrieval schemes achieving good accuracy and high efficiency. The first exploits sparse classification models expressed as linear combinations of a small number of features. These sparse models can be efficiently evaluated using inverted file indexing. Furthermore, we introduce a novel ranking procedure that provides a significant speedup over inverted file indexing when the goal is restricted to finding the top-k (i.e., the k highest ranked) images in the data set. We contrast these sparse retrieval models with a second scheme based on approximate ranking using vector quantization. Experimental results show that our algorithms for object-class retrieval can search a 10 million database in just a couple of seconds and produce categorization accuracy comparable to the best known class-recognition systems

    Content-based Information Retrieval via Nearest Neighbor Search

    Get PDF
    Content-based information retrieval (CBIR) has attracted significant interest in the past few years. When given a search query, the search engine will compare the query with all the stored information in the database through nearest neighbor search. Finally, the system will return the most similar items. We contribute to the CBIR research the following: firstly, Distance Metric Learning (DML) is studied to improve retrieval accuracy of nearest neighbor search. Additionally, Hash Function Learning (HFL) is considered to accelerate the retrieval process. On one hand, a new local metric learning framework is proposed - Reduced-Rank Local Metric Learning (R2LML). By considering a conical combination of Mahalanobis metrics, the proposed method is able to better capture information like data\u27s similarity and location. A regularization to suppress the noise and avoid over-fitting is also incorporated into the formulation. Based on the different methods to infer the weights for the local metric, we considered two frameworks: Transductive Reduced-Rank Local Metric Learning (T-R2LML), which utilizes transductive learning, while Efficient Reduced-Rank Local Metric Learning (E-R2LML)employs a simpler and faster approximated method. Besides, we study the convergence property of the proposed block coordinate descent algorithms for both our frameworks. The extensive experiments show the superiority of our approaches. On the other hand, *Supervised Hash Learning (*SHL), which could be used in supervised, semi-supervised and unsupervised learning scenarios, was proposed in the dissertation. By considering several codewords which could be learned from the data, the proposed method naturally derives to several Support Vector Machine (SVM) problems. After providing an efficient training algorithm, we also study the theoretical generalization bound of the new hashing framework. In the final experiments, *SHL outperforms many other popular hash function learning methods. Additionally, in order to cope with large data sets, we also conducted experiments running on big data using a parallel computing software package, namely LIBSKYLARK
    • …
    corecore