274 research outputs found

    Coresets for the Nearest-Neighbor Rule

    Get PDF
    Given a training set P of labeled points, the nearest-neighbor rule predicts the class of an unlabeled query point as the label of its closest point in the set. To improve the time and space complexity of classification, a natural question is how to reduce the training set without significantly affecting the accuracy of the nearest-neighbor rule. Nearest-neighbor condensation deals with finding a subset R ⊆ P such that for every point p ∈ P, p's nearest neighbor in R has the same label as p. This relates to the concept of coresets, which can be broadly defined as subsets of the training set such that an exact result on the coreset corresponds to an approximate result on the original set. However, the guarantees of a coreset hold for any query point, and not only for the points of the training set. This paper introduces the concept of coresets for nearest-neighbor classification. We extend existing criteria used for condensation, and prove sufficient conditions to correctly classify any query point when using these subsets. Additionally, we prove that finding such subsets of minimum cardinality is NP-hard, and propose quadratic-time approximation algorithms with provable upper bounds on the size of their selected subsets. Moreover, we show how to improve one of these algorithms to achieve subquadratic runtime, the first of its kind for condensation.
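    As a concrete illustration of the condensation criterion described above (a sketch only, not the paper's approximation algorithms), the following Python snippet checks whether a subset R is consistent, and builds one greedily in the spirit of Hart's classical CNN heuristic; the function names and the brute-force distance computations are illustrative choices.

        import numpy as np

        def is_consistent(P, labels, R_idx):
            # Condensation criterion: every training point's nearest
            # neighbor within the subset R must carry the same label.
            R, R_labels = P[R_idx], labels[R_idx]
            for p, lab in zip(P, labels):
                nn = np.argmin(np.linalg.norm(R - p, axis=1))
                if R_labels[nn] != lab:
                    return False
            return True

        def greedy_condense(P, labels):
            # Hart-style greedy selection (illustrative, not the paper's
            # algorithm): add any point misclassified by the current
            # subset, and repeat until the criterion holds.
            R_idx = [0]
            changed = True
            while changed:
                changed = False
                for i in range(len(P)):
                    R = P[R_idx]
                    nn = np.argmin(np.linalg.norm(R - P[i], axis=1))
                    if labels[R_idx[nn]] != labels[i]:
                        R_idx.append(i)
                        changed = True
            return R_idx

    Note that subsets satisfying this criterion only guarantee correct classification of the training points themselves; the coresets proposed in the paper strengthen this guarantee to arbitrary query points.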

    Refined Lower Bounds for Nearest Neighbor Condensation

    Get PDF

    Algorithms and Data Structures for Faster Nearest-Neighbor Classification

    Get PDF
    Given a set P of n labeled points in a metric space (X,d), the nearest-neighbor rule classifies an unlabeled query point q ∈ X with the class of q's closest point in P. Despite the advent of more sophisticated techniques, nearest-neighbor classification is still fundamental for many machine-learning applications. Over the years, this has motivated numerous research efforts aiming to reduce its high dependency on the size and dimensionality of the data. This dissertation presents various approaches to reduce the dependency of the nearest-neighbor rule from n to some smaller parameter k that describes the intrinsic complexity of the class boundaries of P. This is of particular significance, as it is usually assumed that k ≪ n on real-world training sets. One natural way to achieve this dependency reduction is to reduce the training set itself, selecting a subset R ⊆ P to be used by the nearest-neighbor rule to answer incoming queries, instead of using P. Evidently, this approach reduces the dependencies of the nearest-neighbor rule from n, the size of P, to the size of R. This dissertation explores different techniques to select subsets whose sizes are proportional to k, and that provide varying degrees of correct classification guarantees. Another alternative bypasses training set reduction altogether, instead building data structures designed to answer classification queries directly. To this end, this dissertation proposes the Chromatic AVD, a quadtree-based data structure designed to answer ε-approximate nearest-neighbor classification queries. The query time and space complexities of this data structure depend on k_ε, a generalization of k that describes the intrinsic complexity of the ε-approximate class boundaries of P.
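    For reference, the nearest-neighbor rule itself is simple to state in code; here is a minimal brute-force sketch over a (possibly condensed) set R. Data structures such as the Chromatic AVD avoid this linear scan, but the sketch shows how replacing P by a subset R reduces the per-query cost from O(n) to O(|R|).

        import numpy as np

        def nn_classify(R, R_labels, q):
            # Brute-force nearest-neighbor rule: the query q inherits the
            # label of its closest point in R (a subset of P, or P itself).
            return R_labels[np.argmin(np.linalg.norm(R - q, axis=1))]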

    Weighted Distance Nearest Neighbor Condensing

    Full text link
    The problem of nearest neighbor condensing has enjoyed a long history of study, both in its theoretical and practical aspects. In this paper, we introduce the problem of weighted distance nearest neighbor condensing, where one assigns weights to each point of the condensed set, and then new points are labeled based on their weighted distance nearest neighbor in the condensed set. We study the theoretical properties of this new model, and show that it can produce dramatically better condensing than the standard nearest neighbor rule, yet is characterized by generalization bounds almost identical to the latter. We then suggest a condensing heuristic for our new problem. We demonstrate Bayes consistency for this heuristic, and also show promising empirical results.
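    To make the model concrete, here is a minimal sketch of the weighted-distance rule, assuming a multiplicative weighting w_r · d(q, r); the paper's precise definition of weighted distance may differ, and the weights themselves would come from its condensing heuristic rather than being chosen by hand.

        import numpy as np

        def weighted_nn_classify(R, R_labels, weights, q):
            # Weighted-distance nearest neighbor: each condensed point r
            # competes via w_r * d(q, r), so a small weight lets a single
            # point "cover" a large region of its class.
            scores = weights * np.linalg.norm(R - q, axis=1)
            return R_labels[np.argmin(scores)]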

    Boundary-Sensitive Approach for Approximate Nearest-Neighbor Classification

    Get PDF
    The problem of nearest-neighbor classification is a fundamental technique in machine learning. Given a training set P of n labeled points in ℝ^d, and an approximation parameter 0 < ε ≤ 1/2, any unlabeled query point should be classified with the class of any of its ε-approximate nearest neighbors in P. Answering these queries efficiently has been the focus of extensive research, proposing techniques that are mainly tailored towards resolving the more general problem of ε-approximate nearest-neighbor search. While the latter can only hope to provide query time and space complexities dependent on n, the problem of nearest-neighbor classification admits other parameters more suitable to its analysis. Such is the number k_ε of ε-border points, which describes the complexity of boundaries between sets of points of different classes. This paper presents a new data structure called the Chromatic AVD. This is the first approach for ε-approximate nearest-neighbor classification whose space and query time complexities depend only on ε, k_ε and d, while being independent of both n and Δ, the spread of P.
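    The classification guarantee can be phrased as follows: any point within a (1+ε) factor of the true nearest-neighbor distance is a valid answer, so a classifier may return any of their labels. A brute-force sketch of this set of valid labels is shown below; the symbols match the abstract, while the function itself is illustrative (the Chromatic AVD computes such an answer without scanning P).

        import numpy as np

        def valid_eps_labels(P, labels, q, eps):
            # Labels an eps-approximate nearest-neighbor classifier may
            # return: those of points within (1+eps) times the distance
            # from q to its exact nearest neighbor.
            d = np.linalg.norm(P - q, axis=1)
            return set(labels[d <= (1.0 + eps) * d.min()])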

    The Mathematics of the Bose Gas and its Condensation

    Full text link
    This book surveys results about the quantum mechanical many-body problem of the Bose gas that have been obtained by the authors over the last seven years. These topics are relevant to current experiments on ultra-cold gases; they are also mathematically rigorous, using many analytic techniques developed over the years to handle such problems. Some of the topics treated are the ground state energy, the Gross-Pitaevskii equation, Bose-Einstein condensation, superfluidity, one-dimensional gases, and rotating gases. The book also provides a pedagogical entry into the field for graduate students and researchers. (Comment: 213 pages. Slightly revised and extended version of Oberwolfach Seminar Series, Vol. 34, Birkhäuser, 2005.)