274 research outputs found

    Coresets for the Nearest-Neighbor Rule

    Get PDF
    Given a training set P of labeled points, the nearest-neighbor rule predicts the class of an unlabeled query point as the label of its closest point in the set. To improve the time and space complexity of classification, a natural question is how to reduce the training set without significantly affecting the accuracy of the nearest-neighbor rule. Nearest-neighbor condensation deals with finding a subset R ⊆ P such that for every point p ∈ P, p's nearest neighbor in R has the same label as p. This relates to the concept of coresets, which can be broadly defined as subsets of the training set such that an exact result on the coreset corresponds to an approximate result on the original set. However, the guarantees of a coreset hold for any query point, and not only for the points of the training set. This paper introduces the concept of coresets for nearest-neighbor classification. We extend existing criteria used for condensation, and prove sufficient conditions to correctly classify any query point when using these subsets. Additionally, we prove that finding such subsets of minimum cardinality is NP-hard, and propose quadratic-time approximation algorithms with provable upper bounds on the size of their selected subsets. Moreover, we show how to improve one of these algorithms to achieve subquadratic runtime, the first of its kind for condensation.
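    As a concrete illustration of the condensation criterion described above (a sketch only, not the paper's approximation algorithms), the following Python snippet checks whether a subset R is consistent, and builds one greedily in the spirit of Hart's classical CNN heuristic; the function names and the brute-force distance computations are illustrative choices.

        import numpy as np

        def is_consistent(P, labels, R_idx):
            # Condensation criterion: every training point's nearest
            # neighbor within the subset R must carry the same label.
            R, R_labels = P[R_idx], labels[R_idx]
            for p, lab in zip(P, labels):
                nn = np.argmin(np.linalg.norm(R - p, axis=1))
                if R_labels[nn] != lab:
                    return False
            return True

        def greedy_condense(P, labels):
            # Hart-style greedy selection (illustrative, not the paper's
            # algorithm): add any point misclassified by the current
            # subset, and repeat until the criterion holds.
            R_idx = [0]
            changed = True
            while changed:
                changed = False
                for i in range(len(P)):
                    R = P[R_idx]
                    nn = np.argmin(np.linalg.norm(R - P[i], axis=1))
                    if labels[R_idx[nn]] != labels[i]:
                        R_idx.append(i)
                        changed = True
            return R_idx

    Note that subsets satisfying this criterion only guarantee correct classification of the training points themselves; the coresets proposed in the paper strengthen this guarantee to arbitrary query points.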

    Refined Lower Bounds for Nearest Neighbor Condensation

    Get PDF

    Algorithms and Data Structures for Faster Nearest-Neighbor Classification

    Get PDF
    Given a set P of n labeled points in a metric space (X,d), the nearest-neighbor rule classifies an unlabeled query point q ∈ X with the class of q's closest point in P. Despite the advent of more sophisticated techniques, nearest-neighbor classification is still fundamental for many machine-learning applications. Over the years, this has motivated numerous research efforts aiming to reduce its high dependency on the size and dimensionality of the data. This dissertation presents various approaches to reduce the dependency of the nearest-neighbor rule from n to some smaller parameter k that describes the intrinsic complexity of the class boundaries of P. This is of particular significance, as it is usually assumed that k ≪ n on real-world training sets. One natural way to achieve this dependency reduction is to reduce the training set itself, selecting a subset R ⊆ P to be used by the nearest-neighbor rule to answer incoming queries, instead of using P. Evidently, this approach reduces the dependencies of the nearest-neighbor rule from n, the size of P, to the size of R. This dissertation explores different techniques to select subsets whose sizes are proportional to k, and that provide varying degrees of correct classification guarantees. Another alternative bypasses training set reduction altogether, instead building data structures designed to answer classification queries directly. To this end, this dissertation proposes the Chromatic AVD, a quadtree-based data structure designed to answer ε-approximate nearest-neighbor classification queries. The query time and space complexities of this data structure depend on k_ε, a generalization of k that describes the intrinsic complexity of the ε-approximate class boundaries of P.
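    For reference, the nearest-neighbor rule itself is simple to state in code; here is a minimal brute-force sketch over a (possibly condensed) set R. Data structures such as the Chromatic AVD avoid this linear scan, but the sketch shows how replacing P by a subset R reduces the per-query cost from O(n) to O(|R|).

        import numpy as np

        def nn_classify(R, R_labels, q):
            # Brute-force nearest-neighbor rule: the query q inherits the
            # label of its closest point in R (a subset of P, or P itself).
            return R_labels[np.argmin(np.linalg.norm(R - q, axis=1))]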

    Weighted Distance Nearest Neighbor Condensing

    Full text link
    The problem of nearest neighbor condensing has enjoyed a long history of study, both in its theoretical and practical aspects. In this paper, we introduce the problem of weighted distance nearest neighbor condensing, where one assigns weights to each point of the condensed set, and then new points are labeled based on their weighted distance nearest neighbor in the condensed set. We study the theoretical properties of this new model, and show that it can produce dramatically better condensing than the standard nearest neighbor rule, yet is characterized by generalization bounds almost identical to the latter. We then suggest a condensing heuristic for our new problem. We demonstrate Bayes consistency for this heuristic, and also show promising empirical results.
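    To make the model concrete, here is a minimal sketch of the weighted-distance rule, assuming a multiplicative weighting w_r · d(q, r); the paper's precise definition of weighted distance may differ, and the weights themselves would come from its condensing heuristic rather than being chosen by hand.

        import numpy as np

        def weighted_nn_classify(R, R_labels, weights, q):
            # Weighted-distance nearest neighbor: each condensed point r
            # competes via w_r * d(q, r), so a small weight lets a single
            # point "cover" a large region of its class.
            scores = weights * np.linalg.norm(R - q, axis=1)
            return R_labels[np.argmin(scores)]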

    Boundary-Sensitive Approach for Approximate Nearest-Neighbor Classification

    Get PDF
    The problem of nearest-neighbor classification is a fundamental technique in machine learning. Given a training set P of n labeled points in ℝ^d, and an approximation parameter 0 < ε ≤ 1/2, any unlabeled query point should be classified with the class of any of its ε-approximate nearest neighbors in P. Answering these queries efficiently has been the focus of extensive research, proposing techniques that are mainly tailored towards resolving the more general problem of ε-approximate nearest-neighbor search. While the latter can only hope to provide query time and space complexities dependent on n, the problem of nearest-neighbor classification admits other parameters more suitable to its analysis. Such is the number k_ε of ε-border points, which describes the complexity of boundaries between sets of points of different classes. This paper presents a new data structure called the Chromatic AVD. This is the first approach for ε-approximate nearest-neighbor classification whose space and query time complexities depend only on ε, k_ε and d, while being independent of both n and Δ, the spread of P.
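    The classification guarantee can be phrased as follows: any point within a (1+ε) factor of the true nearest-neighbor distance is a valid answer, so a classifier may return any of their labels. A brute-force sketch of this set of valid labels is shown below; the symbols match the abstract, while the function itself is illustrative (the Chromatic AVD computes such an answer without scanning P).

        import numpy as np

        def valid_eps_labels(P, labels, q, eps):
            # Labels an eps-approximate nearest-neighbor classifier may
            # return: those of points within (1+eps) times the distance
            # from q to its exact nearest neighbor.
            d = np.linalg.norm(P - q, axis=1)
            return set(labels[d <= (1.0 + eps) * d.min()])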

    The Mathematics of the Bose Gas and its Condensation

    Full text link
    This book surveys results about the quantum mechanical many-body problem of the Bose gas that have been obtained by the authors over the last seven years. These topics are relevant to current experiments on ultra-cold gases; they are also mathematically rigorous, using many analytic techniques developed over the years to handle such problems. Some of the topics treated are the ground state energy, the Gross-Pitaevskii equation, Bose-Einstein condensation, superfluidity, one-dimensional gases, and rotating gases. The book also provides a pedagogical entry into the field for graduate students and researchers. (Comment: 213 pages. Slightly revised and extended version of Oberwolfach Seminar Series, Vol. 34, Birkhäuser, 2005.)