Approximate Near Neighbors for General Symmetric Norms
We show that every symmetric normed space admits an efficient nearest
neighbor search data structure with doubly-logarithmic approximation.
Specifically, for every $n$, $d = n^{o(1)}$, and every $d$-dimensional
symmetric norm $\|\cdot\|$, there exists a data structure for
$\mathrm{poly}(\log\log n)$-approximate nearest neighbor search over $\|\cdot\|$
for $n$-point datasets achieving $n^{o(1)}$ query time and
$n^{1+o(1)}$ space. The main technical ingredient of the algorithm is a
low-distortion embedding of a symmetric norm into a low-dimensional iterated
product of top-$k$ norms.
We also show that our techniques cannot be extended to general norms.
Comment: 27 pages, 1 figure
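The top-$k$ norm mentioned above is the sum of the $k$ largest absolute coordinates of a vector; $k = 1$ gives $\ell_\infty$ and $k = d$ gives $\ell_1$. A minimal illustrative sketch (not the paper's code):

```python
def top_k_norm(x, k):
    """Top-k norm: sum of the k largest absolute coordinates of x."""
    mags = sorted((abs(v) for v in x), reverse=True)
    return sum(mags[:k])

v = [3.0, -1.0, 4.0, 1.0, -5.0]
print(top_k_norm(v, 1))  # 5.0  (l_infinity)
print(top_k_norm(v, 5))  # 14.0 (l_1)
```

Iterated products of such norms are simple enough to search in directly, which is what makes the embedding useful.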
Distance-Sensitive Hashing
Locality-sensitive hashing (LSH) is an important tool for managing
high-dimensional noisy or uncertain data, for example in connection with data
cleaning (similarity join) and noise-robust search (similarity search).
However, for a number of problems the LSH framework is not known to yield good
solutions, and instead ad hoc solutions have been designed for particular
similarity and distance measures. For example, this is true for
output-sensitive similarity search/join, and for indexes supporting annulus
queries that aim to report a point close to a certain given distance from the
query point.
In this paper we initiate the study of distance-sensitive hashing (DSH), a
generalization of LSH that seeks a family of hash functions such that the
probability of two points having the same hash value is a given function of the
distance between them. More precisely, given a distance space $(X, \mathrm{dist})$
and a "collision probability function" (CPF) $f\colon \mathbb{R}_+ \to [0, 1]$
we seek a distribution over pairs of functions $(h, g)$
such that for every pair of points $x, y \in X$ the collision
probability is $\Pr[h(x) = g(y)] = f(\mathrm{dist}(x, y))$. Locality-sensitive
hashing is the study of how fast a CPF can decrease as the distance grows. For
many spaces, $f$ can be made exponentially decreasing even if we restrict
attention to the symmetric case where $h = g$. We show that the asymmetry
achieved by having a pair of functions makes it possible to achieve CPFs that
are, for example, increasing or unimodal, and show how this leads to principled
solutions to problems not addressed by the LSH framework. This includes a novel
application to privacy-preserving distance estimation. We believe that the DSH
framework will find further applications in high-dimensional data management.
Comment: Accepted at PODS'18. Abstract shortened due to character limit
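For intuition, here is an illustrative sketch (not from the paper) of the simplest symmetric case: bit-sampling LSH on Hamming space, whose CPF is the decreasing function $f(r) = (1 - r/d)^k$. DSH asks for families, generally asymmetric pairs $(h, g)$, realizing other shapes of $f$.

```python
import random

def make_hash(d, k, rng):
    """Bit-sampling LSH: project onto k random coordinates."""
    coords = [rng.randrange(d) for _ in range(k)]
    return lambda x: tuple(x[i] for i in coords)

def empirical_cpf(x, y, k, trials=20000, seed=0):
    """Estimate the collision probability for points x, y."""
    rng = random.Random(seed)
    d = len(x)
    hits = 0
    for _ in range(trials):
        h = make_hash(d, k, rng)
        hits += h(x) == h(y)
    return hits / trials

x = [0] * 16
y = [1] * 4 + [0] * 12            # Hamming distance 4, d = 16
print(empirical_cpf(x, y, k=2))   # close to (1 - 4/16)**2 = 0.5625
```

The estimate concentrates around the analytic CPF value; for, say, an annulus query one would instead want a family whose CPF peaks near the target distance.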
Fast Locality-Sensitive Hashing Frameworks for Approximate Near Neighbor Search
The Indyk-Motwani Locality-Sensitive Hashing (LSH) framework (STOC 1998) is a
general technique for constructing a data structure to answer approximate near
neighbor queries by using a distribution over locality-sensitive
hash functions that partition space. For a collection of points, after
preprocessing, the query time is dominated by evaluations
of hash functions from and hash table lookups and
distance computations where is determined by the
locality-sensitivity properties of . It follows from a recent
result by Dahlgaard et al. (FOCS 2017) that the number of locality-sensitive
hash functions can be reduced to , leaving the query time to be
dominated by distance computations and
additional word-RAM operations. We state this result as a general framework and
provide a simpler analysis showing that the number of lookups and distance
computations closely match the Indyk-Motwani framework, making it a viable
replacement in practice. Using ideas from another locality-sensitive hashing
framework by Andoni and Indyk (SODA 2006) we are able to reduce the number of
additional word-RAM operations to $O(n^{\rho})$.
Comment: 15 pages, 3 figures
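A simplified sketch of the pooling idea (hypothetical code; the actual constructions in Dahlgaard et al. and Andoni-Indyk differ in how pool entries are combined): rather than evaluating $K \cdot L$ independent LSH functions per query, evaluate a small shared pool once and assemble each of the $L$ table keys from $K$ pool entries fixed at preprocessing time.

```python
import random

def build_key_schedule(pool_size, K, L, seed=0):
    """For each of L tables, pick K pool indices to concatenate."""
    rng = random.Random(seed)
    return [[rng.randrange(pool_size) for _ in range(K)] for _ in range(L)]

def table_keys(point, pool, schedule):
    vals = [h(point) for h in pool]   # pool evaluated once per query
    return [tuple(vals[i] for i in idxs) for idxs in schedule]

# Toy LSH pool for Hamming space: each pool function samples one bit.
rng = random.Random(1)
d, pool_size, K, L = 32, 8, 4, 6
pool = [(lambda i: (lambda x: x[i]))(rng.randrange(d))
        for _ in range(pool_size)]
schedule = build_key_schedule(pool_size, K, L)
keys = table_keys([0, 1] * 16, pool, schedule)
print(len(keys))  # 6 table keys from only 8 hash evaluations
```

The trade-off is that keys built from a shared pool are correlated, which is exactly what the simpler analysis mentioned above has to control.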
Taylor Polynomial Estimator for Estimating Frequency Moments
We present a randomized algorithm for estimating the $p$th moment $F_p$ of
the frequency vector of a data stream in the general update (turnstile) model
to within a multiplicative factor of $1 \pm \epsilon$, for $p > 2$, with high
constant confidence. For $0 < \epsilon \le 1$, the algorithm uses space
$O(n^{1-2/p} \epsilon^{-2} + n^{1-2/p} \epsilon^{-4/p} \log n)$ words. This
improves over the current bound of $O(n^{1-2/p} \epsilon^{-2-4/p} \log n)$
words by Andoni et al. in \cite{ako:arxiv10}. Our space upper bound matches
the lower bound of Li and Woodruff \cite{liwood:random13} for
$\epsilon = \Omega(n^{-1/p})$ and the lower bound of Andoni et al.
\cite{anpw:icalp13} for $\epsilon = \Omega(1)$.
Comment: Supersedes arXiv:1104.4552. Extended Abstract of this paper to appear
in Proceedings of ICALP 2015
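The turnstile model and sketch-based moment estimation can be illustrated with the classic AMS sketch for $F_2$ (a far simpler estimator than the Taylor polynomial estimator above, shown only to fix the model: each update adds a signed delta to one frequency, and the sketch never stores the frequency vector explicitly).

```python
import random
import statistics

class AMSF2:
    """AMS sketch: each row keeps z_j = sum_i s_j(i) * f_i with random signs."""
    def __init__(self, n, rows, seed=0):
        rng = random.Random(seed)
        self.signs = [[rng.choice((-1, 1)) for _ in range(n)]
                      for _ in range(rows)]
        self.z = [0] * rows

    def update(self, i, delta):            # turnstile update: f_i += delta
        for j, s in enumerate(self.signs):
            self.z[j] += s[i] * delta

    def estimate(self):                    # each z_j^2 is unbiased for F_2
        return statistics.median(zj * zj for zj in self.z)

sketch = AMSF2(n=8, rows=9, seed=42)
freq = [0] * 8
rng = random.Random(7)
for _ in range(200):
    i, delta = rng.randrange(8), rng.choice((-2, -1, 1, 2))
    sketch.update(i, delta)
    freq[i] += delta
print(sketch.estimate(), sum(f * f for f in freq))  # estimate vs exact F_2
```

For $p > 2$ polynomial space in $n$ is unavoidable, which is where the $n^{1-2/p}$ terms in the bounds above come from.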
On the segmentation and classification of hand radiographs
This research is part of a wider project to build predictive models of bone age using hand radiograph images. We examine ways of finding the outline of a hand from an X-ray as the first stage in segmenting the image into constituent bones. We assess a variety of algorithms, including contouring, which has not previously been used in this context. We introduce a novel ensemble algorithm for combining outlines using two voting schemes, a likelihood ratio test and dynamic time warping (DTW). Our goal is to minimize the human intervention required, hence we investigate alternative ways of training a classifier to determine whether an outline is in fact correct or not. We evaluate outlining and classification on a set of 1370 images. We conclude that ensembling with DTW improves performance of all outlining algorithms, that the contouring algorithm used with the DTW ensemble performs the best of those assessed, and that the most effective classifier of hand outlines assessed is a random forest applied to outlines transformed into principal components.
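Dynamic time warping, used in the ensemble voting above, can be sketched as the standard dynamic program (generic code, not the paper's implementation):

```python
def dtw(a, b):
    """DTW distance between two 1-D sequences with |.| local cost."""
    INF = float("inf")
    n, m = len(a), len(b)
    D = [[INF] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            # extend the cheapest of: skip in a, skip in b, match both
            D[i][j] = cost + min(D[i - 1][j], D[i][j - 1], D[i - 1][j - 1])
    return D[n][m]

print(dtw([1, 2, 3], [1, 2, 2, 3]))  # 0.0: warping absorbs the repeat
```

Because the alignment may stretch one outline against another, DTW tolerates the local speed differences between outlines of the same hand that a pointwise distance would penalize.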
Development of a Low-Cost Optical Sensor to Detect Eutrophication in Irrigation Reservoirs
In irrigation ponds, an excess of nutrients can cause eutrophication, a massive growth of microscopic algae. This can cause various problems in the irrigation infrastructure and should be monitored. In this paper, we present a low-cost sensor based on optical absorption for determining the concentration of algae in irrigation ponds. The sensor is composed of 5 LEDs with different wavelengths and light-dependent resistors as photoreceptors. Data are gathered for the calibration of the prototype from two turbidity sources, sediment and algae, including pure samples and mixed samples. Samples were measured at different concentrations from 15 mg/L to 4000 mg/L. Multiple regression models and artificial neural networks, with a training and a validation phase, are compared as two alternative methods to classify the tested samples. Our results indicate that using multiple regression models, it is possible to estimate the concentration of algae with an average absolute error of 32.0 mg/L and an average relative error of 11.0%. On the other hand, it is possible to classify up to 100% of the samples in the validation phase with the artificial neural network. Thus, a novel prototype capable of distinguishing turbidity sources, and two classification methodologies which can be adapted to different node features, are proposed.
This work is partially funded by the Ministerio de Educacion, Cultura y Deporte through the "Ayudas para contratacion pre-doctoral de Formacion del Profesorado Universitario FPU (Convocatoria 2016)", grant number FPU16/05540, and by the Conselleria de Educacion, Cultura y Deporte through the "Subvenciones para la contratacion de personal investigador en fase postdoctoral", grant number APOSTD/2019/04.
Rocher-Morant, J.; Parra-Boronat, L.; Jimenez, JM.; Lloret, J.; Basterrechea-Chertudi, DA. (2021). Development of a Low-Cost Optical Sensor to Detect Eutrophication in Irrigation Reservoirs. Sensors. 21(22):1-20. https://doi.org/10.3390/s21227637
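The regression-based calibration can be illustrated with a least-squares fit on (attenuation, concentration) pairs. All numbers below are synthetic, and the paper fits multiple regression over five LED channels rather than this single-channel sketch:

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = a*x + b; returns (a, b)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    a = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
         / sum((x - mx) ** 2 for x in xs))
    return a, my - a * mx

# synthetic (attenuation, mg/L) calibration points
atten = [0.05, 0.10, 0.20, 0.40, 0.80]
conc = [15, 60, 250, 1000, 4000]
a, b = fit_line(atten, conc)
pred = [a * x + b for x in atten]
err = sum(abs(p - c) for p, c in zip(pred, conc)) / len(conc)
print(round(err, 1))  # mean absolute calibration error on the fit
```

With the real five-channel data, one regressor per turbidity source lets the node separate sediment from algae before estimating concentration.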
Off the Beaten Path: Let's Replace Term-Based Retrieval with k-NN Search
Retrieval pipelines commonly rely on a term-based search to obtain candidate
records, which are subsequently re-ranked. Some candidates are missed by this
approach, e.g., due to a vocabulary mismatch. We address this issue by
replacing the term-based search with a generic k-NN retrieval algorithm, where
a similarity function can take into account subtle term associations. While an
exact brute-force k-NN search using this similarity function is slow, we
demonstrate that an approximate algorithm can be nearly two orders of magnitude
faster at the expense of only a small loss in accuracy. A retrieval pipeline
using an approximate k-NN search can be more effective and efficient than the
term-based pipeline. This opens up new possibilities for designing effective
retrieval pipelines. Our software (including data-generating code) and
derivative data based on the Stack Overflow collection are available online.
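The exact brute-force k-NN baseline mentioned above amounts to scoring every record against the query (illustrative sketch using cosine similarity; the paper's similarity function is more complex):

```python
import heapq
import math

def cosine(u, v):
    """Cosine similarity of two dense vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def knn(query, docs, k):
    """Exact brute force: score every document, keep the top k indices."""
    return heapq.nlargest(k, range(len(docs)),
                          key=lambda i: cosine(query, docs[i]))

docs = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
print(knn([1.0, 0.05], docs, k=2))  # [0, 1]: the two most similar vectors
```

An approximate index visits only a small fraction of the candidates this loop scores, which is the source of the near two-orders-of-magnitude speedup.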
Hardness of Approximate Nearest Neighbor Search
We prove conditional near-quadratic running time lower bounds for approximate
Bichromatic Closest Pair with Euclidean, Manhattan, Hamming, or edit distance.
Specifically, unless the Strong Exponential Time Hypothesis (SETH) is false,
for every $\delta > 0$ there exists a constant $\epsilon(\delta) > 0$ such that computing a
$(1+\epsilon)$-approximation to the Bichromatic Closest Pair requires $n^{2-\delta}$
time. In particular, this implies a near-linear query time lower bound for
Approximate Nearest Neighbor search with polynomial preprocessing time.
Our reduction uses the Distributed PCP framework of [ARW'17], but obtains
improved efficiency using Algebraic Geometry (AG) codes. Efficient PCPs from AG
codes have been constructed in other settings before [BKKMS'16, BCGRS'17], but
our construction is the first to yield new hardness results.
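The lower bound says the naive quadratic algorithm is essentially optimal under SETH, even for approximate answers; the exact baseline it benchmarks against looks like this (illustrative sketch, Hamming distance):

```python
def bcp(A, B):
    """Exact Bichromatic Closest Pair in O(|A| * |B| * d) time.

    Returns (distance, a, b) for the closest a in A, b in B
    under Hamming distance.
    """
    best = None
    for a in A:
        for b in B:
            dist = sum(x != y for x, y in zip(a, b))
            if best is None or dist < best[0]:
                best = (dist, a, b)
    return best

A = [(0, 0, 1, 1), (1, 1, 1, 1)]
B = [(0, 1, 1, 1), (0, 0, 0, 0)]
print(bcp(A, B))  # (1, (0, 0, 1, 1), (0, 1, 1, 1))
```

Beating the double loop by a polynomial factor, even with a $(1+\epsilon)$ slack on the answer, is exactly what the reduction rules out under SETH.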
The prevalence of axial spondyloarthritis in the UK: a cross-sectional cohort study
Background: Accurate prevalence data are important when interpreting diagnostic tests and planning for the health needs of a population, yet no such data exist for axial spondyloarthritis (axSpA) in the UK. In this cross-sectional cohort study we aimed to estimate the prevalence of axSpA in a UK primary care population.
Methods: A validated self-completed questionnaire was used to screen primary care patients with low back pain for inflammatory back pain (IBP). Patients with a verifiable pre-existing diagnosis of axSpA were included as positive cases. All other patients meeting the Assessment of SpondyloArthritis international Society (ASAS) IBP criteria were invited to undergo further assessment, including MRI scanning, allowing classification according to the European Spondyloarthropathy Study Group (ESSG) and ASAS axSpA criteria, and the modified New York (mNY) criteria for ankylosing spondylitis (AS).
Results: Of 978 questionnaires sent to potential participants, 505 were returned (response rate 51.6 %). Six subjects had a prior diagnosis of axSpA, 4 of whom met mNY criteria. Thirty-eight of 75 subjects meeting ASAS IBP criteria attended review (mean age 53.5 years, 37 % male). The number of subjects satisfying classification criteria was 23 for ESSG, 3 for ASAS (2 clinical, 1 radiological) and 1 for mNY criteria. This equates to a prevalence of 5.3 % (95 % CI 4.0, 6.8) using ESSG, 1.3 % (95 % CI 0.8, 2.3) using ASAS, and 0.66 % (95 % CI 0.28, 1.3) using mNY criteria in chronic back pain patients, and 1.2 % (95 % CI 0.9, 1.4) using ESSG, 0.3 % (95 % CI 0.13, 0.48) using ASAS, and 0.15 % (95 % CI 0.02, 0.27) using mNY criteria in the general adult primary care population.
Conclusions: These are the first prevalence estimates for axSpA in the UK, and will be of importance in planning for the future healthcare needs of this population.
Trial registration: Current Controlled Trials ISRCTN7687321
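As an illustration of interval estimation for prevalence figures like those above, here is a Wilson score interval for a binomial proportion (hedged: the study's CI method is not stated in the abstract, and its population-level estimates involve weighting, so these values need not match the quoted intervals):

```python
import math

def wilson_ci(k, n, z=1.96):
    """Wilson score 95% interval for observing k successes in n trials."""
    p = k / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return center - half, center + half

lo, hi = wilson_ci(4, 505)   # e.g. 4 mNY cases among 505 responders
print(f"{100 * lo:.2f}%-{100 * hi:.2f}%")
```

The Wilson interval behaves better than the normal approximation when the count is small, which matters here since several criteria were met by only a handful of subjects.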