
    Lower Bounds on Time-Space Trade-Offs for Approximate Near Neighbors

    We show tight lower bounds for the entire trade-off between space and query time for the Approximate Near Neighbor search problem. Our lower bounds hold in a restricted model of computation, which captures all hashing-based approaches. In particular, our lower bound matches the upper bound recently shown in [Laarhoven 2015] for the random instance on a Euclidean sphere (which we show in fact extends to the entire space $\mathbb{R}^d$ using the techniques from [Andoni, Razenshteyn 2015]). We also show tight, unconditional cell-probe lower bounds for one and two probes, improving upon the best known bounds from [Panigrahy, Talwar, Wieder 2010]. In particular, this is the first space lower bound (for any static data structure) for two probes which is not polynomially smaller than for one probe. To show the result for two probes, we establish and exploit a connection to locally-decodable codes.
    Comment: 47 pages, 2 figures; v2: substantially revised introduction, lots of small corrections; subsumed by arXiv:1608.03580 [cs.DS] (along with arXiv:1511.07527 [cs.DS])
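
    As background for the last point, here is a minimal sketch (not from the paper; all names are illustrative) of the classic 2-query locally-decodable code, the Hadamard code: each message bit can be recovered from just two probes into a possibly corrupted codeword.

    import random

    def hadamard_encode(x_bits):
        """Encode an n-bit message as all 2^n parities <x, a> mod 2 (the Hadamard code)."""
        n = len(x_bits)
        code = []
        for a in range(1 << n):
            parity = 0
            for j in range(n):
                if (a >> j) & 1:
                    parity ^= x_bits[j]
            code.append(parity)
        return code

    def decode_bit(codeword, n, i, num_trials=3):
        """Recover x_i with 2 probes per trial: read positions a and a XOR e_i, then XOR them."""
        votes = 0
        for _ in range(num_trials):
            a = random.randrange(1 << n)
            b = a ^ (1 << i)                      # flip the i-th bit of the probe index
            votes += 1 if (codeword[a] ^ codeword[b]) else -1
        return 1 if votes > 0 else 0

    # Tiny usage example (uncorrupted codeword)
    x = [1, 0, 1, 1]
    cw = hadamard_encode(x)
    assert [decode_bit(cw, len(x), i) for i in range(len(x))] == x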

    Optimal Hashing-based Time-Space Trade-offs for Approximate Near Neighbors

    [See the paper for the full abstract.] We show tight upper and lower bounds for time-space trade-offs for the $c$-Approximate Near Neighbor Search problem. For $d$-dimensional Euclidean space and $n$-point datasets, we develop a data structure with space $n^{1 + \rho_u + o(1)} + O(dn)$ and query time $n^{\rho_q + o(1)} + d n^{o(1)}$ for every $\rho_u, \rho_q \geq 0$ such that: \begin{equation} c^2 \sqrt{\rho_q} + (c^2 - 1) \sqrt{\rho_u} = \sqrt{2c^2 - 1}. \end{equation} This is the first data structure that achieves sublinear query time and near-linear space for every approximation factor $c > 1$, improving upon [Kapralov, PODS 2015]. The data structure is a culmination of a long line of work on the problem for all space regimes; it builds on Spherical Locality-Sensitive Filtering [Becker, Ducas, Gama, Laarhoven, SODA 2016] and data-dependent hashing [Andoni, Indyk, Nguyen, Razenshteyn, SODA 2014] [Andoni, Razenshteyn, STOC 2015]. Our matching lower bounds are of two types: conditional and unconditional. First, we prove tightness of the entire trade-off above in a restricted model of computation, which captures all known hashing-based approaches. We then show unconditional cell-probe lower bounds for one and two probes that match the above trade-off for $\rho_q = 0$, improving upon the best known lower bounds from [Panigrahy, Talwar, Wieder, FOCS 2010]. In particular, this is the first space lower bound (for any static data structure) for two probes which is not polynomially smaller than the one-probe bound. To show the result for two probes, we establish and exploit a connection to locally-decodable codes.
    Comment: 62 pages, 5 figures; a merger of arXiv:1511.07527 [cs.DS] and arXiv:1605.02701 [cs.DS], which subsumes both of the preprints. The new version contains more elaborated proofs and fixes some typos.
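
    As a quick numerical illustration of the trade-off curve (a sketch based only on the displayed equation; the helper name is ours), one can solve for the query exponent rho_q given the space exponent rho_u and the approximation factor c:

    import math

    def query_exponent(c, rho_u):
        """Solve c^2*sqrt(rho_q) + (c^2 - 1)*sqrt(rho_u) = sqrt(2c^2 - 1) for rho_q."""
        rhs = math.sqrt(2 * c**2 - 1) - (c**2 - 1) * math.sqrt(rho_u)
        if rhs < 0:
            return 0.0                                # query time is already n^{o(1)} at this space budget
        return (rhs / c**2) ** 2

    c = 2.0
    print(query_exponent(c, 0.0))                     # near-linear space: rho_q = (2c^2 - 1)/c^4 = 0.4375
    print(query_exponent(c, 1 / (2 * c**2 - 1)))      # balanced point: rho_u = rho_q = 1/(2c^2 - 1) ~ 0.143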

    Hybrid Cloud-Based Privacy Preserving Clustering as Service for Enterprise Big Data

    Clustering as a service is offered by many cloud service providers. It helps enterprises discover hidden patterns and extract knowledge from the large volumes of data they generate. Although it brings a lot of value to enterprises, it also exposes the data to various security and privacy threats. Privacy preserving clustering has been proposed as a solution to this problem. However, the existing privacy preserving clustering as an outsourced service model places too much overhead on the querying user, lacks adaptivity to incremental data, and requires frequent interaction between the service provider and the querying user. It also lacks personalization of the clustering by the querying user. This work, "Locality Sensitive Hashing for Transformed Dataset (LSHTD)", proposes a hybrid cloud-based clustering as a service model for streaming data that addresses the problems in existing models such as privacy preserving k-means clustering outsourcing under multiple keys (PPCOM) and secure nearest neighbor clustering (SNNC). The solution combines a hybrid cloud with the LSHTD clustering algorithm in an outsourced service model. In experiments, the proposed solution is found to reduce computation cost by 23% and communication cost by 6%, and to provide better clustering accuracy, with an ARI more than 4.59% higher than existing works.
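
    The abstract does not spell out the LSHTD construction; purely as background, here is a minimal sketch of the generic locality-sensitive hashing primitive (random-hyperplane signatures for cosine similarity) that such schemes build on. It is illustrative only, not the paper's algorithm.

    import numpy as np

    def lsh_signatures(X, num_bits=16, seed=0):
        """Map each row of X to a num_bits-bit signature; nearby points (in angle) tend to collide."""
        rng = np.random.default_rng(seed)
        hyperplanes = rng.standard_normal((X.shape[1], num_bits))
        bits = (X @ hyperplanes) > 0                  # sign pattern per point
        return [tuple(row) for row in bits]           # hashable bucket keys

    # Usage: bucket points by signature, then cluster or search only within buckets.
    X = np.random.default_rng(1).standard_normal((100, 32))
    buckets = {}
    for idx, sig in enumerate(lsh_signatures(X)):
        buckets.setdefault(sig, []).append(idx)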

    Hardness of Approximate Nearest Neighbor Search

    We prove conditional near-quadratic running time lower bounds for approximate Bichromatic Closest Pair with Euclidean, Manhattan, Hamming, or edit distance. Specifically, unless the Strong Exponential Time Hypothesis (SETH) is false, for every $\delta > 0$ there exists a constant $\epsilon > 0$ such that computing a $(1+\epsilon)$-approximation to the Bichromatic Closest Pair requires $n^{2-\delta}$ time. In particular, this implies a near-linear query time lower bound for Approximate Nearest Neighbor search with polynomial preprocessing time. Our reduction uses the Distributed PCP framework of [ARW'17], but obtains improved efficiency using Algebraic Geometry (AG) codes. Efficient PCPs from AG codes have been constructed in other settings before [BKKMS'16, BCGRS'17], but our construction is the first to yield new hardness results.
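
    To make the object of the lower bound concrete, here is the trivial quadratic-time baseline for Bichromatic Closest Pair under Hamming distance, which the result says cannot be beaten by a polynomial factor, even approximately, under SETH (a sketch with illustrative names):

    def bichromatic_closest_pair(A, B):
        """Exact O(|A|*|B|*d) scan over two sets of d-bit vectors under Hamming distance."""
        best = None
        for a in A:
            for b in B:
                dist = sum(x != y for x, y in zip(a, b))
                if best is None or dist < best[0]:
                    best = (dist, a, b)
        return best

    # Even approximating best[0] within a (1+eps) factor needs ~n^2 time under SETH.
    print(bichromatic_closest_pair([(0, 1, 1), (1, 1, 0)], [(1, 1, 1), (0, 0, 0)]))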

    Accelerating Frank-Wolfe Algorithm using Low-Dimensional and Adaptive Data Structures

    In this paper, we study the problem of speeding up a class of optimization algorithms called Frank-Wolfe, a conditional gradient method. We develop and employ two novel inner product search data structures, improving upon the prior fastest algorithm in [Shrivastava, Song and Xu, NeurIPS 2021].
    * The first data structure uses low-dimensional random projection to reduce the problem to a lower dimension, then uses an efficient inner product search data structure. It has preprocessing time $\tilde O(nd^{\omega-1}+dn^{1+o(1)})$ and per-iteration cost $\tilde O(d+n^\rho)$ for a small constant $\rho$.
    * The second data structure leverages recent developments in adaptive inner product search data structures that can output estimates of all inner products. It has preprocessing time $\tilde O(nd)$ and per-iteration cost $\tilde O(d+n)$.
    The first algorithm improves upon the state of the art (with preprocessing time $\tilde O(d^2n^{1+o(1)})$ and per-iteration cost $\tilde O(dn^\rho)$) in all cases, while the second one provides an even faster preprocessing time and is suitable when the number of iterations is small.
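
    For context, a plain Frank-Wolfe iteration over the convex hull of a finite set of atoms reduces each step to a maximum inner product search (the argmax below), which is exactly the step such data structures accelerate. This generic sketch uses our own notation and a brute-force argmax; it is not the paper's implementation.

    import numpy as np

    def frank_wolfe(atoms, grad_f, x0, num_iters=100):
        """Minimize a smooth f over conv(atoms); each step is argmax_s <-grad f(x), s>, a MIPS query."""
        x = x0.copy()
        for t in range(num_iters):
            g = grad_f(x)
            scores = atoms @ (-g)             # brute-force scoring; the paper replaces this step
            s = atoms[np.argmax(scores)]      # with an inner product search data structure
            gamma = 2.0 / (t + 2)             # standard step size
            x = (1 - gamma) * x + gamma * s
        return x

    # Example: find the point of conv(atoms) closest to a target (f(x) = 0.5*||x - target||^2)
    rng = np.random.default_rng(0)
    atoms = rng.standard_normal((500, 20))
    target = rng.standard_normal(20)
    x_hat = frank_wolfe(atoms, grad_f=lambda x: x - target, x0=atoms[0], num_iters=200)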

    AÇAI: Ascent Similarity Caching with Approximate Indexes

    Similarity search is a key operation in multimedia retrieval systems and recommender systems, and it will also play an important role in future machine learning and augmented reality applications. When these systems need to serve large objects with tight delay constraints, edge servers close to the end user can operate as similarity caches to speed up retrieval. In this paper we present AÇAI, a new similarity caching policy which improves on the state of the art by using (i) an (approximate) index for the whole catalog to decide which objects to serve locally and which to retrieve from the remote server, and (ii) a mirror ascent algorithm to update the set of local objects with strong guarantees even when the request process does not exhibit any statistical regularity.
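
    The abstract only names the ingredients, so the following is a rough, generic illustration of ingredient (ii): one entropic mirror ascent (exponentiated-gradient) step on a fractional cache state under a capacity budget. The function and variable names are assumptions of ours and the projection is deliberately simplified; this is not AÇAI itself.

    import numpy as np

    def mirror_ascent_step(y, gain_grad, capacity, lr=0.1):
        """One exponentiated-gradient step on fractional occupancies y in [0,1]^N,
        followed by a crude rescaling so that sum(y) <= capacity."""
        y = y * np.exp(lr * gain_grad)        # multiplicative update (entropy mirror map)
        y = np.clip(y, 1e-9, 1.0)
        if y.sum() > capacity:
            y = y * (capacity / y.sum())      # a real policy would use an exact Bregman projection
        return y

    # Usage: the gradient rewards objects whose local presence would have saved retrieval cost.
    N, capacity = 1000, 50
    y = np.full(N, capacity / N)
    gain = np.zeros(N)
    gain[[3, 17, 42]] = 1.0                   # hypothetical per-object marginal gains
    y = mirror_ascent_step(y, gain, capacity)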

    Structure in the 3D Galaxy Distribution: I. Methods and Example Results

    Three methods for detecting and characterizing structure in point data, such as that generated by redshift surveys, are described: classification using self-organizing maps, segmentation using Bayesian blocks, and density estimation using adaptive kernels. The first two methods are new, and allow detection and characterization of structures of arbitrary shape and at a wide range of spatial scales. These methods should elucidate not only clusters, but also the more distributed, wide-ranging filaments and sheets, and further allow the possibility of detecting and characterizing an even broader class of shapes. The methods are demonstrated and compared in application to three data sets: a carefully selected volume-limited sample from the Sloan Digital Sky Survey redshift data, a similarly selected sample from the Millennium Simulation, and a set of points independently drawn from a uniform probability distribution -- a so-called Poisson distribution. We demonstrate a few of the many ways in which these methods elucidate large scale structure in the distribution of galaxies in the nearby Universe.
    Comment: Re-posted after referee corrections along with partially re-written introduction. 80 pages, 31 figures, ApJ in Press. For full sized figures please download from: http://astrophysics.arc.nasa.gov/~mway/lss1.pd
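
    As a pointer to the third method, adaptive kernel density estimation in its standard Silverman-style form uses a fixed-bandwidth pilot estimate to set a per-point bandwidth. The sketch below is generic and illustrative, not the authors' code.

    import numpy as np

    def gaussian_kde(points, queries, bw):
        """Gaussian KDE; bw may be a scalar or one bandwidth per data point."""
        d = points.shape[1]
        bw = np.broadcast_to(np.asarray(bw, dtype=float), (points.shape[0],))
        sq = ((queries[:, None, :] - points[None, :, :]) ** 2).sum(-1)   # pairwise squared distances
        kern = np.exp(-0.5 * sq / bw**2) / ((2 * np.pi) ** (d / 2) * bw**d)
        return kern.mean(axis=1)

    def adaptive_kde(points, queries, pilot_bw=1.0, alpha=0.5):
        """Adaptive KDE: per-point bandwidths shrink where the pilot density is high."""
        pilot = gaussian_kde(points, points, pilot_bw)
        local_bw = pilot_bw * (pilot / np.exp(np.log(pilot).mean())) ** (-alpha)
        return gaussian_kde(points, queries, local_bw)

    # Toy 3D point sample standing in for a galaxy catalog
    rng = np.random.default_rng(0)
    pts = rng.standard_normal((500, 3))
    density = adaptive_kde(pts, pts[:10])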