
1,172 research outputs found

Down the Rabbit Hole: Robust Proximity Search and Density Estimation in Sublinear Space

Authors: Sariel Har-Peled, Nirman Kumar
Publication date: 01/12/2012

For a set of n points in R^d, and parameters k and ε, we present a data structure that answers (1+ε, k)-ANN queries in logarithmic time. Surprisingly, the space used by the data structure is Õ(n/k); that is, the space used is sublinear in the input size if k is sufficiently large. Our approach provides a novel way to summarize geometric data, such that meaningful proximity queries on the data can be carried out using this sketch. Using this, we provide a sublinear-space data structure that can estimate the density of a point set under various measures, including: (i) the sum of distances of the k closest points to the query point, and (ii) the sum of squared distances of the k closest points to the query point. Our approach generalizes to other distance-based estimation of densities of similar flavor. We also study the problem of approximating some of these quantities when using sampling. In particular, we show that a sample of size Õ(n/k) is sufficient, in some restricted cases, to estimate the above quantities. Remarkably, the sample size has only linear dependency on the dimension.
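To make the two density measures concrete, here is a minimal brute-force sketch of the quantities being estimated. It is not the sublinear-space data structure from the paper, only an exact linear-scan baseline. The function names `knn_distances`, `density_sum` and `density_sum_sq` are hypothetical and chosen for illustration.

```python
import math

def knn_distances(points, query, k):
    """Return the distances from `query` to its k closest points, ascending."""
    if not 1 <= k <= len(points):
        raise ValueError("k must be between 1 and the number of points")
    return sorted(math.dist(p, query) for p in points)[:k]

def density_sum(points, query, k):
    """Measure (i): sum of distances of the k closest points to the query."""
    return sum(knn_distances(points, query, k))

def density_sum_sq(points, query, k):
    """Measure (ii): sum of squared distances of the k closest points."""
    return sum(d * d for d in knn_distances(points, query, k))

# Illustrative data: distances from the origin are 0, 5, 10 and 1.
points = [(0.0, 0.0), (3.0, 4.0), (6.0, 8.0), (1.0, 0.0)]
query = (0.0, 0.0)
print(density_sum(points, query, 3))
print(density_sum_sq(points, query, 3))
```

A scan like this uses space linear in n; the abstract's claim is that comparable estimates are possible with roughly n/k space.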

arXiv.org e-Print Archive

University of Memphis Digital Commons

CiteSeerX

Robust Proximity Search for Balls using Sublinear Space

Authors: Sariel Har-Peled, Nirman Kumar
Publication date: 01/01/2014

Given a set of n disjoint balls b_1, ..., b_n in R^d, we provide a data structure of near-linear size that can answer (1 ± ε)-approximate kth-nearest neighbor queries in O(log n + 1/ε^d) time, where k and ε are provided at query time. If k and ε are provided in advance, we provide a data structure to answer such queries that requires (roughly) O(n/k) space; that is, the data structure has a sublinear space requirement if k is sufficiently large.
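As a rough illustration of the query being approximated, the sketch below computes the exact kth-nearest ball distance by linear scan and checks whether an estimate falls within the (1 ± ε) band. It assumes the distance from a query to a ball is the distance to the ball's closest point, which the abstract does not spell out. All function names are hypothetical.

```python
import math

def dist_to_ball(query, center, radius):
    """Distance from `query` to the closest point of the ball (0 if inside).
    Assumption: this is the point-to-ball distance the abstract has in mind."""
    return max(0.0, math.dist(query, center) - radius)

def kth_nearest_ball_distance(balls, query, k):
    """Exact kth smallest query-to-ball distance, by brute force."""
    if not 1 <= k <= len(balls):
        raise ValueError("k must be between 1 and the number of balls")
    distances = sorted(dist_to_ball(query, c, r) for c, r in balls)
    return distances[k - 1]

def is_valid_approximation(estimate, exact, eps):
    """True if `estimate` lies within the (1 ± eps) band around `exact`."""
    return (1 - eps) * exact <= estimate <= (1 + eps) * exact

# Illustrative disjoint balls given as (center, radius) pairs.
balls = [((0.0, 0.0), 1.0), ((5.0, 0.0), 1.0), ((0.0, 10.0), 2.0)]
exact = kth_nearest_ball_distance(balls, (2.0, 0.0), 2)
print(exact, is_valid_approximation(2.1, exact, 0.1))
```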

arXiv.org e-Print Archive

University of Memphis Digital Commons

Dagstuhl Research Online Publication Server

Using Fuzzy Linguistic Representations to Provide Explanatory Semantics for Data Warehouses

Authors: Tharam S. Dillon, Ling Feng
Publication date: 01/01/2003

A data warehouse integrates large amounts of extracted and summarized data from multiple sources for direct querying and analysis. While it provides decision makers with easy access to such historical and aggregate data, the real meaning of the data has been ignored. For example, whether a total sales amount of 1,000 items indicates a good or bad sales performance is still unclear. From the decision makers' point of view, the semantics that convey the meaning of the data, rather than the raw numbers, are very important. In this paper, we explore the use of fuzzy technology to provide these semantics for the summarizations and aggregates developed in data warehousing systems. A three-layered data warehouse semantic model, consisting of quantitative (numerical) summarization, qualitative (categorical) summarization, and quantifier summarization, is proposed for capturing and explicating the semantics of warehoused data. Based on the model, several algebraic operators are defined. We also extend the SQL language to allow for flexible queries against such enhanced data warehouses.
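The step from a quantitative summary to a qualitative one can be sketched with fuzzy membership functions. The example below is an illustration only: the triangular shape, the linguistic terms and their breakpoints are assumptions made here, not the model defined in the paper.

```python
def triangular(x, a, b, c):
    """Triangular membership: 0 outside (a, c), rising to 1 at the peak b."""
    if x <= a or x >= c:
        return 0.0
    if x <= b:
        return (x - a) / (b - a)
    return (c - x) / (c - b)

# Hypothetical linguistic terms for "total sales (items)" as (a, b, c).
TERMS = {
    "poor": (0, 400, 1200),
    "average": (400, 1200, 2000),
    "good": (1200, 2000, 2800),
}

def memberships(value):
    """Degree to which `value` belongs to each linguistic term."""
    return {term: triangular(value, *abc) for term, abc in TERMS.items()}

def best_label(value):
    """Linguistic term with the highest membership degree."""
    degrees = memberships(value)
    return max(degrees, key=degrees.get)

degrees = memberships(1000)
print(degrees, best_label(1000))
```

Under these assumed breakpoints, a total of 1,000 items is mostly "average" and partly "poor", which is the kind of explanatory reading the abstract asks for.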

CiteSeerX

University of Twente Research Information

Approximate Nearest Neighbor Search for Low Dimensional Queries

Authors: Sariel Har-Peled, Nirman Kumar
Publication date: 01/01/2010

We study the Approximate Nearest Neighbor problem for metric spaces where the query points are constrained to lie on a subspace of low doubling dimension, while the data is high-dimensional. We show that this problem can be solved efficiently despite the high dimensionality of the data. Comment: 25 pages

arXiv.org e-Print Archive

University of Memphis Digital Commons

CiteSeerX

Crossref

Data Cube Approximation and Mining using Probabilistic Modeling

Authors: Ameur Boujenoui, Cyril Goutte, Rokia Missaoui
Publication date: 01/01/2007

On-line Analytical Processing (OLAP) techniques commonly used in data warehouses allow the exploration of data cubes according to different analysis axes (dimensions) and under different abstraction levels in a dimension hierarchy. However, such techniques are not aimed at mining multidimensional data. Since data cubes are nothing but multi-way tables, we propose to analyze the potential of two probabilistic modeling techniques, namely non-negative multi-way array factorization and log-linear modeling, with the ultimate objective of compressing and mining aggregate and multidimensional values. With the first technique, we compute the set of components that best fit the initial data set and whose superposition coincides with the original data; with the second technique, we identify a parsimonious model (i.e., one with a reduced set of parameters), highlight strong associations among dimensions, and discover possible outliers in data cells. A real-life example will be used to (i) discuss the potential benefits of the modeling output on cube exploration and mining, (ii) show how OLAP queries can be answered in an approximate way, and (iii) illustrate the strengths and limitations of these modeling approaches.

Indexability, concentration, and VC theory

Author: Vladimir Pestov
Publication venue: Elsevier BV
Publication date: 21/05/2011

The degrading performance of indexing schemes for exact similarity search in high dimensions has long been linked to histograms of distributions of distances and other 1-Lipschitz functions getting concentrated. We discuss this observation in the framework of the phenomenon of concentration of measure on the structures of high dimension and the Vapnik-Chervonenkis theory of statistical learning. Comment: 17 pages, final submission to J. Discrete Algorithms (an expanded, improved and corrected version of the SISAP'2010 invited paper, this e-print, v3

arXiv.org e-Print Archive

Elsevier - Publisher Connector

An Adaptive Mechanism for Accurate Query Answering under Differential Privacy

Authors: Chao Li, Gerome Miklau
Publication date: 01/01/2012

We propose a novel mechanism for answering sets of counting queries under differential privacy. Given a workload of counting queries, the mechanism automatically selects a different set of "strategy" queries to answer privately, using those answers to derive answers to the workload. The main algorithm proposed in this paper approximates the optimal strategy for any workload of linear counting queries. With no cost to the privacy guarantee, the mechanism improves significantly on prior approaches and achieves near-optimal error for many workloads, when applied under (ε, δ)-differential privacy. The result is an adaptive mechanism which can help users achieve good utility without requiring that they reason carefully about the best formulation of their task. Comment: VLDB2012. arXiv admin note: substantial text overlap with arXiv:1103.136
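The idea of answering strategy queries privately and then deriving workload answers from them can be sketched in a much simplified form. The example below fixes the strategy to one count per histogram cell and uses Laplace noise for pure ε-differential privacy. Both choices are assumptions made for illustration: the paper selects the strategy adaptively and works under (ε, δ)-differential privacy. All function names are hypothetical.

```python
import random

def laplace_noise(scale, rng):
    """Laplace(0, scale) sample: the difference of two i.i.d. exponentials."""
    rate = 1.0 / scale
    return rng.expovariate(rate) - rng.expovariate(rate)

def answer_strategy(histogram, epsilon, rng):
    """Answer the fixed strategy (one count per cell) with Laplace noise.
    Adding or removing one record changes one cell by 1, so sensitivity is 1."""
    scale = 1.0 / epsilon
    return [count + laplace_noise(scale, rng) for count in histogram]

def derive_workload(noisy_cells, workload):
    """Derive range-count answers from noisy cells; no further privacy cost."""
    return [sum(noisy_cells[lo:hi]) for lo, hi in workload]

rng = random.Random(0)
histogram = [12, 7, 3, 9]                  # illustrative per-bucket counts
workload = [(0, 2), (1, 4), (0, 4)]        # half-open ranges of buckets
noisy_cells = answer_strategy(histogram, epsilon=0.5, rng=rng)
print(derive_workload(noisy_cells, workload))
```

With this fixed strategy, long ranges accumulate noise from every cell they cover, which is the kind of error a better-chosen strategy aims to reduce.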

arXiv.org e-Print Archive

CiteSeerX

ScholarWorks@UMass Amherst