Approximate Nearest Neighbor Search for Low Dimensional Queries
We study the Approximate Nearest Neighbor problem for metric spaces where the
query points are constrained to lie on a subspace of low doubling dimension,
while the data is high-dimensional. We show that this problem can be solved
efficiently despite the high dimensionality of the data.
Comment: 25 pages
Distributed PCP Theorems for Hardness of Approximation in P
We present a new distributed model of probabilistically checkable proofs
(PCP). A satisfying assignment to a CNF formula is shared between two parties:
Alice knows one part of the assignment, Bob knows the rest, and both parties
know the formula. The goal is to have Alice and Bob jointly write a PCP showing
that the assignment satisfies the formula, while exchanging little or no
information. Unfortunately, this model as-is does not
allow for nontrivial query complexity. Instead, we focus on a non-deterministic
variant, where the players are helped by Merlin, a third party who knows the
entire assignment.
Using our framework, we obtain, for the first time, PCP-like reductions from
the Strong Exponential Time Hypothesis (SETH) to approximation problems in P.
In particular, under SETH we show that there are no truly-subquadratic
approximation algorithms for Bichromatic Maximum Inner Product over
{0,1}-vectors, Bichromatic LCS Closest Pair over permutations, Approximate
Regular Expression Matching, and Diameter in Product Metric. All our
inapproximability factors are nearly-tight. In particular, for the first two
problems we obtain nearly-polynomial factors of ; only
-factor lower bounds (under SETH) were known before
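For orientation, the quadratic-time baseline that the hardness results above are measured against can be sketched as follows. This is only an illustrative brute-force solver for exact Bichromatic Maximum Inner Product over {0,1}-vectors, not anything taken from the paper; the function name `max_inner_product` and the input layout (two lists of equal-length 0/1 lists) are assumptions made for this sketch.

```python
from typing import List, Tuple


def max_inner_product(A: List[List[int]], B: List[List[int]]) -> Tuple[int, int, int]:
    """Return (value, i, j) maximizing the inner product of A[i] and B[j].

    Hypothetical helper: checks every pair, so it takes time proportional
    to len(A) * len(B) * dimension.
    """
    best = (-1, -1, -1)
    for i, a in enumerate(A):
        for j, b in enumerate(B):
            # For 0/1 vectors the inner product counts shared 1-coordinates.
            value = sum(x & y for x, y in zip(a, b))
            if value > best[0]:
                best = (value, i, j)
    return best


if __name__ == "__main__":
    A = [[1, 0, 1], [0, 1, 1]]
    B = [[1, 1, 1], [0, 0, 1]]
    print(max_inner_product(A, B))
```

The abstract's claim is that, under SETH, no truly-subquadratic algorithm can even approximate this quantity within the stated factors, so the double loop is essentially the shape of the best one can hope for.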
The Power of Dynamic Distance Oracles: Efficient Dynamic Algorithms for the Steiner Tree
In this paper we study the Steiner tree problem over a dynamic set of
terminals. We consider the model where we are given an -vertex graph
with positive real edge weights, and our goal is to maintain a tree
which is a good approximation of the minimum Steiner tree spanning a terminal
set , which changes over time. The changes applied to the
terminal set are either terminal additions (incremental scenario), terminal
removals (decremental scenario), or both (fully dynamic scenario). Our task
here is twofold. We want to support updates in sublinear time, and keep
the approximation factor of the algorithm as small as possible. We show that we
can maintain a -approximate Steiner tree of a general graph in
time per terminal addition or removal. Here,
denotes the stretch of the metric induced by . For planar graphs we achieve
the same running time and the approximation ratio of .
Moreover, we show faster algorithms for incremental and decremental scenarios.
Finally, we show that if we allow a higher approximation ratio, even more
efficient algorithms are possible. In particular we show a polylogarithmic time
-approximate algorithm for planar graphs.
Among the main building blocks of our algorithms are dynamic distance
oracles for vertex-labeled graphs, which are of independent interest. We also
improve and use the online algorithms for the Steiner tree problem.
Comment: Full version of the paper accepted to STOC'1
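As background for the static version of the problem, a classical way to approximate a minimum Steiner tree is to take a minimum spanning tree of the terminals under shortest-path distances; its weight is at most twice the optimum. The sketch below computes that weight from scratch and is not the dynamic algorithm of the paper. The names `dijkstra` and `steiner_upper_bound` and the adjacency-list format are assumptions, and the graph is assumed to be connected and undirected.

```python
import heapq
from typing import Dict, Hashable, Iterable, List, Tuple

Graph = Dict[Hashable, List[Tuple[Hashable, float]]]


def dijkstra(adj: Graph, src: Hashable) -> Dict[Hashable, float]:
    """Shortest-path distances from src in a graph with positive weights."""
    dist = {src: 0.0}
    heap = [(0.0, src)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue  # stale heap entry
        for v, w in adj.get(u, []):
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist


def steiner_upper_bound(adj: Graph, terminals: Iterable[Hashable]) -> float:
    """Weight of an MST over the terminals in the shortest-path metric.

    Assumes a connected graph. The value is at most 2x the weight of a
    minimum Steiner tree spanning the terminals.
    """
    terms = list(terminals)
    if len(terms) < 2:
        return 0.0
    dist = {t: dijkstra(adj, t) for t in terms}
    in_tree = {terms[0]}
    total = 0.0
    # Prim's algorithm on the complete graph over the terminals.
    while len(in_tree) < len(terms):
        w, v = min(
            (dist[u][x], x) for u in in_tree for x in terms if x not in in_tree
        )
        total += w
        in_tree.add(v)
    return total


if __name__ == "__main__":
    # Star with center "c": the optimal Steiner tree has weight 3.
    star = {
        "c": [("a", 1.0), ("b", 1.0), ("d", 1.0)],
        "a": [("c", 1.0)],
        "b": [("c", 1.0)],
        "d": [("c", 1.0)],
    }
    print(steiner_upper_bound(star, ["a", "b", "d"]))
```

Recomputing this after every terminal change costs far more than sublinear time, which is the gap the dynamic distance oracles in the abstract are meant to close.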
Fast Construction of Nets in Low Dimensional Metrics, and Their Applications
We present a near linear time algorithm for constructing hierarchical nets in
finite metric spaces with constant doubling dimension. This data-structure is
then applied to obtain improved algorithms for the following problems:
Approximate nearest neighbor search, well-separated pair decomposition, compact
representation scheme, doubling measure, and computation of the (approximate)
Lipschitz constant of a function. In all cases, the running (preprocessing)
time is near-linear and the space being used is linear.
Comment: 41 pages. Extensive clean-up of minor English errors
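To make the notion of a net concrete, here is a naive greedy construction of a single r-net for points in Euclidean space: the chosen centers are pairwise more than r apart, and every input point lies within r of some center. It runs in quadratic time and builds one level only, so it is a definitional sketch rather than the near-linear hierarchical construction described above. The function name `greedy_net` is an assumption.

```python
import math
from typing import List, Sequence


def greedy_net(points: Sequence[Sequence[float]], r: float) -> List[int]:
    """Return indices of an r-net of `points` (naive O(n^2) scan).

    A point becomes a center only if it is farther than r from every
    center chosen so far, which yields both packing and covering.
    """
    centers: List[int] = []
    for i, p in enumerate(points):
        if all(math.dist(p, points[c]) > r for c in centers):
            centers.append(i)
    return centers


if __name__ == "__main__":
    pts = [(float(x), 0.0) for x in range(6)]
    print(greedy_net(pts, 1.5))
```

A hierarchy of nets would repeat this idea at geometrically growing scales; the paper's contribution is doing so in near-linear time for metrics of constant doubling dimension.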
Providing Diversity in K-Nearest Neighbor Query Results
Given a point query Q in multi-dimensional space, K-Nearest Neighbor (KNN)
queries return the K closest answers according to a given distance metric in the
database with respect to Q. In this scenario, it is possible that a majority of
the answers may be very similar to each other, especially when the data has
clusters. For a variety of applications, such homogeneous result sets may not
add value to the user. In this paper, we consider the problem of providing
diversity in the results of KNN queries, that is, to produce the closest result
set such that each answer is sufficiently different from the rest. We first
propose a user-tunable definition of diversity, and then present an algorithm,
called MOTLEY, for producing a diverse result set as per this definition.
Through a detailed experimental evaluation on real and synthetic data, we show
that MOTLEY can produce diverse result sets by reading only a small fraction of
the tuples in the database. Further, it imposes no additional overhead on the
evaluation of traditional KNN queries, thereby providing a seamless interface
between diversity and distance.
Comment: 20 pages, 11 figures
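The tension between closeness and diversity can be illustrated with a simple greedy filter: scan candidates in order of distance to the query and keep one only if it is at least some minimum separation away from everything already kept. This is a generic sketch, not the MOTLEY algorithm, and it reads every point rather than a small fraction of the database. The name `diverse_knn` and the `min_sep` parameter are assumptions standing in for the user-tunable diversity definition.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def diverse_knn(points: Sequence[Point], query: Point, k: int, min_sep: float) -> List[Point]:
    """Greedily pick up to k near points that are pairwise >= min_sep apart."""
    ranked = sorted(points, key=lambda p: math.dist(p, query))
    chosen: List[Point] = []
    for p in ranked:
        # Keep p only if it is far enough from every answer chosen so far.
        if all(math.dist(p, c) >= min_sep for c in chosen):
            chosen.append(p)
            if len(chosen) == k:
                break
    return chosen


if __name__ == "__main__":
    data = [(1.0, 0.0), (1.1, 0.0), (0.0, 2.0), (5.0, 5.0)]
    print(diverse_knn(data, (0.0, 0.0), k=2, min_sep=1.0))
```

With `min_sep=0` the filter never rejects anything and the result is the plain K nearest neighbors, which mirrors the abstract's point that diversity should not add overhead to traditional KNN queries.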
Average Distance Queries through Weighted Samples in Graphs and Metric Spaces: High Scalability with Tight Statistical Guarantees
The average distance from a node to all other nodes in a graph, or from a
query point in a metric space to a set of points, is a fundamental quantity in
data analysis. The inverse of the average distance, known as the (classic)
closeness centrality of a node, is a popular importance measure in the study of
social networks. We develop novel structural insights on the sparsifiability of
the distance relation via weighted sampling. Based on that, we present highly
practical algorithms with strong statistical guarantees for fundamental
problems. We show that the average distance (and hence the centrality) for all
nodes in a graph can be estimated using single-source
distance computations. For a set of points in a metric space, we show
that after preprocessing which uses distance computations we can compute
a weighted sample of size such that the average
distance from any query point to can be estimated from the distances
from to . Finally, we show that for a set of points in a metric
space, we can estimate the average pairwise distance using
distance computations. The estimate is based on a weighted sample of
pairs of points, which is computed using distance
computations. Our estimates are unbiased with normalized root mean square error
(NRMSE) of at most . Increasing the sample size by a
factor ensures that the probability that the relative error exceeds
is polynomially small.
Comment: 21 pages, will appear in the Proceedings of RANDOM 201
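For comparison, the simplest sampling estimator of the average distance from a query point draws points uniformly at random and averages their distances. The sketch below shows that uniform baseline only; it is not the weighted sampling scheme of the paper and carries none of its error guarantees. The name `estimate_average_distance` and the `seed` parameter are assumptions.

```python
import math
import random
from typing import Sequence, Tuple

Point = Tuple[float, ...]


def estimate_average_distance(
    points: Sequence[Point], query: Point, sample_size: int, seed: int = 0
) -> float:
    """Unbiased estimate of the mean distance from `query` to `points`.

    Uses uniform sampling with replacement; the variance can be large
    when a few points are much farther away than the rest.
    """
    rng = random.Random(seed)
    sample = [rng.choice(points) for _ in range(sample_size)]
    return sum(math.dist(query, p) for p in sample) / sample_size


if __name__ == "__main__":
    circle = [(1.0, 0.0), (0.0, 1.0), (-1.0, 0.0), (0.0, -1.0)]
    print(estimate_average_distance(circle, (0.0, 0.0), sample_size=10))
```

Uniform sampling can miss rare far-away points that dominate the average, which is the kind of failure that sampling with carefully chosen weights is designed to avoid.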