
    Approximate Nearest Neighbor Search for Low Dimensional Queries

    We study the Approximate Nearest Neighbor problem for metric spaces where the query points are constrained to lie on a subspace of low doubling dimension, while the data is high-dimensional. We show that this problem can be solved efficiently despite the high dimensionality of the data. Comment: 25 pages
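    As a concrete reference for the guarantee sought above, the following minimal Python sketch (our own illustration, not the paper's data structure) spells out the exact linear-scan baseline and the (1+eps)-ANN acceptance criterion, with a toy query drawn from a low-dimensional subspace of a high-dimensional ambient space; the function names and parameters are hypothetical.

```python
# A minimal reference point for the problem statement: the exact nearest
# neighbor by linear scan, and the (1+eps)-approximation criterion that an
# ANN data structure must satisfy. This is NOT the paper's algorithm; it is
# only a sketch of the guarantee being sought.
import math
import random

def euclidean(p, q):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def exact_nn(data, query):
    """Exact nearest neighbor by linear scan: O(n * d) per query."""
    return min(data, key=lambda p: euclidean(p, query))

def is_valid_ann_answer(data, query, candidate, eps):
    """A point is a (1+eps)-ANN if its distance is within (1+eps) of optimal."""
    opt = euclidean(exact_nn(data, query), query)
    return euclidean(candidate, query) <= (1 + eps) * opt

# Toy usage: high-dimensional data, query drawn from a low-dimensional
# (here: 2-dimensional) subspace embedded in the ambient space.
d, n = 50, 1000
data = [[random.gauss(0, 1) for _ in range(d)] for _ in range(n)]
basis = [[random.gauss(0, 1) for _ in range(d)] for _ in range(2)]
u, v = random.random(), random.random()
query = [u * basis[0][i] + v * basis[1][i] for i in range(d)]

cand = random.choice(data)  # an arbitrary database point; may or may not qualify
print(is_valid_ann_answer(data, query, cand, eps=0.1))
print(is_valid_ann_answer(data, query, exact_nn(data, query), eps=0.1))  # always True
```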

    Distributed PCP Theorems for Hardness of Approximation in P

    We present a new distributed model of probabilistically checkable proofs (PCP). A satisfying assignment $x \in \{0,1\}^n$ to a CNF formula $\varphi$ is shared between two parties, where Alice knows $x_1, \dots, x_{n/2}$, Bob knows $x_{n/2+1}, \dots, x_n$, and both parties know $\varphi$. The goal is to have Alice and Bob jointly write a PCP that $x$ satisfies $\varphi$, while exchanging little or no information. Unfortunately, this model as-is does not allow for nontrivial query complexity. Instead, we focus on a non-deterministic variant, where the players are helped by Merlin, a third party who knows all of $x$. Using our framework, we obtain, for the first time, PCP-like reductions from the Strong Exponential Time Hypothesis (SETH) to approximation problems in P. In particular, under SETH we show that there are no truly-subquadratic approximation algorithms for Bichromatic Maximum Inner Product over $\{0,1\}$-vectors, Bichromatic LCS Closest Pair over permutations, Approximate Regular Expression Matching, and Diameter in Product Metric. All our inapproximability factors are nearly-tight. In particular, for the first two problems we obtain nearly-polynomial factors of $2^{(\log n)^{1-o(1)}}$; only $(1+o(1))$-factor lower bounds (under SETH) were known before.
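    To make the first hardness target concrete, here is a minimal sketch of Bichromatic Maximum Inner Product over {0,1}-vectors together with the naive exhaustive search over all pairs; under SETH, the result above rules out truly-subquadratic algorithms even with a $2^{(\log n)^{1-o(1)}}$ approximation factor. This is our own illustration of the problem, not anything from the paper, and the parameters are placeholders.

```python
# Bichromatic Maximum Inner Product over {0,1}-vectors: given sets A and B of
# n binary vectors each, find a pair (a, b) with a in A, b in B maximizing
# <a, b>. The naive algorithm below runs in Theta(n^2 * d) time; the abstract's
# result says that, under SETH, no truly-subquadratic algorithm achieves even
# a nearly-polynomial approximation factor.
import random

def inner_product(a, b):
    return sum(x * y for x, y in zip(a, b))

def bichromatic_max_inner_product(A, B):
    """Exhaustive search over all |A| * |B| pairs."""
    return max(inner_product(a, b) for a in A for b in B)

# Toy usage with placeholder parameters (n vectors of dimension d per side).
n, d = 200, 32
A = [[random.randint(0, 1) for _ in range(d)] for _ in range(n)]
B = [[random.randint(0, 1) for _ in range(d)] for _ in range(n)]
print(bichromatic_max_inner_product(A, B))
```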

    The Power of Dynamic Distance Oracles: Efficient Dynamic Algorithms for the Steiner Tree

    In this paper we study the Steiner tree problem over a dynamic set of terminals. We consider the model where we are given an $n$-vertex graph $G=(V,E,w)$ with positive real edge weights, and our goal is to maintain a tree which is a good approximation of the minimum Steiner tree spanning a terminal set $S \subseteq V$, which changes over time. The changes applied to the terminal set are either terminal additions (incremental scenario), terminal removals (decremental scenario), or both (fully dynamic scenario). Our task here is twofold. We want to support updates in sublinear $o(n)$ time, and keep the approximation factor of the algorithm as small as possible. We show that we can maintain a $(6+\varepsilon)$-approximate Steiner tree of a general graph in $\tilde{O}(\sqrt{n}\log D)$ time per terminal addition or removal. Here, $D$ denotes the stretch of the metric induced by $G$. For planar graphs we achieve the same running time and an approximation ratio of $(2+\varepsilon)$. Moreover, we show faster algorithms for the incremental and decremental scenarios. Finally, we show that if we allow a higher approximation ratio, even more efficient algorithms are possible. In particular, we show a polylogarithmic-time $(4+\varepsilon)$-approximate algorithm for planar graphs. One of the main building blocks of our algorithms is a dynamic distance oracle for vertex-labeled graphs, which is of independent interest. We also improve and use online algorithms for the Steiner tree problem. Comment: Full version of the paper accepted to STOC'15.
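    For orientation, the sketch below shows the folklore static 2-approximation that such dynamic results compare against: a minimum spanning tree of the metric closure of the terminal set, recomputed from scratch. It is not the paper's dynamic algorithm (which maintains the approximation in sublinear time per terminal update via dynamic vertex-labeled distance oracles); the toy graph, function names, and weights are illustrative assumptions.

```python
# Static baseline for the quantity being maintained dynamically: the weight of
# a minimum spanning tree over the metric closure of the terminal set S.
# Expanding its edges into shortest paths yields a Steiner tree of weight at
# most twice the optimum. The paper instead keeps an approximate tree up to
# date under terminal insertions/deletions in sublinear time; the code below
# is only the folklore static construction.
import heapq

def dijkstra(adj, source):
    """Single-source shortest-path distances in a weighted graph."""
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue
        for v, w in adj[u]:
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist

def steiner_2_approx_weight(adj, terminals):
    """Weight of an MST (Prim) over the metric closure of the terminals."""
    terminals = list(terminals)
    closure = {t: dijkstra(adj, t) for t in terminals}
    in_tree = {terminals[0]}
    total = 0.0
    while len(in_tree) < len(terminals):
        t, best = min(
            ((t, min(closure[s][t] for s in in_tree))
             for t in terminals if t not in in_tree),
            key=lambda x: x[1],
        )
        in_tree.add(t)
        total += best
    return total

# Toy usage on a small weighted graph given as an adjacency list.
adj = {
    0: [(1, 1.0), (2, 4.0)],
    1: [(0, 1.0), (2, 1.0), (3, 5.0)],
    2: [(0, 4.0), (1, 1.0), (3, 1.0)],
    3: [(1, 5.0), (2, 1.0)],
}
print(steiner_2_approx_weight(adj, {0, 3}))
```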

    Fast Construction of Nets in Low Dimensional Metrics, and Their Applications

    We present a near-linear time algorithm for constructing hierarchical nets in finite metric spaces with constant doubling dimension. This data structure is then applied to obtain improved algorithms for the following problems: approximate nearest neighbor search, well-separated pair decomposition, compact representation scheme, doubling measure, and computation of the (approximate) Lipschitz constant of a function. In all cases, the running (preprocessing) time is near-linear and the space used is linear. Comment: 41 pages. Extensive clean-up of minor English errors.
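    For readers unfamiliar with nets, the sketch below gives the definition-level greedy construction of a single r-net and a naive hierarchy over geometric scales. It runs in quadratic time and only illustrates the objects the paper builds; the paper's algorithm constructs the hierarchy in near-linear time when the doubling dimension is constant. All names and the toy point set are ours.

```python
# Definition-level sketch of a single r-net: a subset N of the points such
# that (i) every point is within distance r of some net point and (ii) net
# points are pairwise more than r apart. The greedy construction below is the
# straightforward quadratic-time one; the paper builds the whole hierarchy of
# nets (for geometrically increasing r) in near-linear time for constant
# doubling dimension.
import math

def euclidean(p, q):
    return math.dist(p, q)

def greedy_r_net(points, r, dist=euclidean):
    net = []
    for p in points:
        # Add p only if it is more than r away from every current net point.
        if all(dist(p, c) > r for c in net):
            net.append(p)
    return net

def hierarchy_of_nets(points, r_min, r_max, dist=euclidean):
    """Nets at geometrically increasing scales, each built independently
    (naively) here; a real implementation would nest them."""
    scales, r = [], r_min
    while r <= r_max:
        scales.append((r, greedy_r_net(points, r, dist)))
        r *= 2
    return scales

# Toy usage.
pts = [(0, 0), (0.4, 0.1), (3, 3), (3.2, 2.9), (10, 10)]
for r, net in hierarchy_of_nets(pts, 0.5, 8):
    print(r, net)
```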

    Providing Diversity in K-Nearest Neighbor Query Results

    Given a point query Q in multi-dimensional space, K-Nearest Neighbor (KNN) queries return the K closest answers in the database with respect to Q, according to a given distance metric. In this scenario, it is possible that a majority of the answers are very similar to one another, especially when the data has clusters. For a variety of applications, such homogeneous result sets may not add value to the user. In this paper, we consider the problem of providing diversity in the results of KNN queries, that is, to produce the closest result set such that each answer is sufficiently different from the rest. We first propose a user-tunable definition of diversity, and then present an algorithm, called MOTLEY, for producing a diverse result set as per this definition. Through a detailed experimental evaluation on real and synthetic data, we show that MOTLEY can produce diverse result sets by reading only a small fraction of the tuples in the database. Further, it imposes no additional overhead on the evaluation of traditional KNN queries, thereby providing a seamless interface between diversity and distance. Comment: 20 pages, 11 figures.
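    The brute-force sketch below is only meant to illustrate the diversity requirement: return K answers close to Q that are pairwise at least a tunable threshold apart. It scans the whole database, which is precisely what MOTLEY avoids, and the threshold parameter and function names are our own stand-ins for the paper's user-tunable definition.

```python
# Illustration of the diversity notion only: return k answers that are close
# to the query but pairwise at least `min_div` apart. This brute-force version
# sorts every tuple by distance, which is exactly what MOTLEY avoids (it reads
# only a small fraction of the database); `min_div` stands in for the paper's
# user-tunable diversity parameter.
import math

def euclidean(p, q):
    return math.dist(p, q)

def diverse_knn(data, query, k, min_div):
    candidates = sorted(data, key=lambda p: euclidean(p, query))
    result = []
    for p in candidates:
        # Accept p only if it is sufficiently different from answers so far.
        if all(euclidean(p, r) >= min_div for r in result):
            result.append(p)
        if len(result) == k:
            break
    return result

# Toy usage: two tight clusters; plain KNN would return near-duplicates,
# while the diversified version spreads the answers across clusters.
data = [(1.0, 1.0), (1.1, 1.0), (1.0, 1.1), (5.0, 5.0), (5.1, 5.0)]
print(diverse_knn(data, query=(0.0, 0.0), k=2, min_div=1.0))
```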

    Average Distance Queries through Weighted Samples in Graphs and Metric Spaces: High Scalability with Tight Statistical Guarantees

    The average distance from a node to all other nodes in a graph, or from a query point in a metric space to a set of points, is a fundamental quantity in data analysis. The inverse of the average distance, known as the (classic) closeness centrality of a node, is a popular importance measure in the study of social networks. We develop novel structural insights on the sparsifiability of the distance relation via weighted sampling. Based on that, we present highly practical algorithms with strong statistical guarantees for fundamental problems. We show that the average distance (and hence the centrality) for all nodes in a graph can be estimated using $O(\epsilon^{-2})$ single-source distance computations. For a set $V$ of $n$ points in a metric space, we show that after preprocessing which uses $O(n)$ distance computations, we can compute a weighted sample $S \subset V$ of size $O(\epsilon^{-2})$ such that the average distance from any query point $v$ to $V$ can be estimated from the distances from $v$ to $S$. Finally, we show that for a set of points $V$ in a metric space, we can estimate the average pairwise distance using $O(n+\epsilon^{-2})$ distance computations. The estimate is based on a weighted sample of $O(\epsilon^{-2})$ pairs of points, which is computed using $O(n)$ distance computations. Our estimates are unbiased with normalized root mean square error (NRMSE) of at most $\epsilon$. Increasing the sample size by an $O(\log n)$ factor ensures that the probability that the relative error exceeds $\epsilon$ is polynomially small. Comment: 21 pages, will appear in the Proceedings of RANDOM 2015.
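    The sketch below illustrates the generic weighted-sample estimator behind such guarantees: draw points with probabilities p_i and reweight each sampled distance by 1/(n p_i), which keeps the estimate unbiased for any choice of p. The paper's contribution lies in how the probabilities are chosen during O(n) preprocessing so that O(eps^-2) samples suffice for every query point; the uniform probabilities and parameters below are placeholders, not the paper's choices.

```python
# Generic sketch: estimate the average distance from a query point q to a set
# V from a weighted sample drawn with replacement. Reweighting each sampled
# distance by 1/(n * p_i) makes the estimator unbiased for any sampling
# distribution p; the paper's specific choice of p (and its O(eps^-2) sample
# size with NRMSE guarantees for all queries) is not reproduced here.
import math
import random

def euclidean(p, q):
    return math.dist(p, q)

def weighted_sample(points, probs, m):
    """m indices drawn with replacement according to `probs`."""
    return random.choices(range(len(points)), weights=probs, k=m)

def estimate_average_distance(points, probs, sample_idx, query, dist=euclidean):
    n = len(points)
    terms = [dist(query, points[i]) / (n * probs[i]) for i in sample_idx]
    return sum(terms) / len(terms)

# Toy usage with uniform placeholder probabilities.
random.seed(0)
pts = [(random.random(), random.random()) for _ in range(1000)]
probs = [1.0 / len(pts)] * len(pts)
sample = weighted_sample(pts, probs, m=400)
q = (0.5, 0.5)
est = estimate_average_distance(pts, probs, sample, q)
exact = sum(euclidean(q, p) for p in pts) / len(pts)
print(round(est, 3), round(exact, 3))
```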