34 research outputs found

    Probabilistic Polynomials and Hamming Nearest Neighbors

    Full text link
    We show how to compute any symmetric Boolean function on nn variables over any field (as well as the integers) with a probabilistic polynomial of degree O(nlog(1/ϵ))O(\sqrt{n \log(1/\epsilon)}) and error at most ϵ\epsilon. The degree dependence on nn and ϵ\epsilon is optimal, matching a lower bound of Razborov (1987) and Smolensky (1987) for the MAJORITY function. The proof is constructive: a low-degree polynomial can be efficiently sampled from the distribution. This polynomial construction is combined with other algebraic ideas to give the first subquadratic time algorithm for computing a (worst-case) batch of Hamming distances in superlogarithmic dimensions, exactly. To illustrate, let c(n):NNc(n) : \mathbb{N} \rightarrow \mathbb{N}. Suppose we are given a database DD of nn vectors in {0,1}c(n)logn\{0,1\}^{c(n) \log n} and a collection of nn query vectors QQ in the same dimension. For all uQu \in Q, we wish to compute a vDv \in D with minimum Hamming distance from uu. We solve this problem in n21/O(c(n)log2c(n))n^{2-1/O(c(n) \log^2 c(n))} randomized time. Hence, the problem is in "truly subquadratic" time for O(logn)O(\log n) dimensions, and in subquadratic time for d=o((log2n)/(loglogn)2)d = o((\log^2 n)/(\log \log n)^2). We apply the algorithm to computing pairs with maximum inner product, closest pair in 1\ell_1 for vectors with bounded integer entries, and pairs with maximum Jaccard coefficients.Comment: 16 pages. To appear in 56th Annual IEEE Symposium on Foundations of Computer Science (FOCS 2015

    Hybrid LSH: Faster Near Neighbors Reporting in High-dimensional Space

    Get PDF
    We study the rr-near neighbors reporting problem (rr-NN), i.e., reporting \emph{all} points in a high-dimensional point set SS that lie within a radius rr of a given query point qq. Our approach builds upon on the locality-sensitive hashing (LSH) framework due to its appealing asymptotic sublinear query time for near neighbor search problems in high-dimensional space. A bottleneck of the traditional LSH scheme for solving rr-NN is that its performance is sensitive to data and query-dependent parameters. On datasets whose data distributions have diverse local density patterns, LSH with inappropriate tuning parameters can sometimes be outperformed by a simple linear search. In this paper, we introduce a hybrid search strategy between LSH-based search and linear search for rr-NN in high-dimensional space. By integrating an auxiliary data structure into LSH hash tables, we can efficiently estimate the computational cost of LSH-based search for a given query regardless of the data distribution. This means that we are able to choose the appropriate search strategy between LSH-based search and linear search to achieve better performance. Moreover, the integrated data structure is time efficient and fits well with many recent state-of-the-art LSH-based approaches. Our experiments on real-world datasets show that the hybrid search approach outperforms (or is comparable to) both LSH-based search and linear search for a wide range of search radii and data distributions in high-dimensional space.Comment: Accepted as a short paper in EDBT 201

    Fine-Grained Complexity Theory: Conditional Lower Bounds for Computational Geometry

    Get PDF
    Fine-grained complexity theory is the area of theoretical computer sciencethat proves conditional lower bounds based on the Strong Exponential TimeHypothesis and similar conjectures. This area has been thriving in the lastdecade, leading to conditionally best-possible algorithms for a wide variety ofproblems on graphs, strings, numbers etc. This article is an introduction tofine-grained lower bounds in computational geometry, with a focus on lowerbounds for polynomial-time problems based on the Orthogonal Vectors Hypothesis.Specifically, we discuss conditional lower bounds for nearest neighbor searchunder the Euclidean distance and Fr\'echet distance.<br

    Strong ETH Breaks With Merlin and Arthur: Short Non-Interactive Proofs of Batch Evaluation

    Get PDF
    We present an efficient proof system for Multipoint Arithmetic Circuit Evaluation: for every arithmetic circuit C(x1,,xn)C(x_1,\ldots,x_n) of size ss and degree dd over a field F{\mathbb F}, and any inputs a1,,aKFna_1,\ldots,a_K \in {\mathbb F}^n, \bullet the Prover sends the Verifier the values C(a1),,C(aK)FC(a_1), \ldots, C(a_K) \in {\mathbb F} and a proof of O~(Kd)\tilde{O}(K \cdot d) length, and \bullet the Verifier tosses poly(log(dKF/ε))\textrm{poly}(\log(dK|{\mathbb F}|/\varepsilon)) coins and can check the proof in about O~(K(n+d)+s)\tilde{O}(K \cdot(n + d) + s) time, with probability of error less than ε\varepsilon. For small degree dd, this "Merlin-Arthur" proof system (a.k.a. MA-proof system) runs in nearly-linear time, and has many applications. For example, we obtain MA-proof systems that run in cnc^{n} time (for various c<2c < 2) for the Permanent, #\#Circuit-SAT for all sublinear-depth circuits, counting Hamiltonian cycles, and infeasibility of 00-11 linear programs. In general, the value of any polynomial in Valiant's class VP{\sf VP} can be certified faster than "exhaustive summation" over all possible assignments. These results strongly refute a Merlin-Arthur Strong ETH and Arthur-Merlin Strong ETH posed by Russell Impagliazzo and others. We also give a three-round (AMA) proof system for quantified Boolean formulas running in 22n/3+o(n)2^{2n/3+o(n)} time, nearly-linear time MA-proof systems for counting orthogonal vectors in a collection and finding Closest Pairs in the Hamming metric, and a MA-proof system running in nk/2+O(1)n^{k/2+O(1)}-time for counting kk-cliques in graphs. We point to some potential future directions for refuting the Nondeterministic Strong ETH.Comment: 17 page

    Distributed PCP Theorems for Hardness of Approximation in P

    Get PDF
    We present a new distributed model of probabilistically checkable proofs (PCP). A satisfying assignment x{0,1}nx \in \{0,1\}^n to a CNF formula φ\varphi is shared between two parties, where Alice knows x1,,xn/2x_1, \dots, x_{n/2}, Bob knows xn/2+1,,xnx_{n/2+1},\dots,x_n, and both parties know φ\varphi. The goal is to have Alice and Bob jointly write a PCP that xx satisfies φ\varphi, while exchanging little or no information. Unfortunately, this model as-is does not allow for nontrivial query complexity. Instead, we focus on a non-deterministic variant, where the players are helped by Merlin, a third party who knows all of xx. Using our framework, we obtain, for the first time, PCP-like reductions from the Strong Exponential Time Hypothesis (SETH) to approximation problems in P. In particular, under SETH we show that there are no truly-subquadratic approximation algorithms for Bichromatic Maximum Inner Product over {0,1}-vectors, Bichromatic LCS Closest Pair over permutations, Approximate Regular Expression Matching, and Diameter in Product Metric. All our inapproximability factors are nearly-tight. In particular, for the first two problems we obtain nearly-polynomial factors of 2(logn)1o(1)2^{(\log n)^{1-o(1)}}; only (1+o(1))(1+o(1))-factor lower bounds (under SETH) were known before
    corecore