291 research outputs found

    Hardness of Approximate Nearest Neighbor Search

    We prove conditional near-quadratic running time lower bounds for approximate Bichromatic Closest Pair with Euclidean, Manhattan, Hamming, or edit distance. Specifically, unless the Strong Exponential Time Hypothesis (SETH) is false, for every $\delta>0$ there exists a constant $\epsilon>0$ such that computing a $(1+\epsilon)$-approximation to the Bichromatic Closest Pair requires $n^{2-\delta}$ time. In particular, this implies a near-linear lower bound on the query time for Approximate Nearest Neighbor search with polynomial preprocessing time. Our reduction uses the Distributed PCP framework of [ARW'17], but obtains improved efficiency using Algebraic Geometry (AG) codes. Efficient PCPs from AG codes have been constructed in other settings before [BKKMS'16, BCGRS'17], but our construction is the first to yield new hardness results.
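
    For orientation, the quadratic baseline that the lower bound refers to can be sketched as follows. This is a minimal illustration of exact Bichromatic Closest Pair under Euclidean distance, not the reduction from the paper; the function name and the sample points are assumptions made for the example.

```python
from itertools import product
from math import dist


def bichromatic_closest_pair(red, blue):
    """Return (distance, red_point, blue_point) for the closest red-blue pair.

    Checks all |red| * |blue| pairs, i.e. the quadratic-time baseline.
    Returns None if either set is empty.
    """
    best = None
    for a, b in product(red, blue):
        d = dist(a, b)  # Euclidean distance between the two points
        if best is None or d < best[0]:
            best = (d, a, b)
    return best


# Hypothetical sample input: two small point sets in the plane.
red = [(0.0, 0.0), (5.0, 5.0)]
blue = [(1.0, 0.0), (9.0, 9.0)]
result = bichromatic_closest_pair(red, blue)
```

    The abstract's claim is that, under SETH, even a $(1+\epsilon)$-approximate answer cannot be computed substantially faster than this all-pairs scan.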

    Distributed PCP Theorems for Hardness of Approximation in P

    We present a new distributed model of probabilistically checkable proofs (PCP). A satisfying assignment $x \in \{0,1\}^n$ to a CNF formula $\varphi$ is shared between two parties, where Alice knows $x_1, \dots, x_{n/2}$, Bob knows $x_{n/2+1},\dots,x_n$, and both parties know $\varphi$. The goal is to have Alice and Bob jointly write a PCP that $x$ satisfies $\varphi$, while exchanging little or no information. Unfortunately, this model as-is does not allow for nontrivial query complexity. Instead, we focus on a non-deterministic variant, where the players are helped by Merlin, a third party who knows all of $x$. Using our framework, we obtain, for the first time, PCP-like reductions from the Strong Exponential Time Hypothesis (SETH) to approximation problems in P. In particular, under SETH we show that there are no truly-subquadratic approximation algorithms for Bichromatic Maximum Inner Product over $\{0,1\}$-vectors, Bichromatic LCS Closest Pair over permutations, Approximate Regular Expression Matching, and Diameter in Product Metric. All our inapproximability factors are nearly-tight. In particular, for the first two problems we obtain nearly-polynomial factors of $2^{(\log n)^{1-o(1)}}$; only $(1+o(1))$-factor lower bounds (under SETH) were known before.
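
    The two-party split of the assignment can be made concrete with the classical split-and-list idea, sketched below. This is not the distributed PCP construction of the abstract; it only illustrates, under assumed names and an assumed clause encoding (signed integers, DIMACS style), how Alice's and Bob's halves can each be turned into a 0/1 vector so that the formula is satisfied exactly when the two vectors are orthogonal.

```python
from itertools import product


def unsatisfied_mask(clauses, assignment):
    """Entry i is 1 if the partial assignment satisfies no literal of clause i."""
    mask = []
    for clause in clauses:
        satisfied = any(
            abs(lit) in assignment and assignment[abs(lit)] == (lit > 0)
            for lit in clause
        )
        mask.append(0 if satisfied else 1)
    return mask


def half_vectors(clauses, variables):
    """One mask per assignment to the given variables (2^len(variables) masks)."""
    variables = list(variables)
    return [
        unsatisfied_mask(clauses, dict(zip(variables, bits)))
        for bits in product([False, True], repeat=len(variables))
    ]


def satisfiable_by_split(clauses, n):
    """Decide satisfiability by searching for an orthogonal Alice/Bob pair."""
    half = n // 2
    alice = half_vectors(clauses, range(1, half + 1))      # x_1 .. x_{n/2}
    bob = half_vectors(clauses, range(half + 1, n + 1))    # x_{n/2+1} .. x_n
    # A pair is orthogonal iff every clause is satisfied by at least one half.
    return any(
        all(a * b == 0 for a, b in zip(u, v))
        for u in alice
        for v in bob
    )


# Hypothetical formulas: literal k means x_k, literal -k means NOT x_k.
sat_formula = [[1, 3], [-1, 4], [-3, -4], [2]]
unsat_formula = [[1], [-1]]
```

    Each side lists $2^{n/2}$ vectors, so a truly-subquadratic pair search over these lists would beat exhaustive search over all $2^n$ assignments, which is the link to SETH.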

    Coresets-Methods and History: A Theoreticians Design Pattern for Approximation and Streaming Algorithms

    We present a technical survey of state-of-the-art approaches in data reduction and the coreset framework. These include geometric decompositions, gradient methods, random sampling, sketching and random projections. We further outline their importance for the design of streaming algorithms and give a brief overview of lower bounding techniques.

    Connectivity and equilibrium in random games

    We study how the structure of the interaction graph of a game affects the existence of pure Nash equilibria. In particular, for a fixed interaction graph, we are interested in whether there are pure Nash equilibria arising when random utility tables are assigned to the players. We provide conditions for the structure of the graph under which equilibria are likely to exist and complementary conditions which make the existence of equilibria highly unlikely. Our results have immediate implications for many deterministic graphs and generalize known results for random games on the complete graph. In particular, our results imply that the probability that bounded degree graphs have pure Nash equilibria is exponentially small in the size of the graph and yield a simple algorithm that finds small nonexistence certificates for a large family of graphs. Then we show that in any strongly connected graph of $n$ vertices with expansion $(1+\Omega(1))\log_2(n)$ the distribution of the number of equilibria approaches the Poisson distribution with parameter 1, asymptotically as $n \to +\infty$. Comment: Published at http://dx.doi.org/10.1214/10-AAP715 in the Annals of Applied Probability (http://www.imstat.org/aap/) by the Institute of Mathematical Statistics (http://www.imstat.org)
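
    The setting can be explored numerically with a small brute-force experiment, sketched below. The code assumes two actions per player and independent uniform utilities, and all names (`random_graphical_game`, `count_pure_nash`, the `neighbors` encoding) are chosen for this illustration rather than taken from the paper.

```python
import random
from itertools import product


def random_graphical_game(neighbors, rng):
    """Assign each player a random utility table.

    neighbors[i] lists the players whose actions affect player i's utility.
    Each table maps (own_action, neighbor_actions) to a uniform random value.
    """
    tables = []
    for nbrs in neighbors:
        table = {}
        for own in (0, 1):
            for others in product((0, 1), repeat=len(nbrs)):
                table[(own, others)] = rng.random()
        tables.append(table)
    return tables


def count_pure_nash(neighbors, tables):
    """Count action profiles in which no player gains by switching action."""
    n = len(neighbors)
    count = 0
    for profile in product((0, 1), repeat=n):
        stable = True
        for i in range(n):
            others = tuple(profile[j] for j in neighbors[i])
            current = tables[i][(profile[i], others)]
            deviation = tables[i][(1 - profile[i], others)]
            if deviation > current:  # player i strictly prefers to deviate
                stable = False
                break
        if stable:
            count += 1
    return count


# Hypothetical example: the complete interaction graph on three players.
rng = random.Random(0)
neighbors = [[1, 2], [0, 2], [0, 1]]
tables = random_graphical_game(neighbors, rng)
num_equilibria = count_pure_nash(neighbors, tables)
```

    With two actions and continuous random utilities, each profile is an equilibrium with probability $2^{-n}$, so the expected count is 1 for any graph; averaging `count_pure_nash` over many sampled games is one way to compare the empirical distribution against Poisson(1). The enumeration is exponential in $n$, so this is only practical for small games.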

    On Generalization Bounds for Projective Clustering

    Given a set of points, clustering consists of finding a partition of a point set into $k$ clusters such that the center to which a point is assigned is as close as possible. Most commonly, centers are points themselves, which leads to the famous $k$-median and $k$-means objectives. One may also choose centers to be $j$-dimensional subspaces, which gives rise to subspace clustering. In this paper, we consider learning bounds for these problems. That is, given a set of $n$ samples $P$ drawn independently from some unknown, but fixed distribution $\mathcal{D}$, how quickly does a solution computed on $P$ converge to the optimal clustering of $\mathcal{D}$? We give several near optimal results. In particular, for center-based objectives, we show a convergence rate of $\tilde{O}\left(\sqrt{k/n}\right)$. This matches the known optimal bounds of [Fefferman, Mitter, and Narayanan, Journal of the Mathematical Society 2016] and [Bartlett, Linder, and Lugosi, IEEE Trans. Inf. Theory 1998] for $k$-means and extends it to other important objectives such as $k$-median. For subspace clustering with $j$-dimensional subspaces, we show a convergence rate of $\tilde{O}\left(\sqrt{\frac{kj^2}{n}}\right)$. These are the first provable bounds for most of these problems. For the specific case of projective clustering, which generalizes $k$-means, we show a convergence rate of $\Omega\left(\sqrt{\frac{kj}{n}}\right)$ is necessary, thereby proving that the bounds from [Fefferman, Mitter, and Narayanan, Journal of the Mathematical Society 2016] are essentially optimal.

    Light Spanners for High Dimensional Norms via Stochastic Decompositions

    Spanners for low dimensional spaces (e.g. Euclidean space of constant dimension, or doubling metrics) are well understood. This lies in contrast to the situation in high dimensional spaces, where except for the work of Har-Peled, Indyk and Sidiropoulos (SODA 2013), who showed that any $n$-point Euclidean metric has an $O(t)$-spanner with $\tilde{O}(n^{1+1/t^2})$ edges, little is known. In this paper we study several aspects of spanners in high dimensional normed spaces. First, we build spanners for finite subsets of $\ell_p$ with $1 < p \le 2$. Second, our construction yields a spanner which is both sparse and light, i.e., its total weight is not much larger than that of the minimum spanning tree. In particular, we show that any $n$-point subset of $\ell_p$ for $1 < p \le 2$ has an $O(t)$-spanner with $n^{1+\tilde{O}(1/t^p)}$ edges and lightness $n^{\tilde{O}(1/t^p)}$. In fact, our results are more general, and they apply to any metric space admitting a certain low diameter stochastic decomposition. It is known that arbitrary metric spaces have an $O(t)$-spanner with lightness $O(n^{1/t})$. We exhibit the following tradeoff: metrics with decomposability parameter $\nu=\nu(t)$ admit an $O(t)$-spanner with lightness $\tilde{O}(\nu^{1/t})$. For example, $n$-point Euclidean metrics have $\nu \le n^{1/t}$, metrics with doubling constant $\lambda$ have $\nu \le \lambda$, and graphs of genus $g$ have $\nu \le g$. While these families do admit a $(1+\epsilon)$-spanner, its lightness depends exponentially on the dimension (resp. $\log g$). Our construction alleviates this exponential dependency, at the cost of incurring larger stretch.
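
    The notions of stretch and lightness can be illustrated with the classical greedy spanner, sketched below. This is a swapped-in textbook technique, not the stochastic-decomposition construction of the abstract; the function names and the sample point set are assumptions made for the example.

```python
import heapq
from itertools import combinations
from math import dist


def spanner_distance(adj, source, target, limit):
    """Dijkstra from source; returns the distance to target, or inf if it exceeds limit."""
    best = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if u == target:
            return d
        if d > best.get(u, float("inf")):
            continue  # stale heap entry
        for v, w in adj[u]:
            nd = d + w
            if nd <= limit and nd < best.get(v, float("inf")):
                best[v] = nd
                heapq.heappush(heap, (nd, v))
    return float("inf")


def sorted_pairs(points):
    """All index pairs, ordered by Euclidean distance."""
    return sorted(
        combinations(range(len(points)), 2),
        key=lambda p: dist(points[p[0]], points[p[1]]),
    )


def greedy_spanner(points, t):
    """Add a pair as an edge only if the current spanner stretches it by more than t."""
    adj = [[] for _ in points]
    edges = []
    for u, v in sorted_pairs(points):
        w = dist(points[u], points[v])
        if spanner_distance(adj, u, v, t * w) > t * w:
            adj[u].append((v, w))
            adj[v].append((u, w))
            edges.append((u, v, w))
    return edges


def mst_weight(points):
    """Weight of a minimum spanning tree, via Kruskal with union-find."""
    parent = list(range(len(points)))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    total = 0.0
    for u, v in sorted_pairs(points):
        ru, rv = find(u), find(v)
        if ru != rv:
            parent[ru] = rv
            total += dist(points[u], points[v])
    return total


# Hypothetical sample input: unit-square corners plus the center.
points = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0), (0.5, 0.5)]
stretch = 2.0
edges = greedy_spanner(points, stretch)
lightness = sum(w for _, _, w in edges) / mst_weight(points)
```

    Here lightness is the spanner's total weight divided by the minimum spanning tree weight, matching the definition in the abstract. The sketch runs a shortest-path query per pair, so it is meant for small inputs only.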