    Tight Lower Bounds for Data-Dependent Locality-Sensitive Hashing

    We prove a tight lower bound on the exponent $\rho$ for data-dependent Locality-Sensitive Hashing schemes, recently used to design efficient solutions for $c$-approximate nearest neighbor search. In particular, our lower bound matches the bound $\rho \le \frac{1}{2c-1} + o(1)$ for the $\ell_1$ space, obtained via the recent algorithm from [Andoni-Razenshteyn, STOC'15]. In recent years it has emerged that data-dependent hashing is strictly superior to classical Locality-Sensitive Hashing, in which the hash functions are chosen independently of the data. In the latter setting, the best exponent has long been known: for the $\ell_1$ space, the tight bound is $\rho = 1/c$, with the upper bound from [Indyk-Motwani, STOC'98] and the matching lower bound from [O'Donnell-Wu-Zhou, ITCS'11]. We prove that, even if the hashing is data-dependent, it must hold that $\rho \ge \frac{1}{2c-1} - o(1)$. To prove the result, we need to formalize the exact notion of data-dependent hashing that also captures the complexity of the hash functions (in addition to their collision properties). Without restricting such complexity, we would allow for obviously infeasible solutions such as the Voronoi diagram of a dataset. To preclude such solutions, we require our hash functions to be succinct. This condition is satisfied by all the known algorithmic results.
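
    For context (standard background, not part of the abstract above): for an $(r, cr, p_1, p_2)$-sensitive hash family, in which points at distance at most $r$ collide with probability at least $p_1$ and points at distance more than $cr$ collide with probability at most $p_2$, the quality exponent is

        \rho = \frac{\log(1/p_1)}{\log(1/p_2)},

    and the classical data-independent construction of [Indyk-Motwani, STOC'98] answers $c$-approximate near neighbor queries in roughly $n^{\rho}$ time using roughly $n^{1+\rho}$ space. This is why tight bounds on $\rho$ translate directly into tight bounds on query time and space.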

    PUFFINN: Parameterless and Universally Fast FInding of Nearest Neighbors

    We present PUFFINN, a parameterless LSH-based index for solving the k-nearest neighbor problem with probabilistic guarantees. By parameterless we mean that the user is only required to specify the amount of memory the index may use and the result quality that should be achieved. The index combines several heuristic ideas known in the literature. Through small adaptations to the query algorithm, we make these heuristics rigorous. We perform experiments on real-world and synthetic inputs to evaluate implementation choices and show that the implementation satisfies the quality guarantees while being competitive with other state-of-the-art approaches to nearest neighbor search. We describe a novel synthetic data set that is difficult for almost all existing nearest neighbor search approaches, and on which PUFFINN significantly outperforms previous methods.
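
    To give a feel for the kind of index PUFFINN builds on, below is a minimal multi-table LSH lookup for angular distance. This is an illustrative sketch only: the class and parameter names are hypothetical, and it is not the PUFFINN implementation or its API.

        # Minimal multi-table LSH for k-NN under angular distance.
        # Illustrative sketch only -- not the PUFFINN code or API.
        import numpy as np
        from collections import defaultdict

        class SimpleLSHIndex:
            def __init__(self, dim, n_tables=10, n_bits=12, seed=0):
                rng = np.random.default_rng(seed)
                # One set of random hyperplanes per table (SimHash).
                self.planes = rng.standard_normal((n_tables, n_bits, dim))
                self.tables = [defaultdict(list) for _ in range(n_tables)]
                self.data = []

            def _keys(self, v):
                # Sign pattern of the projections, one bucket key per table.
                bits = self.planes @ np.asarray(v, dtype=float) > 0
                return [tuple(row) for row in bits]

            def insert(self, v):
                idx = len(self.data)
                self.data.append(np.asarray(v, dtype=float))
                for table, key in zip(self.tables, self._keys(v)):
                    table[key].append(idx)

            def search(self, q, k):
                q = np.asarray(q, dtype=float)
                candidates = set()
                for table, key in zip(self.tables, self._keys(q)):
                    candidates.update(table[key])
                # Rank the candidate set by exact cosine distance.
                def dist(i):
                    v = self.data[i]
                    return 1.0 - q @ v / (np.linalg.norm(q) * np.linalg.norm(v) + 1e-12)
                return sorted(candidates, key=dist)[:k]

    PUFFINN's contribution, per the abstract, is to remove the need to tune parameters such as the number of tables and bits above: the user states only a memory budget and the desired result quality, and the index and query algorithm are set up so that the stated guarantee is met.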

    Edges and switches, tunnels and bridges

    Edge casing is a well-known method to improve the readability of drawings of non-planar graphs. A cased drawing orders the edges of each edge crossing and interrupts the lower edge in an appropriate neighborhood of the crossing. Certain orders lead to a more readable drawing than others. We formulate several optimization criteria that try to capture the concept of a "good" cased drawing. Furthermore, we address the algorithmic question of how to turn a given drawing into an optimal cased drawing. For many of the resulting optimization problems, we either give polynomial-time algorithms or prove NP-hardness.
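
    To make the notion of casing concrete, here is a toy sketch that finds the crossings of a straight-line drawing and, given a stacking order on the edges, records which edge is interrupted at each crossing. It is a simplified illustration with hypothetical names, not the optimization algorithms studied in the paper.

        # Toy edge casing: find proper crossings between straight-line edges
        # and interrupt the lower edge at each crossing.
        from itertools import combinations

        def segments_cross(p, q, r, s):
            # True if segments pq and rs properly cross (general position assumed).
            def orient(a, b, c):
                v = (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])
                return (v > 0) - (v < 0)
            return (orient(p, q, r) != orient(p, q, s) and
                    orient(r, s, p) != orient(r, s, q))

        def case_drawing(edges, height):
            # edges: list of ((x1, y1), (x2, y2)); height[i]: stacking order of edge i.
            # Returns, for each crossing pair, the index of the interrupted (lower) edge.
            interruptions = []
            for i, j in combinations(range(len(edges)), 2):
                if segments_cross(*edges[i], *edges[j]):
                    lower = i if height[i] < height[j] else j
                    interruptions.append(((i, j), lower))
            return interruptions

    In the paper's terms, choosing the stacking order (the height values above) is the optimization step: different orders change how often each edge switches between running on top and being interrupted, which is the kind of quantity the proposed readability criteria aim to control.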

    Connecting the dots (with minimum crossings)

    We study a prototype Crossing Minimization problem, defined as follows. Let F be an infinite family of (possibly vertex-labeled) graphs. Then, given a set P of (possibly labeled) n points in the Euclidean plane, a collection L subseteq Lines(P)={l: l is a line segment with both endpoints in P}, and a non-negative integer k, decide if there is a subcollection L' subseteq L such that the graph G=(P,L') is isomorphic to a graph in F and L' has at most k crossings. By G=(P,L'), we refer to the graph on vertex set P, where two vertices are adjacent if and only if there is a line segment that connects them in L'. Intuitively, in Crossing Minimization, we have a set of locations of interest, and we want to build/draw/exhibit connections between them (where L indicates where it is feasible to have these connections) so that we obtain a structure in F. Natural choices for F are the collections of perfect matchings, Hamiltonian paths, and graphs that contain an (s,t)-path (a path whose endpoints are labeled). While the objective of seeking a solution with few crossings is of interest from a theoretical point of view, it is also well motivated by a wide range of practical considerations. For example, links/roads (such as highways) may be cheaper to build and faster to traverse, and signals/moving objects would collide with or interrupt each other less often. Further, graphs with fewer crossings are preferred for graphical user interfaces.

    As a starting point for a systematic study, we consider a special case of Crossing Minimization. Already for this case, we obtain NP-hardness and W[1]-hardness results, and ETH-based lower bounds. Specifically, suppose that the input also contains a collection D of d non-crossing line segments such that each point in P belongs to exactly one line in D, and L does not contain line segments between points on the same line in D. Clearly, Crossing Minimization is the case where d=n - then, P is in general position. The case of d=2 is of interest not only because it is the most restricted non-trivial case, but also since it corresponds to a class of graphs that has been well studied - specifically, it is Crossing Minimization where G=(P,L) is a (bipartite) graph with a so-called two-layer drawing.

    For d=2, we consider three basic choices of F. First, for perfect matchings, we show (i) NP-hardness with an ETH-based lower bound, (ii) solvability in subexponential parameterized time, and (iii) the existence of an O(k^2)-vertex kernel. Second, for Hamiltonian paths, we show (i) solvability in subexponential parameterized time, and (ii) the existence of an O(k^2)-vertex kernel. Lastly, for graphs that contain an (s,t)-path, we show (i) NP-hardness and W[1]-hardness, and (ii) membership in XP.
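
    To make the problem statement concrete, here is a brute-force check of the perfect-matching variant: it tries every subcollection of L that matches all points and counts its crossings. This exponential-time sketch only illustrates the definition; it is not one of the algorithms, kernels, or hardness constructions from the paper.

        # Brute-force illustration of Crossing Minimization with F = perfect matchings.
        from itertools import combinations

        def proper_cross(a, b, c, d):
            # True if segments ab and cd properly cross (points in general position).
            def orient(p, q, r):
                v = (q[0] - p[0]) * (r[1] - p[1]) - (q[1] - p[1]) * (r[0] - p[0])
                return (v > 0) - (v < 0)
            return (orient(a, b, c) != orient(a, b, d) and
                    orient(c, d, a) != orient(c, d, b))

        def matching_with_few_crossings(P, L, k):
            # P: list of points; L: list of index pairs (i, j) into P.
            # Returns (matching, crossings) with at most k crossings, or None.
            n = len(P)
            for M in combinations(L, n // 2):
                if len({v for e in M for v in e}) != n:
                    continue  # the chosen segments do not match every point
                crossings = sum(
                    proper_cross(P[e[0]], P[e[1]], P[f[0]], P[f[1]])
                    for e, f in combinations(M, 2))
                if crossings <= k:
                    return M, crossings
            return None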

    Breaching the 2-Approximation Barrier for Connectivity Augmentation: a Reduction to Steiner Tree

    The basic goal of survivable network design is to build a cheap network that maintains the connectivity between given sets of nodes despite the failure of a few edges/nodes. The Connectivity Augmentation Problem (CAP) is arguably one of the most basic problems in this area: given a $k$-(edge-)connected graph $G$ and a set of extra edges (links), select a minimum-cardinality subset $A$ of links such that adding $A$ to $G$ increases its edge connectivity to $k+1$. Intuitively, one wants to make an existing network more reliable by augmenting it with extra edges. The best known approximation factor for this NP-hard problem is 2, and this can be achieved with multiple approaches (the first such result is in [Frederickson and JáJá'81]). It is known [Dinitz et al.'76] that CAP can be reduced to the case $k=1$, a.k.a. the Tree Augmentation Problem (TAP), for odd $k$, and to the case $k=2$, a.k.a. the Cactus Augmentation Problem (CacAP), for even $k$. Several better-than-2 approximation algorithms are known for TAP, culminating with a recent $1.458$ approximation [Grandoni et al.'18]. However, for CacAP the best known approximation is 2. In this paper we breach the 2-approximation barrier for CacAP, and hence for CAP, by presenting a polynomial-time $2\ln(4)-\frac{967}{1120}+\epsilon < 1.91$ approximation. Previous approaches exploit properties of TAP that do not seem to generalize to CacAP. We instead use a reduction to the Steiner tree problem which was previously used in parameterized algorithms [Basavaraju et al.'14]. This reduction is not approximation preserving, and using the current best approximation factor for Steiner tree [Byrka et al.'13] as a black box would not be good enough to improve on 2. To achieve the latter goal, we "open the box" and exploit the specific properties of the instances of Steiner tree arising from CacAP.
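
    For reference, the claimed approximation factor evaluates numerically as $2\ln(4) - \frac{967}{1120} \approx 2.77259 - 0.86339 \approx 1.9092$, so for any sufficiently small $\epsilon > 0$ the ratio stays strictly below the previous barrier of 2 (simple arithmetic, not stated in the abstract).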

    Algorithms for fat objects: decompositions and applications

    Computational geometry is the branch of theoretical computer science that deals with algorithms and data structures for geometric objects. The most basic geometric objects include points, lines, polygons, and polyhedra. Computational geometry has applications in many areas of computer science, including computer graphics, robotics, and geographic information systems. In many computational-geometry problems, the theoretical worst case is achieved by input that is in some way "unrealistic". This causes situations where the theoretical running time is not a good predictor of the running time in practice. In addition, algorithms must also be designed with the worst-case examples in mind, which causes them to be needlessly complicated. In recent years, realistic input models have been proposed in an attempt to deal with this problem. The usual form such solutions take is to limit some geometric property of the input to a constant. We examine a specific realistic input model in this thesis: the model where objects are restricted to be fat. Intuitively, objects that are more like a ball are more fat, and objects that are more like a long pole are less fat. We look at fat objects in the context of five different problems: two related to decompositions of input objects and three problems suggested by computer graphics.

    Decompositions of geometric objects are important because they are often used as a preliminary step in other algorithms, since many algorithms can only handle geometric objects that are convex and preferably of low complexity. The two main issues in developing decomposition algorithms are to keep the number of pieces produced by the decomposition small and to compute the decomposition quickly. The main question we address is the following: is it possible to obtain better decompositions for fat objects than for general objects, and/or is it possible to obtain decompositions quickly? These questions are also interesting because most research into fat objects has concerned objects that are convex.

    We begin by triangulating fat polygons. The problem of triangulating polygons, that is, partitioning them into triangles without adding any vertices, has already been solved, but the only linear-time algorithm is so complicated that it has never been implemented. We propose two much simpler algorithms for triangulating fat polygons in linear time. They make use of the observation that a small set of guards placed at points inside a (certain type of) fat polygon is sufficient to see the boundary of such a polygon. We then look at decompositions of fat polyhedra in three dimensions. We show that polyhedra can be decomposed into a linear number of convex pieces if certain fatness restrictions are met. We also show that if these restrictions are not met, a quadratic number of pieces may be needed. Furthermore, we show that if we wish the output to be fat and convex, the restrictions must be much tighter.

    We then study three computational-geometry problems inspired by computer graphics. First, we study ray shooting amidst fat objects from two perspectives. This is the problem of preprocessing the data into a data structure that can answer which object is first hit by a query ray shot in a given direction from a given point. We present a new data structure for answering vertical ray-shooting queries, that is, queries where the ray's direction is fixed, as well as a data structure for answering ray-shooting queries for rays with arbitrary direction. Both structures improve the best known results on these problems.

    Another problem studied in the field of computer graphics is the depth-order problem; we study it in the context of computational geometry. This is the problem of finding an ordering of the objects in the scene from "top" to "bottom", where one object is above another if they share a point in the projection to the xy-plane and the first object has a higher z-value at that point. We give an algorithm for finding a depth order of a group of fat objects and an algorithm for verifying whether a given depth order of a group of fat objects is correct. The latter algorithm is useful because the former can return an incorrect order if the objects do not have a depth order (this can happen if the above/below relationship contains a cycle). The first algorithm improves on the results previously known for fat objects; the second is the first algorithm for verifying depth orders of fat objects.

    The final problem that we study is the hidden-surface removal problem. In this problem, we wish to find and report the visible portions of a scene from a given viewpoint; the result is called the visibility map. The main difficulty in this problem is to find an algorithm whose running time depends in part on the complexity of the output. For example, if all but one of the objects in the input scene are hidden behind one large object, then our algorithm should have a faster running time than if all of the objects are visible and have overlapping borders. We give such an algorithm that improves on the running time of previous algorithms for fat objects. Furthermore, our algorithm is able to handle curved objects and situations where the objects do not have a depth order, two features missing from most other algorithms that perform hidden-surface removal.
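
    The depth-order verification mentioned above has a simple analogue for axis-aligned boxes, sketched below to make the above/below relation concrete. This is a simplified, quadratic-time illustration with hypothetical names, not the thesis's algorithms for general fat objects.

        # Toy depth-order check for disjoint axis-aligned boxes: box i is
        # "above" box j if their xy-projections overlap and i lies higher
        # in z there.  We verify that a proposed top-to-bottom order lists
        # each box before every box it is above.

        def xy_overlap(a, b):
            # Each box is (xmin, xmax, ymin, ymax, zmin, zmax).
            return a[0] < b[1] and b[0] < a[1] and a[2] < b[3] and b[2] < a[3]

        def above(a, b):
            # For disjoint boxes with overlapping projections, compare z-ranges.
            return xy_overlap(a, b) and a[4] >= b[5]

        def is_valid_depth_order(boxes, order):
            # order: list of box indices from top to bottom.
            pos = {idx: p for p, idx in enumerate(order)}
            for i in range(len(boxes)):
                for j in range(len(boxes)):
                    if i != j and above(boxes[i], boxes[j]) and pos[i] > pos[j]:
                        return False  # i is above j but listed below it
            return True

    For general fat or curved objects the above/below relation is harder to evaluate and may contain cycles, which is exactly why the thesis treats computing a candidate order and verifying it as two separate problems.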

    Fast 2-Approximate All-Pairs Shortest Paths

    In this paper, we revisit the classic approximate All-Pairs Shortest Paths (APSP) problem in undirected graphs. For unweighted graphs, we provide an algorithm for 2-approximate APSP in $\tilde O(n^{2.5-r}+n^{\omega(r)})$ time, for any $r\in[0,1]$. This is $O(n^{2.032})$ time, using known bounds for rectangular matrix multiplication $n^{\omega(r)}$ [Le Gall, Urrutia, SODA 2018]. Our result improves on the $\tilde{O}(n^{2.25})$ bound of [Roditty, STOC 2023], and on the $\tilde{O}(m\sqrt{n}+n^2)$ bound of [Baswana, Kavitha, SICOMP 2010] for graphs with $m\geq n^{1.532}$ edges. For weighted graphs, we obtain $(2+\epsilon)$-approximate APSP in $\tilde O(n^{3-r}+n^{\omega(r)})$ time, for any $r\in[0,1]$. This is $O(n^{2.214})$ time using known bounds for $\omega(r)$. It improves on the state-of-the-art bound of $O(n^{2.25})$ by [Kavitha, Algorithmica 2012]. Our techniques further lead to improved bounds in a wide range of densities for weighted graphs. In particular, for the sparse regime we construct a distance oracle in $\tilde O(mn^{2/3})$ time that supports 2-approximate queries in constant time. For sparse graphs, the preprocessing time of the algorithm matches conditional lower bounds [Patrascu, Roditty, Thorup, FOCS 2012; Abboud, Bringmann, Fischer, STOC 2023]. To the best of our knowledge, this is the first 2-approximate distance oracle with subquadratic preprocessing time on sparse graphs. We also obtain new bounds in the near-additive regime for unweighted graphs: faster algorithms for $(1+\epsilon,k)$-approximate APSP, for $k=2,4,6,8$. We obtain these results by incorporating fast rectangular matrix multiplication into various combinatorial algorithms that carefully balance distance computation on layers of sparse graphs preserving certain distance information.
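
    To unpack the stated running times (an elementary balancing argument; the numerical exponents are the ones quoted in the abstract from the cited rectangular matrix multiplication bounds): in a bound of the form $\tilde O(n^{2.5-r}+n^{\omega(r)})$, where $\omega(r)$ denotes the exponent of rectangular matrix multiplication with one dimension $n^{r}$ (as in the cited [Le Gall, Urrutia] bounds), the parameter $r$ is chosen so that the two terms balance,

        n^{2.5-r} = n^{\omega(r)}   \iff   2.5 - r = \omega(r),

    and evaluating $\omega(r)$ at the balancing point gives the quoted $O(n^{2.032})$ for unweighted graphs; the same balancing applied to $\tilde O(n^{3-r}+n^{\omega(r)})$ gives the quoted $O(n^{2.214})$ for weighted graphs.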

    Efficient Data Structures for Text Processing Applications

    This thesis is devoted to designing and analyzing efficient text indexing data structures and associated algorithms for processing text data. The general problem is to preprocess a given text or a collection of texts into a space-efficient index that quickly answers various queries on this data. Basic queries, such as counting or reporting a given pattern's occurrences as substrings of the original text, are useful in modeling critical bioinformatics applications. This line of research has witnessed many breakthroughs, such as suffix trees, suffix arrays, the FM-index, etc. In this work, we revisit the following problems:
    1. The Heaviest Induced Ancestors problem
    2. The Range Longest Common Prefix problem
    3. The Range Shortest Unique Substrings problem
    4. The Non-Overlapping Indexing problem
    For the first problem, we present two new space-time trade-offs that improve the space, the query time, or both of the existing solutions by roughly a logarithmic factor. For the second problem, our solution takes linear space, which improves the previous result by a logarithmic factor. The techniques developed are then extended to obtain an efficient solution for our third problem, which is newly formulated. Finally, we present a new framework that yields efficient solutions for the last problem in both the cache-aware and cache-oblivious models.
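
    As an illustration of the second problem above, here is a naive version of the Range LCP query as it is typically formulated: given a text and a range of suffix starting positions, report the longest common prefix over all pairs of suffixes starting in that range. The quadratic-time code below only makes the query concrete; the thesis's contribution is a linear-space structure answering such queries efficiently.

        # Naive Range LCP: longest common prefix over all pairs of suffixes
        # of T that start inside the range [i, j].  Quadratic per query.

        def lcp(T, a, b):
            n, k = len(T), 0
            while a + k < n and b + k < n and T[a + k] == T[b + k]:
                k += 1
            return k

        def range_lcp(T, i, j):
            best = 0
            for a in range(i, j + 1):
                for b in range(a + 1, j + 1):
                    best = max(best, lcp(T, a, b))
            return best

        # Example: in "banana", the suffixes starting at positions 1 and 3
        # ("anana" and "ana") share the prefix "ana", so the answer is 3.
        assert range_lcp("banana", 1, 5) == 3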

    Eighth Biennial Report: April 2005 – March 2007
