Tight Lower Bounds for Data-Dependent Locality-Sensitive Hashing
We prove a tight lower bound for the exponent $\rho$ for data-dependent
Locality-Sensitive Hashing schemes, recently used to design efficient solutions
for the $c$-approximate nearest neighbor search. In particular, our lower bound
matches the bound of $\rho \le \frac{1}{2c-1} + o(1)$ for the $\ell_1$ space,
obtained via the recent algorithm from [Andoni-Razenshteyn, STOC'15].
In recent years it emerged that data-dependent hashing is strictly superior
to the classical Locality-Sensitive Hashing, when the hash function is
data-independent. In the latter setting, the best exponent has already been
known: for the $\ell_1$ space, the tight bound is $\rho = 1/c$, with the upper
bound from [Indyk-Motwani, STOC'98] and the matching lower bound from
[O'Donnell-Wu-Zhou, ITCS'11].
We prove that, even if the hashing is data-dependent, it must hold that
$\rho \ge \frac{1}{2c-1} - o(1)$. To prove the result, we need to formalize the
exact notion of data-dependent hashing that also captures the complexity of the
hash functions (in addition to their collision properties). Without restricting
such complexity, we would allow for obviously infeasible solutions such as the
Voronoi diagram of a dataset. To preclude such solutions, we require our hash
functions to be succinct. This condition is satisfied by all the known
algorithmic results. Comment: 16 pages, no figures
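For orientation, the exponent discussed in this abstract can be illustrated on the simplest data-independent family. The sketch below assumes the standard textbook definition rho = ln(1/p1) / ln(1/p2), where p1 and p2 are the collision probabilities of near and far pairs, and uses bit sampling on d-bit Hamming vectors. The definition, the function name `bit_sampling_rho`, and its parameters are assumptions for illustration; none of them come from the abstract, and the code does not reproduce the data-dependent scheme or the lower-bound argument.

```python
import math


def bit_sampling_rho(d: int, r: float, c: float) -> float:
    """Exponent rho of bit-sampling LSH on d-bit Hamming vectors.

    Assumes one hash function samples a single uniformly random coordinate,
    so two points at Hamming distance t collide with probability 1 - t/d.
    Near pairs are at distance r, far pairs at distance c*r (with c*r < d).
    """
    if not (0 < r and c > 1 and c * r < d):
        raise ValueError("need 0 < r, c > 1 and c*r < d")
    p1 = 1 - r / d        # collision probability for a near pair
    p2 = 1 - c * r / d    # collision probability for a far pair
    return math.log(1 / p1) / math.log(1 / p2)


rho = bit_sampling_rho(d=1000, r=10, c=2.0)
print(f"rho = {rho:.4f}, 1/c = {1 / 2.0:.4f}")
```

By Bernoulli's inequality the value returned here never exceeds 1/c, which is the data-independent regime the abstract contrasts with the data-dependent one.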
PUFFINN: Parameterless and Universally Fast FInding of Nearest Neighbors
We present PUFFINN, a parameterless LSH-based index for solving the k-nearest neighbor problem with probabilistic guarantees. By parameterless we mean that the user is only required to specify the amount of memory the index is supposed to use and the result quality that should be achieved. The index combines several heuristic ideas known in the literature. By small adaptations to the query algorithm, we make the heuristics rigorous. We perform experiments on real-world and synthetic inputs to evaluate implementation choices and show that the implementation satisfies the quality guarantees while being competitive with other state-of-the-art approaches to nearest neighbor search. We describe a novel synthetic data set that is difficult to solve for almost all existing nearest neighbor search approaches, and for which PUFFINN significantly outperforms previous methods.
Edges and switches, tunnels and bridges
Edge casing is a well-known method to improve the readability of drawings of non-planar graphs. A cased drawing orders the edges of each edge crossing and interrupts the lower edge in an appropriate neighborhood of the crossing. Certain orders will lead to a more readable drawing than others. We formulate several optimization criteria that try to capture the concept of a "good" cased drawing. Further, we address the algorithmic question of how to turn a given drawing into an optimal cased drawing. For many of the resulting optimization problems, we either find polynomial-time algorithms or NP-hardness results.
Connecting the dots (with minimum crossings)
We study a prototype Crossing Minimization problem, defined as follows. Let F be an infinite family of (possibly vertex-labeled) graphs. Then, given a set P of (possibly labeled) n points in the Euclidean plane, a collection L ⊆ Lines(P) = {l : l is a line segment with both endpoints in P}, and a non-negative integer k, decide if there is a subcollection L' ⊆ L such that the graph G=(P,L') is isomorphic to a graph in F and L' has at most k crossings. By G=(P,L'), we refer to the graph on vertex set P, where two vertices are adjacent if and only if there is a line segment that connects them in L'. Intuitively, in Crossing Minimization, we have a set of locations of interest, and we want to build/draw/exhibit connections between them (where L indicates where it is feasible to have these connections) so that we obtain a structure in F. Natural choices for F are the collections of perfect matchings, Hamiltonian paths, and graphs that contain an (s,t)-path (a path whose endpoints are labeled). While the objective of seeking a solution with few crossings is of interest from a theoretical point of view, it is also well motivated by a wide range of practical considerations. For example, links/roads (such as highways) may be cheaper to build and faster to traverse, and signals/moving objects would collide/interrupt each other less often. Further, graphs with fewer crossings are preferred for graphical user interfaces. As a starting point for a systematic study, we consider a special case of Crossing Minimization. Already for this case, we obtain NP-hardness and W[1]-hardness results, and ETH-based lower bounds. Specifically, suppose that the input also contains a collection D of d non-crossing line segments such that each point in P belongs to exactly one line in D, and L does not contain line segments between points on the same line in D. Clearly, Crossing Minimization is the case where d=n; then P is in general position.
The case of d=2 is of interest not only because it is the most restricted non-trivial case, but also since it corresponds to a class of graphs that has been well studied: specifically, it is Crossing Minimization where G=(P,L) is a (bipartite) graph with a so-called two-layer drawing. For d=2, we consider three basic choices of F. For perfect matchings, we show (i) NP-hardness with an ETH-based lower bound, (ii) solvability in subexponential parameterized time, and (iii) existence of an O(k^2)-vertex kernel. Second, for Hamiltonian paths, we show (i) solvability in subexponential parameterized time, and (ii) existence of an O(k^2)-vertex kernel. Lastly, for graphs that contain an (s,t)-path, we show (i) NP-hardness and W[1]-hardness, and (ii) membership in XP.
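To make the d=2 (two-layer) setting concrete, the following minimal sketch counts the crossings of a chosen set of segments L'. It assumes the two lines of D are parallel and that each segment is given as a pair (position on the first line, position on the second line); under that assumption two segments cross exactly when their endpoints appear in opposite orders on the two lines. The function name `count_crossings` and the input encoding are hypothetical, and the code only evaluates a candidate solution; it is not one of the algorithms from the paper.

```python
from itertools import combinations


def count_crossings(edges):
    """Count pairwise crossings in a two-layer drawing.

    edges: list of (top_pos, bottom_pos) pairs, one per line segment, where
    top_pos and bottom_pos are the positions of its endpoints along the two
    parallel lines. Segments that share an endpoint are not counted as
    crossing.
    """
    crossings = 0
    for (a1, b1), (a2, b2) in combinations(edges, 2):
        # Opposite relative order on the two lines means the segments cross.
        if (a1 - a2) * (b1 - b2) < 0:
            crossings += 1
    return crossings


# A perfect matching on three points per line that reverses the order:
# every pair of segments crosses.
print(count_crossings([(0, 2), (1, 1), (2, 0)]))
```

A decision procedure for the problem would compare this count against k for each candidate L'; the quadratic loop is chosen for clarity, not speed.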
Breaching the 2-Approximation Barrier for Connectivity Augmentation: a Reduction to Steiner Tree
The basic goal of survivable network design is to build a cheap network that
maintains the connectivity between given sets of nodes despite the failure of a
few edges/nodes. The Connectivity Augmentation Problem (CAP) is arguably one of
the most basic problems in this area: given a $k$(-edge)-connected graph $G$
and a set of extra edges (links), select a minimum cardinality subset $A$ of
links such that adding $A$ to $G$ increases its edge connectivity to $k+1$.
Intuitively, one wants to make an existing network more reliable by augmenting
it with extra edges. The best known approximation factor for this NP-hard
problem is $2$, and this can be achieved with multiple approaches (the first
such result is in [Frederickson and JáJá '81]).
It is known [Dinitz et al.'76] that CAP can be reduced to the case $k=1$,
a.k.a. the Tree Augmentation Problem (TAP), for odd $k$, and to the case $k=2$,
a.k.a. the Cactus Augmentation Problem (CacAP), for even $k$. Several better
than $2$ approximation algorithms are known for TAP, culminating with a recent
approximation [Grandoni et al.'18]. However, for CacAP the best known
approximation is $2$.
In this paper we breach the $2$ approximation barrier for CacAP, hence for
CAP, by presenting a polynomial-time
approximation. Previous approaches exploit properties of TAP that do not seem
to generalize to CacAP. We instead use a reduction to the Steiner tree problem
which was previously used in parameterized algorithms [Basavaraju et al.'14].
This reduction is not approximation preserving, and using the current best
approximation factor for Steiner tree [Byrka et al.'13] as a black-box would
not be good enough to improve on $2$. To achieve the latter goal, we "open the
box" and exploit the specific properties of the instances of Steiner tree
arising from CacAP. Comment: Corrected a typo in the abstract (in metadata)
Algorithms for fat objects : decompositions and applications
Computational geometry is the branch of theoretical computer science that deals with algorithms and data structures for geometric objects. The most basic geometric objects include points, lines, polygons, and polyhedra. Computational geometry has applications in many areas of computer science, including computer graphics, robotics, and geographic information systems. In many computational-geometry problems, the theoretical worst case is achieved by input that is in some way "unrealistic". This causes situations where the theoretical running time is not a good predictor of the running time in practice. In addition, algorithms must also be designed with the worst-case examples in mind, which causes them to be needlessly complicated. In recent years, realistic input models have been proposed in an attempt to deal with this problem. The usual form such solutions take is to limit some geometric property of the input to a constant. We examine a specific realistic input model in this thesis: the model where objects are restricted to be fat. Intuitively, objects that are more like a ball are more fat, and objects that are more like a long pole are less fat. We look at fat objects in the context of five different problems—two related to decompositions of input objects and three problems suggested by computer graphics. Decompositions of geometric objects are important because they are often used as a preliminary step in other algorithms, since many algorithms can only handle geometric objects that are convex and preferably of low complexity. The two main issues in developing decomposition algorithms are to keep the number of pieces produced by the decomposition small and to compute the decomposition quickly. The main question we address is the following: is it possible to obtain better decompositions for fat objects than for general objects, and/or is it possible to obtain decompositions quickly? 
These questions are also interesting because most research into fat objects has concerned objects that are convex. We begin by triangulating fat polygons. The problem of triangulating polygons (that is, partitioning them into triangles without adding any vertices) has been solved already, but the only linear-time algorithm is so complicated that it has never been implemented. We propose two algorithms for triangulating fat polygons in linear time that are much simpler. They make use of the observation that a small set of guards placed at points inside a (certain type of) fat polygon is sufficient to see the boundary of such a polygon. We then look at decompositions of fat polyhedra in three dimensions. We show that polyhedra can be decomposed into a linear number of convex pieces if certain fatness restrictions are met. We also show that if these restrictions are not met, a quadratic number of pieces may be needed. We also show that if we wish the output to be fat and convex, the restrictions must be much tighter. We then study three computational-geometry problems inspired by computer graphics. First, we study ray-shooting amidst fat objects from two perspectives. This is the problem of preprocessing data into a data structure that can answer which object is first hit by a query ray in a given direction from a given point. We present a new data structure for answering vertical ray-shooting queries (that is, queries where the ray's direction is fixed) as well as a data structure for answering ray-shooting queries for rays with arbitrary direction. Both structures improve the best known results on these problems. Another problem that is studied in the field of computer graphics is the depth-order problem. We study it in the context of computational geometry.
This is the problem of finding an ordering of the objects in the scene from "top" to "bottom", where one object is above the other if they share a point in the projection to the xy-plane and the first object has a higher z-value at that point. We give an algorithm for finding the depth order of a group of fat objects and an algorithm for verifying if a depth order of a group of fat objects is correct. The latter algorithm is useful because the former can return an incorrect order if the objects do not have a depth order (this can happen if the above/below relationship has a cycle in it). The first algorithm improves on the results previously known for fat objects; the second is the first algorithm for verifying depth orders of fat objects. The final problem that we study is the hidden-surface removal problem. In this problem, we wish to find and report the visible portions of a scene from a given viewpoint—this is called the visibility map. The main difficulty in this problem is to find an algorithm whose running time depends in part on the complexity of the output. For example, if all but one of the objects in the input scene are hidden behind one large object, then our algorithm should have a faster running time than if all of the objects are visible and have borders that overlap. We give such an algorithm that improves on the running time of previous algorithms for fat objects. Furthermore, our algorithm is able to handle curved objects and situations where the objects do not have a depth order—two features missing from most other algorithms that perform hidden surface removal
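The combinatorial core of the depth-order problem described above can be sketched as a topological sort of the above/below relation, with cycle detection for scenes that have no depth order. This is a generic graph-based illustration under the assumption that the above/below pairs are already known; the function name `depth_order` and its inputs are hypothetical, and the thesis's algorithms, which work on the geometry of fat objects directly, are not reproduced here.

```python
from graphlib import CycleError, TopologicalSorter


def depth_order(objects, above_pairs):
    """Return the objects ordered from "top" to "bottom", or None on a cycle.

    objects:     iterable of hashable object identifiers.
    above_pairs: iterable of (above, below) pairs, meaning `above` must come
                 before `below` in any valid depth order.
    """
    sorter = TopologicalSorter()
    for obj in objects:
        sorter.add(obj)            # register objects with no constraints too
    for above, below in above_pairs:
        sorter.add(below, above)   # `above` is a predecessor of `below`
    try:
        return list(sorter.static_order())
    except CycleError:
        # The above/below relation is cyclic, so no depth order exists.
        return None


print(depth_order(["a", "b", "c"], [("a", "b"), ("b", "c")]))
print(depth_order(["a", "b", "c"], [("a", "b"), ("b", "c"), ("c", "a")]))
```

Returning None on a cycle mirrors the situation mentioned in the abstract where the above/below relationship has a cycle and no valid order can be reported.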
Fast 2-Approximate All-Pairs Shortest Paths
In this paper, we revisit the classic approximate All-Pairs Shortest Paths
(APSP) problem in undirected graphs. For unweighted graphs, we provide an
algorithm for -approximate APSP in time,
for any . This is time, using known bounds for
rectangular matrix multiplication [Le Gall, Urrutia, SODA
2018]. Our result improves on the bound of [Roditty, STOC
2023], and on the bound of [Baswana, Kavitha, SICOMP
2010] for graphs with edges.
For weighted graphs, we obtain -approximate APSP in time, for any . This is
time using known bounds for . It improves on the state of the art
bound of by [Kavitha, Algorithmica 2012]. Our techniques further
lead to improved bounds in a wide range of density for weighted graphs. In
particular, for the sparse regime we construct a distance oracle in time that supports -approximate queries in constant time. For
sparse graphs, the preprocessing time of the algorithm matches conditional
lower bounds [Patrascu, Roditty, Thorup, FOCS 2012; Abboud, Bringmann, Fischer,
STOC 2023]. To the best of our knowledge, this is the first 2-approximate
distance oracle that has subquadratic preprocessing time in sparse graphs.
We also obtain new bounds in the near additive regime for unweighted graphs.
We give faster algorithms for -approximate APSP, for
.
We obtain these results by incorporating fast rectangular matrix
multiplications into various combinatorial algorithms that carefully balance
out distance computation on layers of sparse graphs preserving certain distance
information.
Efficient Data Structures for Text Processing Applications
This thesis is devoted to designing and analyzing efficient text indexing data structures and associated algorithms for processing text data. The general problem is to preprocess a given text or a collection of texts into a space-efficient index to quickly answer various queries on this data. Basic queries such as counting/reporting a given pattern's occurrences as substrings of the original text are useful in modeling critical bioinformatics applications. This line of research has witnessed many breakthroughs, such as suffix trees, suffix arrays, and the FM-index. In this work, we revisit the following problems: (1) the Heaviest Induced Ancestors problem, (2) the Range Longest Common Prefix problem, (3) the Range Shortest Unique Substrings problem, and (4) the Non-Overlapping Indexing problem. For the first problem, we present two new space-time trade-offs that improve the space, query time, or both of the existing solutions by roughly a logarithmic factor. For the second problem, our solution takes linear space, which improves the previous result by a logarithmic factor. The techniques developed are then extended to obtain an efficient solution for our third problem, which is newly formulated. Finally, we present a new framework that yields efficient solutions for the last problem in both cache-aware and cache-oblivious models.
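The basic query mentioned in this abstract, counting a pattern's occurrences as substrings of the text, can be sketched with a plain suffix array and two binary searches. This is a minimal illustration, not one of the thesis's data structures: the construction below is the naive one (sorting suffixes directly, far from space- or time-efficient), and the names `build_suffix_array` and `count_occurrences` are hypothetical.

```python
def build_suffix_array(text: str) -> list[int]:
    """Naive suffix array: start positions of all suffixes in sorted order."""
    return sorted(range(len(text)), key=lambda i: text[i:])


def count_occurrences(text: str, sa: list[int], pattern: str) -> int:
    """Count occurrences of pattern in text using the suffix array sa.

    Suffixes that start with the pattern form one contiguous block in sa,
    so the answer is the width of that block.
    """
    m = len(pattern)

    def first_index(strictly_greater: bool) -> int:
        # First position whose length-m prefix is >= pattern, or > pattern
        # when strictly_greater is set.
        lo, hi = 0, len(sa)
        while lo < hi:
            mid = (lo + hi) // 2
            prefix = text[sa[mid]:sa[mid] + m]
            if prefix < pattern or (strictly_greater and prefix == pattern):
                lo = mid + 1
            else:
                hi = mid
        return lo

    return first_index(True) - first_index(False)


text = "banana"
sa = build_suffix_array(text)
print(count_occurrences(text, sa, "ana"))
```

Reporting the occurrences instead of counting them would amount to listing the suffix-array entries inside the same block.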