257 research outputs found

    Approximate range searching
    Note: A preliminary version of this paper appeared in the Proc. of the 11th Annual ACM Symp. on Computational Geometry, 1995, pp. 172–181.

    Abstract: The range searching problem is a fundamental problem in computational geometry, with numerous important applications. Most research has focused on solving this problem exactly, but lower bounds show that if linear space is assumed, the problem cannot be solved in polylogarithmic time, except for the case of orthogonal ranges. In this paper we show that if one is willing to allow approximate ranges, then it is possible to do much better. In particular, given a bounded range $Q$ of diameter $w$ and $\varepsilon > 0$, an approximate range query treats the range as a fuzzy object, meaning that points lying within distance $\varepsilon w$ of the boundary of $Q$ either may or may not be counted. We show that in any fixed dimension $d$, a set of $n$ points in $\mathbb{R}^d$ can be preprocessed in $O(n \log n)$ time and $O(n)$ space, such that approximate queries can be answered in $O(\log n + (1/\varepsilon)^d)$ time. The only assumption we make about ranges is that the intersection of a range and a $d$-dimensional cube can be answered in constant time (depending on dimension). For convex ranges, we tighten this to $O(\log n + (1/\varepsilon)^{d-1})$ time. We also present a lower bound for approximate range searching based on partition trees of $\Omega(\log n + (1/\varepsilon)^{d-1})$, which implies optimality for convex ranges (assuming fixed dimensions). Finally, we give empirical evidence showing that allowing small relative errors can significantly improve query execution times.
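The fuzzy-range definition in this abstract can be illustrated with a brute-force sketch. It is not the paper's data structure: it only computes the interval of counts that a correct approximate query may return, under the assumption (ours, for illustration) that the range is a Euclidean ball. The function name `approximate_count_bounds` is hypothetical.

```python
import math
from typing import Sequence, Tuple

Point = Sequence[float]

def approximate_count_bounds(points: Sequence[Point], center: Point,
                             radius: float, eps: float) -> Tuple[int, int]:
    """Return (lowest, highest) count a valid approximate query may report.

    Assumption: the range Q is a ball, so its diameter is w = 2 * radius.
    Points within distance eps * w of the boundary may or may not be counted.
    """
    slack = eps * 2.0 * radius
    must = fuzzy = 0
    for p in points:
        gap = math.dist(p, center) - radius  # signed distance to the boundary
        if abs(gap) <= slack:
            fuzzy += 1   # inside the fuzzy band: counting is optional
        elif gap < 0:
            must += 1    # well inside Q: must be counted
        # otherwise: well outside Q, must not be counted
    return must, must + fuzzy

pts = [(0.5, 0.0), (0.9, 0.0), (1.1, 0.0), (1.5, 0.0)]
bounds = approximate_count_bounds(pts, (0.0, 0.0), 1.0, 0.1)
```

Any answer between the two returned values is acceptable under the definition; an exact query corresponds to `eps = 0`.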

    2-Dimensional String Problems: Data Structures and Quantum Algorithms

    The field of stringology studies algorithms and data structures used for processing strings efficiently. The goal of this thesis is to investigate 2-dimensional (2D) variants of some fundamental string problems, including Exact Pattern Matching and Longest Common Substring. In the 2D pattern matching problem, we are given a matrix $M[1..n, 1..n]$ that consists of $N = n \times n$ symbols drawn from an alphabet $\Sigma$ of size $\sigma$. The query consists of an $m \times m$ square matrix $P[1..m, 1..m]$ drawn from the same alphabet, and the task is to find all the locations of $P$ in $M$. For such square patterns, data structures such as suffix trees and suffix arrays exist for the task of efficient pattern matching. However, a suffix tree occupies $O(N \log N)$ bits, which is significantly more than the original text's size of $N \log \sigma$ bits. Therefore, the design of compressed data structures that support pattern matching queries efficiently and occupy space close to the original text's size is imperative. In this thesis, we show an interesting result by designing a compact text index of size $O(N \log\log N + N \log\sigma)$ bits that at least supports efficient inverse suffix array queries. Although the question of designing a compressed text index that would lead to efficient pattern matching remains elusive, this index gives hope for the existence of a full 2D compressed text index with all functionalities similar to those of the 1D case. On the other hand, the Longest Common 2D Substring problem takes two 2D strings (matrices) as input, and the task is to report the size of the longest common 2D substring (submatrix) of these 2D strings. It is interesting to know whether there exists a sublinear-time algorithm for solving this task. We answer this question positively by presenting a sublinear-time quantum algorithm. In addition to this, we prove that any quantum algorithm requires at least $\tilde{\Omega}(N^{2/3})$ time to solve this problem.
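To make the 2D pattern matching problem statement concrete, here is a naive baseline sketch. It is not the thesis's index: it simply checks every candidate position in $O(N m^2)$ time. The function name `find_2d_occurrences` is hypothetical, matrices are assumed to be given as lists of equal-length strings, and positions are 0-indexed (the abstract uses 1-indexing).

```python
from typing import List, Sequence, Tuple

def find_2d_occurrences(text: Sequence[str], pattern: Sequence[str]) -> List[Tuple[int, int]]:
    """Return the top-left corners (row, col) of all occurrences of the
    m x m `pattern` inside the n x n `text`, by brute force."""
    n, m = len(text), len(pattern)
    hits = []
    for i in range(n - m + 1):
        for j in range(n - m + 1):
            # Compare the pattern row by row against the window at (i, j).
            if all(text[i + r][j:j + m] == pattern[r] for r in range(m)):
                hits.append((i, j))
    return hits

M = ["abab",
     "baba",
     "abab",
     "baba"]
P = ["ab",
     "ba"]
occurrences = find_2d_occurrences(M, P)
```

A compressed index would aim to answer the same query without scanning the whole text.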

    Elastic-Degenerate String Matching with 1 Error

    An elastic-degenerate string is a sequence of $n$ finite sets of strings of total length $N$, introduced to represent a set of related DNA sequences, also known as a pangenome. The ED string matching (EDSM) problem consists in reporting all occurrences of a pattern of length $m$ in an ED text. This problem has recently received some attention by the combinatorial pattern matching community, culminating in an $\tilde{\mathcal{O}}(nm^{\omega-1})+\mathcal{O}(N)$-time algorithm [Bernardini et al., SIAM J. Comput. 2022], where $\omega$ denotes the matrix multiplication exponent and the $\tilde{\mathcal{O}}(\cdot)$ notation suppresses polylog factors. In the $k$-EDSM problem, the approximate version of EDSM, we are asked to report all pattern occurrences with at most $k$ errors. $k$-EDSM can be solved in $\mathcal{O}(k^2mG+kN)$ time, under edit distance, or $\mathcal{O}(kmG+kN)$ time, under Hamming distance, where $G$ denotes the total number of strings in the ED text [Bernardini et al., Theor. Comput. Sci. 2020]. Unfortunately, $G$ is only bounded by $N$, and so even for $k=1$, the existing algorithms run in $\Omega(mN)$ time in the worst case. In this paper we show that $1$-EDSM can be solved in $\mathcal{O}((nm^2 + N)\log m)$ or $\mathcal{O}(nm^3 + N)$ time under edit distance. For the decision version, we present a faster $\mathcal{O}(nm^2\sqrt{\log m} + N\log\log m)$-time algorithm. We also show that $1$-EDSM can be solved in $\mathcal{O}(nm^2 + N\log m)$ time under Hamming distance. Our algorithms for edit distance rely on non-trivial reductions from $1$-EDSM to special instances of classic computational geometry problems (2d rectangle stabbing or 2d range emptiness), which we show how to solve efficiently. In order to obtain an even faster algorithm for Hamming distance, we rely on employing and adapting the $k$-errata trees for indexing with errors [Cole et al., STOC 2004]. Comment: This is an extended version of a paper accepted at LATIN 202

    Stabbing Planes

    We introduce and develop a new semi-algebraic proof system, called Stabbing Planes, that is in the style of DPLL-based modern SAT solvers. As with DPLL, there is only one rule: the current polytope can be subdivided by branching on an inequality and its "integer negation." That is, we can (nondeterministically) choose a hyperplane a x >= b with integer coefficients, which partitions the polytope into three pieces: the points in the polytope satisfying a x >= b, the points satisfying a x <= b-1, and the middle slab b-1 < a x < b. Since the middle slab contains no integer points, it can be safely discarded, and the algorithm proceeds recursively on the other two branches. Each path terminates when the current polytope is empty, which is polynomial-time checkable. Among our results, we show, somewhat surprisingly, that Stabbing Planes can efficiently simulate Cutting Planes and, moreover, is strictly stronger than Cutting Planes under a reasonable conjecture. We prove linear lower bounds on the rank of Stabbing Planes refutations by adapting a lifting argument in communication complexity.
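The branching rule described above can be sketched in a few lines. This is only an illustration of why the middle slab is safe to discard, not an implementation of the proof system; the function name `stabbing_branch` is hypothetical, and we assume (as the "b-1" in the abstract suggests) that b is an integer along with the coefficients a.

```python
from itertools import product
from typing import Sequence

def stabbing_branch(a: Sequence[int], b: int, x: Sequence[float]) -> str:
    """Classify point x against the query a.x >= b with integer a and b."""
    s = sum(ai * xi for ai, xi in zip(a, x))
    if s >= b:
        return "upper"   # branch a.x >= b
    if s <= b - 1:
        return "lower"   # branch a.x <= b - 1 (the "integer negation")
    return "slab"        # b - 1 < a.x < b: discarded by the rule

a, b = (2, -3), 1
# For integer points a.x is an integer, so it can never fall strictly
# between b - 1 and b: no integer point is lost when the slab is dropped.
integer_branches = {stabbing_branch(a, b, p) for p in product(range(-3, 4), repeat=2)}
fractional_branch = stabbing_branch((1, 1), 1, (0.5, 0.0))
```

Only fractional points, such as the last example, can land in the slab.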

    Efficient Data Structures for Text Processing Applications

    This thesis is devoted to designing and analyzing efficient text indexing data structures and associated algorithms for processing text data. The general problem is to preprocess a given text or a collection of texts into a space-efficient index to quickly answer various queries on this data. Basic queries such as counting/reporting a given pattern's occurrences as substrings of the original text are useful in modeling critical bioinformatics applications. This line of research has witnessed many breakthroughs, such as suffix trees, suffix arrays, and the FM-index. In this work, we revisit the following problems: (1) the Heaviest Induced Ancestors problem, (2) the Range Longest Common Prefix problem, (3) the Range Shortest Unique Substrings problem, and (4) the Non-Overlapping Indexing problem. For the first problem, we present two new space-time trade-offs that improve the space, query time, or both of the existing solutions by roughly a logarithmic factor. For the second problem, our solution takes linear space, which improves the previous result by a logarithmic factor. The techniques developed are then extended to obtain an efficient solution for our third problem, which is newly formulated. Finally, we present a new framework that yields efficient solutions for the last problem in both cache-aware and cache-oblivious models.

    29th International Symposium on Algorithms and Computation: ISAAC 2018, December 16-19, 2018, Jiaoxi, Yilan, Taiwan


    Hierarchical Categories in Colored Searching

    In colored range counting (CRC), the input is a set of points where each point is assigned a "color" (or a "category") and the goal is to store them in a data structure such that the number of distinct categories inside a given query range can be counted efficiently. CRC is strongly motivated, as it allows data structures to deal with categorical data. However, colors (i.e., the categories) in the CRC problem do not have any internal structure, whereas this is not the case for many datasets in practice, where hierarchical categories exist or where a single input belongs to multiple categories. Motivated by these, we consider variants of the problem where such structures can be represented. We define two variants of the problem called hierarchical range counting (HCC) and sub-category colored range counting (SCRC) and consider hierarchical structures that can either be a DAG or a tree. We show that the two problems on some special trees are in fact equivalent to other well-known problems in the literature. Based on these, we also give efficient data structures when the underlying hierarchy can be represented as a tree. We show a conditional lower bound for the general case when the existing hierarchy can be any DAG, through reductions from the orthogonal vectors problem.
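As a reference point for the basic (non-hierarchical) CRC query defined above, here is a brute-force sketch that scans every point. The abstract does not fix the range type, so an axis-aligned box is assumed here; the function name `colored_range_count` and the `(coords, color)` input layout are also illustrative assumptions.

```python
from typing import Hashable, Sequence, Tuple

ColoredPoint = Tuple[Sequence[float], Hashable]

def colored_range_count(points: Sequence[ColoredPoint],
                        lo: Sequence[float], hi: Sequence[float]) -> int:
    """Count distinct colors among points inside the box [lo, hi]
    (assumed axis-aligned and closed), in O(n) time per query."""
    colors = {
        color
        for coords, color in points
        if all(l <= c <= h for l, c, h in zip(lo, coords, hi))
    }
    return len(colors)

data = [((1, 1), "red"), ((2, 2), "red"), ((3, 1), "blue"), ((5, 5), "green")]
distinct = colored_range_count(data, (0, 0), (3, 3))
```

Three points fall in the box but only two colors are reported, which is what distinguishes CRC from ordinary range counting.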

    Tight Bounds on the Maximum Number of Shortest Unique Substrings

    A substring Q of a string S is called a shortest unique substring (SUS) for interval [s,t] in S if Q occurs exactly once in S, this occurrence of Q contains interval [s,t], and every substring of S which contains interval [s,t] and is shorter than Q occurs at least twice in S. The SUS problem is, given a string S, to preprocess S so that for any subsequent query interval [s,t] all the SUSs for interval [s,t] can be answered quickly. When s = t, we call the SUSs for [s,t] point SUSs, and when s <= t, we call the SUSs for [s,t] interval SUSs. There exists an optimal O(n)-time preprocessing scheme which answers queries in optimal O(k) time for both point and interval SUSs, where n is the length of S and k is the number of outputs for a given query. In this paper, we reveal structural, combinatorial properties underlying the SUS problem: namely, we show that the number of intervals in S that correspond to point SUSs for all query positions in S is less than 1.5n, and show that this is a matching upper and lower bound. We also consider the maximum number of intervals in S that correspond to interval SUSs for all query intervals in S.
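The SUS definition above translates directly into a brute-force search, sketched below. It is far from the optimal O(n)-preprocessing scheme mentioned in the abstract and is meant only to pin down the definition. The function names are hypothetical, and positions are 0-indexed and inclusive.

```python
from typing import List, Tuple

def count_occurrences(text: str, q: str) -> int:
    """Count (possibly overlapping) occurrences of q in text."""
    return sum(1 for k in range(len(text) - len(q) + 1) if text.startswith(q, k))

def shortest_unique_substrings(text: str, s: int, t: int) -> List[Tuple[int, int]]:
    """Return all intervals (i, j) such that text[i..j] is a SUS for [s, t].

    Tries lengths in increasing order, so the first length that yields a
    unique substring covering [s, t] is the shortest one.
    """
    n = len(text)
    for length in range(t - s + 1, n + 1):
        found = []
        # i must satisfy i <= s, i + length - 1 >= t, and stay inside text.
        for i in range(max(0, t - length + 1), min(s, n - length) + 1):
            j = i + length - 1
            if count_occurrences(text, text[i:j + 1]) == 1:
                found.append((i, j))
        if found:
            return found
    return []

point_sus = shortest_unique_substrings("abab", 1, 1)
interval_sus = shortest_unique_substrings("abab", 0, 1)
```

For "abab", position 1 is covered by the unique substring "ba", while the interval [0, 1] needs the longer "aba" because "ab" occurs twice.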

    On the Hausdorff and other cluster Voronoi diagrams

    The Voronoi diagram is a fundamental geometric structure that encodes proximity information. Given a set of geometric objects, called sites, their Voronoi diagram is a subdivision of the underlying space into maximal regions, such that all points within one region have the same nearest site. Problems in diverse application domains (such as VLSI CAD, robotics, facility location, etc.) demand various generalizations of this simple concept. While many generalized Voronoi diagrams have been well studied, many others still have unsettled questions. An example of the latter is cluster Voronoi diagrams, whose sites are sets (clusters) of objects rather than individual objects. In this dissertation we study certain cluster Voronoi diagrams from the perspective of their construction algorithms and algorithmic applications. Our main focus is the Hausdorff Voronoi diagram; we also study the farthest-segment Voronoi diagram, as well as certain special cases of the farthest-color Voronoi diagram. We establish a connection between cluster Voronoi diagrams and the stabbing circle problem for segments in the plane. Our results are as follows. (1) We investigate the randomized incremental construction of the Hausdorff Voronoi diagram. We consider separately the case of non-crossing clusters, when the combinatorial complexity of the diagram is $O(n)$, where $n$ is the total number of points in all clusters. For this case, we present two construction algorithms that require $O(n \log^2 n)$ expected time. For the general case of arbitrary clusters, we present an algorithm that requires $O((m + n \log n) \log n)$ expected time and $O(m + n \log n)$ expected space, where $m$ is a parameter reflecting the number of crossings between clusters' convex hulls. (2) We present an $O(n)$ time algorithm to construct the farthest-segment Voronoi diagram of $n$ segments, after the sequence of its faces at infinity is known. This augments the well-known linear-time framework for the Voronoi diagram of points in convex position with the ability to handle disconnected Voronoi regions. (3) We establish a connection between the cluster Voronoi diagrams (the Hausdorff and the farthest-color Voronoi diagram) and the stabbing circle problem. This implies a new method to solve the latter problem. Our method results in a near-optimal $O(n \log^2 n)$ time algorithm for a set of $n$ parallel segments, and in an optimal $O(n \log n)$ time algorithm for a set of $n$ segments satisfying some other special conditions. (4) We study the farthest-color Voronoi diagram in special cases considered by the stabbing circle problem. We prove an $O(n)$ bound for its combinatorial complexity and present an $O(n \log n)$ time algorithm to construct it.