
257 research outputs found

Approximate range searching (A preliminary version of this paper appeared in the Proc. of the 11th Annual ACM Symp. on Computational Geometry, 1995, pp. 172–181.)

Author: Arya Sunil
Mount David M.
Publication venue: Elsevier Science B.V.
Publication date: 31/12/2000

Abstract: The range searching problem is a fundamental problem in computational geometry, with numerous important applications. Most research has focused on solving this problem exactly, but lower bounds show that if linear space is assumed, the problem cannot be solved in polylogarithmic time, except for the case of orthogonal ranges. In this paper we show that if one is willing to allow approximate ranges, then it is possible to do much better. In particular, given a bounded range Q of diameter w and ε > 0, an approximate range query treats the range as a fuzzy object, meaning that points lying within distance εw of the boundary of Q either may or may not be counted. We show that in any fixed dimension d, a set of n points in R^d can be preprocessed in O(n log n) time and O(n) space, such that approximate queries can be answered in O(log n + (1/ε)^d) time. The only assumption we make about ranges is that the intersection of a range and a d-dimensional cube can be answered in constant time (depending on dimension). For convex ranges, we tighten this to O(log n + (1/ε)^(d−1)) time. We also present a lower bound for approximate range searching based on partition trees of Ω(log n + (1/ε)^(d−1)), which implies optimality for convex ranges (assuming fixed dimensions). Finally, we give empirical evidence showing that allowing small relative errors can significantly improve query execution times.
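As a rough illustration of the fuzzy-range semantics described in this abstract, the sketch below counts points in a ball-shaped range using a plain kd-tree with bounding boxes. It is not the data structure from the paper and carries none of its complexity guarantees; the names `Node`, `min_dist`, `max_dist` and `approx_ball_count` are hypothetical, and the ball shape and leaf size of 4 are assumptions made for brevity.

```python
import math
import random

class Node:
    """kd-tree node storing a bounding box and the number of points below it."""
    def __init__(self, points):
        dims = len(points[0])
        self.count = len(points)
        self.lo = [min(p[i] for p in points) for i in range(dims)]
        self.hi = [max(p[i] for p in points) for i in range(dims)]
        self.points = None      # set only at leaves
        self.children = []
        if len(points) <= 4:    # assumed leaf size
            self.points = points
        else:
            # Split at the median along the widest axis of the bounding box.
            axis = max(range(dims), key=lambda i: self.hi[i] - self.lo[i])
            pts = sorted(points, key=lambda p: p[axis])
            mid = len(pts) // 2
            self.children = [Node(pts[:mid]), Node(pts[mid:])]

def min_dist(node, c):
    """Smallest distance from c to the node's bounding box."""
    return math.sqrt(sum(max(lo - x, 0.0, x - hi) ** 2
                         for lo, hi, x in zip(node.lo, node.hi, c)))

def max_dist(node, c):
    """Largest distance from c to the node's bounding box."""
    return math.sqrt(sum(max(x - lo, hi - x) ** 2
                         for lo, hi, x in zip(node.lo, node.hi, c)))

def approx_ball_count(node, center, radius, eps):
    """Count points in the ball, treating a band of width eps*w as fuzzy."""
    w = 2.0 * radius                       # diameter of the range
    r_in = max(radius - eps * w, 0.0)      # inner range: must be counted
    r_out = radius + eps * w               # outer range: may be counted
    if min_dist(node, center) > r_in:      # box misses the inner range
        return 0
    if max_dist(node, center) <= r_out:    # box lies inside the outer range
        return node.count
    if node.points is not None:            # leaf: test each point exactly
        return sum(1 for p in node.points if math.dist(p, center) <= radius)
    return sum(approx_ball_count(ch, center, radius, eps) for ch in node.children)
```

Whole subtrees are accepted or rejected as soon as their box falls on one side of the fuzzy band, which is where the savings from a larger ε would come from. Any returned count lies between the number of points in the inner ball and the number in the outer ball.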

Elsevier - Publisher Connector

2-Dimensional String Problems: Data Structures and Quantum Algorithms

Author: Patel Dhrumilkumar
Publication venue: LSU Digital Commons
Publication date: 26/07/2022

The field of stringology studies algorithms and data structures used for processing strings efficiently. The goal of this thesis is to investigate 2-dimensional (2D) variants of some fundamental string problems, including Exact Pattern Matching and Longest Common Substring. In the 2D pattern matching problem, we are given a matrix $M[1..n, 1..n]$ that consists of $N = n \times n$ symbols drawn from an alphabet $\Sigma$ of size $\sigma$. The query consists of an $m \times m$ square matrix $P[1..m, 1..m]$ drawn from the same alphabet, and the task is to find all the locations of $P$ in $M$. For such square patterns, data structures such as suffix trees and suffix arrays exist for the task of efficient pattern matching. However, a suffix tree occupies $O(N \log N)$ bits, which is significantly more than the original text's size of $N \log \sigma$ bits. Therefore, the design of compressed data structures that support pattern matching queries efficiently and occupy space close to the original text's size is imperative. In this thesis, we show an interesting result by designing a compact text index of size $O(N \log\log N + N \log\sigma)$ bits that at least supports efficient inverse suffix array queries. Although the question of designing a compressed text index that would lead to efficient pattern matching remains elusive, this index gives hope for the existence of a full 2D compressed text index with functionality similar to that of the 1D case. On the other hand, the Longest Common 2D Substring problem takes two 2D strings (matrices) as input, and the task is to report the size of the longest common 2D substring (submatrix) of these 2D strings. It is interesting to know whether there exists a sublinear-time algorithm for solving this task. We answer this question positively by presenting a sublinear-time quantum algorithm. In addition, we prove that any quantum algorithm requires at least $\tilde{\Omega}(N^{2/3})$ time to solve this problem.
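To make the 2D pattern matching problem statement concrete, here is a minimal brute-force sketch. It is only a baseline for the problem definition, not the compact index described in the thesis; the function name `find_2d_occurrences`, the representation of matrices as lists of equal-length strings, and the use of 0-based positions are all assumptions.

```python
def find_2d_occurrences(text, pattern):
    """Return all 0-based (row, col) positions where the m x m pattern
    occurs in the n x n text. Both are lists of equal-length strings."""
    n, m = len(text), len(pattern)
    hits = []
    for i in range(n - m + 1):
        for j in range(n - m + 1):
            # Compare the pattern row by row against the text window at (i, j).
            if all(text[i + r][j:j + m] == pattern[r] for r in range(m)):
                hits.append((i, j))
    return hits

text = ["abab",
        "baba",
        "abab",
        "baba"]
pattern = ["ab",
           "ba"]
print(find_2d_occurrences(text, pattern))
# prints [(0, 0), (0, 2), (1, 1), (2, 0), (2, 2)]
```

This scan compares up to $m^2$ symbols at each of roughly $N$ window positions, which is the cost an index structure is meant to avoid.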

Louisiana State University

Elastic-Degenerate String Matching with 1 Error

Author: Bernardini Giulia
Gabory Estéban
Pissis Solon P.
Stougie Leen
Sweering Michelle
Zuba Wiktor
Publication date: 01/01/2022

An elastic-degenerate (ED) string is a sequence of $n$ finite sets of strings of total length $N$, introduced to represent a set of related DNA sequences, also known as a pangenome. The ED string matching (EDSM) problem consists in reporting all occurrences of a pattern of length $m$ in an ED text. This problem has recently received some attention from the combinatorial pattern matching community, culminating in an $\tilde{\mathcal{O}}(nm^{\omega-1})+\mathcal{O}(N)$-time algorithm [Bernardini et al., SIAM J. Comput. 2022], where $\omega$ denotes the matrix multiplication exponent and the $\tilde{\mathcal{O}}(\cdot)$ notation suppresses polylog factors. In the $k$-EDSM problem, the approximate version of EDSM, we are asked to report all pattern occurrences with at most $k$ errors. $k$-EDSM can be solved in $\mathcal{O}(k^2mG+kN)$ time under edit distance, or $\mathcal{O}(kmG+kN)$ time under Hamming distance, where $G$ denotes the total number of strings in the ED text [Bernardini et al., Theor. Comput. Sci. 2020]. Unfortunately, $G$ is only bounded by $N$, and so even for $k=1$, the existing algorithms run in $\Omega(mN)$ time in the worst case. In this paper we show that $1$-EDSM can be solved in $\mathcal{O}((nm^2 + N)\log m)$ or $\mathcal{O}(nm^3 + N)$ time under edit distance. For the decision version, we present a faster $\mathcal{O}(nm^2\sqrt{\log m} + N\log\log m)$-time algorithm. We also show that $1$-EDSM can be solved in $\mathcal{O}(nm^2 + N\log m)$ time under Hamming distance. Our algorithms for edit distance rely on non-trivial reductions from $1$-EDSM to special instances of classic computational geometry problems (2d rectangle stabbing or 2d range emptiness), which we show how to solve efficiently. In order to obtain an even faster algorithm for Hamming distance, we rely on employing and adapting the $k$-errata trees for indexing with errors [Cole et al., STOC 2004]. Comment: This is an extended version of a paper accepted at LATIN 202
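A small sketch may help fix the problem definition. The code below decides, by exhaustive expansion, whether a pattern occurs with at most $k$ mismatches (Hamming distance) in some string spelled by an ED text. It takes exponential time and is unrelated to the algorithms of the paper; the names `ed_expansions` and `occurs_with_mismatches`, and the representation of an ED text as a list of lists of strings, are assumptions made for illustration.

```python
from itertools import product

def ed_expansions(ed_text):
    """Yield every plain string obtained by picking one string per set."""
    for choice in product(*ed_text):
        yield "".join(choice)

def occurs_with_mismatches(pattern, ed_text, k=1):
    """Decision version: does the pattern match a substring of some
    expansion of the ED text with at most k mismatches?"""
    m = len(pattern)
    for s in ed_expansions(ed_text):
        for i in range(len(s) - m + 1):
            mismatches = sum(a != b for a, b in zip(pattern, s[i:i + m]))
            if mismatches <= k:
                return True
    return False

# Hypothetical ED text spelling AGTT, AT, CGTT and CT.
ed = [["A", "C"], ["GT", ""], ["T"]]
print(occurs_with_mismatches("AGAT", ed, k=1))  # prints True
print(occurs_with_mismatches("AGAT", ed, k=0))  # prints False
```

The number of expansions grows as the product of the set sizes, which is why the bounds quoted above are stated in terms of $n$, $m$, $N$ and $G$ rather than the number of spelled strings.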

arXiv.org e-Print Archive

Archivio istituzionale della ricerca - Università di Trieste

VU Research Portal

CWI's Institutional Repository

INRIA a CCSD electronic archive server

Stabbing Planes

Author: Beame Paul
Fleming Noah
Impagliazzo Russell
Kolokolova Antonina
Pankratov Denis
Pitassi Toniann
Robere Robert
Publication venue: LIPIcs - Leibniz International Proceedings in Informatics. 9th Innovations in Theoretical Computer Science Conference (ITCS 2018)
Publication date: 01/01/2018

We introduce and develop a new semi-algebraic proof system, called Stabbing Planes, that is in the style of DPLL-based modern SAT solvers. As with DPLL, there is only one rule: the current polytope can be subdivided by branching on an inequality and its "integer negation." That is, we can (nondeterministically) choose a hyperplane a x >= b with integer coefficients, which partitions the polytope into three pieces: the points in the polytope satisfying a x >= b, the points satisfying a x <= b-1, and the middle slab b-1 < a x < b. Since the middle slab contains no integer points, it can be safely discarded, and the algorithm proceeds recursively on the other two branches. Each path terminates when the current polytope is empty, which is polynomial-time checkable. Among our results, we show somewhat surprisingly that Stabbing Planes can efficiently simulate Cutting Planes, and moreover, is strictly stronger than Cutting Planes under a reasonable conjecture. We prove linear lower bounds on the rank of Stabbing Planes refutations by adapting a lifting argument in communication complexity.
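The branching rule can be written out as a worked equation. The symbols $P$ for the current polytope and $n$ for the number of variables are notation assumed here, not taken from the abstract.

```latex
% Branching on a x >= b with integer a and b splits P into three pieces:
P \;=\; \bigl(P \cap \{x : a \cdot x \ge b\}\bigr)
   \;\cup\; \bigl(P \cap \{x : a \cdot x \le b-1\}\bigr)
   \;\cup\; \bigl(P \cap \{x : b-1 < a \cdot x < b\}\bigr),
\qquad a \in \mathbb{Z}^n,\; b \in \mathbb{Z}.

% For any integer point x, the value a . x is an integer, and no integer
% lies strictly between b-1 and b, so the third piece has no integer points:
x \in \mathbb{Z}^n \;\Longrightarrow\; a \cdot x \in \mathbb{Z}
\;\Longrightarrow\; \neg\,(b-1 < a \cdot x < b).
```

This is the reason the middle slab can be dropped without losing any integer solution.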

Dagstuhl Research Online Publication Server

Efficient Data Structures for Text Processing Applications

Author: Abedin Paniz
Publication venue: 'Information Bulletin on Variable Stars (IBVS)'
Publication date: 01/12/2021

This thesis is devoted to designing and analyzing efficient text indexing data structures and associated algorithms for processing text data. The general problem is to preprocess a given text or a collection of texts into a space-efficient index to quickly answer various queries on this data. Basic queries such as counting or reporting a given pattern's occurrences as substrings of the original text are useful in modeling critical bioinformatics applications. This line of research has witnessed many breakthroughs, such as suffix trees, suffix arrays, and the FM-index. In this work, we revisit the following problems: (1) the Heaviest Induced Ancestors problem, (2) the Range Longest Common Prefix problem, (3) the Range Shortest Unique Substrings problem, and (4) the Non-Overlapping Indexing problem. For the first problem, we present two new space-time trade-offs that improve the space, query time, or both of the existing solutions by roughly a logarithmic factor. For the second problem, our solution takes linear space, which improves the previous result by a logarithmic factor. The techniques developed are then extended to obtain an efficient solution for our third problem, which is newly formulated. Finally, we present a new framework that yields efficient solutions for the last problem in both cache-aware and cache-oblivious models.

University of Central Florida (UCF): STARS (Showcase of Text, Archives, Research & Scholarship)

29th International Symposium on Algorithms and Computation: ISAAC 2018, December 16-19, 2018, Jiaoxi, Yilan, Taiwan

Author: ISAAC <29. 2018, Jiaoxi, Yilan>
Publication venue: Schloss Dagstuhl - Leibniz-Zentrum für Informatik GmbH, Dagstuhl Publishing
Publication date: 01/12/2018

Digitale Bibliothek Thüringen

Hierarchical Categories in Colored Searching

Author: Afshani Peyman
Killmann Rasmus
Larsen Kasper Green
Publication venue: LIPIcs - Leibniz International Proceedings in Informatics. 33rd International Symposium on Algorithms and Computation (ISAAC 2022)
Publication date: 01/01/2022

In colored range counting (CRC), the input is a set of points where each point is assigned a "color" (or a "category"), and the goal is to store them in a data structure such that the number of distinct categories inside a given query range can be counted efficiently. CRC has strong motivations, as it allows data structures to deal with categorical data. However, colors (i.e., the categories) in the CRC problem do not have any internal structure, whereas this is not the case for many datasets in practice, where hierarchical categories exist or where a single input belongs to multiple categories. Motivated by these, we consider variants of the problem where such structures can be represented. We define two variants of the problem called hierarchical range counting (HCC) and sub-category colored range counting (SCRC) and consider hierarchical structures that can either be a DAG or a tree. We show that the two problems on some special trees are in fact equivalent to other well-known problems in the literature. Based on these, we also give efficient data structures when the underlying hierarchy can be represented as a tree. We show a conditional lower bound for the general case when the existing hierarchy can be any DAG, through reductions from the orthogonal vectors problem.
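The basic CRC query can be stated in a few lines of code. The sketch below is a linear-scan baseline for an axis-aligned box range, given only to pin down what is being counted. It says nothing about the hierarchical variants or the data structures of the paper, and the function name `colored_range_count`, the box-shaped range, and the point representation are assumptions.

```python
def colored_range_count(points, lo, hi):
    """Count distinct colors among points inside the box [lo, hi].

    points: list of (coords, color) pairs, coords being a tuple of numbers.
    lo, hi: tuples giving the lower and upper corner of the query box.
    """
    colors = {
        color
        for coords, color in points
        if all(l <= x <= h for l, x, h in zip(lo, coords, hi))
    }
    return len(colors)

points = [((1, 1), "red"), ((2, 2), "red"), ((3, 1), "blue"), ((5, 5), "green")]
print(colored_range_count(points, (0, 0), (3, 3)))  # prints 2
```

Note that the two red points inside the box contribute a single color, which is what separates CRC from ordinary range counting.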

Dagstuhl Research Online Publication Server

Tight Bounds on the Maximum Number of Shortest Unique Substrings

Author: Bannai Hideo
Inenaga Shunsuke
Mieno Takuya
Takeda Masayuki
Publication venue: LIPIcs - Leibniz International Proceedings in Informatics. 28th Annual Symposium on Combinatorial Pattern Matching (CPM 2017)
Publication date: 01/01/2017

A substring Q of a string S is called a shortest unique substring (SUS) for interval [s,t] in S if Q occurs exactly once in S, this occurrence of Q contains interval [s,t], and every substring of S which contains interval [s,t] and is shorter than Q occurs at least twice in S. The SUS problem is, given a string S, to preprocess S so that for any subsequent query interval [s,t], all the SUSs for interval [s,t] can be answered quickly. When s = t, we call the SUSs for [s,t] point SUSs, and when s <= t, we call the SUSs for [s,t] interval SUSs. There exists an optimal O(n)-time preprocessing scheme which answers queries in optimal O(k) time for both point and interval SUSs, where n is the length of S and k is the number of outputs for a given query. In this paper, we reveal structural, combinatorial properties underlying the SUS problem: namely, we show that the number of intervals in S that correspond to point SUSs for all query positions in S is less than 1.5n, and show that this is a matching upper and lower bound. Also, we consider the maximum number of intervals in S that correspond to interval SUSs for all query intervals in S.
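The SUS definition above can be checked directly by brute force. The sketch below tries candidate lengths in increasing order and returns every unique substring of the first length that works. It is a definitional baseline, not the optimal O(n)-time scheme mentioned in the abstract; the function names are hypothetical, and positions are 0-based with inclusive ends, which differs from whatever indexing the paper uses.

```python
def count_occurrences(s, sub):
    """Number of (possibly overlapping) occurrences of sub in s."""
    return sum(1 for i in range(len(s) - len(sub) + 1) if s.startswith(sub, i))

def shortest_unique_substrings(s, lo, hi):
    """Return all SUSs for the interval [lo, hi] as (start, end) pairs."""
    n = len(s)
    for length in range(hi - lo + 1, n + 1):
        found = []
        # A window of this length covers [lo, hi] iff
        # hi - length + 1 <= start <= lo, and it must fit inside s.
        for start in range(max(0, hi - length + 1), min(lo, n - length) + 1):
            if count_occurrences(s, s[start:start + length]) == 1:
                found.append((start, start + length - 1))
        if found:
            return found
    return []

print(shortest_unique_substrings("banana", 2, 2))  # prints [(0, 2), (2, 4)]
print(shortest_unique_substrings("banana", 1, 3))  # prints [(0, 3), (1, 4)]
```

In the first query, the substrings "ban" and "nan" are both unique and cover position 2, while every shorter substring covering that position occurs twice.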

arXiv.org e-Print Archive

Dagstuhl Research Online Publication Server

On the Hausdorff and other cluster Voronoi diagrams

Author: Khramtcova Elena
Papadopoulou Evanthia
Publication date: 05/01/2017

The Voronoi diagram is a fundamental geometric structure that encodes proximity information. Given a set of geometric objects, called sites, their Voronoi diagram is a subdivision of the underlying space into maximal regions, such that all points within one region have the same nearest site. Problems in diverse application domains (such as VLSI CAD, robotics, facility location, etc.) demand various generalizations of this simple concept. While many generalized Voronoi diagrams have been well studied, many others still have unsettled questions. An example of the latter are cluster Voronoi diagrams, whose sites are sets (clusters) of objects rather than individual objects. In this dissertation we study certain cluster Voronoi diagrams from the perspective of their construction algorithms and algorithmic applications. Our main focus is the Hausdorff Voronoi diagram; we also study the farthest-segment Voronoi diagram, as well as certain special cases of the farthest-color Voronoi diagram. We establish a connection between cluster Voronoi diagrams and the stabbing circle problem for segments in the plane. Our results are as follows. (1) We investigate the randomized incremental construction of the Hausdorff Voronoi diagram. We consider separately the case of non-crossing clusters, when the combinatorial complexity of the diagram is O(n), where n is the total number of points in all clusters. For this case, we present two construction algorithms that require O(n log^2 n) expected time. For the general case of arbitrary clusters, we present an algorithm that requires O((m + n log n) log n) expected time and O(m + n log n) expected space, where m is a parameter reflecting the number of crossings between clusters' convex hulls. (2) We present an O(n) time algorithm to construct the farthest-segment Voronoi diagram of n segments, after the sequence of its faces at infinity is known. This augments the well-known linear-time framework for the Voronoi diagram of points in convex position with the ability to handle disconnected Voronoi regions. (3) We establish a connection between the cluster Voronoi diagrams (the Hausdorff and the farthest-color Voronoi diagram) and the stabbing circle problem. This implies a new method to solve the latter problem. Our method results in a near-optimal O(n log^2 n) time algorithm for a set of n parallel segments, and in an optimal O(n log n) time algorithm for a set of n segments satisfying some other special conditions. (4) We study the farthest-color Voronoi diagram in special cases considered by the stabbing circle problem. We prove an O(n) bound for its combinatorial complexity and present an O(n log n) time algorithm to construct it.

RERO DOC Digital Library