Search CORE

70 research outputs found

On the Hardness and Inapproximability of Recognizing Wheeler Graphs

Author: Gibney Daniel
Thankachan Sharma V.
Publication venue: LIPIcs - Leibniz International Proceedings in Informatics. 27th Annual European Symposium on Algorithms (ESA 2019)
Publication date: 01/01/2019
Field of study

In recent years several compressed indexes based on variants of the Burrows-Wheeler transformation have been introduced. Some of these are used to index structures far more complex than a single string, as was originally done with the FM-index [Ferragina and Manzini, J. ACM 2005]. As such, there has been an increasing effort to better understand under which conditions such an indexing scheme is possible. This has led to the introduction of Wheeler graphs [Gagie et al., Theor. Comput. Sci., 2017]. Gagie et al. showed that de Bruijn graphs, generalized compressed suffix arrays, and several other BWT related structures can be represented as Wheeler graphs, and that Wheeler graphs can be indexed in a way which is space efficient. Hence, being able to recognize whether a given graph is a Wheeler graph, or being able to approximate a given graph by a Wheeler graph, could have numerous applications in indexing. Here we resolve the open question of whether there exists an efficient algorithm for recognizing if a given graph is a Wheeler graph. We present: - The problem of recognizing whether a given graph G=(V,E) is a Wheeler graph is NP-complete for any edge label alphabet of size sigma >= 2, even when G is a DAG. This holds even on a restricted, subset of graphs called d-NFA\u27s for d >= 5. This is in contrast to recent results demonstrating the problem can be solved in polynomial time for d-NFA\u27s where d <= 2. We also show the recognition problem can be solved in linear time for sigma =1; - There exists an 2^{e log sigma + O(n + e)} time exact algorithm where n = |V| and e = |E|. This algorithm relies on graph isomorphism being computable in strictly sub-exponential time; - We define an optimization variant of the problem called Wheeler Graph Violation, abbreviated WGV, where the aim is to remove the minimum number of edges in order to obtain a Wheeler graph. We show WGV is APX-hard, even when G is a DAG, implying there exists a constant C >= 1 for which there is no C-approximation algorithm (unless P = NP). Also, conditioned on the Unique Games Conjecture, for all C >= 1, it is NP-hard to find a C-approximation; - We define the Wheeler Subgraph problem, abbreviated WS, where the aim is to find the largest subgraph which is a Wheeler Graph (the dual of the WGV). In contrast to WGV, we prove that the WS problem is in APX for sigma=O(1); The above findings suggest that most problems under this theme are computationally difficult. However, we identify a class of graphs for which the recognition problem is polynomial time solvable, raising the open question of which parameters determine this problem\u27s difficulty

arXiv.org e-Print Archive

Dagstuhl Research Online Publication Server

Finding an Optimal Alphabet Ordering for Lyndon Factorization Is Hard

Author: Gibney Daniel
Thankachan Sharma V.
Publication venue: LIPIcs - Leibniz International Proceedings in Informatics. 38th International Symposium on Theoretical Aspects of Computer Science (STACS 2021)
Publication date: 01/01/2021
Field of study

Dagstuhl Research Online Publication Server

Compressibility-Aware Quantum Algorithms on Strings

Author: Gibney Daniel
Thankachan Sharma V.
Publication venue
Publication date: 14/02/2023
Field of study

Sublinear time quantum algorithms have been established for many fundamental problems on strings. This work demonstrates that new, faster quantum algorithms can be designed when the string is highly compressible. We focus on two popular and theoretically significant compression algorithms -- the Lempel-Ziv77 algorithm (LZ77) and the Run-length-encoded Burrows-Wheeler Transform (RL-BWT), and obtain the results below. We first provide a quantum algorithm running in

\tilde{O}(\sqrt{zn})

time for finding the LZ77 factorization of an input string

T[1..n]

with

z

factors. Combined with multiple existing results, this yields an

\tilde{O}(\sqrt{rn})

time quantum algorithm for finding the RL-BWT encoding with

r

BWT runs. Note that

r = \tilde{\Theta}(z)

. We complement these results with lower bounds proving that our algorithms are optimal (up to polylog factors). Next, we study the problem of compressed indexing, where we provide a

\tilde{O}(\sqrt{rn})

time quantum algorithm for constructing a recently designed

\tilde{O}(r)

space structure with equivalent capabilities as the suffix tree. This data structure is then applied to numerous problems to obtain sublinear time quantum algorithms when the input is highly compressible. For example, we show that the longest common substring of two strings of total length

n

can be computed in

\tilde{O}(\sqrt{zn})

time, where

z

is the number of factors in the LZ77 factorization of their concatenation. This beats the best known

\tilde{O}(n^\frac{2}{3})

time quantum algorithm when

z

is sufficiently small

arXiv.org e-Print Archive

pBWT: Achieving succinct data structures for parameterized pattern matching and related problems

Author: Ganguly Arnab
Shah Rahul
Thankachan Sharma V.
Publication venue: LSU Digital Commons
Publication date: 01/01/2017
Field of study

The fields of succinct data structures and compressed text indexing have seen quite a bit of progress over the last two decades. An important achievement, primarily using techniques based on the Burrows-Wheeler Transform (BWT), was obtaining the full functionality of the suffix tree in the optimal number of bits. A crucial property that allows the use of BWT for designing compressed indexes is order-preserving suffix links. Specifically, the relative order between two suffixes in the subtree of an internal node is same as that of the suffixes obtained by truncating the furst character of the two suffixes. Unfortunately, in many variants of the text-indexing problem, for e.g., parameterized pattern matching, 2D pattern matching, and order-isomorphic pattern matching, this property does not hold. Consequently, the compressed indexes based on BWT do not directly apply. Furthermore, a compressed index for any of these variants has been elusive throughout the advancement of the field of succinct data structures. We achieve a positive breakthrough on one such problem, namely the Parameterized Pattern Matching problem. Let T be a text that contains n characters from an alphabet , which is the union of two disjoint sets: containing static characters (s-characters) and containing parameterized characters (p-characters). A pattern P (also over ) matches an equal-length substring S of T i the s-characters match exactly, and there exists a one-to-one function that renames the p-characters in S to that in P. The task is to find the starting positions (occurrences) of all such substrings S. Previous index [Baker, STOC 1993], known as Parameterized Suffix Tree, requires (n log n) bits of space, and can find all occ occurrences in time O(jPj log +occ), where = jj. We introduce an n log +O(n)-bit index with O(jPj log +occlog n log ) query time. At the core, lies a new BWT-like transform, which we call the Parame- terized Burrows-Wheeler Transform (pBWT). The techniques are extended to obtain a succinct index for the Parameterized Dictionary Matching problem of Idury and Schaer [CPM, 1994]

Crossref

Louisiana State University

University of Central Florida (UCF): STARS (Showcase of Text, Archives, Research & Scholarship)

Non-Overlapping Indexing - Cache Obliviously

Author: Abedin Paniz
Hooshmand Sahar
Thankachan Sharma V.
Publication venue: LIPIcs - Leibniz International Proceedings in Informatics. Annual Symposium on Combinatorial Pattern Matching (CPM 2018)
Publication date: 01/01/2018
Field of study

The non-overlapping indexing problem is defined as follows: pre-process a given text T[1,n] of length n into a data structure such that whenever a pattern P[1,p] comes as an input, we can efficiently report the largest set of non-overlapping occurrences of P in T. The best known solution is by Cohen and Porat [ISAAC, 2009]. Their index size is O(n) words and query time is optimal O(p+nocc), where nocc is the output size. We study this problem in the cache-oblivious model and present a new data structure of size O(n log n) words. It can answer queries in optimal O(p/(B)+log_B n+nocc/B) I/Os, where B is the block size

Dagstuhl Research Online Publication Server

University of Central Florida (UCF): STARS (Showcase of Text, Archives, Research & Scholarship)

Structural Pattern Matching - Succinctly

Author: Ganguly Arnab
Shah Rahul
Thankachan Sharma V.
Publication venue: LIPIcs - Leibniz International Proceedings in Informatics. 28th International Symposium on Algorithms and Computation (ISAAC 2017)
Publication date: 01/01/2017
Field of study

Let T be a text of length n containing characters from an alphabet Sigma, which is the union of two disjoint sets: Sigma_s containing static characters (s-characters) and Sigma_p containing parameterized characters (p-characters). Each character in Sigma_p has an associated complementary character from Sigma_p. A pattern P (also over Sigma) matches an equal-length substring

S

of T iff the s-characters match exactly, there exists a one-to-one function that renames the p-characters in S to the p-characters in P, and if a p-character x is renamed to another p-character y then the complement of x is renamed to the complement of y. The task is to find the starting positions (occurrences) of all such substrings S. Previous indexing solution [Shibuya, SWAT 2000], known as Structural Suffix Tree, requires Theta(nlog n) bits of space, and can find all occ occurrences in time O(|P|log sigma+ occ), where sigma = |Sigma|. In this paper, we present the first succinct index for this problem, which occupies n log sigma + O(n) bits and offers O(|P|logsigma+ occcdot log n logsigma) query time

Dagstuhl Research Online Publication Server

University of Central Florida (UCF): STARS (Showcase of Text, Archives, Research & Scholarship)