2,094 research outputs found

Subsequence Automata with Default Transitions

Author: Philip Bille, Inge Li Gørtz, Frederik Rye Skjoldjensen
Publication date: 01/01/2016

Let $S$ be a string of length $n$ with characters from an alphabet of size $\sigma$. The \emph{subsequence automaton} of $S$ (often called the \emph{directed acyclic subsequence graph}) is the minimal deterministic finite automaton accepting all subsequences of $S$. A straightforward construction shows that the size (number of states and transitions) of the subsequence automaton is $O(n\sigma)$ and that this bound is asymptotically optimal. In this paper, we consider subsequence automata with \emph{default transitions}, that is, special transitions to be taken only if none of the regular transitions match the current character, and which do not consume the current character. We show that with default transitions, much smaller subsequence automata are possible, and provide a full trade-off between the size of the automaton and the \emph{delay}, i.e., the maximum number of consecutive default transitions followed before consuming a character. Specifically, given any integer parameter $k$, $1 < k \leq \sigma$, we present a subsequence automaton with default transitions of size $O(nk\log_{k}\sigma)$ and delay $O(\log_k \sigma)$. Hence, with $k = 2$ we obtain an automaton of size $O(n \log \sigma)$ and delay $O(\log \sigma)$. On the other extreme, with $k = \sigma$, we obtain an automaton of size $O(n \sigma)$ and delay $O(1)$, thus matching the bound for the standard subsequence automaton construction. Finally, we generalize the result to multiple strings. The key component of our result is a novel hierarchical automata construction of independent interest.

Comment: Corrected typo
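As a rough illustration of the straightforward $O(n\sigma)$ construction mentioned in this abstract (not the default-transition automaton the paper introduces), the sketch below builds a next-occurrence table for a string. The function names `build_subsequence_automaton` and `accepts` are hypothetical, and states are assumed to be the integers 0..n, where state i means that the first i characters have been passed.

```python
def build_subsequence_automaton(s):
    """Return nxt, where nxt[i][c] is the state reached from state i on c.

    Assumption for this sketch: state i means "positions 0..i-1 of s are
    behind us". Every state is accepting; a missing key means rejection.
    """
    n = len(s)
    nxt = [dict() for _ in range(n + 1)]
    nearest = {}  # nearest[c] = state just past the closest c to the right
    for i in range(n - 1, -1, -1):
        nearest[s[i]] = i + 1
        nxt[i] = dict(nearest)  # copying costs O(sigma) per state
    return nxt


def accepts(nxt, p):
    """Check whether p is a subsequence of the string used to build nxt."""
    state = 0
    for c in p:
        if c not in nxt[state]:
            return False
        state = nxt[state][c]
    return True


if __name__ == "__main__":
    table = build_subsequence_automaton("abcab")
    print(accepts(table, "acb"), accepts(table, "cc"))
```

Each state stores up to $\sigma$ outgoing transitions, which is where the $O(n\sigma)$ size comes from; the default transitions described in the abstract are a way to avoid storing all of them explicitly.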

arXiv.org e-Print Archive

Online Research Database In Technology

k-Universality of Regular Languages

Author: D. Adamson, P. Fleischmann, A. Huch, T. Koß, F. Manea, D. Nowotka
Publication date: 28/11/2023

A subsequence of a word w is a word u such that u = w[i_1]w[i_2]...w[i_k], for some set of indices 1 ≤ i_1 < i_2 < ··· < i_k ≤ |w|. A word w is k-subsequence universal over an alphabet Σ if every word in Σ^k appears in w as a subsequence. In this paper, we study the intersection between the set of k-subsequence universal words over some alphabet Σ and regular languages over Σ. We call a regular language L k-∃-subsequence universal if there exists a k-subsequence universal word in L, and k-∀-subsequence universal if every word of L is k-subsequence universal. We give algorithms solving the problems of deciding if a given regular language, represented by a finite automaton recognising it, is k-∃-subsequence universal and, respectively, if it is k-∀-subsequence universal, for a given k. The algorithms are FPT w.r.t. the size of the input alphabet, and their run-time does not depend on k; they run in polynomial time in the number n of states of the input automaton when the size of the input alphabet is O(log n). Moreover, we show that the problem of deciding if a given regular language is k-∃-subsequence universal is NP-complete, when the language is over a large alphabet. Further, we provide algorithms for counting the number of k-subsequence universal words (paths) accepted by a given deterministic (respectively, nondeterministic) finite automaton, and ranking an input word (path) within the set of k-subsequence universal words accepted by a given finite automaton.
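To make the definition of k-subsequence universality concrete for a single word (not for a regular language, which is what the paper's algorithms handle), here is a minimal sketch based on the well-known greedy "arch" decomposition: scan the word and close an arch each time every letter of Σ has been seen. A word is k-subsequence universal exactly when at least k arches can be closed. The function names are hypothetical.

```python
def universality_index(w, alphabet):
    """Count the complete arches of w over the given alphabet.

    An arch is a shortest block, read left to right, that contains every
    letter of the alphabet at least once.
    """
    sigma = set(alphabet)
    if not sigma:
        raise ValueError("alphabet must be non-empty")
    seen = set()
    arches = 0
    for c in w:
        if c in sigma:
            seen.add(c)
        if len(seen) == len(sigma):
            arches += 1
            seen.clear()  # start collecting letters for the next arch
    return arches


def is_k_universal(w, alphabet, k):
    """True if every word of length k over the alphabet is a subsequence of w."""
    return universality_index(w, alphabet) >= k


if __name__ == "__main__":
    print(universality_index("abcabc", "abc"))
    print(is_k_universal("aabb", "ab", 2))
```

For instance, "aabb" closes only one arch over {a, b}, and indeed "ba" does not occur in it as a subsequence.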

University of Liverpool Repository

Compact Recognizers of Episode Sequences

Publication date: 01/01/1997

Mikhail J. Atallah, Purdue University

Given two strings $T = a_1 \ldots a_n$ and $P = b_1 \ldots b_m$ over an alphabet $\Sigma$, the problem of testing whether $P$ occurs as a subsequence of $T$ is trivially solved in linear time. It is also known that a simple $O(n \log |\Sigma|)$ time preprocessing of $T$ makes it easy to decide subsequently, for any $P$ and in at most $|P| \log |\Sigma|$ character comparisons, whether $P$ is a subsequence of $T$. These problems become more complicated if one asks instead whether $P$ occurs as a subsequence of some substring $Y$ of $T$ of bounded length. This paper presents an automaton built on the text string $T$ and capable of identifying all distinct minimal substrings $Y$ of $T$ having $P$ as a subsequence. By a substring $Y$ being minimal with respect to $P$, it is meant that $P$ is not a subsequence of any proper substring of $Y$. For every minimal substring $Y$, the automaton recognizes the occurrence of $P$ having the lexicographically smallest sequence of symbol positions in $Y$. It is not difficult to realize such an automaton in time and space $O(n^2)$ for a text of $n$ characters. One result of this paper consists of bringing those bounds down to linear or $O(n \log n)$, respectively, depending on whether the alphabet is bounded or of arbitrary size, thereby matching the respective complexities of off-line exact string searching. Having built the automaton, the search for all lexicographically earliest occurrences of $P$ in $T$ is carried out in time $O(n + \sum_{i=1}^{m} rocc_i \cdot i \cdot \log n \cdot \log |\Sigma|)$, where $rocc_i$ is the number of distinct minimal substrings of $T$ having $b_1 \ldots b_i$ as a subsequence. All log factors appearing in the above bounds can be further reduced to $\log \log$ by resort to known integer-handling data structures.

Index Terms: algorithms, pattern matching, subsequence and episode searching, DAWG, suffix automaton, compact subsequence automaton, skip-edge DAWG, forward failure function, skip-link
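The two notions used in this abstract, the linear-time subsequence test and minimal substrings containing the pattern as a subsequence, can be sketched as follows. This is a naive quadratic-time illustration and not the automaton the paper describes; the function names are hypothetical and windows are reported as 0-based inclusive index pairs.

```python
def is_subsequence(p, t):
    """Linear-time two-pointer test: does p occur as a subsequence of t?"""
    j = 0
    for c in t:
        if j < len(p) and c == p[j]:
            j += 1
    return j == len(p)


def minimal_windows(t, p):
    """Return all minimal substrings of t (as (start, end) pairs) that
    contain p as a subsequence. Naive sketch: greedy forward scan from each
    candidate start, then greedy backward scan to tighten the left end.
    """
    m = len(p)
    if m == 0:
        return []
    windows = set()
    for start in range(len(t)):
        if t[start] != p[0]:
            continue
        # Forward: earliest position where p is fully matched from start.
        j, end = 0, -1
        for pos in range(start, len(t)):
            if t[pos] == p[j]:
                j += 1
                if j == m:
                    end = pos
                    break
        if end < 0:
            break  # no match from here, so none from any later start
        # Backward: latest start of a match that ends at `end`.
        j, left = m - 1, start
        for pos in range(end, start - 1, -1):
            if t[pos] == p[j]:
                j -= 1
                if j < 0:
                    left = pos
                    break
        windows.add((left, end))
    return sorted(windows)


if __name__ == "__main__":
    print(minimal_windows("abcab", "ab"))
```

A window found this way cannot be shortened on either side: the forward pass fixes the earliest possible end, and the backward pass fixes the latest possible start for that end.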

CiteSeerX

Fast and Compact Regular Expression Matching

Author: Philip Bille, Martin Farach-Colton
Publication date: 01/01/2008

We study four problems in string matching, namely, regular expression matching, approximate regular expression matching, string edit distance, and subsequence indexing, on a standard word RAM model of computation that allows logarithmic-sized words to be manipulated in constant time. We show how to improve the space and/or remove a dependency on the alphabet size for each problem, using either an improved tabulation technique of an existing algorithm or a new combination of known algorithms.

arXiv.org e-Print Archive

CiteSeerX

Elsevier - Publisher Connector

The IT University of Copenhagen's Repository

The separation problem for regular languages by piecewise testable languages

Author: L. van Rooijen, M. Zeitoun
Publication date: 08/03/2013

Separation is a classical problem in mathematics and computer science. It asks whether, given two sets belonging to some class, it is possible to separate them by another set of a smaller class. We present and discuss the separation problem for regular languages. We then give a direct polynomial time algorithm to check whether two given regular languages are separable by a piecewise testable language, that is, whether a $\mathcal{B}\Sigma_1(<)$ sentence can witness that the languages are indeed disjoint. The proof is a reformulation and a refinement of an algebraic argument already given by Almeida and the second author.

arXiv.org e-Print Archive

CiteSeerX

Completeness Results for Parameterized Space Classes

Author: C.M.R. Kintala, J. Flum, J. Hartmanis, L. Cai, M. Elberfeld, M. Fellows, N.D. Jones, S. Buss, S. Guillemot
Publication date: 01/01/2013

The parameterized complexity of a problem is considered "settled" once it has been shown to lie in FPT or to be complete for a class in the W-hierarchy or a similar parameterized hierarchy. Several natural parameterized problems have, however, resisted such a classification. At least in some cases, the reason is that upper and lower bounds for their parameterized space complexity have recently been obtained that rule out completeness results for parameterized time classes. In this paper, we make progress in this direction by proving that the associative generability problem and the longest common subsequence problem are complete for parameterized space classes. These classes are defined in terms of different forms of bounded nondeterminism and in terms of simultaneous time--space bounds. As a technical tool we introduce a "union operation" that translates between problems complete for classical complexity classes and for W-classes.

Comment: IPEC 201

arXiv.org e-Print Archive

Crossref

Order preserving pattern matching on trees and DAGs

Author: A Amir, I Simon, J Kim, K Park, M Dubiner, M Kubica, P Bose, RA Baeza-Yates, S Cho, S Faro, T Chhabra
Publication date: 25/07/2017

The order preserving pattern matching (OPPM) problem is, given a pattern string $p$ and a text string $t$, to find all substrings of $t$ which have the same relative orders as $p$. In this paper, we consider two variants of the OPPM problem where a set of text strings is given as a tree or a DAG. We show that the OPPM problem for a single pattern