2,094 research outputs found

    Subsequence Automata with Default Transitions

    Let $S$ be a string of length $n$ with characters from an alphabet of size $\sigma$. The \emph{subsequence automaton} of $S$ (often called the \emph{directed acyclic subsequence graph}) is the minimal deterministic finite automaton accepting all subsequences of $S$. A straightforward construction shows that the size (number of states and transitions) of the subsequence automaton is $O(n\sigma)$ and that this bound is asymptotically optimal. In this paper, we consider subsequence automata with \emph{default transitions}, that is, special transitions to be taken only if none of the regular transitions match the current character, and which do not consume the current character. We show that with default transitions, much smaller subsequence automata are possible, and provide a full trade-off between the size of the automaton and the \emph{delay}, i.e., the maximum number of consecutive default transitions followed before consuming a character. Specifically, given any integer parameter $k$, $1 < k \leq \sigma$, we present a subsequence automaton with default transitions of size $O(nk\log_{k}\sigma)$ and delay $O(\log_k \sigma)$. Hence, with $k = 2$ we obtain an automaton of size $O(n \log \sigma)$ and delay $O(\log \sigma)$. On the other extreme, with $k = \sigma$, we obtain an automaton of size $O(n \sigma)$ and delay $O(1)$, thus matching the bound for the standard subsequence automaton construction. Finally, we generalize the result to multiple strings. The key component of our result is a novel hierarchical automata construction of independent interest. Comment: Corrected typo
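
    As a rough illustration of the standard $O(n\sigma)$ bound mentioned in this abstract, the sketch below builds a plain next-occurrence table for a string. It is not the default-transition construction of the paper, and it makes no attempt to minimize the automaton. The function names `build_subsequence_automaton` and `accepts` are hypothetical and chosen only for this example.

```python
def build_subsequence_automaton(s):
    """Return a table nxt where nxt[i][c] is the state reached from state i on c.

    State i means "the first i characters of s have been passed over".
    A missing key means character c no longer occurs to the right of state i.
    Every state is accepting, so the table has at most (n + 1) * sigma entries.
    """
    n = len(s)
    nxt = [dict() for _ in range(n + 1)]
    # Fill the table from right to left: state i copies state i + 1,
    # then overrides the entry for s[i], which is now the nearest occurrence.
    for i in range(n - 1, -1, -1):
        nxt[i] = dict(nxt[i + 1])
        nxt[i][s[i]] = i + 1
    return nxt


def accepts(nxt, p):
    """Check whether p is a subsequence by following one transition per character."""
    state = 0
    for c in p:
        if c not in nxt[state]:
            return False
        state = nxt[state][c]
    return True
```

    Each query character costs one table lookup, which corresponds to delay $O(1)$ in the terminology of the abstract, at the price of storing up to $\sigma$ transitions per state.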

    k-Universality of Regular Languages

    A subsequence of a word w is a word u such that u = w[i_1]w[i_2]...w[i_k], for some set of indices 1 ≤ i_1 < i_2 < ... < i_k ≤ |w|. A word w is k-subsequence universal over an alphabet Σ if every word in Σ^k appears in w as a subsequence. In this paper, we study the intersection between the set of k-subsequence universal words over some alphabet Σ and regular languages over Σ. We call a regular language L k-∃-subsequence universal if there exists a k-subsequence universal word in L, and k-∀-subsequence universal if every word of L is k-subsequence universal. We give algorithms solving the problems of deciding if a given regular language, represented by a finite automaton recognising it, is k-∃-subsequence universal and, respectively, if it is k-∀-subsequence universal, for a given k. The algorithms are FPT w.r.t. the size of the input alphabet, and their run-time does not depend on k; they run in polynomial time in the number n of states of the input automaton when the size of the input alphabet is O(log n). Moreover, we show that the problem of deciding if a given regular language is k-∃-subsequence universal is NP-complete, when the language is over a large alphabet. Further, we provide algorithms for counting the number of k-subsequence universal words (paths) accepted by a given deterministic (respectively, nondeterministic) finite automaton, and ranking an input word (path) within the set of k-subsequence universal words accepted by a given finite automaton.
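
    To make the definition of k-subsequence universality concrete, here is a small sketch that tests a single word (not a regular language, which is what the abstract actually addresses). It uses the well-known greedy arch factorization: scan the word and close an "arch" each time every letter of the alphabet has been seen; the word is k-subsequence universal exactly when at least k arches are completed. The function names are hypothetical.

```python
def universality_index(w, alphabet):
    """Return the largest k such that w is k-subsequence universal over alphabet.

    Greedy arch factorization: an arch ends as soon as every letter of the
    alphabet has appeared since the previous arch ended.
    """
    sigma = set(alphabet)
    if not sigma:
        raise ValueError("alphabet must be non-empty")
    seen = set()
    arches = 0
    for c in w:
        if c in sigma:
            seen.add(c)
            if len(seen) == len(sigma):
                arches += 1
                seen.clear()
    return arches


def is_k_universal(w, alphabet, k):
    """Check whether every word of length k over alphabet is a subsequence of w."""
    return universality_index(w, alphabet) >= k
```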

    Compact Recognizers of Episode Sequences

    Mikhail J. Atallah, Purdue University. Given two strings T = a_1 ... a_n and P = b_1 ... b_m over an alphabet Σ, the problem of testing whether P occurs as a subsequence of T is trivially solved in linear time. It is also known that a simple O(n log |Σ|) time preprocessing of T makes it easy to decide subsequently, for any P and in at most |P| log |Σ| character comparisons, whether P is a subsequence of T. These problems become more complicated if one asks instead whether P occurs as a subsequence of some substring Y of T of bounded length. This paper presents an automaton built on the text string T and capable of identifying all distinct minimal substrings Y of T having P as a subsequence. By a substring Y being minimal with respect to P, it is meant that P is not a subsequence of any proper substring of Y. For every minimal substring Y, the automaton recognizes the occurrence of P having the lexicographically smallest sequence of symbol positions in Y. It is not difficult to realize such an automaton in time and space O(n^2) for a text of n characters. One result of this paper consists of bringing those bounds down to linear or O(n log n), respectively, depending on whether the alphabet is bounded or of arbitrary size, thereby matching the respective complexities of off-line exact string searching. Having built the automaton, the search for all lexicographically earliest occurrences of P in T is carried out in time O(n + \sum_{i=1}^{m} rocc_i · i · log n · log |Σ|), where rocc_i is the number of distinct minimal substrings of T having b_1 ... b_i as a subsequence. All log factors appearing in the above bounds can be further reduced to log log by resort to known integer-handling data structures. Index Terms: algorithms, pattern matching, subsequence and episode searching, DAWG, suffix automaton, compact subsequence automaton, skip-edge DAWG, forward failure function, skip-link.
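
    The two baseline tasks named at the start of this abstract can be sketched as follows. The first function is the linear-time scan. The second is a swapped-in, simpler technique than the one the abstract refers to: it stores sorted position lists per character and binary-searches them, which gives O(|P| log n) per query rather than the |P| log |Σ| comparisons quoted above. Neither function is the episode automaton of the paper, and all names are hypothetical.

```python
from bisect import bisect_right
from collections import defaultdict


def is_subsequence(p, t):
    """Linear-time scan: advance through p whenever the next character of t matches."""
    j = 0
    for c in t:
        if j < len(p) and c == p[j]:
            j += 1
    return j == len(p)


def build_position_index(t):
    """Preprocess t into sorted lists of positions, one list per character."""
    index = defaultdict(list)
    for i, c in enumerate(t):
        index[c].append(i)
    return dict(index)


def is_subsequence_indexed(p, index):
    """Answer a query with one binary search per character of p."""
    pos = -1  # position in t of the last matched character
    for c in p:
        occ = index.get(c, [])
        k = bisect_right(occ, pos)  # first occurrence strictly after pos
        if k == len(occ):
            return False
        pos = occ[k]
    return True
```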

    Fast and Compact Regular Expression Matching

    We study four problems in string matching, namely, regular expression matching, approximate regular expression matching, string edit distance, and subsequence indexing, on a standard word RAM model of computation that allows logarithmic-sized words to be manipulated in constant time. We show how to improve the space and/or remove a dependency on the alphabet size for each problem, using either an improved tabulation technique of an existing algorithm or a new combination of known algorithms.

    The separation problem for regular languages by piecewise testable languages

    Separation is a classical problem in mathematics and computer science. It asks whether, given two sets belonging to some class, it is possible to separate them by another set of a smaller class. We present and discuss the separation problem for regular languages. We then give a direct polynomial time algorithm to check whether two given regular languages are separable by a piecewise testable language, that is, whether a $B\Sigma_1(<)$ sentence can witness that the languages are indeed disjoint. The proof is a reformulation and a refinement of an algebraic argument already given by Almeida and the second author.

    Completeness Results for Parameterized Space Classes

    The parameterized complexity of a problem is considered "settled" once it has been shown to lie in FPT or to be complete for a class in the W-hierarchy or a similar parameterized hierarchy. Several natural parameterized problems have, however, resisted such a classification. At least in some cases, the reason is that upper and lower bounds for their parameterized space complexity have recently been obtained that rule out completeness results for parameterized time classes. In this paper, we make progress in this direction by proving that the associative generability problem and the longest common subsequence problem are complete for parameterized space classes. These classes are defined in terms of different forms of bounded nondeterminism and in terms of simultaneous time-space bounds. As a technical tool we introduce a "union operation" that translates between problems complete for classical complexity classes and for W-classes. Comment: IPEC 201

    Order preserving pattern matching on trees and DAGs

    The order preserving pattern matching (OPPM) problem is, given a pattern string $p$ and a text string $t$, to find all substrings of $t$ which have the same relative orders as $p$. In this paper, we consider two variants of the OPPM problem where a set of text strings is given as a tree or a DAG. We show that the OPPM problem for a single pattern $p$ of length $m$ and a text tree $T$ of size $N$ can be solved in $O(m+N)$ time if the characters of $p$ are drawn from an integer alphabet of polynomial size. The time complexity becomes $O(m \log m + N)$ if the pattern $p$ is over a general ordered alphabet. We then show that the OPPM problem for a single pattern and a text DAG is NP-complete.
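
    For readers unfamiliar with OPPM, the sketch below illustrates the problem statement on plain strings only. It is a naive window-by-window check that re-ranks each window, costing roughly O(nm log m); it is not the tree or DAG algorithm from the abstract. The helper names `rank_pattern` and `oppm_naive` are hypothetical.

```python
def rank_pattern(seq):
    """Replace each value by its rank among the distinct values of seq.

    Two sequences have the same relative order exactly when their
    rank patterns are equal.
    """
    ranks = {v: r for r, v in enumerate(sorted(set(seq)))}
    return tuple(ranks[v] for v in seq)


def oppm_naive(p, t):
    """Return all start positions i such that t[i:i+len(p)] is order-isomorphic to p."""
    m = len(p)
    target = rank_pattern(p)
    return [
        i
        for i in range(len(t) - m + 1)
        if rank_pattern(t[i:i + m]) == target
    ]
```

    For example, with the pattern [1, 3, 2] ("low, high, middle"), the windows [10, 20, 15] and [15, 40, 30] of the text [10, 20, 15, 40, 30, 5] both match.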