    Boundedness in languages of infinite words

    We define a new class of languages of $\omega$-words, strictly extending $\omega$-regular languages. One way to present this new class is by a type of regular expressions. The new expressions are an extension of $\omega$-regular expressions where two new variants of the Kleene star $L^*$ are added: $L^B$ and $L^S$. These new exponents are used to say that parts of the input word have bounded size, and that parts of the input can have arbitrarily large sizes, respectively. For instance, the expression $(a^Bb)^\omega$ represents the language of infinite words over the letters $a,b$ where there is a common bound on the number of consecutive letters $a$. The expression $(a^Sb)^\omega$ represents a similar language, but this time the distance between consecutive $b$'s is required to tend toward the infinite. We develop a theory for these languages, with a focus on decidability and closure. We define an equivalent automaton model, extending Büchi automata. The main technical result is a complementation lemma that works for languages where only one type of exponent, either $L^B$ or $L^S$, is used. We use the closure and decidability results to obtain partial decidability results for the logic MSOLB, a logic obtained by extending monadic second-order logic with new quantifiers that speak about the size of sets.

    The practical efficiency of regular expression membership algorithms

    1 online resource (71 pages): graphs, charts. Includes abstract. Includes bibliographical references (69-71). Regular expressions encode text patterns and define languages of symbolic words. The membership problem decides if a given word is an element of the language described by a given regular expression. This problem has various well-studied algorithms, but current research only shows asymptotic complexity and performance with respect to samples of randomly generated regular expressions. Our research aims to answer how the algorithms perform on practical, real-world regular expressions with a representative test set of words. A set of compatible regular expressions has been collected from public GitHub repositories. Each compatible expression (i.e., no backreferences or improper formatting) is then converted into an equivalent unambiguous mathematical representation. For each distinct expression, we have tested Thompson, Glushkov, position, follow, and partial derivative NFA constructions, as well as partial derivatives and exponential backtracking directly on the regular expression tree. These algorithms have been implemented in a modified version of Python's FAdo library and include UNIX-inspired extensions such as character classes, the wild dot, and UTF-8 support. We find that efficiently constructing a small NFA is the best approach to this problem; the follow and PDDAG algorithms are experimentally shown as the best
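
To make "partial derivatives directly on the regular expression tree" concrete, here is a minimal sketch of membership testing with Antimirov-style partial derivatives. It is not the thesis's code and does not use FAdo; the class names (`Empty`, `Eps`, `Sym`, `Alt`, `Cat`, `Star`) and the functions `nullable`, `pderiv`, and `matches` are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass

# Regular expression tree. Frozen dataclasses are hashable, so derivatives
# can be collected in sets (all names here are illustrative assumptions).
@dataclass(frozen=True)
class Empty:          # the empty language
    pass

@dataclass(frozen=True)
class Eps:            # the language containing only the empty word
    pass

@dataclass(frozen=True)
class Sym:
    char: str

@dataclass(frozen=True)
class Alt:
    left: object
    right: object

@dataclass(frozen=True)
class Cat:
    left: object
    right: object

@dataclass(frozen=True)
class Star:
    inner: object

def nullable(r):
    """True if the language of r contains the empty word."""
    if isinstance(r, (Eps, Star)):
        return True
    if isinstance(r, (Empty, Sym)):
        return False
    if isinstance(r, Alt):
        return nullable(r.left) or nullable(r.right)
    return nullable(r.left) and nullable(r.right)  # Cat

def cat(r, s):
    """Concatenation with the simplification Eps . s = s."""
    return s if isinstance(r, Eps) else Cat(r, s)

def pderiv(r, a):
    """Set of partial derivatives of r with respect to the letter a."""
    if isinstance(r, (Empty, Eps)):
        return set()
    if isinstance(r, Sym):
        return {Eps()} if r.char == a else set()
    if isinstance(r, Alt):
        return pderiv(r.left, a) | pderiv(r.right, a)
    if isinstance(r, Cat):
        out = {cat(d, r.right) for d in pderiv(r.left, a)}
        if nullable(r.left):
            out |= pderiv(r.right, a)
        return out
    return {cat(d, r) for d in pderiv(r.inner, a)}  # Star

def matches(r, word):
    """Track the set of reachable partial derivatives, like an NFA state set."""
    states = {r}
    for a in word:
        states = {d for s in states for d in pderiv(s, a)}
        if not states:
            return False
    return any(nullable(s) for s in states)

# (a|b)*abb
ab_star_abb = Cat(Star(Alt(Sym("a"), Sym("b"))),
                  Cat(Sym("a"), Cat(Sym("b"), Sym("b"))))
```

With this sketch, `matches(ab_star_abb, "aabb")` accepts and `matches(ab_star_abb, "ab")` rejects. The set of derivatives plays the role of the NFA state set, which is why a small partial derivative automaton and direct derivation are closely related.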

    Widths of regular and context-free languages

    Given a partially-ordered finite alphabet $\Sigma$ and a language $L \subseteq \Sigma^*$, how large can an antichain in $L$ be (where $L$ is given the lexicographic ordering)? More precisely, since $L$ will in general be infinite, we should ask about the rate of growth of maximum antichains consisting of words of length $n$. This fundamental property of partial orders is known as the width, and in a companion work we show that the problem of computing the information leakage permitted by a deterministic interactive system modeled as a finite-state transducer can be reduced to the problem of computing the width of a certain regular language. In this paper, we show that if $L$ is regular then there is a dichotomy between polynomial and exponential antichain growth. We give a polynomial-time algorithm to distinguish the two cases, and to compute the order of polynomial growth, with the language specified as an NFA. For context-free languages we show that there is a similar dichotomy, but now the problem of distinguishing the two cases is undecidable. Finally, we generalise the lexicographic order to tree languages, and show that for regular tree languages there is a trichotomy between polynomial, exponential and doubly exponential antichain growth. Comment: 22 page

    Large Aperiodic Semigroups

    The syntactic complexity of a regular language is the size of its syntactic semigroup. This semigroup is isomorphic to the transition semigroup of the minimal deterministic finite automaton accepting the language, that is, to the semigroup generated by transformations induced by non-empty words on the set of states of the automaton. In this paper we search for the largest syntactic semigroup of a star-free language having $n$ left quotients; equivalently, we look for the largest transition semigroup of an aperiodic finite automaton with $n$ states. We introduce two new aperiodic transition semigroups. The first is generated by transformations that change only one state; we call such transformations and resulting semigroups unitary. In particular, we study complete unitary semigroups which have a special structure, and we show that each maximal unitary semigroup is complete. For $n \ge 4$ there exists a complete unitary semigroup that is larger than any aperiodic semigroup known to date. We then present even larger aperiodic semigroups, generated by transformations that map a non-empty subset of states to a single state; we call such transformations and semigroups semiconstant. In particular, we examine semiconstant tree semigroups which have a structure based on full binary trees. The semiconstant tree semigroups are at present the best candidates for largest aperiodic semigroups. We also prove that $2^n-1$ is an upper bound on the state complexity of reversal of star-free languages, and resolve an open problem about a special case of state complexity of concatenation of star-free languages. Comment: 22 pages, 1 figure, 2 table
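
As a small illustration of the transition semigroup described in this abstract, the sketch below closes a set of letter-induced transformations under composition and tests aperiodicity. It is a brute-force illustration, not the paper's construction; the representation (a transformation on states `0..n-1` as a tuple `t` with `t[q]` the image of `q`) and the names `compose`, `transition_semigroup`, and `is_aperiodic` are assumptions made here.

```python
def compose(s, t):
    """Transformation obtained by applying s first, then t."""
    return tuple(t[q] for q in s)

def transition_semigroup(generators):
    """All transformations induced by non-empty words over the generators."""
    gens = [tuple(g) for g in generators]
    seen = set(gens)
    frontier = list(seen)
    while frontier:
        t = frontier.pop()
        for g in gens:
            tg = compose(t, g)  # extend the word inducing t by one letter
            if tg not in seen:
                seen.add(tg)
                frontier.append(tg)
    return seen

def is_aperiodic(semigroup):
    """True if every element t satisfies t^k == t^(k+1) for some k >= 1."""
    for t in semigroup:
        power, visited = t, set()
        # Walk t, t^2, t^3, ... until some power repeats.
        while power not in visited:
            visited.add(power)
            power = compose(power, t)
        # The repeated power lies on the cycle of powers; the cycle is
        # trivial exactly when multiplying by t leaves it unchanged.
        if compose(power, t) != power:
            return False
    return True
```

For example, the single transformation `(1, 2, 2)` generates an aperiodic semigroup with two elements, while the cyclic permutation `(1, 2, 0)` generates a three-element group, which is not aperiodic. The closure can contain up to $n^n$ transformations, so this approach is only practical for small $n$.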

    Two-Sided Derivatives for Regular Expressions and for Hairpin Expressions

    The aim of this paper is to design the polynomial construction of a finite recognizer for hairpin completions of regular languages. This is achieved by considering completions as new expression operators and by applying derivation techniques to the associated extended expressions called hairpin expressions. More precisely, we extend partial derivation of regular expressions to two-sided partial derivation of hairpin expressions and we show how to deduce a recognizer for a hairpin expression from its two-sided derived term automaton, providing an alternative proof of the fact that hairpin completions of regular languages are linear context-free. Comment: 28 page

    From Finite Automata to Regular Expressions and Back--A Summary on Descriptional Complexity

    The equivalence of finite automata and regular expressions dates back to the seminal paper of Kleene on events in nerve nets and finite automata from 1956. In the present paper we tour a fragment of the literature and summarize results on upper and lower bounds on the conversion of finite automata to regular expressions and vice versa. We also briefly recall the known bounds for the removal of spontaneous transitions (epsilon-transitions) on non-epsilon-free nondeterministic devices. Moreover, we report on recent results on the average case descriptional complexity bounds for the conversion of regular expressions to finite automata and brand new developments on the state elimination algorithm that converts finite automata to regular expressions. Comment: In Proceedings AFL 2014, arXiv:1405.527

    Partial Derivative Automaton for Regular Expressions with Shuffle

    We generalize the partial derivative automaton to regular expressions with shuffle and study its size in the worst and in the average case. The number of states of the partial derivative automata is in the worst case at most $2^m$, where $m$ is the number of letters in the expression, while asymptotically and on average it is no more than $(4/3)^m$.
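
To give a feel for the gap between the two bounds, here is a worked comparison for an arbitrarily chosen expression size of 10 letters. Both figures are upper bounds from the abstract evaluated at that size, not measured automaton sizes, and the average-case bound is stated there only asymptotically.

```latex
\[
  m = 10:\qquad
  2^{m} = 1024,
  \qquad
  \left(\tfrac{4}{3}\right)^{m} = \frac{4^{10}}{3^{10}} = \frac{1048576}{59049} \approx 17.8 .
\]
```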