Boundedness in languages of infinite words
We define a new class of languages of ω-words, strictly extending
ω-regular languages.
One way to present this new class is by a type of regular expressions. The
new expressions are an extension of ω-regular expressions where two new
variants of the Kleene star L^* are added: L^B and L^S. These new
exponents are used to say that parts of the input word have bounded size, and
that parts of the input can have arbitrarily large sizes, respectively. For
instance, the expression (a^B b)^ω represents the language of infinite
words over the letters a, b where there is a common bound on the number of
consecutive letters a. The expression (a^S b)^ω represents a similar
language, but this time the distance between consecutive b's is required to
tend toward infinity.
We develop a theory for these languages, with a focus on decidability and
closure. We define an equivalent automaton model, extending Büchi automata.
The main technical result is a complementation lemma that works for languages
where only one type of exponent, either L^B or L^S, is used.
We use the closure and decidability results to obtain partial decidability
results for the logic MSOLB, a logic obtained by extending monadic second-order
logic with new quantifiers that speak about the size of sets.
The practical efficiency of regular expression membership algorithms
1 online resource (71 pages) : graphs, charts. Includes abstract. Includes bibliographical references (69-71).
Regular expressions encode text patterns and define languages of symbolic words. The membership problem asks whether a given word is an element of the language described by a given regular expression. This problem has various well-studied algorithms, but current research only reports asymptotic complexity and performance on samples of randomly generated regular expressions. Our research aims to answer how the algorithms perform on practical, real-world regular expressions with a representative test set of words. A set of compatible regular expressions has been collected from public GitHub repositories. Each compatible expression (i.e., no backreferences or improper formatting) is then converted into an equivalent unambiguous mathematical representation. For each distinct expression, we have tested Thompson, Glushkov, position, follow, and partial derivative NFA constructions, as well as partial derivatives and exponential backtracking directly on the regular expression tree. These algorithms have been implemented in a modified version of Python's FAdo library and include UNIX-inspired extensions such as character classes, the wild dot, and UTF-8 support. We find that efficiently constructing a small NFA is the best approach to this problem; using follow and PDDAG algorithms are experimentally shown as the best
Widths of regular and context-free languages
Given a partially-ordered finite alphabet Σ and a language L ⊆ Σ*, how large
can an antichain in L be (where L is given the
lexicographic ordering)? More precisely, since L will in general be infinite,
we should ask about the rate of growth of maximum antichains consisting of
words of length n. This fundamental property of partial orders is known as
the width, and in a companion work we show that the problem of computing the
information leakage permitted by a deterministic interactive system modeled as
a finite-state transducer can be reduced to the problem of computing the width
of a certain regular language. In this paper, we show that if L is regular
then there is a dichotomy between polynomial and exponential antichain growth.
We give a polynomial-time algorithm to distinguish the two cases, and to
compute the order of polynomial growth, with the language specified as an NFA.
For context-free languages we show that there is a similar dichotomy, but now
the problem of distinguishing the two cases is undecidable. Finally, we
generalise the lexicographic order to tree languages, and show that for regular
tree languages there is a trichotomy between polynomial, exponential and doubly
exponential antichain growth. Comment: 22 pages
Large Aperiodic Semigroups
The syntactic complexity of a regular language is the size of its syntactic
semigroup. This semigroup is isomorphic to the transition semigroup of the
minimal deterministic finite automaton accepting the language, that is, to the
semigroup generated by transformations induced by non-empty words on the set of
states of the automaton. In this paper we search for the largest syntactic
semigroup of a star-free language having n left quotients; equivalently, we
look for the largest transition semigroup of an aperiodic finite automaton with
n states.
We introduce two new aperiodic transition semigroups. The first is generated
by transformations that change only one state; we call such transformations and
resulting semigroups unitary. In particular, we study complete unitary
semigroups which have a special structure, and we show that each maximal
unitary semigroup is complete. For n ≥ 4 there exists a complete unitary
semigroup that is larger than any aperiodic semigroup known to date.
We then present even larger aperiodic semigroups, generated by
transformations that map a non-empty subset of states to a single state; we
call such transformations and semigroups semiconstant. In particular, we
examine semiconstant tree semigroups which have a structure based on full
binary trees. The semiconstant tree semigroups are at present the best
candidates for largest aperiodic semigroups.
We also prove that 2^n − 1 is an upper bound on the state complexity of
reversal of star-free languages, and resolve an open problem about a special
case of state complexity of concatenation of star-free languages. Comment: 22 pages, 1 figure, 2 tables
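The notions of transition semigroup and aperiodicity used in this abstract can be illustrated with a minimal Python sketch. It is not taken from the paper: the encoding of a transformation as a tuple `t` with `t[q]` the image of state `q`, and the function names `compose`, `transition_semigroup`, and `is_aperiodic`, are assumptions made for illustration only.

```python
def compose(s, t):
    """Return the transformation 'apply s first, then t'."""
    return tuple(t[q] for q in s)


def transition_semigroup(generators):
    """Close the generators (one transformation per letter) under composition.

    Every element is induced by a non-empty word, so it is enough to keep
    extending known elements by one more letter until nothing new appears.
    """
    semigroup = set(generators)
    frontier = list(semigroup)
    while frontier:
        t = frontier.pop()
        for g in generators:
            tg = compose(t, g)
            if tg not in semigroup:
                semigroup.add(tg)
                frontier.append(tg)
    return semigroup


def is_aperiodic(t):
    """A transformation t on n states is aperiodic iff t^n == t^(n+1)."""
    power = t
    for _ in range(len(t) - 1):
        power = compose(power, t)  # after the loop, power == t^n
    return compose(power, t) == power


# A transformation that pushes states toward state 2 generates an aperiodic semigroup.
shift = (1, 2, 2)
assert all(is_aperiodic(t) for t in transition_semigroup([shift]))

# A cyclic permutation of the three states is not aperiodic.
cycle = (1, 2, 0)
assert not is_aperiodic(cycle)
```

This brute-force closure is exponential in general and is only meant for small automata; it says nothing about how the semigroups in the paper were found.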
Two-Sided Derivatives for Regular Expressions and for Hairpin Expressions
The aim of this paper is to design the polynomial construction of a finite
recognizer for hairpin completions of regular languages. This is achieved by
considering completions as new expression operators and by applying derivation
techniques to the associated extended expressions called hairpin expressions.
More precisely, we extend partial derivation of regular expressions to
two-sided partial derivation of hairpin expressions and we show how to deduce a
recognizer for a hairpin expression from its two-sided derived term automaton,
providing an alternative proof of the fact that hairpin completions of regular
languages are linear context-free. Comment: 28 pages
From Finite Automata to Regular Expressions and Back--A Summary on Descriptional Complexity
The equivalence of finite automata and regular expressions dates back to the
seminal paper of Kleene on events in nerve nets and finite automata from 1956.
In the present paper we tour a fragment of the literature and summarize results
on upper and lower bounds on the conversion of finite automata to regular
expressions and vice versa. We also briefly recall the known bounds for the
removal of spontaneous transitions (epsilon-transitions) on non-epsilon-free
nondeterministic devices. Moreover, we report on recent results on the average
case descriptional complexity bounds for the conversion of regular expressions
to finite automata and brand new developments on the state elimination
algorithm that converts finite automata to regular expressions. Comment: In Proceedings AFL 2014, arXiv:1405.527
Partial Derivative Automaton for Regular Expressions with Shuffle
We generalize the partial derivative automaton to regular expressions with
shuffle and study its size in the worst and in the average case. The number of
states of the partial derivative automata is in the worst case at most 2^m,
where m is the number of letters in the expression, while asymptotically and on
average it is no more than (4/3)^m.
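As a rough illustration of the construction this abstract refers to, the following Python sketch computes partial derivatives for expressions with shuffle and collects the reachable states. It is an assumption-laden toy, not the authors' implementation: the tuple encoding of expressions, the simplifying constructors `cat` and `shuffle`, and the helper names `pd`, `pd_states`, and `matches` are all hypothetical.

```python
EPS = ("eps",)  # the expression denoting the empty word


def sym(a):
    return ("sym", a)


def alt(r, s):
    return ("alt", r, s)


def star(r):
    return ("star", r)


def cat(r, s):
    # Concatenation, dropping a neutral EPS operand.
    if r == EPS:
        return s
    if s == EPS:
        return r
    return ("cat", r, s)


def shuffle(r, s):
    # Shuffle, dropping a neutral EPS operand.
    if r == EPS:
        return s
    if s == EPS:
        return r
    return ("shuffle", r, s)


def nullable(r):
    """True iff the language of r contains the empty word."""
    tag = r[0]
    if tag in ("eps", "star"):
        return True
    if tag == "sym":
        return False
    if tag == "alt":
        return nullable(r[1]) or nullable(r[2])
    return nullable(r[1]) and nullable(r[2])  # cat and shuffle


def pd(r, a):
    """Set of partial derivatives of r with respect to the letter a."""
    tag = r[0]
    if tag == "eps":
        return set()
    if tag == "sym":
        return {EPS} if r[1] == a else set()
    if tag == "alt":
        return pd(r[1], a) | pd(r[2], a)
    if tag == "cat":
        out = {cat(d, r[2]) for d in pd(r[1], a)}
        if nullable(r[1]):
            out |= pd(r[2], a)
        return out
    if tag == "star":
        return {cat(d, r) for d in pd(r[1], a)}
    # shuffle: the letter may be consumed by either operand
    left = {shuffle(d, r[2]) for d in pd(r[1], a)}
    right = {shuffle(r[1], d) for d in pd(r[2], a)}
    return left | right


def pd_states(r, alphabet):
    """All expressions reachable from r by repeated partial derivation."""
    states, frontier = {r}, [r]
    while frontier:
        q = frontier.pop()
        for a in alphabet:
            for d in pd(q, a):
                if d not in states:
                    states.add(d)
                    frontier.append(d)
    return states


def matches(r, word):
    """Membership test by simulating the partial derivative automaton."""
    current = {r}
    for a in word:
        current = {d for q in current for d in pd(q, a)}
    return any(nullable(q) for q in current)
```

For the expression `shuffle(sym("a"), shuffle(sym("b"), sym("c")))`, which has m = 3 letters, `pd_states` reaches one state per subset of the letters not yet read, which is consistent with the 2^m worst-case bound quoted above.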