10,814 research outputs found
Quotient Complexity of Regular Languages
The past research on the state complexity of operations on regular languages
is examined, and a new approach based on an old method (derivatives of regular
expressions) is presented. Since state complexity is a property of a language,
it is appropriate to define it in formal-language terms as the number of
distinct quotients of the language, and to call it "quotient complexity". The
problem of finding the quotient complexity of a language f(K,L) is considered,
where K and L are regular languages and f is a regular operation, for example,
union or concatenation. Since quotients can be represented by derivatives, one
can find a formula for the typical quotient of f(K,L) in terms of the quotients
of K and L. To obtain an upper bound on the number of quotients of f(K,L) all
one has to do is count how many such quotients are possible, and this makes
automaton constructions unnecessary. The advantages of this point of view are
illustrated by many examples. Moreover, new general observations are presented
to help in the estimation of the upper bounds on quotient complexity of regular
operations
On the Structure and Complexity of Rational Sets of Regular Languages
In a recent thread of papers, we have introduced FQL, a precise specification
language for test coverage, and developed the test case generation engine
FShell for ANSI C. In essence, an FQL test specification amounts to a set of
regular languages, each of which has to be matched by at least one test
execution. To describe such sets of regular languages, the FQL semantics uses
an automata-theoretic concept known as rational sets of regular languages
(RSRLs). RSRLs are automata whose alphabet consists of regular expressions.
Thus, the language accepted by the automaton is a set of regular expressions.
In this paper, we study RSRLs from a theoretic point of view. More
specifically, we analyze RSRL closure properties under common set theoretic
operations, and the complexity of membership checking, i.e., whether a regular
language is an element of a RSRL. For all questions we investigate both the
general case and the case of finite sets of regular languages. Although a few
properties are left as open problems, the paper provides a systematic semantic
foundation for the test specification language FQL
The Complexity of SORE-definability Problems
Single occurrence regular expressions (SORE) are a special kind of deterministic regular expressions, which are extensively used in the schema languages DTD and XSD for XML documents. In this paper, with motivations from the simplification of XML schemas, we consider the SORE-definability problem: Given a regular expression, decide whether it has an equivalent SORE. We investigate extensively the complexity of the SORE-definability problem: We consider both (standard) regular expressions and regular expressions with counting, and distinguish between the alphabets of size at least two and unary alphabets. In all cases, we obtain tight complexity bounds. In addition, we consider another variant of this problem, the bounded SORE-definability problem, which is to decide, given a regular expression E and a number M (encoded in unary or binary), whether there is an SORE, which is equivalent to E on the set of words of length at most M. We show that in several cases, there is an exponential decrease in the complexity when switching from the SORE-definability problem to its bounded variant
Which Regular Expression Patterns are Hard to Match?
Regular expressions constitute a fundamental notion in formal language theory
and are frequently used in computer science to define search patterns. A
classic algorithm for these problems constructs and simulates a
non-deterministic finite automaton corresponding to the expression, resulting
in an running time (where is the length of the pattern and is
the length of the text). This running time can be improved slightly (by a
polylogarithmic factor), but no significantly faster solutions are known. At
the same time, much faster algorithms exist for various special cases of
regular expressions, including dictionary matching, wildcard matching, subset
matching, word break problem etc.
In this paper, we show that the complexity of regular expression matching can
be characterized based on its {\em depth} (when interpreted as a formula). Our
results hold for expressions involving concatenation, OR, Kleene star and
Kleene plus. For regular expressions of depth two (involving any combination of
the above operators), we show the following dichotomy: matching and membership
testing can be solved in near-linear time, except for "concatenations of
stars", which cannot be solved in strongly sub-quadratic time assuming the
Strong Exponential Time Hypothesis (SETH). For regular expressions of depth
three the picture is more complex. Nevertheless, we show that all problems can
either be solved in strongly sub-quadratic time, or cannot be solved in
strongly sub-quadratic time assuming SETH.
An intriguing special case of membership testing involves regular expressions
of the form "a star of an OR of concatenations", e.g., . This
corresponds to the so-called {\em word break} problem, for which a dynamic
programming algorithm with a runtime of (roughly) is known. We
show that the latter bound is not tight and improve the runtime to
Partial Derivative Automaton for Regular Expressions with Shuffle
We generalize the partial derivative automaton to regular expressions with
shuffle and study its size in the worst and in the average case. The number of
states of the partial derivative automata is in the worst case at most 2^m,
where m is the number of letters in the expression, while asymptotically and on
average it is no more than (4/3)^m
- …