
    Quotient Complexity of Regular Languages

    The past research on the state complexity of operations on regular languages is examined, and a new approach based on an old method (derivatives of regular expressions) is presented. Since state complexity is a property of a language, it is appropriate to define it in formal-language terms as the number of distinct quotients of the language, and to call it "quotient complexity". The problem of finding the quotient complexity of a language f(K,L) is considered, where K and L are regular languages and f is a regular operation, for example, union or concatenation. Since quotients can be represented by derivatives, one can find a formula for the typical quotient of f(K,L) in terms of the quotients of K and L. To obtain an upper bound on the number of quotients of f(K,L), all one has to do is count how many such quotients are possible, and this makes automaton constructions unnecessary. The advantages of this point of view are illustrated by many examples. Moreover, new general observations are presented to help in the estimation of the upper bounds on quotient complexity of regular operations.
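The idea of counting quotients through derivatives can be tried out on small expressions. The following is a minimal sketch, not the paper's method: it assumes a tuple encoding of regular expressions (the names `sym`, `union`, `cat`, `star`, `deriv` and `count_derivatives` are hypothetical) and identifies derivatives only up to a few simplification rules, so the count it returns is an upper bound on the quotient complexity rather than the exact value in general.

```python
EMPTY = ("empty",)   # the empty language
EPS = ("eps",)       # the language containing only the empty word

def sym(a):
    return ("sym", a)

def union(r, s):
    # Flatten nested unions into one set so that union is treated as
    # associative, commutative and idempotent.
    parts = set()
    for t in (r, s):
        if t[0] == "union":
            parts |= t[1]
        elif t != EMPTY:
            parts.add(t)
    if not parts:
        return EMPTY
    if len(parts) == 1:
        return next(iter(parts))
    return ("union", frozenset(parts))

def cat(r, s):
    if r == EMPTY or s == EMPTY:
        return EMPTY
    if r == EPS:
        return s
    if s == EPS:
        return r
    return ("cat", r, s)

def star(r):
    if r in (EMPTY, EPS):
        return EPS
    return r if r[0] == "star" else ("star", r)

def nullable(r):
    """True if the language of r contains the empty word."""
    tag = r[0]
    if tag in ("eps", "star"):
        return True
    if tag in ("empty", "sym"):
        return False
    if tag == "union":
        return any(nullable(t) for t in r[1])
    return nullable(r[1]) and nullable(r[2])  # concatenation

def deriv(r, a):
    """Derivative of r by the letter a: an expression for the left quotient."""
    tag = r[0]
    if tag in ("empty", "eps"):
        return EMPTY
    if tag == "sym":
        return EPS if r[1] == a else EMPTY
    if tag == "union":
        out = EMPTY
        for t in r[1]:
            out = union(out, deriv(t, a))
        return out
    if tag == "star":
        return cat(deriv(r[1], a), r)
    left = cat(deriv(r[1], a), r[2])  # concatenation
    return union(left, deriv(r[2], a)) if nullable(r[1]) else left

def matches(r, word):
    for a in word:
        r = deriv(r, a)
    return nullable(r)

def count_derivatives(r, alphabet):
    """Number of syntactically distinct derivatives reachable from r."""
    seen, todo = {r}, [r]
    while todo:
        cur = todo.pop()
        for a in alphabet:
            d = deriv(cur, a)
            if d not in seen:
                seen.add(d)
                todo.append(d)
    return len(seen)

# K = (a|b)*a, the words over {a, b} that end in "a"
K = cat(star(union(sym("a"), sym("b"))), sym("a"))
print(count_derivatives(K, "ab"))            # 2
print(matches(K, "bba"), matches(K, "ab"))   # True False
```

Two syntactically different derivatives may still denote the same language, which is why the count is only an upper bound; for the small example above it happens to coincide with the number of quotients.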

    On the Structure and Complexity of Rational Sets of Regular Languages

    In a recent thread of papers, we have introduced FQL, a precise specification language for test coverage, and developed the test case generation engine FShell for ANSI C. In essence, an FQL test specification amounts to a set of regular languages, each of which has to be matched by at least one test execution. To describe such sets of regular languages, the FQL semantics uses an automata-theoretic concept known as rational sets of regular languages (RSRLs). RSRLs are automata whose alphabet consists of regular expressions. Thus, the language accepted by the automaton is a set of regular expressions. In this paper, we study RSRLs from a theoretical point of view. More specifically, we analyze RSRL closure properties under common set-theoretic operations, and the complexity of membership checking, i.e., whether a regular language is an element of an RSRL. For all questions we investigate both the general case and the case of finite sets of regular languages. Although a few properties are left as open problems, the paper provides a systematic semantic foundation for the test specification language FQL.
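To make the notion of an automaton over regular expressions concrete, here is a small sketch for the finite case only. It assumes an acyclic automaton given as a dictionary of labelled edges, with labels written in Python `re` syntax; the function names `enumerate_rsrl` and `covered` are hypothetical and not part of FQL or FShell. Each accepted sequence of labels is read as the concatenation of the labelled languages.

```python
import re

def enumerate_rsrl(edges, start, finals):
    """List the regular expressions accepted by an acyclic automaton.

    edges maps a state to a list of (regex_label, next_state) pairs.
    Each accepted label sequence r1 r2 ... rk is returned as the
    concatenated pattern (r1)(r2)...(rk).
    """
    results = set()

    def walk(state, labels):
        if state in finals:
            results.add("".join(f"({lab})" for lab in labels))
        for lab, nxt in edges.get(state, []):
            walk(nxt, labels + [lab])

    walk(start, [])
    return results

def covered(patterns, executions):
    """True if every pattern is matched by at least one execution trace."""
    return all(
        any(re.fullmatch(p, trace) for trace in executions)
        for p in patterns
    )

# Hypothetical example: two label sequences lead from state 0 to state 2.
edges = {0: [("a*", 1), ("b", 1)], 1: [("c", 2)]}
patterns = enumerate_rsrl(edges, start=0, finals={2})
print(sorted(patterns))                  # ['(a*)(c)', '(b)(c)']
print(covered(patterns, ["aac", "bc"]))  # True
print(covered(patterns, ["aac"]))        # False
```

The general case allows cycles, so the denoted set of languages can be infinite and cannot be enumerated this way; the sketch only illustrates the finite case and the coverage reading described in the abstract.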

    The Complexity of SORE-definability Problems

    Single occurrence regular expressions (SORE) are a special kind of deterministic regular expressions, which are extensively used in the schema languages DTD and XSD for XML documents. In this paper, motivated by the simplification of XML schemas, we consider the SORE-definability problem: given a regular expression, decide whether it has an equivalent SORE. We investigate extensively the complexity of the SORE-definability problem: we consider both (standard) regular expressions and regular expressions with counting, and distinguish between alphabets of size at least two and unary alphabets. In all cases, we obtain tight complexity bounds. In addition, we consider another variant of this problem, the bounded SORE-definability problem, which is to decide, given a regular expression E and a number M (encoded in unary or binary), whether there is an SORE that is equivalent to E on the set of words of length at most M. We show that in several cases, there is an exponential decrease in the complexity when switching from the SORE-definability problem to its bounded variant.
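Two ingredients of the problem statement are easy to illustrate: the syntactic SORE condition (every alphabet symbol occurs at most once in the expression) and equivalence restricted to words of length at most M. The sketch below assumes expressions written in Python `re` syntax with single alphanumeric characters as symbols; `is_sore` and `equivalent_up_to` are hypothetical helper names. It checks a given candidate by brute force and does not decide SORE-definability itself.

```python
import re
from collections import Counter
from itertools import product

def is_sore(expr):
    """Syntactic check: each alphabet symbol occurs at most once in expr."""
    counts = Counter(ch for ch in expr if ch.isalnum())
    return all(c == 1 for c in counts.values())

def equivalent_up_to(e1, e2, alphabet, max_len):
    """Compare two expressions on all words of length at most max_len.

    Brute force: the number of words tested grows exponentially in max_len.
    """
    p1, p2 = re.compile(e1), re.compile(e2)
    for n in range(max_len + 1):
        for letters in product(alphabet, repeat=n):
            word = "".join(letters)
            if bool(p1.fullmatch(word)) != bool(p2.fullmatch(word)):
                return False
    return True

print(is_sore("ab|ac"))     # False: "a" occurs twice
print(is_sore("a(b|c)"))    # True
# "ab|ac" is not a SORE, but the SORE "a(b|c)" agrees with it here.
print(equivalent_up_to("ab|ac", "a(b|c)", "abc", 4))   # True
print(equivalent_up_to("(a|b)*a", "b*a", "ab", 2))     # False ("aa" differs)
```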

    Which Regular Expression Patterns are Hard to Match?

    Regular expressions constitute a fundamental notion in formal language theory and are frequently used in computer science to define search patterns. A classic algorithm for these problems constructs and simulates a non-deterministic finite automaton corresponding to the expression, resulting in an $O(mn)$ running time (where $m$ is the length of the pattern and $n$ is the length of the text). This running time can be improved slightly (by a polylogarithmic factor), but no significantly faster solutions are known. At the same time, much faster algorithms exist for various special cases of regular expressions, including dictionary matching, wildcard matching, subset matching, word break problem etc. In this paper, we show that the complexity of regular expression matching can be characterized based on its depth (when interpreted as a formula). Our results hold for expressions involving concatenation, OR, Kleene star and Kleene plus. For regular expressions of depth two (involving any combination of the above operators), we show the following dichotomy: matching and membership testing can be solved in near-linear time, except for "concatenations of stars", which cannot be solved in strongly sub-quadratic time assuming the Strong Exponential Time Hypothesis (SETH). For regular expressions of depth three the picture is more complex. Nevertheless, we show that all problems can either be solved in strongly sub-quadratic time, or cannot be solved in strongly sub-quadratic time assuming SETH. An intriguing special case of membership testing involves regular expressions of the form "a star of an OR of concatenations", e.g., $[a|ab|bc]^*$. This corresponds to the so-called word break problem, for which a dynamic programming algorithm with a runtime of (roughly) $O(n\sqrt{m})$ is known. We show that the latter bound is not tight and improve the runtime to $O(nm^{0.44\ldots})$.
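The word break special case can be illustrated with the textbook dynamic program. This is a plain sketch, not the improved algorithm the abstract refers to; the function name `word_break` is hypothetical, and its worst-case running time is proportional to the text length times the total length of the dictionary words.

```python
def word_break(text, words):
    """Decide whether text belongs to (w1|w2|...|wk)* for the given words."""
    n = len(text)
    # reachable[i] is True when text[:i] splits into dictionary words.
    reachable = [False] * (n + 1)
    reachable[0] = True
    for i in range(n):
        if not reachable[i]:
            continue
        for w in words:
            if text.startswith(w, i):
                reachable[i + len(w)] = True
    return reachable[n]

# Membership in (a|ab|bc)*, the example pattern from the abstract.
print(word_break("abbca", ["a", "ab", "bc"]))  # True: ab + bc + a
print(word_break("ba", ["a", "ab", "bc"]))     # False
```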

    Partial Derivative Automaton for Regular Expressions with Shuffle

    We generalize the partial derivative automaton to regular expressions with shuffle and study its size in the worst and in the average case. The number of states of the partial derivative automaton is in the worst case at most 2^m, where m is the number of letters in the expression, while asymptotically and on average it is no more than (4/3)^m.
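The 2^m worst case can be observed on a small example. The sketch below assumes the standard Antimirov partial-derivative rules, extended with the shuffle rule ∂_a(α ⧢ β) = ∂_a(α) ⧢ β ∪ α ⧢ ∂_a(β); the tuple encoding and the names `pderiv` and `pd_states` are hypothetical and are not taken from the paper.

```python
EPS = ("eps",)   # the language containing only the empty word

def sym(a):
    return ("sym", a)

def alt(r, s):
    return ("alt", r, s)

def star(r):
    return ("star", r)

def cat(r, s):
    if r == EPS:
        return s
    if s == EPS:
        return r
    return ("cat", r, s)

def shuffle(r, s):
    if r == EPS:
        return s
    if s == EPS:
        return r
    return ("shuf", r, s)

def nullable(r):
    tag = r[0]
    if tag in ("eps", "star"):
        return True
    if tag == "sym":
        return False
    if tag == "alt":
        return nullable(r[1]) or nullable(r[2])
    return nullable(r[1]) and nullable(r[2])  # concatenation and shuffle

def pderiv(r, a):
    """Set of partial derivatives of r by the letter a."""
    tag = r[0]
    if tag == "eps":
        return set()
    if tag == "sym":
        return {EPS} if r[1] == a else set()
    if tag == "alt":
        return pderiv(r[1], a) | pderiv(r[2], a)
    if tag == "star":
        return {cat(d, r) for d in pderiv(r[1], a)}
    if tag == "cat":
        out = {cat(d, r[2]) for d in pderiv(r[1], a)}
        if nullable(r[1]):
            out |= pderiv(r[2], a)
        return out
    # shuffle: the letter is consumed by the left or by the right operand
    left = {shuffle(d, r[2]) for d in pderiv(r[1], a)}
    right = {shuffle(r[1], d) for d in pderiv(r[2], a)}
    return left | right

def pd_states(r, alphabet):
    """States of the partial derivative automaton reachable from r."""
    seen, todo = {r}, [r]
    while todo:
        cur = todo.pop()
        for a in alphabet:
            for d in pderiv(cur, a):
                if d not in seen:
                    seen.add(d)
                    todo.append(d)
    return seen

# Shuffle of three distinct letters: one state per subset of letters left.
r = shuffle(sym("a"), shuffle(sym("b"), sym("c")))
print(len(pd_states(r, "abc")))   # 8, which equals 2**3
```

For comparison, the same construction applied to the shuffle-free expression (ab)* yields only two states, which shows that shuffle is what drives the exponential growth in this example.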