
    Regular Expression Types for XML

    We propose regular expression types as a foundation for statically typed XML processing languages. Regular expression types, like most schema languages for XML, introduce regular expression notations such as repetition (*), alternation (|), etc., to describe XML documents. The novelty of our type system is a semantic presentation of subtyping, as inclusion between the sets of documents denoted by two types. We give several examples illustrating the usefulness of this form of subtyping in XML processing. The decision problem for the subtype relation reduces to the inclusion problem between tree automata, which is known to be EXPTIME-complete. To avoid this high complexity in typical cases, we develop a practical algorithm that, unlike classical algorithms based on determinization of tree automata, checks the inclusion relation by a top-down traversal of the original type expressions. The main advantage of this algorithm is that it can exploit the property that type expressions being compared often share portions of their representations. Our algorithm is a variant of Aiken and Murphy's set-inclusion constraint solver, to which are added several new implementation techniques, correctness proofs, and preliminary performance measurements on some small programs in the domain of typed XML processing.

    Regular Cost Functions, Part I: Logic and Algebra over Words

    The theory of regular cost functions is a quantitative extension to the classical notion of regularity. A cost function associates to each input a non-negative integer value (or infinity), as opposed to languages, which only associate to each input one of the two values "inside" and "outside". This theory is a continuation of the works on distance automata and similar models. These models of automata have been successfully used for solving the star-height problem, the finite power property, the finite substitution problem, the relative inclusion star-height problem and the boundedness problem for monadic second-order logic over words. Our notion of regularity can be, as in the classical theory of regular languages, equivalently defined in terms of automata, expressions, algebraic recognisability, and by a variant of the monadic second-order logic. These equivalences are strict extensions of the corresponding classical results. The present paper introduces the cost monadic logic, the quantitative extension to the notion of monadic second-order logic we use, and shows that some problems of existence of bounds are decidable for this logic. This is achieved by introducing the corresponding algebraic formalism: stabilisation monoids. Comment: 47 pages.

    Path constraints in semistructured data

    We consider semistructured data as multirooted edge-labelled directed graphs, and path inclusion constraints on these graphs. A path inclusion constraint p ⪯ q is satisfied by semistructured data if any node reached by the regular query p is also reached by the regular query q. In this paper, two problems are mainly studied: the implication problem and the problem of the existence of a finite exact model. First, we give a new decision algorithm for the implication problem of a constraint p ⪯ q by a set of bounded path constraints p_i ⪯ u_i, where p, q, and the p_i's are regular path expressions and the u_i's are words, improving, in this particular case, on the more general algorithms of S. Abiteboul and V. Vianu, and N. Alechina et al. In the case of a set of word equalities u_i ≡ v_i, we provide a more efficient decision algorithm for the implication of a word equality u ≡ v, improving on the more general algorithm of P. Buneman et al. We prove that, in this case, implication for nondeterministic models is equivalent to implication for (complete) deterministic ones. Second, we introduce the notion of exact model: an exact model of a set of path constraints satisfies the constraint p ⪯ q if and only if this constraint is implied by that set. We prove that any set of constraints has an exact model and we give a decidable characterization of data which are exact models of sets of bounded path inclusion constraints.

    Unary Pushdown Automata and Straight-Line Programs

    We consider decision problems for deterministic pushdown automata over a unary alphabet (udpda, for short). Udpda are a simple computation model that accepts exactly the unary regular languages, but can be exponentially more succinct than finite-state automata. We complete the complexity landscape for udpda by showing that emptiness (and thus universality) is P-hard, equivalence and compressed membership problems are P-complete, and inclusion is coNP-complete. Our upper bounds are based on a translation theorem between udpda and straight-line programs over the binary alphabet (SLPs). We show that the characteristic sequence of any udpda can be represented as a pair of SLPs (one for the prefix, one for the lasso) that have size linear in the size of the udpda and can be computed in polynomial time. Hence, decision problems on udpda are reduced to decision problems on SLPs. Conversely, any SLP can be converted in logarithmic space into a udpda, and this forms the basis for our lower bound proofs. We show coNP-hardness of the ordered matching problem for SLPs, from which we derive coNP-hardness for inclusion. In addition, we complete the complexity landscape for unary nondeterministic pushdown automata by showing that the universality problem is $\Pi_2\mathrm{P}$-hard, using a new class of integer expressions. Our techniques have applications beyond udpda. We show that our results imply $\Pi_2\mathrm{P}$-completeness for a natural fragment of Presburger arithmetic and coNP lower bounds for compressed matching problems with one-character wildcards.
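
As a rough illustration of the straight-line programs mentioned in the abstract above, the following sketch represents a binary word by grammar rules and reads one symbol without decompressing the word. The rule encoding (a dict mapping a name to either a terminal or a pair of names) and the function names `slp_lengths` and `symbol_at` are assumptions made for this example, not notation from the paper.

```python
def slp_lengths(rules):
    """Return the length of the word derived by each SLP variable.

    rules maps a variable name either to a terminal ('0' or '1')
    or to a pair (left, right) of variable names (hypothetical encoding).
    """
    lengths = {}

    def length(name):
        if name not in lengths:
            rule = rules[name]
            if isinstance(rule, str):
                lengths[name] = 1
            else:
                lengths[name] = length(rule[0]) + length(rule[1])
        return lengths[name]

    for name in rules:
        length(name)
    return lengths


def symbol_at(rules, lengths, start, index):
    """Return the symbol at position index of the word derived from start,
    walking down the derivation tree instead of expanding the word."""
    name = start
    while not isinstance(rules[name], str):
        left, right = rules[name]
        if index < lengths[left]:
            name = left
        else:
            index -= lengths[left]
            name = right
    return rules[name]


# E derives "010101": C = "01", D = C C = "0101", E = D C.
rules = {"A": "0", "B": "1", "C": ("A", "B"), "D": ("C", "C"), "E": ("D", "C")}
lengths = slp_lengths(rules)
print(lengths["E"], symbol_at(rules, lengths, "E", 3))
```

Because each rule can reuse earlier variables, the derived word may be exponentially longer than the rule set, which is why queries of this kind are phrased on the compressed form.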

    Register Set Automata (Technical Report)

    We present register set automata (RsAs), a register automaton model over data words where registers can contain sets of data values and the following operations are supported: adding values to registers, clearing registers, and testing (non-)membership. We show that the emptiness problem for RsAs is decidable and complete for the $F_\omega$ class. Moreover, we show that a large class of register automata can be transformed into deterministic RsAs, which can serve as a basis for (i) fast matching of a family of regular expressions with back-references and (ii) a language inclusion algorithm for a sub-class of register automata. RsAs are incomparable in expressive power to other popular automata models over data words, such as alternating register automata and pebble automata.

    Regular Methods for Operator Precedence Languages

    The operator precedence languages (OPLs) represent the largest known subclass of the context-free languages which enjoys all desirable closure and decidability properties. This includes the decidability of language inclusion, which is the ultimate verification problem. Operator precedence grammars, automata, and logics have been investigated and used, for example, to verify programs with arithmetic expressions and exceptions (both of which are deterministic pushdown but lie outside the scope of the visibly pushdown languages). In this paper, we complete the picture and give, for the first time, an algebraic characterization of the class of OPLs in the form of a syntactic congruence that has finitely many equivalence classes exactly for the operator precedence languages. This is a generalization of the celebrated Myhill-Nerode theorem for the regular languages to OPLs. As one of the consequences, we show that universality and language inclusion for nondeterministic operator precedence automata can be solved by an antichain algorithm. Antichain algorithms avoid determinization and complementation through an explicit subset construction, by leveraging a quasi-order on words, which allows the pruning of the search space for counterexample words without sacrificing completeness. Antichain algorithms can be implemented symbolically, and these implementations are today the best-performing algorithms in practice for the inclusion of finite automata. We give a generic construction of the quasi-order needed for antichain algorithms from a finite syntactic congruence. This yields the first antichain algorithm for OPLs, an algorithm that solves the ExpTime-hard language inclusion problem for OPLs in exponential time
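
The abstract above notes that antichain algorithms are used for inclusion of finite automata. A minimal sketch of that idea for plain NFAs (not for operator precedence automata) might look as follows; the automaton encoding and the name `included` are assumptions for this example, and the quasi-order used for pruning is simply subset inclusion on sets of states.

```python
from collections import deque


def included(a, b, alphabet):
    """Check L(a) <= L(b) for two NFAs without determinizing b up front.

    Each NFA is a dict with 'init' and 'final' (sets of states) and
    'delta' mapping (state, symbol) to a set of successor states
    (hypothetical encoding chosen for this sketch).
    """
    antichain = []  # pairs (p, S): state of a, set of states of b
    work = deque()

    def add(p, targets):
        # Skip the pair if a stored pair with a smaller b-set subsumes it.
        if any(q == p and seen <= targets for q, seen in antichain):
            return
        # Drop stored pairs that the new pair subsumes.
        antichain[:] = [(q, seen) for q, seen in antichain
                        if not (q == p and targets <= seen)]
        antichain.append((p, targets))
        work.append((p, targets))

    for p in a["init"]:
        add(p, frozenset(b["init"]))

    while work:
        p, current = work.popleft()
        # Counterexample: a accepts here, while no run of b accepts.
        if p in a["final"] and not (current & b["final"]):
            return False
        for symbol in alphabet:
            successors = frozenset(
                t for s in current for t in b["delta"].get((s, symbol), ())
            )
            for p_next in a["delta"].get((p, symbol), ()):
                add(p_next, successors)
    return True


a_star = {"init": {0}, "final": {0}, "delta": {(0, "a"): {0}}}
any_word = {"init": {0}, "final": {0},
            "delta": {(0, "a"): {0}, (0, "b"): {0}}}
print(included(a_star, any_word, "ab"), included(any_word, a_star, "ab"))
```

Pruning is sound here because a pair with a smaller set of b-states is at least as likely to lead to a counterexample, so only the minimal sets need to be explored.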

    Efficient Algorithms for Counting Automata

    Counting automata (CAs) are classical finite automata extended with bounded counters. They still denote the class of regular languages but in a more compact way than finite automata. Since CAs are a recent model, there is a gap in the knowledge of efficient algorithms implementing various operations on CAs. In this thesis, we mainly focus on an existing subclass of CAs called monadic counting automata (MCAs), i.e., CAs with counting loops on character classes, which are common in practice (e.g., detection of packets in network traffic, log analysis). For this subclass, we efficiently solve the emptiness and inclusion problems. Moreover, we provide two extensions of the class of MCAs (but not beyond the class of CAs) and efficiently solve the emptiness problem for them. MCAs naturally arise from regular expressions that are extended by the counting operator limited only to character classes. Thus our algorithm solving the inclusion problem of MCAs can be used in a new method for solving the inclusion problem of such regular expressions. We experimentally evaluated this method on regular expressions from a wide range of applications and compared it with the naive method. The experiments show that the method using our algorithm is less prone to state explosion. It also outperforms the naive method if the regular expressions contain counting operators with large bounds. As expected, for the easy cases, the naive method is still faster than the method based on our algorithm.
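
To give a feel for why counting operators with large bounds are costly once the counter is removed, the sketch below unfolds a bounded repetition over a character class into an equivalent counter-free regular expression. This is only an illustration of the size growth; it is an assumption of this example, not a description of the naive method evaluated in the thesis, and the helper name `unfold` is hypothetical.

```python
import re


def unfold(atom, lo, hi):
    """Rewrite atom{lo,hi} without counting operators.

    The result is lo mandatory copies of atom followed by
    (hi - lo) nested optional copies.
    """
    optional = ""
    for _ in range(hi - lo):
        optional = f"(?:{atom}{optional})?"
    return atom * lo + optional


counted = "[ab]{2,4}"
expanded = unfold("[ab]", 2, 4)
print(expanded)

# Both patterns agree on these sample words.
for n in range(7):
    word = "a" * n
    assert bool(re.fullmatch(counted, word)) == bool(re.fullmatch(expanded, word))
```

The unfolded pattern grows linearly with the upper bound, whereas the counted form grows only with the number of digits of the bound.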

    On the Structure and Complexity of Rational Sets of Regular Languages

    In a recent series of papers, we have introduced FQL, a precise specification language for test coverage, and developed the test case generation engine FShell for ANSI C. In essence, an FQL test specification amounts to a set of regular languages, each of which has to be matched by at least one test execution. To describe such sets of regular languages, the FQL semantics uses an automata-theoretic concept known as rational sets of regular languages (RSRLs). RSRLs are automata whose alphabet consists of regular expressions. Thus, the language accepted by the automaton is a set of regular expressions. In this paper, we study RSRLs from a theoretical point of view. More specifically, we analyze RSRL closure properties under common set-theoretic operations, and the complexity of membership checking, i.e., whether a regular language is an element of an RSRL. For all questions we investigate both the general case and the case of finite sets of regular languages. Although a few properties are left as open problems, the paper provides a systematic semantic foundation for the test specification language FQL.

    Deciding subset relationship of co-inductively defined set constants

    Static analysis of different non-strict functional programming languages makes use of set constants like Top, Inf, and Bot denoting all expressions, all lists without a last Nil as tail, and all non-terminating programs, respectively. We use a set language that permits union, constructors and recursive definition of set constants with a greatest fixpoint semantics. This paper proves decidability, in particular EXPTIME-completeness, of subset relationship of co-inductively defined sets by using algorithms and results from tree automata. This shows decidability of the test for set inclusion, which is required by certain strictness analysis algorithms in lazy functional programming languages.