35,769 research outputs found

    Generalizing input-driven languages: theoretical and practical benefits

    Get PDF
    Regular languages (RL) are the simplest family in Chomsky's hierarchy. Thanks to their simplicity they enjoy various nice algebraic and logic properties that have been successfully exploited in many application fields. Practically all of their related problems are decidable, so that they support automatic verification algorithms. Also, they can be recognized in real-time. Context-free languages (CFL) are another major family well-suited to formalize programming, natural, and many other classes of languages; their increased generative power w.r.t. RL, however, causes the loss of several closure properties and of the decidability of important problems; furthermore they need complex parsing algorithms. Thus, various subclasses thereof have been defined with different goals, spanning from efficient, deterministic parsing to closure properties, logic characterization and automatic verification techniques. Among CFL subclasses, so-called structured ones, i.e., those where the typical tree-structure is visible in the sentences, exhibit many of the algebraic and logic properties of RL, whereas deterministic CFL have been thoroughly exploited in compiler construction and other application fields. After surveying and comparing the main properties of those various language families, we go back to operator precedence languages (OPL), an old family through which R. Floyd pioneered deterministic parsing, and we show that they offer unexpected properties in two fields so far investigated in totally independent ways: they enable parsing parallelization in a more effective way than traditional sequential parsers, and exhibit the same algebraic and logic properties so far obtained only for less expressive language families

    On the Relation between Context-Free Grammars and Parsing Expression Grammars

    Full text link
    Context-Free Grammars (CFGs) and Parsing Expression Grammars (PEGs) have several similarities and a few differences in both their syntax and semantics, but they are usually presented through formalisms that hinder a proper comparison. In this paper we present a new formalism for CFGs that highlights the similarities and differences between them. The new formalism borrows from PEGs the use of parsing expressions and the recognition-based semantics. We show how one way of removing non-determinism from this formalism yields a formalism with the semantics of PEGs. We also prove, based on these new formalisms, how LL(1) grammars define the same language whether interpreted as CFGs or as PEGs, and also show how strong-LL(k), right-linear, and LL-regular grammars have simple language-preserving translations from CFGs to PEGs

    Static and dynamic semantics of NoSQL languages

    Get PDF
    We present a calculus for processing semistructured data that spans differences of application area among several novel query languages, broadly categorized as "NoSQL". This calculus lets users define their own operators, capturing a wider range of data processing capabilities, whilst providing a typing precision so far typical only of primitive hard-coded operators. The type inference algorithm is based on semantic type checking, resulting in type information that is both precise, and flexible enough to handle structured and semistructured data. We illustrate the use of this calculus by encoding a large fragment of Jaql, including operations and iterators over JSON, embedded SQL expressions, and co-grouping, and show how the encoding directly yields a typing discipline for Jaql as it is, namely without the addition of any type definition or type annotation in the code

    On Decidable Growth-Rate Properties of Imperative Programs

    Full text link
    In 2008, Ben-Amram, Jones and Kristiansen showed that for a simple "core" programming language - an imperative language with bounded loops, and arithmetics limited to addition and multiplication - it was possible to decide precisely whether a program had certain growth-rate properties, namely polynomial (or linear) bounds on computed values, or on the running time. This work emphasized the role of the core language in mitigating the notorious undecidability of program properties, so that one deals with decidable problems. A natural and intriguing problem was whether more elements can be added to the core language, improving its utility, while keeping the growth-rate properties decidable. In particular, the method presented could not handle a command that resets a variable to zero. This paper shows how to handle resets. The analysis is given in a logical style (proof rules), and its complexity is shown to be PSPACE-complete (in contrast, without resets, the problem was PTIME). The analysis algorithm evolved from the previous solution in an interesting way: focus was shifted from proving a bound to disproving it, and the algorithm works top-down rather than bottom-up

    A Practical Type Analysis for Verification of Modular Prolog Programs

    Get PDF
    Regular types are a powerful tool for computing very precise descriptive types for logic programs. However, in the context of real life, modular Prolog programs, the accurate results obtained by regular types often come at the price of efficiency. In this paper we propose a combination of techniques aimed at improving analysis efficiency in this context. As a first technique we allow optionally reducing the accuracy of inferred types by using only the types defined by the user or present in the libraries. We claim that, for the purpose of verifying type signatures given in the form of assertions the precision obtained using this approach is sufficient, and show that analysis times can be reduced significantly. Our second technique is aimed at dealing with situations where we would like to limit the amount of reanalysis performed, especially for library modules. Borrowing some ideas from polymorphic type systems, we show how to solve the problem by admitting parameters in type specifications. This allows us to compose new call patterns with some pre computed analysis info without losing any information. We argue that together these two techniques contribute to the practical and scalable analysis and verification of types in Prolog programs

    On generic context lemmas for lambda calculi with sharing

    Get PDF
    This paper proves several generic variants of context lemmas and thus contributes to improving the tools to develop observational semantics that is based on a reduction semantics for a language. The context lemmas are provided for may- as well as two variants of mustconvergence and a wide class of extended lambda calculi, which satisfy certain abstract conditions. The calculi must have a form of node sharing, e.g. plain beta reduction is not permitted. There are two variants, weakly sharing calculi, where the beta-reduction is only permitted for arguments that are variables, and strongly sharing calculi, which roughly correspond to call-by-need calculi, where beta-reduction is completely replaced by a sharing variant. The calculi must obey three abstract assumptions, which are in general easily recognizable given the syntax and the reduction rules. The generic context lemmas have as instances several context lemmas already proved in the literature for specific lambda calculi with sharing. The scope of the generic context lemmas comprises not only call-by-need calculi, but also call-by-value calculi with a form of built-in sharing. Investigations in other, new variants of extended lambda-calculi with sharing, where the language or the reduction rules and/or strategy varies, will be simplified by our result, since specific context lemmas are immediately derivable from the generic context lemma, provided our abstract conditions are met

    Parikh Image of Pushdown Automata

    Full text link
    We compare pushdown automata (PDAs for short) against other representations. First, we show that there is a family of PDAs over a unary alphabet with nn states and p2n+4p \geq 2n + 4 stack symbols that accepts one single long word for which every equivalent context-free grammar needs Ω(n2(p2n4))\Omega(n^2(p-2n-4)) variables. This family shows that the classical algorithm for converting a PDA to an equivalent context-free grammar is optimal even when the alphabet is unary. Moreover, we observe that language equivalence and Parikh equivalence, which ignores the ordering between symbols, coincide for this family. We conclude that, when assuming this weaker equivalence, the conversion algorithm is also optimal. Second, Parikh's theorem motivates the comparison of PDAs against finite state automata. In particular, the same family of unary PDAs gives a lower bound on the number of states of every Parikh-equivalent finite state automaton. Finally, we look into the case of unary deterministic PDAs. We show a new construction converting a unary deterministic PDA into an equivalent context-free grammar that achieves best known bounds.Comment: 17 pages, 2 figure
    corecore