35,769 research outputs found
Generalizing input-driven languages: theoretical and practical benefits
Regular languages (RL) are the simplest family in Chomsky's hierarchy. Thanks
to their simplicity they enjoy various nice algebraic and logic properties that
have been successfully exploited in many application fields. Practically all of
their related problems are decidable, so that they support automatic
verification algorithms. Also, they can be recognized in real-time.
Context-free languages (CFL) are another major family well-suited to
formalize programming, natural, and many other classes of languages; their
increased generative power w.r.t. RL, however, causes the loss of several
closure properties and of the decidability of important problems; furthermore
they need complex parsing algorithms. Thus, various subclasses thereof have
been defined with different goals, spanning from efficient, deterministic
parsing to closure properties, logic characterization and automatic
verification techniques.
Among CFL subclasses, so-called structured ones, i.e., those where the
typical tree-structure is visible in the sentences, exhibit many of the
algebraic and logic properties of RL, whereas deterministic CFL have been
thoroughly exploited in compiler construction and other application fields.
After surveying and comparing the main properties of those various language
families, we go back to operator precedence languages (OPL), an old family
through which R. Floyd pioneered deterministic parsing, and we show that they
offer unexpected properties in two fields so far investigated in totally
independent ways: they enable parsing parallelization in a more effective way
than traditional sequential parsers, and exhibit the same algebraic and logic
properties so far obtained only for less expressive language families
On the Relation between Context-Free Grammars and Parsing Expression Grammars
Context-Free Grammars (CFGs) and Parsing Expression Grammars (PEGs) have
several similarities and a few differences in both their syntax and semantics,
but they are usually presented through formalisms that hinder a proper
comparison. In this paper we present a new formalism for CFGs that highlights
the similarities and differences between them. The new formalism borrows from
PEGs the use of parsing expressions and the recognition-based semantics. We
show how one way of removing non-determinism from this formalism yields a
formalism with the semantics of PEGs. We also prove, based on these new
formalisms, how LL(1) grammars define the same language whether interpreted as
CFGs or as PEGs, and also show how strong-LL(k), right-linear, and LL-regular
grammars have simple language-preserving translations from CFGs to PEGs
Static and dynamic semantics of NoSQL languages
We present a calculus for processing semistructured data that spans
differences of application area among several novel query languages, broadly
categorized as "NoSQL". This calculus lets users define their own operators,
capturing a wider range of data processing capabilities, whilst providing a
typing precision so far typical only of primitive hard-coded operators. The
type inference algorithm is based on semantic type checking, resulting in type
information that is both precise, and flexible enough to handle structured and
semistructured data. We illustrate the use of this calculus by encoding a large
fragment of Jaql, including operations and iterators over JSON, embedded SQL
expressions, and co-grouping, and show how the encoding directly yields a
typing discipline for Jaql as it is, namely without the addition of any type
definition or type annotation in the code
On Decidable Growth-Rate Properties of Imperative Programs
In 2008, Ben-Amram, Jones and Kristiansen showed that for a simple "core"
programming language - an imperative language with bounded loops, and
arithmetics limited to addition and multiplication - it was possible to decide
precisely whether a program had certain growth-rate properties, namely
polynomial (or linear) bounds on computed values, or on the running time.
This work emphasized the role of the core language in mitigating the
notorious undecidability of program properties, so that one deals with
decidable problems.
A natural and intriguing problem was whether more elements can be added to
the core language, improving its utility, while keeping the growth-rate
properties decidable. In particular, the method presented could not handle a
command that resets a variable to zero. This paper shows how to handle resets.
The analysis is given in a logical style (proof rules), and its complexity is
shown to be PSPACE-complete (in contrast, without resets, the problem was
PTIME). The analysis algorithm evolved from the previous solution in an
interesting way: focus was shifted from proving a bound to disproving it, and
the algorithm works top-down rather than bottom-up
A Practical Type Analysis for Verification of Modular Prolog Programs
Regular types are a powerful tool for computing very precise descriptive types for logic programs. However, in the context of real life, modular Prolog programs, the accurate results obtained by regular types often come at the price of efficiency. In this paper we propose a combination of techniques aimed at improving analysis efficiency in this context. As a first technique we allow optionally reducing the accuracy of inferred types by using only the types defined by the user or present in the libraries. We claim that, for the purpose of verifying type signatures given in the form of assertions the precision obtained using this approach is sufficient, and show that analysis times can be reduced significantly. Our second technique is aimed at dealing with situations where we would like to limit the amount of reanalysis performed, especially for library modules. Borrowing some ideas from polymorphic type systems, we show how to solve the problem by admitting parameters in type specifications. This allows us to compose new call patterns with some pre computed analysis info without losing any information. We argue that together these two techniques contribute to the practical and scalable analysis and verification of types in Prolog programs
On generic context lemmas for lambda calculi with sharing
This paper proves several generic variants of context lemmas and thus contributes to improving the tools to develop observational semantics that is based on a reduction semantics for a language. The context lemmas are provided for may- as well as two variants of mustconvergence and a wide class of extended lambda calculi, which satisfy certain abstract conditions. The calculi must have a form of node sharing, e.g. plain beta reduction is not permitted. There are two variants, weakly sharing calculi, where the beta-reduction is only permitted for arguments that are variables, and strongly sharing calculi, which roughly correspond to call-by-need calculi, where beta-reduction is completely replaced by a sharing variant. The calculi must obey three abstract assumptions, which are in general easily recognizable given the syntax and the reduction rules. The generic context lemmas have as instances several context lemmas already proved in the literature for specific lambda calculi with sharing. The scope of the generic context lemmas comprises not only call-by-need calculi, but also call-by-value calculi with a form of built-in sharing. Investigations in other, new variants of extended lambda-calculi with sharing, where the language or the reduction rules and/or strategy varies, will be simplified by our result, since specific context lemmas are immediately derivable from the generic context lemma, provided our abstract conditions are met
Parikh Image of Pushdown Automata
We compare pushdown automata (PDAs for short) against other representations.
First, we show that there is a family of PDAs over a unary alphabet with
states and stack symbols that accepts one single long word for
which every equivalent context-free grammar needs
variables. This family shows that the classical algorithm for converting a PDA
to an equivalent context-free grammar is optimal even when the alphabet is
unary. Moreover, we observe that language equivalence and Parikh equivalence,
which ignores the ordering between symbols, coincide for this family. We
conclude that, when assuming this weaker equivalence, the conversion algorithm
is also optimal. Second, Parikh's theorem motivates the comparison of PDAs
against finite state automata. In particular, the same family of unary PDAs
gives a lower bound on the number of states of every Parikh-equivalent finite
state automaton. Finally, we look into the case of unary deterministic PDAs. We
show a new construction converting a unary deterministic PDA into an equivalent
context-free grammar that achieves best known bounds.Comment: 17 pages, 2 figure
- …