
    Generalizing input-driven languages: theoretical and practical benefits

    Regular languages (RL) are the simplest family in Chomsky's hierarchy. Thanks to their simplicity they enjoy various nice algebraic and logic properties that have been successfully exploited in many application fields. Practically all of their related problems are decidable, so that they support automatic verification algorithms. Also, they can be recognized in real-time. Context-free languages (CFL) are another major family well-suited to formalize programming, natural, and many other classes of languages; their increased generative power w.r.t. RL, however, causes the loss of several closure properties and of the decidability of important problems; furthermore they need complex parsing algorithms. Thus, various subclasses thereof have been defined with different goals, spanning from efficient, deterministic parsing to closure properties, logic characterization and automatic verification techniques. Among CFL subclasses, so-called structured ones, i.e., those where the typical tree-structure is visible in the sentences, exhibit many of the algebraic and logic properties of RL, whereas deterministic CFL have been thoroughly exploited in compiler construction and other application fields. After surveying and comparing the main properties of those various language families, we go back to operator precedence languages (OPL), an old family through which R. Floyd pioneered deterministic parsing, and we show that they offer unexpected properties in two fields so far investigated in totally independent ways: they enable parsing parallelization in a more effective way than traditional sequential parsers, and exhibit the same algebraic and logic properties so far obtained only for less expressive language families

    Operator Precedence Languages: Their Automata-Theoretic and Logic Characterization

    Operator precedence languages were introduced half a century ago by Robert Floyd to support deterministic and efficient parsing of context-free languages. Recently, we renewed our interest in this class of languages thanks to a few distinguishing properties that make them attractive for exploiting various modern technologies. Precisely, their local parsability enables parallel and incremental parsing, whereas their closure properties make them amenable to automatic verification techniques, including model checking. In this paper we provide a fairly complete theory of this class of languages: we introduce a class of automata with the same recognizing power as the generative power of their grammars; we provide a characterization of their sentences in terms of monadic second-order logic as has been done in previous literature for more restricted language classes such as regular, parenthesis, and input-driven ones; we investigate preserved and lost properties when extending the language sentences from finite length to infinite length (ω-languages). As a result, we obtain a class of languages that enjoys many of the nice properties of regular languages (closure and decidability properties, logic characterization) but is considerably larger than other families with the same properties (typically parenthesis and input-driven ones), covering “almost” all deterministic languages

    MSO definable string transductions and two-way finite state transducers

    String transductions that are definable in monadic second-order (MSO) logic (without the use of parameters) are exactly those realized by deterministic two-way finite state transducers. Nondeterministic MSO definable string transductions (i.e., those definable with the use of parameters) correspond to compositions of two nondeterministic two-way finite state transducers that have the finite visit property. Both families of MSO definable string transductions are characterized in terms of Hennie machines, i.e., two-way finite state transducers with the finite visit property that are allowed to rewrite their input tape. Comment: 63 pages, LaTeX2e. Extended abstract presented at 26-th ICALP, 199

    Algebraic properties of structured context-free languages: old approaches and novel developments

    The historical research line on the algebraic properties of structured CF languages initiated by McNaughton's Parenthesis Languages has recently attracted much renewed interest with the Balanced Languages, the Visibly Pushdown Automata languages (VPDA), the Synchronized Languages, and the Height-deterministic ones. Such families preserve to a varying degree the basic algebraic properties of Regular languages: boolean closure, closure under reversal, under concatenation, and Kleene star. We prove that the VPDA family is strictly contained within the Floyd Grammars (FG) family historically known as operator precedence. Languages over the same precedence matrix are known to be closed under boolean operations, and are recognized by a machine whose pop or push operations on the stack are purely determined by terminal letters. We characterize VPDA's as the subclass of FG having a peculiarly structured set of precedence relations, and balanced grammars as a further restricted case. The non-counting invariance property of FG has a direct implication for VPDA too. Comment: Extended version of paper presented at WORDS2009, Salerno, Italy, September 200

    Extended macro grammars and stack controlled machines

    K-extended basic macro grammars are introduced, where K is any class of languages. The class B(K) of languages generated by such grammars is investigated, together with the class LB(K) of languages generated by the corresponding linear basic grammars. For any full semi-AFL K, B(K) is a full AFL closed under iterated LB(K)-substitution, but not necessarily under substitution. For any machine type D, the stack controlled machine type corresponding to D is introduced, denoted S(D), and the checking-stack controlled machine type CS(D). The data structure of this machine is a stack which controls a pushdown of data structures from D. If D accepts K, then S(D) accepts B(K) and CS(D) accepts LB(K). Thus the classes B(K) are characterized by stack controlled machines and the classes LB(K), i.e., the full hyper-AFLs, by checking-stack controlled machines. A full basic-AFL is a full AFL K such that B(K) ⊆ K. Every full basic-AFL is a full hyper-AFL, but not vice versa. The class of OI macro languages (i.e., indexed languages, i.e., nested stack automaton languages) is a full basic-AFL, properly containing the smallest full basic-AFL. The latter is generated by the ultrabasic macro grammars and accepted by the nested stack automata with bounded depth of nesting (and properly contains the stack languages, the ETOL languages, i.e., the smallest full hyper-AFL, and the basic macro languages). The full basic-AFLs are characterized by bounded nested stack controlled machines

    On the relationship between indexed grammars and logic programs

    This article provides detailed constructions demonstrating that the class of indexed grammars introduced as a simple extension of context-free grammars has essentially the same expressive power as the class of logic programs with unary predicates and functions and exactly one variable symbol. Some additional considerations are concerned with parsing procedures

    Beyond operator-precedence grammars and languages

    Operator Precedence Languages (OPL) are deterministic context-free and have desirable properties. OPL can be parsed in parallel and, when structurally compatible, are closed under Boolean operations, concatenation and star; they include the Input Driven languages. OPL use three relations between two terminal symbols to assign syntax structure to words. We extend such relations to k-tuples of consecutive symbols, in agreement with strictly locally testable regular languages. For each k, the new corresponding class of Higher-order Operator Precedence languages properly includes the OPL and enjoys many of their properties. The Higher-order Operator Precedence languages form a strict hierarchy based on k, which contains maximal languages

    Programming language complexity analysis and its impact on Checkmarx activities

    Integrated master's dissertation in Informatics Engineering. Tools for programming language processing, such as static analysers (for instance, a Static Application Security Testing (SAST) tool, one of Checkmarx’s main products), must be adapted to cope with a given input when the source programming language changes. The complexity of the programming language is one of the key factors that deeply affect the time needed to support it. This Master’s project proposes an approach for assessing language complexity, measuring, at a first stage, the complexity of its underlying context-free grammar (CFG). From the analysis of concrete case studies, factors have been identified (related to specific facilities offered by the language) that make the support process more time-consuming, in particular in the stages of language recognition and transformation to an abstract syntax tree (AST). Accordingly, at a second stage, a set of language features is analysed in order to take into account these factors, which also affect language processing. The main objective of the Master’s work reported here is to help development teams improve their estimates of the time and effort needed to adapt the SAST tool to cope with a new programming language. The dissertation proposes a tool that evaluates the complexity of a language based on a set of metrics that classify the complexity of its grammar, along with a set of language properties. The tool compares the complexity determined for the new language with that of previously supported languages to predict the effort needed to process the new language

    Toward a theory of input-driven locally parsable languages

    If a context-free language enjoys the local parsability property then, no matter how the source string is segmented, each segment can be parsed independently, and an efficient parallel parsing algorithm becomes possible. The new class of locally chain parsable languages (LCPLs), included in the deterministic context-free language family, is here defined by means of the chain-driven automaton and characterized by decidable properties of grammar derivations. This automaton decides whether or not to reduce a substring in a way purely driven by the terminal characters, thus extending the well-known concept of input-driven (ID) machines, also known as visibly pushdown machines. The LCPL family extends and improves the practically relevant Floyd's operator-precedence (OP) languages, which are known to strictly include the ID languages, and for which a parallel-parser generator exists