Generalizing input-driven languages: theoretical and practical benefits
Regular languages (RL) are the simplest family in Chomsky's hierarchy. Thanks
to their simplicity they enjoy various nice algebraic and logic properties that
have been successfully exploited in many application fields. Practically all of
their related problems are decidable, so that they support automatic
verification algorithms. Also, they can be recognized in real-time.
Context-free languages (CFL) are another major family well-suited to
formalize programming, natural, and many other classes of languages; their
increased generative power w.r.t. RL, however, causes the loss of several
closure properties and of the decidability of important problems; furthermore
they need complex parsing algorithms. Thus, various subclasses thereof have
been defined with different goals, ranging from efficient, deterministic
parsing to closure properties, logic characterization and automatic
verification techniques.
Among CFL subclasses, so-called structured ones, i.e., those where the
typical tree-structure is visible in the sentences, exhibit many of the
algebraic and logic properties of RL, whereas deterministic CFL have been
thoroughly exploited in compiler construction and other application fields.
After surveying and comparing the main properties of those various language
families, we go back to operator precedence languages (OPL), an old family
through which R. Floyd pioneered deterministic parsing, and we show that they
offer unexpected properties in two fields so far investigated in totally
independent ways: they enable parsing parallelization in a more effective way
than traditional sequential parsers, and exhibit the same algebraic and logic
properties so far obtained only for less expressive language families.
Operator Precedence Languages: Their Automata-Theoretic and Logic Characterization
Operator precedence languages were introduced half a century ago by Robert Floyd to support deterministic and efficient parsing of context-free languages. Recently, we renewed our interest in this class of languages thanks to a few distinguishing properties that make them attractive for exploiting various modern technologies. Precisely, their local parsability enables parallel and incremental parsing, whereas their closure properties make them amenable to automatic verification techniques, including model checking. In this paper we provide a fairly complete theory of this class of languages: we introduce a class of automata with the same recognizing power as the generative power of their grammars; we provide a characterization of their sentences in terms of monadic second-order logic as has been done in previous literature for more restricted language classes such as regular, parenthesis, and input-driven ones; we investigate preserved and lost properties when extending the language sentences from finite length to infinite length (ω-languages). As a result, we obtain a class of languages that enjoys many of the nice properties of regular languages (closure and decidability properties, logic characterization) but is considerably larger than other families (typically parenthesis and input-driven ones) with the same properties, covering "almost" all deterministic languages.
MSO definable string transductions and two-way finite state transducers
String transductions that are definable in monadic second-order (mso) logic
(without the use of parameters) are exactly those realized by deterministic
two-way finite state transducers. Nondeterministic mso definable string
transductions (i.e., those definable with the use of parameters) correspond to
compositions of two nondeterministic two-way finite state transducers that have
the finite visit property. Both families of mso definable string transductions
are characterized in terms of Hennie machines, i.e., two-way finite state
transducers with the finite visit property that are allowed to rewrite their
input tape.
Comment: 63 pages, LaTeX2e. Extended abstract presented at 26-th ICALP, 199
Algebraic properties of structured context-free languages: old approaches and novel developments
The historical research line on the algebraic properties of structured CF
languages initiated by McNaughton's Parenthesis Languages has recently
attracted much renewed interest with the Balanced Languages, the Visibly
Pushdown Automata languages (VPDA), the Synchronized Languages, and the
Height-deterministic ones. Such families preserve to a varying degree the basic
algebraic properties of Regular languages: boolean closure, closure under
reversal, under concatenation, and Kleene star. We prove that the VPDA family
is strictly contained within the Floyd Grammars (FG) family historically known
as operator precedence. Languages over the same precedence matrix are known to
be closed under boolean operations, and are recognized by a machine whose pop
or push operations on the stack are purely determined by terminal letters. We
characterize VPDAs as the subclass of FG having a peculiarly structured set of
precedence relations, and balanced grammars as a further restricted case. The
non-counting invariance property of FG has a direct implication for VPDA too.
Comment: Extended version of paper presented at WORDS2009, Salerno, Italy,
September 200
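The statement that push and pop operations are "purely determined by terminal letters" can be illustrated with a minimal sketch of an input-driven (visibly pushdown) recognizer. The alphabet partition below (brackets as call and return symbols, everything else internal) is a hypothetical choice made for illustration and does not come from the paper.

```python
# Hypothetical alphabet partition (an assumption for illustration only):
# call symbols always push, return symbols always pop,
# and every other symbol is internal and leaves the stack untouched.
CALLS = {"(": ")", "[": "]"}    # call symbol -> its matching return symbol
RETURNS = set(CALLS.values())


def accepts(word: str) -> bool:
    """Recognize well-nested words; the stack move depends only on the letter read."""
    stack = []
    for symbol in word:
        if symbol in CALLS:
            # Push the return symbol we expect to see later.
            stack.append(CALLS[symbol])
        elif symbol in RETURNS:
            # Pop and compare; an empty stack or a mismatch rejects.
            if not stack or stack.pop() != symbol:
                return False
        # Internal symbols: no stack operation.
    return not stack
```

Because the kind of stack move is fixed by the input letter, two such recognizers over the same partition always keep stacks of the same height on the same word, which is the intuition behind the boolean closure mentioned in the abstract.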
Extended macro grammars and stack controlled machines
K-extended basic macro grammars are introduced, where K is any class of languages. The class B(K) of languages generated by such grammars is investigated, together with the class LB(K) of languages generated by the corresponding linear basic grammars. For any full semi-AFL K, B(K) is a full AFL closed under iterated LB(K)-substitution, but not necessarily under substitution. For any machine type D, the stack controlled machine type corresponding to D is introduced, denoted S(D), and the checking-stack controlled machine type CS(D). The data structure of this machine is a stack which controls a pushdown of data structures from D. If D accepts K, then S(D) accepts B(K) and CS(D) accepts LB(K). Thus the classes B(K) are characterized by stack controlled machines and the classes LB(K), i.e., the full hyper-AFLs, by checking-stack controlled machines. A full basic-AFL is a full AFL K such that B(K) ⊆ K. Every full basic-AFL is a full hyper-AFL, but not vice versa. The class of OI macro languages (i.e., indexed languages, i.e., nested stack automaton languages) is a full basic-AFL, properly containing the smallest full basic-AFL. The latter is generated by the ultrabasic macro grammars and accepted by the nested stack automata with bounded depth of nesting (and properly contains the stack languages, the ETOL languages, i.e., the smallest full hyper-AFL, and the basic macro languages). The full basic-AFLs are characterized by bounded nested stack controlled machines.
On the relationship between indexed grammars and logic programs
This article provides detailed constructions demonstrating that the class of indexed grammars introduced as a simple extension of context-free grammars has essentially the same expressive power as the class of logic programs with unary predicates and functions and exactly one variable symbol. Some additional considerations are concerned with parsing procedures.
Beyond operator-precedence grammars and languages
Operator Precedence Languages (OPL) are deterministic context-free and have desirable properties. OPL can be parsed in parallel and, when structurally compatible, are closed under Boolean operations, concatenation and star; they include the Input Driven languages. OPL use three relations between two terminal symbols to assign syntax structure to words. We extend such relations to k-tuples of consecutive symbols, in agreement with strictly locally testable regular languages. For each k, the new corresponding class of Higher-order Operator Precedence languages properly includes the OPL and enjoys many of their properties. The Higher-order Operator Precedence languages form a strict hierarchy based on k, which contains maximal languages.
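How three relations between pairs of terminals assign structure to a word can be sketched with a classical operator precedence recognizer. The grammar (arithmetic expressions over `n`, `+`, `*` and parentheses), the precedence matrix and all names below are assumptions chosen for illustration; they are not taken from the paper, and the sketch covers only the classical case of relations between two terminals, not the k-tuple extension.

```python
# Hypothetical precedence matrix for expressions over n, +, *, ( and ),
# with '#' as the end delimiter. Row = left terminal, column = right terminal.
COLS = "+*()n#"
ROWS = {
    "+": "><<><>",
    "*": ">><><>",
    "(": "<<<=< ",
    ")": ">> > >",
    "n": ">> > >",
    "#": "<<< < ",
}
PREC = {(a, b): rel
        for a, row in ROWS.items()
        for b, rel in zip(COLS, row) if rel != " "}

# Right-hand sides of the assumed grammar, with every nonterminal written as N.
HANDLES = {"n", "N+N", "N*N", "(N)"}


def recognize(word: str) -> bool:
    """Shift on '<' or '=', reduce on '>', reject when no relation holds."""
    tokens = list(word) + ["#"]
    stack = ["#"]
    i = 0
    while True:
        top = stack[-1] if stack[-1] != "N" else stack[-2]  # topmost terminal
        nxt = tokens[i]
        if top == "#" and nxt == "#":
            # Accept only if all input was read and one nonterminal remains.
            return i == len(tokens) - 1 and stack == ["#", "N"]
        rel = PREC.get((top, nxt))
        if rel is None:
            return False
        if rel in "<=":
            stack.append(nxt)
            i += 1
            continue
        # rel == '>': pop the handle, i.e. back to the nearest '<' relation.
        handle = []
        if stack[-1] == "N":
            handle.append(stack.pop())
        while True:
            terminal = stack.pop()
            handle.append(terminal)
            if stack[-1] == "N":
                handle.append(stack.pop())
            if PREC.get((stack[-1], terminal)) == "<":
                break
        if "".join(reversed(handle)) not in HANDLES:
            return False
        stack.append("N")
```

For example, `recognize("n+n*n")` reduces `n*n` before `n+n` because the matrix has `+ < *`, so the relations alone fix the tree shape without any lookahead beyond the next terminal.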
Programming language complexity analysis and its impact on Checkmarx activities
Integrated Master's dissertation in Informatics Engineering
Tools for Programming Languages processing, like Static Analysers (for instance, a Static
Application Security Testing (SAST) tool, one of Checkmarx’s main products), must be
adapted to cope with a given input when the source programming language changes.
The complexity of the programming language is one of the key factors that deeply affect the
time needed to support it.
This Master’s Project aims at proposing an approach for assessing language complexity,
measuring, at a first stage, the complexity of its underlying context-free grammar (CFG).
From the analysis of concrete case studies, factors have been identified that make the
support process more time-consuming, in particular in the stages of language recognition
and in the transformation to an abstract syntax tree (AST). Accordingly, at a second stage, a
set of language features is analysed in order to take into account these factors, which
also affect language processing.
The main objective of the Master’s work here reported is to help development teams to
improve the estimation of time and effort needed to adapt the SAST Tool in order to cope
with a new programming language.
In this dissertation a tool is proposed that allows for the evaluation of the complexity of a
language based on a set of metrics to classify the complexity of its grammar, along with a set
of language properties. The tool compares the new language complexity so far determined
with previously supported languages, to predict the effort to process the new language.
Toward a theory of input-driven locally parsable languages
If a context-free language enjoys the local parsability property then, no matter how the source string is segmented, each segment can be parsed independently, and an efficient parallel parsing algorithm becomes possible. The new class of locally chain parsable languages (LCPLs), included in the deterministic context-free language family, is here defined by means of the chain-driven automaton and characterized by decidable properties of grammar derivations. This automaton decides whether or not to reduce a substring in a way purely driven by the terminal characters, thus extending the well-known concept of input-driven (ID) machines, also known as visibly pushdown machines. The LCPL family extends and improves the practically relevant Floyd's operator-precedence (OP) languages, which are known to strictly include the ID languages and for which a parallel-parser generator exists.