On the equivalence, containment, and covering problems for the regular and context-free languages
We consider the complexity of the equivalence and containment problems for regular expressions and context-free grammars, concentrating on the relationship between complexity and various language properties. Finiteness and boundedness of languages are shown to play important roles in the complexity of these problems. An encoding into grammars of Turing machine computations exponential in the size of the grammar is used to prove several exponential lower bounds. These lower bounds include exponential time for testing equivalence of grammars generating finite sets, and exponential space for testing equivalence of non-self-embedding grammars. Several problems that might be complex because of this encoding are shown to simplify for linear grammars. Other problems considered include grammatical covering and structural equivalence for right-linear, linear, and arbitrary grammars.
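For orientation, the sketch below shows the classical product-construction test for equivalence of two complete deterministic finite automata: explore reachable state pairs breadth-first and report a difference as soon as one automaton accepts where the other rejects. This is a standard textbook technique, not the paper's method; it assumes the inputs are already complete DFAs and so sidesteps the potentially exponential cost of converting regular expressions or grammars into automata, which is where the hardness studied above lives. The tuple encoding of a DFA is our own choice for illustration.

```python
from collections import deque

# Assumed encoding: a DFA is (start_state, accepting_states, delta), where
# delta maps (state, symbol) -> state and is total over the alphabet.
def dfa_equivalent(dfa1, dfa2, alphabet):
    start1, acc1, delta1 = dfa1
    start2, acc2, delta2 = dfa2
    seen = {(start1, start2)}
    queue = deque([(start1, start2)])
    while queue:
        p, q = queue.popleft()
        # A reachable pair on which exactly one DFA accepts witnesses a word
        # in the symmetric difference of the two languages.
        if (p in acc1) != (q in acc2):
            return False
        for a in alphabet:
            nxt = (delta1[(p, a)], delta2[(q, a)])
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return True

# Example: "even number of a's" over {a, b}, as a minimal 2-state DFA and as a
# redundant 3-state DFA; "odd number of a's" reuses the 2-state transitions.
EVEN = (0, {0}, {(0, "a"): 1, (1, "a"): 0, (0, "b"): 0, (1, "b"): 1})
EVEN3 = (0, {0, 2}, {(0, "a"): 1, (1, "a"): 2, (2, "a"): 1,
                     (0, "b"): 0, (1, "b"): 1, (2, "b"): 2})
ODD = (0, {1}, EVEN[2])

print(dfa_equivalent(EVEN, EVEN3, "ab"))  # True
print(dfa_equivalent(EVEN, ODD, "ab"))    # False
```

Containment of L(A) in L(B) can be checked with the same traversal by failing only on pairs where A accepts and B rejects.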
Generalizing input-driven languages: theoretical and practical benefits
Regular languages (RL) are the simplest family in Chomsky's hierarchy. Thanks to their simplicity, they enjoy various nice algebraic and logic properties that have been successfully exploited in many application fields. Practically all of their related problems are decidable, so that they support automatic verification algorithms. Also, they can be recognized in real time.
Context-free languages (CFL) are another major family, well suited to formalize programming, natural, and many other classes of languages; their increased generative power with respect to RL, however, causes the loss of several closure properties and of the decidability of important problems; furthermore, they need complex parsing algorithms. Thus, various subclasses thereof have been defined with different goals, spanning from efficient, deterministic parsing to closure properties, logic characterization and automatic verification techniques.
Among CFL subclasses, so-called structured ones, i.e., those in which the typical tree structure is visible in the sentences, exhibit many of the algebraic and logic properties of RL, whereas deterministic CFL have been thoroughly exploited in compiler construction and other application fields. After surveying and comparing the main properties of those various language families, we go back to operator precedence languages (OPL), an old family through which R. Floyd pioneered deterministic parsing, and we show that they offer unexpected properties in two fields so far investigated in totally independent ways: they enable parsing parallelization more effectively than traditional sequential parsers, and they exhibit the same algebraic and logic properties so far obtained only for less expressive language families.
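To make Floyd's idea concrete, here is a minimal sequential operator-precedence parser for arithmetic over `+`, `*`, parentheses and integers. The parser compares the topmost terminal on the stack with the lookahead using three relations (`<` yields precedence, `=` equal precedence, `>` takes precedence), shifts on `<` or `=`, and reduces the handle on `>`. This is a hedged sketch: the precedence table is hand-written for this toy grammar rather than computed from a grammar as in Floyd's construction, the parser evaluates instead of building a tree, and it does not attempt the parallel parsing discussed in the abstract.

```python
import re

# Hand-written precedence relations between terminals ('n' = number, '$' = end).
PREC = {
    "n": {"+": ">", "*": ">", ")": ">", "$": ">"},
    "+": {"n": "<", "+": ">", "*": "<", "(": "<", ")": ">", "$": ">"},
    "*": {"n": "<", "+": ">", "*": ">", "(": "<", ")": ">", "$": ">"},
    "(": {"n": "<", "+": "<", "*": "<", "(": "<", ")": "="},
    ")": {"+": ">", "*": ">", ")": ">", "$": ">"},
    "$": {"n": "<", "+": "<", "*": "<", "(": "<"},
}
TOKEN = re.compile(r"\s*(?:(\d+)|([+*()]))")

def tokenize(text):
    tokens, pos, text = [], 0, text.rstrip()
    while pos < len(text):
        m = TOKEN.match(text, pos)
        if not m:
            raise SyntaxError(f"unexpected character at position {pos}")
        num, op = m.groups()
        tokens.append(("n", int(num)) if num is not None else (op, None))
        pos = m.end()
    return tokens + [("$", None)]

def reduce_handle(h):
    # Stack items are ('T', terminal, value) or ('N', None, value).
    shape = [(kind, term) for kind, term, _ in h]
    if shape == [("T", "n")]:
        return h[0][2]
    if shape == [("N", None), ("T", "+"), ("N", None)]:
        return h[0][2] + h[2][2]
    if shape == [("N", None), ("T", "*"), ("N", None)]:
        return h[0][2] * h[2][2]
    if shape == [("T", "("), ("N", None), ("T", ")")]:
        return h[1][2]
    raise SyntaxError(f"cannot reduce handle {shape}")

def top_terminal(stack):
    return next(item[1] for item in reversed(stack) if item[0] == "T")

def parse(text):
    tokens, i = tokenize(text), 0
    stack = [("T", "$", None)]
    while True:
        a, val = tokens[i]
        t = top_terminal(stack)
        if t == "$" and a == "$":
            if len(stack) == 2 and stack[1][0] == "N":
                return stack[1][2]
            raise SyntaxError("empty or malformed input")
        rel = PREC[t].get(a)
        if rel in ("<", "="):          # shift the lookahead
            stack.append(("T", a, val))
            i += 1
        elif rel == ">":               # pop the handle back to the matching '<'
            handle = []
            while True:
                item = stack.pop()
                handle.insert(0, item)
                if item[0] == "T" and PREC[top_terminal(stack)].get(item[1]) == "<":
                    break
            if stack[-1][0] == "N":    # a nonterminal left of the handle belongs to it
                handle.insert(0, stack.pop())
            stack.append(("N", None, reduce_handle(handle)))
        else:
            raise SyntaxError(f"no precedence relation between {t!r} and {a!r}")

print(parse("2+3*4"))    # 14
print(parse("(1+2)*3"))  # 9
```

Because every decision depends only on two adjacent terminals, the relations are local, which is the property the authors build on for parallel parsing.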
Bidimensional Linear Recursive Sequences and Universality of Unambiguous Register Automata
We study the universality and inclusion problems for register automata over equality data. We show that the universality and inclusion problems can be solved in 2-EXPTIME when the input automata are without guessing and unambiguous, improving on the currently best-known 2-EXPSPACE upper bound by Mottet and Quaas. When the number of registers of both automata is fixed, we obtain a lower EXPTIME complexity, also improving the EXPSPACE upper bound from Mottet and Quaas for a fixed number of registers. We reduce inclusion to universality, and then we reduce universality to the problem of counting the orbits of runs of the automaton. We show that the orbit-counting function satisfies a system of bidimensional linear recursive equations with polynomial coefficients (linrec), which generalises analogous recurrences for the Stirling numbers of the second kind, and then we show that universality reduces to the zeroness problem for linrec sequences. While such a counting approach is classical and has successfully been applied to unambiguous finite automata and grammars over finite alphabets, its application to register automata over infinite alphabets is novel. We provide two algorithms to decide the zeroness problem for bidimensional linear recursive sequences arising from orbit-counting functions. Both algorithms rely on techniques from linear non-commutative algebra. The first algorithm performs variable elimination and has elementary complexity. The second algorithm is a refined version of the first one and relies on the computation of the Hermite normal form of matrices over a skew polynomial field. The second algorithm yields an EXPTIME decision procedure for the zeroness problem of linrec sequences, which in turn yields the claimed bounds for the universality and inclusion problems of register automata.
Comment: full version of the homonymous paper to appear in the proceedings of STACS'2
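The abstract names the Stirling numbers of the second kind as the model case for the linrec recurrences. As a small illustration (ours, not the paper's linrec system), the sketch below evaluates the classical bidimensional recurrence S(n, k) = k·S(n−1, k) + S(n−1, k−1), whose coefficient k is a polynomial in the indices. It also uses the standard fact that, over an infinite domain with equality only, the orbits of n-tuples of data values correspond to set partitions of the n positions, so their number is the Bell number, the sum of S(n, k) over k. The function names are hypothetical.

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def stirling2(n, k):
    """S(n, k): number of partitions of an n-element set into k non-empty blocks."""
    if n == 0 and k == 0:
        return 1
    if n == 0 or k == 0:
        return 0
    # The n-th element either joins one of the k blocks of a partition of the
    # first n-1 elements (k choices), or forms a new singleton block.
    return k * stirling2(n - 1, k) + stirling2(n - 1, k - 1)

def orbits_of_tuples(n):
    """Orbits of n-tuples of data under permutations of the data domain:
    one orbit per equality pattern of the positions, i.e. the Bell number B(n)."""
    return sum(stirling2(n, k) for k in range(n + 1))

print([stirling2(5, k) for k in range(6)])     # [0, 1, 15, 25, 10, 1]
print([orbits_of_tuples(n) for n in range(6)])  # [1, 1, 2, 5, 15, 52]
```

In the paper's setting the counted objects are orbits of runs rather than plain tuples, and the question becomes whether the resulting sequence is identically zero.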
Security Applications of Formal Language Theory
We present an approach to improving the security of complex, composed systems based on formal language theory, and show how this approach leads to advances in input validation, security modeling, attack surface reduction, and ultimately, software design and programming methodology. We cite examples based on real-world security flaws in common protocols representing different classes of protocol complexity. We also introduce a formalization of an exploit development technique, the parse tree differential attack, made possible by our conception of the role of formal grammars in security. These insights make possible future advances in software auditing techniques applicable to static and dynamic binary analysis, fuzzing, and general reverse-engineering and exploit development.
Our work provides a foundation for verifying critical implementation components with considerably less burden to developers than is offered by the current state of the art. It additionally offers a rich basis for further exploration in the areas of offensive analysis and, conversely, automated defense tools and techniques.
This report is divided into two parts. In Part I we address the formalisms and their applications; in Part II we discuss the general implications and recommendations for protocol and software design that follow from our formal analysis.
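As a toy illustration of the kind of discrepancy a parse tree differential attack exploits, namely two components deriving different interpretations from the same input, the sketch below parses one query string under two hypothetical policies for duplicate keys. Only `urllib.parse.parse_qsl` is a real standard-library call; `first_wins`, `last_wins` and `differential` are names we made up, and this is not the report's formalization of the attack.

```python
from urllib.parse import parse_qsl

def first_wins(query):
    # Hypothetical validator: keeps the first occurrence of each key.
    result = {}
    for key, value in parse_qsl(query):
        result.setdefault(key, value)
    return result

def last_wins(query):
    # Hypothetical backend: later occurrences overwrite earlier ones.
    return dict(parse_qsl(query))

def differential(query):
    """Return {key: (validator_view, backend_view)} for keys the two parsers disagree on."""
    a, b = first_wins(query), last_wins(query)
    return {k: (a.get(k), b.get(k)) for k in a.keys() | b.keys() if a.get(k) != b.get(k)}

print(differential("user=alice&role=user&role=admin"))  # {'role': ('user', 'admin')}
```

Here the validator would approve `role=user` while the backend acts on `role=admin`; a non-empty result from `differential` flags inputs whose meaning depends on which parser reads them.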
The many facets of string transducers
Regular word transductions extend the robust notion of regular languages from qualitative to quantitative reasoning. They were already considered in early papers of formal language theory, but turned out to be much more challenging. The last decade brought considerable research around various transducer models, aiming to achieve the same robustness as for automata and languages. In this paper we survey some older and more recent results on string transducers. We present classical connections between automata, logic and algebra extended to transducers, some genuine definability questions, and review approaches to the equivalence problem.
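The simplest machines in this landscape are one-way deterministic transducers that emit a string on each transition. A minimal sketch, assuming a subsequential transducer (transition outputs plus a final output attached to the state reached at the end); the example machine, which copies its input over {a, b} and appends the parity of the number of a's, is hypothetical and chosen only to show the final-output mechanism.

```python
# Transitions: (state, symbol) -> (next_state, output_string).
DELTA = {
    ("even", "a"): ("odd", "a"),
    ("odd", "a"): ("even", "a"),
    ("even", "b"): ("even", "b"),
    ("odd", "b"): ("odd", "b"),
}
# Final output emitted after the whole input has been read.
FINAL = {"even": "0", "odd": "1"}

def run_subsequential(word, delta=DELTA, final=FINAL, start="even"):
    state, out = start, []
    for ch in word:
        if (state, ch) not in delta:
            raise ValueError(f"no transition from {state!r} on {ch!r}")
        state, piece = delta[(state, ch)]
        out.append(piece)
    return "".join(out) + final[state]

print(run_subsequential("aba"))  # aba0
print(run_subsequential("ab"))   # ab1
```

Some regular transductions, such as word reversal, cannot be realized by any one-way transducer and need more expressive models such as two-way transducers, which is part of why the theory is harder than for languages.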