766 research outputs found
Linear Bounded Composition of Tree-Walking Tree Transducers: Linear Size Increase and Complexity
Compositions of tree-walking tree transducers form a hierarchy with respect
to the number of transducers in the composition. As main technical result it is
proved that any such composition can be realized as a linear bounded
composition, which means that the sizes of the intermediate results can be
chosen to be at most linear in the size of the output tree. This has
consequences for the expressiveness and complexity of the translations in the
hierarchy. First, if the computed translation is a function of linear size
increase, i.e., the size of the output tree is at most linear in the size of
the input tree, then it can be realized by just one, deterministic,
tree-walking tree transducer. For compositions of deterministic transducers it
is decidable whether or not the translation is of linear size increase. Second,
every composition of deterministic transducers can be computed in deterministic
linear time on a RAM and in deterministic linear space on a Turing machine,
measured in the sum of the sizes of the input and output tree. Similarly, every
composition of nondeterministic transducers can be computed in simultaneous
polynomial time and linear space on a nondeterministic Turing machine. Their
output tree languages are deterministic context-sensitive, i.e., can be
recognized in deterministic linear space on a Turing machine. The membership
problem for compositions of nondeterministic translations is nondeterministic
polynomial time and deterministic linear space. The membership problem for the
composition of a nondeterministic and a deterministic tree-walking tree
translation (for a nondeterministic IO macro tree translation) is log-space
reducible to a context-free language, whereas the membership problem for the
composition of a deterministic and a nondeterministic tree-walking tree
translation (for a nondeterministic OI macro tree translation) is possibly
NP-complete
Streaming Tree Transducers
Theory of tree transducers provides a foundation for understanding
expressiveness and complexity of analysis problems for specification languages
for transforming hierarchically structured data such as XML documents. We
introduce streaming tree transducers as an analyzable, executable, and
expressive model for transforming unranked ordered trees in a single pass.
Given a linear encoding of the input tree, the transducer makes a single
left-to-right pass through the input, and computes the output in linear time
using a finite-state control, a visibly pushdown stack, and a finite number of
variables that store output chunks that can be combined using the operations of
string-concatenation and tree-insertion. We prove that the expressiveness of
the model coincides with transductions definable using monadic second-order
logic (MSO). Existing models of tree transducers either cannot implement all
MSO-definable transformations, or require regular look ahead that prohibits
single-pass implementation. We show a variety of analysis problems such as
type-checking and checking functional equivalence are solvable for our model.Comment: 40 page
Programming Using Automata and Transducers
Automata, the simplest model of computation, have proven to be an effective tool in reasoning about programs that operate over strings. Transducers augment automata to produce outputs and have been used to model string and tree transformations such as natural language translations. The success of these models is primarily due to their closure properties and decidable procedures, but good properties come at the price of limited expressiveness. Concretely, most models only support finite alphabets and can only represent small classes of languages and transformations. We focus on addressing these limitations and bridge the gap between the theory of automata and transducers and complex real-world applications: Can we extend automata and transducer models to operate over structured and infinite alphabets? Can we design languages that hide the complexity of these formalisms? Can we define executable models that can process the input efficiently? First, we introduce succinct models of transducers that can operate over large alphabets and design BEX, a language for analysing string coders. We use BEX to prove the correctness of UTF and BASE64 encoders and decoders. Next, we develop a theory of tree transducers over infinite alphabets and design FAST, a language for analysing tree-manipulating programs. We use FAST to detect vulnerabilities in HTML sanitizers, check whether augmented reality taggers conflict, and optimize and analyze functional programs that operate over lists and trees. Finally, we focus on laying the foundations of stream processing of hierarchical data such as XML files and program traces. We introduce two new efficient and executable models that can process the input in a left-to-right linear pass: symbolic visibly pushdown automata and streaming tree transducers. Symbolic visibly pushdown automata are closed under Boolean operations and can specify and efficiently monitor complex properties for hierarchical structures over infinite alphabets. Streaming tree transducers can express and efficiently process complex XML transformations while enjoying decidable procedures
Multiple Context-Free Tree Grammars: Lexicalization and Characterization
Multiple (simple) context-free tree grammars are investigated, where "simple"
means "linear and nondeleting". Every multiple context-free tree grammar that
is finitely ambiguous can be lexicalized; i.e., it can be transformed into an
equivalent one (generating the same tree language) in which each rule of the
grammar contains a lexical symbol. Due to this transformation, the rank of the
nonterminals increases at most by 1, and the multiplicity (or fan-out) of the
grammar increases at most by the maximal rank of the lexical symbols; in
particular, the multiplicity does not increase when all lexical symbols have
rank 0. Multiple context-free tree grammars have the same tree generating power
as multi-component tree adjoining grammars (provided the latter can use a
root-marker). Moreover, every multi-component tree adjoining grammar that is
finitely ambiguous can be lexicalized. Multiple context-free tree grammars have
the same string generating power as multiple context-free (string) grammars and
polynomial time parsing algorithms. A tree language can be generated by a
multiple context-free tree grammar if and only if it is the image of a regular
tree language under a deterministic finite-copying macro tree transducer.
Multiple context-free tree grammars can be used as a synchronous translation
device.Comment: 78 pages, 13 figure
Survey : Weighted extended top-down tree transducers part I. : basics and expressive power
Weighted extended top-down tree transducers (transducteurs généralisés descendants [Arnold, Dauchet: Bi-transductions de forêts. ICALP'76. Edinburgh University Press, 1976]) received renewed interest in the field of Natural Language Processing, where they are used in syntax-based machine translation. This survey presents the foundations for a theoretical analysis of weighted extended top-down tree transducers. In particular, it discusses essentially complete semirings, which are a novel concept that can be used to lift incomparability results from the unweighted case to the weighted case even in the presence of infinite sums. In addition, several equivalent ways to define weighted extended top-down tree transducers are presented and the individual benefits of each presentation is shown on a small result
Composition closure of linear extended top-down tree transducers
Algorithms and the Foundations of Software technolog
Generalizing input-driven languages: theoretical and practical benefits
Regular languages (RL) are the simplest family in Chomsky's hierarchy. Thanks
to their simplicity they enjoy various nice algebraic and logic properties that
have been successfully exploited in many application fields. Practically all of
their related problems are decidable, so that they support automatic
verification algorithms. Also, they can be recognized in real-time.
Context-free languages (CFL) are another major family well-suited to
formalize programming, natural, and many other classes of languages; their
increased generative power w.r.t. RL, however, causes the loss of several
closure properties and of the decidability of important problems; furthermore
they need complex parsing algorithms. Thus, various subclasses thereof have
been defined with different goals, spanning from efficient, deterministic
parsing to closure properties, logic characterization and automatic
verification techniques.
Among CFL subclasses, so-called structured ones, i.e., those where the
typical tree-structure is visible in the sentences, exhibit many of the
algebraic and logic properties of RL, whereas deterministic CFL have been
thoroughly exploited in compiler construction and other application fields.
After surveying and comparing the main properties of those various language
families, we go back to operator precedence languages (OPL), an old family
through which R. Floyd pioneered deterministic parsing, and we show that they
offer unexpected properties in two fields so far investigated in totally
independent ways: they enable parsing parallelization in a more effective way
than traditional sequential parsers, and exhibit the same algebraic and logic
properties so far obtained only for less expressive language families
Linearly bounded infinite graphs
Linearly bounded Turing machines have been mainly studied as acceptors for
context-sensitive languages. We define a natural class of infinite automata
representing their observable computational behavior, called linearly bounded
graphs. These automata naturally accept the same languages as the linearly
bounded machines defining them. We present some of their structural properties
as well as alternative characterizations in terms of rewriting systems and
context-sensitive transductions. Finally, we compare these graphs to rational
graphs, which are another class of automata accepting the context-sensitive
languages, and prove that in the bounded-degree case, rational graphs are a
strict sub-class of linearly bounded graphs
- …