4,161 research outputs found
Generalizing input-driven languages: theoretical and practical benefits
Regular languages (RL) are the simplest family in Chomsky's hierarchy. Thanks
to their simplicity they enjoy various nice algebraic and logic properties that
have been successfully exploited in many application fields. Practically all of
their related problems are decidable, so that they support automatic
verification algorithms. Also, they can be recognized in real-time.
Context-free languages (CFL) are another major family well-suited to
formalize programming, natural, and many other classes of languages; their
increased generative power w.r.t. RL, however, causes the loss of several
closure properties and of the decidability of important problems; furthermore
they need complex parsing algorithms. Thus, various subclasses thereof have
been defined with different goals, spanning from efficient, deterministic
parsing to closure properties, logic characterization and automatic
verification techniques.
Among CFL subclasses, so-called structured ones, i.e., those where the
typical tree-structure is visible in the sentences, exhibit many of the
algebraic and logic properties of RL, whereas deterministic CFL have been
thoroughly exploited in compiler construction and other application fields.
After surveying and comparing the main properties of those various language
families, we go back to operator precedence languages (OPL), an old family
through which R. Floyd pioneered deterministic parsing, and we show that they
offer unexpected properties in two fields so far investigated in totally
independent ways: they enable parsing parallelization in a more effective way
than traditional sequential parsers, and exhibit the same algebraic and logic
properties so far obtained only for less expressive language families
Recommended from our members
Language acquisition and machine learning
In this paper, we review recent progress in the field of machine learning and examine its implications for computational models of language acquisition. As a framework for understanding this research, we propose four component tasks involved in learning from experience - aggregation, clustering, characterization, and storage. We then consider four common problems studied by machine learning researchers - learning from examples, heuristics learning, conceptual clustering, and learning macro-operators - describing each in terms of our framework. After this, we turn to the problem of grammar acquisition, relating this problem to other learning tasks and reviewing four AI systems that have addressed the problem. Finally, we note some limitations of the earlier work and propose an alternative approach to modeling the mechanisms underlying language acquisition
Macro tree transducers
Macro tree transducers are a combination of top-down tree transducers and macro grammars. They serve as a model for syntax-directed semantics in which context information can be handled. In this paper the formal model of macro tree transducers is studied by investigating typical automata theoretical topics like composition, decomposition, domains, and ranges of the induced translation classes. The extension with regular look-ahead is considered
Linear Bounded Composition of Tree-Walking Tree Transducers: Linear Size Increase and Complexity
Compositions of tree-walking tree transducers form a hierarchy with respect
to the number of transducers in the composition. As main technical result it is
proved that any such composition can be realized as a linear bounded
composition, which means that the sizes of the intermediate results can be
chosen to be at most linear in the size of the output tree. This has
consequences for the expressiveness and complexity of the translations in the
hierarchy. First, if the computed translation is a function of linear size
increase, i.e., the size of the output tree is at most linear in the size of
the input tree, then it can be realized by just one, deterministic,
tree-walking tree transducer. For compositions of deterministic transducers it
is decidable whether or not the translation is of linear size increase. Second,
every composition of deterministic transducers can be computed in deterministic
linear time on a RAM and in deterministic linear space on a Turing machine,
measured in the sum of the sizes of the input and output tree. Similarly, every
composition of nondeterministic transducers can be computed in simultaneous
polynomial time and linear space on a nondeterministic Turing machine. Their
output tree languages are deterministic context-sensitive, i.e., can be
recognized in deterministic linear space on a Turing machine. The membership
problem for compositions of nondeterministic translations is nondeterministic
polynomial time and deterministic linear space. The membership problem for the
composition of a nondeterministic and a deterministic tree-walking tree
translation (for a nondeterministic IO macro tree translation) is log-space
reducible to a context-free language, whereas the membership problem for the
composition of a deterministic and a nondeterministic tree-walking tree
translation (for a nondeterministic OI macro tree translation) is possibly
NP-complete
In the Maze of Data Languages
In data languages the positions of strings and trees carry a label from a
finite alphabet and a data value from an infinite alphabet. Extensions of
automata and logics over finite alphabets have been defined to recognize data
languages, both in the string and tree cases. In this paper we describe and
compare the complexity and expressiveness of such models to understand which
ones are better candidates as regular models
- …