On the Complexity of Free Word Orders
We propose some extensions of mildly context-sensitive formalisms whose aim is to model free word orders in natural languages. We give a detailed analysis of the complexity of the formalisms we propose.
Multiple Context-Free Tree Grammars: Lexicalization and Characterization
Multiple (simple) context-free tree grammars are investigated, where "simple"
means "linear and nondeleting". Every multiple context-free tree grammar that
is finitely ambiguous can be lexicalized; i.e., it can be transformed into an
equivalent one (generating the same tree language) in which each rule of the
grammar contains a lexical symbol. Due to this transformation, the rank of the
nonterminals increases at most by 1, and the multiplicity (or fan-out) of the
grammar increases at most by the maximal rank of the lexical symbols; in
particular, the multiplicity does not increase when all lexical symbols have
rank 0. Multiple context-free tree grammars have the same tree generating power
as multi-component tree adjoining grammars (provided the latter can use a
root-marker). Moreover, every multi-component tree adjoining grammar that is
finitely ambiguous can be lexicalized. Multiple context-free tree grammars have
the same string generating power as multiple context-free (string) grammars,
and they admit polynomial-time parsing algorithms. A tree language can be generated by a
multiple context-free tree grammar if and only if it is the image of a regular
tree language under a deterministic finite-copying macro tree transducer.
Multiple context-free tree grammars can be used as a synchronous translation
device.
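As background for the string-generating power mentioned above, the sketch below enumerates derivations of a small fan-out-2 multiple context-free (string) grammar for the non-context-free language a^n b^n c^n d^n. The grammar and the function names are illustrative, not taken from the paper.

```python
def derive_A(n):
    """Tuple generated by the fan-out-2 nonterminal A after n rule applications.
    Illustrative 2-MCFG rules: A(eps, eps); A(a x b, c y d) <- A(x, y)."""
    x, y = "", ""
    for _ in range(n):
        x, y = "a" + x + "b", "c" + y + "d"
    return x, y

def derive_S(n):
    # S(x y) <- A(x, y): the start symbol concatenates A's two components
    x, y = derive_A(n)
    return x + y

print([derive_S(n) for n in range(3)])  # ['', 'abcd', 'aabbccdd']
```

Because the two components of A grow in lockstep, the concatenated string always has matching counts of a, b, c, and d, which no context-free grammar can enforce.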
Children as Models for Computers: Natural Language Acquisition for Machine Learning
This paper focuses on a subfield of machine learning, so-called grammatical inference. Roughly speaking, grammatical inference deals with the problem of inferring a grammar that generates a given set of sample sentences, in some manner that is supposed to be realized by some inference algorithm. We discuss how the analysis and formalization of the main features of the process of human natural language acquisition may improve results in the area of grammatical inference.
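The core loop of grammatical inference can be sketched as Gold-style learning by enumeration: after each new positive example, the learner outputs the first hypothesis in a fixed enumeration that is consistent with all data seen so far. The tiny hypothesis class below is our own illustration, not from the paper.

```python
# A fixed enumeration of hypotheses; each is a (name, language) pair.
# Finite languages keep the consistency check trivial in this sketch.
HYPOTHESES = [
    ("L1", {"a"}),
    ("L2", {"a", "ab"}),
    ("L3", {"a", "ab", "abb"}),
]

def learner(sample):
    """Return the first enumerated hypothesis consistent with the sample."""
    for name, lang in HYPOTHESES:
        if sample <= lang:          # every observed string is in the language
            return name
    return None

text = ["a", "ab", "abb"]           # a presentation (positive-only text) of L3
guesses = [learner(set(text[:i + 1])) for i in range(len(text))]
print(guesses)  # ['L1', 'L2', 'L3'] -- converges to L3 and stays there
```

Identification in the limit requires exactly this convergence behaviour: on any presentation of a target language, the sequence of guesses eventually stabilizes on a correct hypothesis.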
Learning categorial grammars
In 1967 E. M. Gold published a paper in which the language classes from the Chomsky-hierarchy were analyzed in terms of learnability, in the technical sense of identification in the limit. His results were mostly negative, and perhaps because of this his work had little impact on linguistics.
In the early eighties there was renewed interest in the paradigm, mainly because of work by Angluin and Wright. Around the same time, Arikawa and his co-workers refined the paradigm by applying it to so-called Elementary Formal Systems. By making use of this approach, Takeshi Shinohara was able to come up with an impressive result: any class of context-sensitive grammars with a bound on its number of rules is learnable.
Some linguistically motivated work on learnability also appeared from this point on, most notably Wexler & Culicover 1980 and Kanazawa 1994. The latter investigates the learnability of various classes of categorial grammar, inspired by work by Buszkowski and Penn, and raises some interesting questions.
We follow up on this work by exploring complexity issues relevant to learning these classes, answering an open question from Kanazawa 1994, and applying the same kind of approach to obtain (non)learnable classes of Combinatory Categorial Grammars, Tree Adjoining Grammars, Minimalist grammars, Generalized Quantifiers, and some variants of Lambek Grammars. We also discuss work on learning tree languages and its application to learning Dependency Grammars.
Our main conclusions are:
- formal learning theory is relevant to linguistics,
- identification in the limit is feasible for non-trivial classes,
- the `Shinohara approach' (i.e., placing a numerical bound on the complexity of a grammar) can lead to a learnable class, but this completely depends on the specific nature of the formalism and the notion of complexity. We give examples of natural classes of commonly used linguistic formalisms that resist this kind of approach,
- learning is hard work. Our results indicate that learning even `simple' classes of languages requires a lot of computational effort,
- dealing with structure (derivation-, dependency-) languages instead of string languages offers a useful and promising approach to learnability in a linguistic context.
Lexicalized non-local MCTAG with dominance links is NP-complete
An NP-hardness proof for non-local Multicomponent Tree Adjoining Grammar
(MCTAG) by Rambow and Satta (1st International Workshop on Tree
Adjoining Grammars 1992), based on Dahlhaus and Warmuth (in J Comput
Syst Sci 33:456-472, 1986), is extended to some linguistically
relevant restrictions of that formalism. It is found that there are
NP-hard grammars among non-local MCTAGs even if any or all of the
following restrictions are imposed: (i) lexicalization: every tree in
the grammar contains a terminal; (ii) dominance links: every tree set
contains at most two trees, and in every such tree set, there is a link
between the foot node of one tree and the root node of the other tree,
indicating that the former node must dominate the latter in the derived
tree. This is the version of MCTAG proposed in Becker et al.
(Proceedings of the 5th conference of the European chapter of the
Association for Computational Linguistics 1991) to account for German
long-distance scrambling. This result restricts the field of possible
candidates for an extension of Tree Adjoining Grammar that would be both
mildly context-sensitive and linguistically adequate
The Computational Analysis of the Syntax and Interpretation of Free Word Order in Turkish
In this dissertation, I examine a language with "free" word order, specifically Turkish, in order to develop a formalism that can capture the syntax and the context-dependent interpretation of "free" word order within a computational framework. In "free" word order languages, word order is used to convey distinctions in meaning that are not captured by traditional truth-conditional semantics. The word order indicates the "information structure", e.g. what is the "topic" and the "focus" of the sentence. The context-appropriate use of "free" word order is of considerable importance in developing practical applications in natural language interpretation, generation, and machine translation.
I develop a formalism called Multiset-CCG, an extension of Combinatory Categorial Grammars, CCGs, (Ades/Steedman 1982, Steedman 1985), and demonstrate its advantages in an implementation of a data-base query system that interprets Turkish questions and generates answers with contextually appropriate word orders. Multiset-CCG is a context-sensitive and polynomially parsable grammar that captures the formal and descriptive properties of "free" word order and restrictions on word order in simple and complex sentences (with discontinuous constituents and long distance dependencies). Multiset-CCG captures the context-dependent meaning of word order in Turkish by compositionally deriving the predicate-argument structure and the information structure of a sentence in parallel. The advantages of using such a formalism are that it is computationally attractive and that it provides a compositional and flexible surface structure that allows syntactic constituents to correspond to information structure constituents. A formalism that integrates information structure and syntax such as Multiset-CCG is essential to the computational tasks of interpreting and generating sentences with contextually appropriate word orders in "free" word order languages.
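The multiset idea behind such a formalism can be sketched as follows: a verb's category records its arguments as a multiset rather than an ordered list, so functional application may consume them in any surface order. The data structures and names below are our own illustration under that assumption, not the dissertation's actual definitions.

```python
from collections import Counter

def apply_arg(functor, arg_cat):
    """Functional application: consume one argument from the multiset.
    A functor is (result_category, multiset_of_remaining_arguments)."""
    result_cat, args = functor
    if args[arg_cat] == 0:
        raise ValueError(f"category cannot apply to {arg_cat}")
    remaining = args.copy()
    remaining[arg_cat] -= 1
    return (result_cat, remaining)

# A transitive verb seeking a nominative and an accusative NP, in any order.
verb = ("S", Counter({"NP_nom": 1, "NP_acc": 1}))

# SOV order: nominative argument first, then accusative.
step = apply_arg(verb, "NP_nom")
step = apply_arg(step, "NP_acc")
assert step[0] == "S" and not +step[1]   # all arguments consumed

# OSV (scrambled) order succeeds with the same lexical category.
step = apply_arg(verb, "NP_acc")
step = apply_arg(step, "NP_nom")
print(step[0])  # S
```

The point of the sketch is that one lexical entry licenses every permutation of its arguments, which is how a multiset-valued category models "free" word order without multiplying lexical entries.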