
    Mild context-sensitivity and tuple-based generalizations of context-free grammar

    This paper classifies a family of grammar formalisms that extend context-free grammar by operating on tuples of terminal strings, rather than combining single terminal strings into larger single phrases. These include a number of well-known formalisms, such as head grammar and linear context-free rewriting systems, but also a new formalism, (simple) literal movement grammar, which strictly extends the previously known formalisms while preserving polynomial-time recognizability. The descriptive capacity of simple literal movement grammars is illustrated both formally, through a weak generative capacity argument, and in a more practical sense, by a description of conjunctive cross-serial relative clauses in Dutch. After sketching a complexity result and drawing a number of conclusions from the illustrations, it is suggested that the notion of mild context-sensitivity currently in use, which depends on the rather loosely defined concept of constant growth, needs modification to apply sensibly to the illustrated facts; an attempt at such a revision is proposed.
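
    As a concrete illustration of the tuple mechanism the paper classifies, the following minimal Python sketch derives the non-context-free language {a^n b^n c^n} with an MCFG-style rule that rewrites a triple of strings in lockstep. The grammar is the standard textbook example and the function names are illustrative; neither is taken from the paper.

    def derive(n):
        """Derive the tuple held by nonterminal A after n rule applications.

        A(eps, eps, eps)                    (base rule)
        A(a.x, b.y, c.z)  <-  A(x, y, z)    (recursive rule on a string *triple*)
        """
        x, y, z = "", "", ""
        for _ in range(n):
            x, y, z = "a" + x, "b" + y, "c" + z  # rewrite all three components at once
        return (x, y, z)

    def start(n):
        """S(x.y.z) <- A(x, y, z): concatenate the tuple into one terminal string."""
        return "".join(derive(n))

    if __name__ == "__main__":
        for n in range(4):
            print(derive(n), "->", start(n))  # e.g. ('aa', 'bb', 'cc') -> 'aabbcc'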

    The Computational Analysis of the Syntax and Interpretation of Free Word Order in Turkish

    In this dissertation, I examine a language with “free” word order, specifically Turkish, in order to develop a formalism that can capture the syntax and the context-dependent interpretation of “free” word order within a computational framework. In “free” word order languages, word order is used to convey distinctions in meaning that are not captured by traditional truth-conditional semantics. Word order indicates the “information structure”, e.g., what the “topic” and the “focus” of the sentence are. The context-appropriate use of “free” word order is of considerable importance in developing practical applications in natural language interpretation, generation, and machine translation. I develop a formalism called Multiset-CCG, an extension of Combinatory Categorial Grammars (CCGs; Ades/Steedman 1982, Steedman 1985), and demonstrate its advantages in an implementation of a database query system that interprets Turkish questions and generates answers with contextually appropriate word orders. Multiset-CCG is a context-sensitive and polynomially parsable grammar that captures the formal and descriptive properties of “free” word order and restrictions on word order in simple and complex sentences (with discontinuous constituents and long-distance dependencies). Multiset-CCG captures the context-dependent meaning of word order in Turkish by compositionally deriving the predicate-argument structure and the information structure of a sentence in parallel. The advantages of using such a formalism are that it is computationally attractive and that it provides a compositional and flexible surface structure that allows syntactic constituents to correspond to information structure constituents. A formalism that integrates information structure and syntax, such as Multiset-CCG, is essential to the computational tasks of interpreting and generating sentences with contextually appropriate word orders in “free” word order languages.
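
    To make the multiset idea concrete, here is a minimal sketch, not the dissertation's actual formalism, in which a verb's category records its expected arguments as an order-insensitive multiset, so one lexical entry licenses several surface orders. The category names and the reduction rule are illustrative assumptions.

    from collections import Counter

    class Category:
        def __init__(self, result, args):
            self.result = result       # e.g. "S"
            self.args = Counter(args)  # multiset of expected arguments

        def apply(self, arg):
            """Consume one argument from the multiset, in any surface position."""
            if self.args[arg] == 0:
                raise ValueError(f"{arg} is not an expected argument")
            rest = self.args - Counter([arg])
            return self.result if not rest else Category(self.result, rest)

    # Turkish-style free word order: the same verb entry combines with its
    # nominative and accusative arguments in whichever order they appear.
    verb = Category("S", ["NP-nom", "NP-acc"])
    print(verb.apply("NP-acc").apply("NP-nom"))  # -> S  (object-before-subject order)
    print(verb.apply("NP-nom").apply("NP-acc"))  # -> S  (subject-before-object order)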

    Constraint-based computational semantics : a comparison between LTAG and LRS

    This paper compares two approaches to computational semantics: semantic unification in Lexicalized Tree Adjoining Grammars (LTAG) and Lexical Resource Semantics (LRS) in HPSG. There are striking similarities between the frameworks that make them comparable in many respects. We exemplify the differences and similarities by looking at several phenomena. We show, first, that many intuitions about the mechanisms of semantic computation can be implemented in similar ways in both frameworks. Second, we identify some aspects in which the frameworks intrinsically differ, due to more general differences between the approaches to formal grammar adopted by LTAG and HPSG.
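
    For readers unfamiliar with the shared machinery, the following minimal sketch shows the kind of feature-structure unification that both frameworks build on. The dict encoding and the example features are illustrative assumptions, not taken from the paper.

    def unify(a, b):
        """Unify two feature structures encoded as nested dicts; None on clash."""
        if not isinstance(a, dict) or not isinstance(b, dict):
            return a if a == b else None  # atomic values must match exactly
        out = dict(a)
        for feat, val in b.items():
            if feat in out:
                merged = unify(out[feat], val)
                if merged is None:
                    return None           # feature clash: unification fails
                out[feat] = merged
            else:
                out[feat] = val           # new feature: simply added
        return out

    # Combining partial semantic constraints contributed by two grammar entries:
    print(unify({"pred": "sleep", "arg1": {"index": "x"}},
                {"arg1": {"index": "x", "num": "sg"}}))
    # -> {'pred': 'sleep', 'arg1': {'index': 'x', 'num': 'sg'}}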

    A Case Study of the Convergence of Mildly Context-Sensitive Formalisms for Natural Language Syntax: from Minimalist Grammars to Multiple Context-Free Grammars

    Submitted as an INRIA Futurs research report (SIGNES project). The present work is set in the field of natural language syntactic parsing. We present the concept of "mildly context-sensitive" grammar formalisms, which are full-fledged and efficient for syntactic parsing. We summarize a number of these formalisms' definitions, together with the relations between them and, most importantly, a survey of known equivalences. The conversion of Edward Stabler's Minimalist Grammars into Multiple Context-Free Grammars (MCFG) is presented in particular detail, along with a study of the complexity of this procedure and of its implications for parsing. This report is an adaptation of the French Master's thesis that bears the same name, from Bordeaux 1 University, June 2006.
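
    A heavily simplified sketch of the intuition behind the conversion: a partially derived Minimalist Grammar expression is encoded as a tuple of string components, and merge and move become MCFG-style operations on those tuples. The representation below is an illustrative stand-in, not the report's actual construction.

    def merge(head, comp):
        """Merge: concatenate main components; pending movers are carried along.

        An expression is (main_string, [pending_mover_strings]); in the real
        conversion each mover occupies its own slot of an MCFG string tuple.
        """
        h, h_movers = head
        c, c_movers = comp
        return (" ".join(s for s in (h, c) if s), h_movers + c_movers)

    def move(expr):
        """Move: front one pending component, i.e. an MCFG rule reordering a tuple."""
        main, movers = expr
        mover, rest = movers[0], movers[1:]
        return (mover + " " + main, rest)

    # 'which book' enters the derivation low as a mover, then is fronted by move:
    vp = merge(("read", []), ("", ["which book"]))
    print(move(vp))  # -> ('which book read', [])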

    A Theory of Emergent In-Context Learning as Implicit Structure Induction

    Full text link
    Scaling large language models (LLMs) leads to an emergent capacity to learn in-context from example demonstrations. Despite progress, theoretical understanding of this phenomenon remains limited. We argue that in-context learning relies on recombination of compositional operations found in natural language data. We derive an information-theoretic bound showing how in-context learning abilities arise from generic next-token prediction when the pretraining distribution has sufficient amounts of compositional structure, under linguistically motivated assumptions. A second bound provides a theoretical justification for the empirical success of prompting LLMs to output intermediate steps towards an answer. To validate theoretical predictions, we introduce a controlled setup for inducing in-context learning; unlike previous approaches, it accounts for the compositional nature of language. Trained transformers can perform in-context learning for a range of tasks, in a manner consistent with the theoretical results. Mirroring real-world LLMs in a miniature setup, in-context learning emerges when scaling parameters and data, and models perform better when prompted to output intermediate steps. Probing shows that in-context learning is supported by a representation of the input's compositional structure. Taken together, these results provide a step towards a theoretical understanding of emergent behavior in large language models.
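
    To illustrate what a compositionally structured prompting distribution of this kind might look like, here is a hypothetical toy generator in the spirit of the controlled setup the abstract describes: tasks are compositions of primitive token-level operations, and a prompt is a set of input/output demonstrations of one composition. The primitives and prompt format are assumptions for illustration, not the paper's benchmark.

    import random

    PRIMITIVES = {
        "rev":  lambda s: s[::-1],  # reverse the token list
        "dup":  lambda s: s + s,    # duplicate it
        "drop": lambda s: s[1:],    # drop the first token
    }

    def compose(names):
        """Compose primitive operations left-to-right into one task function."""
        def task(s):
            for n in names:
                s = PRIMITIVES[n](s)
            return s
        return task

    def make_prompt(names, n_demos=3, vocab="abcde"):
        """Format demonstrations of one composed task as a next-token sequence."""
        task, lines = compose(names), []
        for _ in range(n_demos):
            x = random.choices(vocab, k=4)
            lines.append(f"{' '.join(x)} -> {' '.join(task(x))}")
        return "\n".join(lines)

    print(make_prompt(["rev", "drop"]))  # e.g. 'c a b e -> b a c' on each line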

    Incremental syntax generation with tree adjoining grammars

    With the increasing capability of AI systems, the design of human-computer interfaces has become a favorite research topic in AI. In this paper we focus on aspects of the output of a computer. The architecture of a sentence generation component, embedded in the WIP system, is described. The main emphasis is on the motivation for the incremental style of processing and on the encoding of adequate linguistic units as rules of a Lexicalized Tree Adjoining Grammar with Unification.
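
    The two TAG operations that make this incremental style possible can be sketched in a few lines: substitution plugs an initial tree into a leaf, and adjunction splices an auxiliary tree into an internal node, so material can still be added to an already-built sentence. The tree encoding and the example trees below are illustrative, not the WIP system's grammar.

    # Trees are (label, children); a leaf with children == None is a substitution site.

    def substitute(tree, label, initial):
        """Replace the substitution leaf with the given label by an initial tree."""
        name, kids = tree
        if kids is None:
            return initial if name == label else tree
        return (name, [substitute(k, label, initial) for k in kids])

    def adjoin(tree, label, aux):
        """Splice an auxiliary tree (with a ('FOOT', None) leaf) in at a matching node."""
        name, kids = tree
        if name == label and kids is not None:
            aux_name, aux_kids = aux
            return (aux_name, [(name, kids) if k == ("FOOT", None) else k
                               for k in aux_kids])
        if kids is None:
            return tree
        return (name, [adjoin(k, label, aux) for k in kids])

    def words(tree):
        """Read off the terminal yield of a tree."""
        name, kids = tree
        return name if not kids else " ".join(words(k) for k in kids)

    s = ("S", [("NP", None), ("VP", [("V", [("sleeps", [])])])])
    s = substitute(s, "NP", ("NP", [("Peter", [])]))
    s = adjoin(s, "VP", ("VP", [("Adv", [("soundly", [])]), ("FOOT", None)]))
    print(words(s))  # incremental extension: 'Peter sleeps' -> 'Peter soundly sleeps'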