437 research outputs found

Mild context-sensitivity and tuple-based generalizations of context-free grammar

Author: Groenink A.V. (Annius)
Publication venue: CWI
Publication date: 01/01/1996

This paper classifies a family of grammar formalisms that extend context-free grammar by talking about tuples of terminal strings, rather than independently combining single terminal words into larger single phrases. These include a number of well-known formalisms, such as head grammar and linear context-free rewriting systems, but also a new formalism, (simple) literal movement grammar, which strictly extends the previously known formalisms, while preserving polynomial time recognizability. The descriptive capacity of simple literal movement grammars is illustrated both formally through a weak generative capacity argument and in a more practical sense by the description of conjunctive cross-serial relative clauses in Dutch. After sketching a complexity result and drawing a number of conclusions from the illustrations, it is then suggested that the notion of mild context-sensitivity currently in use, which depends on the rather loosely defined concept of constant growth, needs a modification to apply sensibly to the illustrated facts; an attempt at such a revision is proposed.

CWI's Institutional Repository

The Computational Analysis of the Syntax and Interpretation of Free Word Order in Turkish

Author: Hoffman Beryl
Publication venue: ScholarlyCommons
Publication date: 01/01/1995

In this dissertation, I examine a language with “free” word order, specifically Turkish, in order to develop a formalism that can capture the syntax and the context-dependent interpretation of “free” word order within a computational framework. In “free” word order languages, word order is used to convey distinctions in meaning that are not captured by traditional truth-conditional semantics. The word order indicates the “information structure”, e.g. what is the “topic” and the “focus” of the sentence. The context-appropriate use of “free” word order is of considerable importance in developing practical applications in natural language interpretation, generation, and machine translation. I develop a formalism called Multiset-CCG, an extension of Combinatory Categorial Grammars (CCGs; Ades/Steedman 1982, Steedman 1985), and demonstrate its advantages in an implementation of a database query system that interprets Turkish questions and generates answers with contextually appropriate word orders. Multiset-CCG is a context-sensitive and polynomially parsable grammar that captures the formal and descriptive properties of “free” word order and restrictions on word order in simple and complex sentences (with discontinuous constituents and long distance dependencies). Multiset-CCG captures the context-dependent meaning of word order in Turkish by compositionally deriving the predicate-argument structure and the information structure of a sentence in parallel. The advantages of using such a formalism are that it is computationally attractive and that it provides a compositional and flexible surface structure that allows syntactic constituents to correspond to information structure constituents. A formalism that integrates information structure and syntax such as Multiset-CCG is essential to the computational tasks of interpreting and generating sentences with contextually appropriate word orders in “free” word order languages.

ScholarlyCommons@Penn

Constraint-based computational semantics: a comparison between LTAG and LRS

Authors: Kallmeyer Laura; Richter Frank
Publication date: 01/01/2006

This paper compares two approaches to computational semantics, namely semantic unification in Lexicalized Tree Adjoining Grammars (LTAG) and Lexical Resource Semantics (LRS) in HPSG. There are striking similarities between the frameworks that make them comparable in many respects. We will exemplify the differences and similarities by looking at several phenomena. We will show, first of all, that many intuitions about the mechanisms of semantic computations can be implemented in similar ways in both frameworks. Secondly, we will identify some aspects in which the frameworks intrinsically differ due to more general differences between the approaches to formal grammar adopted by LTAG and HPSG.

Hochschulschriftenserver - Universität Frankfurt am Main

A Case Study of the Convergence of Mildly Context-Sensitive Formalisms for Natural Language Syntax: from Minimalist Grammars to Multiple Context-Free Grammars

Authors: Amblard Maxime; Durand Irène; Mery Bruno; Retoré Christian
Publication venue: HAL CCSD
Publication date: 01/01/2006

Submitted as an INRIA Futurs research report (Projet SIGNES). The present work is set in the field of natural language syntactic parsing. We present the concept of "mildly context-sensitive" grammar formalisms, which are full-fledged and efficient for syntactic parsing. We summarize the definitions of a number of these formalisms, together with the relations between them and, most importantly, a survey of known equivalences. The conversion of Edward Stabler's Minimalist Grammars into Multiple Context-Free Grammars (MCFG) is presented in particular detail, along with a study of the complexity of this procedure and of its implications for parsing. This report is an adaptation of the French Master's thesis that bears the same name, from Bordeaux 1 University, June 2006.

CiteSeerX

INRIA a CCSD electronic archive server

A Theory of Emergent In-Context Learning as Implicit Structure Induction

Authors: Goyal Navin; Hahn Michael
Publication date: 14/03/2023

Scaling large language models (LLMs) leads to an emergent capacity to learn in-context from example demonstrations. Despite progress, theoretical understanding of this phenomenon remains limited. We argue that in-context learning relies on recombination of compositional operations found in natural language data. We derive an information-theoretic bound showing how in-context learning abilities arise from generic next-token prediction when the pretraining distribution has sufficient amounts of compositional structure, under linguistically motivated assumptions. A second bound provides a theoretical justification for the empirical success of prompting LLMs to output intermediate steps towards an answer. To validate theoretical predictions, we introduce a controlled setup for inducing in-context learning; unlike previous approaches, it accounts for the compositional nature of language. Trained transformers can perform in-context learning for a range of tasks, in a manner consistent with the theoretical results. Mirroring real-world LLMs in a miniature setup, in-context learning emerges when scaling parameters and data, and models perform better when prompted to output intermediate steps. Probing shows that in-context learning is supported by a representation of the input's compositional structure. Taken together, these results provide a step towards theoretical understanding of emergent behavior in large language models.

arXiv.org e-Print Archive

Incremental syntax generation with tree adjoining grammars

Authors: Finkler Wolfgang; Harbusch Karin; Schauder Anne
Publication venue: Sonstige Einrichtungen. DFKI Deutsches Forschungszentrum für Künstliche Intelligenz
Publication date: 01/01/1991

With the increasing capacity of AI systems, the design of human-computer interfaces has become a favorite research topic in AI. In this paper we focus on aspects of the output of a computer. The architecture of a sentence generation component, embedded in the WIP system, is described. The main emphasis is laid on the motivation for the incremental style of processing and the encoding of adequate linguistic units as rules of a Lexicalized Tree Adjoining Grammar with Unification.

Universaar