161 research outputs found

    Generation and synchronous tree-adjoining grammars

    Get PDF
    Tree-adjoining grammars (TAG) have been proposed as a formalism for generation based on the intuition that the extended domain of syntactic locality that TAGs provide should aid in localizing semantic dependencies as well, in turn serving as an aid to generation from semantic representations. We demonstrate that this intuition can be made concrete by using the formalism of synchronous tree-adjoining grammars. The use of synchronous TAGs for generation provides solutions to several problems with previous approaches to TAG generation. Furthermore, the semantic monotonicity requirement previously advocated for generation grammars as a computational aid is seen to be an inherent property of synchronous TAGs.Engineering and Applied Science

    Anaphora and Discourse Structure

    Full text link
    We argue in this paper that many common adverbial phrases generally taken to signal a discourse relation between syntactically connected units within discourse structure, instead work anaphorically to contribute relational meaning, with only indirect dependence on discourse structure. This allows a simpler discourse structure to provide scaffolding for compositional semantics, and reveals multiple ways in which the relational meaning conveyed by adverbial connectives can interact with that associated with discourse structure. We conclude by sketching out a lexicalised grammar for discourse that facilitates discourse interpretation as a product of compositional rules, anaphor resolution and inference.Comment: 45 pages, 17 figures. Revised resubmission to Computational Linguistic

    Introduction to the Special Issue on Natural Language and Learning Machines

    Get PDF
    1. Introduction The interaction between machine learning and natural language processing (NLP) research underlies most of the progress made in NLP for the last few decades (Cardie and Mooney 1999; Fung and Roth 2005). Machine Learning has been the common framework for the birth and development of most paradigms, discoveries and achievements in statistical natural language processing. At the international level the AAAI Fall symposiums in 1990 (Jacobs 1990) and 1992 (Goldman 1992) and the IBM ..

    GenLeNa: Sistema para la construcción de Aplicaciones de Generación de Lenguaje Natural

    Get PDF
    In this article the proposal is made for the division of the process of construction of natural language generation (NLG) systems into two stages: content planning (CP), which is dependent on the mastery of the application to be developed, and document structuring (DS). This division allows people who are not expert in NLG to develop natural language generation systems, concentrating on building abstract representations of the information to be communicated (called messages). Specific architecture for the DS stage is also presented. This enables NLG researchers to work ortogonally on specific techniques and methodologies for the conversion of messages into text which is grammatically and syntactically correct.En este artículo se propone la división del proceso de construcción de sistemas de Generación de Lenguajes Natural (GLN) en dos etapas: planificación del contenido (EPC), que es dependiente del dominio de la aplicación a desarrollar, y estructuración del documento (EED). Esta división permite que personas no expertas en GLN puedan desarrollar sistemas de generación de lenguajes natural enfocándose en construir representaciones abstractas de la información que se desea comunicar (denominadas mensajes). Adicionalmente se presenta una arquitectura específica para la etapa EED que permite a investigadores en GLN trabajar ortogonalmente en técnicas y metodologías específicas para la transformación de los mensajes en texto gramatical y sintácticamente correcto

    XMG : eXtensible MetaGrammar

    Get PDF
    International audienceIn this article, we introduce eXtensible MetaGrammar (xmg), a framework for specifying tree-based grammars such as Feature-Based Lexicalised Tree-Adjoining Grammars (FB-LTAG) and Interaction Grammars (IG). We argue that xmg displays three features which facilitate both grammar writing and a fast prototyping of tree-based grammars. Firstly, \xmg\ is fully declarative. For instance, it permits a declarative treatment of diathesis that markedly departs from the procedural lexical rules often used to specify tree-based grammars. Secondly, the \xmg\ language has a high notational expressivity in that it supports multiple linguistic dimensions, inheritance and a sophisticated treatment of identifiers. Thirdly, xmg is extensible in that its computational architecture facilitates the extension to other linguistic formalisms. We explain how this architecture naturally supports the design of three linguistic formalisms namely, FB-LTAG, IG, and Multi-Component Tree-Adjoining Grammar (MC-TAG). We further show how it permits a straightforward integration of additional mechanisms such as linguistic and formal principles. To further illustrate the declarativity, notational expressivity and extensibility of \xmg , we describe the methodology used to specify an FB-LTAG for French augmented with a unification-based compositional semantics. This illustrates both how xmg facilitates the modelling of the tree fragment hierarchies required to specify tree-based grammars and of a syntax/semantics interface between semantic representations and syntactic trees. Finally, we briefly report on several grammars for French, English and German that were implemented using \xmg\ and compare \xmg\ to other existing grammar specification frameworks for tree-based grammars

    Resource Generation from Structured Documents for Low-density Languages

    Get PDF
    The availability and use of electronic resources for both manual and automated language related processing has increased tremendously in recent years. Nevertheless, many resources still exist only in printed form, restricting their availability and use. This especially holds true in low density languages or languages with limited electronic resources. For these documents, automated conversion into electronic resources is highly desirable. This thesis focuses on the semi-automated conversion of printed structured documents (dictionaries in particular) to usable electronic representations. In the first part we present an entry tagging system that recognizes, parses, and tags the entries of a printed dictionary to reproduce the representation. The system uses the consistent layout and structure of the dictionaries, and the features that impose this structure, to capture and recover lexicographic information. We accomplish this by adapting two methods: rule-based and HMM-based. The system is designed to produce results quickly with minimal human assistance and reasonable accuracy. The use of an adaptive transformation-based learning as a post-processor at two points in the system yields significant improvements, even with an extremely small amount of user provided training data. The second part of this thesis presents Morphology Induction from Noisy Data (MIND), a natural language morphology discovery framework that operates on information from limited, noisy data obtained from the conversion process. To use the resulting resources effectively, however, users must be able to search for them using the root form of morphologically deformed variant found in the text. Stemming and data driven methods are not suitable when data are sparse. The approach is based on the novel application of string searching algorithms. The evaluations show that MIND can segment words into roots and affixes from the noisy, limited data contained in a dictionary, and it can extract prefixes, suffixes, circumfixes, and infixes. MIND can also identify morphophonemic changes, i.e., phonemic variations between allomorphs of a morpheme, specifically point-of-affixation stem changes. This, in turn, allows non-native speakers to perform multilingual tasks for applications where response must be rapid, and they have limited knowledge. In addition, this analysis can feed other natural language processing tools requiring lexicons

    Research in the Language, Information and Computation Laboratory of the University of Pennsylvania

    Get PDF
    This report takes its name from the Computational Linguistics Feedback Forum (CLiFF), an informal discussion group for students and faculty. However the scope of the research covered in this report is broader than the title might suggest; this is the yearly report of the LINC Lab, the Language, Information and Computation Laboratory of the University of Pennsylvania. It may at first be hard to see the threads that bind together the work presented here, work by faculty, graduate students and postdocs in the Computer Science and Linguistics Departments, and the Institute for Research in Cognitive Science. It includes prototypical Natural Language fields such as: Combinatorial Categorial Grammars, Tree Adjoining Grammars, syntactic parsing and the syntax-semantics interface; but it extends to statistical methods, plan inference, instruction understanding, intonation, causal reasoning, free word order languages, geometric reasoning, medical informatics, connectionism, and language acquisition. Naturally, this introduction cannot spell out all the connections between these abstracts; we invite you to explore them on your own. In fact, with this issue it’s easier than ever to do so: this document is accessible on the “information superhighway”. Just call up http://www.cis.upenn.edu/~cliff-group/94/cliffnotes.html In addition, you can find many of the papers referenced in the CLiFF Notes on the net. Most can be obtained by following links from the authors’ abstracts in the web version of this report. The abstracts describe the researchers’ many areas of investigation, explain their shared concerns, and present some interesting work in Cognitive Science. We hope its new online format makes the CLiFF Notes a more useful and interesting guide to Computational Linguistics activity at Penn
    corecore