43 research outputs found

    Complexity of Lexical Descriptions and its Relevance to Partial Parsing

    In this dissertation, we have proposed novel methods for robust parsing that integrate the flexibility of linguistically motivated lexical descriptions with the robustness of statistical techniques. Our thesis is that the computation of linguistic structure can be localized if lexical items are associated with rich descriptions (supertags) that impose complex constraints in a local context. However, increasing the complexity of descriptions makes the number of different descriptions for each lexical item much larger and hence increases the local ambiguity for a parser. This local ambiguity can be resolved by using supertag co-occurrence statistics collected from parsed corpora. We have explored these ideas in the context of the Lexicalized Tree-Adjoining Grammar (LTAG) framework, wherein supertag disambiguation provides a representation that is an almost parse. We have used the disambiguated supertag sequence in conjunction with a lightweight dependency analyzer to compute noun groups, verb groups, dependency linkages and even partial parses. We have shown that a trigram-based supertagger achieves an accuracy of 92.1% on Wall Street Journal (WSJ) texts. Furthermore, we have shown that the lightweight dependency analysis on the output of the supertagger identifies 83% of the dependency links accurately. We have exploited the representation of supertags with Explanation-Based Learning to improve parsing efficiency. In this approach, parsing in limited domains can be modeled as a Finite-State Transduction. We have implemented such a system for the ATIS domain which improves parsing efficiency by a factor of 15. We have used the supertagger in a variety of applications to provide lexical descriptions at an appropriate granularity. In an information retrieval application, we show that the supertag-based system performs at higher levels of precision compared to a system based on part-of-speech tags. In an information extraction task, supertags are used in specifying extraction patterns. For language modeling applications, we view supertags as syntactically motivated class labels in a class-based language model. The distinction between recursive and non-recursive supertags is exploited in a sentence simplification application.
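    The abstract above mentions resolving local ambiguity with supertag co-occurrence statistics and a trigram-based supertagger. As a rough illustration only, the following Python sketch shows one common way such a trigram model could be decoded (Viterbi search over tag pairs). The supertag names, probability tables, smoothing floor and function name are all hypothetical toy values, not taken from the dissertation.

```python
import math

START = "<s>"  # padding symbol for the first two trigram contexts


def viterbi_trigram(words, candidates, trans, emit, floor=1e-6):
    """Return the supertag sequence maximising the product of
    P(tag_i | tag_{i-2}, tag_{i-1}) * P(word_i | tag_i).

    Unseen events get the probability `floor` (a crude stand-in for smoothing).
    """
    # best[(previous tag, current tag)] = (log score, tag sequence so far)
    best = {(START, START): (0.0, [])}
    for word in words:
        nxt = {}
        for (t2, t1), (score, seq) in best.items():
            for tag in candidates[word]:
                p = trans.get((t2, t1, tag), floor) * emit.get((word, tag), floor)
                cand = score + math.log(p)
                key = (t1, tag)
                if key not in nxt or cand > nxt[key][0]:
                    nxt[key] = (cand, seq + [tag])
        best = nxt
    return max(best.values(), key=lambda v: v[0])[1]


# Toy data: every label and number below is invented for illustration.
words = ["the", "price", "includes", "tax"]
candidates = {
    "the": ["Det"],
    "price": ["N_head", "N_mod"],
    "includes": ["V_trans", "V_intrans"],
    "tax": ["N_head"],
}
trans = {
    (START, START, "Det"): 0.9,
    (START, "Det", "N_head"): 0.6,
    (START, "Det", "N_mod"): 0.4,
    ("Det", "N_head", "V_trans"): 0.5,
    ("Det", "N_head", "V_intrans"): 0.3,
    ("Det", "N_mod", "V_trans"): 0.05,
    ("N_head", "V_trans", "N_head"): 0.7,
    ("N_head", "V_intrans", "N_head"): 0.05,
}
emit = {
    ("the", "Det"): 1.0,
    ("price", "N_head"): 0.5,
    ("price", "N_mod"): 0.5,
    ("includes", "V_trans"): 0.6,
    ("includes", "V_intrans"): 0.4,
    ("tax", "N_head"): 1.0,
}

tags = viterbi_trigram(words, candidates, trans, emit)
print(list(zip(words, tags)))
```

    In this sketch each word keeps only its listed candidate supertags, and the trigram context decides between them; a real supertagger would estimate the tables from a parsed corpus and use proper smoothing.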

    Graph Theory and Universal Grammar

    Thesis archived under Ordinance no. 227/2017 of 25 July (Registration of Foreign Degree). In the last few years, Noam Chomsky (1994; 1995; 2000; 2001) has gone quite far in the direction of simplifying syntax, including eliminating X-bar theory and the levels of D-structure and S-structure entirely, as well as reducing movement rules to a combination of the more primitive operations of Copy and Merge. What remain in the Minimalist Program are the operations Merge and Agree and the levels of LF (Logical Form) and PF (Phonological Form). My doctoral thesis attempts to offer an economical theory of syntactic structure from a graph-theoretic point of view (cf. Diestel, 2005), with special emphases on the elimination of category and projection labels and the Inclusiveness Condition (Chomsky 1994). The major influences for the development of such a theory have been Chris Collins’ (2002) seminal paper “Eliminating labels”, John Bowers’ (2001) unpublished manuscript “Syntactic Relations” and the Cartographic Paradigm (see Belletti, Cinque and Rizzi’s volumes on OUP for a starting point regarding this paradigm). A syntactic structure will be regarded here as a graph consisting of the set of lexical items, the set of relations among them and nothing more.

    Extensible Dependency Grammar: a modular grammar formalism based on multigraph description

    This thesis develops Extensible Dependency Grammar (XDG), a new grammar formalism combining dependency grammar, model-theoretic syntax, and Jackendoff's parallel grammar architecture. The design of XDG is strongly geared towards modularity: grammars can be modularly extended by any linguistic aspect such as grammatical functions, word order, predicate-argument structure, scope, information structure and prosody, where each aspect is modeled largely independently on a separate dimension. The intersective demands of the dimensions make many complex linguistic phenomena such as extraction in syntax, scope ambiguities in the semantics, and control and raising in the syntax-semantics interface simply fall out as by-products without further stipulation. This thesis makes three main contributions: 1. The first formalization of XDG as a multigraph description language in higher order logic, and investigations of its expressivity and computational complexity. 2. The first implementation of XDG, the XDG Development Kit (XDK), an extensive grammar development environment built around a constraint parser for XDG. 3. The first application of XDG to natural language, modularly modeling a fragment of English.

    Head-Driven Phrase Structure Grammar

    Head-Driven Phrase Structure Grammar (HPSG) is a constraint-based or declarative approach to linguistic knowledge, which analyses all descriptive levels (phonology, morphology, syntax, semantics, pragmatics) with feature value pairs, structure sharing, and relational constraints. In syntax it assumes that expressions have a single relatively simple constituent structure. This volume provides a state-of-the-art introduction to the framework. Various chapters discuss basic assumptions and formal foundations, describe the evolution of the framework, and go into the details of the main syntactic phenomena. Further chapters are devoted to non-syntactic levels of description. The book also considers related fields and research areas (gesture, sign languages, computational linguistics) and includes chapters comparing HPSG with other frameworks (Lexical Functional Grammar, Categorial Grammar, Construction Grammar, Dependency Grammar, and Minimalism)

    Integrating deep and shallow natural language processing components : representations and hybrid architectures

    We describe basic concepts and software architectures for the integration of shallow and deep (linguistics-based, semantics-oriented) natural language processing (NLP) components. The main goal of this novel, hybrid integration paradigm is improving robustness of deep processing. After an introduction to constraint-based natural language parsing, we give an overview of typical shallow processing tasks. We introduce XML standoff markup as an additional abstraction layer that eases integration of NLP components, and propose the use of XSLT as a standardized and efficient transformation language for online NLP integration. In the main part of the thesis, we describe our contributions to three hybrid architecture frameworks that make use of these fundamentals. SProUT is a shallow system that uses elements of deep constraint-based processing, namely type hierarchy and typed feature structures. WHITEBOARD is the first hybrid architecture to integrate not only part-of-speech tagging, but also named entity recognition and topological parsing, with deep parsing. Finally, we present Heart of Gold, a middleware architecture that generalizes WHITEBOARD into various dimensions such as configurability, multilinguality and flexible processing strategies. We describe various applications that have been implemented using the hybrid frameworks such as structured named entity recognition, information extraction, creative document authoring support, deep question analysis, as well as evaluations. In WHITEBOARD, e.g., it could be shown that shallow pre-processing increases both coverage and efficiency of deep parsing by a factor of more than two. 
Heart of Gold not only forms the basis for applications that utilize semantics-oriented natural language analysis, but also constitutes a complex research instrument for experimenting with novel processing strategies combining deep and shallow methods, and eases replication and comparability of results.
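    The abstract above introduces XML standoff markup as an abstraction layer for integrating NLP components. As a hedged illustration of the general idea (annotations kept apart from the text and pointing into it by character offsets), here is a minimal Python sketch. The element and attribute names (`annotations`, `layer`, `span`, `start`, `end`, `label`, `source`) and the sample sentence are assumptions made for this example; they are not the schema used by SProUT, WHITEBOARD or Heart of Gold.

```python
import xml.etree.ElementTree as ET

# The primary text is never modified; annotations only refer to offsets in it.
text = "Heart of Gold integrates shallow and deep parsers."

# Hypothetical standoff document: one layer per (imagined) component.
standoff_xml = """
<annotations>
  <layer source="ner">
    <span start="0" end="13" label="SYSTEM"/>
  </layer>
  <layer source="pos">
    <span start="14" end="24" label="VBZ"/>
    <span start="25" end="32" label="JJ"/>
  </layer>
</annotations>
""".strip()


def resolve(text, root):
    """Return (source, label, covered text) for every span in every layer."""
    resolved = []
    for layer in root.iter("layer"):
        for span in layer.iter("span"):
            start, end = int(span.get("start")), int(span.get("end"))
            resolved.append((layer.get("source"), span.get("label"), text[start:end]))
    return resolved


root = ET.fromstring(standoff_xml)
for source, label, covered in resolve(text, root):
    print(f"{source:>4} {label:<7} {covered!r}")
```

    Because each layer only points into the shared text, output from several components can be combined or transformed without any of them rewriting the others' annotations.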

    Grammatical theory: From transformational grammar to constraint-based approaches. Second revised and extended edition.

    This book is superseded by the third edition, available at http://langsci-press.org/catalog/book/255. This book introduces formal grammar theories that play a role in current linguistic theorizing (Phrase Structure Grammar, Transformational Grammar/Government & Binding, Generalized Phrase Structure Grammar, Lexical Functional Grammar, Categorial Grammar, Head-Driven Phrase Structure Grammar, Construction Grammar, Tree Adjoining Grammar). The key assumptions are explained and it is shown how the respective theory treats arguments and adjuncts, the active/passive alternation, local reorderings, verb placement, and fronting of constituents over long distances. The analyses are explained with German as the object language. The second part of the book compares these approaches with respect to their predictions regarding language acquisition and psycholinguistic plausibility. The nativism hypothesis, which assumes that humans possess genetically determined innate language-specific knowledge, is critically examined and alternative models of language acquisition are discussed. The second part then addresses controversial issues of current theory building such as the question of flat or binary branching structures being more appropriate, the question whether constructions should be treated on the phrasal or the lexical level, and the question whether abstract, non-visible entities should play a role in syntactic analyses. It is shown that the analyses suggested in the respective frameworks are often translatable into each other. The book closes with a chapter showing how properties common to all languages or to certain classes of languages can be captured. The book is a translation of the German book Grammatiktheorie, which was published by Stauffenburg in 2010. The following quotes are taken from reviews: "With this critical yet fair reflection on various grammatical theories, Müller fills what was a major gap in the literature." (Karen Lehmann, Zeitschrift für Rezensionen zur germanistischen Sprachwissenschaft, 2012) "Stefan Müller’s recent introductory textbook, Grammatiktheorie, is an astonishingly comprehensive and insightful survey for beginning students of the present state of syntactic theory." (Wolfgang Sternefeld and Frank Richter, Zeitschrift für Sprachwissenschaft, 2012) "This is the kind of work that has been sought after for a while [...] The impartial and objective discussion offered by the author is particularly refreshing." (Werner Abraham, Germanistik, 2012) This book is a new edition of http://langsci-press.org/catalog/book/25

