29 research outputs found

    Principles and Implementation of Deductive Parsing

    Get PDF
    We present a system for generating parsers based directly on the metaphor of parsing as deduction. Parsing algorithms can be represented directly as deduction systems, and a single deduction engine can interpret such deduction systems so as to implement the corresponding parser. The method generalizes easily to parsers for augmented phrase structure formalisms, such as definite-clause grammars and other logic grammar formalisms, and has been used for rapid prototyping of parsing algorithms for a variety of formalisms including variants of tree-adjoining grammars, categorial grammars, and lexicalized context-free grammars.Comment: 69 pages, includes full Prolog cod

    Cracking-Resistant Password Vaults using Natural Language Encoders

    Get PDF
    Password vaults are increasingly popular applications that store multiple passwords encrypted under a single master password that the user memorizes. A password vault can greatly reduce the burden on a user of remembering passwords, but introduces a single point of failure. An attacker that obtains a user’s encrypted vault can mount offline brute-force attacks and, if successful, compromise all of the passwords in the vault. In this paper, we investigate the construction of encrypted vaults that resist such offline cracking attacks and force attackers instead to mount online attacks. Our contributions are as follows. We present an attack and supporting analysis showing that a previous design for cracking-resistant vaults—the only one of which we are aware—actually degrades security relative to conventional password-based approaches. We then introduce a new type of secure encoding scheme that we call a natural language encoder (NLE). An NLE permits the construction of vaults which, when decrypted with the wrong master password, produce plausible-looking decoy passwords. We show how to build NLEs using existing tools from natural language processing, such as n-gram models and probabilistic context-free grammars, and evaluate their ability to generate plausible decoys. Finally, we present, implement, and evaluate a full, NLE-based cracking-resistant vault system called NoCrack

    Data-Oriented Parsing with discontinuous constituents and function tags

    Get PDF
    Statistical parsers are e ective but are typically limited to producing projective dependencies or constituents. On the other hand, linguisti- cally rich parsers recognize non-local relations and analyze both form and function phenomena but rely on extensive manual grammar development. We combine advantages of the two by building a statistical parser that produces richer analyses.  We investigate new techniques to implement treebank-based parsers that allow for discontinuous constituents. We present two systems. One system is based on a string-rewriting Linear Context-Free Rewriting System (LCFRS), while using a Probabilistic Discontinuous Tree Substitution Grammar (PDTSG) to improve disambiguation performance. Another system encodes the discontinuities in the labels of phrase structure trees, allowing for efficient context-free grammar parsing. The two systems demonstrate that tree fragments as used in tree-substitution grammar improve disambiguation performance while capturing non-local relations on an as-needed basis. Additionally, we present results of models that produce function tags, resulting in a more linguistically adequate model of the data. We report substantial accuracy improvements in discontinuous parsing for German, English, and Dutch, including results on spoken Dutch

    Data-Oriented Parsing with Discontinuous Constituents and Function Tags

    Get PDF
    Statistical parsers are e ective but are typically limited to producing projective dependencies or constituents. On the other hand, linguisti- cally rich parsers recognize non-local relations and analyze both form and function phenomena but rely on extensive manual grammar development. We combine advantages of the two by building a statistical parser that produces richer analyses. We investigate new techniques to implement treebank-based parsers that allow for discontinuous constituents. We present two systems. One system is based on a string-rewriting Linear Context-Free Rewriting System (LCFRS), while using a Probabilistic Discontinuous Tree Substitution Grammar (PDTSG) to improve disambiguation performance. Another system encodes the discontinuities in the labels of phrase structure trees, allowing for efficient context-free grammar parsing. The two systems demonstrate that tree fragments as used in tree-substitution grammar improve disambiguation performance while capturing non-local relations on an as-needed basis. Additionally, we present results of models that produce function tags, resulting in a more linguistically adequate model of the data. We report substantial accuracy improvements in discontinuous parsing for German, English, and Dutch, including results on spoken Dutch

    Acquiring and Maintaining Knowledge by Natural Multimodal Dialog

    Get PDF

    Argumentative zoning information extraction from scientific text

    Get PDF
    Let me tell you, writing a thesis is not always a barrel of laughs—and strange things can happen, too. For example, at the height of my thesis paranoia, I had a re-current dream in which my cat Amy gave me detailed advice on how to restructure the thesis chapters, which was awfully nice of her. But I also had a lot of human help throughout this time, whether things were going fine or beserk. Most of all, I want to thank Marc Moens: I could not have had a better or more knowledgable supervisor. He always took time for me, however busy he might have been, reading chapters thoroughly in two days. He both had the calmness of mind to give me lots of freedom in research, and the right judgement to guide me away, tactfully but determinedly, from the occasional catastrophe or other waiting along the way. He was great fun to work with and also became a good friend. My work has profitted from the interdisciplinary, interactive and enlightened atmosphere at the Human Communication Centre and the Centre for Cognitive Science (which is now called something else). The Language Technology Group was a great place to work in, as my research was grounded in practical applications develope

    A Lexicalized Tree Adjoining Grammar for English

    Get PDF
    This paper presents a sizable grammar for English written in the Tree Adjoining grammar (TAG) formalism. The grammar uses a TAG that is both lexicalized (Schabes, Abeillé, Joshi 1988) and feature-based (Vijay-Shankar, Joshi 1988). In this paper, we describe a wide range of phenomena that it covers. A Lexicalized TAG (LTAG) is organized around a lexicon, which associates sets of elementary trees (instead of just simple categories) with the lexical items. A Lexicalized TAG consists of a finite set of trees associated with lexical items, and operations (adjunction and substitution) for composing the trees. A lexical item is called the anchor of its corresponding tree and directly determines both the tree\u27s structure and its syntactic features. In particular, the trees define the domain of locality over which constraints are specified and these constraints are local with respect to their anchor. In this paper, the basic tree structures of the English LTAG are described, along with some relevant features. The interaction between the morphological and the syntactic components of the lexicon is also explained. Next, the properties of the different tree structures are discussed. The use of S complements exclusively allows us to take full advantage of the treatment of unbounded dependencies originally presented in Joshi (1985) and Kroch and Joshi (1985). Structures for auxiliaries and raising-verbs which use adjunction trees are also discussed. We present a representation of prepositional complements that is based on extended elementary trees. This representation avoids the need for preposition incorporation in order to account for double wh-questions (preposition stranding and pied-piping) and the pseudo-passive. A treatment of light verb constructions is also given, similar to what Abeillé (1988c) has presented. Again, neither noun nor adjective incorporation is needed to handle double passives and to account for CNPC violations in these constructions. TAG\u27S extended domain of locality allows us to handle, within a single level of syntactic description, phenomena that in other frameworks require either dual analyses or reanalysis. In addition, following Abeillé and Schabes (1989), we describe how to deal with semantic non compositionality in verb-particle combinations, light verb constructions and idioms, without losing the internal syntactic composition of these structures. The last sections discuss current work on PRO, case, anaphora and negation, and outline future work on copula constructions and small clauses, optional arguments, adverb movement and the nature of syntactic rules in a lexicalized framework

    Integrating deep and shallow natural language processing components : representations and hybrid architectures

    Get PDF
    We describe basic concepts and software architectures for the integration of shallow and deep (linguistics-based, semantics-oriented) natural language processing (NLP) components. The main goal of this novel, hybrid integration paradigm is improving robustness of deep processing. After an introduction to constraint-based natural language parsing, we give an overview of typical shallow processing tasks. We introduce XML standoff markup as an additional abstraction layer that eases integration of NLP components, and propose the use of XSLT as a standardized and efficient transformation language for online NLP integration. In the main part of the thesis, we describe our contributions to three hybrid architecture frameworks that make use of these fundamentals. SProUT is a shallow system that uses elements of deep constraint-based processing, namely type hierarchy and typed feature structures. WHITEBOARD is the first hybrid architecture to integrate not only part-of-speech tagging, but also named entity recognition and topological parsing, with deep parsing. Finally, we present Heart of Gold, a middleware architecture that generalizes WHITEBOARD into various dimensions such as configurability, multilinguality and flexible processing strategies. We describe various applications that have been implemented using the hybrid frameworks such as structured named entity recognition, information extraction, creative document authoring support, deep question analysis, as well as evaluations. In WHITEBOARD, e.g., it could be shown that shallow pre-processing increases both coverage and efficiency of deep parsing by a factor of more than two. Heart of Gold not only forms the basis for applications that utilize semanticsoriented natural language analysis, but also constitutes a complex research instrument for experimenting with novel processing strategies combining deep and shallow methods, and eases replication and comparability of results.Diese Arbeit beschreibt Grundlagen und Software-Architekturen fĂŒr die Integration von flachen mit tiefen (linguistikbasierten und semantikorientierten) Verarbeitungskomponenten fĂŒr natĂŒrliche Sprache. Das Hauptziel dieses neuartigen, hybriden Integrationparadigmas ist die Verbesserung der Robustheit der tiefen Verarbeitung. Nach einer EinfĂŒhrung in constraintbasierte Analyse natĂŒrlicher Sprache geben wir einen Überblick ĂŒber typische Aufgaben flacher Sprachverarbeitungskomponenten. Wir fĂŒhren XML Standoff-Markup als zusĂ€tzliche Abstraktionsebene ein, mit deren Hilfe sich Sprachverarbeitungskomponenten einfacher integrieren lassen. Ferner schlagen wir XSLT als standardisierte und effiziente Transformationssprache fĂŒr die Online-Integration vor. Im Hauptteil der Arbeit stellen wir unsere BeitrĂ€ge zu drei hybriden Architekturen vor, welche auf den beschriebenen Grundlagen aufbauen. SProUT ist ein flaches System, das Elemente tiefer Verarbeitung wie Typhierarchie und getypte Merkmalsstrukturen nutzt. WHITEBOARD ist das erste System, welches nicht nur Part-of-speech-Tagging, sondern auch Eigennamenerkennung und flaches topologisches Parsing mit tiefer Verarbeitung kombiniert. Schließlich wird Heart of Gold vorgestellt, eine Middleware-Architektur, welche WHITEBOARD hinsichtlich verschiedener Dimensionen wie Konfigurierbarkeit, Mehrsprachigkeit und UnterstĂŒtzung flexibler Verarbeitungsstrategien generalisiert. Wir beschreiben verschiedene, mit Hilfe der hybriden Architekturen implementierte Anwendungen wie strukturierte Eigennamenerkennung, Informationsextraktion, KreativitĂ€tsunterstĂŒtzung bei der Dokumenterstellung, tiefe Frageanalyse, sowie Evaluationen. So konnte z.B. in WHITEBOARD gezeigt werden, dass durch flache Vorverarbeitung sowohl Abdeckung als auch Effizienz des tiefen Parsers mehr als verdoppelt werden. Heart of Gold bildet nicht nur Grundlage fĂŒr semantikorientierte Sprachanwendungen, sondern stellt auch eine wissenschaftliche Experimentierplattform fĂŒr weitere, neuartige Kombinationsstrategien dar, welche zudem die Replizierbarkeit und Vergleichbarkeit von Ergebnissen erleichtert
    corecore