
    Exploiting Lexical Conceptual Structure for paraphrase generation

    Abstract. Lexical Conceptual Structure (LCS) represents verbs as semantic structures built from a limited number of semantic predicates. This paper explores how LCS can be used to explain the regularities underlying lexical and syntactic paraphrases, such as verb alternation, compound word decomposition, and lexical derivation. We propose a paraphrase generation model that transforms the LCSs of verbs, and then conduct an empirical experiment on the paraphrasing of Japanese light-verb constructions. Experimental results show that the syntactic and semantic properties of verbs encoded in LCS are useful for semantically constraining the syntactic transformation in paraphrase generation.
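    The idea of an LCS-constrained rewrite can be illustrated with a toy sketch. Everything below is an illustrative assumption, not the paper's actual formalism or lexicon: a light-verb construction "N o suru" ("do N") is rewritten as a single lexical verb only when the noun's hypothetical LCS entry carries a predicate that licenses the transformation.

```python
# Hypothetical mini-lexicon: deverbal noun -> (candidate lexical verb, LCS predicate).
# Entries and predicates are invented for illustration.
LCS_LEXICON = {
    "benkyou": ("benkyou-suru", "ACT"),  # "study" (act of studying)
    "jiko": (None, None),                # "accident": no verbal counterpart
}

# Assumed constraint: only these LCS predicates license the rewrite
LICENSED_PREDICATES = {"ACT", "CAUSE"}

def paraphrase_lvc(noun):
    """Rewrite the light-verb construction 'noun o suru' ("do noun")
    as a single lexical verb when the noun has an LCS entry whose
    predicate licenses the transformation; otherwise return None."""
    verb, predicate = LCS_LEXICON.get(noun, (None, None))
    if verb is None or predicate not in LICENSED_PREDICATES:
        return None
    return verb

print(paraphrase_lvc("benkyou"))  # licensed rewrite
print(paraphrase_lvc("jiko"))    # blocked: no LCS entry licenses it
```

The point of the sketch is only that the semantic structure, not the surface syntax alone, decides whether a paraphrase is generated.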

    Modern and traditional descriptive approaches


    Metaphoric Paraphrase Generation

    This work describes the task of metaphoric paraphrase generation, in which we are given a literal sentence and are charged with generating a metaphoric paraphrase. We propose two different models for this task: a lexical replacement baseline and a novel sequence-to-sequence model, 'metaphor masking', that generates free metaphoric paraphrases. We use crowdsourcing to evaluate our results, and also develop an automatic metric for evaluating metaphoric paraphrases. We show that while the lexical replacement baseline is capable of producing accurate paraphrases, they often lack metaphoricity, whereas our metaphor masking model excels at generating metaphoric sentences while performing nearly as well with regard to fluency and paraphrase quality. (Comment: 10 pages, 3 figures)
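    A lexical replacement baseline of the kind the abstract mentions can be sketched minimally as a dictionary substitution. The mapping below is a hand-invented stand-in, not the paper's actual baseline or data:

```python
# Assumed mapping from literal verbs to conventional metaphoric counterparts;
# these pairs are invented for illustration.
METAPHOR_MAP = {
    "increased": "soared",
    "decreased": "plummeted",
    "examined": "dissected",
}

def lexical_replacement(sentence):
    """Replace each mapped literal word with a metaphoric counterpart,
    leaving the rest of the sentence untouched."""
    tokens = sentence.split()
    return " ".join(METAPHOR_MAP.get(t.lower(), t) for t in tokens)

print(lexical_replacement("Prices increased sharply"))  # -> Prices soared sharply
```

Because only single words change, such a baseline preserves meaning well but, as the abstract notes, often fails to make the whole sentence read metaphorically.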

    Linguistic-based computational treatment of textual entailment recognition

    In this thesis, I investigate how lexical resources based on the organisation of lexical knowledge into classes which share common (syntactic, semantic, etc.) features support natural language processing, and in particular the symbolic recognition of textual entailment. First, I present a robust and wide-coverage approach to lexico-structural verb paraphrase recognition based on Levin's (1993) classification of English verbs. Then, I show that by extending Levin's framework to general inference patterns, a classification of English adjectives can be obtained that, compared with previous approaches, provides a more fine-grained semantic characterisation of their inferential properties. Further, I develop a compositional semantic framework to assign a semantic representation to adjectives based on an ontologically promiscuous approach (Hobbs, 1985), thereby supporting first-order inference for all types of adjectives, including extensional ones. Finally, I present a test suite for adjectival inference, developed as a resource for the evaluation of computational systems handling natural language inference.
    [German abstract:] In this dissertation, I investigated how lexical resources based on the organisation of lexical knowledge into classes with shared properties (lexical, semantic, etc.) support the computational processing of natural language and, in particular, the symbolic recognition of entailment. First, building on Levin's (1993) classification of English verbs, a robust approach to paraphrase recognition suitable for processing arbitrary text was presented. I then showed that extending Levin's framework to handle general inference patterns yields a classification of English adjectives which, compared with earlier approaches, permits a more fine-grained semantic characterisation of their inferential properties and thus forms the basis for the computational treatment of adjectival inference. Another notable result of this work is the test suite I developed, which can be used as a resource for NLP applications that handle inference, in particular adjectival inference. With the construction of this test suite, I aim to pave the way for resources that afford deeper insight into the phenomena responsible for inference.

    On the Syntax of Modal Verbs in Mandarin Chinese

    Tohoku University, doctoral dissertation (Doctor of Literature).

    Polysemy and homonymy in Japanese verbal alternations


    Inflection and Derivation in a Second Language


    D6.2 Integrated Final Version of the Components for Lexical Acquisition

    The PANACEA project has addressed one of the most critical bottlenecks that threaten the development of technologies to support multilingualism in Europe, and to process the huge quantity of multilingual data produced annually. Any attempt at automated language processing, particularly Machine Translation (MT), depends on the availability of language-specific resources. Such Language Resources (LR) contain information about the language's lexicon, i.e. the words of the language and the characteristics of their use. In Natural Language Processing (NLP), LRs contribute information about the syntactic and semantic behaviour of words - i.e. their grammar and their meaning - which inform downstream applications such as MT. To date, many LRs have been generated by hand, requiring significant manual labour from linguistic experts. However, proceeding manually, it is impossible to supply LRs for every possible pair of European languages, textual domain, and genre, which are needed by MT developers. Moreover, an LR for a given language can never be considered complete or final because of the characteristics of natural language, which continually undergoes changes, especially spurred on by the emergence of new knowledge domains and new technologies. PANACEA has addressed this challenge by building a factory of LRs that progressively automates the stages involved in the acquisition, production, updating and maintenance of LRs required by MT systems. The existence of such a factory will significantly cut down the cost, time and human effort required to build LRs. WP6 has addressed the lexical acquisition component of the LR factory, that is, the techniques for automated extraction of key lexical information from texts, and the automatic collation of lexical information into LRs in a standardized format.
    The goal of WP6 has been to take existing techniques capable of acquiring syntactic and semantic information from corpus data, improving upon them, adapting and applying them to multiple languages, and turning them into powerful and flexible techniques capable of supporting massive applications. One focus for improving the scalability and portability of lexical acquisition techniques has been to extend existing techniques with more powerful, less "supervised" methods. In NLP, the amount of supervision refers to the amount of manual annotation which must be applied to a text corpus before machine learning or other techniques are applied to the data to compile a lexicon. More manual annotation means more accurate training data, and thus a more accurate LR. However, given that it is impractical from a cost and time perspective to manually annotate the vast amounts of data required for multilingual MT across domains, it is important to develop techniques which can learn from corpora with less supervision. Less supervised methods are capable of supporting both large-scale acquisition and efficient domain adaptation, even in domains where data is scarce. Another focus of lexical acquisition in PANACEA has been the need for LR users to tune the accuracy level of LRs. Some applications may require increased precision, or accuracy, where the application requires a high degree of confidence in the lexical information used. At other times a greater level of coverage may be required, with information about more words at the expense of some degree of accuracy. Lexical acquisition in PANACEA has investigated confidence thresholds for lexical acquisition to ensure that the ultimate users of LRs can generate lexical data from the PANACEA factory at the desired level of accuracy.
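    The precision/coverage trade-off described above can be sketched as a simple confidence filter over acquired entries. The entries, subcategorisation frames, and scores below are invented for illustration; the abstract does not specify PANACEA's actual data format or thresholds:

```python
# Hypothetical automatically acquired lexicon:
# (lemma, subcategorisation frame, acquisition confidence)
acquired = [
    ("give", "NP-NP", 0.95),
    ("give", "NP-PP", 0.90),
    ("sneeze", "NP-NP", 0.12),  # likely noise from the acquisition step
    ("walk", "NP", 0.55),
]

def filter_lexicon(entries, threshold):
    """Keep only entries whose acquisition confidence meets the threshold."""
    return [(lemma, frame) for lemma, frame, score in entries if score >= threshold]

high_precision = filter_lexicon(acquired, 0.8)  # fewer, more reliable entries
high_coverage = filter_lexicon(acquired, 0.3)   # more entries, more noise
print(len(high_precision), len(high_coverage))  # 2 3
```

Raising the threshold favours precision; lowering it favours coverage, which is exactly the tuning knob the abstract says PANACEA exposes to LR users.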

    A Symbolic Approach to Near-Deterministic Surface Realisation using Tree Adjoining Grammar

    Surface realisers divide into those used in generation (NLG-geared realisers) and those mirroring the parsing process (reversible realisers). While the first rely on grammars not easily usable for parsing, it is unclear how the second type of realiser could be parameterised to yield, from among the set of possible paraphrases, the paraphrase appropriate to a given generation context. In this paper, we present a surface realiser which combines a reversible grammar (used for parsing and semantic construction) with a symbolic means of selecting paraphrases.