11,255 research outputs found

    Examining the Tip of the Iceberg: A Data Set for Idiom Translation

    Neural Machine Translation (NMT) has been widely used in recent years with significant improvements for many language pairs. Although state-of-the-art NMT systems are generating progressively better translations, idiom translation remains one of the open challenges in this field. Idioms, a category of multiword expressions, are an interesting language phenomenon where the overall meaning of the expression cannot be composed from the meanings of its parts. A first important challenge is the lack of dedicated data sets for learning and evaluating idiom translation. In this paper we address this problem by creating the first large-scale data set for idiom translation. Our data set is automatically extracted from a widely used German-English translation corpus and includes, for each language direction, a targeted evaluation set where all sentences contain idioms and a regular training corpus where sentences including idioms are marked. We release this data set and use it to perform preliminary NMT experiments as the first step towards better idiom translation. Comment: Accepted at LREC 201

    Statistical Augmentation of a Chinese Machine-Readable Dictionary

    We describe a method of using statistically collected Chinese character groups from a corpus to augment a Chinese dictionary. The method is particularly useful for extracting domain-specific and regional words not readily available in machine-readable dictionaries. Output was evaluated both by human evaluators and against a previously available dictionary. We also evaluated the performance improvement in automatic Chinese tokenization. Results show that our method outputs legitimate words, acronymic constructions, idioms, names and titles, as well as technical compounds, many of which were lacking from the original dictionary. Comment: 17 pages, uuencoded compressed PostScript

    Figures of speech : figurative expressions and the management of topic transition in conversation

    In conversation, speakers occasionally use figurative expressions such as “had a good innings,” “take with a pinch of salt,” or “come to the end of her tether.” This article investigates WHERE in conversation such expressions are used, in terms of their sequential distribution. One clear distributional pattern is found: figurative expressions occur regularly in topic transition sequences, specifically in the turn where a topic is summarized, thereby initiating the closing of that topic. The article discusses some of the distinctive features of the topic termination/transition sequences with which figurative closings are associated, particularly participants' orientation to their move to new topics. Finally, the interactional use of figurative expressions is considered in instances where their use fails to secure topical closure, manifesting some conflict (disaffiliation, etc.) between the participants.

    Optimality Theory as a Framework for Lexical Acquisition

    This paper re-investigates a lexical acquisition system initially developed for French. We show that, interestingly, the architecture of the system reproduces and implements the main components of Optimality Theory. However, we hypothesize that some of its limitations are mainly due to a poor representation of the constraints it uses. Finally, we show how a better representation of these constraints would yield better results.