Examining the Tip of the Iceberg: A Data Set for Idiom Translation
Neural Machine Translation (NMT) has been widely used in recent years with
significant improvements for many language pairs. Although state-of-the-art NMT
systems are generating progressively better translations, idiom translation
remains one of the open challenges in this field. Idioms, a category of
multiword expressions, are an interesting language phenomenon where the overall
meaning of the expression cannot be composed from the meanings of its parts. A
first important challenge is the lack of dedicated data sets for learning and
evaluating idiom translation. In this paper we address this problem by creating
the first large-scale data set for idiom translation. Our data set is
automatically extracted from a widely used German-English translation corpus
and includes, for each language direction, a targeted evaluation set where all
sentences contain idioms and a regular training corpus where sentences
including idioms are marked. We release this data set and use it to perform
preliminary NMT experiments as the first step towards better idiom translation.
Comment: Accepted at LREC 201
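The abstract says only that the data set is extracted automatically and that sentences containing idioms are marked. It does not describe how idioms are matched. The sketch below shows one way the marking and the split into a targeted evaluation set and a marked training corpus could be done. It assumes a hypothetical idiom lexicon (`IDIOMS`) and plain whole-word regular-expression matching, and it is not the authors' extraction procedure.

```python
import re

# Hypothetical idiom lexicon for illustration only; the paper's idiom list
# and matching procedure are not described in the abstract.
IDIOMS = ["kick the bucket", "spill the beans", "under the weather"]


def idiom_pattern(idiom: str) -> re.Pattern:
    """Compile a case-insensitive, whole-word pattern for one idiom."""
    words = map(re.escape, idiom.split())
    return re.compile(r"\b" + r"\s+".join(words) + r"\b", re.IGNORECASE)


PATTERNS = {idiom: idiom_pattern(idiom) for idiom in IDIOMS}


def mark_idioms(sentences):
    """Return (sentence, matched_idioms) pairs; an empty list means no idiom."""
    marked = []
    for sentence in sentences:
        found = [idiom for idiom, pat in PATTERNS.items() if pat.search(sentence)]
        marked.append((sentence, found))
    return marked


def split_corpus(sentences):
    """Build a targeted evaluation set and a marked training corpus.

    The evaluation set keeps only sentences that contain an idiom. The
    training corpus keeps every sentence together with its idiom marks.
    """
    marked = mark_idioms(sentences)
    evaluation = [sentence for sentence, found in marked if found]
    return evaluation, marked
```

Literal matching misses inflected variants. For example, "He kicked the bucket." is not marked by the pattern for "kick the bucket". A realistic pipeline would therefore need lemmatization or more flexible patterns, especially for a morphologically rich language such as German.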
Statistical Augmentation of a Chinese Machine-Readable Dictionary
We describe a method of using statistically-collected Chinese character
groups from a corpus to augment a Chinese dictionary. The method is
particularly useful for extracting domain-specific and regional words not
readily available in machine-readable dictionaries. Output was evaluated both
by human evaluators and against a previously available dictionary. We also
evaluated the performance improvement in automatic Chinese tokenization. Results
show that our method outputs legitimate words, acronymic constructions, idioms,
names and titles, as well as technical compounds, many of which were lacking
from the original dictionary.
Comment: 17 pages, uuencoded compressed PostScript
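The abstract does not name the statistic used to collect character groups. The sketch below is for illustration only and substitutes pointwise mutual information (PMI) over adjacent character pairs, a common association measure. Pairs that are both frequent and strongly associated are kept as candidate words, and those missing from an existing dictionary are proposed as additions. The function names and the `min_count` and `threshold` parameters are hypothetical. Restricting candidates to two characters is a simplification, since real Chinese words and the acronyms or technical compounds the abstract mentions can be longer.

```python
import math
from collections import Counter


def pmi_bigrams(corpus, min_count=2, threshold=1.0):
    """Score adjacent character pairs by pointwise mutual information.

    PMI(a, b) = log2(P(ab) / (P(a) * P(b))). Pairs that co-occur much more
    often than chance get high scores. Pairs seen fewer than min_count
    times, or scoring below threshold, are dropped.
    """
    unigrams = Counter()
    bigrams = Counter()
    for text in corpus:
        unigrams.update(text)  # counts each character
        bigrams.update(text[i:i + 2] for i in range(len(text) - 1))
    n_uni = sum(unigrams.values())
    n_bi = sum(bigrams.values())

    candidates = {}
    for pair, count in bigrams.items():
        if count < min_count:
            continue
        p_pair = count / n_bi
        p_a = unigrams[pair[0]] / n_uni
        p_b = unigrams[pair[1]] / n_uni
        score = math.log2(p_pair / (p_a * p_b))
        if score >= threshold:
            candidates[pair] = score
    return candidates


def augment(dictionary, corpus, **kwargs):
    """Return candidate words (sorted) that the dictionary does not contain."""
    found = pmi_bigrams(corpus, **kwargs)
    return sorted(word for word in found if word not in dictionary)
```

On the toy corpus ["北京大学", "北京天气", "上海大学"], the pairs 北京 and 大学 each occur twice. Each scores log2 8 = 3 (P(pair) = 2/9, P(char) = 2/12 for every character involved). With 大学 already in the dictionary, `augment` proposes only 北京.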
Figures of speech: figurative expressions and the management of topic transition in conversation
In conversation, speakers occasionally use figurative expressions such as “had a good innings,” “take with a pinch of salt,” or “come to the end of her tether.” This article investigates WHERE in conversation such expressions are used, in terms of their sequential distribution. One clear distributional pattern is found: Figurative expressions occur regularly in topic transition sequences, and specifically in the turn where a topic is summarized, thereby initiating the closing of a topic. The paper discusses some of the distinctive features of the topic termination/transition sequences with which figurative closings are associated, particularly participants' orientation to their moving to new topics. Finally, the interactional use of figurative expressions is considered in the context of instances where their use fails to secure topical closure, manifesting some conflict (disaffiliation, etc.) between the participants.
Optimality Theory as a Framework for Lexical Acquisition
This paper re-investigates a lexical acquisition system initially developed
for French. We show that, interestingly, the architecture of the system
reproduces and implements the main components of Optimality Theory. However, we
hypothesize that some of its limitations are mainly due to a poor
representation of the constraints used. Finally, we show how a better
representation of these constraints would yield better results.
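In Optimality Theory, the evaluation step (EVAL) picks the candidate whose violation profile is best under a strict ranking of constraints. Any violation of a higher-ranked constraint outweighs any number of violations of lower-ranked ones. The abstract does not list the system's constraints, so the sketch below uses a hypothetical lexicon (`KNOWN`) and two hypothetical constraints on candidate segmentations of the French verb "relire". KNOWN-MORPHEMES assigns one violation per unknown morpheme. FEW-MORPHEMES assigns one violation per morpheme boundary. Only the EVAL logic is standard OT. The constraints and data are illustrative and are not the paper's.

```python
from typing import Callable, Sequence

# A constraint maps a candidate to its number of violations.
Constraint = Callable[[str], int]

# Hypothetical morpheme lexicon, for illustration only.
KNOWN = {"re", "lire"}


def known_morphemes(candidate: str) -> int:
    """Hypothetical KNOWN-MORPHEMES: one violation per unknown morpheme."""
    return sum(1 for m in candidate.split("+") if m not in KNOWN)


def few_morphemes(candidate: str) -> int:
    """Hypothetical FEW-MORPHEMES: one violation per morpheme boundary."""
    return candidate.count("+")


def ot_eval(candidates: Sequence[str], ranking: Sequence[Constraint]) -> str:
    """Return the optimal candidate under a strict constraint ranking.

    Violation tuples list the highest-ranked constraint first, so Python's
    lexicographic tuple comparison implements strict domination. Ties go
    to the earliest candidate in the input order.
    """
    return min(candidates, key=lambda c: tuple(con(c) for con in ranking))


candidates = ["relire", "re+lire", "rel+ire"]
print(ot_eval(candidates, [known_morphemes, few_morphemes]))  # re+lire
print(ot_eval(candidates, [few_morphemes, known_morphemes]))  # relire
```

Both rankings use the same constraints, yet the winner changes. With KNOWN-MORPHEMES on top, the profiles are (1, 0), (0, 1) and (2, 1), so "re+lire" wins. Reversing the ranking makes the unsegmented "relire" optimal. This illustrates why the choice and representation of constraints, and not just the evaluation mechanism, determine what such a system acquires.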