2,557 research outputs found
Contextual bitext-derived paraphrases in automatic MT evaluation
In this paper we present a novel method for deriving paraphrases during automatic MT evaluation using only the source and reference texts, which are necessary for
the evaluation, and word and phrase alignment software. Using target language paraphrases produced through word and
phrase alignment a number of alternative reference sentences are constructed automatically for each candidate translation. The method produces lexical and lowlevel
syntactic paraphrases that are relevant to the domain in hand, does not use external knowledge resources, and can be
combined with a variety of automatic MT evaluation system
Capturing lexical variation in MT evaluation using automatically built sense-cluster inventories
The strict character of most of the existing Machine Translation (MT) evaluation metrics does not permit them to capture lexical variation in translation. However, a central
issue in MT evaluation is the high correlation that the metrics should have with human judgments of translation quality. In order to achieve a higher correlation, the identification of sense correspondences between the compared translations becomes really important. Given
that most metrics are looking for exact correspondences, the evaluation results are often misleading concerning translation quality. Apart from that, existing metrics do not permit one to make a conclusive estimation of the impact of Word Sense Disambiguation techniques into
MT systems. In this paper, we show how information acquired by an unsupervised semantic analysis method can be used to render MT evaluation more sensitive to lexical semantics. The sense inventories built by this data-driven method are incorporated into METEOR: they replace WordNet for evaluation in English and render METEORās synonymy module operable in French. The evaluation results demonstrate that the use of these inventories gives rise to an increase in the number of matches and the correlation with human judgments of translation quality, compared to precision-based metrics
Multilingual domain modeling in Twenty-One: automatic creation of a bi-directional translation lexicon from a parallel corpus
Within the project Twenty-One, which aims at the effective dissemination of information on ecology and sustainable development, a sytem is developed that supports cross-language information retrieval in any of the four languages Dutch, English, French and German. Knowledge of this application domain is needed to enhance existing translation resources for the purpose of lexical disambiguation. This paper describes an algorithm for the automated acquisition of a translation lexicon from a parallel corpus. New about the presented algorithm is the statistical language model used. Because the algorithm is based on a symmetric translation model it becomes possible to identify one-to-many and many-to-one relations between words of a language pair. We claim that the presented method has two advantages over algorithms that have been published before. Firstly, because the translation model is more powerful, the resulting bilingual lexicon will be more accurate. Secondly, the resulting bilingual lexicon can be used to translate in both directions between a language pair. Different versions of the algorithm were evaluated on the Dutch and English version of the Agenda 21 corpus, which is a UN document on the application domain of sustainable development
Parallel Strands: A Preliminary Investigation into Mining the Web for Bilingual Text
Parallel corpora are a valuable resource for machine translation, but at
present their availability and utility is limited by genre- and
domain-specificity, licensing restrictions, and the basic difficulty of
locating parallel texts in all but the most dominant of the world's languages.
A parallel corpus resource not yet explored is the World Wide Web, which hosts
an abundance of pages in parallel translation, offering a potential solution to
some of these problems and unique opportunities of its own. This paper presents
the necessary first step in that exploration: a method for automatically
finding parallel translated documents on the Web. The technique is conceptually
simple, fully language independent, and scalable, and preliminary evaluation
results indicate that the method may be accurate enough to apply without human
intervention.Comment: LaTeX2e, 11 pages, 7 eps figures; uses psfig, llncs.cls, theapa.sty.
An Appendix at http://umiacs.umd.edu/~resnik/amta98/amta98_appendix.html
contains test dat
- ā¦