
Corpus-based machine translation evaluation via automated error detection in output texts

By Debra Elliott


Since the emergence of the first fully automatic machine translation (MT) systems over fifty years ago, the use of MT has increased dramatically. Consequently, the evaluation of MT systems is crucial for all stakeholders. However, the human evaluation of MT output is expensive and time-consuming, often relying on subjective quality judgements and requiring human `reference translations' against which the output is compared. As a result, interest in more recent years has turned towards automated evaluation methods, which aim to produce scores that reflect human quality judgements.

As the majority of published automated evaluation methods still require human `reference translations' for comparison, the goal of this research is to investigate the potential of a method that requires access only to the translation itself. Based on detailed corpus analyses, the primary aim is to devise methods for the automated detection of particular error types in French-English MT output from competing systems, and to explore correlations between automated error counts and human judgements of a translation as a whole.

First, a French-English corpus designed specifically for MT evaluation was compiled. A sample of MT output from the corpus was then evaluated by humans to provide judgements against which automated scores would ultimately be compared. A data-driven fluency error classification scheme was subsequently developed to enable the consistent manual annotation of errors found in the English MT output, without access to the original French text. These annotations were then used to guide the selection of error categories for automated error detection, and to facilitate the analysis of particular error types in context so that appropriate methods could be devised. Manual annotations were further used to evaluate the accuracy of each automated approach. Finally, error detection algorithms were tested on English MT output from German, Italian and Spanish to determine the extent to which the methods would need to be adapted for use with other language pairs.
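One error type that can be detected without a reference translation is the untranslated word: a token in the English MT output that is not English. A minimal sketch of the idea follows, assuming a simple regex tokeniser, a toy English wordlist, and made-up error counts and human scores; the thesis's actual wordlists, detection algorithms, and data are not reproduced here.

```python
import re
from statistics import mean

def untranslated_words(sentence, vocab):
    """Return tokens absent from the English vocabulary
    (candidate untranslated words)."""
    # Simple illustrative tokeniser; accented letters kept so
    # untranslated French words survive tokenisation.
    tokens = re.findall(r"[a-zà-ÿ'-]+", sentence.lower())
    return [t for t in tokens if t not in vocab]

def pearson(xs, ys):
    """Pearson correlation between per-sentence automated error
    counts and human quality judgements."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

# Illustrative vocabulary and data -- assumptions, not thesis data.
vocab = {"the", "dog", "sat", "on", "mat"}
print(untranslated_words("Le chien sat on the tapis", vocab))

# More detected errors should correlate negatively with human scores.
error_counts = [0, 1, 2, 4]
human_scores = [5, 4, 3, 1]
print(round(pearson(error_counts, human_scores), 3))
```

In practice a wordlist would be derived from a large English corpus (the thesis draws on corpus resources for this), and the correlation would be computed against the human judgements collected for the evaluation sample.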

Publisher: School of Computing (Leeds)
Year: 2006
