Identifying Computer-Translated Paragraphs using Coherence Features

Echizen, Isao; Nguyen, Huy H.; Nguyen-Son, Hoang-Quoc; Tieu, Ngoc-Dung T.; Yamagishi, Junichi

Publication date: 3 December 2018

Abstract

We have developed a method for extracting coherence features from a paragraph by matching similar words across its sentences. We conducted an experiment with a parallel German corpus containing 2000 human-created and 2000 machine-translated paragraphs. Our method achieved the best performance (accuracy = 72.3%, equal error rate = 29.8%) compared with previous methods for detecting various kinds of computer-generated text, including translation and paper generation (best accuracy = 67.9%, equal error rate = 32.0%). Experiments on Dutch, another resource-rich language, and Japanese, a low-resource language, attained similar performance. These results demonstrate the effectiveness of coherence features at distinguishing computer-translated from human-created paragraphs across diverse languages.

Comment: 9 pages, PACLIC 201
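The abstract describes the core idea only briefly: coherence features come from matching similar words between the sentences of a paragraph. The following is a minimal sketch of that idea, not the authors' implementation. The abstract does not state the similarity measure or the exact feature set, so the character-based similarity from Python's `difflib` and the mean/min/max summary are stand-in assumptions, and the names `word_similarity`, `sentence_match_score`, and `coherence_features` are hypothetical.

```python
from difflib import SequenceMatcher
from itertools import combinations
from statistics import mean


def word_similarity(a: str, b: str) -> float:
    """Stand-in similarity in [0, 1] from shared character runs (an assumption,
    not the measure used in the paper)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def sentence_match_score(s1: list[str], s2: list[str]) -> float:
    """For each word in s1, take its best match in s2, then average."""
    if not s1 or not s2:
        return 0.0
    return mean(max(word_similarity(w, v) for v in s2) for w in s1)


def coherence_features(paragraph: list[list[str]]) -> dict[str, float]:
    """Summarise the match scores of all sentence pairs in a paragraph.
    The mean/min/max summary is a hypothetical choice of features."""
    scores = [sentence_match_score(a, b) for a, b in combinations(paragraph, 2)]
    if not scores:
        # A paragraph with fewer than two sentences has no pairs to match.
        return {"mean": 0.0, "min": 0.0, "max": 0.0}
    return {"mean": mean(scores), "min": min(scores), "max": max(scores)}


# Each sentence is a list of already-tokenised words.
paragraph = [
    "the cat sat on the mat".split(),
    "the cat then left the mat".split(),
]
features = coherence_features(paragraph)
```

In a setup like the one the abstract reports, feature vectors of this kind would be fed to a classifier trained on human-created and machine-translated paragraphs; the classifier and its training are outside this sketch.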
