Improving the translation environment for professional translators
When computer-aided translation systems are used in a typical professional translation workflow, there are several stages at which there is room for improvement. The SCATE (Smart Computer-Aided Translation Environment) project investigated several of these aspects, both from a human-computer interaction point of view and from a purely technological one.
This paper describes the SCATE research on improved fuzzy matching, parallel treebanks, the integration of translation memories with machine translation, quality estimation, terminology extraction from comparable texts, the use of speech recognition in the translation process, and human-computer interaction and interface design for the professional translation environment. For each of these topics, we describe the experiments we performed and the conclusions drawn, providing an overview of the highlights of the entire SCATE project.
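Fuzzy matching against a translation memory, one of the SCATE topics above, can be illustrated with a minimal sketch. The character-based similarity measure, the example memory entries, and the 70% threshold are illustrative choices for this sketch, not SCATE's actual algorithm or configuration.

```python
# Sketch of fuzzy matching against a translation memory: retrieve the stored
# source segment most similar to the new sentence, so its translation can be
# reused. Entries and threshold are invented for illustration.
from difflib import SequenceMatcher

def fuzzy_match(source, memory, threshold=0.7):
    """Return (best source segment, its translation, score) above threshold,
    or None if no entry is similar enough."""
    best = None
    for src, tgt in memory.items():
        score = SequenceMatcher(None, source.lower(), src.lower()).ratio()
        if score >= threshold and (best is None or score > best[2]):
            best = (src, tgt, score)
    return best

# Hypothetical English-Dutch translation memory:
memory = {
    "The contract ends on 31 December.": "Het contract eindigt op 31 december.",
    "Please sign the attached form.": "Gelieve het bijgevoegde formulier te ondertekenen.",
}
match = fuzzy_match("The contract ends on 30 June.", memory)
```

A real system would use a segment index and a similarity metric tuned to translation units, but the retrieve-and-threshold shape is the same.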
Detecting Machine-Translated Text using Back Translation
Machine-translated text plays a crucial role in communication between people using different languages. However, adversaries can use such text for malicious purposes such as plagiarism and fake reviews. Existing methods detect machine-translated text using only the text's intrinsic content, which makes them unsuitable for distinguishing machine-translated from human-written texts with the same meaning. We propose a method that extracts features for distinguishing machine from human text based on the similarity between the original text and its back-translation. An evaluation on detecting translated sentences in French shows that our method achieves 75.0% in both accuracy and F-score, outperforming existing methods, whose best accuracy and F-score are 62.8% and 62.7%, respectively. The proposed method detects back-translated text even more effectively, with 83.4% accuracy, well above the previous best of 66.7%. We obtain similar results for F-score, as well as in comparable experiments on Japanese. Moreover, we show that our detector can recognize both machine-translated and machine-back-translated texts without knowing the language used to generate them, demonstrating the robustness of our method across applications in both low- and high-resource languages.
Comment: INLG 2019, 9 pages
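The core idea, that machine-translated text changes less under a round trip through an MT system than human-written text does, can be sketched as follows. The `back_translate` stand-in, the Jaccard word-overlap similarity, and the 0.8 threshold are hypothetical simplifications; the paper uses a real MT system and richer learned features.

```python
# Illustrative sketch of the back-translation idea: text that was itself
# machine-translated tends to survive another round trip nearly unchanged,
# while human-written text drifts more. The threshold and the fake MT round
# trip below are placeholders, not the paper's method.

def similarity(text, back_translation):
    """Word-overlap (Jaccard) similarity between a text and its back-translation."""
    a, b = set(text.lower().split()), set(back_translation.lower().split())
    return len(a & b) / len(a | b) if a | b else 1.0

def looks_machine_translated(text, back_translate, threshold=0.8):
    """Flag text whose back-translation is nearly identical to the original."""
    return similarity(text, back_translate(text)) >= threshold

# Toy stand-in for a round trip through an MT system (translate out and back):
fake_back_translate = lambda t: t.replace("film", "movie")
```

In practice the similarity score would be one feature among several fed to a trained classifier rather than a hard threshold.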
Detecting grammatical errors in machine translation output using dependency parsing and treebank querying
Despite the recent advances in the field of machine translation (MT), MT systems cannot guarantee that the sentences they produce will be fluent and coherent in both syntax and semantics. Detecting and highlighting errors in machine-translated sentences can help post-editors focus on the erroneous fragments that need to be corrected. This paper presents two methods for detecting grammatical errors in Dutch machine-translated text, using dependency parsing and treebank querying. We test our approach on the output of a statistical and a rule-based MT system for English-Dutch and evaluate the performance at sentence and word level. The results show that our method can detect grammatical errors with high accuracy at sentence level in both types of MT output.
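The general approach of querying a dependency parse for error patterns can be illustrated with a toy example: flag a finite verb that lacks a subject dependent (using the Alpino-style relation label `su` common in Dutch treebanks). The tuple-based parse representation and this single rule are illustrative stand-ins, not the parser or the treebank queries actually used in the paper.

```python
# Minimal illustration of grammatical-error detection by querying a
# dependency parse: flag any verb that has no subject dependent. The parse
# below encodes the ungrammatical Dutch fragment "Gisteren regende."
# (missing the subject "het").

# Each token: (index, word, head_index, dependency_relation, pos_tag)
parse = [
    (1, "Gisteren", 2, "mod", "ADV"),
    (2, "regende", 0, "root", "VERB"),   # verb with no "su" dependent
    (3, ".", 2, "punct", "PUNCT"),
]

def missing_subject(tokens):
    """Return indices of verbs without a subject ('su') dependent."""
    verbs = [i for i, _, _, _, pos in tokens if pos == "VERB"]
    flagged = []
    for v in verbs:
        has_subject = any(head == v and rel == "su"
                          for _, _, head, rel, _ in tokens)
        if not has_subject:
            flagged.append(v)
    return flagged
```

A treebank query language would express the same pattern declaratively (e.g. an XPath-style query over parse trees), but the match-a-structural-pattern logic is the same.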
Identifying Computer-Translated Paragraphs using Coherence Features
We have developed a method for extracting coherence features from a paragraph by matching similar words across its sentences. We conducted an experiment on a parallel German corpus containing 2,000 human-created and 2,000 machine-translated paragraphs. The results show that our method achieves the best performance (accuracy = 72.3%, equal error rate = 29.8%) when compared with previous methods on various kinds of computer-generated text, including translation and paper generation (best accuracy = 67.9%, equal error rate = 32.0%). Experiments on Dutch, another high-resource language, and on a low-resource one (Japanese) yielded similar performance, demonstrating the effectiveness of the coherence features at distinguishing computer-translated from human-created paragraphs across diverse languages.
Comment: 9 pages, PACLIC 201
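A simplified version of such a coherence feature, matching words across a paragraph's sentences, might look like the sketch below. The paper matches semantically similar words; plain lexical overlap between consecutive sentences is used here as a hypothetical stand-in.

```python
# Sketch of a coherence feature: score how much consecutive sentences in a
# paragraph share vocabulary. Low overlap can indicate the topic drift often
# seen in machine-translated or machine-generated paragraphs. This is a
# simplification of the word-matching features described in the paper.
import re

def coherence_score(paragraph):
    """Mean word-overlap (Jaccard) between consecutive sentences; higher
    values suggest the sentences stay on topic."""
    sentences = [s for s in re.split(r"[.!?]+\s*", paragraph) if s]
    if len(sentences) < 2:
        return 0.0
    overlaps = []
    for prev, cur in zip(sentences, sentences[1:]):
        a = set(prev.lower().split())
        b = set(cur.lower().split())
        overlaps.append(len(a & b) / len(a | b))
    return sum(overlaps) / len(overlaps)
```

Such a score would then serve as one input feature to a classifier separating human-created from computer-translated paragraphs.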
Fillers in Spoken Language Understanding: Computational and Psycholinguistic Perspectives
Disfluencies (i.e. interruptions in the regular flow of speech) are ubiquitous in spoken discourse, and fillers ("uh", "um") are the most frequent kind of disfluency. Yet, to the best of our knowledge, there is no resource that brings together the research perspectives on these speech events that influence Spoken Language Understanding (SLU). The aim of this article is to synthesise a breadth of perspectives in a holistic way: from the underlying (psycho)linguistic theory, to their annotation and treatment in Automatic Speech Recognition (ASR) and SLU systems, to their study from a generation standpoint. This article aims to present these perspectives in an approachable way to the SLU and Conversational AI community, and to discuss what we believe are the trends and challenges in each area moving forward.
Comment: To appear in TAL Journal
Alzheimer's Disease Detection from Spontaneous Speech through Combining Linguistic Complexity and (Dis)Fluency Features with Pretrained Language Models
In this paper, we combined linguistic complexity and (dis)fluency features with pretrained language models for the Alzheimer's disease detection task of the 2021 ADReSSo (Alzheimer's Dementia Recognition through Spontaneous Speech) challenge. An accuracy of 83.1% was achieved on the test set, which amounts to an improvement of 4.23% over the baseline model. Our best-performing model, which integrated the component models using a stacking ensemble technique, performed equally well on cross-validation and test data, indicating that it is robust against overfitting.
Comment: accepted at Interspeech202
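The stacking idea, fitting a meta-model on held-out predictions of the base models, can be sketched in miniature. The two base scores, the grid-searched weighted-threshold meta-model, and the toy data below are all invented for illustration; the challenge submission's component models and ensemble are far richer.

```python
# Toy illustration of a stacking ensemble: base models score held-out
# examples, and a simple meta-model is fit on those scores to combine them.
# Here the meta-model is sign(w*a + (1-w)*b - t), fit by grid search; real
# stacking would use e.g. logistic regression over base-model outputs.

def fit_stacker(base_outputs, labels):
    """base_outputs: list of (score_a, score_b) per held-out example.
    Returns a combined classifier learned from the base scores."""
    best = (0.5, 0.5, -1.0)  # (weight, threshold, accuracy)
    for w in [i / 10 for i in range(11)]:
        for t in [i / 10 for i in range(11)]:
            preds = [int(w * a + (1 - w) * b >= t) for a, b in base_outputs]
            acc = sum(p == y for p, y in zip(preds, labels)) / len(labels)
            if acc > best[2]:
                best = (w, t, acc)
    w, t, _ = best
    return lambda a, b: int(w * a + (1 - w) * b >= t)

# Hypothetical held-out base-model scores and gold labels (1 = AD):
held_out = [(0.9, 0.2), (0.8, 0.9), (0.1, 0.8), (0.2, 0.1)]
labels = [1, 1, 0, 0]
stacked = fit_stacker(held_out, labels)
```

The key property stacking exploits is visible even at this scale: neither score alone separates the examples, but a learned combination of the two does.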