5,877 research outputs found

    Identifying the machine translation error types with the greatest impact on post-editing effort

    Translation Environment Tools make translators' work easier by providing them with term lists, translation memories and machine translation (MT) output. Ideally, such tools automatically predict whether post-editing would take more effort than translating from scratch, and decide on that basis whether or not to present translators with MT output. Current MT quality estimation systems rely heavily on automatic metrics, even though these do not accurately capture actual post-editing effort. In addition, such systems do not take translator experience into account, even though novices' translation processes differ from those of professional translators. In this paper, we report on the impact of MT errors on various types of post-editing effort indicators, for professional translators as well as student translators. We compare the impact of MT quality on a product effort indicator (HTER, human-targeted translation edit rate) with its impact on various process effort indicators. The translation and post-editing processes of student translators and professional translators were logged with a combination of keystroke logging and eye-tracking, and the MT output was analyzed with a fine-grained translation quality assessment approach. We find that most post-editing effort indicators (product as well as process) are influenced by MT quality, but that different error types affect different post-editing effort indicators, confirming that a more fine-grained MT quality analysis is needed to correctly estimate actual post-editing effort. Coherence, meaning shifts, and structural issues are shown to be good indicators of post-editing effort. The additional impact of experience on these interactions between MT quality and post-editing effort is smaller than expected.
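    HTER, the product effort indicator referred to above, is essentially an edit distance between the raw MT output and its post-edited version, normalised by the length of the post-edited text. The sketch below illustrates the idea with a plain word-level Levenshtein distance; real HTER is computed with TER, which also counts block shifts, so this simplification and the toy sentences are assumptions for illustration only.

```python
# Minimal sketch of a product-based post-editing effort indicator in the
# spirit of HTER: word-level edit distance between the raw MT output and
# its post-edited version, normalised by the length of the post-edited text.
# Real HTER uses TER, which also allows block shifts; this is a simplification.

def word_edit_distance(hyp: list[str], ref: list[str]) -> int:
    """Levenshtein distance over tokens (insertions, deletions, substitutions)."""
    dp = list(range(len(ref) + 1))
    for i, h in enumerate(hyp, start=1):
        prev, dp[0] = dp[0], i
        for j, r in enumerate(ref, start=1):
            cur = min(dp[j] + 1,          # delete the hypothesis token
                      dp[j - 1] + 1,      # insert the reference token
                      prev + (h != r))    # substitute (free if tokens match)
            prev, dp[j] = dp[j], cur
    return dp[-1]

def simplified_hter(mt_output: str, post_edited: str) -> float:
    """Edits needed to turn the MT output into its post-edited version, per reference word."""
    hyp, ref = mt_output.split(), post_edited.split()
    return word_edit_distance(hyp, ref) / max(len(ref), 1)

# Toy example: one substitution and one insertion over a 6-word post-edit.
print(simplified_hter("the cat sat in mat", "the cat sat on the mat"))  # ~0.33
```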

    Annotating patient clinical records with syntactic chunks and named entities: the Harvey corpus

    The free text notes typed by physicians during patient consultations contain valuable information for the study of disease and treatment. These notes are difficult for existing natural language analysis tools to process, since they are highly telegraphic (omitting many words) and contain many spelling mistakes, inconsistencies in punctuation, and non-standard word order. To support information extraction and classification tasks over such text, we describe a de-identified corpus of free text notes, a shallow syntactic and named entity annotation scheme for this kind of text, and an approach to training domain specialists with no linguistic background to annotate the text. Finally, we present a statistical chunking system for such clinical text with a stable learning rate and good accuracy, indicating that the manual annotation is consistent and that the annotation scheme is tractable for machine learning.
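    To give a concrete picture of what a shallow syntactic annotation over telegraphic clinical text can look like, the sketch below uses standard BIO tags and converts them into chunk spans. The tag set, the example note and the helper function are hypothetical; the actual Harvey corpus scheme may differ in its chunk and entity types.

```python
# Hypothetical sketch: per-token BIO tags for shallow syntactic chunks over a
# telegraphic clinical note, converted into labelled spans. The tag set and the
# example sentence are invented for illustration only.

def bio_to_chunks(tokens, tags):
    """Collect (chunk_type, start, end) spans from per-token BIO tags."""
    chunks, start, ctype = [], None, None
    for i, tag in enumerate(tags + ["O"]):        # sentinel "O" flushes the last chunk
        if tag.startswith("B-") or tag == "O":
            if ctype is not None:
                chunks.append((ctype, start, i))  # close the open chunk
            start, ctype = (i, tag[2:]) if tag.startswith("B-") else (None, None)
        # "I-" tags simply extend the currently open chunk
    return chunks

tokens = ["pt", "c/o", "chest", "pain", "2", "days"]
tags   = ["B-NP", "B-VP", "B-NP", "I-NP", "B-NP", "I-NP"]
for ctype, s, e in bio_to_chunks(tokens, tags):
    print(ctype, tokens[s:e])
# NP ['pt'] / VP ['c/o'] / NP ['chest', 'pain'] / NP ['2', 'days']
```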

    Learning from Auxiliary Sources in Argumentative Revision Classification

    We develop models to classify desirable reasoning revisions in argumentative writing. We explore two approaches -- multi-task learning and transfer learning -- to take advantage of auxiliary sources of revision data for similar tasks. Results of intrinsic and extrinsic evaluations show that both approaches can indeed improve classifier performance over baselines. While multi-task learning shows that training on different sources of data at the same time may improve performance, transfer learning better represents the relationship between the data.
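    The two approaches can be contrasted with a minimal sketch: a shared encoder with separate task heads, trained either jointly on the target and auxiliary revision data (multi-task learning) or first on the auxiliary source and then fine-tuned on the target (transfer learning). The architecture, feature dimensions and toy data below are assumptions for illustration, not the models used in the paper.

```python
# Minimal sketch (not the authors' models) contrasting multi-task learning and
# transfer learning over a target and an auxiliary revision-classification set.
import torch
import torch.nn as nn

torch.manual_seed(0)
DIM, N_MAIN, N_AUX = 32, 64, 256               # assumed feature size and data sizes

class RevisionClassifier(nn.Module):
    def __init__(self, n_main_labels=2, n_aux_labels=2):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(DIM, 64), nn.ReLU())  # shared encoder
        self.main_head = nn.Linear(64, n_main_labels)                # target-task head
        self.aux_head = nn.Linear(64, n_aux_labels)                  # auxiliary-task head

    def forward(self, x, task="main"):
        h = self.encoder(x)
        return self.main_head(h) if task == "main" else self.aux_head(h)

def step(model, opt, x, y, task):
    opt.zero_grad()
    loss = nn.functional.cross_entropy(model(x, task), y)
    loss.backward()
    opt.step()
    return loss.item()

# Toy random data standing in for the target and auxiliary revision corpora.
x_main, y_main = torch.randn(N_MAIN, DIM), torch.randint(0, 2, (N_MAIN,))
x_aux, y_aux = torch.randn(N_AUX, DIM), torch.randint(0, 2, (N_AUX,))

# Multi-task learning: alternate updates on both sources, sharing the encoder.
mtl = RevisionClassifier()
opt = torch.optim.Adam(mtl.parameters(), lr=1e-3)
for _ in range(50):
    step(mtl, opt, x_aux, y_aux, task="aux")
    step(mtl, opt, x_main, y_main, task="main")

# Transfer learning: pretrain on the auxiliary source, then fine-tune on the target.
tl = RevisionClassifier()
opt = torch.optim.Adam(tl.parameters(), lr=1e-3)
for _ in range(50):
    step(tl, opt, x_aux, y_aux, task="aux")      # pretraining phase
for _ in range(50):
    step(tl, opt, x_main, y_main, task="main")   # fine-tuning phase
```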