4,306 research outputs found

    A product and process analysis of post-editor corrections on neural, statistical and rule-based machine translation output

    This paper presents a comparison of post-editing (PE) changes performed on English-to-Finnish neural (NMT), rule-based (RBMT) and statistical machine translation (SMT) output, combining a product-based and a process-based approach. A total of 33 translation students acted as participants in a PE experiment providing both post-edited texts and edit process data. Our product-based analysis of the post-edited texts shows statistically significant differences in the distribution of edit types between machine translation systems. Deletions were the most common edit type for the RBMT, insertions for the SMT, and word form changes as well as word substitutions for the NMT system. The results also show significant differences in the correctness and necessity of the edits, particularly in the form of a large number of unnecessary edits in the RBMT output. Problems related to certain verb forms and ambiguity were observed for NMT and SMT, while RBMT was more likely to handle them correctly. Process-based comparison of effort indicators shows a slight increase of keystrokes per word for NMT output, and a slight decrease in average pause length for NMT compared to RBMT and SMT in specific text blocks. A statistically significant difference was observed in the number of visits per sub-segment, which is lower for NMT than for RBMT and SMT. The results suggest that although the outputs of the NMT, RBMT and SMT systems required different types of edits, the difference is not necessarily reflected in process-based effort indicators.

    What do post-editors correct? : A fine-grained analysis of SMT and NMT errors

    The recent improvements in neural MT (NMT) have driven a shift from statistical MT (SMT) to NMT. However, to assess the usefulness of MT models for post-editing (PE) and gain detailed insight into the output they produce, we need to analyse the most frequent errors and how they affect the task. We present a pilot study of a fine-grained analysis of MT errors based on post-editors' corrections for an English-to-Spanish medical text translated with SMT and NMT. We use the MQM taxonomy to compare the two MT models and obtain a categorized classification of the errors produced. Even though results show great variation among post-editors' corrections, for this language combination fewer errors are corrected by post-editors in the NMT output. NMT also produces fewer accuracy errors and errors that are less critical. Our analysis also includes an evaluation of the variation among post-editors, focusing on the passages where post-editing varied the most.

    A short guide to post-editing (Volume 16)

    Artificial intelligence is changing and will continue to change the world we live in. These changes are also influencing the translation market. Machine translation (MT) systems automatically transfer one language to another within seconds. However, MT systems are very often still not capable of producing perfect translations. To achieve high-quality translations, the MT output first has to be corrected by a professional translator. This procedure is called post-editing (PE). PE has become an established task on the professional translation market. The aim of this textbook is to provide basic knowledge about the most relevant topics in professional PE. The textbook comprises ten chapters on both theoretical and practical aspects including topics like MT approaches and development, guidelines, integration into CAT tools, risks in PE, data security, practical decisions in the PE process, competences for PE, and new job profiles.

    Pre-editing and post-editing

    This chapter provides an accessible introductory view of pre-editing and post-editing as the starting-point for research or work in the language industry. It describes source text pre-editing and machine translation post-editing from an industrial as well as academic point of view. In the last ten to fifteen years, there has been a considerable growth in the number of studies and publications dealing with pre-editing, and especially post-editing, that have helped researchers and the industry to understand the impact machine translation technology has on translators’ output and their working environment. This interest is likely to continue in view of the recent developments in neural machine translation and artificial intelligence. Although the latest technology has taken a considerable leap forward, the existing body of work should not be disregarded: it has defined clear research lines and methods, and it is more necessary than ever to look at data in their appropriate context and avoid generalizing in the vast and diverse territory of human and machine translation.

    Corpus-Based Machine Translation: A Study Case for the e-Government of Costa Rica

    This research paper surveys the state of the art in machine translation technologies. Following an overview of the architecture and mechanisms underpinning phrase-based statistical (PB-SMT) and neural (NMT) systems, we focus on a specific use case that tests the translator's ability to make the most of these technologies, particularly PB-SMT. The use case calls on the translator to apply professional knowledge and best practices to improve the translation output through data preparation, training, evaluation and tuning of the engines.

    Reinforcement Learning for Machine Translation: from Simulations to Real-World Applications

    If a machine translation is wrong, how can we tell the underlying model to fix it? Answering this question requires (1) a machine learning algorithm to define update rules, (2) an interface for feedback to be submitted, and (3) expertise on the side of the human who gives the feedback. This thesis investigates solutions for machine learning updates, the suitability of feedback interfaces, and the dependency on reliability and expertise for different types of feedback. We start with an interactive online learning scenario where a machine translation (MT) system receives bandit feedback (i.e. only once per source) instead of references for learning. Policy gradient algorithms for statistical and neural MT are developed to learn from absolute and pairwise judgments. Our experiments on domain adaptation with simulated online feedback show that the models can largely improve under weak feedback, with variance reduction techniques being very effective. In production environments offline learning is often preferred over online learning. We evaluate algorithms for counterfactual learning from human feedback in a study on eBay product title translations. Feedback is either collected via explicit star ratings from users, or implicitly from the user interaction with cross-lingual product search. Leveraging implicit feedback turns out to be more successful due to lower levels of noise. We compare the reliability and learnability of absolute Likert-scale ratings with pairwise preferences in a smaller user study, and find that absolute ratings are overall more effective for improvements in downstream tasks. Furthermore, we discover that error markings provide a cheap and practical alternative to error corrections. In a generalized interactive learning framework we propose a self-regulation approach, where the learner, guided by a regulator module, decides which type of feedback to choose for each input. The regulator is reinforced to find a good trade-off between supervision effect and cost. In our experiments, it discovers strategies that are more efficient than active learning and standard fully supervised learning.

    Post-editing for Professional Translators : Cheer or Fear?

    Post-editing of machine translation (MT) has become a regular practice in the translation workflow, especially given the good quality achieved by neural MT (NMT). This is linked to the efforts that language service providers (LSPs) and customers have made to reduce costs due to the recent global crisis and increasing globalization, which has had a negative impact on translators' revenues and on their working practices. In this context, post-editing is often perceived with a negative bias by translators. We study attitudes of translators post-editing for the first time and relate them to their productivity rates. We also compare the results with a survey answered by professional post-editors assessing their perception of the task in the current marketplace.