17 research outputs found

    Do translators use machine translation and if so, how? Results of a survey held among professional translators

    Get PDF
    The author conducted an anonymous online survey between 23 July and 21 October 2022 to gain insight into the proportion of translators who use machine translation (MT) in their translation workflow and the various ways they do so. The results show that translators with more experience are less likely to accept MT post-editing (MTPE) assignments than their less experienced colleagues but are equally likely to use MT themselves in their translation work. Translators who deal with lower-resource languages are also less likely to accept MTPE jobs, but there is no such relationship regarding the use of MT in their own workflow. When left to their own devices, only 18.57% of the 69.54% of respondents who declared that they use MT while translating always or usually use it in the way the pioneers of MT envisaged, i.e., MTPE. Most usually or always prefer to use MT in a whole range of other ways, including enabling MT functions in CAT tools and doing hybrid post-editing; using MT engines as if they were dictionaries; and using MT for inspiration. The vast majority of MT users see MT as just another tool that their clients do not necessarily need to be informed about.

    Improving Word Sense Disambiguation in Neural Machine Translation with Salient Document Context

    Full text link
    Lexical ambiguity is a challenging and pervasive problem in machine translation (MT). We introduce a simple and scalable approach to resolve translation ambiguity by incorporating a small amount of extra-sentential context in neural MT. Our approach requires no sense annotation and no change to standard model architectures. Since actual document context is not available for the vast majority of MT training data, we collect related sentences for each input to construct pseudo-documents. Salient words from pseudo-documents are then encoded as a prefix to each source sentence to condition the generation of the translation. To evaluate, we release DocMuCoW, a challenge set for translation disambiguation based on the English-German MuCoW test suite (Raganato et al., 2020) augmented with document IDs. Extensive experiments show that our method translates ambiguous source words better than strong sentence-level baselines and comparable document-level baselines while reducing training costs.
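
    The prefix-conditioning step described above is simple enough to illustrate directly. Below is a minimal sketch, assuming a crude frequency-based salience scorer over a retrieved pseudo-document; the function name, separator token, and scoring heuristic are illustrative assumptions, not the paper's released implementation.

    ```python
    # Minimal sketch of prefix conditioning with salient pseudo-document words.
    # The salience heuristic (raw frequency) and the <sep> token are assumptions
    # for illustration; the paper's actual scoring and released code may differ.
    from collections import Counter

    def build_prefixed_source(source: str, pseudo_document: list[str], k: int = 3) -> str:
        """Prepend the k most salient pseudo-document words to the source sentence."""
        source_tokens = set(source.lower().split())
        counts = Counter(
            tok for sent in pseudo_document for tok in sent.lower().split()
            if len(tok) > 3 and tok not in source_tokens  # extra-sentential words only
        )
        salient = [word for word, _ in counts.most_common(k)]
        # The separator lets the model distinguish context cues from the sentence itself.
        return " ".join(salient) + " <sep> " + source

    print(build_prefixed_source(
        "The bank was steep and muddy.",
        ["The river bank eroded after the flood.", "Fishermen lined the river at dawn."],
    ))
    # e.g. "river eroded after <sep> The bank was steep and muddy."
    ```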

    Post-Editing of Machine Translation

    Get PDF

    Comparison between statistical and neuronal models for machine translation

    Full text link
    Machine translation is a thriving field that addresses many of the challenges the modern world faces. From accessing knowledge in a foreign language to communicating with people who do not speak the same language, we can benefit greatly from automatic translation performed by software. The state-of-the-art machine translation models of recent decades, based on statistical knowledge inferred from a set of parallel data, have recently been challenged by neural models based on large artificial neural networks. This study compares both methods of machine translation: the one based on statistical inference (SMT) and the one based on neural networks (NMT). The objective of the project is to compare the performance and the computational needs of both models depending on factors such as the size of the training data or the similarity of the language pair. To make this comparison, I have used publicly available parallel data and frameworks to implement the models. The evaluation of said models is done with the BLEU score, which computes the correspondence of a system's translation with a translation made by a human. The results indicate that the SMT model outperforms the NMT model given a relatively small amount of data and a basic set of techniques. The results also show that NMT has a substantially higher need for processing power, given that training large artificial neural networks is more demanding than statistical inference.

    Machine translation (MT) is the use of software to translate from one language to another. The goal of performing automatic translation between languages dates back to the beginnings of electronic computers. MT has evolved since its origins in the 1950s, reflecting advances in the field of computing. In the 1980s, a team led by Makoto Nagao developed the first system that based translation on the analogy between translated texts. This was the first statistical machine translation (SMT) system. The basic idea behind SMT is to use probability distributions extracted from translated texts to build a translation model. SMT systems have been the most studied MT systems and the standard of recent decades. However, with the rapid spread of neural systems in computing, we have seen a rapid rise of neural machine translation (NMT), with large companies such as Google switching their translation systems from the previous statistical models to neural models. The objective of this project is to compare SMT and NMT. SMT uses probability as the basis of its translation, while NMT uses large neural networks. Through this comparison I hope to gain a deep understanding of how the different algorithms and parameters of the two methods affect their translations. With new computational advances, and in a world more global than ever, MT is a thriving field. Comparing these two methods can give us the information needed to decide which one to use depending on our situation and constraints. Furthermore, understanding the evolution of a field like MT can help us anticipate future changes and identify interesting research areas. In this project I will compare SMT with NMT. The scope of this comparison includes (but is not limited to) the foundations of the models, their effectiveness, the computational resources they require, and the amount of training data they need. Consequently, the problem can be defined as: what are the main differences between SMT and NMT, and how do these methods perform with different languages and different amounts of resources, such as the size of the training data? For the comparison I will use frameworks such as MOSES to study the translations of SMT methods and OpenNMT for NMT. Regarding training data, I will focus on the datasets provided for the Workshop on Statistical Machine Translation (WMT), specifically the texts with news translations. One of the main comparisons will consist of progressively increasing the size of the training data to see how it influences translation quality and the need for computational resources. Translation evaluation is a complex task and a research field in itself within MT. For this project I will use the BLEU metric. Another important comparison is how the models perform on simpler language pairs such as English and German compared to more complex pairs such as Chinese and English.

    Llorens Ripollés, JM. (2018). Comparison between statistical and neuronal models for machine translation. http://hdl.handle.net/10251/107663
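
    Since BLEU is the yardstick for the whole comparison, a small example of how such scores are computed may help. This is a hedged sketch using the sacrebleu Python package; the sentences are invented for illustration, not taken from the thesis experiments.

    ```python
    # Sketch of scoring SMT vs. NMT output against a human reference with sacrebleu.
    # Hypotheses and the reference are invented toy examples.
    import sacrebleu

    references = [["The parliament approved the new budget on Tuesday."]]  # one ref stream
    smt_output = ["The parliament has approved new budget the Tuesday."]
    nmt_output = ["Parliament approved the new budget on Tuesday."]

    smt_bleu = sacrebleu.corpus_bleu(smt_output, references)
    nmt_bleu = sacrebleu.corpus_bleu(nmt_output, references)
    print(f"SMT BLEU: {smt_bleu.score:.1f}")
    print(f"NMT BLEU: {nmt_bleu.score:.1f}")
    ```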

    Internship report. Translating and interpreting for the cultural and creative industries

    Get PDF
    Many will agree that an internship provides a singular opportunity to put the knowledge, skills, and competences acquired during formal university study into practice. An intern, usually a graduate student with a strong theoretical background in the area of specialization, can benefit from job experience, which enables him or her to gain a deeper understanding of actual working situations. Additionally, the host institution providing the internship may benefit as well, taking advantage of the chance to put the intern's abilities to use. This internship was carried out to conclude my MA in Linguistics: Societies and Cultures, during which I worked at the Teatro Municipal Baltazar Dias in Funchal, Madeira. The internship lasted 640 hours over the course of about seven months and encompassed many smaller-scale tasks, such as translating flyers and other short documents, as well as larger assignments such as the translation of a cultural agenda. This report documents that experience, starting with an introduction to the host institution, a contextualisation of the work's important themes, an extensive overview of the activities carried out, an analysis and discussion of those activities and of the competences developed, and concluding thoughts on the experience.

    Heart Failure Factors: a database approach

    Get PDF
    This project proposes to use data to find correlations with the number of heart failures. The student will have to collect data, which will be used to extract correlations with a contrastive database that contains details on heart attacks, using temporal information as the main anchor. The project aims to find relationships between psychological stress factors and heart attacks that took place in Catalunya between 2010 and 2016. We have measured these factors through the news published on La Vanguardia's Twitter account, processed with machine learning techniques such as word embeddings and clustering.
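
    As a rough illustration of the embeddings-plus-clustering pipeline mentioned above, the sketch below trains small word embeddings on a few toy headlines and clusters the resulting word vectors; the corpus, vector size, and cluster count are invented stand-ins for the project's actual tweet data.

    ```python
    # Sketch of the word-embedding + clustering pipeline, assuming gensim and
    # scikit-learn. The toy headlines stand in for the tweets used in the project.
    from gensim.models import Word2Vec
    from sklearn.cluster import KMeans

    headlines = [
        "unemployment rises amid deep economic crisis",
        "strikes paralyse public transport again",
        "economic recovery slows as unemployment persists",
    ]
    tokenized = [h.split() for h in headlines]

    # Train small embeddings on the news text.
    w2v = Word2Vec(tokenized, vector_size=50, min_count=1, seed=0)

    # Group the vocabulary into candidate stress-factor themes.
    kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(w2v.wv.vectors)
    for word, label in zip(w2v.wv.index_to_key, kmeans.labels_):
        print(label, word)
    ```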

    Translation al Mercato del Pesce: The Importance of Human Input for Machine Translation

    Get PDF
    This thesis investigates the translation of Italian idioms and metaphors into English and the difficulties machine translation encounters in this process. I use a framework of foreign concepts to explain many of the difficulties, as well as interviews with native Italian and English speakers to provide further context for the cultural knowledge encoded in figurative language. I conclude that a consistent human input interface, as well as continuous training on language corpora, is crucial to improving the accuracy of machine-translated metaphors and idioms, using Italian-to-English translation as a case study.

    Simplifying Encoder-Decoder-Based Neural Machine Translation Systems to Translate between Related Languages

    Get PDF
    Neural machine translation is one of the most advanced approaches to machine translation and one that has recently been obtaining good enough results to be used in real-life scenarios. The currently widely used architecture is the sequence-to-sequence architecture with an attention mechanism, which uses an encoder to create a vector representation of the input sentence in the source language, a decoder to output a sentence in the target language, and an attention mechanism to help the decoder produce more accurate outputs. This work explores the simplification of state-of-the-art sequence-to-sequence neural machine translation with attention for translation between related languages. First, some of the state-of-the-art features present in the baseline system are presented and described. The main hypothesis of this work is that, when translating between related languages, these features can be removed, simplifying the network's structure without worsening translation quality too much. The main part of this work is the substitution of the state-of-the-art attention mechanism, used to help the decoder know which part of the source sentence is most relevant for the part of the target sentence being output, by a simplified attention mechanism which mostly pays attention to the word in the source sentence in the same position as the current target word. The simplification is carried out by removing beam search (a technique used to explore a wider range of possible outputs instead of limiting the output to the one with the highest probability of being correct), substituting the bidirectional encoder by a unidirectional encoder, and creating a new "local attention" mechanism as a replacement for the current, more complex state-of-the-art attention mechanism. Once the simplifications have been discussed and implemented, their impact on translation quality for related languages (Spanish and Catalan in the case of this work) is tested and compared to determine their suitability. From the results obtained, as expected, the removal of beam search and the substitution of the bidirectional encoder by a unidirectional encoder do not have a great impact on translation quality, resulting in a decrease of 6%-23% in BLEU score depending on the attention mechanism being used. On top of this, the introduction of the newly developed "local attention" mechanism improves translation quality by 176% and 218% in BLEU score when compared to an attention-less system, about 22%-27% less than the state-of-the-art attention mechanism used in the baseline system. All of this results in a great simplification of the network, reducing the number of trainable parameters from 12,195,945 to 9,816,485 (a 19.5% reduction) and the training time from 22h 53m to 12h 15m.
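
    The position-based attention described above can be made concrete with a short sketch. The following PyTorch snippet, under assumed shapes and a Gaussian falloff around the current position (the thesis's exact formulation may differ), attends mostly to the source word at the same position as the current target word:

    ```python
    # Hedged sketch of a position-based "local attention": no learned alignment
    # scores, just a fixed Gaussian falloff centred on the current target position.
    # Shapes and the falloff width are illustrative assumptions.
    import torch

    def local_attention_context(encoder_states: torch.Tensor, target_pos: int,
                                width: float = 1.0) -> torch.Tensor:
        """encoder_states: (src_len, hidden). Returns a (hidden,) context vector."""
        src_len = encoder_states.size(0)
        positions = torch.arange(src_len, dtype=torch.float32)
        # Weight source words by proximity to the current target position.
        scores = -((positions - target_pos) ** 2) / (2 * width ** 2)
        weights = torch.softmax(scores, dim=0)  # peaked at position target_pos
        return weights @ encoder_states         # weighted sum of encoder states

    encoder_states = torch.randn(7, 32)         # 7 source words, hidden size 32
    context = local_attention_context(encoder_states, target_pos=3)
    print(context.shape)  # torch.Size([32])
    ```

    For closely related language pairs such as Spanish-Catalan, word order tends to match across the sentence pair, which is why such a fixed near-diagonal alignment can recover much of what a learned attention mechanism provides.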

    A Study on Applying Transformer Models to Rewriting Classical Chinese into Vernacular Chinese

    Get PDF
    As the standards and styles of written Chinese have changed substantially over the course of history, modern readers' ability to understand Classical Chinese has weakened. To reduce the comprehension gap between Classical Chinese and vernacular Chinese and to aid understanding of Classical Chinese writing styles, this work builds a text-rewriting model that generates vernacular Chinese from Classical Chinese. The approach trains the recently popular Transformer model on a parallel corpus of Classical and vernacular Chinese to extract the correspondences between the two, and carries out rewriting experiments that convert Classical Chinese into vernacular Chinese. Finally, the generated vernacular sentences are evaluated with the bilingual evaluation understudy (BLEU) metric, and a case study on using deep learning to aid the understanding of Classical Chinese is reported.
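
    As a rough sketch of the setup described, the snippet below wires up PyTorch's built-in nn.Transformer for one training step on token ids; the vocabulary size, model dimensions, and the toy batch are illustrative assumptions, and the thesis's actual configuration may differ.

    ```python
    # One toy training step for Classical-to-vernacular rewriting with nn.Transformer.
    # All sizes and the random "sentences" are placeholders; teacher-forcing shift
    # and padding masks are omitted for brevity.
    import torch
    import torch.nn as nn

    vocab_size, d_model = 8000, 256
    embed = nn.Embedding(vocab_size, d_model)
    model = nn.Transformer(d_model=d_model, nhead=4, num_encoder_layers=2,
                           num_decoder_layers=2, batch_first=True)
    out_proj = nn.Linear(d_model, vocab_size)

    src = torch.randint(0, vocab_size, (1, 12))   # Classical Chinese token ids
    tgt = torch.randint(0, vocab_size, (1, 15))   # vernacular token ids
    logits = out_proj(model(embed(src), embed(tgt)))
    loss = nn.functional.cross_entropy(logits.view(-1, vocab_size), tgt.view(-1))
    loss.backward()
    print(f"toy loss: {loss.item():.3f}")
    ```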