4 research outputs found

    Evaluating Machine Translation Performance on Chinese Idioms with a Blacklist Method

    Idiom translation is a challenging problem in machine translation because the meaning of idioms is non-compositional, and a literal (word-by-word) translation is likely to be wrong. In this paper, we focus on evaluating the quality of idiom translation by MT systems. We introduce a new evaluation method based on an idiom-specific blacklist of literal translations, building on the insight that the occurrence of any blacklisted word in the translation output indicates a likely translation error. We introduce a dataset, CIBB (Chinese Idioms Blacklists Bank), and perform an evaluation of a state-of-the-art Chinese-English neural MT system. Our evaluation confirms that a sizable number of idioms in our test set are mistranslated (46.1%), that literal translation error is a common error type, and that our blacklist method is effective at identifying literal translation errors.

    Comment: Full paper accepted by LREC, 8 pages
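    The blacklist method described above reduces to a simple membership check over the MT output. The sketch below is a minimal illustration only; the function names and the per-idiom dataset format are assumptions, not the actual CIBB schema or the paper's implementation.

```python
# Hedged sketch of a blacklist-based evaluation: an output is flagged as a
# likely literal-translation error if any blacklisted term appears in it.
def blacklist_hits(translation: str, blacklist: list[str]) -> list[str]:
    """Return the blacklisted (literal) terms found in a translation."""
    lowered = translation.lower()
    return [term for term in blacklist if term.lower() in lowered]


def error_rate(outputs: list[str], blacklists: list[list[str]]) -> float:
    """Fraction of outputs containing at least one blacklisted term.

    `outputs[i]` is the MT output for sentence i; `blacklists[i]` is the
    idiom-specific blacklist for that sentence (a hypothetical format).
    """
    flagged = sum(1 for out, bl in zip(outputs, blacklists)
                  if blacklist_hits(out, bl))
    return flagged / len(outputs) if outputs else 0.0
```

    For example, given the blacklist ["spilled the beans"] for an idiom whose intended meaning is "revealed the secret", the output "he spilled the beans" would be flagged while "he revealed the secret" would pass.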

    The Quality of Idiom and Technical Terminology Translation with Google Translate (A Case Study of Student Translations)

    The aim of this study is to analyze the quality of idiom and technical-terminology translations produced by students using the translation technology Google Translate (GT). Data were collected using two methods. First, a set of instruments containing a source text (ST) in English; students were asked to translate it into Indonesian as the target language (TL). Second, interviews to gather information about the students' experiences with and opinions on using GT. The resulting translations of idioms and technical terminology were of poor quality for two reasons. First, GT cannot translate cultural elements, so a human translator is still needed in the translation process. Second, the students lacked knowledge of how to use GT well, which makes GT-related training important for translators. This study recommends, first, that students or translators identify the idioms and technical terminology in the source text before translating with GT. Second, students or translators should understand the context of the source text before translating with GT. Finally, the target text produced with GT should be reread so that it sounds natural and is acceptable within the culture of the target-language readers.

    Understanding and Enhancing the Use of Context for Machine Translation

    To understand and infer meaning in language, neural models have to learn complicated nuances. Discovering distinctive linguistic phenomena from data is not an easy task. For instance, lexical ambiguity is a fundamental feature of language which is challenging to learn. Even more prominently, inferring the meaning of rare and unseen lexical units is difficult with neural networks. Meaning is often determined from context. With context, languages allow meaning to be conveyed even when the specific words used are not known by the reader. To model this learning process, a system has to learn from a few instances in context and be able to generalize well to unseen cases. The learning process is hindered when training data is scarce for a task. Even with sufficient data, learning patterns for the long tail of the lexical distribution is challenging. In this thesis, we focus on understanding the potential of context in neural models and design augmentation models to benefit from it. We focus on machine translation as an important instance of the more general language understanding problem. To translate from a source language to a target language, a neural model has to understand the meaning of constituents in the provided context and generate constituents with the same meanings in the target language. This task accentuates the value of capturing nuances of language and the necessity of generalizing from few observations. The main problem we study in this thesis is what neural machine translation models learn from data and how we can devise more focused contexts to enhance this learning. Looking more in-depth into the role of context and the impact of data on learning models is essential to advance the NLP field. Moreover, it helps highlight the vulnerabilities of current neural networks and provides insights into designing more robust models.

    Comment: PhD dissertation defended on November 10th, 202