18 research outputs found

    Predicting Perfect Quality Segments in MT Output with Fine-Tuned OpenAI LLM: Is it possible to capture editing distance patterns from historical data?

    Full text link
    Translation Quality Estimation (TQE) is an essential step before deploying translation output into use. TQE is also critical for assessing machine translation (MT) and human translation (HT) quality without seeing reference translations. This work examines whether state-of-the-art large language models (LLMs) can be fine-tuned for the TQE task, and how capable they are. Taking ChatGPT as an example, we approach TQE as a binary classification task. Using training corpora for eight language pairs, English to Italian, German, French, Japanese, Dutch, Portuguese, Turkish, and Chinese, our experimental results show that ChatGPT fine-tuned via its API can achieve a relatively high score in predicting translation quality, i.e. whether the translation needs to be edited. However, there is clearly room to improve model accuracy: it reaches 82.42% and 83.69% for English-Italian and English-German respectively under our experimental settings. An English-Italian bilingual abstract is available in the paper. Comment: 8 pages, 11 figures, under review at ItalianNLP-202
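    The abstract casts TQE as binary classification over (source, translation) pairs. Below is a minimal sketch of how such fine-tuning data could be prepared in the chat-format JSONL that OpenAI's fine-tuning API expects; the field layout, the EDIT/NO_EDIT label scheme, and the prompt wording are illustrative assumptions, not the paper's exact setup.

```python
# Hypothetical sketch: preparing TQE fine-tuning data as chat-format JSONL.
# The label scheme and prompt wording are assumptions, not the paper's format.
import json

def build_finetune_records(pairs, out_path="tqe_train.jsonl"):
    """pairs: iterable of (source, translation, needs_edit) tuples, where
    needs_edit is True if post-editing was required (edit distance > 0)."""
    with open(out_path, "w", encoding="utf-8") as f:
        for src, tgt, needs_edit in pairs:
            record = {
                "messages": [
                    {"role": "system",
                     "content": "Classify whether the translation needs editing."},
                    {"role": "user",
                     "content": f"Source: {src}\nTranslation: {tgt}"},
                    # Binary target: the model learns to emit one of two labels.
                    {"role": "assistant",
                     "content": "EDIT" if needs_edit else "NO_EDIT"},
                ]
            }
            f.write(json.dumps(record, ensure_ascii=False) + "\n")

# The resulting JSONL can then be uploaded and a job started, e.g. with
# client.files.create(...) followed by client.fine_tuning.jobs.create(...).
```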

    Investigating Massive Multilingual Pre-Trained Machine Translation Models for Clinical Domain via Transfer Learning

    Full text link
    Massively multilingual pre-trained language models (MMPLMs) have been developed in recent years, demonstrating strong capabilities and the prior knowledge they acquire for downstream tasks. This work investigates whether MMPLMs can be applied to clinical-domain machine translation (MT) for entirely unseen languages via transfer learning. We carry out an experimental investigation using Meta-AI's MMPLMs "wmt21-dense-24-wide-en-X and X-en (WMT21fb)", which were pre-trained on 7 language pairs and 14 translation directions, including English to Czech, German, Hausa, Icelandic, Japanese, Russian, and Chinese, and the opposite directions. We fine-tune these MMPLMs on the English-Spanish language pair, which did not exist at all, implicitly or explicitly, in their original pre-training corpora. We prepare carefully aligned clinical-domain data for this fine-tuning, which differs from their original mixed-domain knowledge. Our experimental results show that the fine-tuning is very successful using just 250k well-aligned in-domain EN-ES segments, across three sub-task translation tests: clinical cases, clinical terms, and ontology concepts. It achieves evaluation scores very close to another Meta-AI MMPLM, NLLB, which included Spanish as a high-resource language in its pre-training. To the best of our knowledge, this is the first work to apply MMPLMs successfully to clinical-domain transfer-learning NMT for languages totally unseen during pre-training. Comment: Accepted to ClinicalNLP-2023 WS@ACL-202
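    A minimal sketch of the transfer-learning recipe the abstract describes: fine-tuning the publicly released WMT21fb checkpoint on aligned clinical EN-ES segments with HuggingFace Transformers. The checkpoint name follows the public facebook/wmt21-dense-24-wide-en-x release; the toy data, hyperparameters, and language-code handling are assumptions and may differ from the authors' setup.

```python
from datasets import Dataset
from transformers import (AutoTokenizer, AutoModelForSeq2SeqLM,
                          DataCollatorForSeq2Seq, Seq2SeqTrainer,
                          Seq2SeqTrainingArguments)

model_name = "facebook/wmt21-dense-24-wide-en-x"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)

# Toy stand-in for the ~250k aligned clinical EN-ES segments used in the paper.
data = Dataset.from_dict({
    "en": ["The patient presented with acute chest pain."],
    "es": ["El paciente presentó dolor torácico agudo."],
})

def preprocess(batch):
    # M2M-style language codes, following the model card.
    tokenizer.src_lang = "en"
    tokenizer.tgt_lang = "es"
    model_inputs = tokenizer(batch["en"], truncation=True, max_length=256)
    labels = tokenizer(text_target=batch["es"], truncation=True, max_length=256)
    model_inputs["labels"] = labels["input_ids"]
    return model_inputs

train_dataset = data.map(preprocess, batched=True, remove_columns=["en", "es"])

args = Seq2SeqTrainingArguments(
    output_dir="wmt21fb-clinical-enes",
    per_device_train_batch_size=4,
    learning_rate=3e-5,    # assumed; the authors' hyperparameters may differ
    num_train_epochs=1,
)
trainer = Seq2SeqTrainer(
    model=model, args=args, train_dataset=train_dataset,
    data_collator=DataCollatorForSeq2Seq(tokenizer, model=model),
)
trainer.train()
```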

    cushLEPOR uses LABSE distilled knowledge to improve correlation with human translation evaluations

    Get PDF
    Human evaluation has always been expensive, while researchers struggle to trust automatic metrics. To address this, we propose to customise traditional metrics by taking advantage of pre-trained language models (PLMs) and the limited available human-labelled scores. We first re-introduce the hLEPOR metric factors, followed by the portable Python version we developed, which achieves automatic tuning of the weighting parameters in the hLEPOR metric. Then we present customised hLEPOR (cushLEPOR), which uses a LABSE distilled-knowledge model to improve the metric's agreement with human judgements by automatically optimising the factor weights for the exact MT language pair that cushLEPOR is deployed on. We also optimise cushLEPOR towards human evaluation data based on the MQM and pSQM frameworks on English-German and Chinese-English language pairs. The experimental investigation shows that cushLEPOR boosts hLEPOR's performance towards better agreement with PLMs such as LABSE at much lower cost, and better agreement with human evaluations including MQM and pSQM scores, and yields much better performance than BLEU (data available at this https URL)
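    The core idea, automatically tuning the metric's factor weights to maximise agreement with trusted scores, can be sketched as a black-box hyperparameter search. In the sketch below, Optuna maximises Pearson correlation against LABSE-distilled or human (MQM/pSQM) scores; hlepor_score is a toy stand-in, not the authors' hLEPOR implementation, and the parameter names are illustrative.

```python
import optuna
from scipy.stats import pearsonr

def hlepor_score(hyp, ref, alpha, beta, w_lp):
    # Toy stand-in, NOT the real hLEPOR: a weighted mix of token precision,
    # recall, and a length ratio, just so the tuning loop runs end to end.
    h, r = hyp.split(), ref.split()
    overlap = len(set(h) & set(r))
    precision = overlap / max(len(h), 1)
    recall = overlap / max(len(r), 1)
    length_ratio = min(len(h), len(r)) / max(len(h), len(r), 1)
    return (alpha * precision + beta * recall + w_lp * length_ratio) / (
        alpha + beta + w_lp)

def tune_weights(hyps, refs, trusted_scores, n_trials=200):
    """Search factor weights that maximise Pearson correlation between the
    metric and trusted scores (LABSE-distilled or MQM/pSQM human scores)."""
    def objective(trial):
        w = {name: trial.suggest_float(name, 0.1, 9.0)
             for name in ("alpha", "beta", "w_lp")}
        metric = [hlepor_score(h, r, **w) for h, r in zip(hyps, refs)]
        return pearsonr(metric, trusted_scores)[0]

    study = optuna.create_study(direction="maximize")
    study.optimize(objective, n_trials=n_trials)
    return study.best_params  # per-language-pair weights, as in cushLEPOR
```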

    Neural machine translation of clinical text: an empirical investigation into multilingual pre-trained language models and transfer-learning

    Get PDF
    Clinical texts and documents contain very rich healthcare information and knowledge, and processing them with state-of-the-art language technology is very important for building intelligent systems that support healthcare and social good. This processing includes creating language-understanding models and translating resources into other natural languages to share domain-specific cross-lingual knowledge. In this work, we investigate clinical text machine translation by examining multilingual neural network models using deep learning, such as Transformer-based structures. Furthermore, to address the language-resource imbalance issue, we also carry out experiments using a transfer-learning methodology based on massive multilingual pre-trained language models (MMPLMs). The experimental results on three sub-tasks, (1) clinical case (CC), (2) clinical terminology (CT), and (3) ontological concept (OC), show that our models achieved top-level performance in the ClinSpEn-2022 shared task on English-Spanish clinical-domain data. Furthermore, our expert-based human evaluations demonstrate that the small-sized pre-trained language model (PLM) outperformed the two extra-large language models by a large margin in clinical-domain fine-tuning, a finding that had not previously been reported in the field. Finally, the transfer-learning method works well in our experimental setting, using the WMT21fb model to accommodate a new language, Spanish, that was not seen at WMT21fb's own pre-training stage; this deserves further exploration for clinical knowledge transformation, e.g. by investigating more languages. These research findings can shed some light on domain-specific machine translation development, especially in clinical and healthcare fields. Further research projects can build on our work to improve healthcare text analytics and knowledge transformation. Our data is openly available for research purposes at: https://github.com/HECTA-UoM/ClinicalNMT
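    For the three sub-tasks named above, an evaluation harness could look like the following sketch; the file paths are hypothetical, and sacreBLEU is just one of several metrics one might apply (the shared task used a broader metric set).

```python
# Illustrative per-sub-task evaluation with sacreBLEU; paths are hypothetical.
import sacrebleu

SUBTASKS = {
    "clinical-cases": ("cc.hyp.es", "cc.ref.es"),
    "clinical-terms": ("ct.hyp.es", "ct.ref.es"),
    "ontology-concepts": ("oc.hyp.es", "oc.ref.es"),
}

for name, (hyp_path, ref_path) in SUBTASKS.items():
    with open(hyp_path, encoding="utf-8") as f:
        hyps = [line.strip() for line in f]
    with open(ref_path, encoding="utf-8") as f:
        refs = [line.strip() for line in f]
    bleu = sacrebleu.corpus_bleu(hyps, [refs])  # one reference set per segment
    print(f"{name}: BLEU = {bleu.score:.2f}")
```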

    10_6_vp_ml-2

    No full text
    Raw data of the experiment after the first regeneration, for the graph presented in Figure 1

    10_5_vp_ml-1

    No full text
    Raw data of experiments with virus concentration 10^5 vp/ml

    V6_106.009

    No full text
    AFM image of virus particles partially dipped in the receptor layer, shown in Figure 7

    Theoretical foundations of continuity between pre-school and primary education as a problem of modern education

    No full text
    The article deals with the theoretical aspects of the problem of continuity between pre-school and primary education. Problems and contradictions related to realising continuity in the transition from pre-school education to the first level of general secondary education are described. Different approaches to defining the notion of "continuity" are analysed. The structure of continuity is determined, and the leading directions of interaction and connection between the content, programmes, methods, and forms of education in kindergarten and primary school are distinguished. The pedagogical requirements and conditions are outlined for ensuring continuity in the preparation and work of teachers who are able to solve the scientific-methodological and practical tasks of the innovative development of pre-school and primary education, the priority of which is the formation of a person of a new formation.