Enriching Biomedical Knowledge for Vietnamese Low-resource Language Through Large-Scale Translation

Chau, Lam D.; Dang, Tai; Phan, Long; Phan, Vy; Tran, Hieu; Trinh, Trieu H.


Publication date: 26 October 2022

Abstract

Biomedical data and benchmarks are highly valuable yet very limited in low-resource languages other than English, such as Vietnamese. In this paper, we use a state-of-the-art English-Vietnamese translation model to translate and produce both pretraining and supervised data in the biomedical domain. Thanks to this large-scale translation, we introduce ViPubmedT5, a pretrained Encoder-Decoder Transformer model trained on 20 million translated abstracts from the high-quality public PubMed corpus. ViPubmedT5 achieves state-of-the-art results on two biomedical benchmarks, in summarization and acronym disambiguation. Further, we release ViMedNLI, a new Vietnamese natural language inference (NLI) task translated from MedNLI using the recently released En-Vi translation model and carefully refined by human experts, together with evaluations of existing methods against ViPubmedT5.
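The large-scale translation step described above amounts to pushing a very large corpus of English abstracts through an English-to-Vietnamese model in batches while keeping outputs aligned with inputs. The following is a minimal sketch in Python under that assumption only: `batched`, `translate_corpus`, and the `translate_batch` callable are hypothetical names, not the authors' released code, and no specific model or library API is implied.

```python
from typing import Callable, Iterable, Iterator, List


def batched(items: Iterable[str], batch_size: int) -> Iterator[List[str]]:
    """Yield consecutive lists of at most batch_size items (hypothetical helper)."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    batch: List[str] = []
    for item in items:
        batch.append(item)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:  # flush the final, possibly shorter, batch
        yield batch


def translate_corpus(
    abstracts: Iterable[str],
    translate_batch: Callable[[List[str]], List[str]],
    batch_size: int = 32,
) -> Iterator[str]:
    """Translate abstracts batch by batch, preserving input order.

    translate_batch is a hypothetical stand-in for any English-to-Vietnamese
    model call that maps a list of English strings to a list of Vietnamese
    strings of the same length.
    """
    for batch in batched(abstracts, batch_size):
        outputs = translate_batch(batch)
        # Guard against silent misalignment between sources and translations.
        if len(outputs) != len(batch):
            raise ValueError("translator must return one output per input")
        yield from outputs
```

For example, `list(translate_corpus(["a", "b", "c"], lambda xs: [x.upper() for x in xs], batch_size=2))` returns `["A", "B", "C"]`; in practice the toy lambda would be replaced by a call to a real translation model. Because `translate_corpus` is a generator, translated abstracts can be streamed to disk rather than held in memory, which matters for a corpus of millions of documents.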



Available versions: arXiv.org e-Print Archive (oai:arXiv.org:2210.05598)

Last updated on 06/12/2022