Search CORE

784 research outputs found

Identifying Semantic Divergences in Parallel Text without Annotations

Author: Carpuat Marine
Niu Xing
Vyas Yogarshi
Publication venue
Publication date: 01/01/2018
Field of study

Recognizing that even correct translations are not always semantically equivalent, we automatically detect meaning divergences in parallel sentence pairs with a deep neural model of bilingual semantic similarity which can be trained for any parallel corpus without any manual annotation. We show that our semantic model detects divergences more accurately than models based on surface features derived from word alignments, and that these divergences matter for neural machine translation.Comment: Accepted as a full paper to NAACL 201

arXiv.org e-Print Archive

Crossref

Improving Statistical Machine Translation with Processing Shallow Parsing

Author: Nguyen Vinh Van
Shimazu Akira
Tran Viet Hong
Vuong Hoai-Thu
Publication venue: 'Faculty of Computer Science, Universitas Indonesia'
Publication date: 01/01/2012
Field of study

Waseda University Repository

High-level methodologies for grammar engineering, introduction to the special issue

Author
Publication venue: 'Institute of Computer Science, Polish Academy of Sciences'
Publication date
Field of study

Crossref

Enriching Biomedical Knowledge for Vietnamese Low-resource Language Through Large-Scale Translation

Author: Chau Lam D.
Dang Tai
Phan Long
Phan Vy
Tran Hieu
Trinh Trieu H.
Publication venue
Publication date: 26/10/2022
Field of study

Biomedical data and benchmarks are highly valuable yet very limited in low-resource languages other than English such as Vietnamese. In this paper, we make use of a state-of-the-art translation model in English-Vietnamese to translate and produce both pretrained as well as supervised data in the biomedical domains. Thanks to such large-scale translation, we introduce ViPubmedT5, a pretrained Encoder-Decoder Transformer model trained on 20 million translated abstracts from the high-quality public PubMed corpus. ViPubMedT5 demonstrates state-of-the-art results on two different biomedical benchmarks in summarization and acronym disambiguation. Further, we release ViMedNLI - a new NLP task in Vietnamese translated from MedNLI using the recently public En-vi translation model and carefully refined by human experts, with evaluations of existing methods against ViPubmedT5

arXiv.org e-Print Archive

A Hybrid Optimization Approach for Neural Machine Translation Using LSTM+RNN with MFO for Under Resource Language (Telugu)

Author: Bhaskari D. Lalitha
Garugu Srisudha
Publication venue: Auricle Global Society of Education and Research
Publication date: 01/09/2023
Field of study

NMT (Neural Machine Translation) is an innovative approach in the field of machine translation, in contrast to SMT (statistical machine translation) and Rule-based techniques which has resulted annotable improvements. This is because NMT is able to overcome many of the shortcomings that are inherent in the traditional approaches. The Development of NMT has grown tremendously in the recent years but NMT performance remain under optimal when applied to low resource language pairs like Telugu, Tamil and Hindi. In this work a proposedmethod fortranslating pairs (Telugu to English) is attempted, an optimal approach which enhancesthe accuracy and execution time period.A hybrid method approach utilizing Long short-term memory (LSTM) and traditional Recurrent Neural Network (RNN) are used for testing and training of the dataset. In the event of long-range dependencies, LSTM will generate more accurate results than a standard RNN would endure and the hybrid technique enhances the performance of LSTM. LSTM is used during the encoding and RNN is used in decoding phases of NMT. Moth Flame Optimization (MFO) is utilized in the proposed system for the purpose of providing the encoder and decoder model with the best ideal points for training the data

International Journal on Recent and Innovation Trends in Computing and Communication