784 research outputs found
Identifying Semantic Divergences in Parallel Text without Annotations
Recognizing that even correct translations are not always semantically
equivalent, we automatically detect meaning divergences in parallel sentence
pairs with a deep neural model of bilingual semantic similarity which can be
trained for any parallel corpus without any manual annotation. We show that our
semantic model detects divergences more accurately than models based on surface
features derived from word alignments, and that these divergences matter for
neural machine translation.Comment: Accepted as a full paper to NAACL 201
Enriching Biomedical Knowledge for Vietnamese Low-resource Language Through Large-Scale Translation
Biomedical data and benchmarks are highly valuable yet very limited in
low-resource languages other than English such as Vietnamese. In this paper, we
make use of a state-of-the-art translation model in English-Vietnamese to
translate and produce both pretrained as well as supervised data in the
biomedical domains. Thanks to such large-scale translation, we introduce
ViPubmedT5, a pretrained Encoder-Decoder Transformer model trained on 20
million translated abstracts from the high-quality public PubMed corpus.
ViPubMedT5 demonstrates state-of-the-art results on two different biomedical
benchmarks in summarization and acronym disambiguation. Further, we release
ViMedNLI - a new NLP task in Vietnamese translated from MedNLI using the
recently public En-vi translation model and carefully refined by human experts,
with evaluations of existing methods against ViPubmedT5
A Hybrid Optimization Approach for Neural Machine Translation Using LSTM+RNN with MFO for Under Resource Language (Telugu)
NMT (Neural Machine Translation) is an innovative approach in the field of machine translation, in contrast to SMT (statistical machine translation) and Rule-based techniques which has resulted annotable improvements. This is because NMT is able to overcome many of the shortcomings that are inherent in the traditional approaches. The Development of NMT has grown tremendously in the recent years but NMT performance remain under optimal when applied to low resource language pairs like Telugu, Tamil and Hindi. In this work a proposedmethod fortranslating pairs (Telugu to English) is attempted, an optimal approach which enhancesthe accuracy and execution time period.A hybrid method approach utilizing Long short-term memory (LSTM) and traditional Recurrent Neural Network (RNN) are used for testing and training of the dataset. In the event of long-range dependencies, LSTM will generate more accurate results than a standard RNN would endure and the hybrid technique enhances the performance of LSTM. LSTM is used during the encoding and RNN is used in decoding phases of NMT. Moth Flame Optimization (MFO) is utilized in the proposed system for the purpose of providing the encoder and decoder model with the best ideal points for training the data
- …