
    Identifying Semantic Divergences in Parallel Text without Annotations

    Recognizing that even correct translations are not always semantically equivalent, we automatically detect meaning divergences in parallel sentence pairs with a deep neural model of bilingual semantic similarity which can be trained for any parallel corpus without any manual annotation. We show that our semantic model detects divergences more accurately than models based on surface features derived from word alignments, and that these divergences matter for neural machine translation. Comment: Accepted as a full paper to NAACL 201
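    The core idea above can be illustrated with a minimal sketch: score each parallel sentence pair by the similarity of cross-lingual sentence embeddings and flag low-similarity pairs as divergent. The embeddings and the threshold below are toy assumptions for illustration, not the paper's trained model.

    ```python
    import numpy as np

    def cosine(u, v):
        # cosine similarity between two embedding vectors
        return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

    def flag_divergent(pairs, threshold=0.5):
        """pairs: list of (src_vec, tgt_vec) sentence embeddings.
        Returns True for pairs whose similarity falls below the
        (hypothetical) divergence threshold."""
        return [cosine(s, t) < threshold for s, t in pairs]

    # toy embeddings: one near-parallel pair, one divergent pair
    a = np.array([1.0, 0.0, 0.0])
    pairs = [(a, np.array([0.9, 0.1, 0.0])),   # similar -> equivalent
             (a, np.array([0.0, 1.0, 0.0]))]   # orthogonal -> divergent
    print(flag_divergent(pairs))               # [False, True]
    ```

    In the paper the similarity model itself is learned from the parallel corpus; here the embeddings are given, which is the part a real system would have to train.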

    Improving Statistical Machine Translation with Processing Shallow Parsing


    High-level methodologies for grammar engineering, introduction to the special issue


    Enriching Biomedical Knowledge for Vietnamese Low-resource Language Through Large-Scale Translation

    Biomedical data and benchmarks are highly valuable yet very limited in low-resource languages such as Vietnamese. In this paper, we make use of a state-of-the-art English-Vietnamese translation model to translate and produce both pretrained and supervised data in the biomedical domain. Thanks to such large-scale translation, we introduce ViPubmedT5, a pretrained encoder-decoder Transformer model trained on 20 million translated abstracts from the high-quality public PubMed corpus. ViPubmedT5 demonstrates state-of-the-art results on two different biomedical benchmarks in summarization and acronym disambiguation. Further, we release ViMedNLI, a new NLP task in Vietnamese translated from MedNLI using the recently released En-Vi translation model and carefully refined by human experts, with evaluations of existing methods against ViPubmedT5.

    A Hybrid Optimization Approach for Neural Machine Translation Using LSTM+RNN with MFO for Under Resource Language (Telugu)

    NMT (Neural Machine Translation) is an innovative approach in the field of machine translation that, in contrast to SMT (Statistical Machine Translation) and rule-based techniques, has produced notable improvements, because NMT overcomes many of the shortcomings inherent in the traditional approaches. The development of NMT has grown tremendously in recent years, but its performance remains suboptimal when applied to low-resource language pairs such as Telugu, Tamil and Hindi. In this work a method for translating pairs (Telugu to English) is proposed: an approach that improves both accuracy and execution time. A hybrid approach combining Long Short-Term Memory (LSTM) and a traditional Recurrent Neural Network (RNN) is used for training and testing on the dataset. For long-range dependencies, an LSTM generates more accurate results than a standard RNN, and the hybrid technique further enhances the performance of the LSTM. The LSTM is used in the encoding phase and the RNN in the decoding phase of NMT. Moth Flame Optimization (MFO) is used in the proposed system to provide the encoder and decoder models with good initial points for training.
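    The encoder-decoder split described above (LSTM for encoding, vanilla RNN for decoding) can be sketched as a toy forward pass. All sizes, weights, and the decoding scheme here are illustrative assumptions, not the authors' actual configuration, and the MFO weight-initialization step is omitted.

    ```python
    import numpy as np

    rng = np.random.default_rng(0)
    HIDDEN = 8   # hidden state size (assumption)
    EMBED = 6    # source embedding size (assumption)

    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))

    class LSTMCell:
        """Single LSTM cell: gates computed from [input; previous hidden]."""
        def __init__(self, in_dim, hid_dim):
            # one weight matrix per gate: input, forget, output, candidate
            self.W = {g: rng.standard_normal((hid_dim, in_dim + hid_dim)) * 0.1
                      for g in "ifoc"}
            self.b = {g: np.zeros(hid_dim) for g in "ifoc"}

        def step(self, x, h, c):
            z = np.concatenate([x, h])
            i = sigmoid(self.W["i"] @ z + self.b["i"])   # input gate
            f = sigmoid(self.W["f"] @ z + self.b["f"])   # forget gate
            o = sigmoid(self.W["o"] @ z + self.b["o"])   # output gate
            g = np.tanh(self.W["c"] @ z + self.b["c"])   # candidate cell
            c = f * c + i * g
            h = o * np.tanh(c)
            return h, c

    class RNNCell:
        """Vanilla RNN cell used on the decoding side."""
        def __init__(self, in_dim, hid_dim):
            self.W = rng.standard_normal((hid_dim, in_dim + hid_dim)) * 0.1
            self.b = np.zeros(hid_dim)

        def step(self, x, h):
            return np.tanh(self.W @ np.concatenate([x, h]) + self.b)

    def encode_decode(src_embeddings, n_out):
        """Encode the source sequence with the LSTM, then run the RNN
        decoder for n_out steps, seeded by the final encoder state."""
        enc, dec = LSTMCell(EMBED, HIDDEN), RNNCell(HIDDEN, HIDDEN)
        h, c = np.zeros(HIDDEN), np.zeros(HIDDEN)
        for x in src_embeddings:              # LSTM encoding pass
            h, c = enc.step(x, h, c)
        outputs, y, h_dec = [], np.zeros(HIDDEN), h
        for _ in range(n_out):                # RNN decoding pass
            h_dec = dec.step(y, h_dec)        # previous output fed back in
            y = h_dec
            outputs.append(h_dec)
        return np.stack(outputs)

    src = rng.standard_normal((5, EMBED))     # 5 source tokens (toy data)
    out = encode_decode(src, n_out=4)
    print(out.shape)                          # (4, 8): 4 decoder states
    ```

    In a real NMT system each decoder state would be projected onto the target vocabulary and trained end-to-end; MFO, as described in the abstract, would supply the starting weights for that training rather than random initialization.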