DTMT: A Novel Deep Transition Architecture for Neural Machine Translation
Recent years have witnessed rapid developments in Neural Machine Translation
(NMT). Most recently, with advanced modeling and training techniques, the
RNN-based NMT (RNMT) has shown its potential strength, even compared with the
well-known Transformer (self-attentional) model. Although the RNMT model can
possess very deep architectures through stacking layers, the transition depth
between consecutive hidden states along the sequential axis is still shallow.
In this paper, we further enhance RNN-based NMT by increasing the
transition depth between consecutive hidden states and build a novel Deep
Transition RNN-based Architecture for Neural Machine Translation, named DTMT.
This model enhances the hidden-to-hidden transition with multiple non-linear
transformations, while maintaining a linear transformation path throughout
the deep transition via a well-designed linear transformation mechanism to
alleviate the vanishing gradient problem. Experiments show that with the
specially designed deep transition modules, our DTMT can achieve remarkable
improvements in translation quality. Experimental results on the
Chinese->English translation task show that DTMT outperforms the Transformer
model by +2.09 BLEU points and achieves the best results ever reported on the
same dataset. On
WMT14 English->German and English->French translation tasks, DTMT shows
superior quality to the state-of-the-art NMT systems, including the Transformer
and the RNMT+.
Comment: Accepted at AAAI 2019. Code is available at:
https://github.com/fandongmeng/DTMT_InDe
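
A minimal PyTorch sketch of the deep-transition idea described above (an illustration under stated assumptions, not the paper's exact modules; all names are hypothetical): several input-free gated sub-steps deepen the hidden-to-hidden transition between time steps, and a gated linear path lets information skip the non-linearities to ease gradient flow.

```python
import torch
import torch.nn as nn

class DeepTransitionCell(nn.Module):
    """Illustrative deep-transition RNN cell (hypothetical simplification):
    `depth` gated sub-steps deepen the h_{t-1} -> h_t transition, and a
    gated linear path alleviates vanishing gradients."""

    def __init__(self, input_size: int, hidden_size: int, depth: int = 4):
        super().__init__()
        self.first = nn.GRUCell(input_size, hidden_size)  # consumes the input token
        self.transitions = nn.ModuleList(
            nn.GRUCell(hidden_size, hidden_size) for _ in range(depth - 1)
        )
        self.lin = nn.Linear(input_size, hidden_size)                 # linear path
        self.gate = nn.Linear(input_size + hidden_size, hidden_size)  # its gate

    def forward(self, x: torch.Tensor, h: torch.Tensor) -> torch.Tensor:
        h = self.first(x, h)
        # gated linear transformation path, as motivated in the abstract
        g = torch.sigmoid(self.gate(torch.cat([x, h], dim=-1)))
        h = h + g * self.lin(x)
        for cell in self.transitions:
            h = cell(torch.zeros_like(h), h)  # input-free transition sub-step
        return h
```

Stacking such cells would add depth across layers, while `depth` controls the transition depth along the sequential axis, which is the axis the paper targets.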
CSCD-IME: Correcting Spelling Errors Generated by Pinyin IME
Chinese Spelling Correction (CSC) is a task to detect and correct spelling
mistakes in texts. In practice, most Chinese text is typed through a pinyin
input method (IME), so studying the spelling errors that arise in this
process is more practical and valuable. However, there is still no research
dedicated to this essential
scenario. In this paper, we first present a Chinese Spelling Correction Dataset
for errors generated by pinyin IME (CSCD-IME), including 40,000 annotated
sentences from real posts of official media on Sina Weibo. Furthermore, we
propose a novel method to automatically construct large-scale and high-quality
pseudo data by simulating the input through pinyin IME. A series of analyses
and experiments on CSCD-IME show that spelling errors produced by pinyin IME
follow a particular distribution at both the pinyin and semantic levels and
are sufficiently challenging. Moreover, our proposed pseudo-data construction
method better fits this error distribution and improves the performance of CSC systems.
Finally, we provide a useful guide to using pseudo data, including the data
scale, the data source, and the training strategy.
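
A toy sketch of this style of pseudo-data construction (not the paper's pipeline, which simulates a full pinyin IME; `HOMOPHONES` is a tiny hypothetical stand-in for a real pinyin-to-candidates table): map each character to its pinyin, then occasionally swap it for a different character sharing that pinyin, mimicking a mis-selected IME candidate.

```python
import random

from pypinyin import lazy_pinyin  # pip install pypinyin

HOMOPHONES = {  # hypothetical pinyin -> candidate characters table
    "shi": ["是", "事", "市", "式"],
    "ta": ["他", "她", "它"],
}

def corrupt(sentence: str, p: float = 0.2) -> str:
    out = []
    for ch in sentence:
        py = lazy_pinyin(ch)[0]  # pinyin of this character
        candidates = [c for c in HOMOPHONES.get(py, []) if c != ch]
        if candidates and random.random() < p:
            out.append(random.choice(candidates))  # simulated IME mis-selection
        else:
            out.append(ch)
    return "".join(out)

print(corrupt("他是学生"))  # e.g. "她是学生" or "他市学生"
```

A real pipeline would draw candidates from IME ranking statistics rather than uniformly, which is what lets the pseudo errors match the observed distribution.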
Summer: WeChat Neural Machine Translation Systems for the WMT22 Biomedical Translation Task
This paper introduces WeChat's participation in the WMT 2022 biomedical
translation shared task on Chinese->English. Our systems are based on the
Transformer and use several different Transformer structures to improve
translation quality. In our experiments, we employ data filtering, data
generation, several Transformer variants, fine-tuning, and model ensembling.
Our Chinese->English system, named Summer, achieves the highest BLEU score
among all submissions.
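
The abstract does not specify the ensembling scheme; a common approach, sketched below, averages per-step token probabilities across models during beam search (the `model(src, prefix)` interface returning next-token logits is assumed purely for illustration).

```python
import torch

def ensemble_step(models, src, prefix):
    """One decoding step of a probability-averaging ensemble: average the
    next-token distributions of several Transformers, then return log-probs
    for the beam-search scorer."""
    probs = [torch.softmax(m(src, prefix), dim=-1) for m in models]
    return torch.stack(probs).mean(dim=0).log()
```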
TIM: Teaching Large Language Models to Translate with Comparison
Open-sourced large language models (LLMs) have demonstrated remarkable
efficacy in various tasks with instruction tuning. However, these models can
sometimes struggle with tasks that require more specialized knowledge such as
translation. One possible reason for this deficiency is that instruction tuning
aims to generate fluent and coherent text that continues from a given
instruction without being constrained by any task-specific requirements.
Moreover, tuning smaller LLMs on lower-quality training data can be even more
challenging. To address this issue, we propose a novel framework that uses
comparative examples to teach LLMs to translate. Our approach
involves presenting the model with examples of correct and incorrect
translations and using a preference loss to guide the model's learning. We
evaluate our method on WMT2022 test sets and show that it outperforms existing
methods. Our findings offer a new perspective on fine-tuning LLMs for
translation tasks and provide a promising solution for generating high-quality
translations. Please refer to GitHub for more details:
https://github.com/lemon0830/TIM
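
A minimal sketch of the comparison idea (one plausible form of preference loss; the paper's exact objective may differ): the model is pushed to assign a higher sequence log-likelihood to the correct translation than to the incorrect one, by at least a margin.

```python
import torch
import torch.nn.functional as F

def preference_loss(logp_good: torch.Tensor, logp_bad: torch.Tensor,
                    margin: float = 1.0) -> torch.Tensor:
    """Margin-style preference loss: penalize the model whenever the
    incorrect translation is not at least `margin` below the correct one.
    Inputs are summed token log-probs of each sequence under the model."""
    return F.relu(margin - (logp_good - logp_bad)).mean()

# Example: the bad translation scores higher, so the loss is positive.
print(preference_loss(torch.tensor([-12.3]), torch.tensor([-11.8])))  # tensor(1.5000)
```

In practice this term would be combined with the usual cross-entropy on the correct translation, so the model keeps learning to generate while learning to rank.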
Digging Errors in NMT: Evaluating and Understanding Model Errors from Partial Hypothesis Space
Solid evaluation of neural machine translation (NMT) is key to its
understanding and improvement. Current evaluation of an NMT system is usually
built upon a heuristic decoding algorithm (e.g., beam search) and an evaluation
metric assessing similarity between the translation and the gold reference.
However, this system-level evaluation framework is limited: it evaluates only
the single best hypothesis and is subject to search errors introduced by
heuristic decoding algorithms. To better understand NMT models, we propose a
novel evaluation protocol that defines model errors via the model's ranking
capability over the hypothesis space. To tackle the exponentially large space,
we propose two approximation methods: top-region evaluation, with an exact
top-k decoding algorithm,
which finds top-ranked hypotheses in the whole hypothesis space, and Monte
Carlo sampling evaluation, which simulates hypothesis space from a broader
perspective. To quantify errors, we define NMT model errors by measuring the
distance between the hypothesis array ranked by the model and the ideally
ranked hypothesis array. After confirming a strong correlation with human
judgment, we apply our evaluation to various NMT benchmarks and model
architectures. We show that the state-of-the-art Transformer models face
serious ranking issues and only perform at the random chance level in the top
region. We further analyze model errors on architectures with different depths
and widths, as well as different data-augmentation techniques, showing how
these factors affect model errors. Finally, we connect model errors with
search algorithms and report interesting findings on the inductive bias of
beam search and its correlation with Minimum Bayes Risk (MBR) decoding.
Comment: To appear as a main conference paper at EMNLP 202
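
A minimal sketch of the ranking-based error measure (assuming hypothesis scores are already computed; the rank-distance function here is one plausible choice, not necessarily the paper's): compare the model's ordering of hypotheses with the ideal ordering induced by a quality metric such as BLEU against the reference.

```python
from scipy.stats import kendalltau

def ranking_error(model_scores, metric_scores):
    """Distance between the model's ranking of hypotheses and the ideal
    metric-induced ranking: 0 means a perfect ranking, 2 a fully
    inverted one. Exact top-k decoding or Monte Carlo sampling would
    supply the hypothesis set being scored."""
    tau, _ = kendalltau(model_scores, metric_scores)
    return 1.0 - tau

# Perfectly concordant orderings -> error 0.0
print(ranking_error([-0.5, -1.2, -3.0], [0.9, 0.4, 0.1]))
```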
Multi-Zone Unit for Recurrent Neural Networks
Recurrent neural networks (RNNs) have been widely used to deal with sequence
learning problems. The input-dependent transition function, which folds new
observations into hidden states to sequentially construct fixed-length
representations of arbitrary-length sequences, plays a critical role in RNNs.
Relying on single-space composition, transition functions in existing RNNs
often have difficulty capturing complicated long-range dependencies. In this
paper, we introduce a new Multi-zone Unit (MZU) for RNNs. The key idea is to
design a transition function that is capable of modeling multiple-space
composition. The MZU consists of three components: zone generation, zone
composition, and zone aggregation. Experimental results on multiple datasets of
the character-level language modeling task and the aspect-based sentiment
analysis task demonstrate the superiority of the MZU.
Comment: Accepted at AAAI 202
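
A schematic PyTorch sketch of the three-stage idea (a hypothetical simplification; the abstract does not give the exact zone operations): project the input and previous state into several zones (subspaces), let the zones exchange information via an attention-style composition, then aggregate them into the next hidden state.

```python
import torch
import torch.nn as nn

class MultiZoneUnit(nn.Module):
    """Illustrative multi-zone transition: zone generation, zone
    composition, and zone aggregation, mirroring the three components
    named in the abstract."""

    def __init__(self, input_size: int, hidden_size: int, zones: int = 4):
        super().__init__()
        self.zones = zones
        self.generate = nn.Linear(input_size + hidden_size, zones * hidden_size)
        self.compose = nn.MultiheadAttention(hidden_size, num_heads=1,
                                             batch_first=True)
        self.aggregate = nn.Linear(zones * hidden_size, hidden_size)

    def forward(self, x: torch.Tensor, h: torch.Tensor) -> torch.Tensor:
        z = self.generate(torch.cat([x, h], dim=-1))  # zone generation: (B, zones*H)
        z = z.view(x.size(0), self.zones, -1)         # split into (B, zones, H)
        z, _ = self.compose(z, z, z)                  # zones exchange information
        return torch.tanh(self.aggregate(z.flatten(1)))  # aggregate -> new hidden state
```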
- …