Towards Bidirectional Hierarchical Representations for Attention-Based Neural Machine Translation

Authors: Chao Lidia S.; Wong Derek F.; Xiao Tong; Yang Baosong; Zhu Jingbo
Publication date: 01/01/2017

This paper proposes a hierarchical attentional neural translation model which focuses on enhancing source-side hierarchical representations by covering both local and global semantic information using a bidirectional tree-based encoder. To maximize the predictive likelihood of target words, a weighted variant of an attention mechanism is used to balance the attentive information between lexical and phrase vectors. Using a tree-based rare word encoding, the proposed model is extended to sub-word level to alleviate the out-of-vocabulary (OOV) problem. Empirical results reveal that the proposed model significantly outperforms sequence-to-sequence attention-based and tree-based neural translation models in English-Chinese translation tasks.

Comment: Accepted for publication at EMNLP 201

arXiv.org e-Print Archive

Crossref

Modality Adaption or Regularization? A Case Study on End-to-End Speech Translation

Authors: Han Yuchen; Xiao Tong; Xu Chen; Zhu Jingbo
Publication date: 13/06/2023

Pre-training and fine-tuning is a paradigm for alleviating the data scarcity problem in end-to-end speech translation (E2E ST). The commonplace "modality gap" between speech and text data often leads to inconsistent inputs between pre-training and fine-tuning. However, we observe that this gap occurs in the early stages of fine-tuning but does not have a major impact on the final performance. On the other hand, we find that there is another gap, which we call the "capacity gap": high-resource tasks (such as ASR and MT) always require a large model to fit; when that model is reused for a low-resource task (E2E ST), it achieves sub-optimal performance due to over-fitting. In a case study, we find that regularization plays a more important role than a well-designed modality adaption method, achieving 29.0 for en-de and 40.3 for en-fr on the MuST-C dataset. Code and models are available at https://github.com/hannlp/TAB.

Comment: ACL 2023 Main Conference

arXiv.org e-Print Archive