Learning When to Concentrate or Divert Attention: Self-Adaptive Attention Temperature for Neural Machine Translation

Authors: Li Muyu, Lin Junyang, Ren Xuancheng, Su Qi, Sun Xu
Publication date: 01/01/2018

Most of the Neural Machine Translation (NMT) models are based on the sequence-to-sequence (Seq2Seq) model with an encoder-decoder framework equipped with the attention mechanism. However, the conventional attention mechanism treats the decoding at each time step equally with the same matrix, which is problematic since the softness of the attention for different types of words (e.g. content words and function words) should differ. Therefore, we propose a new model with a mechanism called Self-Adaptive Control of Temperature (SACT) to control the softness of attention by means of an attention temperature. Experimental results on the Chinese-English translation and English-Vietnamese translation demonstrate that our model outperforms the baseline models, and the analysis and the case study show that our model can attend to the most relevant elements in the source-side contexts and generate the translation of high quality.

Comment: To appear in EMNLP 201
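To illustrate what "softness of attention" means here, the following is a minimal sketch of a temperature-scaled softmax over attention scores. It is not the SACT model from the paper: the function name `softmax_with_temperature` and the fixed temperature values are assumptions made for illustration, whereas the abstract describes a mechanism that controls the temperature adaptively.

```python
import math

def softmax_with_temperature(scores, temperature=1.0):
    """Turn raw attention scores into weights; hypothetical helper for illustration.

    A temperature below 1 sharpens the distribution (attention concentrates),
    a temperature above 1 flattens it (attention is spread out).
    """
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [s / temperature for s in scores]
    peak = max(scaled)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

# Toy attention scores over three source positions (made-up numbers).
scores = [2.0, 1.0, 0.1]
sharp = softmax_with_temperature(scores, temperature=0.5)
soft = softmax_with_temperature(scores, temperature=2.0)
```

With the same scores, the low-temperature weights in `sharp` put more mass on the highest-scoring position than the high-temperature weights in `soft`, which is the kind of difference the abstract argues should depend on the word being decoded.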

arXiv.org e-Print Archive

Crossref

Action-Conditional Video Prediction using Deep Networks in Atari Games

Authors: Guo Xiaoxiao, Lee Honglak, Lewis Richard, Oh Junhyuk, Singh Satinder
Publication date: 21/12/2015

Motivated by vision-based reinforcement learning (RL) problems, in particular Atari games from the recent benchmark Arcade Learning Environment (ALE), we consider spatio-temporal prediction problems where future (image-)frames are dependent on control variables or actions as well as previous frames. While not composed of natural scenes, frames in Atari games are high-dimensional in size, can involve tens of objects with one or more objects being controlled by the actions directly and many other objects being influenced indirectly, can involve entry and departure of objects, and can involve deep partial observability. We propose and evaluate two deep neural network architectures that consist of encoding, action-conditional transformation, and decoding layers based on convolutional neural networks and recurrent neural networks. Experimental results show that the proposed architectures are able to generate visually-realistic frames that are also useful for control over approximately 100-step action-conditional futures in some games. To the best of our knowledge, this paper is the first to make and evaluate long-term predictions on high-dimensional video conditioned by control inputs.

Comment: Published at NIPS 2015 (Advances in Neural Information Processing Systems 28)

arXiv.org e-Print Archive

CiteSeerX

Methods for Interpreting and Understanding Deep Neural Networks

Authors: Montavon Grégoire, Müller Klaus-Robert, Samek Wojciech
Publication venue: 'Elsevier BV'
Publication date: 24/06/2017

This paper provides an entry point to the problem of interpreting a deep neural network model and explaining its predictions. It is based on a tutorial given at ICASSP 2017. It introduces some recently proposed techniques of interpretation, along with theory, tricks and recommendations, to make most efficient use of these techniques on real data. It also discusses a number of practical applications.

Comment: 14 pages, 10 figures

arXiv.org e-Print Archive

Fraunhofer-ePrints

MPG.PuRe

Automatic Translating Between Ancient Chinese and Contemporary Chinese with Limited Aligned Corpora

Authors: Li Wei, Su Qi, Zhang Zhiyuan
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 09/06/2020

The Chinese language has evolved a lot during the long-term development. Therefore, native speakers now have trouble in reading sentences written in ancient Chinese. In this paper, we propose to build an end-to-end neural model to automatically translate between ancient and contemporary Chinese. However, the existing ancient-contemporary Chinese parallel corpora are not aligned at the sentence level and sentence-aligned corpora are limited, which makes it difficult to train the model. To build the sentence level parallel training data for the model, we propose an unsupervised algorithm that constructs sentence-aligned ancient-contemporary pairs by using the fact that the aligned sentence pair shares many of the tokens. Based on the aligned corpus, we propose an end-to-end neural model with copying mechanism and local attention to translate between ancient and contemporary Chinese. Experiments show that the proposed unsupervised algorithm achieves 99.4% F1 score for sentence alignment, and the translation model achieves 26.95 BLEU from ancient to contemporary, and 36.34 BLEU from contemporary to ancient.

Comment: Accepted by NLPCC 201
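The idea that aligned sentence pairs share many tokens can be sketched with a simple overlap score. This is only an illustration under stated assumptions, not the algorithm from the paper: the Jaccard score, the greedy best-match search, the `threshold` parameter, and the names `overlap_score` and `align_by_overlap` are all hypothetical choices, and the toy tokens are made up.

```python
def overlap_score(tokens_a, tokens_b):
    """Jaccard overlap between two token lists (an assumed scoring choice)."""
    set_a, set_b = set(tokens_a), set(tokens_b)
    if not set_a or not set_b:
        return 0.0
    return len(set_a & set_b) / len(set_a | set_b)

def align_by_overlap(source_sents, target_sents, threshold=0.2):
    """Greedily pair each source sentence with its best-overlapping target.

    Returns (source_index, target_index) pairs whose score reaches the
    hypothetical threshold; sources with no good match are skipped.
    """
    pairs = []
    for i, src in enumerate(source_sents):
        best_j, best_score = None, 0.0
        for j, tgt in enumerate(target_sents):
            score = overlap_score(src, tgt)
            if score > best_score:
                best_j, best_score = j, score
        if best_j is not None and best_score >= threshold:
            pairs.append((i, best_j))
    return pairs

# Toy tokenized sentences standing in for ancient and contemporary text.
ancient = [["a", "b", "c"], ["x", "y"]]
contemporary = [["x", "y", "z"], ["a", "b", "d"]]
aligned = align_by_overlap(ancient, contemporary)
```

Here the first toy sentence pairs with the second target and the second with the first, because those are the pairs that share tokens. A real aligner would also need to handle ordering constraints and one-to-many matches, which this sketch ignores.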

arXiv.org e-Print Archive