Dense Information Flow for Neural Machine Translation
Recently, neural machine translation has achieved remarkable progress by
introducing well-designed deep neural networks into its encoder-decoder
framework. From the optimization perspective, residual connections are adopted
to improve learning performance for both encoder and decoder in most of these
deep architectures, and advanced attention connections are applied as well.
Inspired by the success of the DenseNet model in computer vision problems, in
this paper, we propose a densely connected NMT architecture (DenseNMT) that
trains more efficiently. The proposed DenseNMT not only allows dense
connections in creating new features for both encoder and decoder, but also
uses a dense attention structure to improve attention quality. Our experiments
on multiple datasets show that the DenseNMT structure is more competitive and
efficient.
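The abstract describes concatenative ("dense") connections between layers in the DenseNet style. The following is a minimal sketch of that idea for an encoder stack, not the authors' DenseNMT implementation; the PyTorch layer choice, dimensions, and names (DenseEncoder, growth) are illustrative assumptions.

```python
# Hypothetical sketch of dense (concatenative) connections between encoder layers,
# in the spirit of DenseNet/DenseNMT; sizes and layer types are illustrative only.
import torch
import torch.nn as nn

class DenseEncoder(nn.Module):
    def __init__(self, emb_dim=256, growth=128, num_layers=4):
        super().__init__()
        self.layers = nn.ModuleList()
        in_dim = emb_dim
        for _ in range(num_layers):
            # each layer consumes the concatenation of all previous feature maps
            self.layers.append(nn.Sequential(nn.Linear(in_dim, growth), nn.ReLU()))
            in_dim += growth

    def forward(self, x):            # x: (batch, seq_len, emb_dim)
        features = [x]
        for layer in self.layers:
            new = layer(torch.cat(features, dim=-1))
            features.append(new)     # dense connection: every earlier output is kept
        return torch.cat(features, dim=-1)

enc = DenseEncoder()
out = enc(torch.randn(2, 10, 256))   # -> (2, 10, 256 + 4*128)
```

Each layer sees the concatenation of all earlier representations, so features created early in the encoder remain directly accessible to later layers and to the attention mechanism.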
Compressing Recurrent Neural Network with Tensor Train
Recurrent Neural Networks (RNNs) are a popular choice for modeling temporal and
sequential data and achieve state-of-the-art performance on many complex
problems. However, most state-of-the-art RNNs have millions of parameters and
require substantial computational resources for training and for predicting on
new data. This paper proposes an alternative RNN model that reduces the number
of parameters significantly by representing the weight parameters in Tensor
Train (TT) format. In this paper, we implement the TT-format representation for
several RNN architectures, such as the simple RNN and the Gated Recurrent Unit
(GRU). We compare and evaluate our proposed RNN models against uncompressed RNN
models on sequence classification and sequence prediction tasks. Our proposed
RNNs in TT-format preserve performance while reducing the number of RNN
parameters significantly, by up to a factor of 40.
Comment: Accepted at IJCNN 201
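The abstract's key idea is storing a large weight matrix as a chain of small Tensor Train cores instead of a dense array. The sketch below reconstructs a dense matrix from TT cores to show where the parameter saving comes from; the mode sizes, TT-ranks, and the helper name tt_full_matrix are illustrative assumptions, not values from the paper.

```python
# Hypothetical sketch of a Tensor Train (TT) factorization of a weight matrix,
# as used for RNN compression; shapes and ranks below are illustrative only.
import numpy as np

# Factor a 256x512 weight over input modes (4,4,4,4) and output modes (4,4,8,4),
# with TT-ranks (1, 8, 8, 8, 1).
in_modes, out_modes, rank = [4, 4, 4, 4], [4, 4, 8, 4], 8
ranks = [1, rank, rank, rank, 1]

# One 4-D core per mode pair: shape (r_{k-1}, in_k, out_k, r_k)
cores = [np.random.randn(ranks[k], in_modes[k], out_modes[k], ranks[k + 1]) * 0.1
         for k in range(4)]

def tt_full_matrix(cores, in_modes, out_modes):
    """Contract the TT cores back into a dense (prod(in), prod(out)) matrix."""
    w = cores[0]
    for core in cores[1:]:
        # contract the trailing rank index of w with the leading rank of the next core
        w = np.tensordot(w, core, axes=([-1], [0]))
    w = w.squeeze()                     # drop the boundary ranks of size 1
    # axes currently alternate (in_1, out_1, in_2, out_2, ...); group ins then outs
    order = [2 * k for k in range(4)] + [2 * k + 1 for k in range(4)]
    w = w.transpose(order)
    return w.reshape(int(np.prod(in_modes)), int(np.prod(out_modes)))

W = tt_full_matrix(cores, in_modes, out_modes)
n_tt = sum(c.size for c in cores)
print(W.shape, n_tt, W.size)  # (256, 512), 3328 TT parameters vs 131072 dense
```

In a TT-RNN, only the small cores are stored and trained, so the parameter count grows with the sum of core sizes rather than with the product of the full input and output dimensions.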