Self-Adaptive Hierarchical Sentence Model
The ability to accurately model a sentence at varying stages (e.g.,
word-phrase-sentence) plays a central role in natural language processing. As
an effort towards this goal we propose a self-adaptive hierarchical sentence
model (AdaSent). AdaSent effectively forms a hierarchy of representations from
words to phrases and then to sentences through recursive gated local
composition of adjacent segments. We design a competitive mechanism (through
gating networks) to allow the representations of the same sentence to be
engaged in a particular learning task (e.g., classification), therefore
effectively mitigating the gradient vanishing problem persistent in other
recursive models. Both qualitative and quantitative analyses show that AdaSent
can automatically form and select the representations suitable for the task at
hand during training, yielding superior classification performance over
competitor models on 5 benchmark data sets.
Comment: 8 pages, 7 figures, accepted as a full paper at IJCAI 201
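The recursive gated composition described in this abstract can be illustrated with a small sketch. The abstract does not give the equations, so everything here is an assumption made for illustration: the function names (`compose`, `build_hierarchy`), the tanh-of-sum candidate, and the fixed gate logits, which stand in for what would be learned gating networks in the actual model.

```python
import math

def softmax(scores):
    """Normalize scores into weights that sum to 1."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def compose(left, right, gate_logits):
    """Combine two adjacent segment vectors into one parent vector.

    Assumption: the parent is a gated mix of the left child, the right
    child, and a freshly composed candidate (a toy tanh of the sum).
    """
    candidate = [math.tanh(l + r) for l, r in zip(left, right)]
    w_left, w_right, w_new = softmax(gate_logits)
    return [w_left * l + w_right * r + w_new * c
            for l, r, c in zip(left, right, candidate)]

def build_hierarchy(word_vectors, gate_logits=(0.0, 0.0, 0.0)):
    """Build levels from words up to a single sentence-level vector."""
    levels = [list(word_vectors)]
    while len(levels[-1]) > 1:
        prev = levels[-1]
        levels.append([compose(prev[j], prev[j + 1], gate_logits)
                       for j in range(len(prev) - 1)])
    return levels

levels = build_hierarchy([[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]])
print([len(level) for level in levels])
```

Each level has one fewer segment than the one below it, so a sentence of n words yields n levels of representations that a task-specific gate could then choose among.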
Neural Responding Machine for Short-Text Conversation
We propose Neural Responding Machine (NRM), a neural network-based response
generator for Short-Text Conversation. NRM takes the general encoder-decoder
framework: it formalizes the generation of response as a decoding process based
on the latent representation of the input text, while both encoding and
decoding are realized with recurrent neural networks (RNN). The NRM is trained
with a large amount of one-round conversation data collected from a
microblogging service. Empirical study shows that NRM can generate
grammatically correct and content-wise appropriate responses to over 75% of the
input text, outperforming state-of-the-art approaches in the same setting,
including retrieval-based and SMT-based models.
Comment: accepted as a full paper at ACL 201
Deep Neural Machine Translation with Linear Associative Unit
Deep Neural Networks (DNNs) have provably enhanced the state-of-the-art
Neural Machine Translation (NMT) with their capability in modeling complex
functions and capturing complex linguistic structures. However, NMT systems with
deep architecture in their encoder or decoder RNNs often suffer from severe
gradient diffusion due to the non-linear recurrent activations, which often
makes the optimization much more difficult. To address this problem we propose
novel linear associative units (LAU) to reduce the gradient propagation length
inside the recurrent unit. Unlike conventional approaches (LSTM unit
and GRU), LAUs utilize linear associative connections between input and output
of the recurrent unit, which allow unimpeded information flow in both the
space and time directions. The model is quite simple, but it is surprisingly
effective. Our empirical study on Chinese-English translation shows that our
model with proper configuration can improve by 11.7 BLEU upon Groundhog and the
best reported results in the same setting. On the WMT14 English-German task and
the larger WMT14 English-French task, our model achieves results comparable to
the state of the art.
Comment: 10 pages, ACL 201
Memory-enhanced Decoder for Neural Machine Translation
We propose to enhance the RNN decoder in a neural machine translator (NMT)
with external memory, as a natural but powerful extension to the state in the
decoding RNN. This memory-enhanced RNN decoder is called MemDec. At
each time step during decoding, MemDec will read from this memory and write
to this memory once, both with content-based addressing. Unlike the unbounded
memory used in previous work (RNNsearch) to store the representation of the
source sentence, the memory in MemDec is a matrix with pre-determined size
designed to better capture the information important for the decoding process
at each time step. Our empirical study on Chinese-English translation shows
that it can improve by BLEU upon Groundhog and BLEU upon Moses,
yielding the best performance achieved with the same training set.
Comment: 11 page
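The content-based read mentioned in this abstract can be sketched as follows. The abstract does not specify the addressing function, so this is an illustration under assumptions: the name `content_read`, the dot-product similarity, and the `sharpness` parameter are hypothetical and not taken from the paper.

```python
import math

def content_read(memory, key, sharpness=1.0):
    """Read from a fixed-size memory matrix by content similarity.

    memory: list of rows (memory cells), each a list of floats.
    key: query vector with the same width as a memory row.
    Assumption: similarity is a scaled dot product, normalized by softmax.
    """
    scores = [sharpness * sum(m * k for m, k in zip(row, key))
              for row in memory]
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    width = len(memory[0])
    # The read vector is the weighted sum of all memory cells.
    read = [sum(w * row[d] for w, row in zip(weights, memory))
            for d in range(width)]
    return weights, read

memory = [[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]]
weights, read = content_read(memory, [5.0, 0.0])
print(weights, read)
```

Because the weights depend only on how well each cell matches the key, the cell most similar to the query dominates the read vector. A write step could reuse the same weights to decide which cells to update.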
