22,726 research outputs found
Learning Semantic Representations for the Phrase Translation Model
This paper presents a novel semantic-based phrase translation model. A pair
of source and target phrases are projected into continuous-valued vector
representations in a low-dimensional latent semantic space, where their
translation score is computed by the distance between the pair in this new
space. The projection is performed by a multi-layer neural network whose
weights are learned on parallel training data. The learning is aimed to
directly optimize the quality of end-to-end machine translation results.
Experimental evaluation has been performed on two Europarl translation tasks,
English-French and German-English. The results show that the new semantic-based
phrase translation model significantly improves the performance of a
state-of-the-art phrase-based statistical machine translation sys-tem, leading
to a gain of 0.7-1.0 BLEU points
A Neural Network Approach to Context-Sensitive Generation of Conversational Responses
We present a novel response generation system that can be trained end to end
on large quantities of unstructured Twitter conversations. A neural network
architecture is used to address sparsity issues that arise when integrating
contextual information into classic statistical models, allowing the system to
take into account previous dialog utterances. Our dynamic-context generative
models show consistent gains over both context-sensitive and
non-context-sensitive Machine Translation and Information Retrieval baselines.Comment: A. Sordoni, M. Galley, M. Auli, C. Brockett, Y. Ji, M. Mitchell,
J.-Y. Nie, J. Gao, B. Dolan. 2015. A Neural Network Approach to
Context-Sensitive Generation of Conversational Responses. In Proc. of
NAACL-HLT. Pages 196-20
Learning Phrase Representations using RNN Encoder-Decoder for Statistical Machine Translation
In this paper, we propose a novel neural network model called RNN
Encoder-Decoder that consists of two recurrent neural networks (RNN). One RNN
encodes a sequence of symbols into a fixed-length vector representation, and
the other decodes the representation into another sequence of symbols. The
encoder and decoder of the proposed model are jointly trained to maximize the
conditional probability of a target sequence given a source sequence. The
performance of a statistical machine translation system is empirically found to
improve by using the conditional probabilities of phrase pairs computed by the
RNN Encoder-Decoder as an additional feature in the existing log-linear model.
Qualitatively, we show that the proposed model learns a semantically and
syntactically meaningful representation of linguistic phrases.Comment: EMNLP 201
Compositional Morphology for Word Representations and Language Modelling
This paper presents a scalable method for integrating compositional
morphological representations into a vector-based probabilistic language model.
Our approach is evaluated in the context of log-bilinear language models,
rendered suitably efficient for implementation inside a machine translation
decoder by factoring the vocabulary. We perform both intrinsic and extrinsic
evaluations, presenting results on a range of languages which demonstrate that
our model learns morphological representations that both perform well on word
similarity tasks and lead to substantial reductions in perplexity. When used
for translation into morphologically rich languages with large vocabularies,
our models obtain improvements of up to 1.2 BLEU points relative to a baseline
system using back-off n-gram models.Comment: Proceedings of the 31st International Conference on Machine Learning
(ICML
Self-Adaptive Hierarchical Sentence Model
The ability to accurately model a sentence at varying stages (e.g.,
word-phrase-sentence) plays a central role in natural language processing. As
an effort towards this goal we propose a self-adaptive hierarchical sentence
model (AdaSent). AdaSent effectively forms a hierarchy of representations from
words to phrases and then to sentences through recursive gated local
composition of adjacent segments. We design a competitive mechanism (through
gating networks) to allow the representations of the same sentence to be
engaged in a particular learning task (e.g., classification), therefore
effectively mitigating the gradient vanishing problem persistent in other
recursive models. Both qualitative and quantitative analysis shows that AdaSent
can automatically form and select the representations suitable for the task at
hand during training, yielding superior classification performance over
competitor models on 5 benchmark data sets.Comment: 8 pages, 7 figures, accepted as a full paper at IJCAI 201
- …