92 research outputs found
Translating Phrases in Neural Machine Translation
Phrases play an important role in natural language understanding and machine
translation (Sag et al., 2002; Villavicencio et al., 2005). However, it is
difficult to integrate them into current neural machine translation (NMT),
which reads and generates sentences word by word. In this work, we propose a
method to translate phrases in NMT by integrating into the encoder-decoder
architecture a phrase memory that stores target phrases from a phrase-based
statistical machine translation (SMT) system. At each decoding step, the
phrase memory is first rewritten by the SMT model, which dynamically
generates relevant target phrases using contextual information provided by
the NMT model. The proposed model then reads the phrase memory and estimates
a probability for each phrase it contains. If phrase generation is chosen,
the NMT decoder selects an appropriate phrase from the memory to perform
phrase translation and updates its decoding state by consuming the words of
the selected phrase. Otherwise, the NMT decoder generates a word from the
vocabulary as a standard NMT decoder does. Experimental results on
Chinese-to-English translation show that the proposed model achieves significant
improvements over the baseline on various test sets.
Comment: Accepted by EMNLP 2017
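To make the decoding step above concrete, here is a minimal sketch of the word-versus-phrase choice. The balance gate, the toy `update_state`, `VOCABULARY`, and all logits are illustrative assumptions, not the paper's actual components:

```python
import math

# Toy stand-ins; a real system would use learned components.
VOCABULARY = ["the", "house", "is", "red", "</s>"]

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    z = sum(exps)
    return [e / z for e in exps]

def update_state(state, word):
    # Hypothetical RNN step: fold the consumed word into the state.
    return state + hash(word) % 97

def decode_step(state, word_logits, phrase_memory, phrase_logits, gate):
    """One decoding step choosing between word and phrase generation.

    gate is a scalar in [0, 1]: the probability mass given to phrases
    (an assumed balancing mechanism for this sketch).
    """
    word_probs = [(1 - gate) * p for p in softmax(word_logits)]
    phrase_probs = [gate * p for p in softmax(phrase_logits)]

    best_w = max(range(len(word_probs)), key=word_probs.__getitem__)
    best_p = max(range(len(phrase_probs)), key=phrase_probs.__getitem__)

    if phrase_probs[best_p] > word_probs[best_w]:
        # Phrase mode: emit the whole phrase, advancing the decoder
        # state once per word it contains.
        phrase = phrase_memory[best_p]
        for word in phrase:
            state = update_state(state, word)
        return state, list(phrase)
    # Word mode: behave like a standard NMT decoder.
    word = VOCABULARY[best_w]
    return update_state(state, word), [word]

state, out = decode_step(
    state=0,
    word_logits=[0.1, 0.2, 0.1, 0.3, 0.0],
    phrase_memory=[["red", "house"], ["the", "house"]],
    phrase_logits=[2.5, 0.4],
    gate=0.6,
)
print(out)  # ['red', 'house'] -- the phrase wins at this step
```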
Chunk-Based Bi-Scale Decoder for Neural Machine Translation
In typical neural machine translation (NMT), the decoder generates a sentence
word by word, packing all linguistic granularities into the same RNN
time-scale. In this paper, we propose a new type of decoder for NMT, which
splits the decoding state into two parts and updates them on two different
time-scales. Specifically, we first predict a chunk time-scale state for
phrasal modeling, on top of which multiple word time-scale states are
generated. In this way, the target sentence is translated hierarchically from
chunks to words, leveraging information at different granularities.
Experiments show that our proposed model significantly improves translation
performance over the
state-of-the-art NMT model.
Comment: Accepted as a short paper by ACL 2017
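The two-time-scale update can be sketched in a few lines. The cells and the fixed chunk length below are toy assumptions standing in for the learned chunk- and word-level RNNs:

```python
# Minimal sketch of a two-time-scale (chunk/word) decoding loop; none
# of these names come from the paper's implementation.

def bi_scale_decode(chunk_cell, word_cell, chunk_len, emit_word,
                    init_state, max_chunks):
    """Chunk states update on a slow time-scale; word states on a fast one."""
    chunk_state = init_state
    output = []
    for _ in range(max_chunks):
        # Slow time-scale: one update per chunk, for phrasal modeling.
        chunk_state = chunk_cell(chunk_state)
        word_state = chunk_state
        # Fast time-scale: several word states under the current chunk.
        for _ in range(chunk_len(chunk_state)):
            word_state = word_cell(word_state)
            output.append(emit_word(word_state))
    return output

print(bi_scale_decode(
    chunk_cell=lambda s: s + 10,   # coarse, chunk-level transition
    word_cell=lambda s: s + 1,     # fine, word-level transition
    chunk_len=lambda s: 2,         # fixed chunk length for the toy run
    emit_word=lambda s: f"w{s}",
    init_state=0,
    max_chunks=3,
))
# ['w11', 'w12', 'w21', 'w22', 'w31', 'w32']
```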
Identifying high-impact sub-structures for convolution kernels in document-level sentiment classification
Convolution kernels support the modeling of complex syntactic information in machine-learning tasks. However, such models are highly sensitive to the type and size of the syntactic structure used. It is therefore an important challenge to automatically identify high-impact sub-structures relevant to a given task. In this paper we present a systematic study investigating (combinations of) sequence and convolution kernels using different types of sub-structures in document-level sentiment classification. We show that minimal sub-structures extracted from constituency and dependency trees, guided by a polarity lexicon, yield a 1.45-point absolute improvement in accuracy over a bag-of-words classifier on a widely used sentiment corpus.
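As a rough illustration of lexicon-guided sub-structure extraction, the sketch below prunes a toy dependency parse down to the edges touching polarity words. The pruning rule and the tiny lexicon are assumptions made for the example; the paper's extraction procedure is more involved:

```python
# Sketch: prune a dependency tree to minimal sub-structures around
# polarity-lexicon words (in spirit only; the exact rules are assumed).

POLARITY = {"great": "+", "awful": "-"}  # toy polarity lexicon

def minimal_substructures(edges, tokens):
    """edges: (head_index, dep_index, label) triples. Keep only the
    edges whose head or dependent is a polarity word."""
    keep = set()
    for h, d, label in edges:
        if tokens[d] in POLARITY or tokens[h] in POLARITY:
            keep.add((tokens[h], label, tokens[d]))
    return keep

tokens = ["the", "plot", "was", "awful"]
edges = [(1, 0, "det"), (3, 1, "nsubj"), (3, 2, "cop")]
print(minimal_substructures(edges, tokens))
# e.g. {('awful', 'nsubj', 'plot'), ('awful', 'cop', 'was')} (set order varies)
```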
Revisiting the Markov Property for Machine Translation
In this paper, we re-examine the Markov property in the context of neural
machine translation. We design a Markov Autoregressive Transformer (MAT) and
undertake a comprehensive assessment of its performance across four WMT
benchmarks. Our findings indicate that MAT with an order larger than 4 can
generate translations with quality on par with that of conventional
autoregressive transformers. In addition, counter-intuitively, we also find
that the advantages of utilizing a higher-order MAT do not specifically
benefit the translation of longer sentences.
Comment: EACL (Findings)
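The core constraint of a k-th order Markov decoder can be pictured as an attention mask in which each position sees only its k most recent positions. How MAT realizes the constraint internally is not specified here, so the mask below is only an assumed rendering of it:

```python
import numpy as np

def markov_mask(seq_len, order):
    """Causal mask where position i attends only to the `order` most
    recent positions (itself included): j in (i - order, i]."""
    i = np.arange(seq_len)[:, None]
    j = np.arange(seq_len)[None, :]
    return (j <= i) & (i - j < order)

# A 4th-order mask over 6 positions; 1 = attention allowed.
print(markov_mask(6, 4).astype(int))
```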
- ā¦