Joint Training for Neural Machine Translation Models with Monolingual Data
Monolingual data have been demonstrated to be helpful in improving
translation quality of both statistical machine translation (SMT) systems and
neural machine translation (NMT) systems, especially in resource-poor or domain
adaptation tasks where parallel data are not rich enough. In this paper, we
propose a novel approach to better leverage monolingual data for neural
machine translation by jointly learning source-to-target and target-to-source
NMT models for a language pair with a joint EM optimization method. The
training process starts with two initial NMT models pre-trained on parallel
data for each direction, and these two models are iteratively updated by
incrementally decreasing translation losses on training data. In each iteration
step, both NMT models are first used to translate monolingual data from one
language to the other, forming pseudo-training data of the other NMT model.
Then two new NMT models are learned from the parallel data together with the
pseudo-training data. Both NMT models are expected to improve, so better
pseudo-training data can be generated in the next step. Experimental results on
Chinese-English and English-German translation tasks show that our approach can
simultaneously improve translation quality of source-to-target and
target-to-source models, significantly outperforming strong baseline systems
that are enhanced with monolingual data for model training, including
back-translation. Comment: Accepted by AAAI 2018
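
The abstract above describes an iterative, EM-style training loop. Below is a minimal sketch of that loop under assumed interfaces: train_nmt and translate are hypothetical placeholders rather than the authors' implementation, and a real system would score, weight, or filter the pseudo-parallel data instead of simply concatenating it.

```python
def joint_train(parallel_data, mono_src, mono_tgt, train_nmt, translate, n_iters=3):
    """Sketch of joint training of source-to-target (s2t) and target-to-source (t2s) NMT.

    parallel_data: list of (src, tgt) sentence pairs
    mono_src, mono_tgt: monolingual sentences in the source / target language
    train_nmt: callable(list of (input, output) pairs) -> model   (hypothetical)
    translate: callable(model, list of sentences) -> translations (hypothetical)
    """
    # Initialization: pre-train both directions on the parallel data only.
    s2t = train_nmt(parallel_data)
    t2s = train_nmt([(t, s) for s, t in parallel_data])

    for _ in range(n_iters):
        # Each model translates monolingual text of its own source language,
        # forming pseudo-training data for the *other* direction.
        pseudo_for_t2s = list(zip(translate(s2t, mono_src), mono_src))  # (synthetic tgt, real src)
        pseudo_for_s2t = list(zip(translate(t2s, mono_tgt), mono_tgt))  # (synthetic src, real tgt)

        # Retrain each direction on real parallel data plus its pseudo data.
        s2t = train_nmt(parallel_data + pseudo_for_s2t)
        t2s = train_nmt([(t, s) for s, t in parallel_data] + pseudo_for_t2s)

    return s2t, t2s
```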
Agent Behavior Prediction and Its Generalization Analysis
Machine learning algorithms have been applied to predict agent behaviors in
real-world dynamic systems, such as advertiser behaviors in sponsored search
and worker behaviors in crowdsourcing. The behavior data in these systems are
generated by live agents: once the systems change due to the adoption of the
prediction models learnt from the behavior data, agents will observe and
respond to these changes by changing their own behaviors accordingly. As a
result, the behavior data will evolve and will not be identically and
independently distributed, posing great challenges to the theoretical analysis
of machine learning algorithms for behavior prediction. To tackle this
challenge, in this paper we propose to use a Markov Chain in Random Environments
(MCRE) to describe the behavior data, and we perform generalization analysis of
the machine learning algorithms on this basis. Since the one-step transition
probability matrix of MCRE depends on both previous states and the random
environment, conventional techniques for generalization analysis cannot be
directly applied. To address this issue, we propose a novel technique that
transforms the original MCRE into a higher-dimensional time-homogeneous Markov
chain. The new Markov chain involves more variables but is more regular, and
thus easier to deal with. We prove the convergence of the new Markov chain as
time approaches infinity. Then we prove a generalization bound for the machine
learning algorithms on the behavior data generated by the new Markov chain,
which depends on both the Markovian parameters and the covering number of the
function class obtained by composing the loss function for behavior prediction
with the behavior prediction model. To the best of our knowledge, this is the
first work that performs generalization analysis on data generated by complex
processes in real-world dynamic systems.
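
The key construction in the abstract above is pairing the agent state with the environment state so that the combined process becomes a time-homogeneous Markov chain on a larger (product) state space. The following numpy sketch illustrates that state-augmentation idea under an assumed factorization; the shapes, names, and the assumption that the environment evolves as its own Markov chain independently of the agent state are illustrative, not the paper's exact formulation.

```python
import numpy as np

def augment(P, Q):
    """Build a time-homogeneous transition matrix over (state, environment) pairs.

    P: array of shape (E, S, S); P[e, s, s2] = Pr(next state s2 | state s, env e)
    Q: array of shape (E, E);    Q[e, e2]    = Pr(next env e2   | env e)
    Returns T of shape (S*E, S*E), indexed by the flattened pair s * E + e.
    """
    E, S, _ = P.shape
    T = np.zeros((S * E, S * E))
    for e in range(E):
        for s in range(S):
            for e2 in range(E):
                for s2 in range(S):
                    # Assumed factorization: the state moves under the current
                    # environment e, and the environment moves independently.
                    T[s * E + e, s2 * E + e2] = P[e, s, s2] * Q[e, e2]
    # Each row sums to one, so T defines an ordinary time-homogeneous chain
    # on the enlarged state space, to which standard analysis tools apply.
    return T
```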