59,872 research outputs found
Reinforcement Learning for Bandit Neural Machine Translation with Simulated Human Feedback
Machine translation is a natural candidate problem for reinforcement learning
from human feedback: users provide quick, dirty ratings on candidate
translations to guide a system to improve. Yet, current neural machine
translation training focuses on expensive human-generated reference
translations. We describe a reinforcement learning algorithm that improves
neural machine translation systems from simulated human feedback. Our algorithm
combines the advantage actor-critic algorithm (Mnih et al., 2016) with the
attention-based neural encoder-decoder architecture (Luong et al., 2015). This
algorithm (a) is well-designed for problems with a large action space and
delayed rewards, (b) effectively optimizes traditional corpus-level machine
translation metrics, and (c) is robust to skewed, high-variance, granular
feedback modeled after actual human behaviors.Comment: 11 pages, 5 figures, In Proceedings of Empirical Methods in Natural
Language Processing (EMNLP) 201
Conditional Random Field Autoencoders for Unsupervised Structured Prediction
We introduce a framework for unsupervised learning of structured predictors
with overlapping, global features. Each input's latent representation is
predicted conditional on the observable data using a feature-rich conditional
random field. Then a reconstruction of the input is (re)generated, conditional
on the latent structure, using models for which maximum likelihood estimation
has a closed-form. Our autoencoder formulation enables efficient learning
without making unrealistic independence assumptions or restricting the kinds of
features that can be used. We illustrate insightful connections to traditional
autoencoders, posterior regularization and multi-view learning. We show
competitive results with instantiations of the model for two canonical NLP
tasks: part-of-speech induction and bitext word alignment, and show that
training our model can be substantially more efficient than comparable
feature-rich baselines
Top-Rank Enhanced Listwise Optimization for Statistical Machine Translation
Pairwise ranking methods are the basis of many widely used discriminative
training approaches for structure prediction problems in natural language
processing(NLP). Decomposing the problem of ranking hypotheses into pairwise
comparisons enables simple and efficient solutions. However, neglecting the
global ordering of the hypothesis list may hinder learning. We propose a
listwise learning framework for structure prediction problems such as machine
translation. Our framework directly models the entire translation list's
ordering to learn parameters which may better fit the given listwise samples.
Furthermore, we propose top-rank enhanced loss functions, which are more
sensitive to ranking errors at higher positions. Experiments on a large-scale
Chinese-English translation task show that both our listwise learning framework
and top-rank enhanced listwise losses lead to significant improvements in
translation quality.Comment: Accepted to CONLL 201
- …