Learning Hard Alignments with Variational Inference
There has recently been significant interest in hard attention models for
tasks such as object recognition, visual captioning and speech recognition.
Hard attention can offer benefits over soft attention such as decreased
computational cost, but training hard attention models can be difficult because
of the discrete latent variables they introduce. Previous work used REINFORCE
and Q-learning to approach these issues, but those methods can provide
high-variance gradient estimates and be slow to train. In this paper, we tackle
the problem of learning hard attention for a sequential task using variational
inference methods, specifically the recently introduced VIMCO and NVIL.
Furthermore, we propose a novel baseline that adapts VIMCO to this setting. We
demonstrate our method on a phoneme recognition task in clean and noisy
environments and show that our method outperforms REINFORCE, with the
difference being greater for a more complicated task.
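
To illustrate why a VIMCO-style estimator can have lower variance than plain REINFORCE, here is a minimal sketch of the standard leave-one-out learning signal from the original VIMCO formulation. It is not the novel baseline the abstract proposes, whose details are not given here. The function names and the toy log-weights are assumptions made for illustration only.

```python
import math


def logsumexp(values):
    """Numerically stable log(sum(exp(v))) for a list of floats."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def vimco_learning_signals(log_f):
    """Per-sample learning signals for a K-sample objective (K >= 2).

    log_f[k] is the log-weight of the k-th sampled latent (assumed input).
    Returns (objective, signals), where signals[j] is the objective minus
    a baseline computed without using sample j's own log-weight.
    """
    k = len(log_f)
    if k < 2:
        raise ValueError("need at least two samples for a leave-one-out baseline")
    objective = logsumexp(log_f) - math.log(k)
    signals = []
    for j in range(k):
        others = log_f[:j] + log_f[j + 1:]
        # Stand-in for sample j: mean of the other log-weights,
        # i.e. the geometric mean of the other weights.
        surrogate = sum(others) / (k - 1)
        baseline = logsumexp(others + [surrogate]) - math.log(k)
        signals.append(objective - baseline)
    return objective, signals


# Toy example: one sample is much better than the other two.
objective, signals = vimco_learning_signals([0.0, -5.0, -5.0])
```

In this toy case the first sample receives a positive signal and the other two receive negative ones, so each sample is credited relative to its peers rather than by the raw objective value.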
Reward-Augmented Decoding: Efficient Controlled Text Generation With a Unidirectional Reward Model
While large language models have proven effective in a huge range of
downstream applications, they often generate text that is problematic or lacks
a desired attribute. In this paper, we introduce Reward-Augmented Decoding
(RAD), a text generation procedure that uses a small unidirectional reward
model to encourage a language model to generate text that has certain
properties. Specifically, RAD uses the reward model to score generations as
they are produced and rescales sampling probabilities to favor high-reward
tokens. By using a unidirectional reward model, RAD can cache activations from
prior generation steps to decrease computational overhead. Through experiments
on generating non-toxic and sentiment-controlled text, we demonstrate that RAD
performs best among methods that change only the generation procedure and
matches the performance of state-of-the-art methods that involve re-training
the language model. We further validate that RAD is effective on very large
language models while incurring a minimal computational overhead.
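
The rescaling step described above can be sketched as follows. This is a minimal illustration, assuming the reward model returns one scalar score per candidate token and that the rescaling adds a scaled reward to each logit before the softmax. The names `reward_adjusted_probs` and `beta`, and the toy numbers, are hypothetical and not taken from the abstract.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def reward_adjusted_probs(logits, rewards, beta=1.0):
    """Shift each candidate's logit by beta * reward, then renormalize.

    logits:  language-model scores for the candidate next tokens
    rewards: reward-model scores for the same candidates (assumed given)
    beta:    steering strength (hypothetical knob); 0 leaves the
             language model's distribution unchanged
    """
    if len(logits) != len(rewards):
        raise ValueError("logits and rewards must have the same length")
    adjusted = [z + beta * r for z, r in zip(logits, rewards)]
    return softmax(adjusted)


# Toy example with three candidate tokens.
logits = [2.0, 1.0, 0.5]
rewards = [0.1, 0.9, 0.5]
base = softmax(logits)
steered = reward_adjusted_probs(logits, rewards, beta=5.0)
```

Here the second token has the highest reward, so its probability rises relative to the unsteered distribution, while the language model itself is left untouched.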
