Amortized Noisy Channel Neural Machine Translation
Noisy channel models have been especially effective in neural machine
translation (NMT). However, recent approaches like "beam search and rerank"
(BSR) incur significant computation overhead during inference, making
real-world application infeasible. We study whether it is possible to build
an amortized noisy channel NMT model such that, when we do greedy decoding
during inference, the translation accuracy matches that of BSR in terms of
reward (based on the source-to-target log probability and the target-to-source
log probability) and quality (based on BLEU and BLEURT). We attempt three
approaches to train the new model: knowledge distillation, one-step-deviation
imitation learning, and Q learning. The first approach obtains the noisy
channel signal from a pseudo-corpus, and the latter two approaches aim to
optimize toward a noisy-channel MT reward directly. For all three approaches,
the generated translations fail to achieve rewards comparable to BSR, but the
translation quality approximated by BLEU and BLEURT is similar to the quality
of BSR-produced translations. Additionally, all three approaches speed up
inference by 1-2 orders of magnitude.
Comment: INLG 2022
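To make the reward above concrete, here is a minimal sketch of
noisy-channel scoring and BSR-style reranking, assuming a simple weighted
sum of the two log probabilities; the weights and function names are
illustrative assumptions, not the paper's actual formulation:

    def noisy_channel_reward(log_p_tgt_given_src: float,
                             log_p_src_given_tgt: float,
                             lambda_fwd: float = 0.5,
                             lambda_bwd: float = 0.5) -> float:
        # Combine the source-to-target (direct) and target-to-source
        # (channel) log probabilities; equal weights are an assumption.
        return (lambda_fwd * log_p_tgt_given_src
                + lambda_bwd * log_p_src_given_tgt)

    def bsr_rerank(candidates):
        # candidates: list of (log_p_tgt_given_src, log_p_src_given_tgt)
        # pairs, one per beam hypothesis; return the highest-reward index.
        return max(range(len(candidates)),
                   key=lambda i: noisy_channel_reward(*candidates[i]))

Amortization aims to recover the reranked output with a single greedy
pass, skipping the beam search and the second model call at inference
time.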
Leveraging Implicit Feedback from Deployment Data in Dialogue
We study improving social conversational agents by learning from natural
dialogue between users and a deployed model, without extra annotations. To
implicitly measure the quality of a machine-generated utterance, we leverage
signals like user response length, sentiment, and reaction of the future human
utterances in the collected dialogue episodes. Our experiments use the publicly
released deployment data from BlenderBot (Xu et al., 2023). Human evaluation
indicates improvements in our new models over baseline responses; however, we
find that some proxy signals can lead to more generations with undesirable
properties as well. For example, optimizing for conversation length can lead to
more controversial or unfriendly generations compared to the baseline, whereas
optimizing for positive sentiment or reaction can decrease these behaviors.
Comment: EACL 2024
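As a rough illustration of such implicit signals, the sketch below scores
a bot utterance by the human turn that follows it; the helper names and
the injected sentiment function are assumptions, not the paper's actual
pipeline:

    from typing import Callable, Dict, List

    def proxy_signals(bot_turn_idx: int, turns: List[str],
                      sentiment_fn: Callable[[str], float]) -> Dict[str, float]:
        # turns alternates human/bot utterances; sentiment_fn maps text
        # to a score in [-1, 1] (a stand-in for any sentiment classifier).
        nxt = turns[bot_turn_idx + 1] if bot_turn_idx + 1 < len(turns) else ""
        return {
            # Longer user replies are read as engagement.
            "response_length": float(len(nxt.split())),
            # Positive sentiment in the next human turn rewards this turn.
            "sentiment": sentiment_fn(nxt),
            # Whether the conversation continued after this bot turn.
            "continued": float(bool(nxt)),
        }

Each proxy can then serve as a training signal, with the caveat from the
abstract that optimizing some of them invites undesirable generations.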
Reward Gaming in Conditional Text Generation
To align conditional text generation model outputs with desired behaviors,
there has been an increasing focus on training the model using reinforcement
learning (RL) with reward functions learned from human annotations. Under this
framework, we identify three common cases where high rewards are incorrectly
assigned to undesirable patterns: noise-induced spurious correlation, naturally
occurring spurious correlation, and covariate shift. We show that even though
learned metrics achieve high performance on the distribution of the data used
to train the reward function, the undesirable patterns may be amplified during
RL training of the text generation model. While there has been discussion about
reward gaming in the RL or safety community, in this discussion piece, we would
like to highlight reward gaming in the natural language generation (NLG)
community using concrete conditional text generation examples and discuss
potential fixes and areas for future work.
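As a toy illustration of the first failure case, the sketch below fits a
deliberately naive bag-of-words reward on invented data in which one
mislabeled example ties an arbitrary token to high reward; all data and
names are fabricated purely for illustration:

    from collections import Counter

    # Toy "annotated" pairs of (output_text, human_label). The label
    # noise on the third example ties the token "basically" to reward 1.
    data = [
        ("the report is accurate and complete", 1),
        ("the summary covers all key points", 1),
        ("basically wrong and off topic", 1),   # noisy label: should be 0
        ("irrelevant rambling about sports", 0),
        ("contradicts the source document", 0),
    ]

    token_count, token_pos = Counter(), Counter()
    for text, label in data:
        for tok in set(text.split()):
            token_count[tok] += 1
            token_pos[tok] += label

    def reward(text: str) -> float:
        # Learned "reward": mean label of training outputs containing
        # each token of the candidate text.
        scores = [token_pos[t] / token_count[t]
                  for t in text.split() if token_count[t]]
        return sum(scores) / len(scores) if scores else 0.0

    # An RL-trained generator can game this reward by spamming the
    # spuriously correlated token, despite the text being degenerate.
    print(reward("basically basically basically"))  # prints 1.0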
Testing the General Deductive Reasoning Capacity of Large Language Models Using OOD Examples
Given the intractably large size of the space of proofs, any model that is
capable of general deductive reasoning must generalize to proofs of greater
complexity. Recent studies have shown that large language models (LLMs) possess
some abstract deductive reasoning ability given chain-of-thought prompts.
However, they have primarily been tested on proofs using modus ponens or of a
specific size, and from the same distribution as the in-context examples. To
measure the general deductive reasoning ability of LLMs, we test on a broad set
of deduction rules and measure their ability to generalize from simpler
demonstrations to more complex proofs along multiple axes: depth-, width-,
and compositional generalization. To facilitate systematic exploration, we
construct a new synthetic and programmable reasoning dataset that enables
control over deduction rules and proof complexity. Our experiments on four LLMs
of various sizes and training objectives show that they are able to generalize
to longer and compositional proofs. However, they require explicit
demonstrations to produce hypothetical subproofs, specifically in proof by
cases and proof by contradiction.
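To illustrate what controllable proof complexity can look like, here is a
minimal generator for a modus ponens chain of adjustable depth; the
representation is an invented stand-in, not the dataset's actual format:

    def modus_ponens_chain(depth: int) -> dict:
        # Premises: a starting fact A0 plus implications A0 -> A1 -> ...,
        # so deriving the conclusion takes exactly `depth` deduction steps.
        premises = ["A0"] + [f"A{i} -> A{i+1}" for i in range(depth)]
        steps = [f"from A{i} and A{i} -> A{i+1}, conclude A{i+1}"
                 for i in range(depth)]
        return {"premises": premises, "proof": steps,
                "conclusion": f"A{depth}"}

    example = modus_ponens_chain(3)
    print(example["conclusion"])  # "A3", reached via 3 modus ponens steps

Width generalization would instead add distractor premises, and
compositional generalization would mix several deduction rules within a
single proof.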