Text Generation with Efficient (Soft) Q-Learning
Maximum likelihood estimation (MLE) is the predominant algorithm for training
text generation models. This paradigm relies on direct supervision examples,
which are unavailable in many applications, such as generating adversarial
attacks or generating prompts to control language models. Reinforcement
learning (RL), on the other hand, offers a more flexible solution by allowing
users to plug in arbitrary task metrics as rewards. Yet previous RL algorithms
for text generation, such as policy gradient (on-policy RL) and Q-learning
(off-policy RL), are often notoriously inefficient or unstable to train due to
the large sequence space and the sparse reward received only at the end of
sequences. In this paper, we introduce a new RL formulation for text generation
from the soft Q-learning perspective. It further enables us to draw from the
latest RL advances, such as path consistency learning, to combine the best of
on-/off-policy updates and to learn effectively from sparse reward. We apply the
approach to a wide range of tasks, including learning from noisy/negative
examples, adversarial attacks, and prompt generation. Experiments show our
approach consistently outperforms both task-specialized algorithms and the
previous RL methods. On standard supervised tasks where MLE prevails, our
approach also achieves competitive performance and stability when training
text generation models from scratch.
Comment: Code available at
https://github.com/HanGuo97/soft-Q-learning-for-text-generatio
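As a rough illustration of the formulation (not the authors' released code), a minimal PyTorch sketch of token-level soft Q-learning follows: the generation model's logits are read as Q-values, the soft state value is a temperature-scaled logsumexp over them, and each Q(s_t, a_t) is regressed toward its one-step soft Bellman target. All tensor and function names are illustrative assumptions, and the paper's path-consistency variant is omitted.

    # Minimal sketch of token-level soft Q-learning for text generation.
    # The model's logits are treated as Q-values, the policy is their
    # softmax, and training regresses Q(s_t, a_t) toward the soft Bellman
    # target r_t + gamma * V(s_{t+1}). Names here are illustrative only.
    import torch

    def soft_q_loss(logits, actions, rewards, mask, tau=1.0, gamma=1.0):
        """logits:  (B, T, V) Q-values from the generation model
        actions: (B, T)    sampled token ids
        rewards: (B, T)    per-step reward, often zero except at sequence end
        mask:    (B, T)    1.0 for real tokens, 0.0 for padding
        """
        # Q(s_t, a_t): the logit of the token actually taken.
        q_taken = logits.gather(-1, actions.unsqueeze(-1)).squeeze(-1)
        # Soft state value V(s_t) = tau * logsumexp(Q(s_t, .) / tau).
        v = tau * torch.logsumexp(logits / tau, dim=-1)
        # Bootstrap target r_t + gamma * V(s_{t+1}); V vanishes past the end.
        v_next = torch.cat([v[:, 1:], torch.zeros_like(v[:, :1])], dim=1)
        next_mask = torch.cat([mask[:, 1:], torch.zeros_like(mask[:, :1])], dim=1)
        target = rewards + gamma * v_next * next_mask
        # In practice the target would come from a frozen target network.
        return (mask * (q_taken - target.detach()).pow(2)).sum() / mask.sum()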
Improving Search through A3C Reinforcement Learning based Conversational Agent
We develop a reinforcement-learning-based search assistant that guides users
through a set of actions and a sequence of interactions, enabling them to
realize their intent. Our approach caters to subjective search, where the user
seeks digital assets such as images; this is fundamentally different from
tasks with objective and limited search modalities. Labeled conversational
data is generally not available for such search tasks, and training the agent
through human interactions can be time-consuming. We propose a stochastic
virtual user that impersonates a real user and can be used to sample user
behavior efficiently, accelerating the bootstrapping of the agent. We develop
an A3C-based context-preserving architecture that enables the agent to provide
contextual assistance to the user. We compare the A3C agent with Q-learning
and evaluate its performance by the average rewards and state values it
obtains with the virtual user in validation episodes. Our experiments show
that the agent learns to achieve higher rewards and reach better states.
Comment: 17 pages, 7 figures
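A hedged sketch of the n-step advantage actor-critic (A3C-style) loss a worker might compute on one rollout against the virtual user follows; the rollout format and coefficients are assumptions for illustration, not the paper's exact training setup.

    # Minimal A3C-style loss for one rollout. Hyperparameters and the
    # rollout representation are illustrative assumptions.
    import torch

    def a3c_loss(log_probs, values, entropies, rewards, gamma=0.99,
                 value_coef=0.5, entropy_coef=0.01):
        """log_probs: list of log pi(a_t | s_t) for the actions taken
        values:    list of critic estimates V(s_t)
        entropies: list of policy entropies H(pi(. | s_t))
        rewards:   list of scalar rewards from the (virtual) user
        """
        R = torch.zeros(1)  # a bootstrap value would replace this mid-episode
        policy_loss = value_loss = entropy_term = 0.0
        for t in reversed(range(len(rewards))):
            R = rewards[t] + gamma * R              # discounted return-to-go
            advantage = R - values[t]
            value_loss = value_loss + advantage.pow(2)
            # Detach so the policy gradient does not flow through the critic.
            policy_loss = policy_loss - log_probs[t] * advantage.detach()
            entropy_term = entropy_term + entropies[t]
        # Entropy is subtracted: maximizing it encourages exploration.
        return policy_loss + value_coef * value_loss - entropy_coef * entropy_term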
Transferable Multi-Agent Reinforcement Learning with Dynamic Participating Agents
We study multi-agent reinforcement learning (MARL) with centralized training
and decentralized execution. During the training, new agents may join, and
existing agents may unexpectedly leave the training. In such situations, a
standard deep MARL model must be trained again from scratch, which is very
time-consuming. To tackle this problem, we propose a special network
architecture with a few-shot learning algorithm that allows the number of
agents to vary during centralized training. In particular, when a new agent
joins the centralized training, our few-shot learning algorithm trains its
policy network and value network using a small number of samples; when an agent
leaves the training, the training process of the remaining agents is not
affected. Our experiments show that using the proposed network architecture and
algorithm, model adaptation when new agents join can be 100+ times faster than
the baseline. Our work is applicable to any setting, including cooperative,
competitive, and mixed.
Comment: 10 pages, 7 figures
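One way a centralized critic can accommodate a varying number of agents is a shared per-agent encoder followed by permutation-invariant pooling, so no critic parameter is tied to the team size; the sketch below illustrates this general idea and is not the paper's exact architecture or few-shot procedure.

    # Minimal sketch of a centralized critic that accepts any number of
    # agents: shared per-agent encoder + permutation-invariant pooling.
    # An illustration of the general idea only.
    import torch
    import torch.nn as nn

    class VariableAgentCritic(nn.Module):
        def __init__(self, obs_dim, hidden=64):
            super().__init__()
            # Shared encoder: the same weights serve every agent, so a
            # newly joining agent introduces no new critic parameters.
            self.encoder = nn.Sequential(
                nn.Linear(obs_dim, hidden), nn.ReLU(),
                nn.Linear(hidden, hidden), nn.ReLU(),
            )
            self.value_head = nn.Linear(hidden, 1)

        def forward(self, obs):
            # obs: (batch, n_agents, obs_dim); n_agents may differ per call.
            h = self.encoder(obs)           # (batch, n_agents, hidden)
            pooled = h.mean(dim=1)          # permutation-invariant pooling
            return self.value_head(pooled)  # (batch, 1) joint value estimate

    # The same critic evaluates teams of different sizes without any
    # change to its parameters:
    critic = VariableAgentCritic(obs_dim=8)
    print(critic(torch.randn(4, 3, 8)).shape)  # 3 agents -> torch.Size([4, 1])
    print(critic(torch.randn(4, 5, 8)).shape)  # 5 agents -> torch.Size([4, 1])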