Controllable Neural Story Plot Generation via Reinforcement Learning
Language-modeling--based approaches to story plot generation attempt to
construct a plot by sampling from a language model (LM) to predict the next
character, word, or sentence to add to the story. LM techniques lack the
ability to receive guidance from the user to achieve a specific goal, resulting
in stories that don't have a clear sense of progression and lack coherence. We
present a reward-shaping technique that analyzes a story corpus and produces
intermediate rewards that are backpropagated into a pre-trained LM in order to
guide the model towards a given goal. Automated evaluations show our technique
can create a model that generates story plots which consistently achieve a
specified goal. Human-subject studies show that the generated stories have more
plausible event ordering than baseline plot generation techniques.
Comment: Published in IJCAI 201
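As a rough illustration of the corpus-derived intermediate rewards described in this abstract (not the authors' implementation, which fine-tunes a sequence-to-sequence LM with policy gradients), the Python sketch below scores each story verb by how often and how closely it precedes a chosen goal verb. The toy corpus, verb names, and scoring formula are illustrative assumptions.

from collections import defaultdict
import math

def intermediate_rewards(corpus, goal_verb):
    """Score each verb by how often, and how closely, it precedes the goal verb."""
    counts = defaultdict(int)       # stories in which the verb precedes the goal
    distances = defaultdict(list)   # event distances from the verb to the goal
    for story in corpus:            # a story is an ordered list of event verbs
        if goal_verb not in story:
            continue
        goal_idx = story.index(goal_verb)
        for i, verb in enumerate(story[:goal_idx]):
            counts[verb] += 1
            distances[verb].append(goal_idx - i)
    rewards = {}
    for verb, n in counts.items():
        avg_dist = sum(distances[verb]) / len(distances[verb])
        # verbs that frequently appear close to the goal earn larger shaped rewards
        rewards[verb] = math.log(1 + n) / avg_dist
    return rewards

corpus = [
    ["meet", "argue", "apologize", "marry"],
    ["meet", "travel", "rescue", "marry"],
    ["fight", "flee", "hide"],
]
print(intermediate_rewards(corpus, "marry"))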
GROVE: A Retrieval-augmented Complex Story Generation Framework with A Forest of Evidence
Conditional story generation is significant in human-machine interaction,
particularly in producing stories with complex plots. While Large language
models (LLMs) perform well on multiple NLP tasks, including story generation,
it is challenging to generate stories with both complex and creative plots.
Existing methods often rely on detailed prompts to guide LLMs to meet target
conditions, which inadvertently restrict the creative potential of the
generated stories. We argue that leveraging information from exemplary
human-written stories facilitates generating more diverse plotlines. Delving
deeper into story details helps build complex and credible plots. In this
paper, we propose a retrieval-auGmented stoRy generation framework with a
fOrest of eVidEnce (GROVE) to
enhance stories' complexity. We build a retrieval repository for target
conditions to produce few-shot examples to prompt LLMs. Additionally, we design
an "asking-why" prompting scheme that extracts a forest of evidence,
compensating for the ambiguities that may occur in the generated
story. This iterative process uncovers underlying story backgrounds. Finally,
we select the most fitting chains of evidence from the evidence forest and
integrate them into the generated story, thereby enhancing the narrative's
complexity and credibility. Experimental results and numerous examples verify
the effectiveness of our method.
Comment: Findings of EMNLP 202
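A minimal sketch of the retrieve-then-"ask why" loop summarized above. Here `llm` is a stand-in for any text-generation call, and the keyword-overlap retriever is an illustrative simplification of the paper's retrieval repository; function names and prompts are assumptions, not the released system.

def llm(prompt: str) -> str:
    return f"<model output for: {prompt[:40]}...>"   # stub for illustration

def retrieve_examples(condition, repository, k=2):
    overlap = lambda ex: len(set(condition.split()) & set(ex.split()))
    return sorted(repository, key=overlap, reverse=True)[:k]

def grove_generate(condition, repository, why_rounds=2):
    examples = retrieve_examples(condition, repository)
    story = llm("Write a story for: " + condition + "\nExamples:\n" + "\n".join(examples))
    evidence_forest = []
    for _ in range(why_rounds):            # iteratively grow chains of evidence
        questions = llm("List ambiguous points in this story and ask why:\n" + story)
        evidence_forest.append(llm("Answer these why-questions:\n" + questions))
    chosen = evidence_forest[-1]           # the paper selects the best-fitting chains
    return llm("Rewrite the story, weaving in this background:\n" + story + "\n" + chosen)

repository = ["A detective story about betrayal.", "A space opera about sacrifice."]
print(grove_generate("A heist story with an unexpected twist.", repository))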
Automated Storytelling via Causal, Commonsense Plot Ordering
Automated story plot generation is the task of generating a coherent sequence
of plot events. Causal relations between plot events are believed to increase
the perception of story and plot coherence. In this work, we introduce the
concept of soft causal relations as causal relations inferred from commonsense
reasoning. We demonstrate C2PO, an approach to narrative generation that
operationalizes this concept through Causal, Commonsense Plot Ordering. Using
human-participant protocols, we evaluate our system against baseline systems
with different commonsense reasoning and inductive biases to
determine the role of soft causal relations in perceived story quality. Through
these studies we also probe how changes in commonsense norms across
storytelling genres affect perceptions of story quality.
Comment: AAAI-21 Camera Ready Version
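To make the notion of "soft" causal relations concrete, the toy below searches over successor events suggested by a commonsense oracle. The hard-coded dictionary stands in for COMET-style commonsense inference, and the breadth-first search is only a loose analogue of C2PO's bidirectional search between given plot points.

COMMONSENSE = {
    "hero finds a map": ["hero follows the map", "hero hides the map"],
    "hero follows the map": ["hero reaches the cave"],
    "hero reaches the cave": ["hero finds the treasure"],
}

def plausible_successors(event):
    return COMMONSENSE.get(event, [])

def soft_causal_chain(start, goal, max_depth=4):
    """Breadth-first search over commonsense-inferred successor events."""
    frontier = [[start]]
    for _ in range(max_depth):
        next_frontier = []
        for path in frontier:
            for nxt in plausible_successors(path[-1]):
                new_path = path + [nxt]
                if nxt == goal:
                    return new_path
                next_frontier.append(new_path)
        frontier = next_frontier
    return None

print(soft_causal_chain("hero finds a map", "hero finds the treasure"))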
CoRRPUS: Codex-Leveraged Structured Representations for Neurosymbolic Story Understanding
Story generation and understanding -- as with all NLG/NLU tasks -- has seen a
surge in neurosymbolic work. Researchers have recognized that, while large
language models (LLMs) have tremendous utility, they can be augmented with
symbolic means to be even better and to make up for any flaws that the neural
networks might have. However, symbolic methods are extremely costly in terms of
the amount of time and expertise needed to create them. In this work, we
capitalize on state-of-the-art Code-LLMs, such as Codex, to bootstrap the use
of symbolic methods for tracking the state of stories and aiding in story
understanding. We show that our CoRRPUS system and abstracted prompting
procedures can beat current state-of-the-art structured LLM techniques on
pre-existing story understanding tasks (bAbI task 2 and Re^3) with minimal hand
engineering. We hope that this work can help highlight the importance of
symbolic representations and specialized prompting for LLMs as these models
require some guidance for performing reasoning tasks properly.
Comment: Accepted to Findings of ACL 202
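The following hand-written snippet shows the kind of structured, code-based state a Code-LLM is prompted to maintain in this style of story understanding (bAbI task 2 asks where an object is after a sequence of moves). The class and method names are illustrative, not CoRRPUS's actual abstractions.

class StoryState:
    def __init__(self):
        self.character_location = {}
        self.object_holder = {}

    def move(self, character, place):
        self.character_location[character] = place

    def pick_up(self, character, obj):
        self.object_holder[obj] = character

    def where_is(self, obj):
        holder = self.object_holder.get(obj)
        return self.character_location.get(holder, "unknown")

state = StoryState()
state.move("Mary", "kitchen")
state.pick_up("Mary", "apple")
state.move("Mary", "garden")
print(state.where_is("apple"))   # -> garden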
GRD: A Generative Approach for Interpretable Reward Redistribution in Reinforcement Learning
A major challenge in reinforcement learning is to determine which
state-action pairs are responsible for future rewards that are delayed. Return
Decomposition offers a solution by redistributing rewards from observed
sequences while preserving policy invariance. While the majority of current
approaches construct the reward redistribution in an uninterpretable manner, we
propose to explicitly model the contributions of state and action from a causal
perspective, resulting in an interpretable return decomposition. In this paper,
we start by studying the role of causal generative models in return
decomposition by characterizing the generation of Markovian rewards and
trajectory-wise long-term return and further propose a framework, called
Generative Return Decomposition (GRD), for policy optimization in delayed
reward scenarios. Specifically, GRD first identifies the unobservable Markovian
rewards and causal relations in the generative process. Then, GRD makes use of
the identified causal generative model to form a compact representation to
train policy over the most favorable subspace of the state space of the agent.
Theoretically, we show that the unobservable Markovian reward function is
identifiable, as well as the underlying causal structure and causal models.
Experimental results show that our method outperforms state-of-the-art methods
and the provided visualization further demonstrates the interpretability of our
method.
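A minimal return-decomposition sketch, assuming linear per-step rewards: fit a reward model whose sum over a trajectory matches the delayed episodic return. This captures only the generic idea behind GRD; the paper additionally learns a causal generative model so that the decomposition stays interpretable.

import numpy as np

rng = np.random.default_rng(0)
T, d, n_traj = 10, 4, 200
true_w = rng.normal(size=d)

# simulate trajectories where only the summed return is observed at the end
feats = rng.normal(size=(n_traj, T, d))          # state-action features
step_rewards_true = feats @ true_w               # hidden per-step rewards
episodic_return = step_rewards_true.sum(axis=1)  # the only observed signal

# least-squares fit: sum_t w . phi(s_t, a_t) should match the episodic return
X = feats.sum(axis=1)                            # (n_traj, d)
w_hat, *_ = np.linalg.lstsq(X, episodic_return, rcond=None)

step_rewards_hat = feats @ w_hat                 # redistributed per-step rewards
print("per-step reward error:", np.abs(step_rewards_hat - step_rewards_true).mean())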
SuperHF: Supervised Iterative Learning from Human Feedback
While large language models demonstrate remarkable capabilities, they often
present challenges in terms of safety, alignment with human values, and
stability during training. Here, we focus on two prevalent methods used to
align these models, Supervised Fine-Tuning (SFT) and Reinforcement Learning
from Human Feedback (RLHF). SFT is simple and robust, powering a host of
open-source models, while RLHF is a more sophisticated method used in top-tier
models like ChatGPT but also suffers from instability and susceptibility to
reward hacking. We propose a novel approach, Supervised Iterative Learning from
Human Feedback (SuperHF), which seeks to leverage the strengths of both
methods. Our hypothesis is two-fold: that the reward model used in RLHF is
critical for efficient data use and model generalization and that the use of
Proximal Policy Optimization (PPO) in RLHF may not be necessary and could
contribute to instability issues. SuperHF replaces PPO with a simple supervised
loss and a Kullback-Leibler (KL) divergence prior. It creates its own training
data by repeatedly sampling a batch of model outputs and filtering them through
the reward model in an online learning regime. We then break down the reward
optimization problem into three components: robustly optimizing the training
rewards themselves, preventing reward hacking (exploitation of the reward model
that degrades model performance), as measured by a novel METEOR similarity
metric, and maintaining good performance on downstream evaluations. Our
experimental results show SuperHF exceeds PPO-based RLHF on the training
objective, easily and favorably trades off high reward with low reward hacking,
improves downstream calibration, and performs comparably on our GPT-4-based
qualitative evaluation scheme, all while being significantly simpler to
implement, highlighting SuperHF's potential as a competitive language model
alignment technique.
Comment: Accepted to the Socially Responsible Language Modelling Research (SoLaR) workshop at NeurIPS 202
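A schematic of the SuperHF loop as described in the abstract: sample completions, keep those the reward model scores highest, and fine-tune on them with a supervised loss plus a KL penalty toward the original model. Every function here is a stub introduced for illustration; a real run would wrap an actual language model and a learned reward model.

import random

def sample_completions(model, prompt, n=8):
    return [f"{prompt} -> candidate {i}" for i in range(n)]

def reward_model(text):
    return random.random()                     # stand-in preference score

def finetune_step(model, batch, reference, kl_weight=0.1):
    # stub for: minimize cross-entropy(batch) + kl_weight * KL(model || reference)
    model["steps"] += 1
    model["last_batch"] = batch
    return model

reference = {"steps": 0, "last_batch": None}
model = dict(reference)
for step in range(3):                          # online: resample from the updated model
    candidates = sample_completions(model, "Explain RLHF simply")
    filtered = sorted(candidates, key=reward_model, reverse=True)[:2]
    model = finetune_step(model, filtered, reference)
print(model["steps"], model["last_batch"])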