TVStoryGen: A Dataset for Generating Stories with Character Descriptions
We introduce TVStoryGen, a story generation dataset that requires generating
detailed TV show episode recaps from a brief summary and a set of documents
describing the characters involved. Unlike other story generation datasets,
TVStoryGen contains stories that are authored by professional screenwriters
and that feature complex interactions among multiple characters. Generating
stories in TVStoryGen requires drawing relevant information from the lengthy
provided documents about characters based on the brief summary. In addition, we
propose to train reverse models on our dataset for evaluating the faithfulness
of generated stories. We create TVStoryGen from fan-contributed websites, which
allows us to collect 26k episode recaps with 1868.7 tokens on average.
Empirically, we take a hierarchical story generation approach and find that the
neural model that uses oracle content selectors for character descriptions
demonstrates the best performance on automatic metrics, showing the potential
of our dataset to inspire future research on story generation with constraints.
Qualitative analysis shows that the best-performing model sometimes generates
content that is unfaithful to the short summaries, suggesting promising
directions for future work.
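The faithfulness evaluation described above reconstructs the brief summary from a generated story and compares it against the original. In the paper this reconstruction comes from trained neural reverse (story-to-summary) models; as a loose, non-neural stand-in, the sketch below scores faithfulness as unigram-overlap F1 between the original summary and a hypothetical reconstructed one. The example summaries are invented.

```python
from collections import Counter


def overlap_f1(reference, candidate):
    """Unigram overlap F1 between two token lists."""
    ref, cand = Counter(reference), Counter(candidate)
    common = sum((ref & cand).values())
    if common == 0:
        return 0.0
    precision = common / sum(cand.values())
    recall = common / sum(ref.values())
    return 2 * precision * recall / (precision + recall)


def faithfulness(original_summary, reconstructed_summary):
    # In the paper, reconstructed_summary would be produced by a trained
    # reverse (story -> summary) model; here it is simply passed in.
    return overlap_f1(original_summary.lower().split(),
                      reconstructed_summary.lower().split())


score = faithfulness("Homer loses his job at the plant",
                     "Homer quits his job at the nuclear plant")
print(round(score, 2))  # high overlap -> high faithfulness score
```

A learned reverse model replaces the lexical overlap with a model-based comparison, but the scoring principle — reconstruct the summary, then measure agreement — is the same.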
Are NLP Models Good at Tracing Thoughts: An Overview of Narrative Understanding
Narrative understanding involves capturing the author's cognitive processes,
providing insights into their knowledge, intentions, beliefs, and desires.
Although large language models (LLMs) excel in generating grammatically
coherent text, their ability to comprehend the author's thoughts remains
uncertain. This limitation hinders the practical applications of narrative
understanding. In this paper, we conduct a comprehensive survey of narrative
understanding tasks, thoroughly examining their key features, definitions,
taxonomy, associated datasets, training objectives, evaluation metrics, and
limitations. Furthermore, we explore the potential of expanding the
capabilities of modularized LLMs to address novel narrative understanding
tasks. By framing narrative understanding as the retrieval of the author's
imaginative cues that outline the narrative structure, our study introduces a
fresh perspective on enhancing narrative comprehension.
Heroes, Villains, and the In-Between: A Natural Language Processing Approach to Fairy Tales
While great strides have been made with natural language processing (NLP) techniques in the last few decades, there has been a notable lack of research into applying NLP to the genre of fiction. This project seeks to address this gap by considering the use of NLP techniques for the summarization of European fairy tales. This subgenre of fiction is an appropriate starting point for investigation due to its archetypal characters and relatively simple story arcs. My approach is to extract the main characters of texts, along with key descriptors in the form of modifying adjectives and the verbal actions the characters take part in. Through this method, I suggest how we may parse characters into Proppian archetypes by tracking their probabilistic association with certain linguistic occurrences. This classification schema in turn makes possible the broader classification of fairy tales into types. The model has an overall F1 score of 0.77, with the individual parts having F1 scores of 0.89, 0.75, and 0.66 for character retrieval, adjective extraction, and verb extraction, respectively. This project may also be extended, laying key groundwork for further automation of the categorization of characters and, ultimately, of stories themselves.
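The extraction step described above — characters plus their modifying adjectives and the verbs they participate in — can be sketched as a walk over a dependency parse. The sentence below is hand-annotated for illustration; a real pipeline would obtain the parse from a tool such as spaCy, and the Proppian-archetype classification is omitted.

```python
from collections import defaultdict

# Toy dependency parse of "The brave prince rescued the sleeping princess."
# Each token: (text, part of speech, head index, dependency label).
# Hand-annotated for illustration; a parser would produce this.
TOKENS = [
    ("The", "DET", 2, "det"),
    ("brave", "ADJ", 2, "amod"),
    ("prince", "NOUN", 3, "nsubj"),
    ("rescued", "VERB", 3, "ROOT"),
    ("the", "DET", 6, "det"),
    ("sleeping", "ADJ", 6, "amod"),
    ("princess", "NOUN", 3, "dobj"),
]

CHARACTER_NOUNS = {"prince", "princess"}  # assumed character lexicon


def extract_descriptors(tokens, characters):
    """Map each character to its modifying adjectives and governing verbs."""
    profile = defaultdict(lambda: {"adjectives": [], "verbs": []})
    for text, pos, head, dep in tokens:
        # Adjective modifying a character noun ("brave" -> "prince").
        if dep == "amod" and tokens[head][0] in characters:
            profile[tokens[head][0]]["adjectives"].append(text)
        # Character governed by a verb ("prince"/"princess" -> "rescued").
        if text in characters and tokens[head][1] == "VERB":
            profile[text]["verbs"].append(tokens[head][0])
    return dict(profile)


print(extract_descriptors(TOKENS, CHARACTER_NOUNS))
```

Aggregating these adjective and verb profiles across many tales is what would feed the probabilistic association with archetypes that the abstract describes.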
Generating Preview Tables for Entity Graphs
Users are tapping into massive, heterogeneous entity graphs for many
applications. It is challenging to select entity graphs for a particular need,
given abundant datasets from many sources and the oftentimes scarce information
for them. We propose methods to produce preview tables for compact presentation
of important entity types and relationships in entity graphs. The preview
tables assist users in attaining a quick and rough preview of the data. They
can be shown in a limited display space for a user to browse and explore,
before she decides to spend time and resources to fetch and investigate the
complete dataset. We formulate several optimization problems that look for
previews with the highest scores according to intuitive goodness measures,
under various constraints on preview size and distance between preview tables.
The optimization problem under distance constraint is NP-hard. We design a
dynamic-programming algorithm and an Apriori-style algorithm for finding
optimal previews. Results from experiments, comparison with related work and
user studies demonstrated the scoring measures' accuracy and the discovery
algorithms' efficiency.
Comment: This is the camera-ready version of a SIGMOD16 paper. There might be
tiny differences in layout, spacing, and line breaking compared with the
version in the SIGMOD16 proceedings, since we must submit TeX files and use
arXiv to compile the file.
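The optimization flavour described above can be illustrated with a deliberately simplified variant: choose a subset of candidate preview tables that maximizes total goodness score within a display-space budget, solved by a knapsack-style dynamic program. The table names, sizes, scores, and budget below are invented, and the distance constraint that makes the paper's problem NP-hard is omitted; this is a sketch of the dynamic-programming idea, not the paper's algorithm.

```python
def best_preview(tables, budget):
    """0/1 knapsack DP: maximize the total score of chosen preview tables.

    tables: list of (name, display size, goodness score) tuples.
    budget: total available display space.
    Returns (best total score, list of chosen table names).
    """
    # dp[b] = (best score achievable within space b, chosen names)
    dp = [(0, [])] * (budget + 1)
    for name, size, score in tables:
        # Iterate budgets downward so each table is used at most once.
        for b in range(budget, size - 1, -1):
            prev_score, prev_names = dp[b - size]
            if prev_score + score > dp[b][0]:
                dp[b] = (prev_score + score, prev_names + [name])
    return dp[budget]


# Hypothetical candidates: (entity type, display size, goodness score).
CANDIDATES = [("Person", 3, 10), ("Film", 4, 9),
              ("Award", 2, 4), ("Studio", 5, 7)]
print(best_preview(CANDIDATES, 7))
```

With a budget of 7, the program picks the Person and Film tables (score 19) over any combination involving the cheaper but lower-scoring Award table, which is the trade-off a scored, size-constrained preview selection has to make.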
Open-world Story Generation with Structured Knowledge Enhancement: A Comprehensive Survey
Storytelling and narrative are fundamental to human experience, intertwined
with our social and cultural engagement. As such, researchers have long
attempted to create systems that can generate stories automatically. In recent
years, powered by deep learning and massive data resources, automatic story
generation has shown significant advances. However, considerable challenges,
like the need for global coherence in generated stories, still hamper
generative models from reaching the same storytelling ability as human
narrators. To tackle these challenges, many studies seek to inject structured
knowledge into the generation process, an approach referred to as structured
knowledge-enhanced story generation. Incorporating external knowledge can
enhance the logical coherence among story events, achieve better knowledge
grounding, and alleviate over-generalization and repetition problems in
stories. This survey provides the latest and comprehensive review of this
research field: (i) we present a systematic taxonomy regarding how existing
methods integrate structured knowledge into story generation; (ii) we summarize
involved story corpora, structured knowledge datasets, and evaluation metrics;
(iii) we give multidimensional insights into the challenges of
knowledge-enhanced story generation and cast light on promising directions for
future study.
Survey of the State of the Art in Natural Language Generation: Core tasks, applications and evaluation
This paper surveys the current state of the art in Natural Language
Generation (NLG), defined as the task of generating text or speech from
non-linguistic input. A survey of NLG is timely in view of the changes that the
field has undergone over the past decade or so, especially in relation to new
(usually data-driven) methods, as well as new applications of NLG technology.
This survey therefore aims to (a) give an up-to-date synthesis of research on
the core tasks in NLG and the architectures adopted in which such tasks are
organised; (b) highlight a number of relatively recent research topics that
have arisen partly as a result of growing synergies between NLG and other areas
of artificial intelligence; (c) draw attention to the challenges in NLG
evaluation, relating them to similar challenges faced in other areas of Natural
Language Processing, with an emphasis on different evaluation methods and the
relationships between them.
Comment: Published in Journal of AI Research (JAIR), volume 61, pp. 75-170.
118 pages, 8 figures, 1 table.
NarraSum: A Large-Scale Dataset for Abstractive Narrative Summarization
Narrative summarization aims to produce a distilled version of a narrative to
describe its most salient events and characters. Summarizing a narrative is
challenging as it requires an understanding of event causality and character
behaviors. To encourage research in this direction, we propose NarraSum, a
large-scale narrative summarization dataset. It contains 122K narrative
documents, which are collected from plot descriptions of movies and TV episodes
with diverse genres, and their corresponding abstractive summaries. Experiments
show that there is a large performance gap between humans and the
state-of-the-art summarization models on NarraSum. We hope that this dataset
will promote future research in summarization, as well as broader studies of
natural language understanding and generation. The dataset is available at
https://github.com/zhaochaocs/narrasum.
Comment: EMNLP Findings 2022.