IMDB Spoiler Dataset
User-generated reviews are often our first point of contact when we consider
watching a movie or a TV show. However, beyond telling us the qualitative
aspects of the media we want to consume, reviews may inevitably contain
undesired revelatory information (i.e. 'spoilers') such as the surprising fate
of a character in a movie, or the identity of a murderer in a crime-suspense
movie, etc. In this paper, we present a high-quality movie-review based spoiler
dataset to tackle the problem of spoiler detection and describe various
research questions it can answer.
Detecting Spoilers in Movie Reviews with External Movie Knowledge and User Networks
Online movie review platforms are providing crowdsourced feedback for the
film industry and the general public, while spoiler reviews greatly compromise
user experience. Although preliminary research efforts were made to
automatically identify spoilers, they merely focus on the review content
itself, while robust spoiler detection requires putting the review into the
context of facts and knowledge regarding movies, user behavior on film review
platforms, and more. In light of these challenges, we first curate a
large-scale network-based spoiler detection dataset LCS and a comprehensive and
up-to-date movie knowledge base UKM. We then propose MVSD, a novel Multi-View
Spoiler Detection framework that takes into account the external knowledge
about movies and user activities on movie review platforms. Specifically, MVSD
constructs three interconnecting heterogeneous information networks to model
diverse data sources and their multi-view attributes, while we design and
employ a novel heterogeneous graph neural network architecture for spoiler
detection as node-level classification. Extensive experiments demonstrate that
MVSD advances the state-of-the-art on two spoiler detection datasets, while the
introduction of external knowledge and user interactions helps ground robust
spoiler detection. Our data and code are available at
https://github.com/Arthur-Heng/Spoiler-Detection
Comment: EMNLP 202
An Automated Pipeline for Character and Relationship Extraction from Readers' Literary Book Reviews on Goodreads.com
Reader reviews of literary fiction on social media, especially those in
persistent, dedicated forums, create and are in turn driven by underlying
narrative frameworks. In their comments about a novel, readers generally
include only a subset of characters and their relationships, thus offering a
limited perspective on that work. Yet in aggregate, these reviews capture an
underlying narrative framework comprised of different actants (people, places,
things), their roles, and interactions that we label the "consensus narrative
framework". We represent this framework in the form of an actant-relationship
story graph. Extracting this graph is a challenging computational problem,
which we pose as a latent graphical model estimation problem. Posts and reviews
are viewed as samples of subgraphs/networks of the hidden narrative framework.
Inspired by the qualitative narrative theory of Greimas, we formulate a
graphical generative Machine Learning (ML) model where nodes represent actants,
and multi-edges and self-loops among nodes capture context-specific
relationships. We develop a pipeline of interlocking automated methods to
extract key actants and their relationships, and apply it to thousands of
reviews and comments posted on Goodreads.com. We manually derive the ground
truth narrative framework from SparkNotes, and then use word embedding tools to
compare relationships in ground truth networks with our extracted networks. We
find that our automated methodology generates highly accurate consensus
narrative frameworks: for our four target novels, with approximately 2900
reviews per novel, we report average coverage/recall of important relationships
of > 80% and an average edge detection rate of > 89%. These extracted narrative
frameworks can generate insight into how people (or classes of people) read and
how they recount what they have read to others.
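To make the coverage/recall figure concrete, the following is a minimal sketch of how the recall of ground-truth relationships might be computed. It is not the authors' evaluation code: the abstract describes comparing relationships with word embedding tools, whereas this sketch assumes exact matching of undirected actant pairs and ignores relationship labels. The function name `edge_recall` and the character names are hypothetical.

```python
def edge_recall(ground_truth, extracted):
    """Fraction of ground-truth edges that also appear in the extracted graph.

    Assumption: an edge is an unordered pair of actant names, matched exactly.
    Relationship labels and embedding-based matching are deliberately omitted.
    """
    gt = {frozenset(edge) for edge in ground_truth}
    ex = {frozenset(edge) for edge in extracted}
    if not gt:
        return 0.0
    return len(gt & ex) / len(gt)


# Hypothetical toy graphs, not data from the paper.
ground_truth = [("Alice", "Bob"), ("Alice", "Carol"), ("Dave", "Carol"), ("Alice", "Eve")]
extracted = [("Bob", "Alice"), ("Carol", "Alice"), ("Eve", "Alice"), ("Bob", "Eve")]

recall = edge_recall(ground_truth, extracted)
print(f"edge recall: {recall:.2f}")
```

Using `frozenset` makes the comparison order-insensitive, so ("Bob", "Alice") counts as a match for ("Alice", "Bob").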
The Tag Genome Dataset for Books
Attaching tags to items, such as books or movies, is found in many online systems. While a majority of these systems use binary tags, continuous item-tag relevance scores, such as those in tag genome, offer richer descriptions of item content. For example, tag genome for movies assigns the tag "gangster" to the movie "The Godfather (1972)" with a score of 0.93 on a scale of 0 to 1. Tag genome has received considerable attention in recommender systems research and has been used in a wide variety of studies, from investigating the effects of recommender systems on users to generating ideas for movies that appeal to certain user groups. In this paper, we present tag genome for books, a dataset containing book-tag relevance scores, where a significant number of tags overlap with those from tag genome for movies. To generate our dataset, we designed a survey based on popular books and tags from the Goodreads dataset. In our survey, we asked users to provide ratings for how well tags applied to books. We generated book-tag relevance scores based on user ratings along with features from the Goodreads dataset. In addition to being used to create book recommender systems, tag genome for books can be combined with the tag genome for movies to tackle cross-domain problems, such as recommending books based on movie preferences.
Peer reviewed
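As an illustration of the cross-domain use mentioned above, here is a minimal sketch that ranks books against a movie by cosine similarity over the tags the two share. It is an assumption-laden toy, not the dataset's API or the authors' method: the function names, book titles, and all scores other than the 0.93 "gangster" value quoted in the abstract are hypothetical.

```python
import math


def cosine_on_shared_tags(a, b):
    """Cosine similarity between two tag-relevance dicts, restricted to shared tags."""
    shared = set(a) & set(b)
    if not shared:
        return 0.0
    dot = sum(a[t] * b[t] for t in shared)
    norm_a = math.sqrt(sum(a[t] ** 2 for t in shared))
    norm_b = math.sqrt(sum(b[t] ** 2 for t in shared))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def recommend_books(movie_tags, books, k=2):
    """Return the k book titles whose tag profiles are most similar to the movie's."""
    ranked = sorted(
        books,
        key=lambda title: cosine_on_shared_tags(movie_tags, books[title]),
        reverse=True,
    )
    return ranked[:k]


# Relevance scores on a 0-to-1 scale. Only "gangster": 0.93 comes from the
# abstract; every other value and title is a made-up placeholder.
movie_profile = {"gangster": 0.93, "romance": 0.10}
books = {
    "Book A": {"gangster": 0.90, "romance": 0.05},
    "Book B": {"gangster": 0.05, "romance": 0.95},
    "Book C": {"gangster": 0.50, "romance": 0.50},
}

print(recommend_books(movie_profile, books, k=2))
```

Restricting the comparison to overlapping tags reflects the abstract's point that only some tags are common to both datasets; a real system would likely also weight tags or handle sparse overlap more carefully.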