Detecting Spoilers in Movie Reviews with External Movie Knowledge and User Networks
Online movie review platforms provide crowdsourced feedback for the
film industry and the general public, but spoiler reviews greatly compromise
user experience. Although preliminary research efforts have been made to
automatically identify spoilers, they focus merely on the review content
itself, whereas robust spoiler detection requires putting the review into the
context of facts and knowledge regarding movies, user behavior on film review
platforms, and more. In light of these challenges, we first curate a
large-scale network-based spoiler detection dataset LCS and a comprehensive and
up-to-date movie knowledge base UKM. We then propose MVSD, a novel Multi-View
Spoiler Detection framework that takes into account the external knowledge
about movies and user activities on movie review platforms. Specifically, MVSD
constructs three interconnecting heterogeneous information networks to model
diverse data sources and their multi-view attributes, while we design and
employ a novel heterogeneous graph neural network architecture for spoiler
detection as node-level classification. Extensive experiments demonstrate that
MVSD advances the state-of-the-art on two spoiler detection datasets, while the
introduction of external knowledge and user interactions helps ground robust
spoiler detection. Our data and code are available at
https://github.com/Arthur-Heng/Spoiler-Detection
Comment: EMNLP 202
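The framework described above performs spoiler detection as node-level classification on interconnected heterogeneous graphs. As a rough, self-contained sketch only — the three node types follow the abstract, but the feature sizes, edge lists, per-relation weights, and the R-GCN-style mean aggregation are invented stand-ins rather than MVSD's actual architecture — one heterogeneous message-passing layer could look like:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy typed nodes; the three types come from the abstract, the sizes do not.
n_nodes = {"review": 4, "user": 3, "movie": 2}
d = 8
x = {t: rng.normal(size=(n, d)) for t, n in n_nodes.items()}

# Typed edges: (src_type, dst_type) -> list of (src_index, dst_index) pairs.
edges = {
    ("user", "review"):  [(0, 0), (1, 1), (2, 2), (0, 3)],   # user wrote review
    ("movie", "review"): [(0, 0), (0, 1), (1, 2), (1, 3)],   # review discusses movie
}

# One weight matrix per relation plus a self-loop matrix (R-GCN-style layer).
W_rel = {rel: rng.normal(scale=0.1, size=(d, d)) for rel in edges}
W_self = rng.normal(scale=0.1, size=(d, d))

def hetero_layer(x, edges):
    out = {t: h @ W_self for t, h in x.items()}           # self-loop term
    for (src_t, dst_t), pairs in edges.items():
        agg = np.zeros_like(x[dst_t])
        deg = np.zeros((x[dst_t].shape[0], 1))
        for s_i, d_i in pairs:                            # mean-aggregate neighbours
            agg[d_i] += x[src_t][s_i]
            deg[d_i] += 1
        out[dst_t] += (agg / np.maximum(deg, 1)) @ W_rel[(src_t, dst_t)]
    return {t: np.maximum(h, 0) for t, h in out.items()}  # ReLU

h = hetero_layer(x, edges)
# Node-level classification head on review nodes: spoiler vs. non-spoiler logits.
W_cls = rng.normal(scale=0.1, size=(d, 2))
logits = h["review"] @ W_cls
print(logits.shape)
```

Stacking such layers and reading logits off the review nodes yields a per-review spoiler score that mixes content, user, and movie information.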
Structure-aware narrative summarization from multiple views
Narratives, such as movies and TV shows, provide a testbed for addressing a variety of challenges in the field of artificial intelligence. They are examples of complex stories where characters and events interact in many ways. Inferring what is happening in a narrative requires modeling long-range dependencies between events, understanding commonsense knowledge and accounting for non-linearities in the presentation of the story. Moreover, narratives are usually long (i.e., there are hundreds of pages in a screenplay and thousands of frames in a video) and cannot be easily processed by standard neural architectures. Movies and TV episodes also include information from multiple sources (i.e., video, audio, text) that are complementary to inferring high-level events and their interactions. Finally, creating large-scale multimodal datasets with narratives containing long videos and aligned textual data is challenging, resulting in small datasets that require data efficient approaches.
Most prior work that analyzes narratives does not consider the above challenges all at once. In most cases, text-only approaches focus on full-length narratives with complex semantics and address tasks such as question-answering and summarization, or multimodal approaches are limited to short videos with simpler semantics (e.g., isolated actions and local interactions). In this thesis, we combine these two different directions in addressing narrative summarization. We use all input modalities (i.e., video, audio, text), consider full-length narratives and perform the task of narrative summarization both in a video-to-video setting (i.e., video summarization, trailer generation) and a video-to-text setting (i.e., multimodal abstractive summarization).
We hypothesize that information about the narrative structure of movies and TV episodes can facilitate summarizing them. We introduce the task of Turning Point identification and provide a corresponding dataset called TRIPOD as a means of analyzing the narrative structure of movies. According to screenwriting theory, turning points (e.g., change of plans, major setback, climax) are crucial narrative moments within a movie or TV episode: they define the plot structure and determine its progression and thematic units. We validate that narrative structure contributes to extractive screenplay summarization by testing our hypothesis on a dataset containing TV episodes and summary-specific labels.
We further hypothesize that movies should not be viewed as a sequence of scenes from a screenplay or shots from a video and instead be modelled as sparse graphs, where nodes are scenes or shots and edges denote strong semantic relationships between them. We utilize multimodal information for creating movie graphs in the latent space, and find that both graph-related and multimodal information help contextualization and boost performance on extractive summarization.
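The sparse-graph view above can be illustrated with a minimal construction: embed each scene, connect it to its k most similar neighbours, and symmetrise. The random embeddings and the fixed top-k rule are hypothetical placeholders; the thesis learns the graph in a latent multimodal space rather than by simple thresholding.

```python
import numpy as np

rng = np.random.default_rng(1)
# Hypothetical latent scene embeddings; in the thesis these are learned from
# multimodal (video, audio, text) input, here they are random placeholders.
emb = rng.normal(size=(6, 16))
emb /= np.linalg.norm(emb, axis=1, keepdims=True)

sim = emb @ emb.T               # cosine similarity between scene pairs
np.fill_diagonal(sim, -np.inf)  # forbid self-edges

# Keep only the strongest semantic relations: top-k neighbours per scene.
k = 2
adj = np.zeros(sim.shape, dtype=bool)
for i in range(len(sim)):
    adj[i, np.argsort(sim[i])[-k:]] = True
adj |= adj.T                    # symmetrise: relations are undirected

print(int(adj.sum()) // 2, "edges among", len(adj), "scenes")
```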
Moving one step further, we also address the task of trailer moment identification, which can be viewed as a specific instantiation of narrative summarization. We decompose this task, which is challenging and subjective, into two simpler ones: narrative structure identification, defined again by turning points, and sentiment prediction. We propose a graph-based unsupervised algorithm that uses interpretable criteria for retrieving trailer shots, and convert it into an interactive tool with a human in the loop for trailer creation. Semi-automatic trailer shot selection exhibits comparable performance to fully manual selection according to human judges, while minimizing processing time.
After identifying salient content in narratives, we next attempt to produce abstractive textual summaries (i.e., video-to-text). We hypothesize that multimodal information is directly important for generating textual summaries, apart from contributing to content selection. To that end, we propose a parameter-efficient way of incorporating multimodal information into a pre-trained textual summarizer, while training only 3.8% of model parameters, and demonstrate the importance of multimodal information for generating high-quality and factual summaries. The findings of this thesis underline the need to focus on realistic and multimodal settings when addressing narrative analysis and generation tasks.
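A parameter-efficient design of the general kind described above can be sketched as a frozen text pathway plus a small trainable projection of the visual features. Everything here (the dimensions, the bottleneck adapter, the additive fusion) is an assumed illustration rather than the thesis's actual module, and this toy's trainable fraction differs from the reported 3.8%:

```python
import numpy as np

rng = np.random.default_rng(2)
d_text, d_vis, d_bottleneck = 64, 32, 4

# Frozen stand-in for the pre-trained summarizer's weights (kept fixed).
frozen = {"W_text": rng.normal(size=(d_text, d_text))}
# Small trainable adapter projecting visual features into the text space
# through a bottleneck -- the only parameters that would receive gradients.
adapter = {
    "down": rng.normal(scale=0.1, size=(d_vis, d_bottleneck)),
    "up":   rng.normal(scale=0.1, size=(d_bottleneck, d_text)),
}

def fuse(text_h, vis_h):
    # Inject visual information additively, keeping the frozen path intact.
    return text_h @ frozen["W_text"] + np.maximum(vis_h @ adapter["down"], 0) @ adapter["up"]

n_trainable = sum(w.size for w in adapter.values())
n_total = n_trainable + sum(w.size for w in frozen.values())
print(f"trainable fraction: {n_trainable / n_total:.1%}")
```

Only the adapter matrices would be updated during fine-tuning, which is what keeps the trainable-parameter count so small relative to the full model.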
A disentangled adversarial neural topic model for separating opinions from plots in user reviews
The flexibility of the inference process in Variational Autoencoders (VAEs) has recently led to revising traditional probabilistic topic models, giving rise to Neural Topic Models (NTMs). Although these approaches have achieved significant results, surprisingly little work has been done on how to disentangle the latent topics. Existing topic models, when applied to reviews, may extract topics associated with writers' subjective opinions mixed with those related to factual descriptions such as plot summaries in movie and book reviews. It is thus desirable to automatically separate opinion topics from plot/neutral ones, enabling better interpretability. In this paper, we propose a neural topic model combined with adversarial training to disentangle opinion topics from plot and neutral ones. We conduct an extensive experimental assessment, introducing a new collection of movie and book reviews paired with their plots, namely the MOBO dataset, showing improved coherence and variety of topics, a consistent disentanglement rate, and sentiment classification performance superior to other supervised topic models.
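The adversarial idea can be sketched abstractly: a discriminator tries to tell opinionated reviews from neutral plot texts using only the document-topic proportions, and the topic model is trained against that signal. The logistic probe and the sign-flipped loss below are a hand-rolled toy, not the paper's actual architecture or training procedure:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def bce(p, y):
    # Binary cross-entropy between predictions p and labels y.
    return -(y * np.log(p) + (1 - y) * np.log(1 - p)).mean()

rng = np.random.default_rng(3)
n_docs, n_topics = 8, 10
theta = rng.dirichlet(np.ones(n_topics), size=n_docs)  # document-topic proportions
is_review = rng.integers(0, 2, size=n_docs)            # 1 = review, 0 = plot text

# Adversary: a logistic probe predicting the document source from theta.
w = rng.normal(scale=0.1, size=n_topics)
p = sigmoid(theta @ w)
disc_loss = bce(p, is_review)
# The topic model receives the *negated* discriminator loss, pushing theta
# toward proportions from which the probe cannot separate opinions and plots.
topic_model_adv_loss = -disc_loss
print(round(float(disc_loss), 4))
```

In a full implementation both sides would be updated alternately (or via gradient reversal), so that topics carrying opinion-specific signal end up separated from plot/neutral ones.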
Automatic movie analysis and summarisation
Automatic movie analysis is the task of applying Machine Learning methods to
screenplays, movie scripts, and motion pictures to facilitate or enable various
tasks throughout the entirety of a movie's life-cycle. From helping with making
informed decisions about a new movie script with respect to aspects such as its originality,
similarity to other movies, or even commercial viability, all the way to offering
consumers new and interesting ways of viewing the final movie, many stages in the
life-cycle of a movie stand to benefit from Machine Learning techniques that promise
to reduce human effort, time, or both. Within this field of automatic movie analysis,
this thesis addresses the task of summarising the content of screenplays, enabling users
at any stage to gain a broad understanding of a movie from greatly reduced data. The
contributions of this thesis are four-fold: (i) We introduce ScriptBase, a new large-scale
data set of original movie scripts, annotated with additional meta-information such as
genre and plot tags, cast information, and log- and tag-lines. To our knowledge, ScriptBase
is the largest data set of its kind, containing scripts and information for almost
1,000 Hollywood movies. (ii) We present a dynamic summarisation model for the
screenplay domain, which allows for extraction of highly informative and important
scenes from movie scripts. The extracted summaries allow for the content of the original
script to stay largely intact and provide the user with its important parts, while
greatly reducing the script-reading time. (iii) We extend our summarisation model
to capture additional modalities beyond the screenplay text. The model is rendered
multi-modal by introducing visual information obtained from the actual movie and by
extracting scenes from the movie, allowing users to generate visual summaries of motion
pictures. (iv) We devise a novel end-to-end neural network model for generating
natural language screenplay overviews. This model enables the user to generate short
descriptive and informative texts that capture certain aspects of a movie script, such as
its genres, approximate content, or style, allowing them to gain a fast, high-level understanding
of the screenplay. Multiple automatic and human evaluations were carried
out to assess the performance of our models, demonstrating that they are well-suited
for the tasks set out in this thesis, outperforming strong baselines. Furthermore, the
ScriptBase data set has started to gain traction, and is currently used by a number of
other researchers in the field to tackle various tasks relating to screenplays and their
analysis.
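At selection time, the extractive summarisation contribution (ii) reduces to choosing a budgeted subset of high-scoring scenes while preserving screenplay order. A minimal sketch with made-up scene scores (the thesis learns such scores from the script itself):

```python
# Hypothetical importance scores per scene, in screenplay order.
scores = [0.12, 0.81, 0.33, 0.95, 0.07, 0.64, 0.40]
budget = 3  # number of scenes the summary may contain

# Pick the highest-scoring scenes, then present them in screenplay order so
# the extracted summary keeps the original narrative flow largely intact.
chosen = sorted(sorted(range(len(scores)), key=lambda i: scores[i])[-budget:])
print(chosen)  # → [1, 3, 5]
```

Keeping the selected scenes in their original order is what lets the summary read as a condensed version of the script rather than a ranked list.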
Probabilistic neural topic models for text understanding
Making sense of text remains one of the most fascinating and open challenges, thanks to (and despite) the vast amount of information continuously produced by recent technologies. Along with the growing size of textual data, automatic approaches have to deal with the wide variety of linguistic features across different domains and contexts: for example, user reviews might be characterised by colloquial idioms, slang or contractions, while clinical notes often contain technical jargon, with typical medical abbreviations and polysemous words whose meaning strictly depends on the particular context in which they were used.
We propose to address these issues by combining topic modelling principles and models with distributional word representations. Topic models generate concise and expressive representations for high volumes of documents by clustering words into "topics", which can be interpreted as document decompositions. They focus on analysing the global context of words and their co-occurrences within the whole corpus. Distributional language representations, instead, encode words' syntactic and semantic properties by leveraging their local contexts, and can be conveniently pre-trained to facilitate model training and the simultaneous encoding of external knowledge. Our work represents one step in bridging the gap between recent advances in topic modelling and increasingly richer distributional word representations, with the aim of addressing the aforementioned issues related to different linguistic features within different domains.
In this thesis, we first propose a hierarchical neural model inspired by topic modelling, which leverages an attention mechanism along with a novel neural cell for fine-grained detection of sentiments and themes discussed in user reviews. Next, we present a neural topic model with adversarial training to distinguish topics based on their high-level semantics (e.g. opinions or factual descriptions). Then, we design a probabilistic topic model specialised for the extraction of biomedical phrases, whose inference process goes beyond the limitations of traditional topic models by seamlessly combining word co-occurrence statistics with information from word embeddings. Finally, inspired by the usage of entities in topic modelling [85], we design a novel masking strategy to fine-tune language models for biomedical question-answering. For each of the above models, we report experimental assessments supporting their efficacy across a wide variety of tasks and domains.
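The third contribution combines corpus co-occurrence statistics with word-embedding information during inference. As a loose illustration only — the log-linear blend, the anchor word, and the 0.5 weight are assumptions, not the model's actual inference rule — the two signals can be merged into a single topic-word distribution like so:

```python
import numpy as np

rng = np.random.default_rng(4)
vocab, dim = 12, 16
# Corpus-level signal: how often each word co-occurs with the topic (+1 smoothing).
counts = rng.poisson(2.0, size=vocab).astype(float) + 1.0
# Embedding-level signal: similarity of each word to a hypothetical anchor word.
emb = rng.normal(size=(vocab, dim))
anchor = emb[0]

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

# Blend both signals into one topic-word distribution; the equal weighting and
# the additive log-linear blend are illustrative choices, not the model's rule.
lam = 0.5
phi = softmax(lam * np.log(counts) + (1 - lam) * (emb @ anchor))
print(phi.argmax())
```

The embedding term lets rarely co-occurring but semantically related words (e.g. biomedical synonyms) receive probability mass that raw counts alone would miss.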
Modeling the Dynamics of Consumer Behavior from Massive Interaction Data
Recent technological innovations (e.g. e-commerce platforms, automated retail stores) have enabled dramatic changes in people's shopping experiences, as well as access to incredible volumes of consumer-product interaction data. As a result, machine learning (ML) systems can be widely developed to help people navigate relevant information and make decisions. Traditional ML systems have achieved great success on various well-defined problems such as speech recognition and facial recognition. Unlike these tasks, where datasets and objectives are clearly benchmarked, modeling consumer behavior can be rather complicated; for example, consumer activities can be affected by real-time shopping contexts, collected interaction data can be noisy and biased, and interests from multiple parties (both consumers and producers) can be involved in the predictive objectives.

The primary goal of this dissertation is to address the obstacles in modeling consumer activities through computational approaches, but with careful consideration of economic and societal perspectives. Intellectually, such models help us to understand the forces that guide consumer behavior. Methodologically, I build algorithms capable of processing massive interaction datasets by connecting well-developed ML techniques and well-established economic theories. Practically, my work has applications ranging from recommender systems to e-commerce and business intelligence.