5,183 research outputs found
Multichannel Attention Network for Analyzing Visual Behavior in Public Speaking
Public speaking is an important aspect of human communication and
interaction. The majority of computational work on public speaking concentrates
on analyzing the spoken content, and the verbal behavior of the speakers. While
the success of public speaking largely depends on the content of the talk, and
the verbal behavior, non-verbal (visual) cues, such as gestures and physical
appearance also play a significant role. This paper investigates the importance
of visual cues by estimating their contribution towards predicting the
popularity of a public lecture. For this purpose, we constructed a large
database of more than TED talk videos. As a measure of popularity of the
TED talks, we leverage the corresponding (online) viewers' ratings from
YouTube. Visual cues related to facial and physical appearance, facial
expressions, and pose variations are extracted from the video frames using
convolutional neural network (CNN) models. Thereafter, an attention-based long
short-term memory (LSTM) network is proposed to predict the video popularity
from the sequence of visual features. The proposed network achieves
state-of-the-art prediction accuracy indicating that visual cues alone contain
highly predictive information about the popularity of a talk. Furthermore, our
network learns a human-like attention mechanism, which is particularly useful
for interpretability, i.e. how attention varies with time, and across different
visual cues by indicating their relative importance
Sequential Prediction of Social Media Popularity with Deep Temporal Context Networks
Prediction of popularity has profound impact for social media, since it
offers opportunities to reveal individual preference and public attention from
evolutionary social systems. Previous research, although achieves promising
results, neglects one distinctive characteristic of social data, i.e.,
sequentiality. For example, the popularity of online content is generated over
time with sequential post streams of social media. To investigate the
sequential prediction of popularity, we propose a novel prediction framework
called Deep Temporal Context Networks (DTCN) by incorporating both temporal
context and temporal attention into account. Our DTCN contains three main
components, from embedding, learning to predicting. With a joint embedding
network, we obtain a unified deep representation of multi-modal user-post data
in a common embedding space. Then, based on the embedded data sequence over
time, temporal context learning attempts to recurrently learn two adaptive
temporal contexts for sequential popularity. Finally, a novel temporal
attention is designed to predict new popularity (the popularity of a new
user-post pair) with temporal coherence across multiple time-scales.
Experiments on our released image dataset with about 600K Flickr photos
demonstrate that DTCN outperforms state-of-the-art deep prediction algorithms,
with an average of 21.51% relative performance improvement in the popularity
prediction (Spearman Ranking Correlation).Comment: accepted in IJCAI-1
How Algorithmic Confounding in Recommendation Systems Increases Homogeneity and Decreases Utility
Recommendation systems are ubiquitous and impact many domains; they have the
potential to influence product consumption, individuals' perceptions of the
world, and life-altering decisions. These systems are often evaluated or
trained with data from users already exposed to algorithmic recommendations;
this creates a pernicious feedback loop. Using simulations, we demonstrate how
using data confounded in this way homogenizes user behavior without increasing
utility
- …