Fine-grained Video Attractiveness Prediction Using Multimodal Deep Learning on a Large Real-world Dataset
Nowadays, billions of videos are online ready to be viewed and shared. Among
an enormous volume of videos, some popular ones are widely viewed by online
users while the majority attract little attention. Furthermore, within each
video, different segments may attract significantly different numbers of views.
This phenomenon leads to a challenging yet important problem, namely
fine-grained video attractiveness prediction. However, one major obstacle for
such a challenging problem is that no suitable benchmark dataset currently
exists. To this end, we construct the first fine-grained video attractiveness
dataset, which is collected from one of the most popular video websites in the
world. In total, the constructed dataset, FVAD, consists of 1,019 drama episodes
totaling 780.6 hours and covering different categories and a wide variety of video contents.
Apart from the large number of videos, hundreds of millions of user behaviors
recorded while watching videos are also included, such as "view counts",
"fast-forward", "fast-rewind", and so on, where "view counts" reflects the
video attractiveness while other engagements capture the interactions between
the viewers and videos. First, we demonstrate that video attractiveness and
different engagements present different relationships. Second, FVAD provides us
with an opportunity to study the fine-grained video attractiveness prediction
problem. We design different sequential models to perform video attractiveness
prediction by relying solely on video contents. The sequential models exploit
the multimodal relationships between visual and audio components of the video
contents at different levels. Experimental results demonstrate the
effectiveness of our proposed sequential models with different visual and audio
representations, the necessity of incorporating the two modalities, and the
complementary behaviors of the sequential prediction models at different
levels.
Comment: Accepted by WWW 2018 The Big Web Trac
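The abstract above describes sequential models that predict segment-level attractiveness from visual and audio content. A minimal sketch of that idea follows, assuming early fusion by concatenation and a single-unit recurrent update. The function names, the toy feature values, and the weights are all hypothetical and are not taken from the paper, whose actual models are not specified in this abstract.

```python
import math


def fuse(visual, audio):
    """Early fusion (an assumption): concatenate one segment's visual and audio features."""
    return list(visual) + list(audio)


def predict_attractiveness(visual_seq, audio_seq, w_in, w_rec, w_out):
    """Run a one-unit recurrent model over fused per-segment features.

    Returns one score per video segment. All weights are illustrative
    placeholders, not trained parameters.
    """
    if len(visual_seq) != len(audio_seq):
        raise ValueError("visual and audio sequences must have the same length")
    hidden = 0.0
    scores = []
    for visual, audio in zip(visual_seq, audio_seq):
        x = fuse(visual, audio)
        # The hidden state carries context from earlier segments forward.
        pre_activation = sum(w * xi for w, xi in zip(w_in, x)) + w_rec * hidden
        hidden = math.tanh(pre_activation)
        scores.append(w_out * hidden)
    return scores


# Toy input: three segments, 2-d visual and 1-d audio features (made-up values).
visual_seq = [[0.2, 0.1], [0.9, 0.4], [0.3, 0.3]]
audio_seq = [[0.5], [0.8], [0.1]]
scores = predict_attractiveness(
    visual_seq, audio_seq, w_in=[0.5, -0.2, 0.3], w_rec=0.4, w_out=1.0
)
```

Each entry of `scores` corresponds to one segment, which is what makes the prediction fine-grained rather than one value per video.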
Modeling Multimodal Cues in a Deep Learning-based Framework for Emotion Recognition in the Wild
In this paper, we propose a multimodal deep learning architecture for emotion recognition in video, developed for our participation in the audio-video based sub-challenge of the Emotion Recognition in the Wild 2017 challenge. Our model combines cues from multiple video modalities, including static facial features, motion patterns related to the evolution of the human expression over time, and audio information. Specifically, it is composed of three sub-networks trained separately: the first and second extract static visual features and dynamic patterns through 2D and 3D Convolutional Neural Networks (CNNs), while the third is a pretrained audio network used to extract deep acoustic signals from the video. In the audio branch, we also apply Long Short-Term Memory (LSTM) networks to capture the temporal evolution of the audio features. To identify and exploit possible relationships among the modalities, we propose a fusion network that merges their cues into one representation. The proposed architecture outperforms the challenge baselines (38.81% and 40.47%): we achieve accuracies of 50.39% and 49.92% on the validation and test data, respectively.
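The fusion step described above, in which features from the three sub-networks are merged into one representation before classification, can be sketched as follows. This is an illustration under stated assumptions: fusion is done by plain concatenation followed by a single linear layer and a softmax, and the label names, feature sizes, and weights are hypothetical. The paper's actual fusion network is not detailed in this abstract.

```python
import math

# Assumed label set for illustration; the abstract does not list the classes.
EMOTIONS = ["angry", "disgust", "fear", "happy", "neutral", "sad", "surprise"]


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_and_classify(static_feat, motion_feat, audio_feat, weights, biases):
    """Concatenate the three modality features, then apply a linear layer and softmax.

    `weights` holds one row per class and one column per fused feature.
    """
    fused = list(static_feat) + list(motion_feat) + list(audio_feat)
    logits = [
        sum(w * x for w, x in zip(row, fused)) + b
        for row, b in zip(weights, biases)
    ]
    return softmax(logits)


# Toy features standing in for the 2D-CNN, 3D-CNN, and audio branch outputs.
static_feat, motion_feat, audio_feat = [0.4, 0.1], [0.3, 0.2], [0.6]

# Hand-set weights that favor the "happy" class, purely for demonstration.
weights = [[0.0] * 5 for _ in EMOTIONS]
weights[EMOTIONS.index("happy")] = [1.0] * 5
biases = [0.0] * len(EMOTIONS)

probs = fuse_and_classify(static_feat, motion_feat, audio_feat, weights, biases)
predicted = EMOTIONS[probs.index(max(probs))]
```

In a trained system the weights would be learned jointly over the fused vector, which is what lets the classifier exploit relationships between modalities rather than scoring each one in isolation.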
Player agency in interactive narrative: audience, actor & author
The question motivating this review paper is, how can
computer-based interactive narrative be used as a constructivist
learning activity? The paper proposes that player agency can be used to
link interactive narrative to learner agency in constructivist theory,
and to classify approaches to interactive narrative. The traditional
question driving research in interactive narrative is, "how can an
interactive narrative deal with a high degree of player agency, while
maintaining a coherent and well-formed narrative?" This question
derives from an Aristotelian approach to interactive narrative that,
as the question shows, is inherently antagonistic to player agency.
Within this approach, player agency must be restricted and
manipulated to maintain the narrative. Two alternative approaches based
on Brecht's Epic Theatre and Boal's Theatre of the Oppressed are
reviewed. If a Boalian approach to interactive narrative is taken the
conflict between narrative and player agency dissolves. The question
that emerges from this approach is quite different from the traditional
question above, and presents a more useful approach to applying
interactive narrative as a constructivist learning activity.