83 research outputs found

    Fine-grained Video Attractiveness Prediction Using Multimodal Deep Learning on a Large Real-world Dataset

    Nowadays, billions of videos are online ready to be viewed and shared. Among this enormous volume of videos, some popular ones are widely viewed by online users while the majority attract little attention. Furthermore, within each video, different segments may attract significantly different numbers of views. This phenomenon leads to a challenging yet important problem, namely fine-grained video attractiveness prediction. However, one major obstacle for such a challenging problem is that no suitable benchmark dataset currently exists. To this end, we construct the first fine-grained video attractiveness dataset (FVAD), which is collected from one of the most popular video websites in the world. In total, the constructed FVAD consists of 1,019 drama episodes with 780.6 hours covering different categories and a wide variety of video contents. Apart from the large number of videos, hundreds of millions of user behaviors during watching videos are also included, such as "view counts", "fast-forward", "fast-rewind", and so on, where "view counts" reflects the video attractiveness while other engagements capture the interactions between the viewers and videos. First, we demonstrate that video attractiveness and different engagements present different relationships. Second, FVAD provides us with an opportunity to study the fine-grained video attractiveness prediction problem. We design different sequential models to perform video attractiveness prediction by relying solely on video contents. The sequential models exploit the multimodal relationships between visual and audio components of the video contents at different levels. Experimental results demonstrate the effectiveness of our proposed sequential models with different visual and audio representations, the necessity of incorporating the two modalities, and the complementary behaviors of the sequential prediction models at different levels. Comment: Accepted by WWW 2018 The Big Web Trac

    Modeling Multimodal Cues in a Deep Learning-based Framework for Emotion Recognition in the Wild

    In this paper, we propose a multimodal deep learning architecture for emotion recognition in video, developed for our participation in the audio-video based sub-challenge of the Emotion Recognition in the Wild 2017 challenge. Our model combines cues from multiple video modalities, including static facial features, motion patterns related to the evolution of the human expression over time, and audio information. Specifically, it is composed of three sub-networks trained separately: the first and second extract static visual features and dynamic patterns through 2D and 3D Convolutional Neural Networks (CNN), while the third consists of a pretrained audio network which is used to extract useful deep acoustic signals from video. In the audio branch, we also apply Long Short Term Memory (LSTM) networks in order to capture the temporal evolution of the audio features. To identify and exploit possible relationships among different modalities, we propose a fusion network that merges cues from the different modalities in one representation. The proposed architecture outperforms the challenge baselines (38.81% and 40.47%): we achieve an accuracy of 50.39% and 49.92% respectively on the validation and the testing data.

    Player agency in interactive narrative: audience, actor & author

    The question motivating this review paper is, how can computer-based interactive narrative be used as a constructivist learning activity? The paper proposes that player agency can be used to link interactive narrative to learner agency in constructivist theory, and to classify approaches to interactive narrative. The traditional question driving research in interactive narrative is, ‘how can an interactive narrative deal with a high degree of player agency, while maintaining a coherent and well-formed narrative?’ This question derives from an Aristotelian approach to interactive narrative that, as the question shows, is inherently antagonistic to player agency. Within this approach, player agency must be restricted and manipulated to maintain the narrative. Two alternative approaches based on Brecht’s Epic Theatre and Boal’s Theatre of the Oppressed are reviewed. If a Boalian approach to interactive narrative is taken the conflict between narrative and player agency dissolves. The question that emerges from this approach is quite different from the traditional question above, and presents a more useful approach to applying interactive narrative as a constructivist learning activity.