MediaEval 2018: Predicting Media Memorability Task
In this paper, we present the Predicting Media Memorability task, which is
proposed as part of the MediaEval 2018 Benchmarking Initiative for Multimedia
Evaluation. Participants are expected to design systems that automatically
predict memorability scores for videos, which reflect the probability of a
video being remembered. In contrast to previous work in image memorability
prediction, where memorability was measured a few minutes after memorization,
the proposed dataset comes with short-term and long-term memorability
annotations. All task characteristics are described, namely: the task's
challenges and breakthrough, the released data set and ground truth, the
required participant runs, and the evaluation metrics.
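The abstract does not spell out the evaluation metrics, but rank correlation between predicted and ground-truth memorability scores is the standard choice for this task family. As an illustrative sketch (not the task's official scoring code), Spearman's rank correlation can be computed by ranking both score lists, with ties sharing their average rank, and taking the Pearson correlation of the ranks:

```python
def rank(values):
    # 1-based average ranks; tied values share the mean of their positions.
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def spearman(xs, ys):
    # Pearson correlation computed on the rank vectors.
    rx, ry = rank(xs), rank(ys)
    n = len(xs)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)

# A perfectly monotone prediction gives correlation 1.0.
print(spearman([0.1, 0.4, 0.7, 0.9], [0.2, 0.5, 0.6, 0.8]))  # 1.0
```

Spearman (rather than Pearson) is attractive here because memorability scores are noisy probabilities, and only the relative ordering of videos needs to be predicted correctly.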
Multimodal Deep Features Fusion For Video Memorability Prediction
This paper describes a multimodal feature fusion approach for predicting short- and long-term video memorability, where the goal is to design a system that automatically predicts scores reflecting the probability of a video being remembered. The approach performs early fusion of text, image, and video features. Text features are extracted using a Convolutional Neural Network (CNN), image features are extracted with an FBResNet152 pre-trained on ImageNet, and video features are extracted using a 3DResNet152 pre-trained on Kinetics 400. We use Fisher Vectors to obtain a single vector associated with each video, which overcomes the need for a non-fixed global vector representation when handling temporal information. The fusion approach demonstrates good predictive performance and regression superiority in terms of correlation over standard features.
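The early-fusion step described above amounts to concatenating the per-modality feature vectors into one vector per video before regression. The sketch below illustrates the idea only; the feature dimensions and random features are hypothetical stand-ins for the actual CNN, FBResNet152, and 3DResNet152 outputs:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical per-video features; dimensions are illustrative only.
text_feat  = rng.standard_normal(300)    # stand-in for a CNN text embedding
image_feat = rng.standard_normal(2048)   # stand-in for an FBResNet152 embedding
video_feat = rng.standard_normal(2048)   # stand-in for a 3DResNet152 embedding

# Early fusion: concatenate the modalities into a single fixed-length
# vector per video, which is then fed to one regressor.
fused = np.concatenate([text_feat, image_feat, video_feat])
print(fused.shape)  # (4396,)
```

Fusing before regression lets a single model exploit cross-modal interactions, at the cost of a higher-dimensional input than late fusion of per-modality predictions.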
Collecting, Analyzing and Predicting Socially-Driven Image Interestingness
Interestingness has recently become an emerging concept for visual content assessment. However, understanding and predicting image interestingness remains challenging, as its judgment is highly subjective and usually context-dependent. In addition, existing datasets are quite small for in-depth analysis. To push forward research on this topic, a large-scale interestingness dataset (images and their associated metadata) is described in this paper and released for public use. We then propose computational models based on deep learning to predict image interestingness. We show that exploiting relevant contextual information derived from social metadata can greatly improve the prediction results. Finally, we discuss some key findings and potential research directions for this emerging topic.
More than meets the eye: the conceptual essence of intrinsic memorability
In a world where sensory threads weave an endless tapestry of multi-modal data, the human brain stands as the masterful weaver of meaning. As we wade through this tempest of input, our brain spins these threads into an intelligible internal representation and holds on tight to what it deems important. But what, exactly, makes certain threads more important than others? And how can we predict their significance?
Memorability is the tensile strength of the threads that tie us to the world. It is a proxy for human importance, indicating which threads the human brain will curate and retain with exceptional fidelity. This research investigates these multisensory threads by exploring the influence of audio, visual, and textual modalities on predicting video memorability, and how the interplay between them can influence the overall memorability of a given piece of content. The findings suggest that, while visual data may dominate our sensory experience, it is the underlying conceptual essence that truly holds the key to memorability. This thesis leverages state-of-the-art image synthesis techniques to distill and examine this essence, creating surrogate dreams of video scenes to facilitate the disentanglement of conceptual and perceptual elements of memorability. The work also leverages human EEG data to explore the possibility of a moment of memorability (a moment of encoding that corresponds to a remembering moment), which we expect to exist due to the temporal nature of the world and the natural encoding limits of our brains. The previously murky relationship between the two core means of remembrance, recognition and recall, is reconciled by conducting a novel video memorability drawing task.
The research sheds new light on the nature of multi-modal memorability, providing a deeper understanding of how our brain processes and retains information in a complex sensory world. By uncovering the conceptual essence that lies at the heart of memorability, it opens up new avenues for predicting and curating more meaningful media content, ultimately deepening our connection to the world around us.