Modelling the influence of personality and culture on affect and enjoyment in multimedia
Affect is evoked through an intricate relationship between the characteristics of stimuli, individuals, and systems of perception. While affect is widely researched, few studies consider the combination of multimedia system characteristics and human factors together. As such, this paper explores the influence of personality (Five-Factor Model) and cultural traits (Hofstede Model) on the intensity of multimedia-evoked positive and negative affects (emotions). A set of 144 video sequences (from 12 short movie clips) was evaluated by 114 participants from a cross-cultural population, producing 1232 ratings. On this data, three multilevel regression models are compared: a baseline model that only considers system factors; an extended model that includes personality and culture; and an optimistic model in which each participant is modelled. An analysis shows that personal and cultural traits account for 5.6% of the variance in positive affect and 13.6% of the variance in negative affect. In addition, the affect-enjoyment correlation varied across the clips. This suggests that personality and culture play a key role in predicting the intensity of negative affect and whether or not it is enjoyed, but a more sophisticated set of predictors is needed to model positive affect with the same efficacy.
6 Seconds of Sound and Vision: Creativity in Micro-Videos
The notion of creativity, as opposed to related concepts such as beauty or
interestingness, has not been studied from the perspective of automatic
analysis of multimedia content. Meanwhile, short online videos shared on social
media platforms, or micro-videos, have arisen as a new medium for creative
expression. In this paper we study creative micro-videos in an effort to
understand the features that make a video creative, and to address the problem
of automatic detection of creative content. Defining creative videos as those
that are novel and have aesthetic value, we conduct a crowdsourcing experiment
to create a dataset of over 3,800 micro-videos labelled as creative and
non-creative. We propose a set of computational features that we map to the
components of our definition of creativity, and conduct an analysis to
determine which of these features correlate most with creative video. Finally,
we evaluate a supervised approach to automatically detect creative video, with
promising results, showing that it is necessary to model both aesthetic value
and novelty to achieve optimal classification accuracy.
Comment: 8 pages, 1 figure, conference IEEE CVPR 201
A multi-objective optimization for video orchestration
In this work, the problem of video orchestration performed by combining information extracted from multiple video sequences is considered. The novelty of the proposed approach lies in the use of aesthetic features and cinematographic composition rules to automatically aggregate the inputs from different cameras into a single video. While prior methodologies have separately addressed the issues of aesthetic feature extraction from videos and video orchestration, in this work we exploit a set of scene features to automatically select the shots with the best aesthetic score. To evaluate the effectiveness of the proposed method, a preliminary subjective experiment was carried out with experts from the audiovisual field. The results are encouraging and show that there is room for improving performance.
Maximum Margin Learning Under Uncertainty
PhD
In this thesis we study the problem of learning under uncertainty using the statistical
learning paradigm. We first propose a linear maximum margin classifier that deals
with uncertainty in data input. More specifically, we reformulate the standard Support
Vector Machine (SVM) framework such that each training example can be modeled
by a multi-dimensional Gaussian distribution described by its mean vector and its
covariance matrix, the latter modeling the uncertainty. We address the classification
problem and define a cost function that is the expected value of the classical SVM
cost when data samples are drawn from the multi-dimensional Gaussian distributions
that form the set of the training examples. Our formulation approximates the classical
SVM formulation when the training examples are isotropic Gaussians with variance
tending to zero. We arrive at a convex optimization problem, which we solve efficiently
in the primal form using a stochastic gradient descent approach. The resulting
classifier, which we name SVM with Gaussian Sample Uncertainty (SVM-GSU), is
tested on synthetic data and five publicly available and popular datasets; namely, the
MNIST, WDBC, DEAP, TV News Channel Commercial Detection, and TRECVID
MED datasets. Experimental results verify the effectiveness of the proposed method.
Next, we extended the aforementioned linear classifier so as to lead to non-linear decision
boundaries, using the RBF kernel. This extension, where we use isotropic input
uncertainty and we name Kernel SVM with Isotropic Gaussian Sample Uncertainty
(KSVM-iGSU), is used in the problems of video event detection and video aesthetic
quality assessment. The experimental results show that exploiting input uncertainty,
especially in problems where only a limited number of positive training examples are
provided, can lead to better classification, detection, or retrieval performance. Finally,
we present a preliminary study on how the above ideas can be used under the deep
convolutional neural networks learning paradigm so as to exploit inherent sources of
uncertainty, such as spatial pooling operations, that are usually used in deep networks.
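The cost described in this abstract, the expected value of the classical SVM (hinge) cost when an example is drawn from a Gaussian, can be illustrated with a minimal Python sketch. It assumes a single example with mean vector `mean`, covariance matrix `cov`, and label in {-1, +1}. It relies on the standard closed form for the expectation of a rectified Gaussian variable rather than on any formula quoted from the thesis. The function name `expected_hinge_loss` and its arguments are hypothetical, and the regularization term and stochastic gradient descent solver mentioned in the abstract are omitted.

```python
import math


def expected_hinge_loss(w, b, mean, cov, label):
    """E[max(0, 1 - y (w.x + b))] for x ~ N(mean, cov); illustrative only."""
    n = len(w)
    # d = 1 - y (w.x + b) is a linear function of x, so it is Gaussian.
    # Its mean is the ordinary hinge argument evaluated at the mean vector.
    d_mean = 1.0 - label * (sum(wi * mi for wi, mi in zip(w, mean)) + b)
    # Its variance is w^T cov w (the label squares to 1).
    d_var = sum(w[i] * cov[i][j] * w[j] for i in range(n) for j in range(n))
    if d_var <= 0.0:
        # No uncertainty along w: fall back to the classical hinge loss.
        return max(0.0, d_mean)
    sigma = math.sqrt(d_var)
    z = d_mean / sigma
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))        # standard normal CDF
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)  # standard normal PDF
    # Mean of max(0, d) for Gaussian d with mean d_mean and std sigma.
    return d_mean * cdf + sigma * pdf
```

With a zero covariance matrix the function returns the ordinary hinge loss, which matches the abstract's remark that the formulation approaches the classical SVM as the variance of isotropic Gaussians tends to zero.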
Automatic Emotion Recognition: Quantifying Dynamics and Structure in Human Behavior.
Emotion is a central part of human interaction, one that has a huge influence on its overall tone and outcome. Today's human-centered interactive technology can greatly benefit from automatic emotion recognition, as the extracted affective information can be used to measure, transmit, and respond to user needs. However, developing such systems is challenging due to the complexity of emotional expressions and their dynamics in terms of the inherent multimodality between audio and visual expressions, as well as the mixed factors of modulation that arise when a person speaks. To overcome these challenges, this thesis presents data-driven approaches that can quantify the underlying dynamics in audio-visual affective behavior. The first set of studies lays the foundation and central motivation of this thesis. We discover that it is crucial to model complex non-linear interactions between audio and visual emotion expressions, and that dynamic emotion patterns can be used in emotion recognition. Next, the understanding of the complex characteristics of emotion from the first set of studies leads us to examine multiple sources of modulation in audio-visual affective behavior. Specifically, we focus on how speech modulates facial displays of emotion. We develop a framework that uses speech signals which alter the temporal dynamics of individual facial regions to temporally segment and classify facial displays of emotion. Finally, we present methods to discover regions of emotionally salient events in given audio-visual data. We demonstrate that different modalities, such as the upper face, lower face, and speech, express emotion with different timings and time scales, varying for each emotion type. We further extend this idea into another aspect of human behavior: human action events in videos. We show how transition patterns between events can be used for automatically segmenting and classifying action events.
Our experimental results on audio-visual datasets show that the proposed systems not only improve performance, but also provide descriptions of how affective behaviors change over time. We conclude this dissertation with the future directions that will innovate three main research topics: machine adaptation for personalized technology, human-human interaction assistant systems, and human-centered multimedia content analysis.
PhD, Electrical Engineering: Systems, University of Michigan, Horace H. Rackham School of Graduate Studies
http://deepblue.lib.umich.edu/bitstream/2027.42/133459/1/yelinkim_1.pd