A multimodal mixture-of-experts model for dynamic emotion prediction in movies
This paper addresses the problem of continuous emotion prediction in movies from multimodal cues. The rich emotion content in movies is inherently multimodal, where emotion is evoked through both audio (music, speech) and video modalities. To capture such affective information, we put forth a set of audio and video features that includes several novel features such as Video Compressibility and Histogram of Facial Area (HFA). We propose a Mixture of Experts (MoE)-based fusion model that dynamically combines information from the audio and video modalities for predicting the emotion evoked in movies. A learning module based on the hard Expectation-Maximization (EM) algorithm is presented for the MoE model. Experiments on a database of popular movies demonstrate that our MoE-based fusion method outperforms popular fusion strategies (e.g., early and late fusion) in the context of dynamic emotion prediction.
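The following is a minimal, illustrative sketch of MoE-style audio-video fusion trained with hard EM. The linear experts, feature dimensions, and error-based reassignment rule are assumptions made for the example, not the paper's implementation.

```python
# Sketch: Mixture-of-Experts fusion of audio and video features, fit with hard EM.
# All data and model choices below are illustrative stand-ins.
import numpy as np

rng = np.random.default_rng(0)
n, d_audio, d_video, k = 500, 8, 8, 2

X = np.hstack([rng.normal(size=(n, d_audio)),   # audio features (stand-ins)
               rng.normal(size=(n, d_video)),   # video features (stand-ins)
               np.ones((n, 1))])                # bias term
y = rng.normal(size=n)                          # stand-in continuous emotion ratings

W = rng.normal(size=(k, X.shape[1])) * 0.1      # one linear expert per mixture component
assign = rng.integers(0, k, size=n)             # hard assignment of each sample to an expert

for _ in range(20):                             # hard-EM iterations
    # M-step: refit each expert on the samples currently assigned to it
    for j in range(k):
        idx = assign == j
        if idx.sum() > X.shape[1]:
            W[j], *_ = np.linalg.lstsq(X[idx], y[idx], rcond=None)
    # Hard E-step: reassign each sample to the expert that predicts it best
    errors = (X @ W.T - y[:, None]) ** 2        # squared error per sample per expert
    assign = errors.argmin(axis=1)

# Training-set predictions under the hard assignments; at test time a learned
# gating function over the audio-video features would select the expert instead.
preds = np.einsum('ij,ij->i', X, W[assign])
print("train MSE:", float(np.mean((preds - y) ** 2)))
```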
Unsupervised discovery of character dictionaries in animation movies
Automatic content analysis of animation movies can enable an objective understanding of character (actor) representations and their portrayals. It can also help illuminate potential markers of unconscious biases and their impact. However, multimedia analysis of movie content has predominantly focused on live-action features. The dearth of multimedia research in this field stems from the complexity and heterogeneity in the design of animated characters, an extremely challenging problem to generalize with a single method or model. In this paper, we address the problem of automatically discovering characters in animation movies as a first step toward automatic character labeling in these media. Movie-specific character dictionaries can act as a powerful first step for subsequent content analysis at scale. We propose an unsupervised approach which requires no prior information about the characters in a movie. We first use a deep neural network-based object detector trained on natural images to identify a set of initial character candidates. These candidates are further pruned using saliency constraints and visual object tracking. A character dictionary per movie is then generated from exemplars obtained by clustering these candidates. We are able to identify both anthropomorphic and non-anthropomorphic characters in a dataset of 46 animation movies with varying composition and character design. Our results indicate high precision and recall of the automatically detected characters compared to human-annotated ground truth, demonstrating the generalizability of our approach.
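Below is a small sketch of the final stage of such a pipeline: clustering appearance embeddings of pruned character candidates and keeping one exemplar per cluster as the movie's character dictionary. The embeddings are simulated, and the detector, saliency pruning, and tracking steps from the paper are not reproduced.

```python
# Sketch: build a per-movie character dictionary by clustering candidate embeddings
# and keeping one exemplar per cluster. Embeddings are simulated stand-ins.
import numpy as np

rng = np.random.default_rng(1)

# Stand-in appearance embeddings of pruned character candidates
# (in practice these would come from detector crops after saliency/tracking).
candidates = np.vstack([rng.normal(loc=c, scale=0.3, size=(40, 16))
                        for c in (0.0, 2.0, 4.0)])

def kmeans(x, k, iters=50, seed=0):
    """Plain k-means on embedding vectors (illustrative, not optimized)."""
    r = np.random.default_rng(seed)
    centers = x[r.choice(len(x), size=k, replace=False)].copy()
    for _ in range(iters):
        labels = np.linalg.norm(x[:, None] - centers[None], axis=2).argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):                 # skip empty clusters
                centers[j] = x[labels == j].mean(axis=0)
    return labels, centers

labels, centers = kmeans(candidates, k=3)

# Character dictionary: the candidate closest to each cluster center is the exemplar.
dictionary = [int(np.linalg.norm(candidates - c, axis=1).argmin()) for c in centers]
print("exemplar candidate indices:", dictionary)
```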
A computational study of expressive facial dynamics in children with autism
Several studies have established that facial expressions of children with autism are often perceived as atypical, awkward, or less engaging by typical adult observers. Despite this clear deficit in the quality of facial expression production, very little is understood about its underlying mechanisms and characteristics. This paper takes a computational approach to studying the details of facial expressions of children with high-functioning autism (HFA). The objective is to uncover characteristics of these facial expressions that are distinct from those of typically developing children and that are otherwise difficult to detect by visual inspection. We use motion capture data obtained from subjects with HFA and typically developing subjects while they produced various facial expressions. These data are analyzed to investigate how the overall and local facial dynamics of children with HFA differ from those of their typically developing peers. Our major observations include reduced complexity in the dynamic facial behavior of the HFA group, arising primarily from the eye region.
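As a rough illustration of quantifying dynamic complexity from motion-capture trajectories, the sketch below computes a generic proxy (spectral entropy of marker velocity) per facial region. The region grouping, the synthetic trajectories, and the choice of complexity measure are assumptions for the example, not the paper's analysis.

```python
# Sketch: a generic complexity proxy (spectral entropy of marker velocity)
# computed per facial region from simulated motion-capture trajectories.
import numpy as np

rng = np.random.default_rng(2)
fs = 100                                        # capture rate (Hz), illustrative
t = np.arange(0, 5, 1 / fs)                     # 5 seconds of capture

def spectral_entropy(signal):
    """Shannon entropy of the normalized power spectrum (higher = more complex)."""
    psd = np.abs(np.fft.rfft(signal - signal.mean())) ** 2
    p = psd / psd.sum()
    p = p[p > 0]
    return float(-(p * np.log2(p)).sum() / np.log2(len(psd)))

# Stand-in marker trajectories for two regions with different noise levels.
eye_marker = np.sin(2 * np.pi * 1.0 * t) + 0.05 * rng.normal(size=t.size)
mouth_marker = np.sin(2 * np.pi * 1.0 * t) + 0.5 * rng.normal(size=t.size)

for name, marker in [("eye", eye_marker), ("mouth", mouth_marker)]:
    velocity = np.diff(marker) * fs             # frame-to-frame marker velocity
    print(name, "complexity proxy:", round(spectral_entropy(velocity), 3))
```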
Evaluating Atypical Gaze Patterns through Vision Models: The Case of Cortical Visual Impairment
A wide range of neurological and cognitive disorders exhibit distinct behavioral markers aside from their clinical manifestations. Cortical Visual Impairment (CVI) is a prime example of such conditions, resulting from damage to visual pathways in the brain and adversely impacting low- and high-level visual function. The characteristics impacted by CVI are primarily described qualitatively, challenging the establishment of an objective, evidence-based measure of CVI severity. To study those characteristics, we propose to create visual saliency maps by adequately prompting deep vision models with attributes of clinical interest. After extracting saliency maps for a curated set of stimuli, we evaluate fixation traces on those from children with CVI through eye-tracking technology. Our experiments reveal significant gaze markers that verify clinical knowledge and yield nuanced discriminability when compared to those of age-matched control subjects. Using deep learning to unveil atypical visual saliency is an important step toward establishing an eye-tracking signature for severe neurodevelopmental disorders, like CVI.
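One common way to score fixation traces against a saliency map is normalized scanpath saliency (NSS), sketched below on a synthetic map. The Gaussian saliency map, the fixation sets, and the choice of NSS are assumptions for illustration; the paper's prompted deep-vision saliency maps are not reproduced.

```python
# Sketch: score fixation points against a saliency map with normalized
# scanpath saliency (NSS). The map and fixations are synthetic examples.
import numpy as np

rng = np.random.default_rng(3)
h, w = 120, 160

# Synthetic saliency map with a single salient blob.
yy, xx = np.mgrid[0:h, 0:w]
saliency = np.exp(-(((yy - 60) ** 2) + ((xx - 80) ** 2)) / (2 * 15 ** 2))

def nss(saliency_map, fixations):
    """Mean saliency at fixation points after z-scoring the map."""
    z = (saliency_map - saliency_map.mean()) / (saliency_map.std() + 1e-8)
    return float(np.mean([z[y, x] for y, x in fixations]))

# Fixations concentrated on the salient region vs. scattered fixations.
on_target = [(60 + int(d), 80 + int(d)) for d in rng.normal(0, 5, size=20)]
scattered = [(int(rng.integers(0, h)), int(rng.integers(0, w))) for _ in range(20)]

print("NSS, on-target fixations:", round(nss(saliency, on_target), 2))
print("NSS, scattered fixations:", round(nss(saliency, scattered), 2))
```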
GPT-FL: Generative Pre-trained Model-Assisted Federated Learning
In this work, we propose GPT-FL, a generative pre-trained model-assisted federated learning (FL) framework. At its core, GPT-FL leverages generative pre-trained models to generate diversified synthetic data. These generated data are used to train a downstream model on the server, which is then fine-tuned with private client data under the standard FL framework. We show that GPT-FL consistently outperforms state-of-the-art FL methods in terms of model test accuracy, communication efficiency, and client sampling efficiency. Through comprehensive ablation analysis, we discover that the downstream model generated by synthetic data plays a crucial role in controlling the direction of gradient diversity during FL training, which enhances convergence speed and contributes to the notable accuracy boost observed with GPT-FL. Also, regardless of whether the target data falls within or outside the domain of the pre-trained generative model, GPT-FL consistently achieves significant performance gains, surpassing the results obtained by models trained solely with FL or synthetic data.
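The sketch below mirrors the overall shape of this workflow: pre-train a downstream model on synthetic data at the server, then fine-tune it with federated averaging over private client data. The linear model, Gaussian "synthetic" data, and FedAvg loop are simplified stand-ins, not the authors' implementation.

```python
# Sketch: (1) server pre-training on synthetic data, (2) federated fine-tuning
# via FedAvg on private client data. All models and data are illustrative.
import numpy as np

rng = np.random.default_rng(4)
d = 5
true_w = np.array([1.0, -2.0, 0.5, 0.0, 3.0])       # ground-truth weights (toy task)

def sgd_fit(w, X, y, lr=0.05, steps=200):
    """Plain gradient descent on squared error for a linear model."""
    for _ in range(steps):
        grad = 2 * X.T @ (X @ w - y) / len(y)
        w = w - lr * grad
    return w

# Step 1: pre-train on "synthetic" data (stand-in for generative-model output).
X_syn = rng.normal(size=(400, d))
y_syn = X_syn @ (true_w + rng.normal(0, 0.5, size=d))  # imperfect proxy of the task
w_server = sgd_fit(np.zeros(d), X_syn, y_syn)

# Step 2: federated fine-tuning on private client data (FedAvg).
clients = []
for _ in range(5):
    X = rng.normal(size=(100, d))
    clients.append((X, X @ true_w + rng.normal(0, 0.1, size=100)))

for _ in range(10):                                  # communication rounds
    local = [sgd_fit(w_server.copy(), X, y, steps=20) for X, y in clients]
    w_server = np.mean(local, axis=0)                # server averages client models

print("error vs. true weights:", round(float(np.linalg.norm(w_server - true_w)), 4))
```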
Motion-capture patterns of dynamic facial expressions in children and adolescents with and without ASD
Research shows that neurotypical individuals struggle to interpret the emotional facial expressions of people with Autism Spectrum Disorder (ASD). The current study uses motion capture to objectively quantify differences between the movement patterns of emotional facial expressions of individuals with and without ASD. Participants volitionally mimicked emotional expressions while wearing facial markers. Recorded marker movement was grouped by expression valence and intensity. We used Growth Curve Analysis to test whether movement patterns were predictable by expression type and participant group. Results show significant interactions between expression type and group, and little effect of emotion valence on ASD expressions. Together, the results support perceptions that expressions of individuals with ASD are different from, and more ambiguous than, those of neurotypical individuals.
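A simplified stand-in for a growth-curve-style analysis is sketched below: fit low-order polynomial time terms to each participant's marker trajectory and compare group means of the fitted coefficients. A full Growth Curve Analysis would use mixed-effects models with orthogonal time terms; the simulated curves and group labels here are assumptions for illustration only.

```python
# Sketch: polynomial growth-curve fits to simulated marker trajectories,
# summarized per group. Data and group effects are fabricated for illustration.
import numpy as np

rng = np.random.default_rng(5)
t = np.linspace(0, 1, 60)                       # normalized expression time course

def simulate_group(peak, n=15):
    """Marker displacement curves rising to `peak` and returning to rest."""
    return [peak * np.sin(np.pi * t) + 0.05 * rng.normal(size=t.size)
            for _ in range(n)]

groups = {"group A": simulate_group(1.0), "group B": simulate_group(0.7)}

for name, curves in groups.items():
    # Quadratic fit per participant: coefficients capture curvature, slope, offset.
    coefs = np.array([np.polyfit(t, c, deg=2) for c in curves])
    print(name, "mean [quadratic, linear, intercept]:", np.round(coefs.mean(axis=0), 2))
```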
- …