99 research outputs found

    Improving the accuracy of automatic facial expression recognition in speaking subjects with deep learning

    When automatic facial expression recognition is applied to video sequences of speaking subjects, recognition accuracy has been observed to be lower than with video sequences of still subjects. This effect, known as the speaking effect, arises during spontaneous conversations, where the speech articulation process influences facial configurations alongside the affective expressions. In this work we ask whether cues related to the articulation process, beyond facial features, increase emotion recognition accuracy when added to the input of a deep neural network model. We develop two neural networks that classify facial expressions of speaking subjects from the RAVDESS dataset: a spatio-temporal CNN and a GRU-cell RNN. They are first trained on facial features only, and then on both facial features and articulation-related cues extracted from a model trained for lip reading, while also varying the number of consecutive frames provided as input. We show that adding articulation-related features increases classification accuracy by up to 12%, with the gain growing as more consecutive frames are provided to the model.
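
    A minimal sketch, not the authors' code, of the GRU-based variant: per-frame facial features are concatenated with articulation embeddings (e.g. taken from a pretrained lip-reading encoder) before a recurrent layer and a classification head. All dimensions and names are illustrative assumptions; only the 8 emotion classes follow RAVDESS.

        # Illustrative sketch: fusing facial and articulation features in a GRU classifier.
        import torch
        import torch.nn as nn

        class GRUEmotionClassifier(nn.Module):
            def __init__(self, face_dim=512, artic_dim=256, hidden=128, n_classes=8):
                super().__init__()
                self.gru = nn.GRU(face_dim + artic_dim, hidden, batch_first=True)
                self.head = nn.Linear(hidden, n_classes)

            def forward(self, face_feats, artic_feats):
                # face_feats:  (batch, frames, face_dim)
                # artic_feats: (batch, frames, artic_dim); zero it out to recover
                # the facial-features-only baseline
                x = torch.cat([face_feats, artic_feats], dim=-1)
                _, h = self.gru(x)              # last hidden state: (1, batch, hidden)
                return self.head(h.squeeze(0))  # emotion class logits

        model = GRUEmotionClassifier()
        logits = model(torch.randn(4, 16, 512), torch.randn(4, 16, 256))  # 16 frames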

    How to look next? A data-driven approach for scanpath prediction

    By and large, current visual attention models for static stimuli rely on the following procedure. Given an image, a saliency map is computed, which in turn may serve to predict a sequence of gaze shifts, namely a scanpath instantiating the dynamics of visual attention deployment. The temporal pattern of attention unfolding is thus confined to the scanpath generation stage, whilst salience is conceived as a static map, at best conflating a number of factors (bottom-up information, top-down cues, spatial biases, etc.). In this note we propose a novel sequential scheme consisting of three processing stages that rely on a center-bias model, a context/layout model, and an object-based model, respectively. Each stage contributes, at different times, to the sequential sampling of the final scanpath. We compare the method against classic scanpath generation that exploits a state-of-the-art static saliency model. Results show that accounting for the structure of the temporal unfolding leads to gaze dynamics closer to human gaze behaviour.
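
    To make the three-stage scheme concrete, here is a minimal sketch under illustrative assumptions: the first fixation is drawn from a centre-bias prior, early fixations from a context/layout map, and later ones from an object-based map. The placeholder maps, the stage switch point, and all names are ours, not the paper's.

        # Illustrative sketch: sequential scanpath sampling from three stage-specific maps.
        import numpy as np

        def sample_from_map(prob_map, rng):
            """Draw one (row, col) fixation from a 2-D probability map."""
            flat = prob_map.ravel() / prob_map.sum()
            idx = rng.choice(flat.size, p=flat)
            return np.unravel_index(idx, prob_map.shape)

        def generate_scanpath(center_bias, layout_map, object_map, n_fix=10, seed=0):
            rng = np.random.default_rng(seed)
            scanpath = [sample_from_map(center_bias, rng)]       # stage 1: centre bias
            for t in range(1, n_fix):
                stage_map = layout_map if t < 3 else object_map  # stage 2, then stage 3
                scanpath.append(sample_from_map(stage_map, rng))
            return scanpath

        h, w = 60, 80
        yy, xx = np.mgrid[0:h, 0:w]
        center = np.exp(-((yy - h / 2) ** 2 + (xx - w / 2) ** 2) / (2 * (h / 6) ** 2))
        print(generate_scanpath(center, np.ones((h, w)), np.ones((h, w))))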

    Problems with Saliency Maps

    Despite the popularity that saliency models have gained in the computer vision community, they are most often conceived, exploited and benchmarked without heed of a number of problems and subtle issues they bring about. When saliency maps are used as proxies for the likelihood of fixating a location in a viewed scene, one such issue is the temporal dimension of visual attention deployment. Through a simple simulation it is shown how neglecting this dimension leads to results that at best cast doubt on the predictive performance of a model and its assessment via benchmarking procedures.
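
    The simulation's point can be reproduced with a toy example (our illustration, not the paper's experiment): when fixation density shifts over time, a static map that conflates early and late densities matches the time average perfectly while fitting each phase poorly.

        # Toy illustration: a static saliency map vs. time-varying fixation density.
        import numpy as np

        h, w = 40, 40
        yy, xx = np.mgrid[0:h, 0:w]

        def density(cy, cx, s=5.0):
            g = np.exp(-((yy - cy) ** 2 + (xx - cx) ** 2) / (2 * s ** 2))
            return g / g.sum()

        early = density(20, 20)                 # early fixations: centre bias
        late = density(10, 30)                  # late fixations: off-centre object
        static_map = 0.5 * early + 0.5 * late   # static map conflates both phases

        def corr(a, b):
            return np.corrcoef(a.ravel(), b.ravel())[0, 1]

        print("vs early fixations:", round(corr(static_map, early), 3))
        print("vs late fixations: ", round(corr(static_map, late), 3))
        print("vs time average:   ", round(corr(static_map, 0.5 * early + 0.5 * late), 3))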

    Modelling task-dependent eye guidance to objects in pictures

    We introduce a model of attentional eye guidance based on the rationale that the deployment of gaze is to be considered in the context of a general action-perception loop relying on two strictly intertwined processes: sensory processing, depending on current gaze position, identifies the sources of information that are most valuable under the given task; motor processing links such information with the oculomotor act by sampling the next gaze position and thus performing the gaze shift. In such a framework, the choice of where to look next is task-dependent and oriented to classes of objects embedded within pictures of complex scenes. The dependence on task is taken into account by exploiting the value and the payoff of gazing at certain image patches, or proto-objects, that provide a sparse representation of the scene objects. The different levels of the action-perception loop are represented in probabilistic form and eventually give rise to a stochastic process that generates the gaze sequence. In this way the model also accounts for statistical properties of gaze shifts, such as individual scan path variability. Results of the simulations are compared both with experimental data derived from publicly available datasets and with data from our own experiments.
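
    A minimal sketch of the stochastic sampling step, under illustrative assumptions: each proto-object carries a task-dependent value, the payoff of a gaze shift discounts that value by the saccade distance from the current gaze position, and the next fixation is sampled from the resulting softmax. The distance penalty and all names are ours, not the paper's formulation.

        # Illustrative sketch: sampling the next gaze position over proto-objects.
        import numpy as np

        def next_gaze(gaze, proto_centers, task_values, beta=2.0, dist_weight=0.02,
                      rng=None):
            if rng is None:
                rng = np.random.default_rng()
            dists = np.linalg.norm(proto_centers - gaze, axis=1)
            payoff = task_values - dist_weight * dists  # value minus motor cost
            p = np.exp(beta * payoff)
            p /= p.sum()                                # softmax over proto-objects
            k = rng.choice(len(proto_centers), p=p)
            return proto_centers[k]

        centers = np.array([[10.0, 10.0], [50.0, 40.0], [80.0, 20.0]])
        values = np.array([0.2, 1.0, 0.5])              # task favours the second object
        gaze = np.array([40.0, 30.0])
        rng = np.random.default_rng(0)
        path = [gaze]
        for _ in range(5):
            gaze = next_gaze(gaze, centers, values, rng=rng)
            path.append(gaze)
        print(np.array(path))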

    Modelling of content-aware indicators for effective determination of shot boundaries in compressed MPEG videos

    In this paper, a content-aware approach is proposed to design multiple test conditions for shot cut detection, organized into a multiple-phase decision tree for abrupt cut detection and a finite state machine for dissolve detection. In comparison with existing approaches, our algorithm is characterized by two categories of content-difference indicators and tests. While the first category indicates the content changes that are directly used for shot cut detection, the second indicates the contexts under which the content change occurs. As a result, indications of frame differences are tested with context awareness, making the detection of shot cuts adaptive to both content and context changes. Evaluations reported by TRECVID 2007 indicate that our algorithm achieves performance comparable to approaches using machine learning, yet with a simpler feature set and straightforward design strategies. This validates the effectiveness of modelling content-aware indicators for decision making, which also provides a good alternative to conventional approaches in this area.
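
    A schematic sketch of the two detector styles (our illustration, not the paper's algorithm): a threshold cascade for abrupt cuts on a single content-difference indicator, and a small finite state machine that flags a dissolve when the indicator stays in a mid range for several frames. The thresholds and the single indicator stand in for the paper's content/context indicator pair.

        # Illustrative sketch: abrupt-cut thresholding plus a dissolve state machine.
        def detect_transitions(diffs, cut_thr=0.8, dissolve_lo=0.2, min_len=3):
            events, state, start = [], "SHOT", 0
            for i, d in enumerate(diffs):
                if d >= cut_thr:                    # abrupt cut: one large jump
                    events.append(("cut", i))
                    state = "SHOT"
                elif d >= dissolve_lo:              # sustained moderate change
                    if state == "SHOT":
                        state, start = "MAYBE_DISSOLVE", i
                else:                               # change subsided
                    if state == "MAYBE_DISSOLVE" and i - start >= min_len:
                        events.append(("dissolve", start, i))
                    state = "SHOT"
            return events

        diffs = [0.05, 0.1, 0.9, 0.1, 0.3, 0.4, 0.35, 0.3, 0.05, 0.1]
        print(detect_transitions(diffs))  # [('cut', 2), ('dissolve', 4, 8)]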

    Bayesian Integration of Face and Low-Level Cues for Foveated Video Coding

    Visual search in ADHD, ASD and ASD + ADHD: overlapping or dissociating disorders?

    Recent debates in the literature discuss commonalities between Attention-Deficit/Hyperactivity Disorder (ADHD) and Autism Spectrum Disorder (ASD) at multiple levels of putative causal networks. This debate requires systematic comparisons between these disorders, which have been studied in isolation in the past, with potential markers of each disorder investigated in tandem. The present study chose superior local processing, typical of ASD, and increased intra-subject variability (ISV), typical of ADHD, for a head-to-head comparison of the two disorders, while also considering comorbid cases. It directly examined groups of participants aged 10-13 years with ADHD, ASD with (ASD+) or without (ASD-) comorbid ADHD, and a typically developing (TD) group (total N = 85). A visual search task consisting of an array of paired words was designed: participants had to find a specific pair of words, with the first word of the pair serving as the cue. This task was used to compare the groups on overall search performance and on trial-to-trial variability of search performance (i.e., ISV). Additionally, scanpath analysis was carried out using Recurrence Quantification Analysis (RQA) and the Multi-Match model. Results show that only the ASD- group exhibited superior search performance, whereas only the groups with ADHD symptoms showed increased ISV. These findings point towards a double dissociation between ASD and ADHD and argue against an overlap between the two disorders.
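
    As a pointer to the scanpath measures used, here is a minimal sketch of one basic RQA quantity (our simplification; the study's full analysis computes several measures): two fixations recur if they fall within a spatial radius, and the recurrence rate is the fraction of recurring off-diagonal pairs.

        # Illustrative sketch: recurrence rate of a fixation sequence.
        import numpy as np

        def recurrence_rate(fixations, radius=2.0):
            fix = np.asarray(fixations, dtype=float)
            n = len(fix)
            d = np.linalg.norm(fix[:, None, :] - fix[None, :, :], axis=-1)
            rec = (d <= radius) & ~np.eye(n, dtype=bool)  # off-diagonal recurrences
            return rec.sum() / (n * (n - 1))

        scanpath = [(10, 10), (30, 12), (11, 9), (50, 40), (29, 13)]
        print(round(recurrence_rate(scanpath), 3))  # 0.2: two revisited locations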