Automatic emotional state detection using facial expression dynamic in videos
In this paper, an automatic emotion detection system is built for a computer or machine to detect the emotional state from facial expressions in human-computer communication. First, dynamic motion features are extracted from facial expression videos; then, advanced machine learning methods for classification and regression are used to predict the emotional states.
The system is evaluated on two publicly available datasets, GEMEP_FERA and AVEC2013, and achieves satisfactory performance in comparison with the provided baseline results. With this emotional state detection capability, a machine can read the facial expression of its user automatically. This technique can be integrated into applications such as smart robots, interactive games and smart surveillance systems.
Fusion of Learned Multi-Modal Representations and Dense Trajectories for Emotional Analysis in Videos
When designing a video affective content analysis algorithm, one of the most important steps is the selection of discriminative features for the effective representation of video segments. The majority of existing affective content analysis methods either use low-level audio-visual features or generate handcrafted higher-level representations based on these low-level features. In this work, we propose to use deep learning methods, in particular convolutional neural networks (CNNs), to automatically learn and extract mid-level representations from raw data. To this end, we exploit the audio and visual modalities of videos by employing Mel-Frequency Cepstral Coefficients (MFCC) and color values in the HSV color space. We also incorporate dense-trajectory-based motion features to further enhance the performance of the analysis. By means of multi-class support vector machines (SVMs) and fusion mechanisms, music video clips are classified into one of four affective categories representing the four quadrants of the Valence-Arousal (VA) space. Results obtained on a subset of the DEAP dataset show (1) that higher-level representations perform better than low-level features, and (2) that incorporating motion information leads to a notable performance gain, independent of the chosen representation.
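The four-quadrant labelling of the Valence-Arousal space mentioned above can be illustrated with a minimal sketch. It assumes ratings on a 1 to 9 scale (as in DEAP's self-assessment ratings) split at a midpoint of 5; the function name `va_quadrant`, the midpoint, and the label strings are hypothetical and are not taken from the paper, which may derive its labels differently.

```python
def va_quadrant(valence: float, arousal: float, midpoint: float = 5.0) -> str:
    """Map a (valence, arousal) rating to one of four quadrant labels.

    Assumption: ratings at or above `midpoint` count as "high".
    Labels: HVHA, LVHA, LVLA, HVLA (high/low valence, high/low arousal).
    """
    high_valence = valence >= midpoint
    high_arousal = arousal >= midpoint
    if high_valence and high_arousal:
        return "HVHA"  # e.g. excited, happy
    if not high_valence and high_arousal:
        return "LVHA"  # e.g. angry, afraid
    if not high_valence and not high_arousal:
        return "LVLA"  # e.g. sad, bored
    return "HVLA"      # e.g. calm, content


if __name__ == "__main__":
    # Hypothetical ratings for four clips, one per quadrant.
    for v, a in [(7.5, 8.0), (2.0, 7.0), (3.0, 2.5), (8.0, 3.0)]:
        print(v, a, va_quadrant(v, a))
```

A multi-class classifier such as an SVM would then be trained to predict these quadrant labels from the extracted features.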
DramaQA: Character-Centered Video Story Understanding with Hierarchical QA
Despite recent progress in computer vision and natural language processing,
developing video understanding intelligence remains difficult due to the
intrinsic difficulty of stories in video. Moreover, there is no theoretical
metric for evaluating the degree of video understanding. In this paper, we
propose a novel video question answering (Video QA) task, DramaQA, for a
comprehensive understanding of the video story. DramaQA focuses on two
perspectives: 1) hierarchical QAs as an evaluation metric based on the
cognitive developmental stages of human intelligence, and 2) character-centered
video annotations to model local coherence of the story. Our dataset is built
upon the TV drama "Another Miss Oh" and it contains 16,191 QA pairs from 23,928
video clips of various lengths, with each QA pair belonging to one of four
difficulty levels. We provide 217,308 annotated images with rich
character-centered annotations, including visual bounding boxes, behaviors, and
emotions of main characters, and coreference-resolved scripts. Additionally, we
provide analyses of the dataset as well as a Dual Matching Multistream model
that effectively learns character-centered representations of video to answer
questions about the video. We are planning to release our dataset and model
publicly for research purposes and expect that our work will provide a new
perspective on video story understanding research.
Comment: 21 pages, 10 figures, submitted to ECCV 202
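As a rough illustration of how QA pairs with four difficulty levels and character-centered annotations might be represented, and how accuracy could be reported per level, here is a minimal sketch. All class names, field names, and the multiple-choice answer format are assumptions made for illustration; they do not reflect the released DramaQA schema.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class CharacterAnnotation:
    """Hypothetical per-frame annotation for one main character."""
    name: str
    bbox: Tuple[int, int, int, int]  # assumed (x_min, y_min, x_max, y_max)
    behavior: str
    emotion: str


@dataclass
class QAPair:
    """Hypothetical QA record; multiple-choice format is an assumption."""
    question: str
    candidate_answers: List[str]
    correct_index: int
    difficulty: int  # one of four levels, assumed to be numbered 1..4

    def __post_init__(self) -> None:
        if not 1 <= self.difficulty <= 4:
            raise ValueError("difficulty must be between 1 and 4")
        if not 0 <= self.correct_index < len(self.candidate_answers):
            raise ValueError("correct_index out of range")


def accuracy_by_level(pairs: List[QAPair], predictions: List[int]) -> Dict[int, float]:
    """Return accuracy per difficulty level for predicted answer indices."""
    correct: Dict[int, int] = defaultdict(int)
    total: Dict[int, int] = defaultdict(int)
    for qa, predicted in zip(pairs, predictions):
        total[qa.difficulty] += 1
        if predicted == qa.correct_index:
            correct[qa.difficulty] += 1
    return {level: correct[level] / total[level] for level in sorted(total)}


if __name__ == "__main__":
    # Placeholder questions and answers, not taken from the dataset.
    pairs = [
        QAPair("Who is in the room?", ["A", "B"], 0, 1),
        QAPair("What is she holding?", ["A", "B"], 1, 1),
        QAPair("Why did he leave?", ["A", "B"], 1, 4),
    ]
    print(accuracy_by_level(pairs, [0, 0, 1]))
```

Reporting accuracy separately per level, rather than as one aggregate number, is one way to read the idea of hierarchical QAs as an evaluation metric.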