1,556 research outputs found

    EmoNets: Multimodal deep learning approaches for emotion recognition in video

    The task of the Emotion Recognition in the Wild (EmotiW) Challenge is to assign one of seven emotions to short video clips extracted from Hollywood-style movies. The videos depict acted-out emotions under realistic conditions with a large degree of variation in attributes such as pose and illumination, making it worthwhile to explore approaches that combine features from multiple modalities for label assignment. In this paper we present our approach to learning several specialist models using deep learning techniques, each focusing on one modality. Among these are a convolutional neural network that captures visual information in detected faces, a deep belief net that represents the audio stream, a K-Means-based "bag-of-mouths" model that extracts visual features around the mouth region, and a relational autoencoder that addresses spatio-temporal aspects of the videos. We explore multiple methods for combining cues from these modalities into one common classifier, which achieves considerably greater accuracy than the predictions of our strongest single-modality classifier. Our method was the winning submission in the 2013 EmotiW challenge and achieved a test set accuracy of 47.67% on the 2014 dataset.
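    As a rough illustration of the decision-level (late) fusion idea described above, the sketch below averages hypothetical class-probability vectors produced by modality-specific models. The model outputs, uniform weights, and emotion ordering are placeholders for illustration only, not the paper's actual combination method, which explored several fusion strategies.

```python
import numpy as np

# Hypothetical per-modality class probabilities for one video clip.
# The paper's real models (face CNN, audio deep belief net, "bag-of-mouths",
# relational autoencoder) would supply these vectors; here they are made up.
EMOTIONS = ["angry", "disgust", "fear", "happy", "sad", "surprise", "neutral"]

def fuse_predictions(modality_probs, weights=None):
    """Weighted average of per-modality probability vectors (late fusion)."""
    probs = np.asarray(modality_probs, dtype=float)   # shape: (n_modalities, 7)
    if weights is None:
        weights = np.ones(len(probs)) / len(probs)    # uniform weighting
    fused = np.asarray(weights, dtype=float) @ probs  # shape: (7,)
    return fused / fused.sum()                        # renormalise

# Example with three modalities and uniform weights
face_cnn   = np.array([0.10, 0.05, 0.05, 0.55, 0.10, 0.05, 0.10])
audio_dbn  = np.array([0.15, 0.05, 0.10, 0.40, 0.15, 0.05, 0.10])
bag_mouths = np.array([0.05, 0.05, 0.05, 0.60, 0.10, 0.05, 0.10])

fused = fuse_predictions([face_cnn, audio_dbn, bag_mouths])
print(EMOTIONS[int(np.argmax(fused))])   # -> "happy"
```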

    A framework for cardio-pulmonary resuscitation (CPR) scene retrieval from medical simulation videos based on object and activity detection.

    In this thesis, we propose a framework to detect and retrieve CPR activity scenes from medical simulation videos. Medical simulation is a modern training method for medical students, in which an emergency patient condition is simulated on human-like mannequins and the students act upon it. These simulation sessions are recorded by the physician for later debriefing. With the increasing number of simulation videos, automatic detection and retrieval of specific scenes has become necessary. The proposed framework for CPR scene retrieval eliminates the conventional approach of using shot detection and frame segmentation techniques. Firstly, our work explores the application of Histograms of Oriented Gradients in three dimensions (HOG3D) to retrieve the scenes containing CPR activity. Secondly, we investigate the use of Local Binary Patterns on Three Orthogonal Planes (LBP-TOP), the three-dimensional extension of the popular Local Binary Patterns; this technique yields robust features that can detect specific activities in scenes containing multiple actors and activities. Thirdly, we propose an improvement to the above methods through a combination of HOG3D and LBP-TOP, using decision-level fusion techniques to combine the features. We show experimentally that the proposed techniques and their combination outperform the existing system for CPR scene retrieval. Finally, we devise a method to detect and retrieve the scenes containing breathing-bag activity from the medical simulation videos. The proposed framework is tested and validated using eight medical simulation videos, and the results are presented.
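    The decision-level fusion step mentioned above can be sketched minimally as two classifiers, one per descriptor type, whose scores are averaged. The random feature arrays, the SVM classifiers, and the equal 0.5/0.5 weighting below are illustrative assumptions and do not reproduce the thesis's actual HOG3D/LBP-TOP pipeline.

```python
import numpy as np
from sklearn.svm import SVC

# Placeholder descriptors: in the thesis these would be HOG3D and LBP-TOP
# features extracted from video volumes; here random arrays stand in.
rng = np.random.default_rng(0)
X_hog3d  = rng.normal(size=(200, 300))   # hypothetical HOG3D descriptors
X_lbptop = rng.normal(size=(200, 177))   # hypothetical LBP-TOP descriptors
y = rng.integers(0, 2, size=200)         # 1 = CPR scene, 0 = other

# One classifier per descriptor type
clf_hog = SVC(probability=True).fit(X_hog3d[:150], y[:150])
clf_lbp = SVC(probability=True).fit(X_lbptop[:150], y[:150])

# Decision-level (late) fusion: average the two classifiers' CPR probabilities
p_hog = clf_hog.predict_proba(X_hog3d[150:])[:, 1]
p_lbp = clf_lbp.predict_proba(X_lbptop[150:])[:, 1]
fused = 0.5 * (p_hog + p_lbp)
is_cpr = fused > 0.5
print(is_cpr[:10])
```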

    Graph Deep Learning: State of the Art and Challenges

    The last half-decade has seen a surge in deep learning research on irregular domains and efforts to extend convolutional neural networks (CNNs) to work on irregularly structured data. The graph has emerged as a particularly useful geometrical object in deep learning, able to represent a variety of irregular domains well. Graphs can represent various complex systems, from molecular structures to computer, social, and traffic networks. Following the extension of CNNs to graphs, a great amount of research has been published that improves the inferential power and computational efficiency of graph-based convolutional neural networks (GCNNs). The research is incipient, however, and our understanding is relatively rudimentary. The majority of GCNNs are designed to operate on graphs with certain assumed properties. In this survey we review the state of graph representation learning from the perspective of deep learning. We consider challenges in graph deep learning that have been neglected in the majority of work, largely because of the numerous theoretical difficulties they present. We identify four major challenges in graph deep learning: dynamic and evolving graphs, learning with edge signals and information, graph estimation, and the generalization of graph models. For each problem we discuss the theoretical and practical issues and survey the relevant research, highlighting the limitations of the state of the art. Advances on these challenges would permit GCNNs to be applied to a wider range of domains where graph models have previously been limited by the nature of the domain.
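    For context, many of the GCNNs the survey covers build on a spectral-style graph convolution of the kind sketched below (shown here in the commonly used symmetrically normalised form). The adjacency matrix, node features, and weights are toy values; the fixed, static adjacency this layer assumes is exactly the kind of property that the dynamic-graph and graph-estimation challenges above call into question.

```python
import numpy as np

def gcn_layer(A, X, W):
    """One graph-convolution layer: ReLU(D^-1/2 (A+I) D^-1/2 X W)."""
    A_hat = A + np.eye(A.shape[0])              # add self-loops
    d = A_hat.sum(axis=1)                       # node degrees of A_hat
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))      # D^{-1/2}
    A_norm = D_inv_sqrt @ A_hat @ D_inv_sqrt    # symmetric normalisation
    return np.maximum(A_norm @ X @ W, 0.0)      # propagate, transform, ReLU

# Toy static graph: 4 nodes, 3 input features, 2 output features
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 1],
              [0, 1, 0, 1],
              [0, 1, 1, 0]], dtype=float)
X = np.random.default_rng(1).normal(size=(4, 3))   # node feature matrix
W = np.random.default_rng(2).normal(size=(3, 2))   # weights (random stand-in)

H = gcn_layer(A, X, W)
print(H.shape)   # (4, 2)
```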