960 research outputs found
Action Recognition in Videos: from Motion Capture Labs to the Web
This paper presents a survey of human action recognition approaches based on
visual data recorded from a single video camera. We propose an organizing
framework which puts in evidence the evolution of the area, with techniques
moving from heavily constrained motion capture scenarios towards more
challenging, realistic, "in the wild" videos. The proposed organization is
based on the representation used as input for the recognition task, emphasizing
the hypothesis assumed and thus, the constraints imposed on the type of video
that each technique is able to address. Expliciting the hypothesis and
constraints makes the framework particularly useful to select a method, given
an application. Another advantage of the proposed organization is that it
allows categorizing newest approaches seamlessly with traditional ones, while
providing an insightful perspective of the evolution of the action recognition
task up to now. That perspective is the basis for the discussion in the end of
the paper, where we also present the main open issues in the area.Comment: Preprint submitted to CVIU, survey paper, 46 pages, 2 figures, 4
table
Going Deeper into Action Recognition: A Survey
Understanding human actions in visual data is tied to advances in
complementary research areas including object recognition, human dynamics,
domain adaptation and semantic segmentation. Over the last decade, human action
analysis evolved from earlier schemes that are often limited to controlled
environments to nowadays advanced solutions that can learn from millions of
videos and apply to almost all daily activities. Given the broad range of
applications from video surveillance to human-computer interaction, scientific
milestones in action recognition are achieved more rapidly, eventually leading
to the demise of what used to be good in a short time. This motivated us to
provide a comprehensive review of the notable steps taken towards recognizing
human actions. To this end, we start our discussion with the pioneering methods
that use handcrafted representations, and then, navigate into the realm of deep
learning based approaches. We aim to remain objective throughout this survey,
touching upon encouraging improvements as well as inevitable fallbacks, in the
hope of raising fresh questions and motivating new research directions for the
reader
Island Loss for Learning Discriminative Features in Facial Expression Recognition
Over the past few years, Convolutional Neural Networks (CNNs) have shown
promise on facial expression recognition. However, the performance degrades
dramatically under real-world settings due to variations introduced by subtle
facial appearance changes, head pose variations, illumination changes, and
occlusions.
In this paper, a novel island loss is proposed to enhance the discriminative
power of the deeply learned features. Specifically, the IL is designed to
reduce the intra-class variations while enlarging the inter-class differences
simultaneously. Experimental results on four benchmark expression databases
have demonstrated that the CNN with the proposed island loss (IL-CNN)
outperforms the baseline CNN models with either traditional softmax loss or the
center loss and achieves comparable or better performance compared with the
state-of-the-art methods for facial expression recognition.Comment: 8 pages, 3 figure
Multi-Modality Human Action Recognition
Human action recognition is very useful in many applications in various areas, e.g. video surveillance, HCI (Human computer interaction), video retrieval, gaming and security. Recently, human action recognition becomes an active research topic in computer vision and pattern recognition. A number of action recognition approaches have been proposed. However, most of the approaches are designed on the RGB images sequences, where the action data was collected by RGB/intensity camera. Thus the recognition performance is usually related to various occlusion, background, and lighting conditions of the image sequences. If more information can be provided along with the image sequences, more data sources other than the RGB video can be utilized, human actions could be better represented and recognized by the designed computer vision system.;In this dissertation, the multi-modality human action recognition is studied. On one hand, we introduce the study of multi-spectral action recognition, which involves the information from different spectrum beyond visible, e.g. infrared and near infrared. Action recognition in individual spectra is explored and new methods are proposed. Then the cross-spectral action recognition is also investigated and novel approaches are proposed in our work. On the other hand, since the depth imaging technology has made a significant progress recently, where depth information can be captured simultaneously with the RGB videos. The depth-based human action recognition is also investigated. I first propose a method combining different type of depth data to recognize human actions. Then a thorough evaluation is conducted on spatiotemporal interest point (STIP) based features for depth-based action recognition. Finally, I advocate the study of fusing different features for depth-based action analysis. Moreover, human depression recognition is studied by combining facial appearance model as well as facial dynamic model
- …