2 research outputs found

    Harnessing the Deep Net Object Models for Enhancing Human Action Recognition

    Full text link
    In this study, the influence of objects is investigated in the scenario of human action recognition with large number of classes. We hypothesize that the objects the humans are interacting will have good say in determining the action being performed. Especially, if the objects are non-moving, such as objects appearing in the background, features such as spatio-temporal interest points, dense trajectories may fail to detect them. Hence we propose to detect objects using pre-trained object detectors in every frame statically. Trained Deep network models are used as object detectors. Information from different layers in conjunction with different encoding techniques is extensively studied to obtain the richest feature vectors. This technique is observed to yield state-of-the-art performance on HMDB51 and UCF101 datasets.Comment: 6 pages. arXiv admin note: text overlap with arXiv:1411.4006 by other author

    Manifold Regularized Slow Feature Analysis for Dynamic Texture Recognition

    Full text link
    Dynamic textures exist in various forms, e.g., fire, smoke, and traffic jams, but recognizing dynamic texture is challenging due to the complex temporal variations. In this paper, we present a novel approach stemmed from slow feature analysis (SFA) for dynamic texture recognition. SFA extracts slowly varying features from fast varying signals. Fortunately, SFA is capable to leach invariant representations from dynamic textures. However, complex temporal variations require high-level semantic representations to fully achieve temporal slowness, and thus it is impractical to learn a high-level representation from dynamic textures directly by SFA. In order to learn a robust low-level feature to resolve the complexity of dynamic textures, we propose manifold regularized SFA (MR-SFA) by exploring the neighbor relationship of the initial state of each temporal transition and retaining the locality of their variations. Therefore, the learned features are not only slowly varying, but also partly predictable. MR-SFA for dynamic texture recognition is proposed in the following steps: 1) learning feature extraction functions as convolution filters by MR-SFA, 2) extracting local features by convolution and pooling, and 3) employing Fisher vectors to form a video-level representation for classification. Experimental results on dynamic texture and dynamic scene recognition datasets validate the effectiveness of the proposed approach.Comment: 12 page
    corecore