28,736 research outputs found

    Building Deep Networks on Grassmann Manifolds

    Full text link
    Learning representations on Grassmann manifolds is popular in quite a few visual recognition tasks. In order to enable deep learning on Grassmann manifolds, this paper proposes a deep network architecture by generalizing the Euclidean network paradigm to Grassmann manifolds. In particular, we design full rank mapping layers to transform input Grassmannian data to more desirable ones, exploit re-orthonormalization layers to normalize the resulting matrices, study projection pooling layers to reduce the model complexity in the Grassmannian context, and devise projection mapping layers to respect Grassmannian geometry and meanwhile achieve Euclidean forms for regular output layers. To train the Grassmann networks, we exploit a stochastic gradient descent setting on manifolds of the connection weights, and study a matrix generalization of backpropagation to update the structured data. The evaluations on three visual recognition tasks show that our Grassmann networks have clear advantages over existing Grassmann learning methods, and achieve results comparable with state-of-the-art approaches.Comment: AAAI'18 pape

    Enhancement of ELDA Tracker Based on CNN Features and Adaptive Model Update

    Get PDF
    Appearance representation and the observation model are the most important components in designing a robust visual tracking algorithm for video-based sensors. Additionally, the exemplar-based linear discriminant analysis (ELDA) model has shown good performance in object tracking. Based on that, we improve the ELDA tracking algorithm by deep convolutional neural network (CNN) features and adaptive model update. Deep CNN features have been successfully used in various computer vision tasks. Extracting CNN features on all of the candidate windows is time consuming. To address this problem, a two-step CNN feature extraction method is proposed by separately computing convolutional layers and fully-connected layers. Due to the strong discriminative ability of CNN features and the exemplar-based model, we update both object and background models to improve their adaptivity and to deal with the tradeoff between discriminative ability and adaptivity. An object updating method is proposed to select the “good” models (detectors), which are quite discriminative and uncorrelated to other selected models. Meanwhile, we build the background model as a Gaussian mixture model (GMM) to adapt to complex scenes, which is initialized offline and updated online. The proposed tracker is evaluated on a benchmark dataset of 50 video sequences with various challenges. It achieves the best overall performance among the compared state-of-the-art trackers, which demonstrates the effectiveness and robustness of our tracking algorithm

    Machine Analysis of Facial Expressions

    Get PDF
    No abstract

    Gait recognition based on shape and motion analysis of silhouette contours

    Get PDF
    This paper presents a three-phase gait recognition method that analyses the spatio-temporal shape and dynamic motion (STS-DM) characteristics of a human subject’s silhouettes to identify the subject in the presence of most of the challenging factors that affect existing gait recognition systems. In phase 1, phase-weighted magnitude spectra of the Fourier descriptor of the silhouette contours at ten phases of a gait period are used to analyse the spatio-temporal changes of the subject’s shape. A component-based Fourier descriptor based on anatomical studies of human body is used to achieve robustness against shape variations caused by all common types of small carrying conditions with folded hands, at the subject’s back and in upright position. In phase 2, a full-body shape and motion analysis is performed by fitting ellipses to contour segments of ten phases of a gait period and using a histogram matching with Bhattacharyya distance of parameters of the ellipses as dissimilarity scores. In phase 3, dynamic time warping is used to analyse the angular rotation pattern of the subject’s leading knee with a consideration of arm-swing over a gait period to achieve identification that is invariant to walking speed, limited clothing variations, hair style changes and shadows under feet. The match scores generated in the three phases are fused using weight-based score-level fusion for robust identification in the presence of missing and distorted frames, and occlusion in the scene. Experimental analyses on various publicly available data sets show that STS-DM outperforms several state-of-the-art gait recognition methods

    Mining Mid-level Features for Action Recognition Based on Effective Skeleton Representation

    Get PDF
    Recently, mid-level features have shown promising performance in computer vision. Mid-level features learned by incorporating class-level information are potentially more discriminative than traditional low-level local features. In this paper, an effective method is proposed to extract mid-level features from Kinect skeletons for 3D human action recognition. Firstly, the orientations of limbs connected by two skeleton joints are computed and each orientation is encoded into one of the 27 states indicating the spatial relationship of the joints. Secondly, limbs are combined into parts and the limb's states are mapped into part states. Finally, frequent pattern mining is employed to mine the most frequent and relevant (discriminative, representative and non-redundant) states of parts in continuous several frames. These parts are referred to as Frequent Local Parts or FLPs. The FLPs allow us to build powerful bag-of-FLP-based action representation. This new representation yields state-of-the-art results on MSR DailyActivity3D and MSR ActionPairs3D
    • …
    corecore