8,475 research outputs found

    Histogram of Oriented Principal Components for Cross-View Action Recognition

    Existing techniques for 3D action recognition are sensitive to viewpoint variations because they extract features from depth images, which are viewpoint dependent. In contrast, we directly process pointclouds for cross-view action recognition from unknown and unseen views. We propose the Histogram of Oriented Principal Components (HOPC) descriptor, which is robust to noise, viewpoint, scale and action speed variations. At a 3D point, HOPC is computed by projecting the three scaled eigenvectors of the pointcloud within its local spatio-temporal support volume onto the vertices of a regular dodecahedron. HOPC is also used to detect Spatio-Temporal Keypoints (STKs) in 3D pointcloud sequences, so that only the view-invariant STK descriptors (or local HOPC descriptors) at these key locations are used for action recognition. We also propose a global descriptor, which we refer to as STK-D, computed from the normalized spatio-temporal distribution of STKs in 4-D. We have evaluated the performance of our proposed descriptors against nine existing techniques on two cross-view and three single-view human action recognition datasets. The experimental results show that our techniques provide significant improvement over state-of-the-art methods.
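The projection step described in this abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: scaling each eigenvector by its eigenvalue, keeping only positive projections, and the function names `dodecahedron_vertices` and `hopc_like_descriptor` are assumptions made for illustration, and the sketch does not resolve the sign ambiguity of the eigenvectors.

```python
import numpy as np

def dodecahedron_vertices():
    """Return the 20 vertex directions of a regular dodecahedron as unit vectors."""
    phi = (1 + np.sqrt(5)) / 2  # golden ratio
    # Eight cube corners (+-1, +-1, +-1).
    verts = [(x, y, z) for x in (-1, 1) for y in (-1, 1) for z in (-1, 1)]
    # Twelve remaining vertices: cyclic permutations of (0, +-1/phi, +-phi).
    for a in (-1 / phi, 1 / phi):
        for b in (-phi, phi):
            verts += [(0, a, b), (a, b, 0), (b, 0, a)]
    v = np.array(verts, dtype=float)
    return v / np.linalg.norm(v, axis=1, keepdims=True)

def hopc_like_descriptor(points):
    """Illustrative HOPC-style descriptor for one support volume.

    points: (n, 3) array of 3D points gathered around a query point
    (assumed to be already accumulated over neighbouring frames).
    Returns a 60-dimensional vector: 3 eigenvectors x 20 vertex bins.
    """
    pts = np.asarray(points, dtype=float)
    centered = pts - pts.mean(axis=0)
    cov = centered.T @ centered / len(pts)
    eigvals, eigvecs = np.linalg.eigh(cov)   # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1]        # largest eigenvalue first
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    vertices = dodecahedron_vertices()
    parts = []
    for lam, vec in zip(eigvals, eigvecs.T):
        projection = vertices @ (lam * vec)  # project the scaled eigenvector onto each vertex
        parts.append(np.maximum(projection, 0.0))  # assumption: keep positive bins only
    return np.concatenate(parts)

rng = np.random.default_rng(0)
cloud = rng.normal(size=(200, 3)) * np.array([3.0, 1.0, 0.2])
descriptor = hopc_like_descriptor(cloud)
print(descriptor.shape)
```

Because the dodecahedron is centrally symmetric, flipping an eigenvector's sign only moves its energy to the opposite bins, which is why a full method needs a sign-disambiguation step that this sketch omits.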

    Lightweight Monocular Depth Estimation Model by Joint End-to-End Filter pruning

    Convolutional neural networks (CNNs) have emerged as the state of the art in multiple vision tasks, including depth estimation. However, the memory and computing power requirements of these models remain a challenge. Monocular depth estimation has significant use in robotics and virtual reality, which require deployment on low-end devices. Training a small model from scratch results in a significant drop in accuracy and does not benefit from pre-trained large models. Motivated by the literature on model pruning, we propose a lightweight monocular depth model obtained from a large trained model. This is achieved by removing the least important features with a novel joint end-to-end filter pruning. We propose to learn a binary mask for each filter to decide whether to drop the filter or not. These masks are trained jointly to exploit relations between filters at different layers as well as redundancy within the same layer. We show that we can achieve around 5x compression rate with a small drop in accuracy on the KITTI driving dataset. We also show that masking can improve accuracy over the baseline with fewer parameters, even without enforcing compression loss.
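As a rough illustration of per-filter binary masking, the sketch below applies a thresholded mask to the filters of a single convolutional layer and reports how many survive. It only covers the forward masking step; how the masks are learned jointly across layers is not shown, and the names `binarize_masks` and `apply_filter_masks`, the logit threshold, and the per-layer filter-count ratio used as a compression figure are all assumptions rather than the paper's definitions.

```python
import numpy as np

def binarize_masks(mask_logits, threshold=0.0):
    """Turn real-valued mask parameters into hard 0/1 decisions (assumed rule)."""
    return (np.asarray(mask_logits, dtype=float) > threshold).astype(float)

def apply_filter_masks(weights, mask_logits):
    """Zero out whole filters of one conv layer.

    weights: (out_channels, in_channels, kh, kw) filter bank.
    mask_logits: (out_channels,) one learnable value per filter.
    Returns the masked weights, the number of kept filters, and the
    ratio of original to kept filters for this layer.
    """
    mask = binarize_masks(mask_logits)
    pruned = weights * mask[:, None, None, None]  # broadcast mask over each filter
    kept = int(mask.sum())
    ratio = weights.shape[0] / max(kept, 1)       # avoid division by zero
    return pruned, kept, ratio

# Toy layer with 10 filters, of which the mask keeps 2.
weights = np.ones((10, 3, 3, 3))
logits = np.array([1.5, -0.3, -2.0, 0.7, -1.1, -0.4, -0.9, -3.0, -0.2, -0.6])
pruned, kept, ratio = apply_filter_masks(weights, logits)
print(kept, ratio)
```

In a trainable version the hard threshold would need a gradient surrogate (for example a straight-through estimator), which is a design choice this sketch leaves open.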

    Learning Discriminative Features with Class Encoder

    Deep neural networks usually benefit from unsupervised pre-training, e.g. auto-encoders. However, the classifier further needs supervised fine-tuning for good discrimination. Besides, due to the limits of full connection, the application of auto-encoders is usually limited to small, well-aligned images. In this paper, we incorporate supervised information to propose a novel formulation, namely the class-encoder, whose training objective is to reconstruct a sample from another one with an identical label. The class-encoder aims to minimize the intra-class variations in the feature space and to learn good discriminative manifolds on a class scale. We impose the class-encoder as a constraint on the softmax for better supervised training, and extend the reconstruction to the feature level to tackle the parameter size issue and the translation issue. The experiments show that the class-encoder helps to improve performance on classification and face recognition benchmarks. This could also be a promising direction for fast training of face recognition models. Comment: Accepted by CVPR2016 Workshop of Robust Features for Computer Vision
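The class-encoder objective, reconstructing a sample from another one that shares its label, can be sketched in a few lines of NumPy. The single-hidden-layer network, the tanh activation, the mean squared error, and the names `same_class_pairs` and `class_encoder_loss` are assumptions for illustration; the abstract does not specify the architecture or the loss.

```python
import numpy as np

def same_class_pairs(labels, rng):
    """For each sample, pick the index of a different sample with the same label.

    Falls back to the sample itself when its class has only one member.
    """
    labels = np.asarray(labels)
    targets = np.empty(len(labels), dtype=int)
    for i, y in enumerate(labels):
        candidates = np.flatnonzero(labels == y)
        others = candidates[candidates != i]
        targets[i] = rng.choice(others) if len(others) else i
    return targets

def class_encoder_loss(x, labels, w_enc, w_dec, rng):
    """Mean squared error between the decoding of x[i] and a same-class sample x[j]."""
    targets = same_class_pairs(labels, rng)
    hidden = np.tanh(x @ w_enc)   # encode each input
    recon = hidden @ w_dec        # decode back to input space
    return float(np.mean((recon - x[targets]) ** 2))

rng = np.random.default_rng(0)
x = rng.normal(size=(5, 8))
labels = np.array([0, 0, 1, 1, 1])
w_enc = rng.normal(size=(8, 4)) * 0.1
w_dec = rng.normal(size=(4, 8)) * 0.1
targets = same_class_pairs(labels, rng)
loss = class_encoder_loss(x, labels, w_enc, w_dec, rng)
print(targets, loss)
```

The only difference from a plain auto-encoder loss is the target: `x[targets]` instead of `x`, which is what pushes same-class samples toward a shared representation.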

    Facial Expression Recognition


    Graph edit distance from spectral seriation

    This paper is concerned with computing graph edit distance. One criticism that can be leveled at existing methods for computing graph edit distance is that they lack some of the formality and rigor of the computation of string edit distance. Hence, our aim is to convert graphs to string sequences so that string matching techniques can be used. To do this, we use a graph spectral seriation method to convert the adjacency matrix into a string or sequence order. We show how the serial ordering can be established using the leading eigenvector of the graph adjacency matrix. We pose the problem of graph matching as a maximum a posteriori probability (MAP) alignment of the seriation sequences for pairs of graphs. This treatment leads to an expression in which the edit cost is the negative logarithm of the a posteriori sequence alignment probability. We compute the edit distance by finding the sequence of string edit operations that minimizes the cost of the path traversing the edit lattice. The edit costs are determined by the components of the leading eigenvectors of the adjacency matrix and by the edge densities of the graphs being matched. We demonstrate the utility of the edit distance on a number of graph clustering problems.
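The idea of turning a graph into a sequence through its leading eigenvector and then comparing sequences with string edit distance can be sketched as follows. This is a simplification under stated assumptions: nodes are ordered purely by the magnitude of their leading-eigenvector component, each node is represented by its degree, and edits have unit cost, whereas the paper derives its costs from a MAP alignment. The names `seriation_order`, `degree_string`, and `edit_distance` are hypothetical.

```python
import numpy as np

def seriation_order(adj):
    """Order nodes by decreasing leading-eigenvector component (assumed rule)."""
    adj = np.asarray(adj, dtype=float)
    eigvals, eigvecs = np.linalg.eigh(adj)  # symmetric matrix, ascending eigenvalues
    leading = np.abs(eigvecs[:, -1])        # abs() removes the arbitrary global sign
    return np.argsort(-leading, kind="stable")

def degree_string(adj):
    """Represent a graph as the sequence of node degrees in seriation order."""
    adj = np.asarray(adj, dtype=float)
    degrees = adj.sum(axis=1).astype(int)
    return [int(degrees[i]) for i in seriation_order(adj)]

def edit_distance(a, b):
    """Unit-cost Levenshtein distance computed row by row over the edit lattice."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        cur = [i]
        for j, y in enumerate(b, 1):
            cur.append(min(prev[j] + 1,              # deletion
                           cur[j - 1] + 1,           # insertion
                           prev[j - 1] + (x != y)))  # substitution or match
        prev = cur
    return prev[-1]

# A 4-node path, the same path with relabeled nodes, and a 4-node star.
path = np.array([[0, 1, 0, 0], [1, 0, 1, 0], [0, 1, 0, 1], [0, 0, 1, 0]])
perm = [2, 0, 3, 1]
relabeled = path[np.ix_(perm, perm)]
star = np.array([[0, 1, 1, 1], [1, 0, 0, 0], [1, 0, 0, 0], [1, 0, 0, 0]])

print(edit_distance(degree_string(path), degree_string(relabeled)))  # 0
print(edit_distance(degree_string(path), degree_string(star)))       # 2
```

Because the ordering comes from the eigenvector rather than from node labels, the relabeled path maps to the same sequence as the original, which is the property that makes string matching usable on graphs.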