8,475 research outputs found
Histogram of Oriented Principal Components for Cross-View Action Recognition
Existing techniques for 3D action recognition are sensitive to viewpoint
variations because they extract features from depth images which are viewpoint
dependent. In contrast, we directly process pointclouds for cross-view action
recognition from unknown and unseen views. We propose the Histogram of Oriented
Principal Components (HOPC) descriptor that is robust to noise, viewpoint,
scale and action speed variations. At a 3D point, HOPC is computed by
projecting the three scaled eigenvectors of the pointcloud within its local
spatio-temporal support volume onto the vertices of a regular dodecahedron.
HOPC is also used for the detection of Spatio-Temporal Keypoints (STK) in 3D
pointcloud sequences so that view-invariant STK descriptors (or Local HOPC
descriptors) at these key locations only are used for action recognition. We
also propose a global descriptor computed from the normalized spatio-temporal
distribution of STKs in 4-D, which we refer to as STK-D. We have evaluated the
performance of our proposed descriptors against nine existing techniques on two
cross-view and three single-view human action recognition datasets. The
Experimental results show that our techniques provide significant improvement
over state-of-the-art methods
Lightweight Monocular Depth Estimation Model by Joint End-to-End Filter pruning
Convolutional neural networks (CNNs) have emerged as the state-of-the-art in
multiple vision tasks including depth estimation. However, memory and computing
power requirements remain as challenges to be tackled in these models.
Monocular depth estimation has significant use in robotics and virtual reality
that requires deployment on low-end devices. Training a small model from
scratch results in a significant drop in accuracy and it does not benefit from
pre-trained large models. Motivated by the literature of model pruning, we
propose a lightweight monocular depth model obtained from a large trained
model. This is achieved by removing the least important features with a novel
joint end-to-end filter pruning. We propose to learn a binary mask for each
filter to decide whether to drop the filter or not. These masks are trained
jointly to exploit relations between filters at different layers as well as
redundancy within the same layer. We show that we can achieve around 5x
compression rate with small drop in accuracy on the KITTI driving dataset. We
also show that masking can improve accuracy over the baseline with fewer
parameters, even without enforcing compression loss
Learning Discriminative Features with Class Encoder
Deep neural networks usually benefit from unsupervised pre-training, e.g.
auto-encoders. However, the classifier further needs supervised fine-tuning
methods for good discrimination. Besides, due to the limits of full-connection,
the application of auto-encoders is usually limited to small, well aligned
images. In this paper, we incorporate the supervised information to propose a
novel formulation, namely class-encoder, whose training objective is to
reconstruct a sample from another one of which the labels are identical.
Class-encoder aims to minimize the intra-class variations in the feature space,
and to learn a good discriminative manifolds on a class scale. We impose the
class-encoder as a constraint into the softmax for better supervised training,
and extend the reconstruction on feature-level to tackle the parameter size
issue and translation issue. The experiments show that the class-encoder helps
to improve the performance on benchmarks of classification and face
recognition. This could also be a promising direction for fast training of face
recognition models.Comment: Accepted by CVPR2016 Workshop of Robust Features for Computer Visio
Graph edit distance from spectral seriation
This paper is concerned with computing graph edit distance. One of the criticisms that can be leveled at existing methods for computing graph edit distance is that they lack some of the formality and rigor of the computation of string edit distance. Hence, our aim is to convert graphs to string sequences so that string matching techniques can be used. To do this, we use a graph spectral seriation method to convert the adjacency matrix into a string or sequence order. We show how the serial ordering can be established using the leading eigenvector of the graph adjacency matrix. We pose the problem of graph-matching as a maximum a posteriori probability (MAP) alignment of the seriation sequences for pairs of graphs. This treatment leads to an expression in which the edit cost is the negative logarithm of the a posteriori sequence alignment probability. We compute the edit distance by finding the sequence of string edit operations which minimizes the cost of the path traversing the edit lattice. The edit costs are determined by the components of the leading eigenvectors of the adjacency matrix and by the edge densities of the graphs being matched. We demonstrate the utility of the edit distance on a number of graph clustering problems
- …