4,615 research outputs found
Latent Semantic Learning with Structured Sparse Representation for Human Action Recognition
This paper proposes a novel latent semantic learning method for extracting
high-level features (i.e. latent semantics) from a large vocabulary of abundant
mid-level features (i.e. visual keywords) with structured sparse
representation, which can help to bridge the semantic gap in the challenging
task of human action recognition. To discover the manifold structure of
midlevel features, we develop a spectral embedding approach to latent semantic
learning based on L1-graph, without the need to tune any parameter for graph
construction as a key step of manifold learning. More importantly, we construct
the L1-graph with structured sparse representation, which can be obtained by
structured sparse coding with its structured sparsity ensured by novel L1-norm
hypergraph regularization over mid-level features. In the new embedding space,
we learn latent semantics automatically from abundant mid-level features
through spectral clustering. The learnt latent semantics can be readily used
for human action recognition with SVM by defining a histogram intersection
kernel. Different from the traditional latent semantic analysis based on topic
models, our latent semantic learning method can explore the manifold structure
of mid-level features in both L1-graph construction and spectral embedding,
which results in compact but discriminative high-level features. The
experimental results on the commonly used KTH action dataset and unconstrained
YouTube action dataset show the superior performance of our method.Comment: The short version of this paper appears in ICCV 201
Semi-Supervised Sparse Coding
Sparse coding approximates the data sample as a sparse linear combination of
some basic codewords and uses the sparse codes as new presentations. In this
paper, we investigate learning discriminative sparse codes by sparse coding in
a semi-supervised manner, where only a few training samples are labeled. By
using the manifold structure spanned by the data set of both labeled and
unlabeled samples and the constraints provided by the labels of the labeled
samples, we learn the variable class labels for all the samples. Furthermore,
to improve the discriminative ability of the learned sparse codes, we assume
that the class labels could be predicted from the sparse codes directly using a
linear classifier. By solving the codebook, sparse codes, class labels and
classifier parameters simultaneously in a unified objective function, we
develop a semi-supervised sparse coding algorithm. Experiments on two
real-world pattern recognition problems demonstrate the advantage of the
proposed methods over supervised sparse coding methods on partially labeled
data sets
Mid-level Deep Pattern Mining
Mid-level visual element discovery aims to find clusters of image patches
that are both representative and discriminative. In this work, we study this
problem from the prospective of pattern mining while relying on the recently
popularized Convolutional Neural Networks (CNNs). Specifically, we find that
for an image patch, activations extracted from the first fully-connected layer
of CNNs have two appealing properties which enable its seamless integration
with pattern mining. Patterns are then discovered from a large number of CNN
activations of image patches through the well-known association rule mining.
When we retrieve and visualize image patches with the same pattern,
surprisingly, they are not only visually similar but also semantically
consistent. We apply our approach to scene and object classification tasks, and
demonstrate that our approach outperforms all previous works on mid-level
visual element discovery by a sizeable margin with far fewer elements being
used. Our approach also outperforms or matches recent works using CNN for these
tasks. Source code of the complete system is available online.Comment: Published in Proc. IEEE Conf. Computer Vision and Pattern Recognition
201
- …