625 research outputs found
Latent Semantic Learning with Structured Sparse Representation for Human Action Recognition
This paper proposes a novel latent semantic learning method for extracting
high-level features (i.e. latent semantics) from a large vocabulary of abundant
mid-level features (i.e. visual keywords) with structured sparse
representation, which can help to bridge the semantic gap in the challenging
task of human action recognition. To discover the manifold structure of
midlevel features, we develop a spectral embedding approach to latent semantic
learning based on L1-graph, without the need to tune any parameter for graph
construction as a key step of manifold learning. More importantly, we construct
the L1-graph with structured sparse representation, which can be obtained by
structured sparse coding with its structured sparsity ensured by novel L1-norm
hypergraph regularization over mid-level features. In the new embedding space,
we learn latent semantics automatically from abundant mid-level features
through spectral clustering. The learnt latent semantics can be readily used
for human action recognition with SVM by defining a histogram intersection
kernel. Different from the traditional latent semantic analysis based on topic
models, our latent semantic learning method can explore the manifold structure
of mid-level features in both L1-graph construction and spectral embedding,
which results in compact but discriminative high-level features. The
experimental results on the commonly used KTH action dataset and unconstrained
YouTube action dataset show the superior performance of our method.Comment: The short version of this paper appears in ICCV 201
Non-negative representation based discriminative dictionary learning for face recognition
In this paper, we propose a non-negative representation based discriminative
dictionary learning algorithm (NRDL) for multicategory face classification. In
contrast to traditional dictionary learning methods, NRDL investigates the use
of non-negative representation (NR), which contributes to learning
discriminative dictionary atoms. In order to make the learned dictionary more
suitable for classification, NRDL seamlessly incorporates nonnegative
representation constraint, discriminative dictionary learning and linear
classifier training into a unified model. Specifically, NRDL introduces a
positive constraint on representation matrix to find distinct atoms from
heterogeneous training samples, which results in sparse and discriminative
representation. Moreover, a discriminative dictionary encouraging function is
proposed to enhance the uniqueness of class-specific sub-dictionaries.
Meanwhile, an inter-class incoherence constraint and a compact graph based
regularization term are constructed to respectively improve the
discriminability of learned classifier. Experimental results on several
benchmark face data sets verify the advantages of our NRDL algorithm over the
state-of-the-art dictionary learning methods
Enhancement of ELDA Tracker Based on CNN Features and Adaptive Model Update
Appearance representation and the observation model are the most important components in designing a robust visual tracking algorithm for video-based sensors. Additionally, the exemplar-based linear discriminant analysis (ELDA) model has shown good performance in object tracking. Based on that, we improve the ELDA tracking algorithm by deep convolutional neural network (CNN) features and adaptive model update. Deep CNN features have been successfully used in various computer vision tasks. Extracting CNN features on all of the candidate windows is time consuming. To address this problem, a two-step CNN feature extraction method is proposed by separately computing convolutional layers and fully-connected layers. Due to the strong discriminative ability of CNN features and the exemplar-based model, we update both object and background models to improve their adaptivity and to deal with the tradeoff between discriminative ability and adaptivity. An object updating method is proposed to select the “good” models (detectors), which are quite discriminative and uncorrelated to other selected models. Meanwhile, we build the background model as a Gaussian mixture model (GMM) to adapt to complex scenes, which is initialized offline and updated online. The proposed tracker is evaluated on a benchmark dataset of 50 video sequences with various challenges. It achieves the best overall performance among the compared state-of-the-art trackers, which demonstrates the effectiveness and robustness of our tracking algorithm
- …