Joint Projection and Dictionary Learning using Low-rank Regularization and Graph Constraints
In this paper, we aim to learn simultaneously a discriminative dictionary
and a robust projection matrix from noisy data. The joint learning makes the
learned projection and dictionary a better fit for each other, so more
accurate classification can be obtained. However, current prevailing joint
dimensionality reduction and dictionary learning methods fail when the
training samples are noisy or heavily corrupted. To address this issue, we
propose a joint projection and dictionary learning using low-rank
regularization and graph constraints (JPDL-LR). Specifically, the
discrimination of the dictionary is achieved by imposing Fisher criterion on
the coding coefficients. In addition, our method explicitly encodes the local
structure of the data by incorporating a graph regularization term that further
improves the discriminative ability of the projection matrix. Inspired by
recent advances of low-rank representation for removing outliers and noise, we
enforce a low-rank constraint on sub-dictionaries of all classes to make them
more compact and robust to noise. Experimental results on several benchmark
datasets verify the effectiveness and robustness of our method for both
dimensionality reduction and image classification, especially when the data
contains considerable noise or variations.
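The low-rank constraint on the class sub-dictionaries described above is commonly realized through nuclear-norm minimization, whose proximal step is singular-value thresholding (SVT). The sketch below illustrates only that building block on synthetic data; it is not the authors' full JPDL-LR solver, and the matrix sizes and threshold are arbitrary assumptions:

```python
import numpy as np

def svt(M, tau):
    """Singular-value thresholding: the proximal operator of the nuclear
    norm, prox_{tau * ||.||_*}(M). Shrinking every singular value by tau
    drives the result toward low rank, suppressing noise directions."""
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    return U @ np.diag(np.maximum(s - tau, 0.0)) @ Vt

rng = np.random.default_rng(0)
# A noisy sub-dictionary: a rank-2 signal plus dense Gaussian noise.
D_clean = rng.standard_normal((30, 2)) @ rng.standard_normal((2, 10))
D_noisy = D_clean + 0.05 * rng.standard_normal((30, 10))

D_lr = svt(D_noisy, tau=1.0)
print(np.linalg.matrix_rank(D_noisy), np.linalg.matrix_rank(D_lr))
```

Here the threshold separates the two large signal singular values from the small noise ones, so the shrunk sub-dictionary recovers the low-rank structure, which is the sense in which the constraint makes sub-dictionaries "more compact and robust to noise".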
Multi-View Task-Driven Recognition in Visual Sensor Networks
Nowadays, distributed smart cameras are deployed for a wide set of tasks in
several application scenarios, ranging from object recognition and image
retrieval to forensic applications. Because bandwidth in distributed
systems is limited, efficient coding of local visual features has been an active
topic of research. In this paper, we propose a novel approach to obtain a
compact representation of high-dimensional visual data using sensor fusion
techniques. We convert the problem of visual analysis in resource-limited
scenarios into one of multi-view representation learning, and we show that the key to
finding a properly compressed representation is to exploit the positions of
cameras with respect to each other as a norm-based regularization in the
particular signal representation of sparse coding. Learning the representation
of each camera is viewed as an individual task and a multi-task learning with
joint sparsity for all nodes is employed. The proposed representation learning
scheme is referred to as the multi-view task-driven learning for visual sensor
network (MT-VSN). We demonstrate that MT-VSN outperforms the state of the art in
various surveillance recognition tasks.
Comment: 5 pages, Accepted at the International Conference on Image Processing, 201
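The multi-task learning with joint sparsity mentioned above is usually implemented with an l2,1 (row-group) penalty on the code matrix, whose proximal operator zeroes whole rows at once so that all camera views share one support. A minimal sketch of that operator, with made-up sizes and threshold (this is the generic building block, not the MT-VSN pipeline itself):

```python
import numpy as np

def prox_l21(A, tau):
    """Proximal operator of tau * ||A||_{2,1}: each row (one dictionary
    atom across all views/tasks) is shrunk by its l2 norm, so a row is
    either kept for every view or zeroed for every view -- the joint
    sparsity pattern across nodes."""
    norms = np.linalg.norm(A, axis=1, keepdims=True)
    return A * np.maximum(1.0 - tau / np.maximum(norms, 1e-12), 0.0)

rng = np.random.default_rng(1)
A = rng.standard_normal((8, 3))      # 8 atoms x 3 camera views
A_sparse = prox_l21(A, tau=1.5)
print(np.linalg.norm(A_sparse, axis=1))  # whole rows survive or vanish
```

The shared row support is what lets the network transmit one compact, agreed-upon set of active atoms instead of a different support per camera.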
Jointly Learning Structured Analysis Discriminative Dictionary and Analysis Multiclass Classifier
In this paper, we propose an analysis mechanism based structured Analysis
Discriminative Dictionary Learning (ADDL) framework. ADDL seamlessly integrates
the analysis discriminative dictionary learning, analysis representation and
analysis classifier training into a unified model. The applied analysis
mechanism ensures that the learnt dictionaries, representations and
linear classifiers over different classes are as independent and discriminative as
possible. The dictionary is obtained by minimizing a reconstruction
error and an analytical incoherence promoting term that encourages the
sub-dictionaries associated with different classes to be independent. To obtain
the representation coefficients, ADDL imposes a sparse l2,1-norm constraint on
the coding coefficients instead of using l0 or l1-norm, since the l0 or l1-norm
constraint applied in most existing DL criteria makes the training phase
time-consuming. The codes-extraction projection that bridges data with the sparse
codes by extracting special features from the given samples is calculated via
minimizing a sparse codes approximation term. Then we compute a linear
classifier based on the approximated sparse codes by an analysis mechanism to
simultaneously consider the classification and representation powers. Thus, the
classification approach of our model is very efficient, because it avoids
the extra time-consuming sparse reconstruction with the trained dictionary
for each new test sample that most existing DL algorithms require. Simulations on real
image databases demonstrate that our ADDL model can obtain superior performance
over other state-of-the-art methods.
Comment: Accepted by IEEE TNNL
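The efficiency claim can be made concrete: once a codes-extraction projection P and a linear classifier W on the codes are available, test-time classification is two matrix products, with no per-sample sparse solve. The ridge-style closed forms below are a simplified stand-in, where random matrices replace the actual ADDL-learnt sparse codes and the regularizer `lam` is an arbitrary assumption:

```python
import numpy as np

rng = np.random.default_rng(2)
d, k, n, c = 20, 15, 60, 3              # feature dim, atoms, samples, classes
X = rng.standard_normal((d, n))          # training data (columns)
A = rng.standard_normal((k, n))          # stand-in for learnt sparse codes
H = np.eye(c)[rng.integers(0, c, n)].T   # one-hot labels, c x n

lam = 0.1
# Codes-extraction projection: min_P ||A - P X||^2 + lam ||P||^2.
P = A @ X.T @ np.linalg.inv(X @ X.T + lam * np.eye(d))
# Linear classifier on the codes: min_W ||H - W A||^2 + lam ||W||^2.
W = H @ A.T @ np.linalg.inv(A @ A.T + lam * np.eye(k))

# Test time is just two matrix products -- no sparse coding per sample.
x_test = X[:, 0]
label = np.argmax(W @ (P @ x_test))
print(label)
```

This is why avoiding the per-test-sample sparse reconstruction matters: the entire test-time cost is O(kd + ck) per sample.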
Structured Occlusion Coding for Robust Face Recognition
Occlusion in face recognition is a common yet challenging problem. While
sparse representation based classification (SRC) has shown promising
performance under laboratory conditions (i.e., noiseless or randomly
pixel-corrupted images), it performs much worse in practical scenarios. In this paper, we
consider the practical face recognition problem, where the occlusions are
predictable and available for sampling. We propose the structured occlusion
coding (SOC) to address occlusion problems. The structured coding here is
twofold. On one hand, we employ a structured dictionary for recognition. On
the other hand, we propose to use structured sparsity in this formulation.
Specifically, SOC simultaneously separates the occlusion and classifies the
image. In this way, the problem of recognizing an occluded image is turned into
seeking a structured sparse solution on an occlusion-appended dictionary. In order
to construct a well-performing occlusion dictionary, we propose an occlusion
mask estimating technique via a locality constrained dictionary (LCD), showing
a striking improvement on occluded samples. On a category-specific occlusion
dictionary, we replace plain norm-based sparsity with structured sparsity, which is
shown to be more robust, further enhancing the robustness of our approach. Moreover,
SOC achieves a significant improvement in handling large occlusions in the real world.
Extensive experiments are conducted on public data sets to validate the
superiority of the proposed algorithm.
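The core mechanism of seeking a sparse solution on an occlusion-appended dictionary can be sketched with a generic l1 solver (plain ISTA here, standing in for the structured-sparse solver the method actually uses; dictionaries and sizes are synthetic):

```python
import numpy as np

def ista(D, y, lam=0.1, n_iter=200):
    """Plain ISTA for min_x 0.5*||y - D x||^2 + lam*||x||_1 -- a generic
    stand-in solver; SOC itself uses structured sparsity over the
    class/occlusion blocks rather than a flat l1 penalty."""
    L = np.linalg.norm(D, 2) ** 2          # Lipschitz constant of the gradient
    x = np.zeros(D.shape[1])
    for _ in range(n_iter):
        g = x - (D.T @ (D @ x - y)) / L    # gradient step
        x = np.sign(g) * np.maximum(np.abs(g) - lam / L, 0.0)  # soft-threshold
    return x

rng = np.random.default_rng(3)
D_class = rng.standard_normal((50, 20))    # face (class) dictionary
D_occ = rng.standard_normal((50, 10))      # sampled occlusion dictionary
D = np.hstack([D_class, D_occ])            # occlusion-appended dictionary

y = D_class[:, 4] + 0.8 * D_occ[:, 2]      # occluded test face
x = ista(D, y)
x_class, x_occ = x[:20], x[20:]            # code splits along the blocks
print(np.argmax(np.abs(x_class)), np.argmax(np.abs(x_occ)))
```

Because the dictionary concatenates class atoms and occlusion atoms, the recovered code separates the face content (`x_class`) from the occlusion (`x_occ`), which is the "simultaneously separates the occlusion and classifies the image" behavior described above.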
Deep Boosting: Joint Feature Selection and Analysis Dictionary Learning in Hierarchy
This work investigates how the traditional image classification pipelines can
be extended into a deep architecture, inspired by recent successes of deep
neural networks. We propose a deep boosting framework based on layer-by-layer
joint feature boosting and dictionary learning. In each layer, we construct a
dictionary of filters by combining the filters from the lower layer, and
iteratively optimize the image representation with a joint
discriminative-generative formulation, i.e. minimization of empirical
classification error plus regularization of analysis image generation over
training images. For optimization, we perform two iterating steps: i) to
minimize the classification error, select the most discriminative features
using the gentle adaboost algorithm; ii) according to the feature selection,
update the filters to minimize the regularization on analysis image
representation using the gradient descent method. Once the optimization is
converged, we learn the higher layer representation in the same way. Our model
delivers several distinct advantages. First, our layer-wise optimization
provides the potential to build very deep architectures. Second, the generated
image representation is compact and meaningful. In several visual recognition
tasks, our framework outperforms existing state-of-the-art approaches.
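Step i) above, selecting the most discriminative feature with Gentle AdaBoost, can be sketched with weighted least-squares decision stumps. This is a median-threshold simplification on synthetic features, not the paper's actual filter responses:

```python
import numpy as np

def gentle_boost_round(X, y, w):
    """One round of Gentle AdaBoost with decision stumps: for each
    feature, fit a threshold stump by weighted least squares and pick
    the feature with the lowest weighted squared error -- the
    'select the most discriminative feature' step."""
    n, d = X.shape
    best = (np.inf, None)
    for j in range(d):
        thr = np.median(X[:, j])
        mask = X[:, j] > thr
        # Weighted means of y on each side of the threshold: these are
        # the least-squares-optimal stump outputs.
        a = np.average(y[mask], weights=w[mask]) if mask.any() else 0.0
        b = np.average(y[~mask], weights=w[~mask]) if (~mask).any() else 0.0
        f = np.where(mask, a, b)
        err = np.sum(w * (y - f) ** 2)
        if err < best[0]:
            best = (err, j, thr, a, b)
    return best

rng = np.random.default_rng(4)
X = rng.standard_normal((100, 5))
y = np.sign(X[:, 2] + 0.1 * rng.standard_normal(100))  # feature 2 is informative
w = np.ones(100) / 100                                  # uniform boosting weights
err, j, thr, a, b = gentle_boost_round(X, y, w)
print(j)
```

In the full framework this selection alternates with the gradient-descent filter update of step ii), layer by layer.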
Cross-label Suppression: A Discriminative and Fast Dictionary Learning with Group Regularization
This paper addresses image classification through learning a compact and
discriminative dictionary efficiently. Given a structured dictionary with each
atom (a column of the dictionary matrix) related to some label, we propose a
cross-label suppression constraint to enlarge the difference among
representations for different classes. Meanwhile, we introduce group
regularization to enforce representations to preserve label properties of
original samples, meaning the representations for the same class are encouraged
to be similar. Upon the cross-label suppression, we do not resort to the
frequently-used l0-norm or l1-norm for coding, and obtain
computational efficiency without losing the discriminative power for
categorization. Moreover, two simple classification schemes are also developed
to take full advantage of the learnt dictionary. Extensive experiments on six
data sets including face recognition, object categorization, scene
classification, texture recognition and sport action categorization are
conducted, and the results show that the proposed approach can outperform
many recently presented dictionary learning algorithms in both recognition accuracy and
computational efficiency.
Comment: 36 pages, 12 figures, 11 tables
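One natural classification scheme for such a label-structured dictionary is class-wise reconstruction: code the sample on each label's sub-dictionary with a cheap regularized least-squares solve (no l0/l1 optimization) and pick the best-reconstructing class. This is a hedged sketch on synthetic sub-dictionaries; the paper's own two schemes may differ in detail:

```python
import numpy as np

def classify_by_residual(D_blocks, x, lam=1e-3):
    """Classify by the smallest class-wise reconstruction residual,
    using a closed-form ridge solve per class sub-dictionary."""
    residuals = []
    for D_c in D_blocks:
        k = D_c.shape[1]
        a = np.linalg.solve(D_c.T @ D_c + lam * np.eye(k), D_c.T @ x)
        residuals.append(np.linalg.norm(x - D_c @ a))
    return int(np.argmin(residuals))

rng = np.random.default_rng(5)
# Three class sub-dictionaries (atoms are columns, one block per label).
D_blocks = [rng.standard_normal((40, 6)) for _ in range(3)]
# A sample generated from class 1's atoms plus a little noise.
x = D_blocks[1] @ rng.standard_normal(6) + 0.01 * rng.standard_normal(40)
print(classify_by_residual(D_blocks, x))
```

Because every per-class solve is a small linear system, the scheme keeps the computational-efficiency advantage the abstract emphasizes.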
Covariance of Motion and Appearance Features for Spatio-Temporal Recognition Tasks
In this paper, we introduce an end-to-end framework for video analysis
focused on practical scenarios and built on theoretical foundations from
sparse representation, including a novel descriptor for general-purpose video
analysis. In our approach, we compute kinematic features from optical flow and
first and second-order derivatives of intensities to represent motion and
appearance respectively. These features are then used to construct covariance
matrices which capture joint statistics of both low-level motion and appearance
features extracted from a video. Using an over-complete dictionary of the
covariance based descriptors built from labeled training samples, we formulate
low-level event recognition as a sparse linear approximation problem. Within
this, we pose the sparse decomposition of a covariance matrix, which also
conforms to the space of positive semi-definite matrices, as a determinant
maximization problem. Also, since covariance matrices lie on non-linear
Riemannian manifolds, we compare our former approach with a sparse linear
approximation alternative that is suitable for equivalent vector spaces of
covariance matrices. This is done by searching for the best projection of the
query data on a dictionary using an Orthogonal Matching Pursuit algorithm. We
show the applicability of our video descriptor in two different application
domains, namely low-level event recognition in unconstrained scenarios and
gesture recognition using one-shot learning. Our experiments provide promising
insights into large-scale video analysis.
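The covariance descriptor itself is simple to compute, and one standard way to place SPD covariance matrices in an "equivalent vector space" is the log-Euclidean map. A small sketch with random stand-in features (the feature count and sizes are assumptions, not the paper's exact kinematic features):

```python
import numpy as np

def covariance_descriptor(F):
    """F: (n_pixels, n_features) low-level motion/appearance features.
    Returns the feature covariance matrix, regularized to stay
    positive definite."""
    C = np.cov(F, rowvar=False)
    return C + 1e-6 * np.eye(C.shape[1])

def log_euclidean_vec(C):
    """Map an SPD covariance matrix to a flat vector via the matrix
    logarithm (eigendecomposition of a symmetric matrix), so standard
    sparse linear approximation (e.g. OMP) applies in a vector space."""
    w, V = np.linalg.eigh(C)
    logC = V @ np.diag(np.log(w)) @ V.T
    return logC[np.triu_indices_from(logC)]

rng = np.random.default_rng(6)
F = rng.standard_normal((500, 7))   # stand-in optical-flow + derivative features
C = covariance_descriptor(F)
v = log_euclidean_vec(C)
print(C.shape, v.shape)             # (7, 7) (28,)
```

Vectorizing through the matrix logarithm is what makes the "sparse linear approximation alternative" on the Riemannian manifold comparable to ordinary dictionary search with OMP.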
Sparse Dictionary-based Attributes for Action Recognition and Summarization
We present an approach for dictionary learning of action attributes via
information maximization. We unify the class distribution and appearance
information into an objective function for learning a sparse dictionary of
action attributes. The objective function maximizes the mutual information
between what has been learned and what remains to be learned in terms of
appearance information and class distribution for each dictionary atom. We
propose a Gaussian Process (GP) model for sparse representation to optimize the
dictionary objective function. The sparse coding property allows a kernel with
compact support in GP to realize a very efficient dictionary learning process.
Hence we can describe an action video by a set of compact and discriminative
action attributes. More importantly, we can recognize modeled action categories
in a sparse feature space, which can be generalized to unseen and unmodeled
action categories. Experimental results demonstrate the effectiveness of our
approach in action recognition and summarization.
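The information-maximizing selection of dictionary atoms can be approximated greedily: under a GP prior, the entropy of a set of atoms is (up to constants) the log-determinant of their kernel matrix, so each step adds the candidate the current set explains worst. The sketch below uses this log-det surrogate with an RBF kernel on synthetic candidates; it is a stand-in, not the paper's exact mutual-information objective:

```python
import numpy as np

def rbf_kernel(X, Y, gamma=0.5):
    """RBF kernel matrix between the rows of X and Y."""
    d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

def greedy_informative_atoms(X, m):
    """Greedily pick m candidates maximizing the log-determinant of
    their kernel (Gram) matrix -- a GP-entropy surrogate for an
    information-maximization criterion."""
    n = X.shape[0]
    selected = []
    for _ in range(m):
        best, best_gain = None, -np.inf
        for i in range(n):
            if i in selected:
                continue
            idx = selected + [i]
            K = rbf_kernel(X[idx], X[idx]) + 1e-8 * np.eye(len(idx))
            gain = np.linalg.slogdet(K)[1]
            if gain > best_gain:
                best, best_gain = i, gain
        selected.append(best)
    return selected

rng = np.random.default_rng(7)
X = rng.standard_normal((30, 4))     # candidate attribute atoms
atoms = greedy_informative_atoms(X, 5)
print(atoms)
```

The compact-support kernel mentioned in the abstract would make each such evaluation cheap, which is where the claimed efficiency of the GP-based dictionary learning comes from.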
From BoW to CNN: Two Decades of Texture Representation for Texture Classification
Texture is a fundamental characteristic of many types of images, and texture
representation is one of the essential and challenging problems in computer
vision and pattern recognition which has attracted extensive research
attention. Since 2000, texture representations based on Bag of Words (BoW) and
on Convolutional Neural Networks (CNNs) have been extensively studied with
impressive performance. Given this period of remarkable evolution, this paper
aims to present a comprehensive survey of advances in texture representation
over the last two decades. More than 200 major publications are cited in this
survey covering different aspects of the research, which includes (i) problem
description; (ii) recent advances in the broad categories of BoW-based,
CNN-based and attribute-based methods; and (iii) evaluation issues,
specifically benchmark datasets and state-of-the-art results. In retrospect of
what has been achieved so far, the survey discusses open challenges and
directions for future research.
Comment: Accepted by IJC
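The BoW line of texture representation surveyed here reduces, at its core, to quantizing local filter responses against a learned texton codebook and histogramming the assignments. A minimal sketch with random stand-ins for the filter-bank responses and the codebook:

```python
import numpy as np

def bow_texture_histogram(responses, codebook):
    """Classic BoW texture representation: assign each pixel's filter
    response vector to its nearest codeword (texton) and describe the
    image by the normalized histogram of assignments."""
    # responses: (n_pixels, d); codebook: (k, d) learned textons.
    d2 = ((responses[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)
    assignments = d2.argmin(axis=1)
    hist = np.bincount(assignments, minlength=len(codebook)).astype(float)
    return hist / hist.sum()

rng = np.random.default_rng(8)
responses = rng.standard_normal((1000, 8))   # stand-in filter-bank outputs
codebook = rng.standard_normal((16, 8))      # stand-in texton dictionary
h = bow_texture_histogram(responses, codebook)
print(h.shape, round(h.sum(), 6))            # (16,) 1.0
```

CNN-based representations covered by the same survey replace the hand-crafted filter bank and codebook with learned convolutional features, but the pooling-into-a-fixed-length-descriptor idea is shared.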
Kernel Coding: General Formulation and Special Cases
Representing images by compact codes has proven beneficial for many visual
recognition tasks. Most existing techniques, however, perform this coding step
directly in image feature space, where the distributions of the different
classes are typically entangled. In contrast, here, we study the problem of
performing coding in a high-dimensional Hilbert space, where the classes are
expected to be more easily separable. To this end, we introduce a general
coding formulation that encompasses the most popular techniques, such as bag of
words, sparse coding and locality-based coding, and show how this formulation
and its special cases can be kernelized. Importantly, we address several
aspects of learning in our general formulation, such as kernel learning,
dictionary learning and supervised kernel coding. Our experimental evaluation
on several visual recognition tasks demonstrates the benefits of performing
coding in Hilbert space, and in particular of jointly learning the kernel, the
dictionary and the classifier.
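The key observation behind kernelizing a coding step is that the reconstruction objective depends on the data only through inner products: min_c ||phi(x) - Phi(D) c||^2 + lam ||c||^2 expands to k(x, x) - 2 c^T k_D(x) + c^T K_DD c + lam ||c||^2. The ridge-regularized special case below has a closed form; it is a simplified sketch with an RBF kernel, not the sparse or locality-based variants of the general formulation:

```python
import numpy as np

def rbf(X, Y, gamma=0.1):
    """RBF kernel matrix between the rows of X and Y."""
    d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

def kernel_ridge_codes(K_DD, k_Dx, lam=0.1):
    """Closed-form Hilbert-space coding: c = (K_DD + lam I)^{-1} k_Dx,
    the minimizer of the kernelized reconstruction plus a ridge penalty.
    Only kernel evaluations are needed, never phi(x) itself."""
    return np.linalg.solve(K_DD + lam * np.eye(K_DD.shape[0]), k_Dx)

rng = np.random.default_rng(9)
D = rng.standard_normal((10, 5))     # dictionary atoms as rows
x = rng.standard_normal((1, 5))      # one sample to encode

K_DD = rbf(D, D)                     # atom-atom kernel values
k_Dx = rbf(D, x)[:, 0]               # atom-sample kernel values
c = kernel_ridge_codes(K_DD, k_Dx)

# Reconstruction error in feature space, computed via the kernel trick:
err = rbf(x, x)[0, 0] - 2 * c @ k_Dx + c @ K_DD @ c
print(c.shape, err < 1.0)
```

Since everything is expressed through K_DD and k_Dx, the kernel itself can also be parameterized and learned jointly with the dictionary, which is the setting the abstract's experiments explore.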