A Two-Layer Local Constrained Sparse Coding Method for Fine-Grained Visual Categorization
Fine-grained categories are more difficult to distinguish than generic
categories due to their high inter-class similarity and large intra-class
diversity. Fine-grained visual categorization (FGVC) is therefore considered
one of the most challenging problems in computer vision. A new
feature learning framework, which is based on a two-layer local constrained
sparse coding architecture, is proposed in this paper. The two-layer
architecture is introduced for learning intermediate-level features, and the
locality constraint is applied to guarantee the local smoothness of the coding
coefficients. To extract more discriminative information, local orientation
histograms, rather than raw pixels, are used as the input to sparse coding. Moreover, a
quick dictionary updating process is derived to further improve the training
speed. Experiments on two benchmarks show that our method achieves 85.29%
accuracy on the Oxford 102 flowers dataset and 67.8% accuracy on the
CUB-200-2011 bird dataset, and the performance of our framework is highly
competitive with the existing literature.
Comment: 19 pages, 12 figures, 8 tables
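The locality-constrained coding step the abstract describes admits a compact
closed form. Below is a minimal sketch in the spirit of locality-constrained
linear coding (LLC); the dictionary `D`, the regularizer `lam`, and the
single-layer setting are illustrative assumptions, not the paper's exact
two-layer formulation.

```python
import numpy as np

def llc_code(x, D, lam=1e-4):
    """Locality-constrained coding of one sample x (d,) against a
    dictionary D (K, d). Closed-form LLC-style solution: atoms close
    to x receive large coefficients, enforcing local smoothness."""
    K = D.shape[0]
    z = D - x                            # atoms shifted to the sample (K, d)
    C = z @ z.T                          # local covariance (K, K)
    C += lam * np.trace(C) * np.eye(K)   # ridge term for numerical stability
    c = np.linalg.solve(C, np.ones(K))
    return c / c.sum()                   # codes sum to one

rng = np.random.default_rng(0)
D = rng.normal(size=(8, 5))
x = D[2] + 0.01 * rng.normal(size=5)     # a sample lying near atom 2
c = llc_code(x, D)
print(np.argmax(np.abs(c)))              # the nearest atom dominates the code
```

Because the closed form only involves a small linear solve per sample, coding
stays cheap even when the dictionary update (as in the paper) is accelerated
separately.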
Sparse Coding with Earth Mover's Distance for Multi-Instance Histogram Representation
Sparse coding (Sc) has been studied very well as a powerful data
representation method. It attempts to represent the feature vector of a data
sample by reconstructing it as the sparse linear combination of some basic
elements, and a norm distance function is usually used as the loss
function for the reconstruction error. In this paper, we investigate using Sc
as the representation method within multi-instance learning framework, where a
sample is given as a bag of instances, and further represented as a histogram
of the quantized instances. We argue that a norm distance is not suitable for
the data type of histogram, and propose to use the earth mover's distance
(EMD) instead of a norm distance as the measure of the reconstruction error.
By minimizing the EMD between the histogram of a sample and its reconstruction
from some basic histograms, a novel sparse coding method is developed,
referred to as SC-EMD. We evaluate its performance as a histogram
representation method on two multi-instance learning problems --- abnormal
image detection in wireless capsule endoscopy videos, and protein binding site
retrieval. The encouraging results demonstrate the advantages of the new
method over the traditional method using a norm distance.
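The motivation for EMD over a bin-wise norm can be seen on a toy example.
This is a sketch of the distance comparison only, not the SC-EMD optimization
itself; `scipy.stats.wasserstein_distance` computes the 1-D EMD between
weighted histograms.

```python
import numpy as np
from scipy.stats import wasserstein_distance

# Three histograms over 10 bins: h2 is h1 shifted by one bin, h3 by five.
bins = np.arange(10)
h1 = np.array([0, 0, 0, 1.0, 0, 0, 0, 0, 0, 0])
h2 = np.array([0, 0, 0, 0, 1.0, 0, 0, 0, 0, 0])
h3 = np.array([0, 0, 0, 0, 0, 0, 0, 0, 1.0, 0])

# A Euclidean norm cannot tell a small shift from a large one...
print(np.linalg.norm(h1 - h2), np.linalg.norm(h1 - h3))   # equal

# ...while EMD grows with how far the mass must be moved.
emd_near = wasserstein_distance(bins, bins, h1, h2)
emd_far = wasserstein_distance(bins, bins, h1, h3)
print(emd_near, emd_far)   # 1.0 vs 5.0
```

This cross-bin sensitivity is exactly what makes EMD a better reconstruction
loss for histogram-of-instances representations.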
Learning Invariant Color Features for Person Re-Identification
Matching people across multiple camera views, known as person
re-identification, is a challenging problem due to the change in visual
appearance caused by varying lighting conditions. The perceived color of the
subject appears to be different with respect to illumination. Previous works
use color as it is or address these challenges by designing color spaces
focusing on a specific cue. In this paper, we propose a data-driven approach
for learning color patterns from pixels sampled from images across two camera
views. The intuition behind this work is that, even though pixel values of the same
color would be different across views, they should be encoded with the same
values. We model color feature generation as a learning problem by jointly
learning a linear transformation and a dictionary to encode pixel values. We
also analyze different photometric invariant color spaces. Using color as the
only cue, we compare our approach with all the photometric invariant color
spaces and show superior performance over all of them. Combining with other
learned low-level and high-level features, we obtain promising results on the
ViPER, Person Re-ID 2011 and CAVIAR4REID datasets.
A Classifier-guided Approach for Top-down Salient Object Detection
We propose a framework for top-down salient object detection that
incorporates a tightly coupled image classification module. The classifier is
trained on novel category-aware sparse codes computed on object dictionaries
used for saliency modeling. A misclassification indicates that the
corresponding saliency model is inaccurate. Hence, the classifier selects
images for which the saliency models need to be updated. The category-aware
sparse coding produces better image classification accuracy than conventional
sparse coding, at a reduced computational complexity. A
saliency-weighted max-pooling is proposed to improve image classification,
which is further used to refine the saliency maps. Experimental results on
Graz-02 and PASCAL VOC-07 datasets demonstrate the effectiveness of salient
object detection. Although the role of the classifier is to support salient
object detection, we evaluate its performance in image classification and also
illustrate the utility of thresholded saliency maps for image segmentation.
Comment: To appear in Signal Processing: Image Communication, Elsevier. Available online from April 201
Sparse Dictionary-based Attributes for Action Recognition and Summarization
We present an approach for dictionary learning of action attributes via
information maximization. We unify the class distribution and appearance
information into an objective function for learning a sparse dictionary of
action attributes. The objective function maximizes the mutual information
between what has been learned and what remains to be learned in terms of
appearance information and class distribution for each dictionary atom. We
propose a Gaussian Process (GP) model for sparse representation to optimize the
dictionary objective function. The sparse coding property allows a kernel with
compact support in GP to realize a very efficient dictionary learning process.
Hence we can describe an action video by a set of compact and discriminative
action attributes. More importantly, we can recognize modeled action categories
in a sparse feature space, which can be generalized to unseen and unmodeled
action categories. Experimental results demonstrate the effectiveness of our
approach in action recognition and summarization.
Covariance of Motion and Appearance Features for Spatio-Temporal Recognition Tasks
In this paper, we introduce an end-to-end framework for video analysis
focused towards practical scenarios built on theoretical foundations from
sparse representation, including a novel descriptor for general purpose video
analysis. In our approach, we compute kinematic features from optical flow and
first and second-order derivatives of intensities to represent motion and
appearance respectively. These features are then used to construct covariance
matrices which capture joint statistics of both low-level motion and appearance
features extracted from a video. Using an over-complete dictionary of the
covariance based descriptors built from labeled training samples, we formulate
low-level event recognition as a sparse linear approximation problem. Within
this, we pose the sparse decomposition of a covariance matrix, which also
conforms to the space of positive semi-definite matrices, as a determinant
maximization problem. Also since covariance matrices lie on non-linear
Riemannian manifolds, we compare our former approach with a sparse linear
approximation alternative that is suitable for equivalent vector spaces of
covariance matrices. This is done by searching for the best projection of the
query data on a dictionary using an Orthogonal Matching Pursuit algorithm. We
show the applicability of our video descriptor in two different application
domains - namely low-level event recognition in unconstrained scenarios and
gesture recognition using one-shot learning. Our experiments provide promising
insights into large-scale video analysis.
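The appearance part of the covariance descriptor above can be computed in a
few lines for a single frame. A hedged sketch: the optical-flow (kinematic)
channels, which require a second frame, are omitted, and the five-feature
choice here is illustrative rather than the paper's exact feature set.

```python
import numpy as np

def covariance_descriptor(img):
    """Region covariance of per-pixel appearance features: intensity
    plus first- and second-order spatial derivatives. Returns a small
    SPD matrix capturing the joint statistics of the features."""
    img = img.astype(float)
    Iy, Ix = np.gradient(img)            # first-order derivatives
    Iyy, _ = np.gradient(Iy)             # second-order derivatives
    _, Ixx = np.gradient(Ix)
    F = np.stack([img.ravel(), Ix.ravel(), Iy.ravel(),
                  Ixx.ravel(), Iyy.ravel()])   # (5, n_pixels) feature matrix
    return np.cov(F)                           # (5, 5) covariance descriptor

rng = np.random.default_rng(1)
img = rng.random((32, 32))
C = covariance_descriptor(img)
print(C.shape)   # (5, 5), independent of region size
```

Note the descriptor's size depends only on the number of features, not on the
region, which is what makes dictionaries of such descriptors practical.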
Parameterizing Region Covariance: An Efficient Way To Apply Sparse Codes On Second Order Statistics
Sparse representations have been successfully applied to signal processing,
computer vision and machine learning. Currently there is a trend to learn
sparse models directly on structured data, such as region covariance. However,
such methods when combined with region covariance often require complex
computation. We present an approach to transform a structured sparse model
learning problem to a traditional vectorized sparse modeling problem by
constructing a Euclidean space representation for region covariance matrices.
Our new representation has multiple advantages. Experiments on several vision
tasks demonstrate competitive performance with the state-of-the-art methods.
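One standard way to give covariance matrices a Euclidean representation is the
log-Euclidean map: take the matrix logarithm, then vectorize the upper
triangle with sqrt(2) weights so that Euclidean distances match Frobenius
distances between log-matrices. This is a common construction and may differ
from the paper's exact parameterization.

```python
import numpy as np

def spd_log(C):
    """Matrix logarithm of an SPD matrix via its eigendecomposition."""
    w, U = np.linalg.eigh(C)
    return (U * np.log(w)) @ U.T

def log_euclidean_vector(C):
    """Vectorize log(C): diagonal entries plus sqrt(2)-weighted upper
    triangle, so the Euclidean norm of the vector equals the Frobenius
    norm of log(C) and ordinary vectorized sparse coding applies."""
    L = spd_log(C)
    iu = np.triu_indices(C.shape[0], k=1)
    return np.concatenate([np.diag(L), np.sqrt(2) * L[iu]])

rng = np.random.default_rng(2)
A = rng.normal(size=(4, 4))
C = A @ A.T + 4 * np.eye(4)      # a well-conditioned SPD matrix
v = log_euclidean_vector(C)
print(v.shape)                   # d(d+1)/2 = 10 entries for d = 4
```

After this mapping, any off-the-shelf sparse coder can be run on the vectors
without further manifold-aware machinery.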
Learning Discriminative Features via Label Consistent Neural Network
Deep Convolutional Neural Networks (CNNs) enforce supervised information only
at the output layer, while hidden layers are trained by backpropagating the
prediction error from the output layer without explicit supervision. We propose
a supervised feature learning approach, Label Consistent Neural Network, which
enforces direct supervision in late hidden layers. We associate each neuron in
a hidden layer with a particular class label and encourage it to be activated
for input signals from the same class. More specifically, we introduce a label
consistency regularization called "discriminative representation error" loss
for late hidden layers and combine it with classification error loss to build
our overall objective function. This label consistency constraint alleviates
the common problem of vanishing gradients and leads to faster convergence; it
also makes the features derived from late hidden layers discriminative enough
for classification even with a simple k-NN classifier, since input signals
from the same class will have very similar representations. Experimental
results demonstrate that our approach achieves state-of-the-art performance on
several public benchmarks for action and object category recognition.
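A toy version of the combined objective can be written down directly. The
neuron-to-class association array, the shapes, and the weight `alpha` below
are illustrative assumptions, not the authors' training code.

```python
import numpy as np

def label_consistent_loss(A, logits, y, assoc, alpha=0.5):
    """Softmax cross-entropy plus a "discriminative representation
    error" that pushes each late-hidden neuron (assigned to a class
    via `assoc`) to activate only for samples of its class.
    A: (n, h) hidden activations, logits: (n, c), y: (n,) labels,
    assoc: (h,) neuron-to-class assignment."""
    n = len(y)
    p = np.exp(logits - logits.max(axis=1, keepdims=True))
    p /= p.sum(axis=1, keepdims=True)
    ce = -np.log(p[np.arange(n), y]).mean()           # classification loss
    Q = (assoc[None, :] == y[:, None]).astype(float)  # ideal activations
    rep_err = ((A - Q) ** 2).mean()                   # label-consistency term
    return ce + alpha * rep_err

y = np.array([0, 1])
assoc = np.array([0, 0, 1, 1])                # two hidden neurons per class
logits = np.array([[5.0, 0.0], [0.0, 5.0]])
Q = (assoc[None, :] == y[:, None]).astype(float)
print(label_consistent_loss(Q, logits, y, assoc))      # ideal activations
print(label_consistent_loss(1 - Q, logits, y, assoc))  # worst-case activations
```

The extra term feeds a gradient directly into the late hidden layers, which
is where the claimed convergence benefit comes from.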
Fast Algorithms for the Computation of Ranklets
Ranklets are orientation-selective rank features with applications to
tracking, face detection, texture analysis and medical imaging. We introduce
efficient algorithms that reduce their computational complexity from
O(N log N) to O(√N + k), where N is the area of the filter. Timing tests show
a speedup of one order of magnitude for typical usage, which should make
Ranklets attractive for real-time applications.
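A ranklet is essentially a Wilcoxon rank-sum comparison between two halves of
a window, normalized to [-1, 1]. The naive version below shows the feature
being computed; the fast algorithms in the paper avoid re-ranking from scratch
at each filter position, an incremental detail not sketched here.

```python
import numpy as np
from scipy.stats import rankdata

def vertical_ranklet(window):
    """Rank-based comparison of the left ("treatment") and right halves
    of a window with an even number of pixels per half. Returns +1 when
    the left half uniformly dominates, -1 when the right half does,
    and 0 when the two halves are statistically indistinguishable."""
    flat = window.ravel().astype(float)
    N = flat.size
    ranks = rankdata(flat).reshape(window.shape)  # ranks 1..N, ties averaged
    half = window.shape[1] // 2
    n = N // 2                                    # pixels per half
    Ws = ranks[:, :half].sum() - n * (n + 1) / 2  # Wilcoxon rank-sum statistic
    return Ws / (N * N / 8) - 1                   # normalize to [-1, 1]

edge = np.zeros((4, 4)); edge[:, :2] = 1.0        # bright left half
flat_win = np.ones((4, 4))                        # no structure
print(vertical_ranklet(edge), vertical_ranklet(flat_win))  # 1.0 0.0
```

Because only ranks matter, the feature is invariant to monotonic intensity
changes, which is what makes it robust for face and texture work.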
Detection, Recognition and Tracking of Moving Objects from Real-time Video via Visual Vocabulary Model and Species Inspired PSO
In this paper, we address the basic problem of recognizing moving objects in
video images using a Visual Vocabulary model and Bag of Words, and track our
object of interest in the subsequent video frames using species-inspired PSO.
Initially, the shadow-free images are obtained by background modeling followed
by foreground modeling to extract the blobs of our object of interest.
Subsequently, we train a cubic SVM with human body datasets in accordance with
our domain of interest for recognition and tracking. During training, using the
principle of Bag of Words we extract necessary features of certain domains and
objects for classification. Subsequently, matching these feature sets with
those of the extracted object blobs that are obtained by subtracting the shadow
free background from the foreground, we successfully detect our object of
interest in the test domain. The classification performance of the cubic SVM
is represented by a confusion matrix and an ROC curve, reflecting the accuracy
of each module. After classification, our object of interest is
tracked in the test domain using species-inspired PSO. By combining the
adaptive learning tools with the efficient classification of description, we
achieve optimum accuracy in recognizing the moving objects. We evaluate our
algorithm on benchmark datasets: iLIDS, VIVID, Walking2 and Woman. Comparative
analysis of our algorithm against existing state-of-the-art trackers shows
very satisfactory and competitive results.
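For concreteness, here is a plain global-best PSO on a toy localization
objective. This is a sketch of the optimizer family only: the species-inspired
variant in the paper maintains multiple sub-swarms and is not reproduced, and
all parameter values below are illustrative assumptions.

```python
import numpy as np

def pso(score, bounds, n_particles=30, iters=60, w=0.7, c1=1.5, c2=1.5, seed=0):
    """Minimize `score` over a box with standard global-best PSO:
    each particle blends inertia, attraction to its personal best,
    and attraction to the swarm's best position."""
    rng = np.random.default_rng(seed)
    lo, hi = bounds
    X = rng.uniform(lo, hi, size=(n_particles, len(lo)))  # positions
    V = np.zeros_like(X)                                  # velocities
    P = X.copy()                                          # personal bests
    Pf = np.array([score(x) for x in X])
    g = P[Pf.argmin()].copy()                             # global best
    for _ in range(iters):
        r1, r2 = rng.random(X.shape), rng.random(X.shape)
        V = w * V + c1 * r1 * (P - X) + c2 * r2 * (g - X)
        X = np.clip(X + V, lo, hi)
        f = np.array([score(x) for x in X])
        better = f < Pf
        P[better], Pf[better] = X[better], f[better]
        g = P[Pf.argmin()].copy()
    return g

# Toy "tracking" objective: squared distance to the object's true position.
target = np.array([17.0, 42.0])
best = pso(lambda p: np.sum((p - target) ** 2),
           (np.array([0.0, 0.0]), np.array([100.0, 100.0])))
print(best)   # converges near the target position
```

In a real tracker the score would compare a candidate patch at position `p`
against the learned object model instead of using a known target.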