83 research outputs found
Action Recognition in Video Using Sparse Coding and Relative Features
This work presents an approach to category-based action recognition in video
using sparse coding techniques. The proposed approach includes two main
contributions: i) A new method to handle intra-class variations by decomposing
each video into a reduced set of representative atomic action acts or
key-sequences, and ii) A new video descriptor, ITRA: Inter-Temporal Relational
Act Descriptor, that exploits the power of comparative reasoning to capture
relative similarity relations among key-sequences. In terms of the method to
obtain key-sequences, we introduce a loss function that, for each video, leads
to the identification of a sparse set of representative key-frames capturing
both, relevant particularities arising in the input video, as well as relevant
generalities arising in the complete class collection. In terms of the method
to obtain the ITRA descriptor, we introduce a novel scheme to quantify relative
intra and inter-class similarities among local temporal patterns arising in the
videos. The resulting ITRA descriptor demonstrates to be highly effective to
discriminate among action categories. As a result, the proposed approach
reaches remarkable action recognition performance on several popular benchmark
datasets, outperforming alternative state-of-the-art techniques by a large
margin.Comment: Accepted to CVPR 201
Sparse and low rank approximations for action recognition
Action recognition is crucial area of research in computer vision with wide range of
applications in surveillance, patient-monitoring systems, video indexing, Human-
Computer Interaction and many more. These applications require automated
action recognition. Robust classification methods are sought-after despite influential
research in this field over past decade. The data resources have grown
tremendously owing to the advances in the digital revolution which cannot be
compared to the meagre resources in the past. The main limitation on a system
when dealing with video data is the computational burden due to large dimensions
and data redundancy. Sparse and low rank approximation methods have evolved
recently which aim at concise and meaningful representation of data. This thesis
explores the application of sparse and low rank approximation methods in the
context of video data classification with the following contributions.
1. An approach for solving the problem of action and gesture classification is
proposed within the sparse representation domain, effectively dealing with
large feature dimensions,
2. Low rank matrix completion approach is proposed to jointly classify more
than one action
3. Deep features are proposed for robust classification of multiple actions
within matrix completion framework which can handle data deficiencies.
This thesis starts with the applicability of sparse representations based classifi-
cation methods to the problem of action and gesture recognition. Random projection
is used to reduce the dimensionality of the features. These are referred
to as compressed features in this thesis. The dictionary formed with compressed
features has proved to be efficient for the classification task achieving comparable
results to the state of the art.
Next, this thesis addresses the more promising problem of simultaneous classifi-
cation of multiple actions. This is treated as matrix completion problem under
transduction setting. Matrix completion methods are considered as the generic
extension to the sparse representation methods from compressed sensing point
of view. The features and corresponding labels of the training and test data are
concatenated and placed as columns of a matrix. The unknown test labels would
be the missing entries in that matrix. This is solved using rank minimization
techniques based on the assumption that the underlying complete matrix would
be a low rank one. This approach has achieved results better than the state of the art on datasets with varying complexities.
This thesis then extends the matrix completion framework for joint classification
of actions to handle the missing features besides missing test labels. In
this context, deep features from a convolutional neural network are proposed.
A convolutional neural network is trained on the training data and features are
extracted from train and test data from the trained network. The performance
of the deep features has proved to be promising when compared to the state of
the art hand-crafted features
Mathematically inspired approaches to face recognition in uncontrolled conditions: super resolution and compressive sensing
Face recognition systems under uncontrolled conditions using surveillance cameras is becom-ing essential for establishing the identity of a person at a distance from the camera and providing safety and security against terrorist, attack, robbery and crime. Therefore, the performance of face recognition in low-resolution degraded images with low quality against im-ages with high quality/and of good resolution/size is considered the most challenging tasks and constitutes focus of this thesis. The work in this thesis is designed to further investigate these issues and the following being our main aim:
“To investigate face identification from a distance and under uncontrolled conditions by pri-marily addressing the problem of low-resolution images using existing/modified mathemati-cally inspired super resolution schemes that are based on the emerging new paradigm of compressive sensing and non-adaptive dictionaries based super resolution.”
We shall firstly investigate and develop the compressive sensing (CS) based sparse represen-tation of a sample image to reconstruct a high-resolution image for face recognition, by tak-ing different approaches to constructing CS-compliant dictionaries such as Gaussian Random Matrix and Toeplitz Circular Random Matrix. In particular, our focus is on constructing CS non-adaptive dictionaries (independent of face image information), which contrasts with ex-isting image-learnt dictionaries, but satisfies some form of the Restricted Isometry Property (RIP) which is sufficient to comply with the CS theorem regarding the recovery of sparsely represented images. We shall demonstrate that the CS dictionary techniques for resolution enhancement tasks are able to develop scalable face recognition schemes under uncontrolled conditions and at a distance. Secondly, we shall clarify the comparisons of the strength of sufficient CS property for the various types of dictionaries and demonstrate that the image-learnt dictionary far from satisfies the RIP for compressive sensing. Thirdly, we propose dic-tionaries based on the high frequency coefficients of the training set and investigate the im-pact of using dictionaries on the space of feature vectors of the low-resolution image for face recognition when applied to the wavelet domain. Finally, we test the performance of the de-veloped schemes on CCTV images with unknown model of degradation, and show that these schemes significantly outperform existing techniques developed for such a challenging task. However, the performance is still not comparable to what could be achieved in controlled en-vironment, and hence we shall identify remaining challenges to be investigated in the future
Proceedings of the second "international Traveling Workshop on Interactions between Sparse models and Technology" (iTWIST'14)
The implicit objective of the biennial "international - Traveling Workshop on
Interactions between Sparse models and Technology" (iTWIST) is to foster
collaboration between international scientific teams by disseminating ideas
through both specific oral/poster presentations and free discussions. For its
second edition, the iTWIST workshop took place in the medieval and picturesque
town of Namur in Belgium, from Wednesday August 27th till Friday August 29th,
2014. The workshop was conveniently located in "The Arsenal" building within
walking distance of both hotels and town center. iTWIST'14 has gathered about
70 international participants and has featured 9 invited talks, 10 oral
presentations, and 14 posters on the following themes, all related to the
theory, application and generalization of the "sparsity paradigm":
Sparsity-driven data sensing and processing; Union of low dimensional
subspaces; Beyond linear and convex inverse problem; Matrix/manifold/graph
sensing/processing; Blind inverse problems and dictionary learning; Sparsity
and computational neuroscience; Information theory, geometry and randomness;
Complexity/accuracy tradeoffs in numerical methods; Sparsity? What's next?;
Sparse machine learning and inference.Comment: 69 pages, 24 extended abstracts, iTWIST'14 website:
http://sites.google.com/site/itwist1
Sparsity-inducing dictionaries for effective action classification
Action recognition in unconstrained videos is one of the most important challenges in computer vision. In this paper, we propose sparsity-inducing dictionaries as an effective representation for action classification in videos. We demonstrate that features obtained from sparsity based representation provide discriminative information useful for classification of action videos into various action classes. We show that the constructed dictionaries are distinct for a large number of action classes resulting in a significant improvement in classification accuracy on the HMDB51 dataset. We further demonstrate the efficacy of dictionaries and sparsity based classification on other large action video datasets like UCF50
Sparse representation based hyperspectral image compression and classification
Abstract
This thesis presents a research work on applying sparse representation to lossy hyperspectral image
compression and hyperspectral image classification. The proposed lossy hyperspectral image
compression framework introduces two types of dictionaries distinguished by the terms sparse
representation spectral dictionary (SRSD) and multi-scale spectral dictionary (MSSD), respectively.
The former is learnt in the spectral domain to exploit the spectral correlations, and the
latter in wavelet multi-scale spectral domain to exploit both spatial and spectral correlations in
hyperspectral images. To alleviate the computational demand of dictionary learning, either a
base dictionary trained offline or an update of the base dictionary is employed in the compression
framework. The proposed compression method is evaluated in terms of different objective
metrics, and compared to selected state-of-the-art hyperspectral image compression schemes, including
JPEG 2000. The numerical results demonstrate the effectiveness and competitiveness of
both SRSD and MSSD approaches.
For the proposed hyperspectral image classification method, we utilize the sparse coefficients
for training support vector machine (SVM) and k-nearest neighbour (kNN) classifiers. In particular,
the discriminative character of the sparse coefficients is enhanced by incorporating contextual
information using local mean filters. The classification performance is evaluated and compared
to a number of similar or representative methods. The results show that our approach could outperform
other approaches based on SVM or sparse representation.
This thesis makes the following contributions. It provides a relatively thorough investigation
of applying sparse representation to lossy hyperspectral image compression. Specifically,
it reveals the effectiveness of sparse representation for the exploitation of spectral correlations
in hyperspectral images. In addition, we have shown that the discriminative character of sparse
coefficients can lead to superior performance in hyperspectral image classification.EM201
DESIGN OF COMPACT AND DISCRIMINATIVE DICTIONARIES
The objective of this research work is to design compact and discriminative dictionaries
for e�ective classi�cation. The motivation stems from the fact that dictionaries
inherently contain redundant dictionary atoms. This is because the aim of dictionary
learning is reconstruction, not classi�cation. In this thesis, we propose methods to obtain
minimum number discriminative dictionary atoms for e�ective classi�cation and
also reduced computational time.
First, we propose a classi�cation scheme where an example is assigned to a class
based on the weight assigned to both maximum projection and minimum reconstruction
error. Here, the input data is learned by K-SVD dictionary learning which alternates
between sparse coding and dictionary update. For sparse coding, orthogonal
matching pursuit (OMP) is used and for dictionary update, singular value decomposition
is used. This way of classi�cation though e�ective, still there is a scope to
improve dictionary learning by removing redundant atoms because our goal is not reconstruction.
In order to remove such redundant atoms, we propose two approaches
based on information theory to obtain compact discriminative dictionaries. In the
�rst approach, we remove redundant atoms from the dictionary while maintaining
discriminative information. Speci�cally, we propose a constraint optimization problem
which minimizes the mutual information between optimized dictionary and initial
dictionary while maximizing mutual information between class labels and optimized
dictionary. This helps to determine information loss between before and after the
dictionary optimization. To compute information loss, we use Jensen-Shannon diver-
gence with adaptive weights to compare class distributions of each dictionary atom.
The advantage of Jensen-Shannon divergence is its computational e�ciency rather
than calculating information loss from mutual information
- …