83 research outputs found

    Action Recognition in Video Using Sparse Coding and Relative Features

    This work presents an approach to category-based action recognition in video using sparse coding techniques. The proposed approach includes two main contributions: i) a new method to handle intra-class variations by decomposing each video into a reduced set of representative atomic action acts, or key-sequences, and ii) a new video descriptor, ITRA (Inter-Temporal Relational Act Descriptor), that exploits the power of comparative reasoning to capture relative similarity relations among key-sequences. To obtain key-sequences, we introduce a loss function that, for each video, leads to the identification of a sparse set of representative key-frames capturing both relevant particularities arising in the input video and relevant generalities arising in the complete class collection. To obtain the ITRA descriptor, we introduce a novel scheme to quantify relative intra- and inter-class similarities among local temporal patterns arising in the videos. The resulting ITRA descriptor proves highly effective at discriminating among action categories. As a result, the proposed approach reaches remarkable action recognition performance on several popular benchmark datasets, outperforming alternative state-of-the-art techniques by a large margin. Comment: Accepted to CVPR 201

    Sparse and low rank approximations for action recognition

    Action recognition is a crucial area of research in computer vision, with a wide range of applications in surveillance, patient-monitoring systems, video indexing, human-computer interaction and many more. These applications require automated action recognition. Robust classification methods remain sought after despite influential research in this field over the past decade. Data resources have grown tremendously owing to the digital revolution and far exceed the meagre resources of the past. The main limitation on a system dealing with video data is the computational burden due to large dimensions and data redundancy. Sparse and low-rank approximation methods, which aim at a concise and meaningful representation of data, have evolved recently. This thesis explores the application of sparse and low-rank approximation methods to video data classification, with the following contributions: 1. An approach to action and gesture classification is proposed within the sparse representation domain, effectively dealing with large feature dimensions. 2. A low-rank matrix completion approach is proposed to jointly classify more than one action. 3. Deep features are proposed for robust classification of multiple actions within a matrix completion framework that can handle data deficiencies. The thesis starts with the applicability of sparse-representation-based classification methods to the problem of action and gesture recognition. Random projection is used to reduce the dimensionality of the features; the results are referred to as compressed features in this thesis. The dictionary formed with compressed features has proved to be efficient for the classification task, achieving results comparable to the state of the art. Next, this thesis addresses the more promising problem of simultaneous classification of multiple actions. This is treated as a matrix completion problem under a transduction setting. Matrix completion methods are considered the generic extension of sparse representation methods from a compressed sensing point of view. The features and corresponding labels of the training and test data are concatenated and placed as columns of a matrix. The unknown test labels are the missing entries in that matrix. The problem is solved using rank minimization techniques, based on the assumption that the underlying complete matrix is low rank. This approach has achieved results better than the state of the art on datasets of varying complexity. This thesis then extends the matrix completion framework for joint classification of actions to handle missing features in addition to missing test labels. In this context, deep features from a convolutional neural network are proposed. A convolutional neural network is trained on the training data, and features are extracted from the train and test data using the trained network. The performance of the deep features has proved to be promising when compared to state-of-the-art hand-crafted features.
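
The transductive set-up described in this abstract (labels and features stacked as columns, test labels treated as missing entries) can be illustrated with a small sketch. The abstract does not say which rank minimization solver is used; the code below swaps in a simple iterative truncated-SVD imputation with a fixed rank as a stand-in. The function name `complete_low_rank`, the toy data and the choice of rank 1 are assumptions made for illustration only.

```python
import numpy as np

def complete_low_rank(Z, known, rank=1, n_iter=200):
    """Fill the unknown entries of Z with values from a rank-`rank` fit.

    Stand-in for rank minimization (assumption): repeatedly project onto
    the set of rank-`rank` matrices and restore the known entries.
    """
    X = np.where(known, Z, 0.0)  # start with zeros in the missing slots
    for _ in range(n_iter):
        U, s, Vt = np.linalg.svd(X, full_matrices=False)
        low_rank = (U[:, :rank] * s[:rank]) @ Vt[:rank]
        X = np.where(known, Z, low_rank)  # keep observed entries fixed
    return X

# Toy data (hypothetical): each column is one sample, [label; features].
# The last column is a test sample whose label is unknown (NaN).
labels = np.array([1.0, -1.0, 1.0, np.nan])
features = np.array([[1.0, -1.0, 2.0, -2.0],
                     [2.0, -2.0, 4.0, -4.0]])
Z = np.vstack([labels, features])
known = ~np.isnan(Z)

completed = complete_low_rank(Z, known, rank=1)
predicted = np.sign(completed[0, ~known[0]])  # sign of the imputed label
print("predicted test label:", predicted)
```

The test label is read off as the sign of the imputed entry in the label row. A nuclear-norm solver would replace the fixed-rank projection in a fuller treatment.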

    Mathematically inspired approaches to face recognition in uncontrolled conditions: super resolution and compressive sensing

    Face recognition under uncontrolled conditions using surveillance cameras is becoming essential for establishing the identity of a person at a distance from the camera and for providing safety and security against terrorist attack, robbery and crime. Therefore, the performance of face recognition on low-resolution, degraded, low-quality images, as opposed to high-quality images of good resolution and size, is considered one of the most challenging tasks and constitutes the focus of this thesis. The work in this thesis is designed to further investigate these issues, with the following main aim: “To investigate face identification from a distance and under uncontrolled conditions by primarily addressing the problem of low-resolution images using existing/modified mathematically inspired super resolution schemes that are based on the emerging new paradigm of compressive sensing and non-adaptive dictionaries based super resolution.” We shall firstly investigate and develop the compressive sensing (CS) based sparse representation of a sample image to reconstruct a high-resolution image for face recognition, by taking different approaches to constructing CS-compliant dictionaries such as the Gaussian Random Matrix and the Toeplitz Circular Random Matrix. In particular, our focus is on constructing CS non-adaptive dictionaries (independent of face image information), which contrast with existing image-learnt dictionaries but satisfy some form of the Restricted Isometry Property (RIP), which is sufficient to comply with the CS theorem regarding the recovery of sparsely represented images. We shall demonstrate that CS dictionary techniques for resolution enhancement enable scalable face recognition schemes under uncontrolled conditions and at a distance. Secondly, we shall compare how strongly the various types of dictionaries satisfy the sufficient CS property and demonstrate that the image-learnt dictionary is far from satisfying the RIP for compressive sensing. Thirdly, we propose dictionaries based on the high-frequency coefficients of the training set and investigate the impact of using dictionaries on the space of feature vectors of the low-resolution image for face recognition when applied to the wavelet domain. Finally, we test the performance of the developed schemes on CCTV images with an unknown model of degradation, and show that these schemes significantly outperform existing techniques developed for such a challenging task. However, the performance is still not comparable to what could be achieved in a controlled environment, and hence we shall identify remaining challenges to be investigated in the future.
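
As a rough illustration of the non-adaptive dictionaries mentioned in this abstract, the sketch below builds a Gaussian random matrix and recovers a sparse vector from its measurements. The recovery routine is orthogonal matching pursuit, chosen here for brevity; the abstract does not state which recovery algorithm the thesis uses. The function names, matrix sizes and sparse test vector are all hypothetical.

```python
import numpy as np

def gaussian_sensing_matrix(m, n, seed=0):
    """Return an m x n Gaussian random matrix with unit-norm columns."""
    rng = np.random.default_rng(seed)
    A = rng.standard_normal((m, n))
    return A / np.linalg.norm(A, axis=0)

def omp(A, y, k):
    """Greedy k-sparse approximation of y in the columns of A (k >= 1)."""
    residual = y.astype(float).copy()
    support = []
    for _ in range(k):
        # Pick the column most correlated with the current residual.
        j = int(np.argmax(np.abs(A.T @ residual)))
        if j not in support:
            support.append(j)
        # Least-squares fit of y on the columns selected so far.
        coef, *_ = np.linalg.lstsq(A[:, support], y, rcond=None)
        residual = y - A[:, support] @ coef
    x = np.zeros(A.shape[1])
    x[support] = coef
    return x

# Hypothetical demo: 50 random measurements of a 3-sparse vector in R^100.
n, m, k = 100, 50, 3
A = gaussian_sensing_matrix(m, n)
x_true = np.zeros(n)
x_true[[5, 37, 80]] = [1.5, -2.0, 1.0]
y = A @ x_true
x_hat = omp(A, y, k)
print("recovery error:", np.linalg.norm(x_hat - x_true))
```

The matrix is generated without looking at any image data, which is the sense in which such a dictionary is non-adaptive. Whether a given draw recovers a given sparse vector depends on the sizes chosen.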

    Proceedings of the second "international Traveling Workshop on Interactions between Sparse models and Technology" (iTWIST'14)

    The implicit objective of the biennial "international Traveling Workshop on Interactions between Sparse models and Technology" (iTWIST) is to foster collaboration between international scientific teams by disseminating ideas through both specific oral/poster presentations and free discussions. For its second edition, the iTWIST workshop took place in the medieval and picturesque town of Namur in Belgium, from Wednesday August 27th till Friday August 29th, 2014. The workshop was conveniently located in "The Arsenal" building, within walking distance of both hotels and the town center. iTWIST'14 gathered about 70 international participants and featured 9 invited talks, 10 oral presentations, and 14 posters on the following themes, all related to the theory, application and generalization of the "sparsity paradigm": Sparsity-driven data sensing and processing; Union of low dimensional subspaces; Beyond linear and convex inverse problem; Matrix/manifold/graph sensing/processing; Blind inverse problems and dictionary learning; Sparsity and computational neuroscience; Information theory, geometry and randomness; Complexity/accuracy tradeoffs in numerical methods; Sparsity? What's next?; Sparse machine learning and inference. Comment: 69 pages, 24 extended abstracts, iTWIST'14 website: http://sites.google.com/site/itwist1

    Sparsity-inducing dictionaries for effective action classification

    Action recognition in unconstrained videos is one of the most important challenges in computer vision. In this paper, we propose sparsity-inducing dictionaries as an effective representation for action classification in videos. We demonstrate that features obtained from a sparsity-based representation provide discriminative information useful for classifying action videos into various action classes. We show that the constructed dictionaries are distinct for a large number of action classes, resulting in a significant improvement in classification accuracy on the HMDB51 dataset. We further demonstrate the efficacy of dictionaries and sparsity-based classification on other large action video datasets such as UCF50.

    Sparse representation based hyperspectral image compression and classification

    This thesis presents research on applying sparse representation to lossy hyperspectral image compression and hyperspectral image classification. The proposed lossy hyperspectral image compression framework introduces two types of dictionaries, termed the sparse representation spectral dictionary (SRSD) and the multi-scale spectral dictionary (MSSD). The former is learnt in the spectral domain to exploit spectral correlations, and the latter in the wavelet multi-scale spectral domain to exploit both spatial and spectral correlations in hyperspectral images. To alleviate the computational demand of dictionary learning, either a base dictionary trained offline or an update of the base dictionary is employed in the compression framework. The proposed compression method is evaluated in terms of different objective metrics and compared to selected state-of-the-art hyperspectral image compression schemes, including JPEG 2000. The numerical results demonstrate the effectiveness and competitiveness of both the SRSD and MSSD approaches. For the proposed hyperspectral image classification method, we use the sparse coefficients to train support vector machine (SVM) and k-nearest neighbour (kNN) classifiers. In particular, the discriminative character of the sparse coefficients is enhanced by incorporating contextual information using local mean filters. The classification performance is evaluated and compared to a number of similar or representative methods. The results show that our approach can outperform other approaches based on SVM or sparse representation. This thesis makes the following contributions. It provides a relatively thorough investigation of applying sparse representation to lossy hyperspectral image compression. Specifically, it reveals the effectiveness of sparse representation for exploiting spectral correlations in hyperspectral images. In addition, we have shown that the discriminative character of sparse coefficients can lead to superior performance in hyperspectral image classification. EM201

    DESIGN OF COMPACT AND DISCRIMINATIVE DICTIONARIES

    The objective of this research work is to design compact and discriminative dictionaries for effective classification. The motivation stems from the fact that dictionaries inherently contain redundant dictionary atoms, because the aim of dictionary learning is reconstruction, not classification. In this thesis, we propose methods to obtain a minimum number of discriminative dictionary atoms for effective classification and reduced computational time. First, we propose a classification scheme in which an example is assigned to a class based on the weight assigned to both maximum projection and minimum reconstruction error. Here, the dictionary is learned from the input data by K-SVD dictionary learning, which alternates between sparse coding and dictionary update. For sparse coding, orthogonal matching pursuit (OMP) is used, and for dictionary update, singular value decomposition is used. Although this classification scheme is effective, there is still scope to improve dictionary learning by removing redundant atoms, because our goal is not reconstruction. To remove such redundant atoms, we propose two approaches based on information theory to obtain compact discriminative dictionaries. In the first approach, we remove redundant atoms from the dictionary while maintaining discriminative information. Specifically, we propose a constrained optimization problem which minimizes the mutual information between the optimized dictionary and the initial dictionary while maximizing the mutual information between the class labels and the optimized dictionary. This helps to determine the information loss between the dictionary before and after optimization. To compute the information loss, we use Jensen-Shannon divergence with adaptive weights to compare the class distributions of each dictionary atom. The advantage of Jensen-Shannon divergence is its computational efficiency compared with calculating the information loss from mutual information.
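
The weighted Jensen-Shannon divergence mentioned in this abstract can be sketched as follows, using the standard definition JS(P, Q) = H(w_p P + w_q Q) - w_p H(P) - w_q H(Q). The abstract does not say how the adaptive weights are chosen, so the weights are left as plain parameters here; the function names and the example class distributions are hypothetical.

```python
import math

def entropy(p):
    """Shannon entropy in bits of a discrete distribution p."""
    return -sum(x * math.log2(x) for x in p if x > 0)

def js_divergence(p, q, w_p=0.5, w_q=0.5):
    """Weighted Jensen-Shannon divergence between distributions p and q.

    w_p and w_q must be non-negative and sum to 1. How the thesis adapts
    them is not specified, so they are ordinary arguments (assumption).
    """
    mix = [w_p * a + w_q * b for a, b in zip(p, q)]
    return entropy(mix) - w_p * entropy(p) - w_q * entropy(q)

# Hypothetical class distributions of two dictionary atoms over 3 classes.
atom_a = [0.7, 0.2, 0.1]
atom_b = [0.1, 0.2, 0.7]
print("JS divergence:", js_divergence(atom_a, atom_b))
```

Two atoms with identical class distributions give a divergence of zero, which is one way to flag an atom as redundant. Atoms whose class distributions do not overlap give the maximum of 1 bit under equal weights.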