4,147 research outputs found

    A Two-Layer Local Constrained Sparse Coding Method for Fine-Grained Visual Categorization

    Full text link
    Fine-grained categories are more difficulty distinguished than generic categories due to the similarity of inter-class and the diversity of intra-class. Therefore, the fine-grained visual categorization (FGVC) is considered as one of challenge problems in computer vision recently. A new feature learning framework, which is based on a two-layer local constrained sparse coding architecture, is proposed in this paper. The two-layer architecture is introduced for learning intermediate-level features, and the local constrained term is applied to guarantee the local smooth of coding coefficients. For extracting more discriminative information, local orientation histograms are the input of sparse coding instead of raw pixels. Moreover, a quick dictionary updating process is derived to further improve the training speed. Two experimental results show that our method achieves 85.29% accuracy on the Oxford 102 flowers dataset and 67.8% accuracy on the CUB-200-2011 bird dataset, and the performance of our framework is highly competitive with existing literatures.Comment: 19 pages, 12 figures, 8 table

    Sparse Coding with Earth Mover's Distance for Multi-Instance Histogram Representation

    Full text link
    Sparse coding (Sc) has been studied very well as a powerful data representation method. It attempts to represent the feature vector of a data sample by reconstructing it as the sparse linear combination of some basic elements, and a L2L_2 norm distance function is usually used as the loss function for the reconstruction error. In this paper, we investigate using Sc as the representation method within multi-instance learning framework, where a sample is given as a bag of instances, and further represented as a histogram of the quantized instances. We argue that for the data type of histogram, using L2L_2 norm distance is not suitable, and propose to use the earth mover's distance (EMD) instead of L2L_2 norm distance as a measure of the reconstruction error. By minimizing the EMD between the histogram of a sample and the its reconstruction from some basic histograms, a novel sparse coding method is developed, which is refereed as SC-EMD. We evaluate its performances as a histogram representation method in tow multi-instance learning problems --- abnormal image detection in wireless capsule endoscopy videos, and protein binding site retrieval. The encouraging results demonstrate the advantages of the new method over the traditional method using L2L_2 norm distance

    Learning Invariant Color Features for Person Re-Identification

    Full text link
    Matching people across multiple camera views known as person re-identification, is a challenging problem due to the change in visual appearance caused by varying lighting conditions. The perceived color of the subject appears to be different with respect to illumination. Previous works use color as it is or address these challenges by designing color spaces focusing on a specific cue. In this paper, we propose a data driven approach for learning color patterns from pixels sampled from images across two camera views. The intuition behind this work is that, even though pixel values of same color would be different across views, they should be encoded with the same values. We model color feature generation as a learning problem by jointly learning a linear transformation and a dictionary to encode pixel values. We also analyze different photometric invariant color spaces. Using color as the only cue, we compare our approach with all the photometric invariant color spaces and show superior performance over all of them. Combining with other learned low-level and high-level features, we obtain promising results in ViPER, Person Re-ID 2011 and CAVIAR4REID datasets

    A Classifier-guided Approach for Top-down Salient Object Detection

    Full text link
    We propose a framework for top-down salient object detection that incorporates a tightly coupled image classification module. The classifier is trained on novel category-aware sparse codes computed on object dictionaries used for saliency modeling. A misclassification indicates that the corresponding saliency model is inaccurate. Hence, the classifier selects images for which the saliency models need to be updated. The category-aware sparse coding produces better image classification accuracy as compared to conventional sparse coding with a reduced computational complexity. A saliency-weighted max-pooling is proposed to improve image classification, which is further used to refine the saliency maps. Experimental results on Graz-02 and PASCAL VOC-07 datasets demonstrate the effectiveness of salient object detection. Although the role of the classifier is to support salient object detection, we evaluate its performance in image classification and also illustrate the utility of thresholded saliency maps for image segmentation.Comment: To appear in Signal Processing: Image Communication, Elsevier. Available online from April 201

    Sparse Dictionary-based Attributes for Action Recognition and Summarization

    Full text link
    We present an approach for dictionary learning of action attributes via information maximization. We unify the class distribution and appearance information into an objective function for learning a sparse dictionary of action attributes. The objective function maximizes the mutual information between what has been learned and what remains to be learned in terms of appearance information and class distribution for each dictionary atom. We propose a Gaussian Process (GP) model for sparse representation to optimize the dictionary objective function. The sparse coding property allows a kernel with compact support in GP to realize a very efficient dictionary learning process. Hence we can describe an action video by a set of compact and discriminative action attributes. More importantly, we can recognize modeled action categories in a sparse feature space, which can be generalized to unseen and unmodeled action categories. Experimental results demonstrate the effectiveness of our approach in action recognition and summarization

    Covariance of Motion and Appearance Featuresfor Spatio Temporal Recognition Tasks

    Full text link
    In this paper, we introduce an end-to-end framework for video analysis focused towards practical scenarios built on theoretical foundations from sparse representation, including a novel descriptor for general purpose video analysis. In our approach, we compute kinematic features from optical flow and first and second-order derivatives of intensities to represent motion and appearance respectively. These features are then used to construct covariance matrices which capture joint statistics of both low-level motion and appearance features extracted from a video. Using an over-complete dictionary of the covariance based descriptors built from labeled training samples, we formulate low-level event recognition as a sparse linear approximation problem. Within this, we pose the sparse decomposition of a covariance matrix, which also conforms to the space of semi-positive definite matrices, as a determinant maximization problem. Also since covariance matrices lie on non-linear Riemannian manifolds, we compare our former approach with a sparse linear approximation alternative that is suitable for equivalent vector spaces of covariance matrices. This is done by searching for the best projection of the query data on a dictionary using an Orthogonal Matching pursuit algorithm. We show the applicability of our video descriptor in two different application domains - namely low-level event recognition in unconstrained scenarios and gesture recognition using one shot learning. Our experiments provide promising insights in large scale video analysis

    Parameterizing Region Covariance: An Efficient Way To Apply Sparse Codes On Second Order Statistics

    Full text link
    Sparse representations have been successfully applied to signal processing, computer vision and machine learning. Currently there is a trend to learn sparse models directly on structure data, such as region covariance. However, such methods when combined with region covariance often require complex computation. We present an approach to transform a structured sparse model learning problem to a traditional vectorized sparse modeling problem by constructing a Euclidean space representation for region covariance matrices. Our new representation has multiple advantages. Experiments on several vision tasks demonstrate competitive performance with the state-of-the-art methods

    Learning Discriminative Features via Label Consistent Neural Network

    Full text link
    Deep Convolutional Neural Networks (CNN) enforces supervised information only at the output layer, and hidden layers are trained by back propagating the prediction error from the output layer without explicit supervision. We propose a supervised feature learning approach, Label Consistent Neural Network, which enforces direct supervision in late hidden layers. We associate each neuron in a hidden layer with a particular class label and encourage it to be activated for input signals from the same class. More specifically, we introduce a label consistency regularization called "discriminative representation error" loss for late hidden layers and combine it with classification error loss to build our overall objective function. This label consistency constraint alleviates the common problem of gradient vanishing and tends to faster convergence; it also makes the features derived from late hidden layers discriminative enough for classification even using a simple kk-NN classifier, since input signals from the same class will have very similar representations. Experimental results demonstrate that our approach achieves state-of-the-art performances on several public benchmarks for action and object category recognition

    Fast Algorithms for the Computation of Ranklets

    Get PDF
    Ranklets are orientation selective rank features with applications to tracking, face detection, texture and medical imaging. We introduce efficient algorithms that reduce their computational complexity from O(N logN) to O(!N + k), where N is the area of the filter. Timing tests show a speedup of one order of magnitude for typical usage, which should make Ranklets attractive for real-time applications

    Detection, Recognition and Tracking of Moving Objects from Real-time Video via Visual Vocabulary Model and Species Inspired PSO

    Full text link
    In this paper, we address the basic problem of recognizing moving objects in video images using Visual Vocabulary model and Bag of Words and track our object of interest in the subsequent video frames using species inspired PSO. Initially, the shadow free images are obtained by background modelling followed by foreground modeling to extract the blobs of our object of interest. Subsequently, we train a cubic SVM with human body datasets in accordance with our domain of interest for recognition and tracking. During training, using the principle of Bag of Words we extract necessary features of certain domains and objects for classification. Subsequently, matching these feature sets with those of the extracted object blobs that are obtained by subtracting the shadow free background from the foreground, we detect successfully our object of interest from the test domain. The performance of the classification by cubic SVM is satisfactorily represented by confusion matrix and ROC curve reflecting the accuracy of each module. After classification, our object of interest is tracked in the test domain using species inspired PSO. By combining the adaptive learning tools with the efficient classification of description, we achieve optimum accuracy in recognition of the moving objects. We evaluate our algorithm benchmark datasets: iLIDS, VIVID, Walking2, Woman. Comparative analysis of our algorithm against the existing state-of-the-art trackers shows very satisfactory and competitive results
    • …
    corecore