    Objects classification in still images using the region covariance descriptor

    The goal of object classification is to classify the objects in images. Classification aims at the recognition of generic classes, which is also known as generic object recognition. This is quite different from specific object recognition, such as recognizing a specific person, one's own car, etc. Human beings are generally better at recognizing generic classes than specific objects; for artificial systems, classification is a much harder problem to solve. A classification algorithm must be robust to changes in illumination, object scale, viewpoint, etc. The algorithm also has to handle large intra-class variations and small inter-class variations. In the recent literature, several classification methods use the Bag of Visual Words model. In this work the main emphasis is on the region descriptor and the representation of training images. Given a set of training images, interest points are detected with interest point detectors, and the region around each interest point is described by a descriptor. The region covariance descriptor is adopted from Porikli et al. [21], where it was used for object detection and classification; here it is combined with the Bag of Visual Words model, and we use a different set of features for the classification task. The covariance of d features, e.g. spatial location, a Gaussian kernel with three different σ values, first-order Gaussian derivatives with two different σ values, and second-order Gaussian derivatives with four different σ values, characterizes a region of interest. An image is also represented by Bags of Visual Words obtained with both SIFT and covariance descriptors. We worked on five datasets: Caltech-4, Caltech-3, Animal, Caltech-10, and Flower (17 classes), with the first four taken from the Caltech-256 and Caltech-101 datasets. Many researchers have used the Caltech-4 dataset for the object classification task.
The region covariance descriptor outperforms the SIFT descriptor on both the Caltech-4 and Caltech-3 datasets, whereas the combined representation (SIFT + Covariance) outperforms both the SIFT and Covariance descriptors.
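The covariance-of-features idea above can be sketched compactly: collect a feature vector per pixel of a region and take the covariance of those vectors as the descriptor. The feature set below ([x, y, intensity, |Ix|, |Iy|]) is an illustrative choice, not the exact Gaussian-derivative set used in the work above.

```python
import numpy as np

def region_covariance(patch):
    """Region covariance descriptor (sketch in the spirit of Porikli et al.).

    patch: (H, W) grayscale array. Each pixel contributes a feature
    vector [x, y, I, |Ix|, |Iy|]; the descriptor is the d x d covariance
    matrix of these vectors over the region.
    """
    H, W = patch.shape
    ys, xs = np.mgrid[0:H, 0:W]                 # spatial coordinates
    Iy, Ix = np.gradient(patch.astype(float))   # first-order derivatives
    feats = np.stack([xs.ravel().astype(float),
                      ys.ravel().astype(float),
                      patch.ravel().astype(float),
                      np.abs(Ix).ravel(),
                      np.abs(Iy).ravel()], axis=0)   # shape (d, n)
    return np.cov(feats)                         # (d, d), symmetric PSD

# usage: a random 16x16 patch yields a 5x5 descriptor
C = region_covariance(np.random.rand(16, 16))
```

The descriptor is a symmetric positive semi-definite matrix, so comparing two regions requires a metric on that manifold (e.g. a log-Euclidean distance) rather than plain Euclidean distance.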

    Image Reconstruction from Bag-of-Visual-Words

    The objective of this work is to reconstruct an original image from its Bag-of-Visual-Words (BoVW) representation. Image reconstruction from features can be a means of identifying the characteristics of those features; additionally, it enables us to generate novel images via features. Although BoVW is the de facto standard feature for image recognition and retrieval, successful image reconstruction from BoVW has not been reported yet. What complicates this task is that BoVW lacks the spatial information of the visual words it contains. As described in this paper, to estimate an original arrangement, we propose an evaluation function that incorporates the naturalness of local adjacency and global position, with a method to obtain the related parameters using an external image database. To evaluate the performance of our method, we reconstruct images of objects of 101 kinds. Additionally, we apply our method to analyze object classifiers and to generate novel images via BoVW.
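The spatial-information loss described above is visible in how a BoVW vector is built: each local descriptor is assigned to its nearest visual word and only the word counts survive. A minimal sketch, assuming hard assignment and an L1-normalized histogram (the codebook and descriptors below are made-up toy data):

```python
import numpy as np

def bovw_histogram(descriptors, codebook):
    """Hard-assign each local descriptor to its nearest visual word and
    return the normalized word-count histogram. Note that the histogram
    records only how often each word occurs, not where -- the missing
    spatial layout that makes reconstruction from BoVW hard.

    descriptors: (n, d) local descriptors; codebook: (k, d) visual words.
    """
    # squared Euclidean distance from every descriptor to every word
    d2 = ((descriptors[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)
    words = d2.argmin(axis=1)                        # nearest-word index
    hist = np.bincount(words, minlength=len(codebook)).astype(float)
    return hist / hist.sum()

# toy usage: 2 words, 3 descriptors
codebook = np.array([[0.0, 0.0], [1.0, 1.0]])
descs = np.array([[0.1, 0.0], [0.9, 1.0], [1.1, 0.9]])
h = bovw_histogram(descs, codebook)
```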

    Class Representative Visual Words for Category-Level Object Recognition

    Recent works in object recognition often use visual words, i.e. vector-quantized local descriptors extracted from the images. In this paper we present a novel method to build such a codebook with class-representative vectors. This method, coined Cluster Precision Maximization (CPM), is based on a new measure of cluster precision and on an optimization procedure that leads any clustering algorithm towards class-representative visual words. We compare our procedure with other measures of cluster precision and present the integration of a Reciprocal Nearest Neighbor (RNN) clustering algorithm into the CPM method. In the experiments, on a subset of the Caltech101 database, we analyze several vocabularies obtained with different local descriptors and different clustering algorithms, and we show that the vocabularies obtained with the CPM process perform best in a category-level object recognition system using a Support Vector Machine (SVM). © 2009 Springer Berlin Heidelberg. López Sastre R.J., Tuytelaars T., Maldonado Bascón S., ''Class representative visual words for category-level object recognition'', Lecture Notes in Computer Science, vol. 5524, 2009 (4th Iberian Conference on Pattern Recognition and Image Analysis - IbPRIA 2009, June 10-12, 2009, Póvoa de Varzim, Portugal).
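The abstract does not spell out the CPM precision measure, but the underlying intuition, that a good visual word groups descriptors from the same object class, can be illustrated with a simple per-cluster purity score. This is only a stand-in proxy; the actual CPM measure in the paper is more elaborate.

```python
import numpy as np

def cluster_precision(assignments, labels):
    """Average per-cluster purity: for each cluster, the fraction of its
    members sharing the majority class label. A simple proxy for the idea
    that class-representative visual words should be class-pure.

    assignments: (n,) cluster index per descriptor.
    labels:      (n,) non-negative integer class label per descriptor.
    """
    precisions = []
    for c in np.unique(assignments):
        member_labels = labels[assignments == c]
        counts = np.bincount(member_labels)
        precisions.append(counts.max() / counts.sum())  # majority fraction
    return float(np.mean(precisions))

# toy usage: cluster 0 is pure, cluster 1 is 2/3 pure
p = cluster_precision(np.array([0, 0, 1, 1, 1]),
                      np.array([0, 0, 0, 1, 1]))
```

An optimizer in the spirit of CPM would steer the clustering (e.g. the cut level of an RNN hierarchy) to maximize such a score.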

    Human Action Recognition Using Pyramid Vocabulary Tree

    Abstract. Bag-of-visual-words (BOVW) approaches are widely used in human action recognition. Usually, a large BOVW vocabulary is more discriminative for inter-class action classification, while a small one is more robust to noise and thus more tolerant to intra-class variance. In this paper, we propose a pyramid vocabulary tree to model local spatio-temporal features, which can characterize the inter-class difference and also allow intra-class variance. Moreover, since BOVW is geometrically unconstrained, we further consider the spatio-temporal information of local features and propose a sparse spatio-temporal pyramid matching kernel (termed SST-PMK) to compute similarity measures between video sequences. SST-PMK satisfies Mercer's condition and is therefore readily integrated into an SVM to perform action recognition. Experimental results on the Weizmann dataset show that both the pyramid vocabulary tree and the SST-PMK lead to a significant improvement in human action recognition. Keywords: Action recognition, Bag-of-visual-words (BOVW), Pyramid matching kernel (PMK).
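The coarse-to-fine matching idea behind pyramid kernels can be sketched with weighted histogram intersection across vocabulary levels. Histogram intersection is a Mercer kernel, and a positive-weighted sum of Mercer kernels remains one, which is what makes such kernels SVM-ready. This is a simplified stand-in: the actual SST-PMK additionally encodes spatio-temporal position, and the level weights here are an illustrative choice.

```python
import numpy as np

def hist_intersection(h1, h2):
    """Histogram intersection kernel: sum of element-wise minima."""
    return float(np.minimum(h1, h2).sum())

def pyramid_kernel(hists_a, hists_b, weights=None):
    """Similarity between two videos, each represented as a list of BOVW
    histograms at several vocabulary levels (coarse -> fine). Matches at
    finer levels count more, echoing the discriminative/robust trade-off
    between large and small vocabularies.
    """
    L = len(hists_a)
    if weights is None:
        weights = [2.0 ** (l - L + 1) for l in range(L)]  # finer = heavier
    return sum(w * hist_intersection(a, b)
               for w, a, b in zip(weights, hists_a, hists_b))

# usage: a video compared with itself; normalized histograms give
# per-level intersection 1, so the kernel equals the sum of the weights
hists = [np.array([0.5, 0.5]),
         np.array([0.25, 0.25, 0.25, 0.25])]
k = pyramid_kernel(hists, hists)
```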