
    Objects classification in still images using the region covariance descriptor

    The goal of object classification is to classify the objects present in images. Classification aims at the recognition of generic classes, which is also known as generic object recognition. This is quite different from specific object recognition, such as recognizing a specific person or one's own car. Human beings are generally better at recognizing generic classes than specific objects, and classification is a much harder problem for artificial systems to solve. A classification algorithm must be robust to changes in illumination, object scale, viewpoint, and so on, and must also handle large intra-class variations and small inter-class variations. In the recent literature, several classification methods use the Bag of Visual Words model. In this work the main emphasis is on the region descriptor and the representation of training images. Given a set of training images, interest points are detected through interest point detectors, and the region around each interest point is described by a descriptor. The region covariance descriptor is adopted from Porikli et al. [21], who used this descriptor for object detection and classification. This region covariance descriptor is combined with the Bag of Visual Words model, and we have used a different set of features for the classification task. The covariance of d features, e.g. spatial location, Gaussian kernel responses with three different σ values, first-order Gaussian derivatives with two different σ values, and second-order Gaussian derivatives with four different σ values, characterizes a region of interest. An image is also represented by a Bag of Visual Words obtained with both SIFT and covariance descriptors. We worked on five datasets, Caltech-4, Caltech-3, Animal, Caltech-10, and Flower (17 classes), with the first four taken from the Caltech-256 and Caltech-101 datasets. Many researchers have used the Caltech-4 dataset for the object classification task. The region covariance descriptor outperforms the SIFT descriptor on both the Caltech-4 and Caltech-3 datasets, whereas the combined representation (SIFT + covariance) outperforms both SIFT and covariance.
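
    A minimal sketch of the region covariance idea from Porikli et al. [21]: each pixel in a region yields a d-dimensional feature vector, and the region is summarised by the d × d covariance matrix of those vectors. The feature set below (spatial location, Gaussian-smoothed intensity, and first-order Gaussian derivatives at a few σ values) only approximates the configuration described in the abstract.

```python
# Sketch of a region covariance descriptor (assumed feature set,
# not the exact configuration used in the paper).
import numpy as np
from scipy.ndimage import gaussian_filter

def region_covariance(patch, sigmas=(1.0, 2.0, 4.0)):
    """Return the d x d covariance descriptor of a grayscale patch."""
    patch = np.asarray(patch, dtype=float)
    h, w = patch.shape
    ys, xs = np.mgrid[0:h, 0:w]
    feats = [xs.ravel().astype(float), ys.ravel().astype(float)]  # spatial location
    for s in sigmas:
        feats.append(gaussian_filter(patch, s).ravel())                # smoothed intensity
        feats.append(gaussian_filter(patch, s, order=(0, 1)).ravel())  # d/dx at scale s
        feats.append(gaussian_filter(patch, s, order=(1, 0)).ravel())  # d/dy at scale s
    F = np.stack(feats)   # d x (h * w) matrix of per-pixel feature vectors
    return np.cov(F)      # d x d region covariance
```

    Since covariance matrices are symmetric positive definite, they are typically compared with a manifold metric (e.g. one based on generalized eigenvalues) rather than a plain Euclidean distance.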

    Object Edge Contour Localisation Based on HexBinary Feature Matching

    This paper addresses the issue of localising object edge contours in cluttered backgrounds to support robotics tasks such as grasping and manipulation, and also to improve the potential perceptual capabilities of robot vision systems. Our approach is based on coarse-to-fine matching of a new recursively constructed hierarchical, dense, edge-localised descriptor, the HexBinary, based on the HexHoG descriptor structure first proposed in [1]. Since binary string image descriptors [2]–[5] require much lower computational resources, but provide similar or even better matching performance than Histogram of Oriented Gradients (HoG) descriptors, we have replaced the HoG base descriptor fields used in HexHoG with binary strings generated from first- and second-order polar derivative approximations. The ALOI [6] dataset is used to evaluate the HexBinary descriptors, which we demonstrate to achieve superior performance to that of HexHoG [1] for pose refinement. The validation of our object contour localisation system shows promising results, correctly labelling ~86% of edgel positions and mis-labelling ~3%.
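
    A hedged sketch of the binary-string principle the paper builds on (cf. the descriptors cited as [2]–[5]): fixed pairwise intensity comparisons yield a bit string, and candidates are matched by Hamming distance. The HexBinary construction itself (hierarchical, hexagonal, built on polar derivative approximations) is not reproduced here; the patch size and descriptor length below are illustrative assumptions.

```python
# Generic binary-string descriptor with Hamming-distance matching.
import numpy as np

PATCH = 32      # assumed patch side length
N_BITS = 256    # assumed descriptor length
_rng = np.random.default_rng(0)
# The comparison pattern is fixed once so descriptors are comparable.
_P = _rng.integers(0, PATCH * PATCH, size=N_BITS)
_Q = _rng.integers(0, PATCH * PATCH, size=N_BITS)

def binary_descriptor(patch):
    """Bit i is 1 iff the patch value at sample P[i] is below that at Q[i]."""
    flat = patch.ravel()
    return (flat[_P] < flat[_Q]).astype(np.uint8)

def hamming(a, b):
    """Number of differing bits; lower means a better match."""
    return int(np.count_nonzero(a != b))
```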

    From Traditional to Modern: Domain Adaptation for Action Classification in Short Social Video Clips

    Short internet video clips like vines present a significantly wilder distribution than traditional video datasets. In this paper, we focus on the problem of unsupervised action classification in wild vines using traditional labeled datasets. To this end, we use a simple domain adaptation strategy based on data augmentation. We utilise the semantic word2vec space as a common subspace in which to embed video features from both the labeled source domain and the unlabelled target domain. Our method incrementally augments the labeled source with target samples and iteratively modifies the embedding function to bring the source and target distributions together. Additionally, we utilise a multi-modal representation that incorporates the noisy semantic information available in the form of hash-tags. We show the effectiveness of this simple adaptation technique on a test set of vines and achieve notable improvements in performance.
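
    A minimal sketch, under stated assumptions, of the incremental augmentation loop described above: train on the labeled source, score the unlabelled target, move the most confidently predicted target samples into the training set with their pseudo-labels, and refit. The classifier choice and the top-k selection rule are illustrative, and the semantic word2vec embedding step is abstracted into the input features.

```python
# Self-training-style incremental source augmentation (illustrative).
import numpy as np
from sklearn.linear_model import LogisticRegression

def adapt(X_src, y_src, X_tgt, rounds=5, top_k=50):
    """Iteratively grow the labeled set with confident target samples."""
    X, y, pool = X_src.copy(), y_src.copy(), X_tgt.copy()
    clf = LogisticRegression(max_iter=1000)
    for _ in range(rounds):
        if len(pool) == 0:
            break
        clf.fit(X, y)
        proba = clf.predict_proba(pool)
        conf = proba.max(axis=1)
        idx = np.argsort(conf)[-top_k:]                    # most confident targets
        pseudo = clf.classes_[proba[idx].argmax(axis=1)]   # their pseudo-labels
        X = np.vstack([X, pool[idx]])
        y = np.concatenate([y, pseudo])
        pool = np.delete(pool, idx, axis=0)                # remove from the pool
    return clf
```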

    Hybrid image representation methods for automatic image annotation: a survey

    In most automatic image annotation systems, images are represented with low-level features using either global methods or local methods. In global methods, the entire image is used as a unit. Local methods divide images into blocks, where fixed-size sub-image blocks are adopted as sub-units, or into regions, by using segmented regions as sub-units. In contrast to typical automatic image annotation methods that use either global or local features exclusively, several recent methods have considered incorporating the two kinds of information, on the premise that combining the two levels of features is beneficial for annotating images. In this paper, we provide a survey of automatic image annotation techniques from the perspective of feature extraction and, in order to complement existing surveys in the literature, we focus on an emerging class of image annotation methods: hybrid methods that combine both global and local features for image representation.
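
    A hedged illustration of the hybrid representation the survey covers: a global descriptor (here a grayscale intensity histogram) concatenated with local descriptors from fixed-size blocks (per-block mean and variance). The specific features and grid size are assumptions for illustration; real annotation systems use much richer global and local features.

```python
# Hybrid global + local feature vector (illustrative feature choices).
import numpy as np

def hybrid_features(img, grid=4, bins=32):
    """Concatenate a global histogram with per-block statistics."""
    img = np.asarray(img, dtype=float)
    global_hist, _ = np.histogram(img, bins=bins, range=(0, 256), density=True)
    h, w = img.shape
    bh, bw = h // grid, w // grid
    local = []
    for i in range(grid):
        for j in range(grid):
            block = img[i * bh:(i + 1) * bh, j * bw:(j + 1) * bw]
            local.extend([block.mean(), block.var()])   # per-block statistics
    return np.concatenate([global_hist, np.array(local)])
```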