Ensembles of Novel Visual Keywords Descriptors for Image Categorization
Object recognition systems need effective image descriptors to achieve good performance. Currently, the most widely used image descriptor is SIFT, which computes histograms of orientation gradients around points in an image. A possible problem with this approach is that the number of features becomes very large when a dense grid is used, since the histograms are computed and combined for many different points. The currently dominant solution to this problem is to use a clustering method to create a visual codebook, which an appearance-based descriptor then exploits to create a histogram of the visual keywords present in an image. In this paper we introduce several novel bag-of-visual-keywords methods and compare them with the currently dominant hard bag-of-features (HBOF) approach, which uses a hard assignment scheme to compute cluster frequencies. Furthermore, we combine all descriptors with a spatial pyramid and two ensemble classifiers. Experimental results on 10 and 101 classes of the Caltech-101 object database show that our novel methods significantly outperform the traditional HBOF approach and that our ensemble methods obtain state-of-the-art performance levels.
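The hard assignment scheme the abstract contrasts against can be sketched in a few lines: each local descriptor is mapped to its nearest codeword and the counts form the image's histogram. This is a minimal illustrative sketch (the codebook would normally come from k-means over SIFT descriptors; the function name and toy data are hypothetical, not the authors' code):

```python
def hard_bof_histogram(descriptors, codebook):
    """Hard bag-of-features: assign each local descriptor to its nearest
    codeword (hard assignment) and count assignments into an
    L1-normalized histogram of visual-word frequencies."""
    counts = [0] * len(codebook)
    for d in descriptors:
        # nearest codeword by squared Euclidean distance
        nearest = min(range(len(codebook)),
                      key=lambda k: sum((a - b) ** 2
                                        for a, b in zip(d, codebook[k])))
        counts[nearest] += 1
    total = sum(counts)
    return [c / total for c in counts]

# Toy example: four 2-D "descriptors", two codewords
desc = [(0.0, 0.0), (0.1, 0.0), (1.0, 1.0), (0.9, 1.1)]
book = [(0.0, 0.0), (1.0, 1.0)]
print(hard_bof_histogram(desc, book))  # [0.5, 0.5]
```

Soft-assignment variants, like those proposed in the paper, would replace the winner-take-all `min` with graded weights over several nearby codewords.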
Structured Learning of Tree Potentials in CRF for Image Segmentation
We propose a new approach to image segmentation, which exploits the
advantages of both conditional random fields (CRFs) and decision trees. In the
literature, the potential functions of CRFs are mostly defined as a linear
combination of some pre-defined parametric models, and then methods like
structured support vector machines (SSVMs) are applied to learn those linear
coefficients. We instead formulate the unary and pairwise potentials as
nonparametric forests---ensembles of decision trees, and learn the ensemble
parameters and the trees in a unified optimization problem within the
large-margin framework. In this fashion, we easily achieve nonlinear learning
of potential functions on both unary and pairwise terms in CRFs. Moreover, we
learn class-wise decision trees for each object that appears in the image. Due
to the rich structure and flexibility of decision trees, our approach is
powerful in modelling complex data likelihoods and label relationships. The
resulting optimization problem is very challenging because it can have
exponentially many variables and constraints. We show that this challenging
optimization can be efficiently solved by combining a modified column
generation and cutting-planes techniques. Experimental results on both binary
(Graz-02, Weizmann horse, Oxford flower) and multi-class (MSRC-21, PASCAL VOC
2012) segmentation datasets demonstrate the power of the learned nonlinear
nonparametric potentials.
Comment: 10 pages. Appearing in IEEE Transactions on Neural Networks and
Learning Systems.
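The CRF structure described above can be illustrated with the standard pairwise energy: a sum of unary potentials plus a penalty for disagreeing neighbours. In a hedged sketch (the unary scores here are given as a table, whereas in the paper they would be produced by the learned tree ensembles; the Potts pairwise term is a common simplification, not necessarily the authors' exact form):

```python
def crf_energy(labels, unary, edges, w_pair):
    """Energy of a labelling under a pairwise CRF:
    sum of unary potentials plus a weighted Potts pairwise term
    that penalizes neighbouring nodes taking different labels."""
    e = sum(unary[i][labels[i]] for i in range(len(labels)))  # unary terms
    for i, j in edges:                                        # pairwise terms
        if labels[i] != labels[j]:
            e += w_pair   # Potts penalty for disagreeing neighbours
    return e

# Toy 3-node chain with 2 labels
unary = [[0.0, 1.0], [0.5, 0.5], [1.0, 0.0]]
edges = [(0, 1), (1, 2)]
print(crf_energy([0, 0, 1], unary, edges, w_pair=0.2))  # 0.7
```

The paper's contribution is to make both the unary table and the pairwise term nonparametric tree ensembles learned jointly in a large-margin framework, rather than fixed parametric models.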
Image features and learning algorithms for biological, generic and social object recognition
Automated recognition of object categories in images is a critical step for many real-world computer vision applications. Interest region detectors and region descriptors have been widely employed to tackle the variability of objects in pose, scale, lighting, texture, color, and so on. Different types of object recognition problems usually require different image features and corresponding learning algorithms. This dissertation focuses on the design, evaluation and application of new image features and learning algorithms for the recognition of biological, generic and social objects. The first part of the dissertation introduces a new structure-based interest region detector called the principal curvature-based region detector (PCBR) which detects stable watershed regions that are robust to local intensity perturbations. This detector is specifically designed for region detection for biological objects. Several recognition architectures are then developed that fuse visual information from disparate types of image features for the categorization of complex objects. The described image features and learning algorithms achieve excellent performance on the difficult stonefly larvae dataset. The second part of the dissertation presents studies of methods for visual codebook learning and their application to object recognition. The dissertation first introduces the methodology and application of generative visual codebooks for stonefly recognition and introduces a discriminative evaluation methodology based on a maximum mutual information criterion. Then a new generative/discriminative visual codebook learning algorithm, called iterative discriminative clustering (IDC), is presented that refines the centers and the shapes of the generative codewords for improved discriminative power. It is followed by a novel codebook learning algorithm that builds multiple codebooks that are non-redundant in discriminative power. 
All these visual codebook learning algorithms achieve high performance on both biological and generic object recognition tasks. The final part of the dissertation describes a socially-driven clothes recognition system for an intelligent fitting-room system. The dissertation presents the results of a user study to identify the key factors for clothes recognition. It then describes learning algorithms for recognizing these key factors from clothes images using various image features. The clothes recognition system successfully enables automated social fashion information retrieval for an enhanced clothes shopping experience.
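The maximum-mutual-information criterion mentioned above can be made concrete: score a codebook by the mutual information between codeword assignments and class labels. This is an illustrative sketch in the spirit of that criterion, not the dissertation's implementation (the function name and toy data are hypothetical):

```python
from math import log2
from collections import Counter

def codebook_class_mi(assignments, labels):
    """Mutual information I(W; C) between codeword assignments W and
    class labels C -- one way to score how discriminative a visual
    codebook is: higher MI means codewords align better with classes."""
    n = len(assignments)
    pw = Counter(assignments)            # codeword marginal counts
    pc = Counter(labels)                 # class marginal counts
    pwc = Counter(zip(assignments, labels))  # joint counts
    mi = 0.0
    for (w, c), nwc in pwc.items():
        p_wc = nwc / n
        # p_wc * log2( p_wc / (p_w * p_c) ), with counts cancelled into n*n
        mi += p_wc * log2(p_wc * n * n / (pw[w] * pc[c]))
    return mi

# Perfectly class-aligned codewords give maximal MI (= H(C) = 1 bit here)
print(codebook_class_mi([0, 0, 1, 1], ["A", "A", "B", "B"]))  # 1.0
```

A codebook whose assignments are independent of the labels scores 0 bits, which is what a discriminative learner such as the IDC algorithm described above would try to move away from.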
Image classification by visual bag-of-words refinement and reduction
This paper presents a new framework for visual bag-of-words (BOW) refinement
and reduction to overcome the drawbacks associated with the visual BOW model
which has been widely used for image classification. Although very influential
in the literature, the traditional visual BOW model has two distinct drawbacks.
Firstly, for efficiency purposes, the visual vocabulary is commonly constructed
by directly clustering the low-level visual feature vectors extracted from
local keypoints, without considering the high-level semantics of images. That
is, the visual BOW model still suffers from the semantic gap, and thus may lead
to significant performance degradation in more challenging tasks (e.g. social
image classification). Secondly, typically thousands of visual words are
generated to obtain better performance on a relatively large image dataset. Due
to such a large vocabulary size, the subsequent image classification may take
a prohibitive amount of time. To overcome the first drawback, we develop a graph-based
method for visual BOW refinement by exploiting the tags (easy to access
although noisy) of social images. More notably, for efficient image
classification, we further reduce the refined visual BOW model to a much
smaller size through semantic spectral clustering. Extensive experimental
results show the promising performance of the proposed framework for visual BOW
refinement and reduction.
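The reduction step described above amounts to merging visual words that fall in the same semantic cluster and summing their histogram counts. A minimal sketch, assuming the word-to-cluster map has already been produced (in the paper it would come from semantic spectral clustering; here it is supplied directly, and all names are hypothetical):

```python
def reduce_bow(histogram, word_to_cluster, n_clusters):
    """Reduce a visual BOW histogram by summing the counts of all
    visual words assigned to the same (semantic) cluster, yielding a
    much shorter feature vector for faster classification."""
    reduced = [0.0] * n_clusters
    for word, count in enumerate(histogram):
        reduced[word_to_cluster[word]] += count
    return reduced

# Six visual words merged into two semantic clusters
print(reduce_bow([1, 2, 0, 3, 1, 1], [0, 0, 1, 1, 0, 1], 2))  # [4.0, 4.0]
```

Because the mapping is many-to-one, the reduced histogram preserves the total mass of the original while shrinking its dimensionality from thousands of words to a handful of clusters.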
Do deep features generalize from everyday objects to remote sensing and aerial scenes domains?
In this paper, we evaluate the generalization power of deep features (ConvNets) in two new scenarios: aerial and remote sensing image classification. We evaluate experimentally ConvNets trained for recognizing everyday objects for the classification of aerial and remote sensing images. ConvNets obtained the best results for aerial images, while for remote sensing, they performed well but were outperformed by low-level color descriptors, such as BIC. We also present a correlation analysis, showing the potential for combining/fusing different ConvNets with other descriptors or even for combining multiple ConvNets. A preliminary set of experiments fusing ConvNets obtains state-of-the-art results for the well-known UCMerced dataset
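One simple way the combining/fusing idea above can work is weighted score-level (late) fusion of per-class scores from several classifiers. This is a hypothetical sketch; the abstract does not prescribe a specific fusion rule, and the names and toy scores here are illustrative only:

```python
def late_fuse(score_lists, weights):
    """Weighted score-level (late) fusion: L1-normalize each
    classifier's per-class scores, then combine them with the given
    weights into a single fused score vector."""
    n_classes = len(score_lists[0])
    fused = [0.0] * n_classes
    for scores, w in zip(score_lists, weights):
        s = sum(scores) or 1.0      # L1-normalize this classifier's scores
        for c in range(n_classes):
            fused[c] += w * scores[c] / s
    return fused

convnet = [0.7, 0.2, 0.1]   # e.g. ConvNet softmax scores
bic     = [0.2, 0.2, 0.6]   # e.g. scores from a BIC-based classifier
print(late_fuse([convnet, bic], [0.6, 0.4]))  # [0.5, 0.2, 0.3]
```

Fusion helps most when the classifiers' errors are weakly correlated, which is exactly what the paper's correlation analysis is meant to probe before choosing what to combine.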