224 research outputs found
Ensembles of Novel Visual Keywords Descriptors for Image Categorization
Object recognition systems need effective image descriptors to reach good performance. Currently, the most widely used image descriptor is the SIFT descriptor, which computes histograms of gradient orientations around points in an image. A possible problem of this approach is that the number of features becomes very large when a dense grid is used, because the histograms are computed and combined for many different points. The currently dominant solution to this problem is to use a clustering method to create a visual codebook, which an appearance-based descriptor then uses to build a histogram of the visual keywords present in an image. In this paper we introduce several novel bag-of-visual-keywords methods and compare them with the currently dominant hard bag-of-features (HBOF) approach, which uses a hard assignment scheme to compute cluster frequencies. Furthermore, we combine all descriptors with a spatial pyramid and two ensemble classifiers. Experimental results on 10 and 101 classes of the Caltech-101 object database show that our novel methods significantly outperform the traditional HBOF approach and that our ensemble methods obtain state-of-the-art performance levels.
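The hard assignment scheme mentioned in this abstract can be illustrated with a minimal sketch. It assumes Euclidean distance to the codewords and L1 normalisation of the histogram; the function name `hard_bof_histogram` and the toy data are hypothetical and not taken from the paper.

```python
import numpy as np

def hard_bof_histogram(descriptors, codebook):
    """Count, for each codeword, how many descriptors have it as nearest neighbour."""
    # Squared Euclidean distance between every descriptor and every codeword.
    diffs = descriptors[:, None, :] - codebook[None, :, :]
    dists = np.sum(diffs ** 2, axis=2)
    # Hard assignment: each descriptor votes for exactly one codeword.
    nearest = np.argmin(dists, axis=1)
    counts = np.bincount(nearest, minlength=len(codebook)).astype(float)
    total = counts.sum()
    # L1-normalise so images with different numbers of descriptors are comparable.
    return counts / total if total > 0 else counts

# Toy data (assumed): three 2-D codewords and four local descriptors.
codebook = np.array([[0.0, 0.0], [1.0, 1.0], [4.0, 4.0]])
descriptors = np.array([[0.1, 0.0], [0.9, 1.2], [1.1, 0.8], [3.9, 4.2]])
hist = hard_bof_histogram(descriptors, codebook)
print(hist)
```

In practice the codebook would come from a clustering method such as k-means run on training descriptors; here it is fixed by hand to keep the example short.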
Image features and learning algorithms for biological, generic and social object recognition
Automated recognition of object categories in images is a critical step for many real-world computer vision applications. Interest region detectors and region descriptors have been widely employed to tackle the variability of objects in pose, scale, lighting, texture, color, and so on. Different types of object recognition problems usually require different image features and corresponding learning algorithms. This dissertation focuses on the design, evaluation and application of new image features and learning algorithms for the recognition of biological, generic and social objects. The first part of the dissertation introduces a new structure-based interest region detector called the principal curvature-based region detector (PCBR) which detects stable watershed regions that are robust to local intensity perturbations. This detector is specifically designed for region detection for biological objects. Several recognition architectures are then developed that fuse visual information from disparate types of image features for the categorization of complex objects. The described image features and learning algorithms achieve excellent performance on the difficult stonefly larvae dataset. The second part of the dissertation presents studies of methods for visual codebook learning and their application to object recognition. The dissertation first introduces the methodology and application of generative visual codebooks for stonefly recognition and introduces a discriminative evaluation methodology based on a maximum mutual information criterion. Then a new generative/discriminative visual codebook learning algorithm, called iterative discriminative clustering (IDC), is presented that refines the centers and the shapes of the generative codewords for improved discriminative power. It is followed by a novel codebook learning algorithm that builds multiple codebooks that are non-redundant in discriminative power. 
All these visual codebook learning algorithms achieve high performance on both biological and generic object recognition tasks. The final part of the dissertation describes a socially-driven clothes recognition system for an intelligent fitting-room system. The dissertation presents the results of a user study to identify the key factors for clothes recognition. It then describes learning algorithms for recognizing these key factors from clothes images using various image features. The clothes recognition system successfully enables automated social fashion information retrieval for an enhanced clothes shopping experience.
Image Classification Using SVM Classifiers Trained with the AdaBoost Method
Thesis (Master's) -- Seoul National University Graduate School: Department of Electrical and Computer Engineering, February 2015. 유석인.
This thesis presents an algorithm that categorizes images by the objects they contain. The images are encoded with the bag-of-features (BoF) model, which represents an image as an unordered collection of features extracted from local patches. To handle multiple object categories, the one-versus-all method is used to implement the multi-class classifier. One object classifier is built per object category, and each classifier decides whether or not an image belongs to its category. Each object classifier is trained with the AdaBoost method and is given by the weighted sum of 200 support vector machine (SVM) component classifiers. Among the object classifiers, the one with the highest output function value determines the category of the image. The classification performance of the presented algorithm is demonstrated on images from the Caltech-101 dataset.
Contents:
Abstract
Contents
List of Figures
List of Tables
Chapter 1 Introduction
Chapter 2 Related Work
2.1 Image classification approaches
2.2 Boosting methods
2.3 Background
2.3.1 Support vector machine
Chapter 3 Proposed Algorithm
3.1 SIFT feature extraction
3.2 Codebook construction
3.3 Bag-of-features representation
3.4 Classifier design
Chapter 4 Experiments
4.1 Dataset
4.2 Bag-of-features representation
4.3 Classifiers
4.4 Classification results
Chapter 5 Conclusion
Bibliography
Abstract in Korean
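The decision rule described in this thesis abstract (a weighted sum of component classifiers per category, followed by picking the category with the highest output) can be sketched as follows. This is only an illustration under stated assumptions: the component outputs are taken to be signs in {-1, +1}, three components stand in for the 200 SVMs, and all names, weights, and categories are hypothetical.

```python
import numpy as np

def boosted_score(component_outputs, alphas):
    """Output function of one object classifier: weighted sum of its components."""
    return float(np.dot(alphas, component_outputs))

def predict_category(scores_per_category):
    """One-versus-all decision: the category whose classifier scores highest wins."""
    return max(scores_per_category, key=scores_per_category.get)

# Assumed AdaBoost weights for each category's component classifiers.
alphas = {
    "airplane": np.array([0.9, 0.4, 0.2]),
    "face": np.array([0.7, 0.5, 0.3]),
}
# Assumed component outputs (+1 = "belongs to category") for a single test image.
outputs = {
    "airplane": np.array([1, -1, 1]),
    "face": np.array([1, 1, -1]),
}

scores = {c: boosted_score(outputs[c], alphas[c]) for c in alphas}
predicted = predict_category(scores)
print(scores, predicted)
```

In a full system each component output would come from an SVM trained on a reweighted sample of BoF histograms; that training step is omitted here.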
Insights from Classifying Visual Concepts with Multiple Kernel Learning
Combining information from various image features has become a standard
technique in concept recognition tasks. However, the optimal way of fusing the
resulting kernel functions is usually unknown in practical applications.
Multiple kernel learning (MKL) techniques make it possible to determine an optimal linear
combination of such similarity matrices. Classical approaches to MKL promote
sparse mixtures. Unfortunately, so-called 1-norm MKL variants are often
observed to be outperformed by an unweighted sum kernel. The contribution of
this paper is twofold: We apply a recently developed non-sparse MKL variant to
state-of-the-art concept recognition tasks within computer vision. We provide
insights on benefits and limits of non-sparse MKL and compare it against its
direct competitors, the sum kernel SVM and the sparse MKL. We report empirical
results for the PASCAL VOC 2009 Classification and ImageCLEF2010 Photo
Annotation challenge data sets. About to be submitted to PLoS ONE. Comment: 18 pages, 8 tables, 4 figures; format deviates from PLoS ONE
submission format requirements for aesthetic reasons.
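To make the notion of a linear combination of kernels concrete, the following minimal sketch builds two kernel matrices from the same toy features and combines them, once as the unweighted sum kernel and once with hand-picked weights. It does not learn the weights, which is what MKL itself does; the RBF kernels, the `gamma` values, the weights, and the function names are all assumptions made for illustration.

```python
import numpy as np

def rbf_kernel(X, gamma):
    """Gaussian RBF kernel matrix exp(-gamma * ||x_i - x_j||^2)."""
    sq = np.sum(X ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * (X @ X.T)
    # Clip tiny negative values caused by floating-point rounding.
    return np.exp(-gamma * np.maximum(d2, 0.0))

def combine_kernels(kernels, weights=None):
    """Return sum_m weights[m] * kernels[m]; weights=None gives the sum kernel."""
    kernels = np.asarray(kernels)
    if weights is None:
        weights = np.ones(len(kernels))
    weights = np.asarray(weights, dtype=float)
    return np.tensordot(weights, kernels, axes=1)

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 3))            # five toy samples, three features
kernels = [rbf_kernel(X, 0.1), rbf_kernel(X, 1.0)]

sum_kernel = combine_kernels(kernels)                  # unweighted sum kernel
weighted_kernel = combine_kernels(kernels, [0.8, 0.2]) # fixed, not learned, weights
```

Either combined matrix could be passed to an SVM that accepts precomputed kernels. A sparse (1-norm) MKL solution would tend to set some weights to exactly zero, whereas a non-sparse variant keeps all of them positive.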
Image classification by visual bag-of-words refinement and reduction
This paper presents a new framework for visual bag-of-words (BOW) refinement
and reduction to overcome the drawbacks associated with the visual BOW model
which has been widely used for image classification. Although very influential
in the literature, the traditional visual BOW model has two distinct drawbacks.
Firstly, for efficiency purposes, the visual vocabulary is commonly constructed
by directly clustering the low-level visual feature vectors extracted from
local keypoints, without considering the high-level semantics of images. That
is, the visual BOW model still suffers from the semantic gap, and thus may lead
to significant performance degradation in more challenging tasks (e.g. social
image classification). Secondly, typically thousands of visual words are
generated to obtain better performance on a relatively large image dataset. Due
to such a large vocabulary size, the subsequent image classification may take
a considerable amount of time. To overcome the first drawback, we develop a graph-based
method for visual BOW refinement by exploiting the tags (easy to access
although noisy) of social images. More notably, for efficient image
classification, we further reduce the refined visual BOW model to a much
smaller size through semantic spectral clustering. Extensive experimental
results show the promising performance of the proposed framework for visual BOW
refinement and reduction.
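The reduction step in this abstract shrinks a large visual vocabulary to a much smaller one. As a rough sketch of the final bookkeeping only, the code below assumes that some clustering of the visual words is already available and merges histogram bins accordingly. It does not implement the graph-based refinement or the semantic spectral clustering of the paper; `reduce_bow` and the word-to-cluster labels are hypothetical.

```python
import numpy as np

def reduce_bow(histograms, word_to_cluster, n_clusters):
    """Merge visual-word bins that share a cluster label into one bin."""
    histograms = np.asarray(histograms, dtype=float)
    n_words = histograms.shape[1]
    # Binary assignment matrix: row w has a single 1 in the column of its cluster.
    assign = np.zeros((n_words, n_clusters))
    assign[np.arange(n_words), word_to_cluster] = 1.0
    # Each reduced bin is the sum of the original bins assigned to it.
    return histograms @ assign

# Toy data (assumed): two images, six visual words, merged into three clusters.
histograms = [[2, 1, 0, 3, 1, 0],
              [0, 0, 4, 1, 0, 2]]
word_to_cluster = np.array([0, 0, 1, 2, 2, 1])
reduced = reduce_bow(histograms, word_to_cluster, n_clusters=3)
print(reduced)
```

The classifier then works on vectors with three dimensions instead of six, which is the source of the speed-up the abstract refers to.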