134,950 research outputs found
Class Representative Visual Words for Category-Level Object Recognition
Recent works in object recognition often use visual words, i.e. vector quantized local descriptors extracted from the images. In this paper we present a novel method to build such a codebook with class representative vectors. This method, coined Cluster Precision Maximization (CPM), is based on a new measure of the cluster precision and on an optimization procedure that leads any clustering algorithm towards class representative visual words. We compare our procedure with other measures of cluster precision and present the integration of a Reciprocal Nearest Neighbor (RNN) clustering algorithm in the CPM method. In the experiments, on a subset of the the Caltech101 database, we analyze several vocabularies obtained with different local descriptors and different clustering algorithms, and we show that the vocabularies obtained with the CPM process perform best in a category-level object recognition system using a Support Vector Machine (SVM). © 2009 Springer Berlin Heidelberg.López Sastre R.J., Tuytelaars T., Maldonado Bascón S., ''Class representative visual words for category-level object recognition'', Lecture notes in computer science, vol. 5524, 2009 (4th Iberian conference on pattern recognition and image analysis - IbPRAI 2009, June 10-12, 2009, Póvoa de Varzim, Portugal).status: publishe
Visual vocabularies for category-level object recognition
This thesis focuses on the study of visual vocabularies for category-level object recognition. Specifically, we state novel approaches for building visual codebooks. Our aim is not just to obtain more discriminative and more compact visual codebooks, but to bridge the gap between visual features and semantic concepts. A novel approach for obtaining class representative visual words is presented. It is based on a maximisation procedure, i. e. the Cluster Precision Maximisation (CPM), of a novel cluster precision criterion, and on an adaptive threshold refinement scheme for agglomerative clustering algorithms based on correlation clustering techniques. The objective is to increase the vocabulary compactness while at the same time improve the recognition rate and further increase the representativeness of the visual words. Moreover, we describe a novel clustering aggregation based approach for building efficient and semantic visual vocabularies. It consist of a novel framework for incorporating neighboring appearances of local descriptors into the vocabulary construction, and a rigorous approach for adding meaningful spatial coherency among the local features into the visual codebooks. We also propose an efficient high-dimensional data clustering algorithm, the Fast Reciprocal Nearest Neighbours (Fast-RNN). Our approach, which is a speeded up version of the standard RNN algorithm, is based on the projection search paradigm. Finally, we release a new database of images called Image Collection of Annotated Real-world Objects (ICARO), which is especially designed for evaluating category-level object recognition systems. An exhaustive comparison of ICARO with other well-known datasets used within the same context is carried out. We also propose a benchmark for both object classification and detection
Visual vocabularies for category-level object recognition
This thesis focuses on the study of visual vocabularies for category-level object recognition. Specifically, we state novel approaches for building visual codebooks. Our aim is not just to obtain more discriminative and more compact visual codebooks, but to bridge the gap between visual features and semantic concepts. A novel approach for obtaining class representative visual words is presented. It is based on a maximisation procedure, i. e. the Cluster Precision Maximisation (CPM), of a novel cluster precision criterion, and on an adaptive threshold refinement scheme for agglomerative clustering algorithms based on correlation clustering techniques. The objective is to increase the vocabulary compactness while at the same time improve the recognition rate and further increase the representativeness of the visual words. Moreover, we describe a novel clustering aggregation based approach for building efficient and semantic visual vocabularies. It consist of a novel framework for incorporating neighboring appearances of local descriptors into the vocabulary construction, and a rigorous approach for adding meaningful spatial coherency among the local features into the visual codebooks. We also propose an efficient high-dimensional data clustering algorithm, the Fast Reciprocal Nearest Neighbours (Fast-RNN). Our approach, which is a speeded up version of the standard RNN algorithm, is based on the projection search paradigm. Finally, we release a new database of images called Image Collection of Annotated Real-world Objects (ICARO), which is especially designed for evaluating category-level object recognition systems. An exhaustive comparison of ICARO with other well-known datasets used within the same context is carried out. We also propose a benchmark for both object classification and detection
Mid-level Deep Pattern Mining
Mid-level visual element discovery aims to find clusters of image patches
that are both representative and discriminative. In this work, we study this
problem from the prospective of pattern mining while relying on the recently
popularized Convolutional Neural Networks (CNNs). Specifically, we find that
for an image patch, activations extracted from the first fully-connected layer
of CNNs have two appealing properties which enable its seamless integration
with pattern mining. Patterns are then discovered from a large number of CNN
activations of image patches through the well-known association rule mining.
When we retrieve and visualize image patches with the same pattern,
surprisingly, they are not only visually similar but also semantically
consistent. We apply our approach to scene and object classification tasks, and
demonstrate that our approach outperforms all previous works on mid-level
visual element discovery by a sizeable margin with far fewer elements being
used. Our approach also outperforms or matches recent works using CNN for these
tasks. Source code of the complete system is available online.Comment: Published in Proc. IEEE Conf. Computer Vision and Pattern Recognition
201
Visual Landmark Recognition from Internet Photo Collections: A Large-Scale Evaluation
The task of a visual landmark recognition system is to identify photographed
buildings or objects in query photos and to provide the user with relevant
information on them. With their increasing coverage of the world's landmark
buildings and objects, Internet photo collections are now being used as a
source for building such systems in a fully automatic fashion. This process
typically consists of three steps: clustering large amounts of images by the
objects they depict; determining object names from user-provided tags; and
building a robust, compact, and efficient recognition index. To this date,
however, there is little empirical information on how well current approaches
for those steps perform in a large-scale open-set mining and recognition task.
Furthermore, there is little empirical information on how recognition
performance varies for different types of landmark objects and where there is
still potential for improvement. With this paper, we intend to fill these gaps.
Using a dataset of 500k images from Paris, we analyze each component of the
landmark recognition pipeline in order to answer the following questions: How
many and what kinds of objects can be discovered automatically? How can we best
use the resulting image clusters to recognize the object in a query? How can
the object be efficiently represented in memory for recognition? How reliably
can semantic information be extracted? And finally: What are the limiting
factors in the resulting pipeline from query to semantics? We evaluate how
different choices of methods and parameters for the individual pipeline steps
affect overall system performance and examine their effects for different query
categories such as buildings, paintings or sculptures
- …