46 research outputs found
Learning and Using Taxonomies For Fast Visual Categorization
The computational complexity of current visual categorization algorithms scales linearly at best with the number of categories. The goal of classifying simultaneously N_(cat) = 10^4 - 10^5 visual categories requires sub-linear classification costs. We explore algorithms for automatically building classification trees which have, in principle, log N_(cat) complexity. We find that a greedy algorithm that recursively splits the set of categories into the two minimally confused subsets achieves 5-20 fold speedups at a small cost in classification performance. Our approach is independent of the specific classification algorithm used. A welcome by-product of our algorithm is a very reasonable taxonomy of the Caltech-256 dataset
Hallucinating optimal high-dimensional subspaces
Linear subspace representations of appearance variation are pervasive in
computer vision. This paper addresses the problem of robustly matching such
subspaces (computing the similarity between them) when they are used to
describe the scope of variations within sets of images of different (possibly
greatly so) scales. A naive solution of projecting the low-scale subspace into
the high-scale image space is described first and subsequently shown to be
inadequate, especially at large scale discrepancies. A successful approach is
proposed instead. It consists of (i) an interpolated projection of the
low-scale subspace into the high-scale space, which is followed by (ii) a
rotation of this initial estimate within the bounds of the imposed
``downsampling constraint''. The optimal rotation is found in the closed-form
which best aligns the high-scale reconstruction of the low-scale subspace with
the reference it is compared to. The method is evaluated on the problem of
matching sets of (i) face appearances under varying illumination and (ii)
object appearances under varying viewpoint, using two large data sets. In
comparison to the naive matching, the proposed algorithm is shown to greatly
increase the separation of between-class and within-class similarities, as well
as produce far more meaningful modes of common appearance on which the match
score is based.Comment: Pattern Recognition, 201
The Devil is in the Tails: Fine-grained Classification in the Wild
The world is long-tailed. What does this mean for computer vision and visual
recognition? The main two implications are (1) the number of categories we need
to consider in applications can be very large, and (2) the number of training
examples for most categories can be very small. Current visual recognition
algorithms have achieved excellent classification accuracy. However, they
require many training examples to reach peak performance, which suggests that
long-tailed distributions will not be dealt with well. We analyze this question
in the context of eBird, a large fine-grained classification dataset, and a
state-of-the-art deep network classification algorithm. We find that (a) peak
classification performance on well-represented categories is excellent, (b)
given enough data, classification performance suffers only minimally from an
increase in the number of classes, (c) classification performance decays
precipitously as the number of training examples decreases, (d) surprisingly,
transfer learning is virtually absent in current methods. Our findings suggest
that our community should come to grips with the question of long tails
Learning Object Categories From Internet Image Searches
In this paper, we describe a simple approach to learning models of visual object categories from images gathered from Internet image search engines. The images for a given keyword are typically highly variable, with a large fraction being unrelated to the query term, and thus pose a challenging environment from which to learn. By training our models directly from Internet images, we remove the need to laboriously compile training data sets, required by most other recognition approaches-this opens up the possibility of learning object category models “on-the-fly.” We describe two simple approaches, derived from the probabilistic latent semantic analysis (pLSA) technique for text document analysis, that can be used to automatically learn object models from these data. We show two applications of the learned model: first, to rerank the images returned by the search engine, thus improving the quality of the search engine; and second, to recognize objects in other image data sets
Recommended from our members
An evaluation framework for stereo-based driver assistance
This is the post-print version of the Article - Copyright @ 2012 Springer VerlagThe accuracy of stereo algorithms or optical flow methods is commonly assessed by comparing the results against the Middlebury
database. However, equivalent data for automotive or robotics applications
rarely exist as they are difficult to obtain. As our main contribution, we introduce an evaluation framework tailored for stereo-based driver assistance able to deliver excellent performance measures while
circumventing manual label effort. Within this framework one can combine several ways of ground-truthing, different comparison metrics, and use large image databases.
Using our framework we show examples on several types of ground truthing techniques: implicit ground truthing (e.g. sequence recorded without a crash occurred), robotic vehicles with high precision sensors, and to a small extent, manual labeling. To show the effectiveness of our evaluation framework we compare three different stereo algorithms on
pixel and object level. In more detail we evaluate an intermediate representation
called the Stixel World. Besides evaluating the accuracy of the Stixels, we investigate the completeness (equivalent to the detection rate) of the StixelWorld vs. the number of phantom Stixels. Among many findings, using this framework enables us to reduce the number of phantom Stixels by a factor of three compared to the base parametrization. This base parametrization has already been optimized by test driving vehicles for distances exceeding 10000 km
Recognition from appearance subspaces across image sets of variable scale
Linear subspace representations of appearance variation are pervasive in computer vision. In this paper we address the problem of robustly matching them (computing the similarity between them) when they correspond to sets of images of different (possibly greatly so) scales. We show that the naïve solution of projecting the low-scale subspace into the high-scale image space is inadequate, especially at large scale discrepancies. A successful approach is proposed instead. It consists of (i) an interpolated projection of the low-scale subspace into the high-scale space, which is followed by (ii) a rotation of this initial estimate within the bounds of the imposed “downsampling constraint”. The optimal rotation is found in the closed-form which best aligns the high-scale reconstruction of the low-scale subspace with the reference it is compared to. The proposed method is evaluated on the problem of matching sets of face appearances under varying illumination. In comparison to the naïve matching, our algorithm is shown to greatly increase the separation of between-class and within-class similarities, as well as produce far more meaningful modes of common appearance on which the match score is based
The Impact of Technology in Orientation Aid
Statistic states that 285 million people are estimated to be visually impaired worldwide: 39 million are blind and 246 have low vision. About 90% of the world\u27s visually impaired people live in developing countries. Taking in consideration that Mechatronics is a methodology used for the optimal design of electromechanical products, and by combining technologies that are available to us we can develop a very useful tool that blind people and people with sight problems can change their lives. Combining smart phones and digital camera there are possibilities to build smart glasses which will give information to blind people. In this paper definitely a new approach for making peoples life easy is proposed. Initially the results are reached from simulation using Matlab/SIMULINK package which will lead this research to real time experimental results
Quantifying and Transferring Contextual Information in Object Detection
(c) 2012 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other users, including reprinting/ republishing this material for advertising or promotional purposes, creating new collective works for resale or redistribution to servers or lists, or reuse of any copyrighted components of this work in other work
Comparison of Visual Datasets for Machine Learning
One of the greatest technological improvements in recent years is the rapid progress using machine learning for processing visual data. Among all factors that contribute to this development, datasets with labels play crucial roles. Several datasets are widely reused for investigating and analyzing different solutions in machine learning. Many systems, such as autonomous vehicles, rely on components using machine learning for recognizing objects. This paper compares different visual datasets and frameworks for machine learning. The comparison is both qualitative and quantitative and investigates object detection labels with respect to size, location, and contextual information. This paper also presents a new approach creating datasets using real-time, geo-tagged visual data, greatly improving the contextual information of the data. The data could be automatically labeled by cross-referencing information from other sources (such as weather)