874 research outputs found
Convolutional Color Constancy
Color constancy is the problem of inferring the color of the light that
illuminated a scene, usually so that the illumination color can be removed.
Because this problem is underconstrained, it is often solved by modeling the
statistical regularities of the colors of natural objects and illumination. In
contrast, in this paper we reformulate the problem of color constancy as a 2D
spatial localization task in a log-chrominance space, thereby allowing us to
apply techniques from object detection and structured prediction to the color
constancy problem. By directly learning how to discriminate between correctly
white-balanced images and poorly white-balanced images, our model is able to
improve performance on standard benchmarks by nearly 40%
Self-tuned Visual Subclass Learning with Shared Samples An Incremental Approach
Computer vision tasks are traditionally defined and evaluated using semantic
categories. However, it is known to the field that semantic classes do not
necessarily correspond to a unique visual class (e.g. inside and outside of a
car). Furthermore, many of the feasible learning techniques at hand cannot
model a visual class which appears consistent to the human eye. These problems
have motivated the use of 1) Unsupervised or supervised clustering as a
preprocessing step to identify the visual subclasses to be used in a
mixture-of-experts learning regime. 2) Felzenszwalb et al. part model and other
works model mixture assignment with latent variables which is optimized during
learning 3) Highly non-linear classifiers which are inherently capable of
modelling multi-modal input space but are inefficient at the test time. In this
work, we promote an incremental view over the recognition of semantic classes
with varied appearances. We propose an optimization technique which
incrementally finds maximal visual subclasses in a regularized risk
minimization framework. Our proposed approach unifies the clustering and
classification steps in a single algorithm. The importance of this approach is
its compliance with the classification via the fact that it does not need to
know about the number of clusters, the representation and similarity measures
used in pre-processing clustering methods a priori. Following this approach we
show both qualitatively and quantitatively significant results. We show that
the visual subclasses demonstrate a long tail distribution. Finally, we show
that state of the art object detection methods (e.g. DPM) are unable to use the
tails of this distribution comprising 50\% of the training samples. In fact we
show that DPM performance slightly increases on average by the removal of this
half of the data.Comment: Updated ICCV 2013 submissio
Learning to Navigate the Energy Landscape
In this paper, we present a novel and efficient architecture for addressing
computer vision problems that use `Analysis by Synthesis'. Analysis by
synthesis involves the minimization of the reconstruction error which is
typically a non-convex function of the latent target variables.
State-of-the-art methods adopt a hybrid scheme where discriminatively trained
predictors like Random Forests or Convolutional Neural Networks are used to
initialize local search algorithms. While these methods have been shown to
produce promising results, they often get stuck in local optima. Our method
goes beyond the conventional hybrid architecture by not only proposing multiple
accurate initial solutions but by also defining a navigational structure over
the solution space that can be used for extremely efficient gradient-free local
search. We demonstrate the efficacy of our approach on the challenging problem
of RGB Camera Relocalization. To make the RGB camera relocalization problem
particularly challenging, we introduce a new dataset of 3D environments which
are significantly larger than those found in other publicly-available datasets.
Our experiments reveal that the proposed method is able to achieve
state-of-the-art camera relocalization results. We also demonstrate the
generalizability of our approach on Hand Pose Estimation and Image Retrieval
tasks
Deformable Part Models are Convolutional Neural Networks
Deformable part models (DPMs) and convolutional neural networks (CNNs) are
two widely used tools for visual recognition. They are typically viewed as
distinct approaches: DPMs are graphical models (Markov random fields), while
CNNs are "black-box" non-linear classifiers. In this paper, we show that a DPM
can be formulated as a CNN, thus providing a novel synthesis of the two ideas.
Our construction involves unrolling the DPM inference algorithm and mapping
each step to an equivalent (and at times novel) CNN layer. From this
perspective, it becomes natural to replace the standard image features used in
DPM with a learned feature extractor. We call the resulting model DeepPyramid
DPM and experimentally validate it on PASCAL VOC. DeepPyramid DPM significantly
outperforms DPMs based on histograms of oriented gradients features (HOG) and
slightly outperforms a comparable version of the recently introduced R-CNN
detection system, while running an order of magnitude faster
- …