An Empirical Evaluation of Current Convolutional Architectures' Ability to Manage Nuisance Location and Scale Variability
We conduct an empirical study to test the ability of Convolutional Neural
Networks (CNNs) to reduce the effects of nuisance transformations of the input
data, such as location, scale and aspect ratio. We isolate factors by adopting
a common convolutional architecture either deployed globally on the image to
compute class posterior distributions, or restricted locally to compute class
conditional distributions given location, scale and aspect ratios of bounding
boxes determined by proposal heuristics. In theory, averaging the latter should
yield inferior performance compared to proper marginalization. Yet empirical
evidence suggests the converse, leading us to conclude that - at the current
level of complexity of convolutional architectures and scale of the data sets
used to train them - CNNs are not very effective at marginalizing nuisance
variability. We also quantify the effects of context on the overall
classification task and its impact on the performance of CNNs, and propose
improved sampling techniques for heuristic proposal schemes that improve
end-to-end performance to state-of-the-art levels. We test our hypothesis on a
classification task using the ImageNet Challenge benchmark and on a
wide-baseline matching task using the Oxford and Fischer's datasets.
Comment: 10 pages, 5 figures, 3 tables -- CVPR 2016, camera-ready version
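The comparison the abstract describes, a single global class posterior versus an average of class-conditional distributions computed on proposal boxes, can be illustrated with a minimal sketch. The logits below are hypothetical stand-ins for CNN outputs, and the uniform average is an assumption; the abstract does not specify the weighting or the sampling scheme.

```python
import math


def softmax(logits):
    """Convert raw scores into a probability distribution."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def average_over_proposals(per_box_logits):
    """Uniformly average the per-box class distributions (assumed weighting)."""
    posteriors = [softmax(logits) for logits in per_box_logits]
    n_classes = len(posteriors[0])
    return [
        sum(p[c] for p in posteriors) / len(posteriors)
        for c in range(n_classes)
    ]


# Hypothetical scores for three classes.
global_logits = [1.0, 1.2, 0.3]        # network applied to the whole image
box_logits = [                         # network applied to three proposal boxes
    [3.0, 0.5, 0.1],
    [2.5, 1.0, 0.2],
    [0.2, 0.4, 0.3],
]

global_posterior = softmax(global_logits)
averaged_posterior = average_over_proposals(box_logits)

global_pred = max(range(3), key=lambda c: global_posterior[c])
averaged_pred = max(range(3), key=lambda c: averaged_posterior[c])
```

In this toy input the two predictions differ, which is the kind of disagreement the study measures at scale.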
Self-taught Object Localization with Deep Networks
This paper introduces self-taught object localization, a novel approach that
leverages deep convolutional networks trained for whole-image recognition to
localize objects in images without additional human supervision, i.e., without
using any ground-truth bounding boxes for training. The key idea is to analyze
the change in the recognition scores when artificially masking out different
regions of the image. The masking out of a region that includes the object
typically causes a significant drop in recognition score. This idea is embedded
into an agglomerative clustering technique that generates self-taught
localization hypotheses. Our object localization scheme outperforms existing
proposal methods in both precision and recall for a small number of subwindow
proposals (e.g., on ILSVRC-2012 it produces a relative gain of 23.4% over the
state-of-the-art for the top-1 hypothesis). Furthermore, our experiments show that
the annotations automatically-generated by our method can be used to train
object detectors yielding recognition results remarkably close to those
obtained by training on manually-annotated bounding boxes.
Comment: WACV 201
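The key idea in this abstract, masking a region and measuring the drop in recognition score, can be sketched as follows. This is only an illustration under stated assumptions: `toy_score` is a hypothetical stand-in for a deep network's recognition score, the candidate boxes are hand-picked, and the agglomerative clustering step of the paper is omitted.

```python
def mask_region(image, box, fill=0.0):
    """Return a copy of the image with the box (x0, y0, x1, y1) filled in."""
    x0, y0, x1, y1 = box
    return [
        [fill if (y0 <= y < y1 and x0 <= x < x1) else value
         for x, value in enumerate(row)]
        for y, row in enumerate(image)
    ]


def score_drops(image, boxes, score_fn):
    """Pair each box with the score drop caused by masking it out."""
    base = score_fn(image)
    return [(box, base - score_fn(mask_region(image, box))) for box in boxes]


def toy_score(image):
    """Hypothetical recognition score: mean pixel intensity."""
    total = sum(sum(row) for row in image)
    return total / (len(image) * len(image[0]))


# A 6x6 blank image with a bright 2x2 "object" at x in [3, 5), y in [2, 4).
image = [[0.0] * 6 for _ in range(6)]
for y in range(2, 4):
    for x in range(3, 5):
        image[y][x] = 1.0

candidate_boxes = [(0, 0, 2, 2), (3, 2, 5, 4), (0, 4, 2, 6)]
drops = score_drops(image, candidate_boxes, toy_score)

# The box whose masking hurts the score most is the localization hypothesis.
best_box, best_drop = max(drops, key=lambda pair: pair[1])
```

Only the box covering the object produces a nonzero drop here, so it is selected as the hypothesis.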
What makes for effective detection proposals?
Current top performing object detectors employ detection proposals to guide
the search for objects, thereby avoiding exhaustive sliding window search
across images. Despite the popularity and widespread use of detection
proposals, it is unclear which trade-offs are made when using them during
object detection. We provide an in-depth analysis of twelve proposal methods
along with four baselines regarding proposal repeatability, ground truth
annotation recall on PASCAL, ImageNet, and MS COCO, and their impact on DPM,
R-CNN, and Fast R-CNN detection performance. Our analysis shows that, for object
detection, improving proposal localisation accuracy is as important as improving
recall. We introduce a novel metric, the average recall (AR), which rewards
both high recall and good localisation and correlates surprisingly well with
detection performance. Our findings show common strengths and weaknesses of
existing methods, and provide insights and metrics for selecting and tuning
proposal methods.
Comment: TPAMI final version, duplicate proposals removed in experiment
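A metric that rewards both recall and localisation can be sketched by averaging recall over a range of intersection-over-union (IoU) thresholds. The abstract does not define AR precisely, so the threshold range (0.5 to 1.0), the number of steps, and the function names below are assumptions made for illustration.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x0, y0, x1, y1)."""
    ax0, ay0, ax1, ay1 = a
    bx0, by0, bx1, by1 = b
    inter_w = max(0.0, min(ax1, bx1) - max(ax0, bx0))
    inter_h = max(0.0, min(ay1, by1) - max(ay0, by0))
    inter = inter_w * inter_h
    union = (ax1 - ax0) * (ay1 - ay0) + (bx1 - bx0) * (by1 - by0) - inter
    return inter / union if union > 0 else 0.0


def best_overlaps(ground_truth, proposals):
    """Best IoU achieved by any proposal for each ground-truth box."""
    return [
        max((iou(gt, p) for p in proposals), default=0.0)
        for gt in ground_truth
    ]


def recall_at(overlaps, threshold):
    """Fraction of ground-truth boxes covered at the given IoU threshold."""
    return sum(1 for o in overlaps if o >= threshold) / len(overlaps)


def average_recall(ground_truth, proposals, steps=50):
    """Mean recall over evenly spaced IoU thresholds in [0.5, 1.0] (assumed range)."""
    overlaps = best_overlaps(ground_truth, proposals)
    thresholds = [0.5 + 0.5 * i / steps for i in range(steps + 1)]
    return sum(recall_at(overlaps, t) for t in thresholds) / len(thresholds)


ground_truth = [(0, 0, 10, 10), (20, 20, 30, 30)]
proposals = [(0, 0, 10, 10), (20, 20, 30, 25)]  # one exact hit, one loose hit
ar = average_recall(ground_truth, proposals)
```

Both objects count as recalled at IoU 0.5, but the loosely localised one stops counting at stricter thresholds, so the averaged score is lower than recall at 0.5 alone.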
Sequential Optimization for Efficient High-Quality Object Proposal Generation
We are motivated by the need for a generic object proposal generation
algorithm which achieves a good balance between object detection recall, proposal
localization quality and computational efficiency. We propose a novel object
proposal algorithm, BING++, which inherits the virtue of good computational
efficiency of BING but significantly improves its proposal localization
quality. At a high level, we formulate the problem of object proposal generation
from a novel probabilistic perspective, based on which our BING++ manages to
improve the localization quality by employing edges and segments to estimate
object boundaries and update the proposals sequentially. We propose learning
the parameters efficiently by searching for approximate solutions in a
quantized parameter space for complexity reduction. We demonstrate the
generalization of BING++ with the same fixed parameters across different object
classes and datasets. Empirically, BING++ runs at half the speed of BING on
CPU but significantly improves localization quality, by 18.5% and 16.7% on
the VOC2007 and Microsoft COCO datasets, respectively. Compared with other
state-of-the-art approaches, BING++ achieves comparable performance but runs
significantly faster.
Comment: Accepted by TPAM
Sequential optimization for efficient high-quality object proposal generation
We are motivated by the need for a generic object proposal generation algorithm which achieves a good balance between object detection recall, proposal localization quality and computational efficiency. We propose a novel object proposal algorithm, BING++, which inherits the virtue of good computational efficiency of BING [1] but significantly improves its proposal localization quality. At a high level, we formulate the problem of object proposal generation from a novel probabilistic perspective, based on which our BING++ manages to improve the localization quality by employing edges and segments to estimate object boundaries and update the proposals sequentially. We propose learning the parameters efficiently by searching for approximate solutions in a quantized parameter space for complexity reduction. We demonstrate the generalization of BING++ with the same fixed parameters across different object classes and datasets. Empirically, BING++ runs at half the speed of BING on CPU but significantly improves localization quality, by 18.5 and 16.7 percent on the VOC2007 and Microsoft COCO datasets, respectively. Compared with other state-of-the-art approaches, BING++ achieves comparable performance but runs significantly faster.