8,347 research outputs found
What Can Help Pedestrian Detection?
Aggregating extra features has been considered as an effective approach to
boost traditional pedestrian detection methods. However, there is still a lack
of studies on whether and how CNN-based pedestrian detectors can benefit from
these extra features. The first contribution of this paper is exploring this
issue by aggregating extra features into CNN-based pedestrian detection
framework. Through extensive experiments, we evaluate the effects of different
kinds of extra features quantitatively. Moreover, we propose a novel network
architecture, namely HyperLearner, to jointly learn pedestrian detection as
well as the given extra feature. By multi-task training, HyperLearner is able
to utilize the information of given features and improve detection performance
without extra inputs in inference. The experimental results on multiple
pedestrian benchmarks validate the effectiveness of the proposed HyperLearner.Comment: Accepted to IEEE International Conference on Computer Vision and
Pattern Recognition (CVPR) 201
Dense semantic labeling of sub-decimeter resolution images with convolutional neural networks
Semantic labeling (or pixel-level land-cover classification) in ultra-high
resolution imagery (< 10cm) requires statistical models able to learn high
level concepts from spatial data, with large appearance variations.
Convolutional Neural Networks (CNNs) achieve this goal by learning
discriminatively a hierarchy of representations of increasing abstraction.
In this paper we present a CNN-based system relying on an
downsample-then-upsample architecture. Specifically, it first learns a rough
spatial map of high-level representations by means of convolutions and then
learns to upsample them back to the original resolution by deconvolutions. By
doing so, the CNN learns to densely label every pixel at the original
resolution of the image. This results in many advantages, including i)
state-of-the-art numerical accuracy, ii) improved geometric accuracy of
predictions and iii) high efficiency at inference time.
We test the proposed system on the Vaihingen and Potsdam sub-decimeter
resolution datasets, involving semantic labeling of aerial images of 9cm and
5cm resolution, respectively. These datasets are composed by many large and
fully annotated tiles allowing an unbiased evaluation of models making use of
spatial information. We do so by comparing two standard CNN architectures to
the proposed one: standard patch classification, prediction of local label
patches by employing only convolutions and full patch labeling by employing
deconvolutions. All the systems compare favorably or outperform a
state-of-the-art baseline relying on superpixels and powerful appearance
descriptors. The proposed full patch labeling CNN outperforms these models by a
large margin, also showing a very appealing inference time.Comment: Accepted in IEEE Transactions on Geoscience and Remote Sensing, 201
Reconstructive Sparse Code Transfer for Contour Detection and Semantic Labeling
We frame the task of predicting a semantic labeling as a sparse
reconstruction procedure that applies a target-specific learned transfer
function to a generic deep sparse code representation of an image. This
strategy partitions training into two distinct stages. First, in an
unsupervised manner, we learn a set of generic dictionaries optimized for
sparse coding of image patches. We train a multilayer representation via
recursive sparse dictionary learning on pooled codes output by earlier layers.
Second, we encode all training images with the generic dictionaries and learn a
transfer function that optimizes reconstruction of patches extracted from
annotated ground-truth given the sparse codes of their corresponding image
patches. At test time, we encode a novel image using the generic dictionaries
and then reconstruct using the transfer function. The output reconstruction is
a semantic labeling of the test image.
Applying this strategy to the task of contour detection, we demonstrate
performance competitive with state-of-the-art systems. Unlike almost all prior
work, our approach obviates the need for any form of hand-designed features or
filters. To illustrate general applicability, we also show initial results on
semantic part labeling of human faces.
The effectiveness of our approach opens new avenues for research on deep
sparse representations. Our classifiers utilize this representation in a novel
manner. Rather than acting on nodes in the deepest layer, they attach to nodes
along a slice through multiple layers of the network in order to make
predictions about local patches. Our flexible combination of a generatively
learned sparse representation with discriminatively trained transfer
classifiers extends the notion of sparse reconstruction to encompass arbitrary
semantic labeling tasks.Comment: to appear in Asian Conference on Computer Vision (ACCV), 201
- …