Crosstalk Cascades for Frame-rate Pedestrian Detection
Cascades help make sliding window object detection fast;
nevertheless, computational demands remain prohibitive for numerous applications. Currently, evaluation of adjacent windows proceeds independently; this is suboptimal as detector responses at nearby locations and scales are correlated. We propose to exploit these correlations by
tightly coupling detector evaluation of nearby windows. We introduce two opposing mechanisms: detector excitation of promising neighbors and inhibition of inferior neighbors. By enabling neighboring detectors to communicate, crosstalk cascades achieve major gains (4-30x speedup) over cascades evaluated independently at each image location. Combined
with recent advances in fast multi-scale feature computation, for which we provide an optimized implementation, our approach runs at 35-65 fps
on 640 x 480 images while attaining state-of-the-art accuracy.
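The excitation and inhibition mechanisms described above can be illustrated with a toy example. The following is a minimal sketch, not the authors' algorithm: it assumes a 1D row of windows, a precomputed table `stage_scores` of weak-learner outputs, and hypothetical parameters `reject_thr`, `excite_thr`, `inhibit_ratio` and `stride` chosen only for illustration. Windows start on a coarse grid, a strong window wakes up its unevaluated neighbors (excitation), and a window whose partial score falls well below that of a neighbor is dropped (inhibition).

```python
def crosstalk_cascade(stage_scores, reject_thr=-1.0, excite_thr=1.0,
                      inhibit_ratio=0.5, stride=2):
    """Toy 1D cascade with neighbor excitation and inhibition.

    stage_scores[i][k] is the (hypothetical) weak-learner output of
    stage k at window i. Returns the surviving windows with their
    final scores and the number of weak-learner evaluations used.
    """
    n = len(stage_scores)
    n_stages = len(stage_scores[0])
    # Only windows on a coarse grid are active at the start.
    partial = {i: 0.0 for i in range(0, n, stride)}
    rejected = set()
    evaluations = 0

    for k in range(n_stages):
        # Advance every active window by one stage.
        for i in partial:
            partial[i] += stage_scores[i][k]
            evaluations += 1

        # Excitation: a promising window activates unevaluated neighbors,
        # which must first catch up on stages 0..k.
        strong = [i for i, s in partial.items() if s >= excite_thr]
        for i in strong:
            for j in (i - 1, i + 1):
                if 0 <= j < n and j not in partial and j not in rejected:
                    partial[j] = sum(stage_scores[j][:k + 1])
                    evaluations += k + 1

        # Rejection: the usual cascade threshold, plus inhibition by a
        # clearly better neighbor.
        snapshot = dict(partial)
        for i, s in snapshot.items():
            neighbors = [snapshot[j] for j in (i - 1, i + 1) if j in snapshot]
            best = max(neighbors, default=None)
            inhibited = (best is not None and best > 0
                         and s < inhibit_ratio * best)
            if s < reject_thr or inhibited:
                del partial[i]
                rejected.add(i)

    return partial, evaluations


# Five windows, three stages; windows 2 and 3 overlap the object.
scores = [
    [-2.0, -2.0, -2.0],
    [0.25, 0.25, 0.25],
    [1.0, 1.0, 1.0],
    [0.75, 0.75, 0.75],
    [-2.0, -2.0, -2.0],
]
survivors, evaluations = crosstalk_cascade(scores)
```

In this toy input, windows 0 and 4 fail the ordinary threshold after one stage, window 1 is activated by window 2 and then immediately inhibited by it, and windows 2 and 3 run to completion. Real detectors operate over 2D positions and scales, and the thresholds would be learned or calibrated rather than hand-set as here.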
Towards Universal Object Detection
Object detection is one of the most important and challenging research topics in computer vision. It plays an important role in everyday life and has many applications, e.g. surveillance, autonomous driving, robotics, drones, and medical imaging. The ultimate goal of object detection is a universal object detector that works well in any case and under any condition, like the human visual system. However, there are multiple challenges to the universality of object detection, e.g. scale variance, high-quality requirements, domain shift, and computational constraints. These prevent object detectors from being widely used for objects of various scales, critical applications requiring extremely accurate localization, scenarios with changing domain priors, and diverse hardware settings. To address these challenges, multiple solutions are proposed in this thesis. These include an efficient multi-scale architecture to achieve scale-invariant detection, a robust multi-stage framework effective for high-quality requirements, a cross-domain solution to extend the universality over various domains, and a design of complexity-aware cascades and a novel low-precision network to enhance the universality under different computational constraints. All these efforts have substantially improved the universality of object detection, and the advanced object detector can be applied to broader environments.
Spatiotemporal Stacked Sequential Learning for Pedestrian Detection
Pedestrian classifiers decide which image windows contain a pedestrian. In
practice, such classifiers provide a relatively high response at neighbor
windows overlapping a pedestrian, while the responses around potential false
positives are expected to be lower. An analogous reasoning applies for image
sequences. If there is a pedestrian located within a frame, the same pedestrian
is expected to appear close to the same location in neighbor frames. Therefore,
such a location has chances of receiving high classification scores during
several frames, while false positives are expected to be more spurious. In this
paper we propose to exploit such correlations for improving the accuracy of
base pedestrian classifiers. In particular, we propose to use two-stage
classifiers which not only rely on the image descriptors required by the base
classifiers but also on the response of such base classifiers in a given
spatiotemporal neighborhood. More specifically, we train pedestrian classifiers
using a stacked sequential learning (SSL) paradigm. We use a new pedestrian
dataset we have acquired from a car to evaluate our proposal at different frame
rates. We also test on a well-known dataset: Caltech. The obtained results show
that our SSL proposal boosts detection accuracy significantly with a minimal
impact on the computational cost. Interestingly, SSL improves accuracy the most
in the most dangerous situations, i.e. when a pedestrian is close to the
camera.
Comment: 8 pages, 5 figures, 1 table
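The idea of feeding base-classifier responses from a spatiotemporal neighborhood into a second-stage classifier can be sketched as a feature-building step. This is an illustrative sketch only, not the paper's implementation: it assumes a single spatial axis, a hypothetical function name `stacked_features`, hypothetical neighborhood radii `radius_t` and `radius_x`, and zero padding at the borders.

```python
def stacked_features(descriptors, base_scores, t, x,
                     radius_t=1, radius_x=1, pad=0.0):
    """Build the second-stage input for the window at frame t, position x.

    descriptors[t][x] is the window's own image descriptor and
    base_scores[t][x] is the base classifier's response there. The
    result concatenates the descriptor with the base responses in the
    surrounding spatiotemporal neighborhood (out-of-range cells use `pad`).
    """
    context = []
    for dt in range(-radius_t, radius_t + 1):
        for dx in range(-radius_x, radius_x + 1):
            tt, xx = t + dt, x + dx
            inside = (0 <= tt < len(base_scores)
                      and 0 <= xx < len(base_scores[tt]))
            context.append(base_scores[tt][xx] if inside else pad)
    return list(descriptors[t][x]) + context


# Three frames, three window positions per frame (toy values).
base_scores = [
    [0.1, 0.2, 0.3],
    [0.4, 0.9, 0.5],
    [0.2, 0.8, 0.1],
]
# Toy descriptors: each window is described by its own (t, x) indices.
descriptors = [[[t, x] for x in range(3)] for t in range(3)]

center = stacked_features(descriptors, base_scores, 1, 1)
corner = stacked_features(descriptors, base_scores, 0, 0)
```

A second-stage classifier would then be trained on these extended vectors, so that a window with consistently high responses across neighboring positions and frames is scored differently from an isolated spike.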
Perceptual Generative Adversarial Networks for Small Object Detection
Detecting small objects is notoriously challenging due to their low
resolution and noisy representation. Existing object detection pipelines
usually detect small objects through learning representations of all the
objects at multiple scales. However, the performance gain of such ad hoc
architectures is usually too limited to pay off the computational cost. In this
work, we address the small object detection problem by developing a single
architecture that internally lifts representations of small objects to
"super-resolved" ones, achieving similar characteristics as large objects and
thus more discriminative for detection. For this purpose, we propose a new
Perceptual Generative Adversarial Network (Perceptual GAN) model that improves
small object detection through narrowing representation difference of small
objects from the large ones. Specifically, its generator learns to transfer
perceived poor representations of the small objects to super-resolved ones that
are similar enough to real large objects to fool a competing discriminator.
Meanwhile its discriminator competes with the generator to identify the
generated representation and imposes an additional perceptual requirement -
generated representations of small objects must be beneficial for detection
purpose - on the generator. Extensive evaluations on the challenging
Tsinghua-Tencent 100K and Caltech benchmarks demonstrate the
superiority of Perceptual GAN in detecting small objects, including traffic
signs and pedestrians, over well-established state-of-the-art methods.
Augmented Reality Meets Computer Vision: Efficient Data Generation for Urban Driving Scenes
The success of deep learning in computer vision is based on availability of
large annotated datasets. To lower the need for hand-labeled images, virtually
rendered 3D worlds have recently gained popularity. Creating realistic 3D
content is challenging on its own and requires significant human effort. In
this work, we propose an alternative paradigm which combines real and synthetic
data for learning semantic instance segmentation and object detection models.
Exploiting the fact that not all aspects of the scene are equally important for
this task, we propose to augment real-world imagery with virtual objects of the
target category. Capturing real-world images at large scale is easy and cheap,
and directly provides real background appearances without the need for creating
complex 3D models of the environment. We present an efficient procedure to
augment real images with virtual objects. This allows us to create realistic
composite images which exhibit both realistic background appearance and a large
number of complex object arrangements. In contrast to modeling complete 3D
environments, our augmentation approach requires only a few user interactions
in combination with 3D shapes of the target object. Through extensive
experimentation, we determine the right set of parameters to produce augmented
data which can maximally enhance the performance of instance segmentation
models. Further, we demonstrate the utility of our approach on training
standard deep models for semantic instance segmentation and object detection of
cars in outdoor driving scenes. We test the models trained on our augmented
data on the KITTI 2015 dataset, which we have annotated with pixel-accurate
ground truth, and on the Cityscapes dataset. Our experiments demonstrate that
models trained on augmented imagery generalize better than those trained on
synthetic data or models trained on a limited amount of annotated real data.
Object Detection in 20 Years: A Survey
Object detection, as one of the most fundamental and challenging problems in
computer vision, has received great attention in recent years. Its development
in the past two decades can be regarded as an epitome of computer vision
history. If we think of today's object detection as a technical aesthetics
under the power of deep learning, then turning back the clock 20 years we would
witness the wisdom of the cold weapon era. This paper extensively reviews 400+
papers of object detection in the light of its technical evolution, spanning
over a quarter-century's time (from the 1990s to 2019). A number of topics have
been covered in this paper, including the milestone detectors in history,
detection datasets, metrics, fundamental building blocks of the detection
system, speed up techniques, and the recent state of the art detection methods.
This paper also reviews some important detection applications, such as
pedestrian detection, face detection, text detection, etc., and makes an in-depth
analysis of their challenges as well as technical improvements in recent years.
Comment: This work has been submitted to the IEEE TPAMI for possible
publication