Strengthening the Effectiveness of Pedestrian Detection with Spatially Pooled Features
We propose a simple yet effective approach to the problem of pedestrian
detection which outperforms the current state-of-the-art. Our new features are
built on the basis of low-level visual features and spatial pooling.
Incorporating spatial pooling improves the translational invariance and thus
the robustness of the detection process. We then directly optimise the partial
area under the ROC curve (pAUC) measure, which concentrates detection
performance in the range of most practical importance. The combination of these
factors leads to a pedestrian detector which outperforms all competitors on all
of the standard benchmark datasets. We advance state-of-the-art results by
lowering the average miss rate from to on the INRIA benchmark,
to on the ETH benchmark, to on the TUD-Brussels
benchmark and to on the Caltech-USA benchmark.
Comment: 16 pages. Appearing in Proc. European Conf. Computer Vision (ECCV)
201
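The pAUC measure named in the abstract above can be illustrated with a short evaluation sketch. This is not the authors' optimisation procedure; it only computes the area under an empirical ROC curve restricted to false-positive rates in [0, fpr_max]. The function names, the default fpr_max of 0.1, and the assumption that scores contain no ties are all illustrative choices, not details from the paper.

```python
import numpy as np

def roc_points(scores, labels):
    """Empirical ROC curve as (fpr, tpr) arrays starting at (0, 0).

    Assumes labels are 0/1 with at least one of each, and ignores score ties.
    """
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=int)
    order = np.argsort(-scores, kind="stable")  # highest score first
    labels = labels[order]
    tp = np.cumsum(labels == 1)  # true positives at each threshold
    fp = np.cumsum(labels == 0)  # false positives at each threshold
    tpr = np.concatenate(([0.0], tp / tp[-1]))
    fpr = np.concatenate(([0.0], fp / fp[-1]))
    return fpr, tpr

def partial_auc(scores, labels, fpr_max=0.1):
    """Trapezoidal area under the ROC curve for FPR in [0, fpr_max]."""
    fpr, tpr = roc_points(scores, labels)
    area = 0.0
    for i in range(1, len(fpr)):
        x0, x1 = fpr[i - 1], fpr[i]
        y0, y1 = tpr[i - 1], tpr[i]
        if x0 >= fpr_max:
            break
        if x1 > fpr_max:
            # Cut the last segment at fpr_max by linear interpolation.
            y1 = y0 + (y1 - y0) * (fpr_max - x0) / (x1 - x0)
            x1 = fpr_max
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area
```

Under this definition a detector that ranks every positive above every negative reaches the maximum value, which equals fpr_max, so only behaviour in the low false-positive range affects the score.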
Object Detection in 20 Years: A Survey
Object detection, as one of the most fundamental and challenging problems in
computer vision, has received great attention in recent years. Its development
in the past two decades can be regarded as an epitome of computer vision
history. If we think of today's object detection as technical aesthetics
under the power of deep learning, then turning back the clock 20 years, we would
witness the wisdom of the cold weapon era. This paper extensively reviews 400+
papers of object detection in the light of its technical evolution, spanning
over a quarter-century's time (from the 1990s to 2019). A number of topics have
been covered in this paper, including the milestone detectors in history,
detection datasets, metrics, fundamental building blocks of the detection
system, speed-up techniques, and the recent state-of-the-art detection methods.
This paper also reviews some important detection applications, such as
pedestrian detection, face detection, text detection, etc., and provides an in-depth
analysis of their challenges as well as technical improvements in recent years.
Comment: This work has been submitted to the IEEE TPAMI for possible
publicatio
Unsupervised Network Pretraining via Encoding Human Design
Over the years, computer vision researchers have spent an immense amount of
effort on designing image features for the visual object recognition task. We
propose to incorporate this valuable experience to guide the task of training
deep neural networks. Our idea is to pretrain the network through the task of
replicating the process of hand-designed feature extraction. By learning to
replicate the process, the neural network integrates previous research
knowledge and learns to model visual objects in a way similar to the
hand-designed features. In the succeeding finetuning step, it further learns
object-specific representations from labeled data and this boosts its
classification power. We pretrain two convolutional neural networks where one
replicates the process of histogram of oriented gradients feature extraction,
and the other replicates the process of region covariance feature extraction.
After finetuning, we achieve substantially better performance than the baseline
methods.
Comment: 9 pages, 11 figures, WACV 2016: IEEE Conference on Applications of
Computer Visio
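As a rough illustration of the kind of hand-designed computation the abstract above proposes to replicate, the following sketch builds a gradient-orientation histogram for a single image cell, the core step of histogram of oriented gradients features. It is a simplified assumption-laden example, not the authors' pretraining target: the function name, the 9 unsigned orientation bins, and the omission of bin interpolation and block normalisation are all choices made here for brevity.

```python
import numpy as np

def cell_orientation_histogram(cell, n_bins=9):
    """Magnitude-weighted histogram of unsigned gradient orientations.

    `cell` is a 2D array of pixel intensities (hypothetical input format).
    """
    cell = np.asarray(cell, dtype=float)
    gy, gx = np.gradient(cell)          # derivatives along rows, then columns
    magnitude = np.hypot(gx, gy)
    # Unsigned orientation in degrees, folded into [0, 180).
    angle = np.rad2deg(np.arctan2(gy, gx)) % 180.0
    bin_width = 180.0 / n_bins
    # Clamp guards against rounding that lands exactly on 180 degrees.
    bins = np.minimum((angle // bin_width).astype(int), n_bins - 1)
    return np.bincount(bins.ravel(), weights=magnitude.ravel(), minlength=n_bins)
```

For a cell whose intensity increases only from left to right, all gradient magnitude falls into the first bin; a top-to-bottom ramp places it in the middle bin instead.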
Is the Pedestrian going to Cross? Answering by 2D Pose Estimation
Our recent work suggests that, thanks to today's powerful CNNs, image-based
2D pose estimation is a promising cue for determining pedestrian intentions
such as crossing the road in the path of the ego-vehicle, stopping before
entering the road, and starting to walk or bending towards the road. This
statement is based on the results obtained on non-naturalistic sequences
(Daimler dataset), i.e. in sequences choreographed specifically for performing
the study. Fortunately, a new publicly available dataset (JAAD) has appeared
recently to allow developing methods for detecting pedestrian intentions in
naturalistic driving conditions; more specifically, for addressing the relevant
question: is the pedestrian going to cross? Accordingly, in this paper we use
JAAD to assess the usefulness of 2D pose estimation for answering such a
question. We combine CNN-based pedestrian detection, tracking and pose
estimation to predict the crossing action from monocular images. Overall, the
proposed pipeline provides new state-of-the-art results.
Comment: This is a paper presented in IEEE Intelligent Vehicles Symposium
(IEEE IV 2018