Spatiotemporal Stacked Sequential Learning for Pedestrian Detection
Pedestrian classifiers decide which image windows contain a pedestrian. In
practice, such classifiers provide a relatively high response at neighboring
windows overlapping a pedestrian, while the responses around potential false
positives are expected to be lower. Analogous reasoning applies to image
sequences. If there is a pedestrian located within a frame, the same pedestrian
is expected to appear close to the same location in neighboring frames. Therefore,
such a location is likely to receive high classification scores over
several frames, while false positives are expected to be more spurious. In this
paper we propose to exploit such correlations for improving the accuracy of
base pedestrian classifiers. In particular, we propose to use two-stage
classifiers which not only rely on the image descriptors required by the base
classifiers but also on the response of such base classifiers in a given
spatiotemporal neighborhood. More specifically, we train pedestrian classifiers
using a stacked sequential learning (SSL) paradigm. We use a new pedestrian
dataset we have acquired from a car to evaluate our proposal at different frame
rates. We also test on the well-known Caltech dataset. The obtained results show
that our SSL proposal boosts detection accuracy significantly with a minimal
impact on the computational cost. Interestingly, SSL improves accuracy most in
the most dangerous situations, i.e. when a pedestrian is close to the
camera.
Comment: 8 pages, 5 figures, 1 table
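The second-stage input described in this abstract (image descriptor plus base-classifier responses in a spatiotemporal neighborhood) can be sketched as below. This is a minimal illustration, not the authors' implementation: the function names, the `[frame][row][col]` score layout, the neighborhood radii `dt` and `dr`, and the zero padding at borders are all assumptions made for the example.

```python
from typing import List, Sequence

# Assumed layout: base-classifier scores indexed as [frame][row][col].
ScoreVolume = Sequence[Sequence[Sequence[float]]]


def neighborhood_scores(scores: ScoreVolume, t: int, y: int, x: int,
                        dt: int = 1, dr: int = 1,
                        pad: float = 0.0) -> List[float]:
    """Collect base scores in a (2*dt+1) x (2*dr+1) x (2*dr+1) block
    centred on window (t, y, x). Positions outside the volume are
    filled with `pad` (an assumption of this sketch)."""
    n_frames = len(scores)
    n_rows = len(scores[0])
    n_cols = len(scores[0][0])
    out = []
    for tt in range(t - dt, t + dt + 1):
        for yy in range(y - dr, y + dr + 1):
            for xx in range(x - dr, x + dr + 1):
                inside = (0 <= tt < n_frames and 0 <= yy < n_rows
                          and 0 <= xx < n_cols)
                out.append(scores[tt][yy][xx] if inside else pad)
    return out


def extended_features(descriptor: Sequence[float], scores: ScoreVolume,
                      t: int, y: int, x: int,
                      dt: int = 1, dr: int = 1) -> List[float]:
    """Second-stage feature vector: the original descriptor followed by
    the neighbouring base-classifier scores."""
    return list(descriptor) + neighborhood_scores(scores, t, y, x, dt, dr)
```

A second-stage classifier would then be trained on `extended_features(...)` instead of the raw descriptor. Setting `dt=0` restricts the neighborhood to the current frame, which gives a purely spatial variant.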
What Makes a Place? Building Bespoke Place Dependent Object Detectors for Robotics
This paper is about enabling robots to improve their perceptual performance
through repeated use in their operating environment, creating local expert
detectors fitted to the places through which a robot moves. We leverage the
concept of 'experiences' in visual perception for robotics, accounting for bias
in the data a robot sees by fitting object detector models to a particular
place. The key question we seek to answer in this paper is simply: how do we
define a place? We build bespoke pedestrian detector models for autonomous
driving, highlighting the necessary trade-off between generalisation and model
capacity as we vary the extent of the place we fit to. We demonstrate a
sizeable performance gain over a current state-of-the-art detector when using
computationally lightweight bespoke place-fitted detector models.
Comment: IROS 201
Learning non-maximum suppression
Object detectors have hugely profited from moving towards an end-to-end
learning paradigm: proposals, features, and the classifier becoming one neural
network improved results two-fold on general object detection. One
indispensable component is non-maximum suppression (NMS), a post-processing
algorithm responsible for merging all detections that belong to the same
object. The de facto standard NMS algorithm is still fully hand-crafted,
suspiciously simple, and -- being based on greedy clustering with a fixed
distance threshold -- forces a trade-off between recall and precision. We
propose a new network architecture designed to perform NMS, using only boxes
and their score. We report experiments for person detection on PETS and for
general object categories on the COCO dataset. Our approach shows promise,
providing improved localization and occlusion handling.
Comment: Added "Supplementary material" title
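For reference, the hand-crafted greedy NMS that this abstract treats as the de facto standard can be sketched as follows. This shows the conventional baseline, not the learned network the paper proposes. The box format `(x1, y1, x2, y2)` and the default threshold of 0.5 are assumptions chosen for the example.

```python
from typing import List, Sequence

Box = Sequence[float]  # assumed format: (x1, y1, x2, y2)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def greedy_nms(boxes: Sequence[Box], scores: Sequence[float],
               iou_threshold: float = 0.5) -> List[int]:
    """Return indices of kept boxes, highest score first.

    A box is kept only if its IoU with every already-kept box is at
    most `iou_threshold`; otherwise it is suppressed.
    """
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep: List[int] = []
    for i in order:
        if all(iou(boxes[i], boxes[k]) <= iou_threshold for k in keep):
            keep.append(i)
    return keep
```

The single fixed `iou_threshold` is the source of the trade-off mentioned above: a low value suppresses true detections of nearby objects (hurting recall), while a high value lets duplicate detections through (hurting precision).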
Ten Years of Pedestrian Detection, What Have We Learned?
Paper-by-paper results make it easy to miss the forest for the trees. We
analyse the remarkable progress of the last decade by discussing the main ideas
explored in the 40+ detectors currently present in the Caltech pedestrian
detection benchmark. We observe that there exist three families of approaches,
all currently reaching similar detection quality. Based on our analysis, we
study the complementarity of the most promising ideas by combining multiple
published strategies. This new decision forest detector achieves the current
best known performance on the challenging Caltech-USA dataset.
Comment: To appear in ECCV 2014 CVRSUAD workshop proceedings
Pedestrian Detection aided by Deep Learning Semantic Tasks
Deep learning methods have achieved great success in pedestrian detection,
owing to their ability to learn features from raw pixels. However, they mainly
capture middle-level representations, such as the pose of a pedestrian, but confuse
positive with hard negative samples, which have large ambiguity, e.g. the shape
and appearance of a `tree trunk' or `wire pole' are similar to those of a pedestrian
from certain viewpoints. This ambiguity can be resolved by high-level
representation. To this end, this work jointly optimizes pedestrian detection
with semantic tasks, including pedestrian attributes (e.g. `carrying backpack')
and scene attributes (e.g. `road', `tree', and `horizontal'). Rather than
expensively annotating scene attributes, we transfer attribute information
from existing scene segmentation datasets to the pedestrian dataset, by
proposing a novel deep model to learn high-level features from multiple tasks
and multiple data sources. Since distinct tasks have distinct convergence rates
and data from different datasets have different distributions, a multi-task
objective function is carefully designed to coordinate tasks and reduce
discrepancies among datasets. The importance coefficients of tasks and network
parameters in this objective function can be iteratively estimated. Extensive
evaluations show that the proposed approach outperforms the state-of-the-art on
the challenging Caltech and ETH datasets, where it reduces the miss rates of
previous deep models by 17 and 5.5 percent, respectively.