Repulsion Loss: Detecting Pedestrians in a Crowd
Detecting individual pedestrians in a crowd remains a challenging problem
since the pedestrians often gather together and occlude each other in
real-world scenarios. In this paper, we first explore how a state-of-the-art
pedestrian detector is harmed by crowd occlusion via experimentation, providing
insights into the crowd occlusion problem. Then, we propose a novel bounding
box regression loss specifically designed for crowd scenes, termed repulsion
loss. This loss is driven by two motivations: the attraction by target, and the
repulsion by other surrounding objects. The repulsion term prevents the
proposal from shifting to surrounding objects thus leading to more crowd-robust
localization. Our detector trained by repulsion loss outperforms all the
state-of-the-art methods with a significant improvement in occlusion cases.
Comment: Accepted to IEEE Conference on Computer Vision and Pattern Recognition (CVPR) 201
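The repulsion idea above can be sketched in a few lines. The paper's RepGT term penalizes a proposal's overlap with the most-overlapped *non-target* ground-truth box, measured as intersection-over-ground-truth (IoG) and passed through a smoothed log penalty. This is a minimal sketch of that term, not the paper's implementation; boxes are `(x1, y1, x2, y2)` and the `sigma=0.5` smoothing threshold is one of the settings the paper explores.

```python
import math

def iog(pred, gt):
    # Intersection over ground-truth area: overlap(pred, gt) / area(gt).
    x1, y1 = max(pred[0], gt[0]), max(pred[1], gt[1])
    x2, y2 = min(pred[2], gt[2]), min(pred[3], gt[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    return inter / ((gt[2] - gt[0]) * (gt[3] - gt[1]))

def smooth_ln(x, sigma=0.5):
    # Smoothed -ln(1 - x): logarithmic near 0, linear beyond sigma,
    # so near-total overlaps do not produce unbounded gradients.
    if x <= sigma:
        return -math.log(1.0 - x)
    return (x - sigma) / (1.0 - sigma) - math.log(1.0 - sigma)

def repulsion_term(pred, other_gts, sigma=0.5):
    # RepGT-style penalty: repel the proposal from its most-overlapped
    # non-target ground-truth box. Zero when there is no overlap.
    if not other_gts:
        return 0.0
    return smooth_ln(max(iog(pred, g) for g in other_gts), sigma)
```

In the full loss this term is added, with a weight, to the usual attraction (regression) loss toward the assigned target, so a proposal drifting onto a neighbouring pedestrian is penalized even if its target overlap barely changes.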
What Can Help Pedestrian Detection?
Aggregating extra features has been considered as an effective approach to
boost traditional pedestrian detection methods. However, there is still a lack
of studies on whether and how CNN-based pedestrian detectors can benefit from
these extra features. The first contribution of this paper is exploring this
issue by aggregating extra features into a CNN-based pedestrian detection
framework. Through extensive experiments, we evaluate the effects of different
kinds of extra features quantitatively. Moreover, we propose a novel network
architecture, namely HyperLearner, to jointly learn pedestrian detection as
well as the given extra feature. By multi-task training, HyperLearner is able
to utilize the information of given features and improve detection performance
without extra inputs in inference. The experimental results on multiple
pedestrian benchmarks validate the effectiveness of the proposed HyperLearner.
Comment: Accepted to IEEE Conference on Computer Vision and Pattern Recognition (CVPR) 201
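The mechanism behind HyperLearner's "no extra inputs at inference" property can be illustrated with a toy multi-task network: a shared backbone feeds both a detection head and an auxiliary head that regresses the extra feature during training; at test time the auxiliary branch is simply not evaluated. This is a hypothetical sketch with made-up layer sizes, not the paper's architecture.

```python
import numpy as np

rng = np.random.default_rng(0)
W_shared = rng.normal(size=(8, 4))  # shared backbone weights (hypothetical sizes)
W_det = rng.normal(size=(4, 2))     # detection head
W_feat = rng.normal(size=(4, 3))    # auxiliary extra-feature head (training only)

def forward(x, training=False):
    h = np.tanh(x @ W_shared)       # shared representation
    det = h @ W_det
    if training:
        # The extra-feature branch supervises the backbone during training...
        return det, h @ W_feat
    # ...but is dropped at inference: no extra inputs or extra compute.
    return det
```

Multi-task training on the auxiliary target shapes the shared representation `h`, which is how the detector benefits from the extra feature without ever needing it as an input at test time.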
Taking a Deeper Look at Pedestrians
In this paper we study the use of convolutional neural networks (convnets)
for the task of pedestrian detection. Despite their recent diverse successes,
convnets historically underperform compared to other pedestrian detectors. We
deliberately omit explicitly modelling the problem into the network (e.g. parts
or occlusion modelling) and show that we can reach competitive performance
without bells and whistles. In a wide range of experiments we analyse small and
big convnets, their architectural choices, parameters, and the influence of
different training data, including pre-training on surrogate tasks.
We present the best convnet detectors on the Caltech and KITTI datasets. On
Caltech our convnets reach top performance both for the Caltech1x and
Caltech10x training setups. Using additional data at training time, our
strongest convnet model is competitive even with detectors that use additional
data (optical flow) at test time.
Aggregated Channels Network for Real-Time Pedestrian Detection
Convolutional neural networks (CNNs) have demonstrated their superiority in
numerous computer vision tasks, yet their computational cost is prohibitive
for many real-time applications such as pedestrian detection, which is usually
performed on low-consumption hardware. To alleviate this drawback, most
strategies focus on a two-stage cascade approach.
Essentially, in the first stage a fast method generates a reduced set of
high-quality proposals that are later evaluated by the CNN in the second
stage. In this work, we propose a novel detection pipeline that further
benefits from the two-stage cascade strategy. More concretely, the enriched
and subsequently compressed features used in the first stage are reused as the
CNN input. As a consequence, a simpler network architecture, adapted to such
small input sizes, makes it possible to achieve real-time performance and
obtain results close to the state of the art while running significantly
faster without the use of a GPU. In particular, considering that the proposed
pipeline runs at frame rate, the achieved performance is highly competitive.
We furthermore demonstrate that the proposed pipeline by itself can serve as
an effective proposal generator.
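The feature-reuse idea in this cascade can be sketched as follows: stage 1 computes channel features once, scores windows cheaply, and hands the *channel crops* (not raw pixels) to the stage-2 classifier. Everything here is a toy stand-in, assuming a mean-energy window scorer and a placeholder in place of the small CNN; real aggregated channel features combine colour and gradient channels.

```python
import numpy as np

def compute_channels(img):
    # Stand-in for the "enriched and compressed" channel features:
    # here just the image itself plus its gradient magnitude.
    gy, gx = np.gradient(img.astype(float))
    return np.stack([img.astype(float), np.hypot(gx, gy)])

def stage1_proposals(channels, win=4, stride=2, thresh=1.0):
    # Fast stage: score sliding windows by mean channel energy; keep
    # promising windows together with their channel crops, so stage 2
    # can reuse the features instead of recomputing from pixels.
    _, h, w = channels.shape
    out = []
    for y in range(0, h - win + 1, stride):
        for x in range(0, w - win + 1, stride):
            crop = channels[:, y:y + win, x:x + win]
            if crop.mean() > thresh:
                out.append(((x, y, win, win), crop))
    return out

def stage2_score(crop):
    # Placeholder for the small CNN that re-scores each proposal
    # directly from the reused channel crop.
    return float(crop.max())
```

The design point is that the expensive per-window work happens once, in stage 1; the second stage sees small, pre-digested inputs, which is what allows the compact network and CPU-only frame-rate operation the abstract describes.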
Ten Years of Pedestrian Detection, What Have We Learned?
Paper-by-paper results make it easy to miss the forest for the trees. We
analyse the remarkable progress of the last decade by discussing the main ideas
explored in the 40+ detectors currently present in the Caltech pedestrian
detection benchmark. We observe that there exist three families of approaches,
all currently reaching similar detection quality. Based on our analysis, we
study the complementarity of the most promising ideas by combining multiple
published strategies. This new decision forest detector achieves the current
best known performance on the challenging Caltech-USA dataset.
Comment: To appear in ECCV 2014 CVRSUAD workshop proceedings
Optimal use of existing freeway management surveillance infrastructure on pedestrian bridges with computer vision techniques
South Africa, as a developing country, has to make the most of the infrastructure that is available. Given the high number of crashes involving pedestrians, it is critical that all available means are used to characterise pedestrian movements on highways and pedestrian bridges. This paper focuses on using the existing camera infrastructure, but extends its use to automatically detect and count pedestrians that use the pedestrian bridges. The pedestrian movement data can be used to aid the evaluation of pedestrian safety campaigns, or to recognise trends in pedestrian movement. The paper presents the impact of various parameter changes on the state-of-the-art technique used, as well as orientation suggestions for future installations. This is done to make optimal use of existing infrastructure, and provides an alternative to existing high-end systems. The methodology includes training a computer vision-based algorithm to recognise and count pedestrians for specific scenes, for example pedestrian bridges. The paper evaluates different suppression techniques to reduce false positives. The results show that 72% of pedestrians can be detected (a hit rate of 72%), with the camera facing a pedestrian bridge squarely from the side, so that silhouettes are clearly visible. High-end products not using existing infrastructure typically have a hit rate of 70%-90%. The solution in this paper competes with high-end products, and can be expanded for infrastructure security applications, e.g. monitoring copper cables or monitoring of high-risk areas.
Paper presented at the 35th Annual Southern African Transport Conference, 4-7 July 2016, "Transport - a catalyst for socio-economic growth and development opportunities to improve quality of life", CSIR International Convention Centre, Pretoria, South Africa. Sponsors: The Minister of Transport, South Africa; Transportation Research Board of the US.
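The "suppression techniques to reduce false positives" mentioned above are typically variants of non-maximum suppression, which keeps only the highest-scoring detection among heavily overlapping boxes. The paper does not specify its exact scheme; this is a sketch of the standard greedy variant, with boxes as `(x1, y1, x2, y2)` and a hypothetical overlap threshold of 0.5.

```python
def iou(a, b):
    # Intersection over union of two (x1, y1, x2, y2) boxes.
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

def nms(boxes, scores, iou_thresh=0.5):
    # Greedy non-maximum suppression: visit detections in descending
    # score order, keeping a detection only if it does not overlap any
    # already-kept detection above the threshold.
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        if all(iou(boxes[i], boxes[j]) < iou_thresh for j in keep):
            keep.append(i)
    return keep
```

For a counting application, the length of the kept list per frame is the pedestrian count, so the choice of suppression threshold directly trades duplicate counts against missed adjacent pedestrians.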
TractorEYE: Vision-based Real-time Detection for Autonomous Vehicles in Agriculture
Agricultural vehicles such as tractors and harvesters have for decades been able to navigate automatically and more efficiently using commercially available products such as auto-steering and tractor-guidance systems. However, a human operator is still required inside the vehicle to ensure the safety of the vehicle and especially of its surroundings, such as humans and animals. To get fully autonomous vehicles certified for farming, computer vision algorithms and sensor technologies must detect obstacles with performance equivalent to or better than human level. Furthermore, detections must run in real time to allow vehicles to actuate and avoid collision.
This thesis proposes a detection system (TractorEYE), a dataset (FieldSAFE), and procedures to fuse information from multiple sensor technologies to improve detection of obstacles and to generate a map. TractorEYE is a multi-sensor detection system for autonomous vehicles in agriculture. The multi-sensor system consists of three hardware-synchronized and registered sensors (stereo camera, thermal camera and multi-beam lidar) mounted on/in a ruggedized and water-resistant casing. Algorithms have been developed to run a total of six detection algorithms (four for the RGB camera, one for the thermal camera and one for the multi-beam lidar) and to fuse detection information in a common format using either 3D positions or Inverse Sensor Models. A GPU-powered computational platform is able to run the detection algorithms online. For the RGB camera, a deep learning algorithm, DeepAnomaly, is proposed to perform real-time anomaly detection of distant, heavily occluded and unknown obstacles in agriculture. Compared to a state-of-the-art object detector, Faster R-CNN, DeepAnomaly is able, for an agricultural use case, to detect humans better and at longer ranges (45-90 m) using a smaller memory footprint and 7.3-times faster processing. The low memory footprint and fast processing make DeepAnomaly suitable for real-time applications running on an embedded GPU.
FieldSAFE is a multi-modal dataset for detection of static and moving obstacles in agriculture. The dataset includes synchronized recordings from an RGB camera, stereo camera, thermal camera, 360-degree camera, lidar and radar. Precise localization and pose are provided using IMU and GPS. Ground truth for static and moving obstacles (humans, mannequin dolls, barrels, buildings, vehicles, and vegetation) is available as an annotated orthophoto and as GPS coordinates for moving obstacles. Detection information from multiple detection algorithms and sensors is fused into a map using Inverse Sensor Models and occupancy grid maps.
This thesis presented many scientific contributions and state-of-the-art results within perception for autonomous tractors; this includes a dataset, a sensor platform, detection algorithms and procedures to perform multi-sensor fusion. Furthermore, important engineering contributions to autonomous farming vehicles are presented, such as easily applicable, open-source software packages and algorithms that have been demonstrated in an end-to-end real-time detection system. The contributions of this thesis have demonstrated, addressed and solved critical issues in utilizing camera-based perception systems that are essential to make autonomous vehicles in agriculture a reality.
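The fusion via Inverse Sensor Models and occupancy grid maps mentioned above follows a standard pattern: each sensor's inverse model converts a detection into per-cell log-odds increments, and fusing sensors is just adding their increments into the same grid. This is a minimal sketch of that pattern, assuming fixed, hypothetical log-odds increments rather than the thesis's actual sensor models.

```python
import numpy as np

L_OCC, L_FREE = 0.85, -0.4  # hypothetical log-odds increments per observation

class OccupancyGrid:
    # Minimal log-odds occupancy grid. Each sensor's inverse sensor model
    # reduces to "which cells did it call occupied or free"; fusion across
    # sensors and over time is simple addition in log-odds space.
    def __init__(self, shape):
        self.logodds = np.zeros(shape)  # 0.0 log-odds = 0.5 probability

    def update(self, cell, occupied):
        self.logodds[cell] += L_OCC if occupied else L_FREE

    def prob(self, cell):
        # Convert accumulated log-odds back to occupancy probability.
        return 1.0 / (1.0 + np.exp(-self.logodds[cell]))
```

Working in log-odds makes repeated, possibly conflicting observations from the RGB, thermal, and lidar detectors commutative and cheap to combine, which is why occupancy grids are the usual choice for this kind of multi-sensor map.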