Classification of Occluded Objects using Fast Recurrent Processing
Recurrent neural networks are powerful tools for handling incomplete data
problems in computer vision, thanks to their significant generative
capabilities. However, the computational demand of these algorithms is too
high for real-time use without specialized hardware or software solutions.
In this paper, we propose a framework for adding recurrent processing
capabilities to a feedforward network without sacrificing much
computational efficiency. We assume a mixture model and generate samples of the
last hidden layer according to the class decisions of the output layer, modify
the hidden layer activity using the samples, and propagate to lower layers. For
the visual occlusion problem, the iterative procedure emulates a feedforward-feedback
loop, filling in the missing hidden layer activity with meaningful
representations. The proposed algorithm is tested on a widely used dataset, and
shown to achieve a 2× improvement in classification accuracy for occluded
objects. When compared to Restricted Boltzmann Machines, our algorithm shows
superior performance for occluded object classification.
Comment: arXiv admin note: text overlap with arXiv:1409.8576 by other authors
Multi-View 3D Object Detection Network for Autonomous Driving
This paper aims at high-accuracy 3D object detection in the autonomous driving
scenario. We propose Multi-View 3D networks (MV3D), a sensory-fusion framework
that takes both LIDAR point cloud and RGB images as input and predicts oriented
3D bounding boxes. We encode the sparse 3D point cloud with a compact
multi-view representation. The network is composed of two subnetworks: one for
3D object proposal generation and another for multi-view feature fusion. The
proposal network generates 3D candidate boxes efficiently from the bird's eye
view representation of the 3D point cloud. We design a deep fusion scheme to
combine region-wise features from multiple views and enable interactions
between intermediate layers of different paths. Experiments on the challenging
KITTI benchmark show that our approach outperforms the state-of-the-art by
around 25% and 30% AP on the tasks of 3D localization and 3D detection. In
addition, for 2D detection, our approach obtains 10.3% higher AP than the
state-of-the-art on the hard data among the LIDAR-based methods.
Comment: To appear in IEEE Conference on Computer Vision and Pattern Recognition (CVPR) 201
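The abstract above mentions a bird's eye view representation of the point cloud but does not specify the encoding. As a rough illustration only, the sketch below assumes a simple ground-plane grid that stores the maximum height and the point count per cell; the function name `birds_eye_view` and the `cell_size` parameter are hypothetical and are not taken from the paper.

```python
import math
from typing import Dict, Iterable, Tuple

Point = Tuple[float, float, float]  # (x, y, z) in metres
Cell = Tuple[int, int]              # integer grid index on the ground plane


def birds_eye_view(points: Iterable[Point],
                   cell_size: float = 0.5) -> Dict[Cell, Tuple[float, int]]:
    """Map each occupied ground-plane cell to (max height, point count).

    Assumption: this two-channel grid is an illustrative stand-in, not the
    encoding used by the MV3D authors.
    """
    grid: Dict[Cell, Tuple[float, int]] = {}
    for x, y, z in points:
        # Discretize the x/y position; floor keeps negative coordinates consistent.
        cell = (math.floor(x / cell_size), math.floor(y / cell_size))
        max_z, count = grid.get(cell, (z, 0))
        grid[cell] = (max(max_z, z), count + 1)
    return grid


cloud = [(0.1, 0.2, 1.0), (0.3, 0.4, 2.5), (1.2, 0.1, 0.3)]
bev = birds_eye_view(cloud, cell_size=0.5)
```

A sparse dictionary is used here because most cells of a LIDAR sweep are empty; a dense array would be the more natural input for a convolutional network.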
Optimal Sensor Data Fusion Architecture for Object Detection in Adverse Weather Conditions
Good, robust sensor data fusion in diverse weather conditions is quite a
challenging task. There are several fusion architectures in the literature,
e.g. the sensor data can be fused right at the beginning (Early Fusion), or
they can be first processed separately and then concatenated later (Late
Fusion). In this work, different fusion architectures are compared and
evaluated by means of object detection tasks, in which the goal is to recognize
and localize predefined objects in a stream of data. Usually, state-of-the-art
object detectors based on neural networks are highly optimized for good weather
conditions, since the well-known benchmarks only consist of sensor data
recorded in optimal weather conditions. As a result, these
approaches degrade sharply or even fail in adverse weather conditions. In
this work, different sensor fusion architectures are compared for good and
adverse weather conditions to find the optimal fusion architecture for
diverse weather situations. A new training strategy is also introduced such
that the performance of the object detector is greatly enhanced in adverse
weather scenarios or if a sensor fails. Furthermore, the paper addresses the
question of whether the detection accuracy can be increased further by providing the
neural network with a priori knowledge such as the spatial calibration of the
sensors.
Comment: This work has been submitted to the IEEE for possible publication.
Copyright may be transferred without notice, after which this version may no
longer be accessible.
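The difference between the two fusion architectures named in the abstract above can be sketched in a few lines. This is a minimal illustration, not the paper's networks: the toy functions `mean_model` and `max_model` are hypothetical stand-ins for learned models, and the sensor readings are made up.

```python
from typing import Callable, List, Sequence

Vector = List[float]
Model = Callable[[Vector], Vector]


def early_fusion(camera: Sequence[float], lidar: Sequence[float],
                 model: Model) -> Vector:
    """Early Fusion: concatenate the raw sensor data, then run one model."""
    return model(list(camera) + list(lidar))


def late_fusion(camera: Sequence[float], lidar: Sequence[float],
                camera_branch: Model, lidar_branch: Model,
                head: Model) -> Vector:
    """Late Fusion: process each sensor separately, then concatenate
    the branch outputs and pass them to a shared head."""
    return head(camera_branch(list(camera)) + lidar_branch(list(lidar)))


# Hypothetical stand-ins for learned networks (illustration only).
def mean_model(x: Vector) -> Vector:
    return [sum(x) / len(x)]


def max_model(x: Vector) -> Vector:
    return [max(x)]


early = early_fusion([1.0, 3.0], [5.0], mean_model)
late = late_fusion([1.0, 3.0], [5.0, 7.0], mean_model, max_model, mean_model)
```

In the late variant each branch can be swapped or retrained independently, which is one reason it is often considered when a single sensor may degrade or fail.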
An evaluation of the pedestrian classification in a multi-domain multi-modality setup
The objective of this article is to study the problem of pedestrian classification across different light spectrum domains (visible and far-infrared (FIR)) and modalities (intensity, depth and motion). In recent years, there have been a number of approaches for classifying and detecting pedestrians in both FIR and visible images, but the methods are difficult to compare, because either the datasets are not publicly available or they do not offer a comparison between the two domains. Our two primary contributions are the following: (1) we propose a public dataset, named RIFIR, containing both FIR and visible images collected in an urban environment from a moving vehicle during daytime; and (2) we compare state-of-the-art features in a multi-modality setup (intensity, depth and flow) in the far-infrared versus the visible domain. The experiments show that the feature families, intensity self-similarity (ISS), local binary patterns (LBP), local gradient patterns (LGP) and histogram of oriented gradients (HOG), computed from the FIR and visible domains are highly complementary, but their relative performance varies across different modalities. In our experiments, the FIR domain has proven superior to the visible one for the task of pedestrian classification, but the overall best results are obtained by a multi-domain multi-modality multi-feature fusion.
Improving Pedestrian Recognition using Incremental Cross Modality Deep Learning
Late fusion schemes that combine deep learning classifiers trained on multi-modality images play an essential role in pedestrian protection systems, since they have achieved prominent results in the pedestrian recognition task. In this paper, a late fusion scheme combined with Convolutional Neural Networks (CNN) is investigated for pedestrian recognition on the Daimler stereo vision data sets. An independent CNN-based classifier is applied to each imaging modality (Intensity, Depth, and Optical Flow), and their probabilistic output scores are then fused by a Multi-Layer Perceptron, which provides the recognition decision. In this paper, we set out to prove that the incremental cross-modality deep learning approach enhances pedestrian recognition performance. It also outperforms state-of-the-art pedestrian classifiers on the Daimler stereo vision data sets.
Ten Years of Pedestrian Detection, What Have We Learned?
Paper-by-paper results make it easy to miss the forest for the trees. We
analyse the remarkable progress of the last decade by discussing the main ideas
explored in the 40+ detectors currently present in the Caltech pedestrian
detection benchmark. We observe that there exist three families of approaches,
all currently reaching similar detection quality. Based on our analysis, we
study the complementarity of the most promising ideas by combining multiple
published strategies. This new decision forest detector achieves the current
best known performance on the challenging Caltech-USA dataset.
Comment: To appear in ECCV 2014 CVRSUAD workshop proceedings