
    Classification of Occluded Objects using Fast Recurrent Processing

    Recurrent neural networks are powerful tools for handling incomplete data problems in computer vision, thanks to their significant generative capabilities. However, the computational demand of these algorithms is too high for real-time operation without specialized hardware or software solutions. In this paper, we propose a framework for adding recurrent processing capabilities to a feedforward network without sacrificing much computational efficiency. We assume a mixture model and generate samples of the last hidden layer according to the class decisions of the output layer, modify the hidden layer activity using the samples, and propagate to lower layers. For the visual occlusion problem, the iterative procedure emulates a feedforward-feedback loop, filling in the missing hidden layer activity with meaningful representations. The proposed algorithm is tested on a widely used dataset and shown to achieve a 2× improvement in classification accuracy for occluded objects. When compared to Restricted Boltzmann Machines, our algorithm shows superior performance for occluded object classification. Comment: arXiv admin note: text overlap with arXiv:1409.8576 by other author

    Multi-View 3D Object Detection Network for Autonomous Driving

    This paper aims at high-accuracy 3D object detection in autonomous driving scenarios. We propose Multi-View 3D networks (MV3D), a sensory-fusion framework that takes both LIDAR point cloud and RGB images as input and predicts oriented 3D bounding boxes. We encode the sparse 3D point cloud with a compact multi-view representation. The network is composed of two subnetworks: one for 3D object proposal generation and another for multi-view feature fusion. The proposal network generates 3D candidate boxes efficiently from the bird's eye view representation of the 3D point cloud. We design a deep fusion scheme to combine region-wise features from multiple views and enable interactions between intermediate layers of different paths. Experiments on the challenging KITTI benchmark show that our approach outperforms the state-of-the-art by around 25% and 30% AP on the tasks of 3D localization and 3D detection. In addition, for 2D detection, our approach obtains 10.3% higher AP than the state-of-the-art on the hard data among the LIDAR-based methods. Comment: To appear in IEEE Conference on Computer Vision and Pattern Recognition (CVPR) 201

    Optimal Sensor Data Fusion Architecture for Object Detection in Adverse Weather Conditions

    Robust sensor data fusion in diverse weather conditions is a challenging task. There are several fusion architectures in the literature, e.g. the sensor data can be fused right at the beginning (Early Fusion), or they can first be processed separately and then concatenated later (Late Fusion). In this work, different fusion architectures are compared and evaluated by means of object detection tasks, in which the goal is to recognize and localize predefined objects in a stream of data. Usually, state-of-the-art object detectors based on neural networks are highly optimized for good weather conditions, since the well-known benchmarks only consist of sensor data recorded in optimal weather conditions. Therefore, the performance of these approaches degrades severely, or they fail outright, in adverse weather conditions. In this work, different sensor fusion architectures are compared for good and adverse weather conditions in order to find the optimal fusion architecture for diverse weather situations. A new training strategy is also introduced such that the performance of the object detector is greatly enhanced in adverse weather scenarios or if a sensor fails. Furthermore, the paper addresses the question of whether the detection accuracy can be increased further by providing the neural network with a-priori knowledge such as the spatial calibration of the sensors. Comment: This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessibl

    An evaluation of the pedestrian classification in a multi-domain multi-modality setup

    The objective of this article is to study the problem of pedestrian classification across different light spectrum domains (visible and far-infrared (FIR)) and modalities (intensity, depth and motion). In recent years, there have been a number of approaches for classifying and detecting pedestrians in both FIR and visible images, but the methods are difficult to compare, because either the datasets are not publicly available or they do not offer a comparison between the two domains. Our two primary contributions are the following: (1) we propose a public dataset, named RIFIR, containing both FIR and visible images collected in an urban environment from a moving vehicle during daytime; and (2) we compare the state-of-the-art features in a multi-modality setup: intensity, depth and flow, in the far-infrared versus visible domains. The experiments show that the feature families, intensity self-similarity (ISS), local binary patterns (LBP), local gradient patterns (LGP) and histogram of oriented gradients (HOG), computed from the FIR and visible domains are highly complementary, but their relative performance varies across different modalities. In our experiments, the FIR domain has proven superior to the visible one for the task of pedestrian classification, but the overall best results are obtained by a multi-domain multi-modality multi-feature fusion.

    Improving Pedestrian Recognition using Incremental Cross Modality Deep Learning

    Late fusion schemes with deep learning classification patterns set up with multi-modality images have an essential role in pedestrian protection systems since they have achieved prominent results in the pedestrian recognition task. In this paper, the late fusion scheme merged with Convolutional Neural Networks (CNN) is investigated for pedestrian recognition based on the Daimler stereo vision data sets. An independent CNN-based classifier for each imaging modality (Intensity, Depth, and Optical Flow) is used before its probabilistic output scores are fused by a Multi-Layer Perceptron, which provides the recognition decision. In this paper, we set out to prove that the incremental cross-modality deep learning approach enhances pedestrian recognition performance. It also outperforms state-of-the-art pedestrian classifiers on the Daimler stereo-vision data sets.

    Ten Years of Pedestrian Detection, What Have We Learned?

    Paper-by-paper results make it easy to miss the forest for the trees. We analyse the remarkable progress of the last decade by discussing the main ideas explored in the 40+ detectors currently present in the Caltech pedestrian detection benchmark. We observe that there exist three families of approaches, all currently reaching similar detection quality. Based on our analysis, we study the complementarity of the most promising ideas by combining multiple published strategies. This new decision forest detector achieves the current best known performance on the challenging Caltech-USA dataset. Comment: To appear in ECCV 2014 CVRSUAD workshop proceeding