
    Depth from Monocular Images using a Semi-Parallel Deep Neural Network (SPDNN) Hybrid Architecture

    Deep neural networks have been applied to a wide range of problems in recent years. In this work, a Convolutional Neural Network (CNN) is applied to the problem of determining depth from a single camera image (monocular depth). Eight different networks are designed to perform depth estimation, each of them suited to one feature level; networks with different pooling sizes capture different feature levels. After designing this set of networks, the models are combined into a single network topology using graph optimization techniques. This "Semi-Parallel Deep Neural Network" (SPDNN) eliminates duplicated common network layers and can be further optimized by retraining, yielding an improved model compared to the individual topologies. In this study, four SPDNN models are trained and evaluated in two stages on the KITTI dataset. In the first part of the experiment the ground-truth images are provided by the benchmark; in the second part, the ground-truth images are the depth maps produced by a state-of-the-art stereo matching method. The results of this evaluation demonstrate that using post-processing techniques to refine the target of the network increases the accuracy of depth estimation on individual mono images. The second evaluation shows that using segmentation data alongside the original data as input can improve the depth estimation results to the point where performance is comparable with stereo depth estimation. The computational time is also discussed in this study. Comment: 44 pages, 25 figures
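    The SPDNN merging step described above can be viewed as a prefix-sharing problem: layer sequences that begin identically are collapsed so the common layers exist only once. A minimal structural sketch in Python (layer names are hypothetical, not taken from the paper, and real merging operates on computation graphs rather than name lists):

```python
def merge_topologies(topologies):
    """Merge layer sequences into a prefix tree so that layers shared by
    several networks (e.g. an identical first conv stage) are kept only
    once -- a toy version of the SPDNN graph-merging idea."""
    root = {}
    for layers in topologies:
        node = root
        for layer in layers:
            node = node.setdefault(layer, {})
    return root

def count_layers(tree):
    """Total number of (merged) layer instances in the tree."""
    return sum(1 + count_layers(child) for child in tree.values())

# Three hypothetical depth networks that differ only in pooling size.
nets = [
    ["conv3x3", "pool2x2", "conv3x3", "fc"],
    ["conv3x3", "pool4x4", "conv3x3", "fc"],
    ["conv3x3", "pool8x8", "conv3x3", "fc"],
]
merged = merge_topologies(nets)
# 12 layers across the separate networks; 10 after sharing the first conv.
```

    The merged topology is then retrained as a whole, which is where the reported accuracy gain over the individual networks comes from.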

    Evaluation of 3D-Video Compression for Automotive Stereo Vision Systems

    This paper evaluates the distribution of quantized coefficients resulting from the disparity of an automotive stereo vision system. The system captures the scene with two cameras, computes the disparity with Semi-Global Matching, and encodes the left view and the disparity for transmission. Real-world and synthetic video sequences were used to evaluate the coefficient distributions of the system under normal and challenging weather conditions. The results show that the quantized disparity coefficients in frequency space have consistently lower entropy than the coefficients of the video scenes. It is therefore advantageous for the system to compress the disparity instead of one of the two video streams.
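    The entropy comparison at the core of this evaluation can be reproduced in miniature: block-transform the signal, quantize, and measure the Shannon entropy of the coefficient histogram. A rough numpy sketch (fixed quantization step, no entropy coder, purely illustrative of the measurement, not the paper's codec):

```python
import numpy as np

def block_dct2(block):
    """2-D DCT-II of a square block via the orthonormal DCT matrix."""
    n = block.shape[0]
    k, i = np.meshgrid(np.arange(n), np.arange(n), indexing="ij")
    c = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    c[0, :] = np.sqrt(1.0 / n)  # DC row normalization
    return c @ block @ c.T

def entropy_of_quantized(img, step=8.0):
    """Shannon entropy (bits/coefficient) of quantized 8x8 DCT blocks."""
    h = img.shape[0] - img.shape[0] % 8
    w = img.shape[1] - img.shape[1] % 8
    coeffs = []
    for y in range(0, h, 8):
        for x in range(0, w, 8):
            coeffs.append(np.round(block_dct2(img[y:y+8, x:x+8]) / step).ravel())
    _, counts = np.unique(np.concatenate(coeffs), return_counts=True)
    p = counts / counts.sum()
    return float(-(p * np.log2(p)).sum())
```

    On a smooth, largely piecewise-planar signal such as a disparity map, most quantized AC coefficients are zero, so the coefficient entropy (and hence the achievable bitrate) is lower than for a textured camera view.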

    Accurate Optical Flow via Direct Cost Volume Processing

    We present an optical flow estimation approach that operates on the full four-dimensional cost volume. This direct approach shares the structural benefits of leading stereo matching pipelines, which are known to yield high accuracy. To this day, such approaches have been considered impractical due to the size of the cost volume. We show that the full four-dimensional cost volume can be constructed in a fraction of a second due to its regularity. We then exploit this regularity further by adapting semi-global matching to the four-dimensional setting. This yields a pipeline that achieves significantly higher accuracy than state-of-the-art optical flow methods while being faster than most. Our approach outperforms all published general-purpose optical flow methods on both the Sintel and KITTI 2015 benchmarks. Comment: Published at the Conference on Computer Vision and Pattern Recognition (CVPR 2017)
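    The key claim, that the full four-dimensional cost volume is cheap to build because of its regularity, is easy to see in a sketch: every flow hypothesis (u, v) is one shifted array difference over the whole image. A toy numpy version using a sum-of-absolute-differences cost (the paper's actual features and matching cost differ):

```python
import numpy as np

def cost_volume_4d(f1, f2, r):
    """Full 4-D matching cost volume for optical flow.

    f1, f2 : (H, W, C) per-pixel feature maps of two frames.
    r      : max displacement; flow candidates (u, v) lie in [-r, r]^2.
    Returns cost of shape (H, W, 2r+1, 2r+1). Each (u, v) slice is a
    single vectorized operation -- the regularity that makes building
    the volume fast and lets SGM-style smoothing run over the 4-D array.
    """
    H, W, C = f1.shape
    d = 2 * r + 1
    cost = np.full((H, W, d, d), np.inf)
    pad = np.full((H + 2 * r, W + 2 * r, C), np.inf)  # inf = off-image
    pad[r:r + H, r:r + W] = f2
    for i, u in enumerate(range(-r, r + 1)):
        for j, v in enumerate(range(-r, r + 1)):
            shifted = pad[r + v:r + v + H, r + u:r + u + W]  # f2 at (u, v)
            cost[:, :, i, j] = np.abs(f1 - shifted).sum(axis=2)  # SAD
    return cost
```

    For a pixel whose true flow is (u, v), the minimum over the last two axes recovers that displacement; semi-global matching then regularizes this 4-D array along scanlines, analogously to the stereo case.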

    An Outdoor Stereo Camera System for the Generation of Real-World Benchmark Datasets with Ground Truth

    In this report we describe a high-performance stereo camera system for capturing image sequences with high temporal and spatial resolution for the evaluation of various image processing tasks. The system was primarily designed for the complex outdoor and traffic scenes that frequently occur in the automotive industry, but it is also suited to other applications. For this task the system is equipped with a very accurate inertial measurement unit and a global positioning system, which provide exact camera movement and position data. The system is already in active use and has produced several terabytes of challenging image sequences, which are available for download.

    TractorEYE: Vision-based Real-time Detection for Autonomous Vehicles in Agriculture

    Agricultural vehicles such as tractors and harvesters have for decades been able to navigate automatically and more efficiently using commercially available products such as auto-steering and tractor-guidance systems. However, a human operator is still required inside the vehicle to ensure the safety of the vehicle and especially of its surroundings, such as humans and animals. To get fully autonomous vehicles certified for farming, computer vision algorithms and sensor technologies must detect obstacles at or above human-level performance. Furthermore, detection must run in real time to allow vehicles to actuate and avoid collisions. This thesis proposes a detection system (TractorEYE), a dataset (FieldSAFE), and procedures to fuse information from multiple sensor technologies to improve the detection of obstacles and to generate a map. TractorEYE is a multi-sensor detection system for autonomous vehicles in agriculture. The multi-sensor system consists of three hardware-synchronized and registered sensors (stereo camera, thermal camera and multi-beam lidar) mounted on/in a ruggedized and water-resistant casing. Algorithms have been developed to run a total of six detection algorithms (four for the RGB camera, one for the thermal camera and one for the multi-beam lidar) and to fuse detection information in a common format using either 3D positions or Inverse Sensor Models. A GPU-powered computational platform is able to run the detection algorithms online. For the RGB camera, a deep learning algorithm, DeepAnomaly, is proposed to perform real-time anomaly detection of distant, heavily occluded and unknown obstacles in agriculture. Compared to a state-of-the-art object detector, Faster R-CNN, DeepAnomaly detects humans better and at longer ranges (45-90 m) in an agricultural use case, with a smaller memory footprint and 7.3 times faster processing. Its low memory footprint and fast processing make DeepAnomaly suitable for real-time applications running on an embedded GPU. FieldSAFE is a multi-modal dataset for the detection of static and moving obstacles in agriculture. The dataset includes synchronized recordings from an RGB camera, stereo camera, thermal camera, 360-degree camera, lidar and radar. Precise localization and pose are provided using IMU and GPS. Ground truth for static and moving obstacles (humans, mannequin dolls, barrels, buildings, vehicles, and vegetation) is available as an annotated orthophoto, and as GPS coordinates for moving obstacles. Detection information from multiple detection algorithms and sensors is fused into a map using Inverse Sensor Models and occupancy grid maps. This thesis presents several scientific contributions to the state of the art in perception for autonomous tractors, including a dataset, a sensor platform, detection algorithms and procedures for multi-sensor fusion. Furthermore, important engineering contributions to autonomous farming vehicles are presented, such as easily applicable, open-source software packages and algorithms that have been demonstrated in an end-to-end real-time detection system. The contributions of this thesis have demonstrated, addressed and solved critical issues in using camera-based perception systems that are essential to make autonomous vehicles in agriculture a reality.
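    Fusing detections via Inverse Sensor Models into an occupancy grid, as the thesis describes, typically reduces to an additive update in log-odds space. A minimal sketch (not the thesis code; grid geometry, ray casting and the per-sensor models are abstracted away):

```python
import numpy as np

def update_grid(logodds, cells, p_hit):
    """Fuse one detection into an occupancy grid (log-odds form).

    logodds : (H, W) grid of log-odds occupancy values (0 means p = 0.5).
    cells   : list of (row, col) cells covered by the detection.
    p_hit   : inverse sensor model output, P(occupied | detection), for
              those cells; p_hit > 0.5 raises occupancy, < 0.5 lowers it.
    """
    delta = np.log(p_hit / (1.0 - p_hit))
    for r, c in cells:
        logodds[r, c] += delta
    return logodds

def probability(logodds):
    """Recover occupancy probabilities from log-odds values."""
    return 1.0 / (1.0 + np.exp(-logodds))
```

    Because log-odds updates are additive, detections from the RGB camera, thermal camera and lidar can be fused into the same grid independently and in any order.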

    Evaluation of 3D-Video Compression for Automotive Stereo Vision Systems

    Introduction

    Motivation: Driver assistance systems depend on a variety of sensors, interconnected with electronic control units (ECUs). In order to reach a settled and agreed interface definition, the International Organization for Standardization (ISO) initiated the standardization process of the video communication interface for cameras in road vehicles over low-voltage differential signaling.

    Figure 1, Stereo Vision System: The system captures the scene with two cameras, computes the disparity and encodes the left view and the disparity for transmission to the ECU. With the disparity information, an automotive function can calculate the distance Z to an obstacle.

    We already examined the peak signal-to-noise ratio of the left video and the mean disparity error of the system under different rate allocations between the video and disparity streams (see

    Related Work

    In [5] the distribution of DCT coefficients in the field of image compression is examined, and an approximation of the AC coefficients with Laplace distributions is proposed. The work o
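    The Laplace approximation of the AC coefficients cited from [5] has a closed-form maximum-likelihood fit, which makes it convenient for modelling coefficient statistics. A short numpy sketch (illustrative only, not from the paper):

```python
import numpy as np

def fit_laplace(ac_coeffs):
    """Maximum-likelihood Laplace fit for a sample of AC coefficients.

    For the Laplace density (1 / (2b)) * exp(-|x - mu| / b), the MLE is
    mu = median(x) and b = mean(|x - mu|).
    """
    x = np.asarray(ac_coeffs, dtype=float)
    mu = float(np.median(x))
    b = float(np.mean(np.abs(x - mu)))
    return mu, b
```

    Under this model the coefficient entropy grows with the scale b, which is roughly consistent with the observation that the smoother disparity signal, whose AC coefficients concentrate near zero, compresses better than the camera view.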