Near-field Perception for Low-Speed Vehicle Automation using Surround-view Fisheye Cameras
Cameras are the primary sensor in automated driving systems. They provide
high information density and are optimal for detecting road infrastructure cues
laid out for human vision. Surround-view camera systems typically comprise
four fisheye cameras with a 190°+ field of view, covering the entire
360° around the vehicle, focused on near-field sensing. They are the
principal sensors for low-speed, high accuracy, and close-range sensing
applications, such as automated parking, traffic jam assistance, and low-speed
emergency braking. In this work, we provide a detailed survey of such vision
systems, setting up the survey in the context of an architecture that can be
decomposed into four modular components namely Recognition, Reconstruction,
Relocalization, and Reorganization. We jointly call this the 4R Architecture.
We discuss how each component accomplishes a specific aspect and provide a
positional argument that they can be synergized to form a complete perception
system for low-speed automation. We support this argument by presenting results
from previous works and by presenting architecture proposals for such a system.
Qualitative results are presented in the video at https://youtu.be/ae8bCOF77uY.
Comment: Accepted for publication at IEEE Transactions on Intelligent Transportation Systems.
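As a rough illustration of the 4R decomposition described above, the four components can be sketched as stages of a single per-frame pipeline (all interfaces below are hypothetical stand-ins, not the paper's actual code):

```python
from dataclasses import dataclass

@dataclass
class Frame:
    """Hypothetical container for one surround-view fisheye capture."""
    images: list          # one image per fisheye camera
    timestamp: float

# Placeholder stages; each would wrap a real model or algorithm in practice.
def recognize(frame):      # Recognition: semantic tasks (detection, segmentation)
    return []

def reconstruct(frame):    # Reconstruction: geometric tasks (depth, motion)
    return []

def relocalize(frame):     # Relocalization: pose against a previously built map
    return (0.0, 0.0, 0.0)

def reorganize(frame):     # Reorganization: map update from new observations
    return []

def run_4r(frame):
    """Run all four 4R components on one frame and bundle the outputs."""
    return {
        "objects": recognize(frame),
        "geometry": reconstruct(frame),
        "pose": relocalize(frame),
        "map_updates": reorganize(frame),
    }
```

The point of the decomposition is that the four outputs can be fused downstream into a single near-field perception result rather than treated as independent subsystems.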
3D Visual Perception for Self-Driving Cars using a Multi-Camera System: Calibration, Mapping, Localization, and Obstacle Detection
Cameras are a crucial exteroceptive sensor for self-driving cars as they are
low-cost and small, provide appearance information about the environment, and
work in various weather conditions. They can be used for multiple purposes such
as visual navigation and obstacle detection. We can use a surround multi-camera
system to cover the full 360-degree field-of-view around the car. In this way,
we avoid blind spots which can otherwise lead to accidents. To minimize the
number of cameras needed for surround perception, we utilize fisheye cameras.
Consequently, standard vision pipelines for 3D mapping, visual localization,
obstacle detection, etc. need to be adapted to take full advantage of the
availability of multiple cameras rather than treat each camera individually. In
addition, processing of fisheye images has to be supported. In this paper, we
describe the camera calibration and subsequent processing pipeline for
multi-fisheye-camera systems developed as part of the V-Charge project. This
project seeks to enable automated valet parking for self-driving cars. Our
pipeline is able to precisely calibrate multi-camera systems, build sparse 3D
maps for visual navigation, visually localize the car with respect to these
maps, generate accurate dense maps, as well as detect obstacles based on
real-time depth map extraction.
Multi-task near-field perception for autonomous driving using surround-view fisheye cameras
The formation of eyes led to the big bang of evolution. The dynamics changed from a primitive organism waiting for food to come into contact with it, to an organism that sought food out using visual sensors. The human eye is one of the most sophisticated developments of evolution, but it still has defects. Over millions of years, humans have evolved a biological perception algorithm capable of driving cars, operating machinery, piloting aircraft, and navigating ships. Automating these capabilities for computers is critical for various applications, including self-driving cars, augmented reality, and architectural surveying. Near-field visual perception in the context of self-driving cars covers the environment in a range of 0-10 meters with 360° coverage around the vehicle. It is a critical decision-making component in the development of safer automated driving. Recent advances in computer vision and deep learning, in conjunction with high-quality sensors such as cameras and LiDARs, have fueled mature visual perception solutions. Until now, far-field perception has been the primary focus. Another significant issue is the limited processing power available for developing real-time applications. Because of this bottleneck, there is frequently a trade-off between performance and run-time efficiency. We concentrate on the following issues in order to address them: 1) Developing near-field perception algorithms with high performance and low computational complexity for various visual perception tasks, such as geometric and semantic tasks, using convolutional neural networks. 2) Using multi-task learning to overcome computational bottlenecks by sharing initial convolutional layers between tasks and developing optimization strategies that balance the tasks.
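One common way to balance tasks when initial convolutional layers are shared is to combine per-task losses into a single training objective. The sketch below shows a fixed-weight combination and the homoscedastic-uncertainty weighting of Kendall et al. (2018); these are illustrations of the general technique, not necessarily the exact strategies developed in the thesis:

```python
import math

def weighted_multitask_loss(task_losses, weights=None):
    """Fixed-weight combination: L = sum_t w_t * L_t.
    task_losses and weights are dicts keyed by task name."""
    if weights is None:
        weights = {t: 1.0 for t in task_losses}
    return sum(weights[t] * loss for t, loss in task_losses.items())

def uncertainty_weighted_loss(task_losses, log_vars):
    """Homoscedastic-uncertainty weighting (Kendall et al., 2018):
    L = sum_t exp(-s_t) * L_t + s_t, where s_t = log(sigma_t^2) is a
    learnable per-task parameter. Tasks the model is uncertain about
    (large s_t) are automatically down-weighted, with the +s_t term
    penalizing unbounded growth of the uncertainty."""
    return sum(math.exp(-log_vars[t]) * loss + log_vars[t]
               for t, loss in task_losses.items())
```

With all log-variances at zero, the uncertainty-weighted loss reduces to the plain unweighted sum, which makes it a convenient drop-in starting point.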
An Online Learning System for Wireless Charging Alignment using Surround-view Fisheye Cameras
Electric Vehicles are increasingly common, with inductive chargepads being
considered a convenient and efficient means of charging electric vehicles.
However, drivers are typically poor at aligning the vehicle to the necessary
accuracy for efficient inductive charging, making the automated alignment of
the two charging plates desirable. In parallel to the electrification of the
vehicular fleet, automated parking systems that make use of surround-view
camera systems are becoming increasingly popular. In this work, we propose a
system based on the surround-view camera architecture to detect, localize, and
automatically align the vehicle with the inductive chargepad. The visual design
of the chargepads is not standardized and not necessarily known beforehand.
Therefore, a system that relies on offline training will fail in some
situations. Thus, we propose a self-supervised online learning method that
leverages the driver's actions when manually aligning the vehicle with the
chargepad, combining them with weak supervision from semantic segmentation and
depth to learn a classifier that auto-annotates the chargepad in the video for
further training. In this way, when faced with a previously unseen chargepad,
the driver need only manually align the vehicle a single time. As the
chargepad is flat on the ground, it is not easy to detect it from a distance.
Thus, we propose using a Visual SLAM pipeline to learn landmarks relative to
the chargepad to enable alignment from a greater range. We demonstrate the
working system on an automated vehicle as illustrated in the video at
https://youtu.be/_cLCmkW4UYo. To encourage further research, we will share a
chargepad dataset used in this work.
Comment: Accepted for publication at IEEE Transactions on Intelligent Transportation Systems.
Automotive top-view image generation using orthogonally diverging fisheye cameras
Advanced Driver Assistance Systems in vehicles can be of great assistance to drivers by providing them a quick and easy way to visualize their entire 360-degree surroundings. We introduce a new camera set-up for a surround-view imaging system that may be part of an ADAS. This set-up involves four wide-angle fisheye cameras with orthogonally diverging camera axes, which allows for capturing the entire 360 degrees around a vehicle in four images, captured from the lateral, front, and rear views. Simple perspective transforms can be used to convert these images into a synthesized top-view image, which displays the scene as viewed from above the vehicle. These transforms, however, are typically derived using a basic calibration procedure that is only capable of correctly mapping ground-plane points in captured images to their corresponding locations in the top-view image, and subsequently, all off-the-ground points look distorted. We present a new method for calibrating a top-view image, in which objects and off-the-ground points are accurately represented. We also present a method for using specifically designed disparity search bands to segment the scene in the overlapping field-of-view (FOV) regions between adjacent cameras, each pair of which is effectively a stereo imaging system. Such wide-baseline stereo systems with orthogonally diverging camera axes make stereo matching difficult, and traditional correspondence algorithms cannot reliably generate the dense disparity maps that might be computed in a parallel stereo set-up involving cameras that follow a rectilinear model. We segment the scene into the ground plane, objects of interest, and the background, and show that our new virtual camera calibration parameters can be applied to represent objects in the scene in a more realistic manner.
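The basic ground-plane mapping described above amounts to applying a single 3x3 homography per pixel. A minimal sketch (the matrix values and function name here are illustrative, not from the paper):

```python
def warp_ground_point(H, x, y):
    """Map an image pixel (x, y) to top-view coordinates using a 3x3
    homography H given as nested lists. This mapping is exact only for
    points on the ground plane, which is why off-the-ground points
    appear distorted in a basic top-view calibration."""
    u = H[0][0] * x + H[0][1] * y + H[0][2]
    v = H[1][0] * x + H[1][1] * y + H[1][2]
    w = H[2][0] * x + H[2][1] * y + H[2][2]
    if w == 0:
        raise ValueError("point maps to infinity under this homography")
    return u / w, v / w

# The identity homography leaves points unchanged.
I = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
```

In practice H is estimated from four or more ground-plane correspondences during calibration; the division by w is what makes the transform perspective rather than affine.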
UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models
In classical computer vision, rectification is an integral part of multi-view
depth estimation. It typically includes epipolar rectification and lens
distortion correction. This process simplifies the depth estimation
significantly, and thus it has been adopted in CNN approaches. However,
rectification has several side effects, including a reduced field of view
(FOV), resampling distortion, and sensitivity to calibration errors. The
effects are particularly pronounced in case of significant distortion (e.g.,
wide-angle fisheye cameras). In this paper, we propose a generic scale-aware
self-supervised pipeline for estimating depth, Euclidean distance, and visual
odometry from unrectified monocular videos. We demonstrate a similar level of
precision on the unrectified KITTI dataset with barrel distortion comparable to
the rectified KITTI dataset. The intuition is that the rectification step
can be implicitly absorbed within the CNN model, which learns the distortion
model without increasing complexity. Our approach does not suffer from a
reduced field of view and avoids computational costs for rectification at
inference time. To further illustrate the general applicability of the proposed
framework, we apply it to wide-angle fisheye cameras with a 190°
horizontal field of view. The training framework UnRectDepthNet takes in the
camera distortion model as an argument and adapts projection and unprojection
functions accordingly. The proposed algorithm is evaluated further on the KITTI
rectified dataset, and we achieve state-of-the-art results that improve upon
our previous work FisheyeDistanceNet. Qualitative results on a distorted test
scene video sequence indicate excellent performance:
https://youtu.be/K6pbx3bU4Ss.
Comment: Minor fixes added after the IROS 2020 camera-ready submission. IROS 2020 presentation video: https://www.youtube.com/watch?v=3Br2KSWZRr
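The projection/unprojection pair that the framework parameterizes can be illustrated with the equidistant fisheye model r = f·θ, one common choice among the distortion models such a framework might accept (the function names here are hypothetical, not the paper's API):

```python
import math

def project_equidistant(X, Y, Z, f, cx, cy):
    """Project a 3D point in camera coordinates to pixel coordinates
    using the equidistant fisheye model r = f * theta, where theta is
    the angle of the ray from the optical axis."""
    theta = math.atan2(math.hypot(X, Y), Z)
    r = f * theta
    phi = math.atan2(Y, X)          # azimuth around the optical axis
    return cx + r * math.cos(phi), cy + r * math.sin(phi)

def unproject_equidistant(u, v, f, cx, cy):
    """Invert the projection: map a pixel back to a unit-length ray."""
    dx, dy = u - cx, v - cy
    r = math.hypot(dx, dy)
    if r == 0.0:
        return (0.0, 0.0, 1.0)       # principal point -> optical axis
    theta = r / f
    s = math.sin(theta)
    return (s * dx / r, s * dy / r, math.cos(theta))
```

A self-supervised depth pipeline of this kind uses the projection to synthesize views and the unprojection to lift predicted depths back into 3D, so swapping the distortion model only requires replacing this pair of functions.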
Perceptual monocular depth estimation
Monocular depth estimation (MDE), which is the task of using a single image to predict scene depths, has gained considerable interest, in large part owing to the popularity of applying deep learning methods to solve "computer vision problems". Monocular cues provide sufficient data for humans to instantaneously extract an understanding of scene geometries and relative depths, which is evidence of both the processing power of the human visual system and the predictive power of the monocular data. However, developing computational models to predict depth from monocular images remains challenging. Hand-designed MDE features do not perform particularly well, and even current "deep" models are still evolving. Here we propose a novel approach that uses perceptually-relevant natural scene statistics (NSS) features to predict depths from monocular images in a simple, scale-agnostic way that is competitive with state-of-the-art systems. While the statistics of natural photographic images have been successfully used in a variety of image and video processing, analysis, and quality assessment tasks, they have never been applied in a predictive end-to-end deep-learning model for monocular depth. Here we accomplish this by developing a new closed-form bivariate model of image luminances and use features extracted from this model and from other NSS models to drive a novel deep learning framework for predicting depth given a single image. We then extend our perceptually-based MDE model to fisheye images, which suffer from severe spatial distortions, and we show that our method that uses monocular cues performs comparably to our best fisheye stereo matching approach. Fisheye cameras have become increasingly popular in automotive applications, because they provide a wider (approximately 180 degrees) field-of-view (FoV), thereby giving drivers and driver assistance systems more visibility with minimal hardware.
We explore fisheye stereo as it pertains to the problem of automotive surround-view (SV), specifically, which is a system comprising four fisheye cameras positioned on the front, right, rear, and left sides of a vehicle. The SV system perspectively transforms the images captured by these four cameras and stitches them together in a birdseye-view representation of the scene centered around the ego vehicle to display to the driver. With the camera axes oriented orthogonally away from each other and with each camera capturing approximately 180 degrees laterally, there exists an overlap in FoVs between adjacent cameras. It is within these regions where we have stereo vision, and can thus triangulate depths with an appropriate correspondence matching method. Each stereo system within the SV configuration has a wide baseline and two orthogonally-divergent camera axes, both of which make traditional methods for estimating stereo correspondences perform poorly. Our stereo pipeline, which relies on a neural network trained for predicting stereo correspondences, performs well even when the stereo system has limited overlap in FoVs and two dissimilar views. Our monocular approach, however, can be applied to entire fisheye images and does not rely on the underlying geometry of the stereo configuration. We compare these two depth-prediction methods in both performance and application. To explore stereo correspondence matching using fisheye images and MDE in non-fisheye images, we also generated a large-scale photorealistic synthetic database containing co-registered RGB images and depth maps using a simulated SV camera configuration. The database was first captured using fisheye cameras with known intrinsic parameters, and the fisheye distortions were then removed to create the non-fisheye portion of the database. 
We detail the process of creating the synthetic-but-realistic city scene in which we captured the images and depth maps, along with the methodology for generating such a large, varied, and generalizable dataset.
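For contrast with the wide-baseline, orthogonally-divergent configuration described above, depth recovery in a conventional rectified, parallel stereo pair reduces to a single formula, Z = f·B/d (the values below are illustrative only):

```python
def depth_from_disparity(focal_px, baseline_m, disparity_px):
    """Depth of a matched point in a rectified, parallel stereo pair:
    Z = f * B / d, where f is the focal length in pixels, B the baseline
    in meters, and d the disparity in pixels. Larger disparity means
    the point is closer to the cameras."""
    if disparity_px <= 0:
        raise ValueError("disparity must be positive for a visible point")
    return focal_px * baseline_m / disparity_px
```

It is precisely because the surround-view overlap regions violate the parallel-axis assumption behind this formula that learned correspondence matching is needed there.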