Application and evaluation of direct sparse visual odometry in marine vessels
With the international community pushing for a computer-vision-based alternative to the laws requiring a look-out on marine vehicles, there is now significant motivation to provide digital navigation solutions using these envisioned mandatory visual sensors. This paper explores the monocular direct sparse odometry algorithm applied to a typical marine environment. The method uses a single camera to estimate a vessel's motion and position over time, which are then compared to ground truth to establish feasibility as both a local and a global navigation system. While it was inconsistent in accurately estimating vessel position, it could consistently estimate the vessel's orientation in the majority of the situations the vessel was tasked with. It is therefore shown that monocular direct sparse odometry is partially suitable as a standalone navigation system and is a strong base for a multi-sensor solution.
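The feasibility comparison described above hinges on aligning a monocular trajectory, which is known only up to scale, with ground truth before measuring error. A minimal sketch of that evaluation step, using a Umeyama-style similarity alignment followed by the root-mean-square absolute trajectory error (the function name and array shapes are illustrative assumptions, not taken from the paper):

```python
import numpy as np

def absolute_trajectory_error(estimated, ground_truth):
    """RMS translational error after aligning the estimated trajectory
    to ground truth with a similarity transform (rotation, translation,
    and scale -- scale is needed because monocular VO is scale-ambiguous)."""
    est = np.asarray(estimated, dtype=float)   # (N, 3) estimated positions
    gt = np.asarray(ground_truth, dtype=float)  # (N, 3) ground-truth positions
    mu_e, mu_g = est.mean(axis=0), gt.mean(axis=0)
    E, G = est - mu_e, gt - mu_g                # centre both trajectories
    U, S, Vt = np.linalg.svd(G.T @ E)           # cross-covariance SVD
    D = np.diag([1.0, 1.0, np.sign(np.linalg.det(U @ Vt))])  # avoid reflections
    R = U @ D @ Vt                              # best-fit rotation
    s = np.trace(np.diag(S) @ D) / np.sum(E ** 2)  # best-fit scale
    aligned = s * (R @ E.T).T + mu_g            # estimated points in gt frame
    return np.sqrt(np.mean(np.sum((aligned - gt) ** 2, axis=1)))
```

A perfectly similarity-transformed copy of the ground truth aligns exactly and yields an error near zero; a real monocular estimate leaves a residual that quantifies feasibility.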
OmniDet: Surround View Cameras based Multi-task Visual Perception Network for Autonomous Driving
Surround-view fisheye cameras are commonly deployed in automated driving for 360° near-field sensing around the vehicle. This work presents a multi-task visual perception network operating on unrectified fisheye images to enable the vehicle to sense its surrounding environment. It consists of six primary tasks necessary for an autonomous driving system: depth estimation, visual odometry, semantic segmentation, motion segmentation, object detection, and lens soiling detection. We demonstrate that the jointly trained model performs better than the respective single-task versions. Our multi-task model has a shared encoder, providing a significant computational advantage, and synergized decoders where tasks support each other. We propose a novel camera-geometry-based adaptation mechanism to encode the fisheye distortion model both at training and inference time. This was crucial to enable training on the WoodScape dataset, comprised of data from different parts of the world collected by 12 different cameras mounted on three different cars with different intrinsics and viewpoints. Given that bounding boxes are not a good representation for distorted fisheye images, we also extend object detection to use a polygon with non-uniformly sampled vertices. We additionally evaluate our model on standard automotive datasets, namely KITTI and Cityscapes. We obtain state-of-the-art results on KITTI for the depth estimation and pose estimation tasks and competitive performance on the other tasks. We perform extensive ablation studies on various architecture choices and task-weighting methodologies. A short video at https://youtu.be/xbSjZ5OfPes provides qualitative results.
Comment: Camera-ready version accepted for RA-L and ICRA 2021 publication
Vehicle Distance Detection Using Monocular Vision and Machine Learning
With the development of new cutting-edge technology, autonomous vehicles (AVs) have become a central topic across the automotive industry. For an AV to be used safely on public roads, it needs to perceive its surrounding environment and make decisions in real time. A fully autonomous vehicle is not yet available for general public use, but advanced driver assistance systems (ADAS) have already been integrated into everyday vehicles, and these systems are predicted to evolve and combine into the fully autonomous vehicles of the future. This thesis focuses on combining ADAS with artificial intelligence (AI) models. Since neural networks (NNs) can be unpredictable on many occasions, the main aspect of this thesis is determining which neural network architecture is most accurate at perceiving the distance between vehicles, and hence studying whether AI can safely be used as a central processor for an AV. The ADAS created in this thesis relies on monocular vision and machine learning. A dataset of 200,000 images was used to train a neural network (NN) model that detects whether an image contains a license plate with 96.75% accuracy. A sliding window scans sub-sections of each image; when a sub-section is classified as a license plate, the algorithm stores that sub-image. The stored sub-images are run through a heatmap threshold to minimize false detections. Upon detecting a license plate, the final algorithm estimates the distance to the vehicle carrying that plate and outputs the result to the user. This process achieves distance estimates accurate to within 1 meter. This ADAS is intended to be usable by the public and easily integrated into future AV systems.
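The distance step described above reduces to similar triangles: given the camera focal length in pixels and the known physical width of a license plate, depth follows from the plate's detected pixel width. A minimal sketch (the 0.52 m EU plate width and all names are illustrative assumptions, not taken from the thesis):

```python
def distance_from_plate_width(focal_px, plate_width_m, plate_width_px):
    """Pinhole similar-triangles estimate: Z = f * W / w.

    focal_px       -- camera focal length in pixels
    plate_width_m  -- real-world plate width in metres (e.g. 0.52 for EU plates)
    plate_width_px -- width of the detected plate in the image, in pixels
    """
    return focal_px * plate_width_m / plate_width_px
```

For example, a 0.5 m plate spanning 50 pixels under a 1000-pixel focal length yields a 10 m estimate; accuracy degrades at long range, where a one-pixel detection error changes the estimate substantially.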
A Comparison of Monocular Visual SLAM and Visual Odometry Methods Applied to 3D Reconstruction
This work was supported by the SDAS Research Group (www.sdas-group.com accessed on 16 June 2023).
Pure monocular 3D reconstruction is a complex problem that has attracted the research community's interest due to the affordability and availability of RGB sensors. SLAM, VO, and SfM are disciplines formulated to solve the 3D reconstruction problem and estimate the camera's ego-motion, and many methods have been proposed. However, most of these methods have not been evaluated on large datasets under various motion patterns, have not been tested under the same metrics, and most have not been evaluated following a taxonomy, making their comparison and selection difficult. In this research, we compared ten publicly available SLAM and VO methods following a taxonomy, including one method for each category of the primary taxonomy, three machine-learning-based methods, and two updates of the best methods, to identify the advantages and limitations of each category and to test whether the addition of machine learning or updates to those methods improved them significantly. We evaluated each algorithm using the TUM-Mono dataset and benchmark, and performed an inferential statistical analysis to identify significant differences through its metrics. The results determined that the sparse-direct methods significantly outperformed the rest of the taxonomy, and that fusing them with machine learning techniques significantly enhanced the geometric-based methods' performance from different perspectives.
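The inferential analysis mentioned above typically reduces to a paired test on per-sequence metrics of two methods. A minimal stdlib-only sketch of a Wilcoxon signed-rank test with the large-sample normal approximation (the function name and interface are illustrative; the paper does not specify its exact test in this abstract):

```python
import math

def wilcoxon_signed_rank(errors_a, errors_b):
    """Paired Wilcoxon signed-rank test (normal approximation) on
    per-sequence error metrics of two methods. Returns (W+, p_two_sided).
    Suitable for roughly n >= 10 pairs; zero differences are discarded."""
    diffs = [a - b for a, b in zip(errors_a, errors_b) if a != b]
    n = len(diffs)
    if n == 0:
        return 0.0, 1.0                     # identical samples: no evidence
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * n
    i = 0
    while i < n:                            # assign average ranks over ties
        j = i
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    w_plus = sum(r for d, r in zip(diffs, ranks) if d > 0)
    mu = n * (n + 1) / 4                    # mean of W+ under H0
    sigma = math.sqrt(n * (n + 1) * (2 * n + 1) / 24)
    z = (w_plus - mu) / sigma
    p = 1 - math.erf(abs(z) / math.sqrt(2))  # two-sided p-value
    return w_plus, p
```

If one method's per-sequence error is consistently higher, W+ concentrates at one extreme and the p-value falls below the usual significance threshold, which is the kind of evidence used to rank taxonomy categories.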
Surround-view Fisheye BEV-Perception for Valet Parking: Dataset, Baseline and Distortion-insensitive Multi-task Framework
Surround-view fisheye perception in valet parking scenes is fundamental and crucial in autonomous driving. Environmental conditions in parking lots, such as imperfect lighting and opacity, differ from those in common public datasets and substantially impact perception performance. Most existing networks trained on public datasets generalize suboptimally to these valet parking scenes and are further affected by fisheye distortion. In this article, we introduce a new large-scale fisheye dataset called the Fisheye Parking Dataset (FPD) to promote research on diverse real-world surround-view parking cases. Notably, our compiled FPD exhibits excellent characteristics for different surround-view perception tasks. In addition, we propose a real-time distortion-insensitive multi-task framework, the Fisheye Perception Network (FPNet), which improves surround-view fisheye BEV perception through an enhanced fisheye distortion operation and lightweight multi-task designs. Extensive experiments validate the effectiveness of our approach and the dataset's exceptional generalizability.
Comment: 12 pages, 11 figures