1,382 research outputs found
LiStereo: Generate Dense Depth Maps from LIDAR and Stereo Imagery
An accurate depth map of the environment is critical to the safe operation of
autonomous robots and vehicles. Currently, either light detection and ranging
(LIDAR) or stereo matching algorithms are used to acquire such depth
information. However, a high-resolution LIDAR is expensive and produces sparse
depth map at large range; stereo matching algorithms are able to generate
denser depth maps but are typically less accurate than LIDAR at long range.
This paper combines these approaches together to generate high-quality dense
depth maps. Unlike previous approaches that are trained using ground-truth
labels, the proposed model adopts a self-supervised training process.
Experiments show that the proposed method is able to generate high-quality
dense depth maps and performs robustly even with low-resolution inputs. This
shows the potential to reduce the cost by using LIDARs with lower resolution in
concert with stereo systems while maintaining high resolution. Comment: 14 pages, 3 figures, 5 tables
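The abstract does not spell out the training objective, but a self-supervised
LiDAR-stereo fusion of this kind is typically driven by a photometric stereo
reconstruction term plus a consistency term on the sparse LiDAR returns. The
PyTorch sketch below is a minimal illustration under that assumption; the
warping scheme, loss weights, and helper name are hypothetical, not the
paper's formulation.

    import torch
    import torch.nn.functional as F

    def self_supervised_loss(pred_depth, sparse_lidar, left_img, right_img,
                             focal, baseline, w_photo=1.0, w_lidar=1.0):
        # Convert depth to disparity for stereo warping: d = f * B / z.
        disparity = focal * baseline / pred_depth.clamp(min=1e-3)

        # Build a sampling grid that warps the right image into the left view.
        b, _, h, w = left_img.shape
        xs = torch.linspace(-1, 1, w, device=left_img.device).view(1, 1, w).expand(b, h, w)
        ys = torch.linspace(-1, 1, h, device=left_img.device).view(1, h, 1).expand(b, h, w)
        grid = torch.stack((xs - 2 * disparity.squeeze(1) / (w - 1), ys), dim=-1)
        warped = F.grid_sample(right_img, grid, align_corners=True)

        # Photometric term: the warped right image should match the left image.
        photo = (warped - left_img).abs().mean()

        # LiDAR term: supervise only where the projected scan has valid returns.
        mask = sparse_lidar > 0
        lidar = ((pred_depth - sparse_lidar).abs()[mask].mean()
                 if mask.any() else pred_depth.new_zeros(()))

        return w_photo * photo + w_lidar * lidar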
Dense Depth Estimation in Monocular Endoscopy with Self-supervised Learning Methods
We present a self-supervised approach to training convolutional neural
networks for dense depth estimation from monocular endoscopy data without a
priori modeling of anatomy or shading. Our method only requires monocular
endoscopic videos and a multi-view stereo method, e.g., structure from motion,
to supervise learning in a sparse manner. Consequently, our method requires
neither manual labeling nor patient computed tomography (CT) scan in the
training and application phases. In a cross-patient experiment using CT scans
as ground truth, the proposed method achieved submillimeter mean residual error.
In a comparison study to recent self-supervised depth estimation methods
designed for natural video on in vivo sinus endoscopy data, we demonstrate that
the proposed approach outperforms the previous methods by a large margin. The
source code for this work is publicly available online at
https://github.com/lppllppl920/EndoscopyDepthEstimation-Pytorch. Comment: Accepted to IEEE Transactions on Medical Imaging
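As a rough illustration of supervising "in a sparse manner", the sketch below
penalizes predicted depth only at pixels where the SfM pipeline produced a
point, after solving for a per-image scale (monocular predictions and SfM
reconstructions are each defined only up to scale). The function name and the
scale-alignment step are assumptions, not the paper's exact loss.

    import torch

    def sparse_sfm_loss(pred_depth, sfm_depth, sfm_mask, eps=1e-6):
        # Keep only pixels where SfM reconstructed a sparse 3D point.
        pred = pred_depth[sfm_mask]
        target = sfm_depth[sfm_mask]
        # Least-squares scale aligning the prediction to the sparse points.
        scale = (pred * target).sum() / (pred * pred).sum().clamp(min=eps)
        # Relative depth error at the sparse points after scale alignment.
        return ((scale * pred - target).abs() / target.clamp(min=eps)).mean()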
Autonomous Driving with Deep Learning: A Survey of State-of-Art Technologies
Since DARPA Grand Challenges (rural) in 2004/05 and Urban Challenges in 2007,
autonomous driving has been the most active field of AI applications. Almost at
the same time, deep learning made its breakthrough through the work of several
pioneers, three of whom (also called the fathers of deep learning), Hinton,
Bengio and LeCun, won the ACM Turing Award in 2019. This is a survey of
autonomous driving technologies with deep learning methods. We investigate the
major fields of self-driving systems, such as perception, mapping and
localization, prediction, planning and control, simulation, V2X and safety.
Due to limited space, we focus the analysis on several key areas: 2D and 3D
object detection in perception, depth estimation from cameras, multi-sensor
fusion at the data, feature and task levels respectively, and behavior
modelling and prediction of vehicle driving and pedestrian trajectories.
Semantic-Guided Representation Enhancement for Self-supervised Monocular Trained Depth Estimation
Self-supervised depth estimation has shown its great effectiveness in
producing high quality depth maps given only image sequences as input. However,
its performance usually drops when estimating depth on border areas or objects
with thin structures, due to limited depth representation ability. In this paper,
we address this problem by proposing a semantic-guided depth representation
enhancement method, which promotes both local and global depth feature
representations by leveraging rich contextual information. Instead of the single
depth network used in conventional paradigms, we add a semantic segmentation
branch that provides extra contextual features for depth estimation.
Based on this framework, we enhance the local feature representation by
sampling the point-based features located on semantic edges and feeding them
to an individual Semantic-guided Edge Enhancement module (SEEM), which is
specifically designed for promoting depth estimation on the challenging
semantic borders. Then, we improve the global feature representation by
proposing a semantic-guided multi-level attention mechanism, which enhances the
semantic and depth features by exploring pixel-wise correlations in the
multi-level depth decoding scheme. Extensive experiments validate the distinct
superiority of our method in capturing highly accurate depth on the challenging
image areas such as semantic category borders and thin objects. Both
quantitative and qualitative experiments on KITTI show that our method
outperforms the state-of-the-art methods.
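The abstract does not detail SEEM's internals; the stand-in below only
illustrates the sample-refine-scatter pattern it alludes to: gather depth
features at semantic-boundary pixels, refine them with a small residual MLP,
and write them back. All module and parameter names here are hypothetical.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class EdgePointEnhancer(nn.Module):
        def __init__(self, channels):
            super().__init__()
            self.mlp = nn.Sequential(nn.Linear(channels, channels), nn.ReLU(),
                                     nn.Linear(channels, channels))

        def forward(self, feat, seg_logits):
            # Boundary pixels: the label differs from a 3x3 neighbor's label.
            labels = seg_logits.argmax(dim=1, keepdim=True).float()
            edge = F.max_pool2d(labels, 3, 1, 1) != -F.max_pool2d(-labels, 3, 1, 1)
            b, c, h, w = feat.shape
            flat = feat.permute(0, 2, 3, 1).reshape(b, h * w, c)
            mask = edge.reshape(b, h * w)
            out = flat.clone()
            for i in range(b):  # per image, since edge counts differ
                pts = flat[i, mask[i]]
                out[i, mask[i]] = pts + self.mlp(pts)  # residual refinement
            return out.reshape(b, h, w, c).permute(0, 3, 1, 2)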
When Autonomous Systems Meet Accuracy and Transferability through AI: A Survey
With widespread applications of artificial intelligence (AI), the
capabilities of the perception, understanding, decision-making and control for
autonomous systems have improved significantly in the past years. When
autonomous systems consider the performance of accuracy and transferability,
several AI methods, like adversarial learning, reinforcement learning (RL) and
meta-learning, show their powerful performance. Here, we review the
learning-based approaches in autonomous systems from the perspectives of
accuracy and transferability. Accuracy means that a well-trained model shows
good results during the testing phase, in which the testing set shares the same
task or data distribution with the training set. Transferability means that
when a well-trained model is transferred to other testing domains, the accuracy
is still good. Firstly, we introduce some basic concepts of transfer learning
and then present some preliminaries of adversarial learning, RL and
meta-learning. Secondly, we focus on reviewing the accuracy or transferability
or both of them to show the advantages of adversarial learning, like generative
adversarial networks (GANs), in typical computer vision tasks in autonomous
systems, including image style transfer, image superresolution, image
deblurring/dehazing/rain removal, semantic segmentation, depth estimation,
pedestrian detection and person re-identification (re-ID). Then, we further
review the performance of RL and meta-learning from the aspects of accuracy or
transferability or both of them in autonomous systems, involving pedestrian
tracking, robot navigation and robotic manipulation. Finally, we discuss
several challenges and future topics for using adversarial learning, RL and
meta-learning in autonomous systems.
Learn Stereo, Infer Mono: Siamese Networks for Self-Supervised, Monocular, Depth Estimation
The field of self-supervised monocular depth estimation has seen huge
advancements in recent years. Most methods assume stereo data is available
during training but usually under-utilize it and only treat it as a reference
signal. We propose a novel self-supervised approach which uses both left and
right images equally during training, but can still be used with a single input
image at test time, for monocular depth estimation. Our Siamese network
architecture consists of two twin networks, each of which learns to predict a
disparity map from a single image. At test time, however, only one of these networks is
used in order to infer depth. We show state-of-the-art results on the standard
KITTI Eigen split benchmark as well as being the highest scoring
self-supervised method on the new KITTI single view benchmark. To demonstrate
the ability of our method to generalize to new data sets, we further provide
results on the Make3D benchmark, which was not used during training.
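A minimal PyTorch sketch of the twin-network idea: two disparity networks
trained jointly on the left and right views, with only one branch run at
inference. The backbone, and whether the twins share weights, are left open
here as assumptions.

    import torch.nn as nn

    class SiameseDisparity(nn.Module):
        def __init__(self, make_backbone):
            super().__init__()
            # Two structurally identical single-image disparity networks.
            self.left_net = make_backbone()
            self.right_net = make_backbone()

        def forward(self, left_img, right_img=None):
            disp_left = self.left_net(left_img)
            if right_img is None:      # monocular inference path
                return disp_left
            disp_right = self.right_net(right_img)
            return disp_left, disp_right  # both feed the training losses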
Cascade Network for Self-Supervised Monocular Depth Estimation
It is a classical computer vision problem to obtain real-scene depth maps with
a monocular camera, and it has received wide attention in recent years.
However, training such a model usually requires a large number of manually
labeled samples. To reduce this dependence on labeled data, some researchers
use self-supervised learning models. Nevertheless, the accuracy and
reliability of these methods have
not reached the expected standard. In this paper, we propose a new
self-supervised learning method based on cascade networks. Compared with the
previous self-supervised methods, our method has improved accuracy and
reliability, and we have proved this by experiments. We show a cascaded neural
network that divides the target scene into parts of different sight distances
and trains them separately to generate a better depth map. Our approach is
divided into the following four steps. In the first step, we use the
self-supervised model to estimate the depth of the scene roughly. In the second
step, the depth of the scene generated in the first step is used as a label to
divide the scene into different depth parts. The third step is to use models
with different parameters to generate depth maps of different depth parts in
the target scene, and the fourth step is to fuse the depth map. Through the
ablation study, we demonstrated the effectiveness of each component
individually and showed high-quality, state-of-the-art results in the KITTI
benchmark. Comment: 22 pages, 6 figures
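The four steps map naturally onto a short pipeline. The sketch below only
illustrates that control flow; coarse_net, band_nets, and the hard band-mask
fusion rule are assumed interfaces, not the paper's exact design.

    import torch

    def cascade_depth(image, coarse_net, band_nets, bands):
        # Step 1: rough self-supervised depth estimate of the whole scene.
        coarse = coarse_net(image)
        fused = torch.zeros_like(coarse)
        for net, (lo, hi) in zip(band_nets, bands):
            # Step 2: use the coarse map to assign pixels to a distance band.
            mask = (coarse >= lo) & (coarse < hi)
            # Step 3: a band-specific network re-estimates depth.
            refined = net(image)
            # Step 4: fuse the band-wise predictions into one depth map.
            fused[mask] = refined[mask]
        return fused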
Deep Learning based Monocular Depth Prediction: Datasets, Methods and Applications
Estimating depth from RGB images can facilitate many computer vision tasks,
such as indoor localization, height estimation, and simultaneous localization
and mapping (SLAM). Recently, monocular depth estimation has obtained great
progress owing to the rapid development of deep learning techniques. They
surpass traditional machine learning-based methods by a large margin in terms
of accuracy and speed. Despite the rapid progress on this topic, a
comprehensive review that summarizes the current progress and provides future
directions is still lacking. In this survey, we first introduce
the datasets for depth estimation, and then give a comprehensive introduction
of the methods from three perspectives: supervised learning-based methods,
unsupervised learning-based methods, and sparse samples guidance-based methods.
In addition, downstream applications that benefit from the progress have also
been illustrated. Finally, we point out the future directions and conclude the
paper.
SVDistNet: Self-Supervised Near-Field Distance Estimation on Surround View Fisheye Cameras
A 360° perception of scene geometry is essential for automated driving,
notably for parking and urban driving scenarios. Typically, it is achieved
using surround-view fisheye cameras, focusing on the near-field area around the
vehicle. The majority of current depth estimation approaches focus on employing
just a single camera, which cannot be straightforwardly generalized to multiple
cameras. The depth estimation model must be tested on a variety of cameras
fitted to millions of cars with varying camera geometries. Even within a
single car, intrinsics vary due to manufacturing tolerances. Deep learning
models are sensitive to these changes, and it is practically infeasible to
train and test on each camera variant. As a result, we present novel
camera-geometry adaptive multi-scale convolutions which utilize the camera
parameters as a conditional input, enabling the model to generalize to
previously unseen fisheye cameras. Additionally, we improve the distance
estimation by pairwise and patchwise vector-based self-attention encoder
networks. We evaluate our approach on the Fisheye WoodScape surround-view
dataset, significantly improving over previous approaches. We also show a
generalization of our approach across different camera viewing angles and
perform extensive experiments to support our contributions. To enable
comparison with other approaches, we evaluate the front camera data on the
KITTI dataset (pinhole camera images) and achieve state-of-the-art performance
among self-supervised monocular methods. An overview video with qualitative
results is provided at https://youtu.be/bmX0UcU9wtA. Baseline code and dataset
will be made public. Comment: To be published in IEEE Transactions on Intelligent Transportation Systems
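The camera-geometry adaptive convolutions take the calibration as a
conditional input; one common minimal realization of that idea broadcasts
normalized intrinsics as constant feature maps and concatenates them before a
convolution. The class below is an illustrative simplification, not the
paper's multi-scale design.

    import torch
    import torch.nn as nn

    class CameraConditionedConv(nn.Module):
        def __init__(self, in_ch, out_ch, n_cam_params=4):
            super().__init__()
            self.conv = nn.Conv2d(in_ch + n_cam_params, out_ch, 3, padding=1)

        def forward(self, feat, cam_params):
            # cam_params: (b, n_cam_params), e.g. normalized fx, fy, cx, cy.
            b, _, h, w = feat.shape
            maps = cam_params.view(b, -1, 1, 1).expand(-1, -1, h, w)
            # Every spatial location now sees the camera geometry it came from.
            return self.conv(torch.cat([feat, maps], dim=1))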
Neural Rendering and Reenactment of Human Actor Videos
We propose a method for generating video-realistic animations of real humans
under user control. In contrast to conventional human character rendering, we
do not require the availability of a production-quality photo-realistic 3D
model of the human, but instead rely on a video sequence in conjunction with a
(medium-quality) controllable 3D template model of the person. With that, our
approach significantly reduces production cost compared to conventional
rendering approaches based on production-quality 3D models, and can also be
used to realistically edit existing videos. Technically, this is achieved by
training a neural network that translates simple synthetic images of a human
character into realistic imagery. For training our networks, we first track the
3D motion of the person in the video using the template model, and subsequently
generate a synthetically rendered version of the video. These images are then
used to train a conditional generative adversarial network that translates
synthetic images of the 3D model into realistic imagery of the human. We
evaluate our method for the reenactment of another person that is tracked in
order to obtain the motion data, and show video results generated from
artist-designed skeleton motion. Our results outperform the state-of-the-art in
learning-based human image synthesis. Project page: http://gvv.mpi-inf.mpg.de/projects/wxu/HumanReenactment/ Comment: ACM ToG paper.
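For readers unfamiliar with the setup, the sketch below shows one generic
conditional-GAN training step that maps a synthetic render of the tracked 3D
template to a realistic frame, conditioning the discriminator on the render.
The L1 weight and loss form are illustrative assumptions, not the paper's
exact objective.

    import torch
    import torch.nn.functional as F

    def cgan_step(gen, disc, opt_g, opt_d, synthetic, real):
        # Discriminator: real (render, frame) pairs vs. generated pairs.
        fake = gen(synthetic)
        d_real = disc(torch.cat([synthetic, real], dim=1))
        d_fake = disc(torch.cat([synthetic, fake.detach()], dim=1))
        loss_d = (F.binary_cross_entropy_with_logits(d_real, torch.ones_like(d_real))
                  + F.binary_cross_entropy_with_logits(d_fake, torch.zeros_like(d_fake)))
        opt_d.zero_grad(); loss_d.backward(); opt_d.step()

        # Generator: fool the discriminator while staying close to the frame.
        d_fake = disc(torch.cat([synthetic, fake], dim=1))
        loss_g = (F.binary_cross_entropy_with_logits(d_fake, torch.ones_like(d_fake))
                  + 10.0 * F.l1_loss(fake, real))
        opt_g.zero_grad(); loss_g.backward(); opt_g.step()
        return loss_d.item(), loss_g.item()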