Unsupervised Monocular Depth Estimation with Left-Right Consistency
Learning based methods have shown very promising results for the task of
depth estimation in single images. However, most existing approaches treat
depth prediction as a supervised regression problem and as a result, require
vast quantities of corresponding ground truth depth data for training. Just
recording quality depth data in a range of environments is a challenging
problem. In this paper, we innovate beyond existing approaches, replacing the
use of explicit depth data during training with easier-to-obtain binocular
stereo footage.
We propose a novel training objective that enables our convolutional neural
network to learn to perform single image depth estimation, despite the absence
of ground truth depth data. Exploiting epipolar geometry constraints, we
generate disparity images by training our network with an image reconstruction
loss. We show that solving for image reconstruction alone results in poor
quality depth images. To overcome this problem, we propose a novel training
loss that enforces consistency between the disparities produced relative to
both the left and right images, leading to improved performance and robustness
compared to existing approaches. Our method produces state-of-the-art results
for monocular depth estimation on the KITTI driving dataset, even outperforming
supervised methods that have been trained with ground truth depth.
Comment: CVPR 2017 ora
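The left-right consistency idea in this abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names `sample_rows` and `left_right_consistency`, the sign convention (a left pixel at column x is assumed to match the right pixel at column x minus its disparity), and the use of a plain mean absolute difference are all assumptions made for illustration. The sketch warps the right disparity map into the left view along each row and measures how far it disagrees with the left disparity map.

```python
import numpy as np


def sample_rows(img, x):
    """Linearly interpolate img (H, W) at fractional column positions x (H, W)."""
    h, w = img.shape
    x = np.clip(x, 0.0, w - 1.0)          # clamp samples to the image border
    x0 = np.floor(x).astype(int)          # left neighbour column
    x1 = np.minimum(x0 + 1, w - 1)        # right neighbour column
    frac = x - x0                         # interpolation weight
    rows = np.arange(h)[:, None]
    return (1.0 - frac) * img[rows, x0] + frac * img[rows, x1]


def left_right_consistency(disp_left, disp_right):
    """Mean absolute disagreement between the left disparity map and the
    right disparity map projected into the left view.

    Assumed convention (hypothetical): left column x matches right column
    x - disp_left[y, x]; disparities are in pixels.
    """
    h, w = disp_left.shape
    cols = np.broadcast_to(np.arange(w, dtype=float), (h, w))
    projected = sample_rows(disp_right, cols - disp_left)
    return float(np.mean(np.abs(disp_left - projected)))


if __name__ == "__main__":
    left = np.full((4, 8), 2.0)
    right = np.full((4, 8), 2.0)
    # Identical constant disparity maps agree everywhere.
    print(left_right_consistency(left, right))
```

In a training setup this term would be computed on differentiable tensors and combined with the image reconstruction loss; the NumPy version only shows the geometry of the check.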
Lightweight Monocular Depth Estimation Model by Joint End-to-End Filter pruning
Convolutional neural networks (CNNs) have emerged as the state-of-the-art in
multiple vision tasks including depth estimation. However, memory and computing
power requirements remain challenges for these models.
Monocular depth estimation is widely used in robotics and virtual reality,
applications that require deployment on low-end devices. Training a small model from
scratch results in a significant drop in accuracy and it does not benefit from
pre-trained large models. Motivated by the literature of model pruning, we
propose a lightweight monocular depth model obtained from a large trained
model. This is achieved by removing the least important features with a novel
joint end-to-end filter pruning. We propose to learn a binary mask for each
filter to decide whether to drop the filter or not. These masks are trained
jointly to exploit relations between filters at different layers as well as
redundancy within the same layer. We show that we can achieve a compression
rate of around 5x with a small drop in accuracy on the KITTI driving dataset. We
also show that masking can improve accuracy over the baseline with fewer
parameters, even without enforcing the compression loss.
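As a rough illustration of per-filter binary masking, the sketch below applies a thresholded mask to a convolution weight tensor and reports the resulting compression rate. It does not reproduce the joint end-to-end training described in the abstract: the names `apply_filter_masks` and `compression_rate`, the fixed 0.5 threshold, and the weight layout (out_channels, in_channels, kH, kW) are assumptions for illustration only.

```python
import numpy as np


def apply_filter_masks(weights, scores, threshold=0.5):
    """Zero out whole filters whose mask score falls at or below the threshold.

    weights: array of shape (out_channels, in_channels, kH, kW) (assumed layout)
    scores:  array of shape (out_channels,), one real-valued score per filter
    Returns the pruned weights and the binary mask.
    """
    mask = (scores > threshold).astype(weights.dtype)
    pruned = weights * mask[:, None, None, None]   # broadcast mask over each filter
    return pruned, mask


def compression_rate(weights, mask):
    """Ratio of original parameter count to parameters in the kept filters."""
    params_per_filter = weights[0].size
    kept = int(mask.sum()) * params_per_filter
    if kept == 0:
        raise ValueError("all filters were pruned")
    return weights.size / kept


if __name__ == "__main__":
    w = np.ones((4, 3, 3, 3))
    s = np.array([0.9, 0.1, 0.8, 0.2])
    pruned, mask = apply_filter_masks(w, s)
    print(mask, compression_rate(w, mask))
```

A real pruning pipeline would also remove the matching input channels of the following layer, which is one reason masks at different layers interact.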
Ego-motion and Surrounding Vehicle State Estimation Using a Monocular Camera
Understanding ego-motion and surrounding vehicle state is essential to enable
automated driving and advanced driving assistance technologies. Typical
approaches to solve this problem use fusion of multiple sensors such as LiDAR,
camera, and radar to recognize surrounding vehicle state, including position,
velocity, and orientation. Such multi-sensor setups are overly complex and
costly for production personal-use vehicles. In this paper, we propose a
novel machine learning method to estimate ego-motion and surrounding vehicle
state using a single monocular camera. Our approach is based on a combination
of three deep neural networks to estimate the 3D vehicle bounding box, depth,
and optical flow from a sequence of images. The main contribution of this paper
is a new framework and algorithm that integrates these three networks in order
to estimate the ego-motion and surrounding vehicle state. To realize more
accurate 3D position estimation, we perform ground plane correction in
real time. The efficacy of the proposed method is demonstrated through
experimental evaluations that compare our results to ground truth data
available from other sensors, including CAN bus and LiDAR.