Don't Forget The Past: Recurrent Depth Estimation from Monocular Video
Autonomous cars need continuously updated depth information. Thus far, depth
is mostly estimated independently for a single frame at a time, even if the
method starts from video input. Our method produces a time series of depth
maps, which makes it an ideal candidate for online learning approaches. In
particular, we put three different types of depth estimation (supervised depth
prediction, self-supervised depth prediction, and self-supervised depth
completion) into a common framework. We integrate the corresponding networks
with a ConvLSTM such that the spatiotemporal structures of depth across frames
can be exploited to yield a more accurate depth estimation. Our method is
flexible. It can be applied to monocular videos only or be combined with
different types of sparse depth patterns. We carefully study the architecture
of the recurrent network and its training strategy. We are the first to
successfully exploit recurrent networks for real-time self-supervised monocular
depth estimation and completion. Extensive experiments show that our recurrent
method outperforms its image-based counterpart consistently and significantly
in both self-supervised scenarios. It also outperforms previous depth
estimation methods from all three groups above. Please refer to
https://www.trace.ethz.ch/publications/2020/rec_depth_estimation/ for details.
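To illustrate the kind of recurrence the abstract describes, below is a minimal sketch (PyTorch assumed) of a per-frame depth network with a ConvLSTM bottleneck, so each depth map is conditioned on past frames. The layer sizes, the tiny encoder/decoder, and all names here are illustrative assumptions, not the paper's actual architecture.

```python
import torch
import torch.nn as nn


class ConvLSTMCell(nn.Module):
    """Standard ConvLSTM cell: all four LSTM gates from one convolution."""

    def __init__(self, in_ch: int, hid_ch: int, k: int = 3):
        super().__init__()
        self.hid_ch = hid_ch
        self.gates = nn.Conv2d(in_ch + hid_ch, 4 * hid_ch, k, padding=k // 2)

    def forward(self, x, state):
        h, c = state
        i, f, o, g = self.gates(torch.cat([x, h], dim=1)).chunk(4, dim=1)
        c = torch.sigmoid(f) * c + torch.sigmoid(i) * torch.tanh(g)
        h = torch.sigmoid(o) * torch.tanh(c)
        return h, c


class RecurrentDepthNet(nn.Module):
    """Toy per-frame depth estimator that carries state across frames."""

    def __init__(self, hid_ch: int = 16):
        super().__init__()
        self.hid_ch = hid_ch
        self.encoder = nn.Conv2d(3, hid_ch, 3, padding=1)
        self.cell = ConvLSTMCell(hid_ch, hid_ch)
        self.decoder = nn.Conv2d(hid_ch, 1, 3, padding=1)

    def forward(self, frames):
        # frames: (B, T, 3, H, W) -> list of T depth maps, each (B, 1, H, W)
        B, T, _, H, W = frames.shape
        h = frames.new_zeros(B, self.hid_ch, H, W)
        c = frames.new_zeros(B, self.hid_ch, H, W)
        depths = []
        for t in range(T):
            feat = torch.relu(self.encoder(frames[:, t]))
            h, c = self.cell(feat, (h, c))
            # Sigmoid keeps raw output in (0, 1); scale to metres downstream.
            depths.append(torch.sigmoid(self.decoder(h)))
        return depths
```

Because the hidden state persists across the time loop, the network yields a time series of depth maps rather than independent per-frame estimates, which is what makes such a model amenable to online updating.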
Depth Estimation Using 2D RGB Images
Single-image depth estimation is an ill-posed problem: it is not mathematically possible to uniquely recover the third dimension (depth) from a single 2D image, so additional constraints must be incorporated to regularize the solution space. The first part of this dissertation therefore explores constraining the model for more accurate depth estimation by exploiting the similarity between the RGB image and the corresponding depth map at the geometric edges of the 3D scene.

Although deep-learning-based methods are very successful in computer vision and handle noise well, they generalize poorly when the test and training distributions differ. Geometric methods, by contrast, do not suffer from this generalization problem, since they exploit temporal information in an unsupervised manner; they are, however, sensitive to noise. Moreover, explicitly modeling dynamic scenes and flexible objects is a major challenge for traditional computer vision methods. Considering the advantages and disadvantages of each approach, a hybrid method that benefits from both is proposed here, extending traditional geometric models to handle flexible and dynamic objects in the scene. This is made possible by relaxing the geometric constraints from one motion model for some areas of the scene to one motion model for every pixel, which enables the model to detect even small, flexible, floating debris in a dynamic scene. However, it also makes the optimization under-constrained. To turn the optimization from under-constrained to over-constrained while maintaining the model's flexibility, a "moving object detection loss" and a "synchrony loss" are designed, and the algorithm is trained in an unsupervised fashion.

The preliminary results are not yet comparable to the current state of the art: the training process is slow, which makes a fair comparison difficult, the algorithm lacks stability, and the optical flow model is noisy and naive. Finally, some solutions are suggested to address these issues.
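The abstract does not define its "moving object detection loss" or "synchrony loss", so those are not sketched here. The generic building block shared by the unsupervised geometric methods it builds on, however, is photometric warping: resample one frame with a motion field (here relaxed to one flow vector per pixel) and penalize the difference from the other frame. Below is a minimal NumPy sketch of that building block; all names are illustrative.

```python
import numpy as np


def warp_bilinear(img: np.ndarray, flow: np.ndarray) -> np.ndarray:
    """Backward-warp a grayscale image (H, W) by a per-pixel flow (H, W, 2).

    warped[y, x] samples img at (x + flow[y, x, 0], y + flow[y, x, 1])
    with bilinear interpolation; coordinates are clamped at the border.
    """
    H, W = img.shape
    ys, xs = np.meshgrid(np.arange(H), np.arange(W), indexing="ij")
    x = np.clip(xs + flow[..., 0], 0.0, W - 1.001)
    y = np.clip(ys + flow[..., 1], 0.0, H - 1.001)
    x0, y0 = np.floor(x).astype(int), np.floor(y).astype(int)
    x1, y1 = x0 + 1, y0 + 1
    wx, wy = x - x0, y - y0
    return (img[y0, x0] * (1 - wx) * (1 - wy) + img[y0, x1] * wx * (1 - wy)
            + img[y1, x0] * (1 - wx) * wy + img[y1, x1] * wx * wy)


def photometric_loss(target: np.ndarray, source: np.ndarray,
                     flow: np.ndarray) -> float:
    """Mean absolute photometric error between target and the warped source."""
    return float(np.abs(target - warp_bilinear(source, flow)).mean())
```

With one flow vector per pixel, every pixel can follow its own motion model, which is exactly what makes the problem under-constrained and motivates the additional losses the dissertation introduces.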