Dynamic Body VSLAM with Semantic Constraints
Image-based reconstruction of urban environments is a challenging problem
that involves optimizing a large number of variables and contends with several
sources of error, such as the presence of dynamic objects. Since most large-scale
approaches assume a static scene, dynamic objects are relegated to the
noise-modeling stage of such systems. This is an approach of convenience, since
the RANSAC-based framework used to compute most multiview geometric quantities
for static scenes naturally confines dynamic objects to the class of outlier
measurements. However, reconstructing dynamic objects along with the static
environment gives a complete picture of an urban environment. Such
understanding can then be used for important robotic tasks such as path
planning for autonomous navigation and obstacle tracking and avoidance. In this
paper, we propose a system for robust SLAM that works in both static and
dynamic environments. To overcome the challenge of dynamic objects in the
scene, we propose a new model that incorporates semantic constraints into the
reconstruction algorithm. While some of these constraints are based on
multi-layered dense CRFs trained over appearance as well as motion cues, the
other proposed constraints can be expressed as additional terms in the bundle
adjustment optimization, which iteratively refines the 3D structure and the
camera/object motion trajectories. We show results on the challenging KITTI
urban dataset for the accuracy of motion segmentation and the reconstruction
of the trajectory and shape of moving objects relative to ground truth. We
achieve a significant reduction in average relative error for moving-object
trajectory reconstruction compared to state-of-the-art methods such as VISO2,
as well as standard bundle adjustment algorithms.
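The idea of expressing constraints as extra terms in the bundle adjustment cost can be illustrated with a small sketch. A 1-D analogue stands in for reprojection error, and a constant-velocity prior stands in for the paper's semantic/motion terms; the variable names, the prior, and the weighting are illustrative assumptions, not the paper's actual formulation.

```python
import numpy as np
from scipy.optimize import least_squares

def residuals(x, observations, w_motion=1.0):
    """Data term plus an additional motion-prior term, stacked as in BA."""
    repro = x - observations          # "reprojection" error per frame
    vel = np.diff(x)                  # per-frame velocity of the object
    smooth = w_motion * np.diff(vel)  # constant-velocity (zero-accel) prior
    return np.concatenate([repro, smooth])

# noisy 1-D positions of a moving object over five frames
obs = np.array([0.0, 1.1, 1.9, 3.2, 3.9])
sol = least_squares(residuals, np.zeros_like(obs), args=(obs,))
print(np.round(sol.x, 2))  # trajectory pulled toward a straight line
```

The extra term only adds rows to the residual vector, so the same iterative least-squares refinement handles it alongside the data term.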
Multimotion Visual Odometry (MVO): Simultaneous Estimation of Camera and Third-Party Motions
Estimating motion from images is a well-studied problem in computer vision
and robotics. Previous work has developed techniques to estimate the motion of
a moving camera in a largely static environment (e.g., visual odometry) and to
segment or track motions in a dynamic scene using known camera motions (e.g.,
multiple object tracking).
It is more challenging to estimate the unknown motion of the camera and the
dynamic scene simultaneously. Most previous work requires a priori object
models (e.g., tracking-by-detection), motion constraints (e.g., planar motion),
or fails to estimate the full SE(3) motions of the scene (e.g., scene flow).
While these approaches work well in specific application domains, they are not
generalizable to unconstrained motions.
This paper extends the traditional visual odometry (VO) pipeline to estimate
the full SE(3) motion of both a stereo/RGB-D camera and the dynamic scene. This
multimotion visual odometry (MVO) pipeline requires no a priori knowledge of
the environment or the dynamic objects. Its performance is evaluated on a
real-world dynamic dataset with ground truth for all motions from a motion
capture system.

Comment: This updated manuscript corrects the experimental results published
in the proceedings of the 2018 IEEE/RSJ International Conference on
Intelligent Robots and Systems (IROS). 8 pages, 7 figures. Video available
at https://www.youtube.com/watch?v=84tXCJOlj0
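The core multimotion idea of estimating both the camera-induced motion and third-party motions without prior object models can be sketched in a heavily simplified form: greedily extract the dominant motion of the flow field, label its inliers, and repeat on the remainder. Here 2-D translations stand in for full SE(3) estimates, and the median-based hypothesis and threshold are illustrative assumptions, not the MVO pipeline itself.

```python
import numpy as np

rng = np.random.default_rng(0)
static = rng.uniform(0, 10, (30, 2))      # background points
obj = rng.uniform(0, 10, (10, 2))         # moving-object points
p0 = np.vstack([static, obj])
p1 = np.vstack([static + [1.0, 0.0],      # camera-induced shift
                obj + [1.0, 2.0]])        # object moves on its own

labels = -np.ones(len(p0), dtype=int)
for k in range(2):                        # extract two motions in turn
    unassigned = labels == -1
    flows = p1[unassigned] - p0[unassigned]
    motion = np.median(flows, axis=0)     # dominant motion hypothesis
    inlier = np.linalg.norm(flows - motion, axis=1) < 0.1
    labels[np.flatnonzero(unassigned)[inlier]] = k
print(labels[:30], labels[30:])           # background -> 0, object -> 1
```

The point of the simplification is the structure: no object model or motion constraint is assumed, and every segmented group gets its own full motion estimate.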
Depth Estimation Using 2D RGB Images
Single-image depth estimation is an ill-posed problem: it is not mathematically possible to uniquely recover the third dimension (depth) from a single 2D image. Hence, additional constraints need to be incorporated to regularize the solution space. The first part of this dissertation therefore explores constraining the model for more accurate depth estimation by exploiting the similarity between the RGB image and the corresponding depth map at the geometric edges of the 3D scene. Although deep-learning-based methods are very successful in computer vision and handle noise well, they generalize poorly when the test and training distributions are not close. Geometric methods do not share this generalization problem, since they exploit temporal information in an unsupervised manner, but they are sensitive to noise. At the same time, explicitly modeling dynamic scenes and flexible objects is a major challenge for traditional computer vision methods. Weighing the advantages and disadvantages of each approach, a hybrid method that benefits from both is proposed here, extending traditional geometric models to handle flexible and dynamic objects in the scene. This is made possible by relaxing the geometric rules from one motion model per region of the scene to one motion model for every pixel. This enables the model to detect even small, flexible, floating debris in a dynamic scene, but it also makes the optimization under-constrained. To make the optimization over-constrained again while maintaining the model's flexibility, a "moving object detection loss" and a "synchrony loss" are designed. The algorithm is trained in an unsupervised fashion. The preliminary results are not yet comparable to the current state of the art: the slow training process makes a fair comparison difficult, the algorithm lacks stability, and the optical flow model is noisy and naive. Finally, some solutions are suggested to address these issues.
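The per-pixel relaxation described above can be illustrated with a toy unsupervised photometric loss: every pixel carries its own displacement, and the loss scores how well the next frame, warped back by those displacements, matches the current frame. The 1-D "images" and the plain L1 loss are illustrative assumptions; the dissertation's actual "moving object detection" and "synchrony" losses are not reproduced here.

```python
import numpy as np

def photometric_loss(img_t, img_t1, flow):
    """Warp img_t1 back by each pixel's own integer displacement, then L1."""
    xs = np.arange(len(img_t))
    warped = img_t1[np.clip(xs + flow, 0, len(img_t) - 1)]
    return np.abs(img_t - warped).mean()

img_t = np.array([0., 1., 2., 3., 4., 5.])
img_t1 = np.roll(img_t, 1)        # whole scene shifts right by one pixel
good = np.full(6, 1)              # per-pixel flow matching the true shift
bad = np.zeros(6, dtype=int)      # "static scene" assumption
print(photometric_loss(img_t, img_t1, good) <
      photometric_loss(img_t, img_t1, bad))   # True
```

With one free motion per pixel the loss alone is under-constrained (many flows warp equally well), which is exactly why the extra losses in the text are needed.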
End to End Learning in Autonomous Driving Systems
Convolutional neural networks have advanced visual perception significantly in recent years. Two major ingredients enabling this success are the composition of simple modules into a complex network and end-to-end optimization. However, this success has not yet revolutionized robotics as much as vision, even though robotics suffers from problems similar to those of traditional computer vision, namely the imperfection of manually designed system pipelines. This thesis investigates end-to-end learning for autonomous driving, a concrete robotic application. End-to-end learning can produce reasonable driving behaviors, even in complex urban driving scenarios. Representation learning in end-to-end driving models is crucial, and auxiliary vision tasks such as semantic segmentation can help form a more informative driving representation, especially when training data is limited. Naive convolutional neural networks are usually only capable of reactive control and cannot carry out complex reasoning in a particular scenario. This thesis also studies how to handle scene-conditioned driving behavior, which goes beyond the capability of reactive control. Alongside the end-to-end structure, the learning method also plays a critical role. Imitation learning acquires meaningful behaviors, but the robot usually cannot master the skill. Reinforcement learning, on the contrary, either barely learns anything if the environment is too complex or masters the skill otherwise. To get the best of both worlds, this thesis proposes an algorithmically unified method to learn from both demonstration data and the environment.
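The "best of both worlds" idea of learning from demonstrations and the environment together can be sketched with a toy mixed objective: a linear policy on a 1-D state is updated with a behavior-cloning gradient plus a reward-based gradient. The environment, the reward shape, and the fixed 50/50 weighting are illustrative assumptions, not the thesis's actual algorithm.

```python
import numpy as np

rng = np.random.default_rng(1)
states = rng.uniform(-1.0, 1.0, 100)
demo_actions = 2.0 * states          # expert demonstrations: gain of 2

w = 0.0                              # linear policy: action = w * state
for _ in range(200):
    err = w * states - demo_actions
    bc_grad = np.mean(2 * err * states)               # imitate the expert
    # toy "environment" reward: -(a - 2s)^2 - 0.1 a^2 (penalizes big actions)
    rl_grad = np.mean(2 * err * states + 0.2 * w * states**2)
    w -= 0.1 * (0.5 * bc_grad + 0.5 * rl_grad)
print(round(w, 2))                   # settles near the expert gain of 2
```

The demonstration term anchors the policy to sensible behavior early on, while the reward term lets the environment pull the solution away from pure imitation, which is the division of labor the abstract describes.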
Jacobian Computation for Cumulative B-Splines on SE(3) and Application to Continuous-Time Object Tracking
In this paper we propose a method that estimates the continuous SE(3) trajectories (orientation and translation) of the dynamic rigid objects present in a scene from multiple RGB-D views. Specifically, we fit the object trajectories to cumulative B-spline curves, which allow us to interpolate, at any intermediate time stamp, not only their poses but also their linear and angular velocities and accelerations. Additionally, we derive the analytical SE(3) Jacobians needed by the optimization, which are applicable to any other approach that uses this type of curve. To the best of our knowledge, this is the first work to propose 6-DoF continuous-time object tracking, and our analytical derivations bring a significant reduction in computational cost. We evaluate our proposal on synthetic data and on a public benchmark, showing competitive results in localization and significant improvements in velocity estimation compared to discrete-time approaches. © 2016 IEEE
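The cumulative B-spline form mentioned above can be sketched on a commuting toy group, where the SE(3) products of exponentials reduce to weighted sums of increments: the value starts at the first control pose and accumulates increments weighted by the cumulative cubic basis. The 1-D "angles" are an illustrative simplification; on SE(3) each increment would be exp(B̃_j(u) · log(T_{j-1}⁻¹ T_j)).

```python
import numpy as np

def cum_basis(u):
    """Cumulative uniform cubic B-spline basis (Lovegrove form), u in [0, 1]."""
    return np.array([1.0,
                     (5 + 3*u - 3*u**2 + u**3) / 6,
                     (1 + 3*u + 3*u**2 - 2*u**3) / 6,
                     u**3 / 6])

def interp(ctrl, u):
    """Cumulative form: start at ctrl[0], then add weighted increments."""
    b = cum_basis(u)
    return ctrl[0] + sum(b[j] * (ctrl[j] - ctrl[j-1]) for j in range(1, 4))

angles = np.array([0.0, 0.1, 0.2, 0.3])  # four control "poses" (1-D angles)
print(round(interp(angles, 0.5), 3))     # 0.15: uniform motion is reproduced
```

Because the basis is polynomial in u, differentiating it once or twice gives the velocities and accelerations at any time stamp, which is what makes the continuous-time representation attractive for dynamic objects.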
ClusterSLAM: A SLAM backend for simultaneous rigid body clustering and motion estimation
We present a practical backend for stereo visual SLAM that can simultaneously discover individual rigid bodies and compute their motions in dynamic environments. While recent factor-graph-based state optimization algorithms have shown their ability to robustly solve SLAM problems by treating dynamic objects as outliers, the dynamic motions themselves are rarely considered. In this paper, we exploit the consensus of the 3D motions of landmarks extracted from the same rigid body for clustering, identifying static and dynamic objects in a unified manner. Specifically, our algorithm builds a noise-aware motion affinity matrix from landmarks and uses agglomerative clustering to distinguish rigid bodies. Using decoupled factor graph optimization to refine their shapes and trajectories, we obtain an iterative scheme that updates cluster assignments and motion estimates reciprocally. Evaluations on both synthetic scenes and KITTI demonstrate the capability of our approach, and further experiments on online efficiency also show the effectiveness of our method for simultaneously tracking ego-motion and multiple objects.
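The clustering step described above can be sketched with a minimal example: landmarks whose frame-to-frame 3D displacements agree are merged by agglomerative clustering. Reducing the affinity to a plain Euclidean distance between displacement vectors, and the specific threshold, are illustrative assumptions; the paper's noise-aware affinity and decoupled factor-graph refinement are omitted.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

rng = np.random.default_rng(0)
# per-landmark frame-to-frame motion: 20 static landmarks, 8 on a moving body
disp_static = np.tile([0.0, 0.0, 0.0], (20, 1)) + 0.01 * rng.standard_normal((20, 3))
disp_body = np.tile([1.0, 0.0, 0.2], (8, 1)) + 0.01 * rng.standard_normal((8, 3))
disp = np.vstack([disp_static, disp_body])

Z = linkage(disp, method="average")           # agglomerative clustering
labels = fcluster(Z, t=0.5, criterion="distance")
print(len(set(labels)))                       # 2: static world + one rigid body
```

Treating the static background as just another cluster is what lets static and dynamic structure be identified in a unified manner, as the abstract states.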