
    Depth Estimation Using 2D RGB Images

    Single-image depth estimation is an ill-posed problem: it is not mathematically possible to uniquely recover the third dimension (depth) from a single 2D image. Additional constraints therefore need to be incorporated to regularize the solution space. The first part of this dissertation explores constraining the model for more accurate depth estimation by exploiting the similarity between the RGB image and the corresponding depth map at the geometric edges of the 3D scene. Deep-learning-based methods are very successful in computer vision and handle noise well, but they generalize poorly when the test and training distributions are not close. Geometric methods, in contrast, do not suffer from this generalization problem, since they benefit from temporal information in an unsupervised manner; they are, however, sensitive to noise. At the same time, explicitly modeling dynamic scenes and flexible objects is a major challenge for traditional computer vision methods. Considering the advantages and disadvantages of each approach, a hybrid method that benefits from both is proposed here by extending traditional geometric models' ability to handle flexible and dynamic objects in the scene. This is made possible by relaxing the geometric computer vision constraints from one motion model per region of the scene to one motion model per pixel, which enables the model to detect even small, flexible, floating debris in a dynamic scene. However, it also makes the optimization under-constrained. To turn the optimization from under-constrained into over-constrained while maintaining the model's flexibility, a "moving object detection loss" and a "synchrony loss" are designed. The algorithm is trained in an unsupervised fashion. The preliminary results are not yet comparable to the current state of the art: the training process is slow, which makes such comparison difficult, the algorithm lacks stability, and the optical flow model is noisy and naive. Finally, some solutions are suggested to address these issues.
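    The abstract names the two regularizers but does not define them. As a rough illustration of how a per-pixel motion field might be over-constrained, here is a minimal PyTorch-style sketch in which a hypothetical "synchrony loss" encourages neighbouring pixels to share a motion model, and a hypothetical "moving object detection loss" penalizes per-pixel motion that deviates from the ego-motion. The formulations and names below are assumptions for illustration, not the dissertation's definitions.

    ```python
    import torch

    def synchrony_loss(pixel_twists: torch.Tensor) -> torch.Tensor:
        """Hypothetical smoothness term: neighbouring pixels on the same rigid
        surface should agree on (stay 'in sync' with) one motion model.

        pixel_twists: (B, 6, H, W) per-pixel se(3) motion parameters.
        """
        dx = pixel_twists[..., :, 1:] - pixel_twists[..., :, :-1]  # horizontal neighbours
        dy = pixel_twists[..., 1:, :] - pixel_twists[..., :-1, :]  # vertical neighbours
        return dx.abs().mean() + dy.abs().mean()

    def moving_object_detection_loss(pixel_twists: torch.Tensor,
                                     ego_twist: torch.Tensor) -> torch.Tensor:
        """Hypothetical sparsity term: most pixels are static, so their motion
        should match the single ego-motion; large residuals flag moving objects.

        ego_twist: (B, 6) camera motion, broadcast against every pixel.
        """
        residual = pixel_twists - ego_twist[:, :, None, None]
        return residual.abs().mean()
    ```

    Weighted together with a photometric reconstruction loss, terms of this kind would constrain the otherwise under-determined per-pixel motion field while still leaving genuinely moving pixels free to deviate.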

    ClusterSLAM: A SLAM backend for simultaneous rigid body clustering and motion estimation

    We present a practical backend for stereo visual SLAM which can simultaneously discover individual rigid bodies and compute their motions in dynamic environments. While recent factor-graph-based state optimization algorithms have shown their ability to robustly solve SLAM problems by treating dynamic objects as outliers, the motions of those dynamic objects are rarely considered. In this paper, we exploit the consensus of 3D motions among landmarks extracted from the same rigid body to cluster them and to identify static and dynamic objects in a unified manner. Specifically, our algorithm builds a noise-aware motion affinity matrix from the landmarks and uses agglomerative clustering to distinguish rigid bodies. Combined with a decoupled factor graph optimization that revises their shapes and trajectories, this yields an iterative scheme in which cluster assignments and motion estimates update each other reciprocally. Evaluations on both synthetic scenes and KITTI demonstrate the capability of our approach, and further experiments considering online efficiency also show the effectiveness of our method for simultaneously tracking ego-motion and multiple objects.
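    As a rough sketch of the clustering stage (not the paper's actual formulation, which builds its affinity in a noise-aware way from landmark uncertainty), one could score pairwise rigidity by how constant the inter-landmark distances stay over time and feed the resulting matrix to an off-the-shelf agglomerative clusterer:

    ```python
    import numpy as np
    from sklearn.cluster import AgglomerativeClustering  # scikit-learn >= 1.2

    def motion_affinity(tracks: np.ndarray, sigma: float = 0.05) -> np.ndarray:
        """Pairwise motion affinity from landmark tracks.

        tracks: (N, T, 3) triangulated positions of N landmarks over T frames.
        Two landmarks on the same rigid body keep a constant mutual distance,
        so the standard deviation of that distance over time is a rigidity cue.
        """
        diffs = tracks[:, None, :, :] - tracks[None, :, :, :]  # (N, N, T, 3)
        dists = np.linalg.norm(diffs, axis=-1)                 # (N, N, T)
        rigidity = dists.std(axis=-1)                          # low within one body
        return np.exp(-rigidity**2 / (2.0 * sigma**2))         # affinity in (0, 1]

    def cluster_rigid_bodies(tracks: np.ndarray, threshold: float = 0.5) -> np.ndarray:
        """Group landmarks into rigid bodies via agglomerative clustering."""
        dissimilarity = 1.0 - motion_affinity(tracks)
        model = AgglomerativeClustering(n_clusters=None, metric="precomputed",
                                        linkage="average",
                                        distance_threshold=threshold)
        return model.fit_predict(dissimilarity)  # cluster id per landmark
    ```

    In ClusterSLAM itself this assignment step alternates with the decoupled factor graph optimization, so cluster labels and motion estimates refine each other iteratively.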


    Robust Visual Odometry and Dynamic Scene Modelling

    Image-based estimation of camera trajectory, known as visual odometry (VO), has been a popular solution for robot navigation in the past decade due to its low cost and wide applicability. Tracking self-motion as well as the motion of objects in the scene using information from a camera is known as multi-body visual odometry and is a challenging task. The performance of VO is heavily sensitive to poor imaging conditions (e.g., direct sunlight, shadow, and image blur), which limits its feasibility in many challenging scenarios. Current VO solutions can provide accurate camera motion estimation in largely static scenes; however, the deployment of robotic systems in our daily lives requires them to work in significantly more complex, dynamic environments.

    This thesis aims to develop robust VO solutions for two challenging cases, underwater and highly dynamic environments, by extensively analyzing and overcoming the difficulties in both to achieve accurate ego-motion estimation. Furthermore, to better understand and exploit dynamic scene information, this thesis also investigates the motion of moving objects in dynamic scenes and presents a novel way to integrate ego- and object-motion estimation into a single framework.

    In particular, underwater VO is challenging due to poor imaging conditions and inconsistent motion caused by water flow. This thesis intensively tests and evaluates possible solutions to these issues and proposes a stereo underwater VO system that is able to robustly and accurately localize an autonomous underwater vehicle (AUV).

    Visual odometry in dynamic environments is challenging because dynamic parts of the scene violate the static-world assumption fundamental to most classical visual odometry algorithms. If the moving parts of a scene dominate the static parts, off-the-shelf VO systems either fail completely or return poor-quality trajectory estimates. Most existing techniques try to simplify the problem by removing dynamic information. Arguably, in most scenarios the dynamics correspond to a finite number of individual objects that are rigid or piecewise rigid, and their motions can be tracked and estimated in the same way as the ego-motion. With this in mind, the thesis proposes a brand-new way to model and estimate object motion and introduces a novel multi-body VO system that tracks both ego and object motion in dynamic outdoor scenes.

    Building on the proposed multi-body VO framework, this thesis also exploits the spatial and temporal relationships between the camera and object motions, as well as between static and dynamic structures, to obtain more consistent and accurate estimates. To this end, the thesis introduces a novel visual dynamic object-aware SLAM system that achieves robust tracking of multiple moving objects, accurate estimation of full SE(3) object motions, and extraction of the inherent linear velocity of moving objects, together with accurate robot localisation and mapping of the environment structure. The performance of the proposed system is demonstrated on real datasets, showing its capability to resolve rigid object motion estimation and yielding results that outperform state-of-the-art algorithms by an order of magnitude in urban driving scenarios.
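    To make the SE(3) velocity claim concrete: once a per-object rigid motion has been estimated between consecutive frames, a point's linear velocity can be read off the motion directly. The following is an illustrative sketch, not the thesis's exact formulation:

    ```python
    import numpy as np

    def object_point_velocity(H: np.ndarray, point: np.ndarray, dt: float) -> np.ndarray:
        """Linear velocity of a point on a rigid object from its SE(3) motion.

        H: (4, 4) homogeneous transform mapping the point's world position at
           frame k-1 to its world position at frame k.
        point: (3,) world position of the point at frame k-1.
        dt: time between the two frames, in seconds.
        """
        p_prev = np.append(point, 1.0)   # homogeneous coordinates
        p_next = (H @ p_prev)[:3]        # position after the object motion
        return (p_next - point) / dt     # world-frame velocity, m/s
    ```

    Because the velocity of a point cloud's centroid equals the average of the point velocities, averaging this quantity over an object's tracked landmarks gives a stable estimate of the object's linear velocity.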

    Visual slam in dynamic environments

    The problem of visual simultaneous localization and mapping (visual SLAM) consists of localizing a camera within a map that is built online. This technology enables robots to localize themselves in unknown environments and to create a map of the area using only on-board sensors, i.e., without relying on any external infrastructure. Unlike odometry approaches, in which incremental motion is integrated over time, a map allows the sensor to continuously relocalize in the same environment without accumulating drift.

    Assuming that the observed scene is static is common in visual SLAM algorithms. Although the static assumption is valid for some applications, it limits their usefulness in crowded real-world scenes for autonomous driving, service robots, and augmented or virtual reality, among others. Detecting and studying dynamic objects is a requirement for accurately estimating the sensor's position and for building stable maps that are useful for long-term robotic applications.

    The main contributions of this thesis are threefold:
    1. We detect dynamic objects by combining semantic segmentation from deep learning with multi-view geometry approaches. This allows us to achieve camera trajectory estimation accuracy in highly dynamic scenes comparable to that achieved in static environments, and to build 3D maps that contain only the static and stable structure of the environment.
    2. We hallucinate, with realistic images, the static structure of the scene behind dynamic objects. This allows us to provide complete maps with a plausible representation of the scene, without the discontinuities or holes caused by the occlusions of dynamic objects. Visual place recognition is also boosted by these advances in image processing.
    3. We develop a joint framework that solves both the SLAM problem and multi-object tracking, in order to obtain a spatio-temporal map containing information about the sensor trajectory and its surroundings. Understanding the surrounding dynamic objects is of crucial importance for the new requirements of emerging augmented/virtual reality applications and of autonomous navigation.

    These three contributions advance the state of the art in visual SLAM. As a by-product of our research and for the benefit of the scientific community, we have released the code implementing the proposed solutions.
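    A minimal sketch of how the first contribution's two cues might be fused, assuming a per-pixel semantic labelling and a depth-reprojection consistency check are already available (the class list and threshold below are illustrative assumptions, not the thesis's values):

    ```python
    import numpy as np

    # A-priori movable categories; the actual set used in the thesis may differ.
    DYNAMIC_CLASSES = {"person", "car", "bicycle"}

    def dynamic_mask(labels: np.ndarray, class_names: list[str],
                     reproj_depth_error: np.ndarray,
                     error_thresh: float = 0.1) -> np.ndarray:
        """Combine semantic and multi-view geometric cues into one dynamic mask.

        labels: (H, W) integer class ids from a segmentation network.
        class_names: mapping from class id to class name.
        reproj_depth_error: (H, W) |measured depth - depth predicted by warping
            a previous keyframe with the estimated camera pose|, in metres.
        """
        dynamic_ids = [i for i, n in enumerate(class_names) if n in DYNAMIC_CLASSES]
        semantic = np.isin(labels, dynamic_ids)        # movable by category
        geometric = reproj_depth_error > error_thresh  # actually moved between views
        return semantic | geometric  # pixels to exclude from tracking and mapping
    ```

    The semantic cue catches movable objects even while they are momentarily still, while the geometric cue catches motion in categories the network was never trained on; their union is what makes the combination robust.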

    Object-Aware Tracking and Mapping

    Reasoning about the geometric properties of digital cameras and optical physics has enabled researchers to build methods that localise cameras in 3D space from a video stream while, often simultaneously, constructing a model of the environment. Related techniques have evolved substantially since the 1980s, leading to increasingly accurate estimates. Traditionally, however, the quality of results is strongly affected by the presence of moving objects, incomplete data, or difficult surfaces, i.e. surfaces that are not Lambertian or that lack texture. One insight of this work is that these problems can be addressed by going beyond geometrical and optical constraints, in favour of object-level and semantic constraints. Incorporating specific types of prior knowledge into the inference process, such as motion or shape priors, leads to approaches with distinct advantages and disadvantages. After introducing relevant concepts in Chapter 1 and Chapter 2, methods for building object-centric maps in dynamic environments using motion priors are investigated in Chapter 5. Chapter 6 addresses the same problem as Chapter 5, but presents an approach which relies on semantic priors rather than motion cues. To fully exploit semantic information, Chapter 7 discusses the conditioning of shape representations on prior knowledge and the practical application to monocular, object-aware reconstruction systems.

    Active and Physics-Based Human Pose Reconstruction

    Perceiving humans is an important and complex problem within computer vision. Its significance is derived from its numerous applications, such as human-robot interaction, virtual reality, markerless motion capture, and human tracking for autonomous driving. The difficulty lies in the variability in human appearance, physique, and plausible body poses. In real-world scenes, this is further exacerbated by difficult lighting conditions, partial occlusions, and the depth ambiguity stemming from the loss of information during the 3D-to-2D projection. Despite these challenges, significant progress has been made in recent years, primarily due to the expressive power of deep neural networks trained on large datasets. However, creating large-scale datasets with 3D annotations is expensive, and capturing the vast diversity of the real world is demanding. Traditionally, 3D ground truth is captured using motion capture laboratories that require large investments. Furthermore, many laboratories cannot easily accommodate athletic and dynamic motions. This thesis studies three approaches to improving visual perception, with emphasis on human pose estimation, that can complement improvements to the underlying predictor or training data.

    The first two papers present active human pose estimation, where a reinforcement learning agent is tasked with selecting informative viewpoints to reconstruct subjects efficiently. The papers discard the common assumption that the input is given and instead allow the agent to move to observe subjects from desirable viewpoints, e.g., those which avoid occlusions and for which the underlying pose estimator has a low prediction error.

    The third paper introduces the task of embodied visual active learning, which goes further and assumes that the perceptual model is not pre-trained. Instead, the agent is tasked with exploring its environment and requesting annotations to refine its visual model. Learning to explore novel scenarios and efficiently request annotation for new data is a step towards life-long learning, where models can evolve beyond what they learned during the initial training phase. We study the problem for segmentation, though the idea is applicable to other perception tasks.

    Lastly, the final two papers propose improving human pose estimation by integrating physical constraints. These regularize the reconstructed motions to be physically plausible and serve as a complement to current kinematic approaches. Whether a motion has been observed in the training data or not, the predictions should obey the laws of physics. Through integration with a physical simulator, we demonstrate that we can reduce reconstruction artifacts and enforce, e.g., contact constraints.
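    The abstract does not spell out the agent's reward, but a natural choice for the active-viewpoint setting is the improvement in reconstruction error after a move. The following sketch uses mean per-joint position error (MPJPE) purely as an illustration; the papers' actual reward may differ:

    ```python
    import numpy as np

    def viewpoint_reward(pred_joints: np.ndarray, gt_joints: np.ndarray,
                         prev_error: float) -> tuple[float, float]:
        """Hypothetical dense reward for an active-view agent: the reduction
        in mean per-joint position error (MPJPE) achieved by the new viewpoint.

        pred_joints, gt_joints: (J, 3) arrays of 3D joint positions.
        prev_error: MPJPE before the agent moved.
        """
        error = float(np.linalg.norm(pred_joints - gt_joints, axis=-1).mean())
        return prev_error - error, error  # reward > 0 iff the new view helped
    ```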