56,357 research outputs found

    Depth Estimation Using 2D RGB Images

    Get PDF
    Single image depth estimation is an ill-posed problem. That is, it is not mathematically possible to uniquely estimate the 3rd dimension (or depth) from a single 2D image. Hence, additional constraints need to be incorporated in order to regulate the solution space. As a result, in the first part of this dissertation, the idea of constraining the model for more accurate depth estimation by taking advantage of the similarity between the RGB image and the corresponding depth map at the geometric edges of the 3D scene is explored. Although deep learning based methods are very successful in computer vision and handle noise very well, they suffer from poor generalization when the test and train distributions are not close. While, the geometric methods do not have the generalization problem since they benefit from temporal information in an unsupervised manner. They are sensitive to noise, though. At the same time, explicitly modeling of a dynamic scenes as well as flexible objects in traditional computer vision methods is a big challenge. Considering the advantages and disadvantages of each approach, a hybrid method, which benefits from both, is proposed here by extending traditional geometric models’ abilities to handle flexible and dynamic objects in the scene. This is made possible by relaxing geometric computer vision rules from one motion model for some areas of the scene into one for every pixel in the scene. This enables the model to detect even small, flexible, floating debris in a dynamic scene. However, it makes the optimization under-constrained. To change the optimization from under-constrained to over-constrained while maintaining the model’s flexibility, ”moving object detection loss” and ”synchrony loss” are designed. The algorithm is trained in an unsupervised fashion. The primary results are in no way comparable to the current state of the art. Because the training process is so slow, it is difficult to compare it to the current state of the art. Also, the algorithm lacks stability. In addition, the optical flow model is extremely noisy and naive. At the end, some solutions are suggested to address these issues

    Real-time Spatial Detection and Tracking of Resources in a Construction Environment

    Get PDF
    Construction accidents with heavy equipment and bad decision making can be based on poor knowledge of the site environment and in both cases may lead to work interruptions and costly delays. Supporting the construction environment with real-time generated three-dimensional (3D) models can help preventing accidents as well as support management by modeling infrastructure assets in 3D. Such models can be integrated in the path planning of construction equipment operations for obstacle avoidance or in a 4D model that simulates construction processes. Detecting and guiding resources, such as personnel, machines and materials in and to the right place on time requires methods and technologies supplying information in real-time. This paper presents research in real-time 3D laser scanning and modeling using high range frame update rate scanning technology. Existing and emerging sensors and techniques in three-dimensional modeling are explained. The presented research successfully developed computational models and algorithms for the real-time detection, tracking, and three-dimensional modeling of static and dynamic construction resources, such as workforce, machines, equipment, and materials based on a 3D video range camera. In particular, the proposed algorithm for rapidly modeling three-dimensional scenes is explained. Laboratory and outdoor field experiments that were conducted to validate the algorithm’s performance and results are discussed

    A Deep-structured Conditional Random Field Model for Object Silhouette Tracking

    Full text link
    In this work, we introduce a deep-structured conditional random field (DS-CRF) model for the purpose of state-based object silhouette tracking. The proposed DS-CRF model consists of a series of state layers, where each state layer spatially characterizes the object silhouette at a particular point in time. The interactions between adjacent state layers are established by inter-layer connectivity dynamically determined based on inter-frame optical flow. By incorporate both spatial and temporal context in a dynamic fashion within such a deep-structured probabilistic graphical model, the proposed DS-CRF model allows us to develop a framework that can accurately and efficiently track object silhouettes that can change greatly over time, as well as under different situations such as occlusion and multiple targets within the scene. Experiment results using video surveillance datasets containing different scenarios such as occlusion and multiple targets showed that the proposed DS-CRF approach provides strong object silhouette tracking performance when compared to baseline methods such as mean-shift tracking, as well as state-of-the-art methods such as context tracking and boosted particle filtering.Comment: 17 page

    A spatially distributed model for foreground segmentation

    Get PDF
    Foreground segmentation is a fundamental first processing stage for vision systems which monitor real-world activity. In this paper we consider the problem of achieving robust segmentation in scenes where the appearance of the background varies unpredictably over time. Variations may be caused by processes such as moving water, or foliage moved by wind, and typically degrade the performance of standard per-pixel background models. Our proposed approach addresses this problem by modeling homogeneous regions of scene pixels as an adaptive mixture of Gaussians in color and space. Model components are used to represent both the scene background and moving foreground objects. Newly observed pixel values are probabilistically classified, such that the spatial variance of the model components supports correct classification even when the background appearance is significantly distorted. We evaluate our method over several challenging video sequences, and compare our results with both per-pixel and Markov Random Field based models. Our results show the effectiveness of our approach in reducing incorrect classifications
    corecore