
    NON-RIGID MULTI-BODY TRACKING IN RGBD STREAMS

    To efficiently collect training data for an off-the-shelf object detector, we consider the problem of segmenting and tracking non-rigid objects in RGBD sequences by introducing the spatio-temporal matrix, under very few assumptions: no prior object model and no stationary sensor. The spatio-temporal matrix encodes not only spatial associations between multiple objects, but also component-level spatio-temporal associations that allow the correction of falsely segmented objects in the presence of various types of interaction among multiple objects. Extensive experiments on complex human/animal body motions with occlusions and body-part motions demonstrate that our approach substantially improves tracking robustness and segmentation accuracy.
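The paper's spatio-temporal matrix is not specified in the abstract; as a hedged illustration of the general idea of encoding associations between segments across frames, one can build a pairwise association matrix from mask overlap (intersection over union). The function name and the IoU criterion are assumptions for this sketch, not the authors' method.

```python
import numpy as np

def association_matrix(masks_t, masks_t1):
    """Illustrative sketch (not the paper's method): pairwise IoU
    between segment masks of frame t and frame t+1.

    masks_t, masks_t1: lists of boolean arrays of equal shape.
    Returns a (len(masks_t), len(masks_t1)) matrix; high entries
    suggest the same object or component across frames.
    """
    A = np.zeros((len(masks_t), len(masks_t1)))
    for i, m in enumerate(masks_t):
        for j, n in enumerate(masks_t1):
            inter = np.logical_and(m, n).sum()
            union = np.logical_or(m, n).sum()
            A[i, j] = inter / union if union else 0.0
    return A
```

Row-wise maxima of such a matrix give a simple greedy frame-to-frame assignment; component-level correction, as described above, would additionally relate parts of a segment to whole segments.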

    On the Synergies between Machine Learning and Binocular Stereo for Depth Estimation from Images: a Survey

    Stereo matching is one of the longest-standing problems in computer vision, with close to 40 years of study and research. Throughout the years the paradigm has shifted from local, pixel-level decisions to various forms of discrete and continuous optimization to data-driven, learning-based methods. Recently, the rise of machine learning and the rapid proliferation of deep learning have enhanced stereo matching with exciting new trends and applications unthinkable until a few years ago. Interestingly, the relationship between these two worlds is two-way. While machine, and especially deep, learning advanced the state of the art in stereo matching, stereo itself enabled new ground-breaking methodologies such as self-supervised monocular depth estimation based on deep networks. In this paper, we review recent research in the field of learning-based depth estimation from single and binocular images, highlighting the synergies, the successes achieved so far, and the open challenges the community is going to face in the immediate future.
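The "local, pixel-level decision" paradigm the survey starts from can be sketched concretely: classic winner-takes-all block matching picks, for each pixel, the disparity minimising the sum of absolute differences (SAD) over a small window. This is a minimal didactic sketch, not any specific method from the survey; the window looping is deliberately naive for clarity.

```python
import numpy as np

def sad_disparity(left, right, max_disp=16, win=2):
    """Winner-takes-all local block matching: for each pixel, choose
    the disparity minimising SAD over a (2*win+1)^2 window."""
    h, w = left.shape
    best = np.zeros((h, w), np.int64)
    best_cost = np.full((h, w), np.inf)
    L = np.pad(left.astype(np.float64), win, mode='edge')
    R = np.pad(right.astype(np.float64), win, mode='edge')
    for d in range(max_disp):
        # shift the right image by the candidate disparity and compare
        ad = np.abs(L - np.roll(R, d, axis=1))
        cost = np.zeros((h, w))
        for y in range(h):          # naive box filter; cumulative sums
            for x in range(w):      # would be far faster in practice
                cost[y, x] = ad[y:y + 2 * win + 1, x:x + 2 * win + 1].sum()
        upd = cost < best_cost
        best[upd] = d
        best_cost[upd] = cost[upd]
    return best
```

Optimization-based and learning-based methods replace this independent per-pixel decision with, respectively, smoothness-regularized energies and learned matching costs.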

    First order augmentation to tensor voting for boundary inference and multiscale analysis in 3D

    Most computer vision applications require the reliable detection of boundaries. In the presence of outliers, missing data, orientation discontinuities, and occlusion, this problem is particularly challenging. We propose to address it by complementing the tensor voting framework, which was limited to second-order properties, with a first-order representation and voting. First-order voting fields and a mechanism to vote for 3D surface and volume boundaries and curve endpoints in 3D are defined. Boundary inference is also useful for a second difficult problem in grouping, namely automatic scale selection. We propose an algorithm that automatically infers the smallest scale that can preserve the finest details. Our algorithm then proceeds with progressively larger scales to ensure continuity where it has not been achieved. Therefore, the proposed approach does not oversmooth features or delay the handling of boundaries and discontinuities until model misfit occurs. The interaction of smooth features, boundaries, and outliers is accommodated by the unified representation, making possible the perceptual organization of data in curves, surfaces, volumes, and their boundaries simultaneously. We present results on a variety of data sets to show the efficacy of the improved formalism.
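The intuition behind first-order (polarity) votes can be sketched without the full voting fields: an interior point receives directions from neighbours on all sides, which roughly cancel, while a boundary point or curve endpoint has neighbours on one side only, yielding a large resultant vector. The following is a simplified illustration under that assumption, not the paper's actual first-order voting fields.

```python
import numpy as np

def first_order_saliency(points, idx, radius=1.5):
    """Simplified polarity vote: sum of unit vectors from point `idx`
    toward its neighbours within `radius`. A large resultant norm
    suggests a boundary point or endpoint; a small one, an interior
    point whose neighbour directions cancel."""
    p = points[idx]
    votes = np.zeros(points.shape[1])
    for q in points:
        d = q - p
        r = np.linalg.norm(d)
        if 0 < r <= radius:   # skip the point itself
            votes += d / r
    return np.linalg.norm(votes)
```

Second-order votes, by contrast, accumulate outer products of such directions and so are blind to this one-sidedness, which is exactly the gap the first-order augmentation fills.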

    Towards urban 3D reconstruction from video

    The paper introduces a data collection system and a processing pipeline for automatic geo-registered 3D reconstruction of urban scenes from video. The system collects multiple video streams, as well as GPS and INS measurements, in order to place the reconstructed models in geo-registered coordinates. Besides high quality in terms of both geometry and appearance, we aim at real-time performance. Even though our processing pipeline is currently far from real-time, we select techniques and design processing modules that can achieve fast performance on multiple CPUs and GPUs, with real-time operation as a near-term goal. We present the main considerations in designing the system and the steps of the processing pipeline. We show results on real video sequences captured by our system.

    Detailed real-time urban 3D reconstruction from video

    The paper presents a system for automatic, geo-registered, real-time 3D reconstruction of urban scenes from video. The system collects video streams, as well as GPS and inertial measurements, in order to place the reconstructed models in geo-registered coordinates. It is designed using current state-of-the-art real-time modules for all processing steps. It employs commodity graphics hardware and standard CPUs to achieve real-time performance. We present the main considerations in designing the system and the steps of the processing pipeline. Our system extends existing algorithms to meet the robustness and variability necessary to operate out of the lab. To account for the large dynamic range of outdoor videos, the processing pipeline estimates global camera gain changes in the feature tracking stage and efficiently compensates for these in stereo estimation without impacting the real-time performance. The required accuracy for many applications is achieved with a two-step stereo reconstruction process exploiting the redundancy across frames. We show results on real video sequences comprising hundreds of thousands of frames.
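The global gain compensation described above can be illustrated with a simple least-squares sketch: intensities sampled at tracked features in consecutive frames determine a single scalar gain, which is divided out before computing photometric stereo costs. The function names and the least-squares formulation are assumptions for illustration; the paper's actual estimator is not reproduced here.

```python
import numpy as np

def estimate_gain(prev_patches, cur_patches):
    """Estimate a single global gain g between frames from intensity
    patches at tracked features, minimising ||cur - g * prev||^2.
    The closed-form solution is g = <cur, prev> / <prev, prev>."""
    prev = np.concatenate([p.ravel() for p in prev_patches]).astype(np.float64)
    cur = np.concatenate([p.ravel() for p in cur_patches]).astype(np.float64)
    return float(cur @ prev / (prev @ prev))

def compensate(frame, gain):
    """Divide out the estimated gain so stereo cost functions compare
    photometrically consistent intensities across frames."""
    return frame / gain
```

Because the estimate reuses patches the tracker already extracts, such a scheme adds negligible cost, which is consistent with the abstract's claim of no impact on real-time performance.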

    Real-Time Video-Based Reconstruction of Urban Environments

    We present an approach for automatic 3D reconstruction of outdoor scenes using computer vision techniques. Our system collects video, GPS, and INS data, which are processed in real time to produce geo-registered, detailed 3D models that represent the geometry and appearance of the world. These models are generated without manual measurements or markers in the scene and can be used for visualization from arbitrary viewpoints, and for documentation and archiving of large areas. Our system consists of a data acquisition system and a processing system that generates 3D models from the video sequences off-line but in real time. The GPS/INS measurements allow us to geo-register the pose of the camera at the time each frame was captured. The following stages of the processing pipeline perform dense reconstruction and generate the 3D models, which are in the form of a triangular mesh and a set of images that provide texture. By leveraging the processing power of the GPU, we are able to achieve faster than real-time performance, while maintaining an accuracy of a few cm.
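The triangular-mesh output mentioned above is commonly obtained by back-projecting a dense depth map through the camera intrinsics and triangulating the pixel grid, two triangles per 2x2 cell. The sketch below shows that standard construction under a simple pinhole model; it is an illustration of the mesh format, not the paper's reconstruction pipeline, and omits the depth-discontinuity handling a real system needs.

```python
import numpy as np

def depth_to_mesh(depth, fx, fy, cx, cy):
    """Back-project a depth map through pinhole intrinsics (fx, fy,
    cx, cy) to 3D vertices, and triangulate the pixel grid with two
    triangles per 2x2 cell. Returns (vertices, faces)."""
    h, w = depth.shape
    v, u = np.mgrid[0:h, 0:w]
    X = (u - cx) * depth / fx
    Y = (v - cy) * depth / fy
    verts = np.stack([X, Y, depth], axis=-1).reshape(-1, 3)
    faces = []
    for y in range(h - 1):
        for x in range(w - 1):
            i = y * w + x
            faces.append((i, i + 1, i + w))          # upper-left triangle
            faces.append((i + 1, i + w + 1, i + w))  # lower-right triangle
    return verts, np.array(faces)
```

Texturing such a mesh then amounts to assigning each triangle the image region it projects into, matching the "triangular mesh plus texture images" representation described in the abstract.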