
    Depth Enhancement and Surface Reconstruction with RGB/D Sequence

    Surface reconstruction and 3D modeling are challenging tasks that have been explored for decades by the computer vision, computer graphics, and machine learning communities. They are fundamental to many applications such as robot navigation, animation and scene understanding, industrial control, and medical diagnosis. In this dissertation, I take advantage of consumer depth sensors for surface reconstruction. Because such sensors capture detailed surface geometry poorly, a depth enhancement approach is first proposed to recover fine geometric details from the captured depth and color sequences. In addition to enhancing the spatial resolution, I present a hybrid camera that improves the temporal resolution of a consumer depth sensor and propose an optimization framework to capture high-speed motion and generate high-speed depth streams. Given partial scans from the depth sensor, I also develop a novel fusion approach that builds complete, watertight human models with a template-guided registration method. Finally, the problem of surface reconstruction for non-Lambertian objects, on which current depth sensors fail, is addressed by exploiting multi-view images captured with a hand-held color camera, and I propose a visual-hull-based approach to recover the 3D model.
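
    As a rough illustration of the visual-hull idea mentioned above (a minimal sketch, not the dissertation's actual method; the function name, grid parameters, and input conventions are assumptions), silhouette carving can be written in a few lines of NumPy: a voxel survives only if it projects inside the object silhouette in every calibrated view.

        import numpy as np

        def carve_visual_hull(silhouettes, projections, grid_min, grid_max, resolution=64):
            """Carve a voxel grid: a voxel is kept only if it projects inside every silhouette.

            silhouettes: list of HxW boolean masks (True = object pixel).
            projections: list of 3x4 camera projection matrices, one per silhouette.
            grid_min, grid_max: 3-tuples bounding the reconstruction volume.
            """
            axes = [np.linspace(lo, hi, resolution) for lo, hi in zip(grid_min, grid_max)]
            xs, ys, zs = np.meshgrid(*axes, indexing="ij")
            voxels = np.stack([xs, ys, zs, np.ones_like(xs)], axis=-1).reshape(-1, 4)  # homogeneous coords
            occupied = np.ones(len(voxels), dtype=bool)

            for mask, P in zip(silhouettes, projections):
                uvw = voxels @ P.T                              # project every voxel into this view
                z = uvw[:, 2]
                in_front = z > 1e-9
                u = np.zeros(len(voxels), dtype=int)
                v = np.zeros(len(voxels), dtype=int)
                u[in_front] = np.round(uvw[in_front, 0] / z[in_front]).astype(int)
                v[in_front] = np.round(uvw[in_front, 1] / z[in_front]).astype(int)
                inside = in_front & (u >= 0) & (u < mask.shape[1]) & (v >= 0) & (v < mask.shape[0])
                hit = np.zeros(len(voxels), dtype=bool)
                hit[inside] = mask[v[inside], u[inside]]
                occupied &= hit                                 # carve away voxels outside any silhouette
            return occupied.reshape(resolution, resolution, resolution)

    The surviving voxels form a conservative outer bound on the object's shape, which is the defining property of a visual hull.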

    3D scanning of cultural heritage with consumer depth cameras

    Three-dimensional reconstruction of cultural heritage objects is an expensive and time-consuming process. Recent consumer real-time depth acquisition devices, like the Microsoft Kinect, allow very fast and simple acquisition of 3D views. However, 3D scanning with such devices is a challenging task due to the limited accuracy and reliability of the acquired data. This paper introduces a 3D reconstruction pipeline suited to using consumer depth cameras as hand-held scanners for cultural heritage objects. Several new contributions have been made to achieve this result. They include an ad-hoc filtering scheme that exploits a model of the error in the acquired data and a novel algorithm for the extraction of salient points that exploits both depth and color data. The salient points are then used within a modified version of the ICP algorithm that exploits both geometry and color distances to precisely align the views even when geometry information alone is not sufficient to constrain the registration. The proposed method, although applicable to generic scenes, has been tuned to the acquisition of sculptures, and in this setting its performance, as the experimental results indicate, is rather compelling.
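
    To make the combined geometry-and-colour distance concrete (a minimal sketch under assumed conventions, not the paper's modified ICP; the colour weight and function names are hypothetical), correspondences can be searched in a joint 6-D space where colour disambiguates matches on geometrically featureless surfaces:

        import numpy as np
        from scipy.spatial import cKDTree

        def joint_correspondences(src_xyz, src_rgb, dst_xyz, dst_rgb, color_weight=0.1):
            """Find correspondences in a joint geometry+colour space.

            Each point is embedded as [x, y, z, w*r, w*g, w*b], so colour contributes to the
            nearest-neighbour distance and helps align flat or symmetric regions where
            geometry alone under-constrains the registration.
            """
            src = np.hstack([src_xyz, color_weight * src_rgb])
            dst = np.hstack([dst_xyz, color_weight * dst_rgb])
            tree = cKDTree(dst)
            dist, idx = tree.query(src)   # nearest neighbour of each source point in the 6-D space
            return idx, dist

    Such correspondences would then feed a standard ICP update step; the weight controls how strongly colour differences penalise a candidate match.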

    PourIt!: Weakly-supervised Liquid Perception from a Single Image for Visual Closed-Loop Robotic Pouring

    Liquid perception is critical for robotic pouring tasks; it usually requires robust visual detection of flowing liquid. However, while recent works have shown promising results in liquid perception, they typically require labeled data for model training, a process that is both time-consuming and reliant on human labor. To this end, this paper proposes a simple yet effective framework, PourIt!, to serve as a tool for robotic pouring tasks. We design a simple data collection pipeline that only needs image-level labels, reducing the reliance on tedious pixel-wise annotations. A binary classification model is then trained to generate a Class Activation Map (CAM) that focuses on the visual difference between the two kinds of collected data, i.e., the presence or absence of flowing liquid. We also devise a feature-contrast strategy to improve the quality of the CAM so that it covers the actual liquid regions fully and tightly. The container pose is then used to facilitate recovery of a 3D point cloud of the detected liquid region. Finally, the liquid-to-container distance is computed for visual closed-loop control of the physical robot. To validate the effectiveness of our proposed method, we also contribute a novel dataset for this task, named the PourIt! dataset. Extensive results on this dataset and on a physical Franka robot show the utility and effectiveness of our method in robotic pouring tasks. Our dataset, code, and pre-trained models will be available on the project page. (Comment: ICCV 2023.)
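
    The CAM construction referred to above follows the standard global-average-pooling recipe; a minimal NumPy sketch is given below (the array shapes and names are assumptions, and the paper's feature-contrast strategy is not reproduced here):

        import numpy as np

        def class_activation_map(feature_maps, class_weights):
            """Compute a Class Activation Map from the last convolutional features and the
            linear-classifier weights of the positive ('liquid present') output.

            feature_maps:  array of shape (K, H, W) from the final convolutional layer.
            class_weights: array of shape (K,), the classifier weights for the positive class.
            """
            cam = np.tensordot(class_weights, feature_maps, axes=1)  # weighted sum over channels
            cam = np.maximum(cam, 0.0)                               # keep positive evidence only
            if cam.max() > 0:
                cam = cam / cam.max()                                # normalise to [0, 1]
            return cam  # upsample to image resolution before thresholding into a liquid mask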

    Towards Intelligent Telerobotics: Visualization and Control of Remote Robot

    Human-machine cooperative robotics, or co-robotics, has been recognized as the next generation of robotics. In contrast to current systems that use limited-reasoning strategies or address problems in narrow contexts, new co-robot systems will be characterized by their flexibility, resourcefulness, varied modeling and reasoning approaches, and use of real-world data in real time, demonstrating a level of intelligence and adaptability seen in humans and animals. My research focuses on two sub-fields of co-robotics: teleoperation and telepresence. We first explore teleoperation using mixed-reality techniques. I propose a new type of display, the hybrid-reality display (HRD), which uses a commodity projection device to project captured video frames onto a 3D replica of the actual target surface. It provides a direct alignment between the frame of reference of the human subject and that of the displayed image. The advantage of this approach is that users need no wearable device, so the system is minimally intrusive and accommodates the users' eyes during focusing; the field of view is also significantly increased. From a user-centered design standpoint, the HRD is motivated by teleoperation accidents, incidents, and user research in military reconnaissance and similar domains, where teleoperation is compromised by the keyhole effect caused by a limited field of view. The technical contribution of the proposed HRD system is the multi-system calibration, which mainly involves the motion sensor, projector, cameras, and robotic arm; given the purpose of the system, the calibration accuracy must be kept within the millimeter level. Follow-up research on the HRD focuses on high-accuracy 3D reconstruction of the replica with commodity devices for better alignment of the video frames. Conventional 3D scanners either lack depth resolution or are very expensive, so we propose a structured-light 3D sensing system with accuracy within 1 millimeter that is robust to global illumination and surface reflections. Extensive user studies demonstrate the performance of our proposed algorithm. To compensate for the desynchronization between the local and remote stations caused by latency in data sensing and communication, a one-step-ahead predictive control algorithm is presented. The latency between human control and robot movement can be formulated as a system of linear equations with a smoothing coefficient ranging from 0 to 1, and the predictive control algorithm can be further formulated as the optimization of a cost function. We then explore the aspect of telepresence. Many hardware designs have been developed to allow a camera to be placed optically directly behind the screen; the purpose of such setups is to enable two-way video teleconferencing that maintains eye contact. However, the image from the see-through camera usually exhibits a number of imaging artifacts such as a low signal-to-noise ratio, incorrect color balance, and loss of detail. We therefore develop a novel image enhancement framework that uses an auxiliary color+depth camera mounted on the side of the screen. By fusing the information from both cameras, we are able to significantly improve the quality of the see-through image. Experimental results demonstrate that our fusion method compares favorably against traditional image enhancement and warping methods that use only a single image.
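
    The one-step-ahead predictor with a smoothing coefficient between 0 and 1 can be sketched as a simple linear blend of the commanded and last observed states (only a schematic reading of the abstract, not the dissertation's actual formulation; names and the state representation are hypothetical):

        def predict_next_state(observed_state, commanded_state, alpha):
            """One-step-ahead prediction blending the last observed remote state with the
            locally commanded state; alpha in [0, 1] trades responsiveness against noise.
            """
            assert 0.0 <= alpha <= 1.0
            return [alpha * c + (1.0 - alpha) * o
                    for c, o in zip(commanded_state, observed_state)]

        # Hypothetical usage: predict the remote arm pose one control step ahead
        # predicted = predict_next_state(observed_state=[0.10, 0.25, 0.40],
        #                                commanded_state=[0.12, 0.24, 0.45],
        #                                alpha=0.7)

    In such a scheme the coefficient itself could be chosen by minimising a cost over past prediction errors, matching the abstract's note that the predictor is obtained by optimizing a cost function.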

    The Surprising Effectiveness of Diffusion Models for Optical Flow and Monocular Depth Estimation

    Denoising diffusion probabilistic models have transformed image generation with their impressive fidelity and diversity. We show that they also excel in estimating optical flow and monocular depth, surprisingly, without the task-specific architectures and loss functions that are predominant for these tasks. Compared to the point estimates of conventional regression-based methods, diffusion models also enable Monte Carlo inference, e.g., capturing uncertainty and ambiguity in flow and depth. With self-supervised pre-training, the combined use of synthetic and real data for supervised training, technical innovations (infilling and step-unrolled denoising diffusion training) to handle noisy, incomplete training data, and a simple form of coarse-to-fine refinement, one can train state-of-the-art diffusion models for depth and optical flow estimation. Extensive experiments focus on quantitative performance against benchmarks, ablations, and the model's ability to capture uncertainty and multimodality and to impute missing values. Our model, DDVM (Denoising Diffusion Vision Model), obtains a state-of-the-art relative depth error of 0.074 on the indoor NYU benchmark and an Fl-all outlier rate of 3.26% on the KITTI optical flow benchmark, about 25% better than the best published method. For an overview see https://diffusion-vision.github.io
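
    One simple way to supervise a denoising model on incomplete depth maps is to restrict the loss to pixels with valid ground truth; the sketch below illustrates only that masking idea (it is not DDVM's infilling or step-unrolled training procedure, and all names are assumptions):

        import numpy as np

        def masked_denoising_loss(predicted_noise, true_noise, valid_mask):
            """Denoising loss restricted to pixels with valid ground-truth depth.

            Real depth maps contain holes from sensor dropouts; supervising only where
            valid_mask is True is one straightforward way to cope with noisy,
            incomplete training data.
            predicted_noise, true_noise: (H, W) arrays; valid_mask: (H, W) boolean array.
            """
            diff = (predicted_noise - true_noise) ** 2
            return diff[valid_mask].mean() if valid_mask.any() else 0.0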

    Polarimetric Pose Prediction

    Light has many properties that vision sensors can passively measure. Colour, i.e., wavelength separated into bands, and intensity are arguably the most commonly used for monocular 6D object pose estimation. This paper explores how complementary polarisation information, i.e. the orientation of light-wave oscillations, influences the accuracy of pose predictions. A hybrid model that leverages physical priors jointly with a data-driven learning strategy is designed and carefully tested on objects with different levels of photometric complexity. Our design significantly improves pose accuracy compared to state-of-the-art photometric approaches and enables object pose estimation for highly reflective and transparent objects. A new multi-modal, instance-level 6D object pose dataset with highly accurate pose annotations for multiple objects of varying photometric complexity is introduced as a benchmark. (Comment: Accepted at ECCV 2022; 25 pages: 14 pages main paper + references + 7 pages appendix.)
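
    The polarisation cues such methods build on are commonly summarised by the degree and angle of linear polarisation computed from four polariser-angle images; a minimal sketch of that standard computation follows (this is textbook material, not the paper's hybrid model; function and variable names are assumptions):

        import numpy as np

        def stokes_from_polarizer_images(i0, i45, i90, i135):
            """Recover linear Stokes parameters, degree and angle of linear polarisation
            from four intensity images captured behind polarisers at 0/45/90/135 degrees."""
            s0 = 0.5 * (i0 + i45 + i90 + i135)                           # total intensity
            s1 = i0 - i90
            s2 = i45 - i135
            dolp = np.sqrt(s1 ** 2 + s2 ** 2) / np.maximum(s0, 1e-8)     # degree of linear polarisation
            aolp = 0.5 * np.arctan2(s2, s1)                              # angle of linear polarisation
            return s0, s1, s2, dolp, aolp

    The angle of linear polarisation is the physical quantity that relates to surface orientation, which is why it is attractive as a prior for reflective and transparent objects.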

    Sensing of complex buildings and reconstruction into photo-realistic 3D models

    The 3D reconstruction of indoor and outdoor environments has received interest only recently, as companies began to recognize that reconstructed models are a way to generate revenue through location-based services and advertisements. A great amount of research has been done in the field of 3D reconstruction, and one of the latest and most promising applications is Kinect Fusion, developed by Microsoft Research. Its strong points are real-time, intuitive 3D reconstruction at an interactive frame rate, the level of detail in the models, and the availability of the hardware and software to researchers and enthusiasts. A representative effort towards 3D reconstruction is the Point Cloud Library (PCL), a large-scale open project for 2D/3D image and point cloud processing. In December 2011, PCL made available an implementation of Kinect Fusion, namely KinFu, which emulates the functionality provided by Kinect Fusion. However, both implementations have two major limitations. First, real-time reconstruction takes place only within a cube measuring 3 meters per axis; the cube's position is fixed at the start of execution, any object outside it is not integrated into the reconstructed model, and so the volume that can be scanned is always limited by the size of the cube. It is possible to manually align many small cubes into a single large model, but this is a time-consuming and difficult task, especially when the meshes have complex topologies and high polygon counts, as is the case with meshes obtained from KinFu. Second, the output mesh has no color textures. There are some attempts to add color to the output point cloud; however, the resulting effect is not photo-realistic, and applying photo-realistic textures to a model can enhance the user experience even when the model has a simple topology. The main goal of this project is to design and implement a system that captures large indoor environments and generates photo-realistic large indoor 3D models in real time. This report describes an extended version of the KinFu system whose extensions overcome the scalability and texture-reconstruction limitations using commodity hardware and open-source software. The complete hardware setup used in this project is worth €2,000, which is comparable to the cost of a single professional laser scanner. The software is released under the BSD license, which makes it completely free to use and commercialize. The system has been integrated into the open-source PCL project. The immediate benefits are three-fold: the system becomes a potential industry standard, it is maintained and extended by many developers around the world at no additional cost to the VCA group, and it can reduce application development time by reusing numerous state-of-the-art algorithms.
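
    The scalability extension hinges on letting the reconstruction cube follow the camera instead of staying fixed; the sketch below shows one way such a shift could be triggered (a simplified illustration, not the actual KinFu/PCL implementation; the names, margin parameter, and shifting policy are assumptions):

        import numpy as np

        def maybe_shift_volume(cube_origin, cube_size, camera_position, margin=0.5):
            """Recentre the reconstruction cube when the camera comes within `margin`
            metres of one of its faces, so scanning is not confined to a single fixed cube.

            Returns the (possibly shifted) cube origin and the translation applied; in a
            full system, voxels leaving the cube would be streamed out to a growing
            world model rather than discarded.
            """
            cube_origin = np.asarray(cube_origin, dtype=float)
            camera_position = np.asarray(camera_position, dtype=float)
            centre = cube_origin + cube_size / 2.0
            offset = camera_position - centre
            # Shift only along axes where the camera is too close to a cube face.
            shift = np.where(np.abs(offset) > (cube_size / 2.0 - margin), offset, 0.0)
            return cube_origin + shift, shift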