1,655 research outputs found

    From light rays to 3D models

    Get PDF

    Single View Modeling and View Synthesis

    Get PDF
    This thesis develops new algorithms to produce 3D content from a single camera. Today, amateurs can use hand-held camcorders to capture the 3D world and display it in 2D, using mature technologies. However, there is a strong desire to record and re-explore the 3D world in 3D. To achieve this goal, current approaches usually rely on a camera array, which suffers from tedious setup and calibration processes as well as a lack of portability, limiting its application to lab experiments. In this thesis, I produce 3D content using a single camera, making the process as simple as shooting pictures. This requires a new front-end capture device rather than a regular camcorder, as well as more sophisticated algorithms. First, in order to capture highly detailed object surfaces, I designed and developed a depth camera based on a novel technique called light fall-off stereo (LFS). The LFS depth camera outputs color+depth image sequences at 30 fps, which is necessary for capturing dynamic scenes. Based on the output color+depth images, I developed a new approach that builds 3D models of dynamic and deformable objects. While the camera can only capture part of an object at any instant, partial surfaces are assembled into a complete 3D model by a novel warping algorithm. Inspired by the success of single-view 3D modeling, I extended my exploration to 2D-to-3D video conversion that does not use a depth camera. I developed a semi-automatic system that converts monocular videos into stereoscopic videos via view synthesis. It combines motion analysis with user interaction, aiming to transfer as much of the depth-inference work as possible from the user to the computer. I developed two new methods that analyze optical flow in order to provide additional qualitative depth constraints. The automatically extracted depth information is presented in the user interface to assist the user's labeling work. In summary, this thesis develops new algorithms to produce 3D content from a single camera. Depending on the input data, my algorithms can build high-fidelity 3D models of dynamic and deformable objects when depth maps are provided; otherwise, they can turn video clips into stereoscopic videos.
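    To make the light fall-off idea concrete, below is a minimal sketch of the inverse-square relationship that light fall-off stereo exploits: two images lit by a point source at two known positions along the optical axis give an intensity ratio that depends only on depth, not on surface albedo. The function name, the baseline value, and the toy inputs are illustrative, not from the thesis.

```python
import numpy as np

def lfs_depth(i_near, i_far, baseline):
    """Per-pixel depth from two images lit by a point source at two known
    positions along the optical axis (the light fall-off stereo idea).

    Inverse-square law: i_near = a / r**2 and i_far = a / (r + baseline)**2,
    so the intensity ratio cancels the unknown albedo a:
        sqrt(i_near / i_far) = (r + baseline) / r  =>  r = baseline / (ratio - 1)
    """
    ratio = np.sqrt(i_near / np.maximum(i_far, 1e-9))
    return baseline / np.maximum(ratio - 1.0, 1e-9)

# Toy check: a surface 2 m away, light moved back by 0.5 m.
r_true, base = 2.0, 0.5
i1, i2 = 1.0 / r_true**2, 1.0 / (r_true + base)**2
print(lfs_depth(np.array(i1), np.array(i2), base))  # ~2.0
```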

    Occlusion-Aware Multi-View Reconstruction of Articulated Objects for Manipulation

    Get PDF
    The goal of this research is to develop algorithms that use multiple views to automatically recover complete 3D models of articulated objects in unstructured environments, thereby enabling a robotic system to manipulate those objects. First, an algorithm called Procrustes-Lo-RANSAC (PLR) is presented. Structure-from-motion techniques are used to capture 3D point cloud models of an articulated object in two different configurations. Procrustes analysis, combined with a locally optimized RANSAC sampling strategy, facilitates a straightforward geometric approach to recovering the joint axes and automatically classifying them as either revolute or prismatic. The algorithm requires no prior knowledge of the object, nor does it make any assumptions about the planarity of the object or scene. Second, with the resulting articulated model, a robotic system can manipulate the object along its joint axes at a specified grasp point to exercise its degrees of freedom, or move its end effector to a particular position even if that point is not visible in the current view. This is one of the main advantages of the occlusion-aware approach: the models capture all sides of the object, so the robot has knowledge of parts that are not visible in the current view. Experiments with a PUMA 500 robotic arm demonstrate the effectiveness of the approach on a variety of real-world objects containing both revolute and prismatic joints. Third, we improve the approach by using an RGB-D sensor (Microsoft Kinect) that yields a depth value for each pixel directly, rather than requiring correspondences to establish depth. The KinectFusion algorithm is applied to produce a single high-quality, geometrically accurate 3D model from which the rigid links of the object are segmented and aligned, allowing the joint axes to be estimated using the same geometric approach. The improved algorithm requires no artificial markers attached to objects, yields much denser 3D models, and reduces computation time.
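    The geometric core of the axis-recovery step can be illustrated with a short sketch: align one link observed in two configurations with a Procrustes (Kabsch) fit, then read the joint type and axis off the recovered rigid transform. This is a generic illustration assuming known point-to-point correspondences in the base link's frame, not the PLR implementation (which adds the locally optimized RANSAC sampling).

```python
import numpy as np

def kabsch(p, q):
    """Rigid transform (r, t) aligning point set p onto q (rows are points)."""
    pc, qc = p - p.mean(0), q - q.mean(0)
    u, _, vt = np.linalg.svd(pc.T @ qc)
    d = np.sign(np.linalg.det(vt.T @ u.T))          # guard against reflections
    r = vt.T @ np.diag([1.0, 1.0, d]) @ u.T
    return r, q.mean(0) - r @ p.mean(0)

def classify_joint(link_pts_cfg1, link_pts_cfg2, angle_tol=np.deg2rad(5)):
    """Classify a joint as revolute or prismatic from one link observed in two
    configurations; both point sets must correspond row-by-row."""
    r, t = kabsch(link_pts_cfg1, link_pts_cfg2)
    angle = np.arccos(np.clip((np.trace(r) - 1.0) / 2.0, -1.0, 1.0))
    if angle < angle_tol:                           # no rotation: sliding joint
        return "prismatic", t / np.linalg.norm(t)   # axis = translation direction
    w, v = np.linalg.eig(r)                         # rotation axis = eigvec of eigval 1
    axis = np.real(v[:, np.argmin(np.abs(w - 1.0))])
    return "revolute", axis / np.linalg.norm(axis)
```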

    RGB-D And Thermal Sensor Fusion: A Systematic Literature Review

    Full text link
    In the last decade, the computer vision field has seen significant progress in multimodal data fusion and learning, where multiple sensors, including depth, infrared, and visual, are used to capture the environment across diverse spectral ranges. Despite these advancements, there has been no systematic and comprehensive evaluation of fusing RGB-D and thermal modalities to date. While autonomous driving using LiDAR, radar, RGB, and other sensors has garnered substantial research interest, along with the fusion of RGB and depth modalities, the integration of thermal cameras and, specifically, the fusion of RGB-D and thermal data have received comparatively less attention. This might be partly due to the limited number of publicly available datasets for such applications. This paper provides a comprehensive review of both state-of-the-art and traditional methods used in fusing RGB-D and thermal camera data for various applications, such as site inspection, human tracking, fault detection, and others. The reviewed literature has been categorised into technical areas, such as 3D reconstruction, segmentation, object detection, available datasets, and other related topics. Following a brief introduction and an overview of the methodology, the study delves into calibration and registration techniques, then examines thermal visualisation and 3D reconstruction, before discussing the application of classic feature-based techniques as well as modern deep learning approaches. The paper concludes with a discourse on current limitations and potential future research directions. It is hoped that this survey will serve as a valuable reference for researchers looking to familiarise themselves with the latest advancements and contribute to the RGB-DT research field.
    Comment: 33 pages, 20 figures
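    As a concrete instance of the calibration-and-registration step such surveys cover, here is a hedged sketch of the standard depth-assisted reprojection used to overlay thermal data on an RGB-D frame: back-project each depth pixel to 3D, transform it with the depth-to-thermal extrinsics, and project it into the thermal image. The intrinsics and extrinsics are placeholders that would come from a prior calibration.

```python
import numpy as np

def register_thermal_to_depth(depth, k_depth, k_thermal, r, t):
    """Map each depth pixel into thermal image coordinates.
    depth: HxW metric depth; k_*: 3x3 intrinsics; r, t: depth-to-thermal extrinsics.
    Returns an HxW x 2 array of thermal (u, v) coordinates per depth pixel."""
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    pix = np.stack([u, v, np.ones_like(u)], axis=-1).reshape(-1, 3).T  # 3 x N
    rays = np.linalg.inv(k_depth) @ pix              # back-project to unit-depth rays
    pts = rays * depth.reshape(1, -1)                # scale by depth -> 3D points
    pts_th = r @ pts + t.reshape(3, 1)               # into the thermal camera frame
    proj = k_thermal @ pts_th
    uv = proj[:2] / np.maximum(proj[2:], 1e-9)       # perspective divide
    return uv.T.reshape(h, w, 2)
```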

    Image-based 3-D reconstruction of constrained environments

    Get PDF
    Nuclear power plays an important role in the United Kingdom's electricity generation infrastructure, providing a reliable baseload of low-carbon electricity. The Advanced Gas-cooled Reactor (AGR) design makes up approximately 50% of the existing fleet; however, many of the operating reactors have exceeded their original design lifetimes. To ensure safe reactor operation, engineers perform periodic in-core visual inspections of reactor components to monitor the structural health of the core as it ages. However, the inspection mechanisms currently deployed provide limited structural information about the fuel channel or defects. This thesis investigates the suitability of image-based 3-D reconstruction techniques for acquiring 3-D structural geometry, enabling improved diagnostic and prognostic abilities for inspection engineers. Applying image-based 3-D reconstruction to in-core inspection footage highlights significant challenges, most predominantly that the image saliency proves insufficient for general reconstruction frameworks. The contribution of the thesis is threefold. First, a novel semi-dense matching scheme which exploits sparse and dense image correspondence in combination with a novel intra-image region strength approach to improve the stability of the correspondence between images; this yields a 138.53% increase in correct feature matches over similar state-of-the-art image matching paradigms. Second, a bespoke incremental Structure-from-Motion (SfM) framework called Constrained Homogeneous SfM (CH-SfM), which is able to derive structure from deficient feature spaces and constrained environments. Third, the application of the CH-SfM framework to remote visual inspection footage gathered within AGR fuel channels, outperforming other state-of-the-art reconstruction approaches and extracting representative 3-D structural geometry from orientational scans and fully circumferential reconstructions. This is demonstrated on in-core and laboratory footage, achieving approximate 3-D point densities of 2.785 and 23.8025 NX/cm² for real in-core inspection footage and high-quality laboratory footage respectively. The demonstrated novelties are applicable to other constrained or feature-poor environments, with future work looking to produce fully dense, photo-realistic 3-D reconstructions.
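    For intuition about combining sparse and dense correspondence, the sketch below pairs SIFT ratio-test matches with grid points tracked by pyramidal Lucas-Kanade flow using OpenCV. It is a generic sparse+dense combination for illustration only, not the thesis's intra-image region strength scheme; images are assumed to be grayscale uint8 with enough texture for SIFT.

```python
import cv2
import numpy as np

def semi_dense_matches(img1, img2, grid_step=8, ratio=0.75):
    """Sparse SIFT matches (Lowe ratio test) plus a regular grid of points
    tracked with pyramidal Lucas-Kanade optical flow. img1, img2: grayscale uint8."""
    sift = cv2.SIFT_create()
    k1, d1 = sift.detectAndCompute(img1, None)
    k2, d2 = sift.detectAndCompute(img2, None)
    pairs = cv2.BFMatcher().knnMatch(d1, d2, k=2)
    sparse = [(k1[m.queryIdx].pt, k2[m.trainIdx].pt)
              for m, n in (p for p in pairs if len(p) == 2)
              if m.distance < ratio * n.distance]
    # Semi-dense part: track a regular grid of points between the two frames.
    h, w = img1.shape[:2]
    grid = np.mgrid[grid_step // 2:w:grid_step, grid_step // 2:h:grid_step]
    pts = grid.reshape(2, -1).T.astype(np.float32)[:, None, :]   # N x 1 x 2
    flow_pts, status, _ = cv2.calcOpticalFlowPyrLK(img1, img2, pts, None)
    dense = [(tuple(p[0]), tuple(q[0]))
             for p, q, s in zip(pts, flow_pts, status) if s]
    return sparse + dense
```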

    Visual Perception For Robotic Spatial Understanding

    Get PDF
    Humans understand the world through vision without much effort. We perceive the structure, objects, and people in the environment and pay little direct attention to most of it, until it becomes useful. Intelligent systems, especially mobile robots, have no such biologically engineered vision mechanism to take for granted. In contrast, we must devise algorithmic methods of taking raw sensor data and converting it to something useful very quickly. Vision is such a necessary part of building a robot or any intelligent system that is meant to interact with the world that it is somewhat surprising we don't have off-the-shelf libraries for this capability. Why is this? The simple answer is that the problem is extremely difficult. There has been progress, but the current state of the art is impressive and depressing at the same time. We now have neural networks that can recognize many objects in 2D images, in some cases performing better than a human. Some algorithms can also provide bounding boxes or pixel-level masks to localize the object. We have visual odometry and mapping algorithms that can build reasonably detailed maps over long distances with the right hardware and conditions. On the other hand, we have robots with many sensors and no efficient way to compute their relative extrinsic poses for integrating the data in a single frame. The same networks that produce good object segmentations and labels in a controlled benchmark still miss obvious objects in the real world and have no mechanism for learning on the fly while the robot is exploring. Finally, while we can detect pose for very specific objects, we don't yet have a mechanism that detects pose that generalizes well over categories or that can describe new objects efficiently. We contribute algorithms in four of the areas mentioned above. First, we describe a practical and effective system for calibrating many sensors on a robot with up to 3 different modalities. Second, we present our approach to visual odometry and mapping that exploits the unique capabilities of RGB-D sensors to efficiently build detailed representations of an environment. Third, we describe a 3-D over-segmentation technique that utilizes the models and ego-motion output of the previous step to generate temporally consistent segmentations with camera motion. Finally, we develop a synthesized dataset of chair objects with part labels and investigate the influence of parts on RGB-D based object pose recognition using a novel network architecture we call PartNet.
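    To illustrate what a 3-D over-segmentation step can look like, here is a naive normal-based region-growing sketch over a point cloud. It is a generic single-frame baseline, not the thesis's temporally consistent method, and the thresholds are arbitrary placeholders.

```python
import numpy as np
from scipy.spatial import cKDTree

def region_grow(points, normals, dist_thresh=0.02, angle_thresh_deg=10.0):
    """Over-segment a point cloud by flood-filling from unlabeled seeds,
    adding neighbours within dist_thresh whose normals agree within
    angle_thresh_deg. points, normals: N x 3 (normals unit-length)."""
    tree = cKDTree(points)
    cos_thresh = np.cos(np.deg2rad(angle_thresh_deg))
    labels = np.full(len(points), -1)
    current = 0
    for seed in range(len(points)):
        if labels[seed] != -1:
            continue
        stack = [seed]
        labels[seed] = current
        while stack:
            i = stack.pop()
            for j in tree.query_ball_point(points[i], dist_thresh):
                if labels[j] == -1 and normals[i] @ normals[j] > cos_thresh:
                    labels[j] = current
                    stack.append(j)
        current += 1
    return labels  # one segment id per point
```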

    Self consistent bathymetric mapping from robotic vehicles in the deep ocean

    Get PDF
    Submitted in partial fulfillment of the requirements for the degree of Doctor of Philosophy at the Massachusetts Institute of Technology and the Woods Hole Oceanographic Institution, June 2005. Obtaining accurate and repeatable navigation for robotic vehicles in the deep ocean is difficult and consequently a limiting factor when constructing vehicle-based bathymetric maps. This thesis presents a methodology to produce self-consistent maps and simultaneously improve vehicle position estimation by exploiting accurate local navigation and utilizing terrain-relative measurements. It is common for errors in the vehicle position estimate to far exceed the errors associated with the acoustic range sensor. This disparity creates inconsistency when an area is imaged multiple times and causes artifacts that distort map integrity. Our technique utilizes small terrain "submaps" that can be pairwise registered and used to additionally constrain the vehicle position estimates in accordance with the actual bottom topography. A delayed-state Kalman filter is used to incorporate these submap registrations as relative position measurements between previously visited vehicle locations. Archiving previous positions in the filter state vector allows continual adjustment of the submap locations. The terrain registration is accomplished using a two-dimensional correlation and a six degree-of-freedom point cloud alignment method tailored for bathymetric data. The complete bathymetric map is then created from the union of all submaps that have been aligned in a consistent manner. Experimental results from the fully automated processing of a multibeam survey over the TAG hydrothermal structure at the Mid-Atlantic Ridge are presented to validate the proposed method. This work was funded by the CenSSIS ERC of the National Science Foundation under grant EEC-9986821 and in part by the Woods Hole Oceanographic Institution through a grant from the Penzance Foundation.
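    The delayed-state update is straightforward to sketch in the linear case: a submap registration between two archived poses acts as a relative measurement whose Jacobian has -I and +I blocks at those two poses and zeros elsewhere. The sketch below treats poses as plain vectors; a real implementation would linearize SE(3) pose compounding, and all names are illustrative.

```python
import numpy as np

def delayed_state_update(x, p, i, j, z_rel, r_meas, d=3):
    """Kalman update for a relative measurement z_rel ≈ x_j - x_i between two
    archived d-DOF pose blocks in a delayed-state filter.
    x: stacked state vector; p: covariance; r_meas: d x d measurement noise."""
    n = len(x)
    h = np.zeros((d, n))                 # sparse Jacobian: -I at block i, +I at block j
    h[:, d * i:d * i + d] = -np.eye(d)
    h[:, d * j:d * j + d] = np.eye(d)
    innov = z_rel - h @ x                # measurement innovation
    s = h @ p @ h.T + r_meas             # innovation covariance
    k = p @ h.T @ np.linalg.inv(s)       # Kalman gain
    x_new = x + k @ innov                # every archived pose can adjust
    p_new = (np.eye(n) - k @ h) @ p
    return x_new, p_new
```

    Because every archived pose stays in the state vector, a single registration between poses i and j shifts all correlated submap locations, which is what keeps the assembled map self-consistent.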

    Towards markerless orthopaedic navigation with intuitive Optical See-through Head-mounted displays

    Get PDF
    The potential of image-guided orthopaedic navigation to improve surgical outcomes has been well recognised during the last two decades. According to the tracked pose of the target bone, the anatomical information and preoperative plans are updated and displayed to surgeons, so that they can follow the guidance to reach the goal with higher accuracy, efficiency and reproducibility. Despite their success, current orthopaedic navigation systems have two main limitations: for target tracking, artificial markers have to be drilled into the bone and manually calibrated to it, which introduces the risk of additional harm to patients and increases operating complexity; for guidance visualisation, surgeons have to shift their attention from the patient to an external 2D monitor, which is disruptive and can be mentally stressful. Motivated by these limitations, this thesis explores the development of an intuitive, compact and reliable navigation system for orthopaedic surgery. To this end, conventional marker-based tracking is replaced by a novel markerless tracking algorithm, and the 2D display is replaced by a 3D holographic optical see-through (OST) head-mounted display (HMD) precisely calibrated to the user's perspective. Our markerless tracking, facilitated by a commercial RGB-D camera, is achieved through deep learning-based bone segmentation followed by real-time pose registration. For robust segmentation, a new network is designed and efficiently augmented by a synthetic dataset. Our segmentation network outperforms the state-of-the-art regarding occlusion-robustness, device-agnostic behaviour, and target generalisability. For reliable pose registration, a novel Bounded Iterative Closest Point (BICP) workflow is proposed. The improved markerless tracking achieves a clinically acceptable error of 0.95 deg and 2.17 mm in a phantom test. OST displays allow ubiquitous enrichment of the perceived real world with contextually blended virtual aids through semi-transparent glasses. They have been recognised as a suitable visual tool for surgical assistance, since they do not hinder the surgeon's natural eyesight and require no attention shift or perspective conversion. OST calibration is crucial to ensure locationally coherent surgical guidance. Current calibration methods are either prone to human error or hardly applicable to commercial devices. To this end, we propose an offline camera-based calibration method that is highly accurate yet easy to implement in commercial products, and an online alignment-based refinement that is user-centric and robust against user error. The proposed methods are shown to be superior to similar state-of-the-art (SOTA) methods regarding calibration convenience and display accuracy. Motivated by the ambition to develop the world's first markerless OST navigation system, we integrated the developed markerless tracking and calibration scheme into a complete navigation workflow designed for femur drilling tasks during knee replacement surgery. We verify the usability of the designed OST system in a cadaver study with an experienced orthopaedic surgeon. Our test validates the potential of the proposed markerless navigation system for surgical assistance, although further improvement is required for clinical acceptance.
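    As a rough illustration of bounded registration, the sketch below runs one point-to-point ICP iteration that discards correspondences beyond a distance bound before the rigid fit. It is a generic sketch under that single bound, not the thesis's BICP workflow, and the bound value would be application-specific.

```python
import numpy as np
from scipy.spatial import cKDTree

def icp_step(src, dst, max_corr_dist):
    """One point-to-point ICP iteration with a correspondence-distance bound.
    src, dst: N x 3 and M x 3 point clouds. Returns the transformed source
    and the rigid increment (r, t) fitted to the kept correspondences."""
    tree = cKDTree(dst)
    dist, idx = tree.query(src)          # nearest neighbour in dst for each src point
    keep = dist < max_corr_dist          # the "bound": reject distant pairs
    p, q = src[keep], dst[idx[keep]]
    pc, qc = p - p.mean(0), q - q.mean(0)
    u, _, vt = np.linalg.svd(pc.T @ qc)  # Kabsch fit of kept pairs
    d = np.sign(np.linalg.det(vt.T @ u.T))
    r = vt.T @ np.diag([1.0, 1.0, d]) @ u.T
    t = q.mean(0) - r @ p.mean(0)
    return src @ r.T + t, r, t
```

    Iterating this step until the increment is negligible gives the usual ICP loop; the distance bound keeps occluded or unsegmented background points from dragging the fit.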