317 research outputs found

    Monocular 3D Reconstruction of Locally Textured Surfaces

    Most recent approaches to monocular non-rigid 3D shape recovery rely on exploiting point correspondences and work best when the whole surface is well-textured. The alternative is to rely either on contours or shading information, which has only been demonstrated in very restrictive settings. Here, we propose a novel approach to monocular deformable shape recovery that can operate under complex lighting and handle partially textured surfaces. At the heart of our algorithm are a learned mapping from intensity patterns to the shape of local surface patches and a principled approach to piecing together the resulting local shape estimates. We validate our approach quantitatively and qualitatively using both synthetic and real data

    Single View Modeling and View Synthesis

    This thesis develops new algorithms to produce 3D content from a single camera. Today, amateurs can use hand-held camcorders to capture and display the 3D world in 2D, using mature technologies. However, there is always a strong desire to record and re-explore the 3D world in 3D. To achieve this goal, current approaches usually make use of a camera array, which suffers from tedious setup and calibration processes, as well as lack of portability, limiting its application to lab experiments. In this thesis, I try to produce 3D content using a single camera, making it as simple as shooting pictures. It requires a new front end capturing device rather than a regular camcorder, as well as more sophisticated algorithms. First, in order to capture the highly detailed object surfaces, I designed and developed a depth camera based on a novel technique called light fall-off stereo (LFS). The LFS depth camera outputs color+depth image sequences and achieves 30 fps, which is necessary for capturing dynamic scenes. Based on the output color+depth images, I developed a new approach that builds 3D models of dynamic and deformable objects. While the camera can only capture part of a whole object at any instant, partial surfaces are assembled together to form a complete 3D model by a novel warping algorithm. Inspired by the success of single view 3D modeling, I extended my exploration into 2D-3D video conversion that does not utilize a depth camera. I developed a semi-automatic system that converts monocular videos into stereoscopic videos, via view synthesis. It combines motion analysis with user interaction, aiming to transfer as much of the depth-inference work as possible from the user to the computer. I developed two new methods that analyze the optical flow in order to provide additional qualitative depth constraints. The automatically extracted depth information is presented in the user interface to assist with user labeling work.
In this thesis, I developed new algorithms to produce 3D content from a single camera. Depending on the input data, my algorithm can build high fidelity 3D models for dynamic and deformable objects if depth maps are provided. Otherwise, it can turn the video clips into stereoscopic video

    Resolving Ambiguities in Monocular 3D Reconstruction of Deformable Surfaces

    In this thesis, we focus on the problem of recovering 3D shapes of deformable surfaces from a single camera. This problem is known to be ill-posed as for a given 2D input image there exist many 3D shapes that give visually identical projections. We present three methods which make headway towards resolving these ambiguities. We believe that our work represents a significant step towards making surface reconstruction methods of practical use. First, we propose a surface reconstruction method that overcomes the limitations of the state-of-the-art template-based and non-rigid structure from motion methods. We neither track points over many frames, nor require a sophisticated deformation model, or depend on a reference image. In our method, we establish correspondences between pairs of frames in which the shape is different and unknown. We then estimate homographies between corresponding local planar patches in both images. These yield approximate 3D reconstructions of points within each patch up to a scale factor. Since we consider overlapping patches, we can enforce them to be consistent over the whole surface. Finally, a local deformation model is used to fit a triangulated mesh to the 3D point cloud, which makes the reconstruction robust to both noise and outliers in the image data. Second, we propose a novel approach to recovering the 3D shape of a deformable surface from a monocular input by taking advantage of shading information in more generic contexts than conventional Shape-from-Shading (SfS) methods. This includes surfaces that may be fully or partially textured and lit by arbitrarily many light sources. To this end, given a lighting model, we learn the relationship between a shading pattern and the corresponding local surface shape. At run time, we first use this knowledge to recover the shape of surface patches and then enforce spatial consistency between the patches to produce a global 3D shape. 
Instead of treating texture as noise as in many SfS approaches, we exploit it as an additional source of information. We validate our approach quantitatively and qualitatively using both synthetic and real data. Third, we introduce a constrained latent variable model that inherently accounts for geometric constraints such as inextensibility defined on the mesh model. To this end, we learn a non-linear mapping from the latent space to the output space, which corresponds to vertex positions of a mesh model, such that the generated outputs comply with equality and inequality constraints expressed in terms of the problem variables. Since its output is encouraged to satisfy such constraints inherently, using our model removes the need for computationally expensive methods that enforce these constraints at run time. In addition, our approach is completely generic and could be used in many other different contexts as well, such as image classification to impose separation of the classes, and articulated tracking to constrain the space of possible poses

    Enhancing endoscopic navigation and polyp detection using artificial intelligence

    Colorectal cancer (CRC) is one of the most common and deadly forms of cancer. It has a very high mortality rate if the disease advances to late stages; however, early diagnosis and treatment can be curative and are hence essential to enhancing disease management. Colonoscopy is considered the gold standard for CRC screening and early therapeutic treatment. The effectiveness of colonoscopy is highly dependent on the operator’s skill, as a high level of hand-eye coordination is required to control the endoscope and fully examine the colon wall. Because of this, detection rates can vary between different gastroenterologists, and technologies have been proposed as solutions to assist disease detection and standardise detection rates. This thesis focuses on developing artificial intelligence algorithms to assist gastroenterologists during colonoscopy with the potential to ensure a baseline standard of quality in CRC screening. To achieve such assistance, the technical contributions develop deep learning methods and architectures for automated endoscopic image analysis to address both the detection of lesions in the endoscopic image and the 3D mapping of the endoluminal environment. The proposed detection models can run in real-time and assist visualization of different polyp types. Meanwhile, the 3D reconstruction and mapping models developed are the basis for ensuring that the entire colon has been examined appropriately and for supporting quantitative measurement of polyp sizes using the image during a procedure. Results and validation studies presented within the thesis demonstrate how the developed algorithms perform on both general scenes and on clinical data. The feasibility of clinical translation is demonstrated for all of the models on endoscopic data from human participants during CRC screening examinations

    A virtual object point model for the calibration of underwater stereo cameras to recover accurate 3D information

    The focus of this thesis is on recovering accurate 3D information from underwater images. Underwater 3D reconstruction differs significantly from 3D reconstruction in air due to the refraction of light. In this thesis, the concepts of stereo 3D reconstruction in air are extended to underwater environments by explicitly considering refractive effects with the aid of a virtual object point model. Within underwater stereo 3D reconstruction, the focus of this thesis is on the refractive calibration of underwater stereo cameras

    Artificial Intelligence in the Creative Industries: A Review

    This paper reviews the current state of the art in Artificial Intelligence (AI) technologies and applications in the context of the creative industries. A brief background of AI, and specifically Machine Learning (ML) algorithms, is provided including Convolutional Neural Networks (CNNs), Generative Adversarial Networks (GANs), Recurrent Neural Networks (RNNs) and Deep Reinforcement Learning (DRL). We categorise creative applications into five groups related to how AI technologies are used: i) content creation, ii) information analysis, iii) content enhancement and post production workflows, iv) information extraction and enhancement, and v) data compression. We critically examine the successes and limitations of this rapidly advancing technology in each of these areas. We further differentiate between the use of AI as a creative tool and its potential as a creator in its own right. We foresee that, in the near future, machine learning-based AI will be adopted widely as a tool or collaborative assistant for creativity. In contrast, we observe that the successes of machine learning in domains with fewer constraints, where AI is the 'creator', remain modest. The potential of AI (or its developers) to win awards for its original creations in competition with human creatives is also limited, based on contemporary technologies. We therefore conclude that, in the context of creative industries, maximum benefit from AI will be derived where its focus is human-centric: where it is designed to augment, rather than replace, human creativity

    Optical Imaging and Image Restoration Techniques for Deep Ocean Mapping: A Comprehensive Survey

    Visual systems are receiving increasing attention in underwater applications. While the photogrammetric and computer vision literature so far has largely targeted shallow water applications, deep sea mapping research has recently also come into focus. The majority of the seafloor, and of Earth’s surface, is located in the deep ocean below 200 m depth, and is still largely uncharted. Here, on top of the general image quality degradation caused by water absorption and scattering, artificial illumination is mandatory for survey areas that otherwise reside in permanent darkness, as no sunlight reaches so deep. This creates unintended non-uniform lighting patterns in the images and non-isotropic scattering effects close to the camera. If not compensated properly, such effects dominate seafloor mosaics and can obscure the actual seafloor structures. Moreover, cameras must be protected from the high water pressure, e.g. by housings with thick glass ports, which can lead to refractive distortions in images. Additionally, no satellite navigation is available to support localization. All these issues render deep sea visual mapping a challenging task, and most of the developed methods and strategies cannot be directly transferred to the seafloor at several kilometers depth. In this survey we provide a state of the art review of deep ocean mapping, starting from existing systems and challenges, discussing shallow and deep water models and corresponding solutions. Finally, we identify open issues for future lines of research

    Label Efficient 3D Scene Understanding

    3D scene understanding models are becoming increasingly integrated into modern society. With applications ranging from autonomous driving, Augmented Reality, Virtual Reality, robotics and mapping, the demand for well-behaved models is rapidly increasing. A key requirement for training modern 3D models is high-quality manually labelled training data. Collecting training data is often the time and monetary bottleneck, limiting the size of datasets. As modern data-driven neural networks require very large datasets to achieve good generalisation, finding alternative strategies to manual labelling is sought after for many industries. In this thesis, we present a comprehensive study on achieving 3D scene understanding with fewer labels. Specifically, we evaluate 4 approaches: existing data, synthetic data, weakly-supervised and self-supervised. Existing data looks at the potential of using readily available national mapping data as coarse labels for training a building segmentation model. We further introduce an energy-based active contour snake algorithm to improve label quality by utilising co-registered LiDAR data. This is attractive as whilst the models may still require manual labels, these labels already exist. Synthetic data also exploits already existing data which was not originally designed for training neural networks. We demonstrate a pipeline for generating a synthetic Mobile Laser Scanner dataset. We experimentally evaluate if such a synthetic dataset can be used to pre-train smaller real-world datasets, increasing the generalisation with less data. A weakly-supervised approach is presented which allows for competitive performance on challenging real-world benchmark 3D scene understanding datasets with up to 95% less data. We propose a novel learning approach where the loss function is learnt. Our key insight is that the loss function is a local function and therefore can be trained with less data on a simpler task.
Once trained, our loss function can be used to train a 3D object detector using only unlabelled scenes. Our method is both flexible and very scalable, even performing well across datasets. Finally, we propose a method which only requires a single geometric representation of each object class as supervision for 3D monocular object detection. We discuss why typical L2-like losses do not work for 3D object detection when using differentiable renderer-based optimisation. We show that the undesirable local minima that the L2-like losses fall into can be avoided with the inclusion of a Generative Adversarial Network-like loss. We achieve state-of-the-art performance on the challenging 6DoF LineMOD dataset, without any scene level labels