23 research outputs found

    Multiple View Texture Mapping: A Rendering Approach Designed for Driving Simulation

    Simulation provides a safe and controlled environment ideal for human testing [49, 142, 120]. Simulation of real environments has reached new heights in terms of photo-realism. Often, a team of professional graphical artists would have to be hired to compete with modern commercial simulators. Meanwhile, machine vision methods are currently being developed that attempt to automatically provide geometrically consistent and photo-realistic 3D models of real scenes [189, 139, 115, 19, 140, 111, 132]. Often the only requirement is a set of images of that scene. A road engineer wishing to simulate the environment of a real road for driving experiments could potentially use these tools. This thesis develops a driving simulator that uses machine vision methods to reconstruct a real road automatically. A computer graphics method called projective texture mapping is applied to enhance the photo-realism of the 3D models [144, 43]. This essentially creates a virtual projector in the 3D environment to automatically assign image coordinates to a 3D model. These principles are demonstrated using custom shaders developed for an OpenGL rendering pipeline. Projective texture mapping presents a list of challenges to overcome, including reverse projection and projection onto surfaces not immediately in front of the projector [53]. A significant challenge was the removal of dynamic foreground objects. 3D reconstruction systems create 3D models based on static objects captured in images. Dynamic objects are rarely reconstructed. Projective texture mapping of images that include these dynamic objects can result in visual artefacts. A workflow is developed to resolve this, resulting in videos and 3D reconstructions of streets with no moving vehicles in the scene. The final simulator, using 3D reconstruction and projective texture mapping, is then developed. A motion model was introduced for the rendering camera to enable human interaction. The final system is presented and experimentally tested, and potential future work is discussed.
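
    The "virtual projector" idea in the abstract above can be illustrated with a minimal NumPy sketch. It is not the thesis's shader code: the function names (`look_at`, `perspective`, `projective_uv`) are hypothetical, and OpenGL-style matrix conventions are assumed. Each 3D point is transformed by the projector's view and projection matrices, the perspective divide yields texture coordinates in [0, 1], and points behind the projector are masked out to avoid reverse projection. Occlusion (surfaces hidden from the projector) would additionally need a depth test, which this sketch omits.

```python
import numpy as np

def look_at(eye, target, up):
    """World-to-projector view matrix (OpenGL convention: looks down -Z)."""
    eye, target, up = (np.asarray(v, dtype=float) for v in (eye, target, up))
    forward = target - eye
    forward /= np.linalg.norm(forward)
    side = np.cross(forward, up)
    side /= np.linalg.norm(side)
    true_up = np.cross(side, forward)
    view = np.eye(4)
    view[0, :3], view[1, :3], view[2, :3] = side, true_up, -forward
    view[:3, 3] = -view[:3, :3] @ eye
    return view

def perspective(fov_y_deg, aspect, near, far):
    """OpenGL-style perspective projection matrix."""
    t = 1.0 / np.tan(np.radians(fov_y_deg) / 2.0)
    proj = np.zeros((4, 4))
    proj[0, 0] = t / aspect
    proj[1, 1] = t
    proj[2, 2] = (far + near) / (near - far)
    proj[2, 3] = 2.0 * far * near / (near - far)
    proj[3, 2] = -1.0
    return proj

def projective_uv(points, view, proj):
    """Return (uv, valid): texture coordinates per point and a validity mask.

    A point is valid only if it lies in front of the projector (w > 0),
    which rejects reverse projection, and lands inside the image ([0, 1]^2).
    """
    pts = np.asarray(points, dtype=float)
    homo = np.hstack([pts, np.ones((len(pts), 1))])
    clip = homo @ (proj @ view).T
    w = clip[:, 3]
    in_front = w > 1e-9
    safe_w = np.where(in_front, w, 1.0)   # avoid dividing by w <= 0
    ndc = clip[:, :2] / safe_w[:, None]   # perspective divide
    uv = 0.5 * ndc + 0.5                  # map [-1, 1] to [0, 1]
    inside = np.all((uv >= 0.0) & (uv <= 1.0), axis=1)
    return uv, in_front & inside

view = look_at([0, 0, 0], [0, 0, -1], [0, 1, 0])
proj = perspective(60.0, 1.0, 0.1, 100.0)
uv, valid = projective_uv([[0, 0, -5], [0, 0, 5], [100, 0, -1]], view, proj)
```

    In this example the first point sits on the projector's optical axis and receives the image centre, the second lies behind the projector and is rejected, and the third falls outside the projected image.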

    Appearance modeling under geometric context for object recognition in videos

    Object recognition is a very important high-level task in surveillance applications. This dissertation focuses on building appearance models for object recognition and exploring the relationship between shape and appearance for two key types of objects, human and vehicle. The dissertation proposes a generic framework that models the appearance while incorporating certain geometric prior information, or the so-called geometric context. Then under this framework, special methods are developed for recognizing humans and vehicles based on their appearance and shape attributes in surveillance videos. The first part of the dissertation presents a unified framework based on a general definition of geometric transform (GeT) which is applied to modeling object appearances under geometric context. The GeT models the appearance by applying designed functionals over certain geometric sets. GeT unifies Radon transform, trace transform, image warping etc. Moreover, five novel types of GeTs are introduced and applied to fingerprinting the appearance inside a contour. They include GeT based on level sets, GeT based on shape matching, GeT based on feature curves, GeT invariant to occlusion, and a multi-resolution GeT (MRGeT) that combines both shape and appearance information. The second part focuses on how to use the GeT to build appearance models for objects like walking humans, which have articulated motion of body parts. This part also illustrates the application of GeT for object recognition, image segmentation, video retrieval, and image synthesis. The proposed approach produces promising results when applied to automatic body part segmentation and fingerprinting the appearance of a human and body parts despite the presence of non-rigid deformations and articulated motion. It is very important to understand the 3D structure of vehicles in order to recognize them. 
To reconstruct the 3D model of a vehicle, the third part presents a factorization method for structure from planar motion. Experimental results show that the algorithm is accurate and fairly robust to noise and inaccurate calibration. The differences and the dual relationship between planar motion and planar objects are also clarified in this part. Based on our method, a fully automated vehicle reconstruction system has been designed.

    Appearance Modelling and Reconstruction for Navigation in Minimally Invasive Surgery

    Minimally invasive surgery is playing an increasingly important role for patient care. Whilst its direct patient benefit in terms of reduced trauma, improved recovery and shortened hospitalisation has been well established, there is a sustained need for improved training of the existing procedures and the development of new smart instruments to tackle the issue of visualisation, ergonomic control, haptic and tactile feedback. For endoscopic intervention, the small field of view in the presence of a complex anatomy can easily introduce disorientation to the operator as the tortuous access pathway is not always easy to predict and control with standard endoscopes. Effective training through simulation devices, based on either virtual reality or mixed-reality simulators, can help to improve the spatial awareness, consistency and safety of these procedures. This thesis examines the use of endoscopic videos for both simulation and navigation purposes. More specifically, it addresses the challenging problem of how to build high-fidelity subject-specific simulation environments for improved training and skills assessment. Issues related to mesh parameterisation and texture blending are investigated. With the maturity of computer vision in terms of both 3D shape reconstruction and localisation and mapping, vision-based techniques have enjoyed significant interest in recent years for surgical navigation. The thesis also tackles the problem of how to use vision-based techniques for providing a detailed 3D map and dynamically expanded field of view to improve spatial awareness and avoid operator disorientation. The key advantage of this approach is that it does not require additional hardware, and thus introduces minimal interference to the existing surgical workflow. The derived 3D map can be effectively integrated with pre-operative data, allowing both global and local 3D navigation by taking into account tissue structural and appearance changes. 
Both simulation and laboratory-based experiments are conducted throughout this research to assess the practical value of the proposed methods.

    Deeply Learned Priors for Geometric Reconstruction

    This thesis comprises a body of work that investigates the use of deeply learned priors for dense geometric reconstruction of scenes. A typical image captured by a 2D camera sensor is a lossy two-dimensional (2D) projection of our three-dimensional (3D) world. Geometric reconstruction approaches usually recreate the lost structural information by taking in multiple images observing a scene from different views and solving a problem known as Structure from Motion (SfM) or Simultaneous Localization and Mapping (SLAM). Remarkably, by establishing correspondences across images and using geometric models, these methods (under reasonable conditions) can reconstruct a scene's 3D structure as well as precisely localise the observed views relative to the scene. The success of dense every-pixel multi-view reconstruction is, however, limited by matching ambiguities that commonly arise due to uniform texture, occlusion, and appearance distortion, among several other factors. The standard approach to deal with matching ambiguities is to handcraft priors based on assumptions like piecewise smoothness or planarity in the 3D map, in order to "fill in" map regions supported by little or ambiguous matching evidence. In this thesis we propose learned priors that, in comparison, more closely model the true structure of the scene and are based on geometric information predicted from the images. The motivation stems from recent advancements in deep learning algorithms and the availability of massive datasets, which have allowed Convolutional Neural Networks (CNNs) to predict geometric properties of a scene, such as point-wise surface normals and depths, from just a single image, more reliably than was possible using previous machine learning-based or hand-crafted methods. 
In particular, we first explore how single image-based surface normals from a CNN trained on a massive amount of indoor data can benefit the accuracy of dense reconstruction given input images from a moving monocular camera. Here we propose a novel surface normal based inverse depth regularizer and compare its performance against the inverse depth smoothness prior that is typically used to regularize textureless regions in the reconstruction. We also propose the first real-time CNN-based framework for live dense monocular reconstruction using our learned normal prior. Next, we look at how we can use deep learning to learn features in order to improve the pixel matching process itself, which is at the heart of multi-view geometric reconstruction. We propose a self-supervised feature learning scheme using RGB-D data from a 3D sensor (that does not require any manual labelling) and a multi-scale CNN architecture for feature extraction that is fast and efficient to run inside our proposed real-time monocular reconstruction framework. We extensively analyze the combined benefits of using learned normals and deep features that are good-for-matching in the context of dense reconstruction, both quantitatively and qualitatively on large real-world datasets. Lastly, we explore how learned depths, also predicted on a per-pixel basis from a single image using a CNN, can be used to inpaint sparse 3D maps obtained from monocular SLAM or a 3D sensor. We propose a novel model that uses predicted depths and confidences from CNNs as priors to inpaint maps with arbitrary scale and sparsity. We obtain more reliable reconstructions than those of traditional depth inpainting methods such as the cross-bilateral filter, which in comparison offer few learnable parameters. Here we advocate the idea of "just-in-time reconstruction", where a higher level of scene understanding reliably inpaints the corresponding portion of a sparse map on demand and in real time. Thesis (Ph.D.) -- University of Adelaide, School of Computer Science, 201

    Bridging the gap between reconstruction and synthesis

    Embargo applied from the date of the defense until 15 January 2022. 3D reconstruction and image synthesis are two of the main pillars in computer vision. Early works focused on simple tasks such as multi-view reconstruction and texture synthesis. Spurred by Deep Learning, the field has rapidly progressed, making it possible to achieve more complex and high-level tasks. For example, the 3D reconstruction results of traditional multi-view approaches are currently obtained with single-view methods. Similarly, early pattern-based texture synthesis works have resulted in techniques that allow generating novel high-resolution images. In this thesis we have developed a hierarchy of tools that cover this whole range of problems, lying at the intersection of computer vision, graphics and machine learning. We tackle the problem of 3D reconstruction and synthesis in the wild. Importantly, we advocate for a paradigm in which not everything should be learned. Instead of applying Deep Learning naively, we propose novel representations, layers and architectures that directly embed prior 3D geometric knowledge for the task of 3D reconstruction and synthesis. We apply these techniques to problems including scene/person reconstruction and photo-realistic rendering. We first address methods to reconstruct a scene and the clothed people in it while estimating the camera position. Then, we tackle image and video synthesis for clothed people in the wild. Finally, we bridge the gap between reconstruction and synthesis under the umbrella of a unique novel formulation. Extensive experiments conducted throughout this thesis show that the proposed techniques improve the performance of Deep Learning models in terms of the quality of the reconstructed 3D shapes and synthesised images, while reducing the amount of supervision and training data required to train them. 
In summary, we provide a variety of low-, mid- and high-level algorithms that can be used to incorporate prior knowledge into different stages of the Deep Learning pipeline and improve performance in tasks of 3D reconstruction and image synthesis. Postprint (published version)

    Mobile Robots Navigation

    Mobile robot navigation includes different interrelated activities: (i) perception, as obtaining and interpreting sensory information; (ii) exploration, as the strategy that guides the robot to select the next direction to go; (iii) mapping, involving the construction of a spatial representation by using the sensory information perceived; (iv) localization, as the strategy to estimate the robot position within the spatial map; (v) path planning, as the strategy to find a path towards a goal location, whether optimal or not; and (vi) path execution, where motor actions are determined and adapted to environmental changes. The book addresses those activities by integrating results from the research work of several authors all over the world. Research cases are documented in 32 chapters organized within 7 categories, described next.

    Matching hierarchical structures for shape recognition

    In this thesis we aim to develop a framework for clustering trees and representing and learning a generative model of graph structures from a set of training samples. The approach is applied to the problem of the recognition and classification of shape abstracted in terms of its morphological skeleton. We make five contributions. The first is an algorithm to approximate tree edit-distance using relaxation labeling. The second is the introduction of the tree union, a representation capable of representing the modes of structural variation present in a set of trees. The third is an information theoretic approach to learning a generative model of tree structures from a training set. While the skeletal abstraction of shape was chosen mainly as an experimental vehicle, we nonetheless make some contributions to the fields of skeleton extraction and its graph representation. In particular, our fourth contribution is the development of a skeletonization method that corrects curvature effects in the Hamilton-Jacobi framework, improving its localization and noise sensitivity. Finally, we propose a shape-measure capable of characterizing shapes abstracted in terms of their skeleton. This measure has a number of interesting properties. In particular, it varies smoothly as the shape is deformed and can be easily computed using the presented skeleton extraction algorithm. Each chapter presents an experimental analysis of the proposed approaches applied to shape recognition problems.

    3D shape instantiation for intra-operative navigation from a single 2D projection

    Unlike traditional open surgery where surgeons can see the operation area clearly, in robot-assisted Minimally Invasive Surgery (MIS), a surgeon’s view of the region of interest is usually limited. Currently, 2D images from fluoroscopy, Magnetic Resonance Imaging (MRI), endoscopy or ultrasound are used for intra-operative guidance as real-time 3D volumetric acquisition is not always possible due to the acquisition speed or exposure constraints. 3D reconstruction, however, is key to navigation in complex in vivo geometries and can help resolve this issue. Novel 3D shape instantiation schemes are developed in this thesis, which can reconstruct the high-resolution 3D shape of a target from limited 2D views, especially a single 2D projection or slice. To achieve a complete and automatic 3D shape instantiation pipeline, segmentation schemes based on deep learning are also investigated. These include normalization schemes for training U-Nets and network architecture design of Atrous Convolutional Neural Networks (ACNNs). For U-Net normalization, four popular normalization methods are reviewed, then Instance-Layer Normalization (ILN) is proposed. It uses a sigmoid function to linearly weight the feature map after instance normalization and layer normalization, and cascades group normalization after the weighted feature map. Detailed validation results potentially demonstrate the practical advantages of the proposed ILN for effective and robust segmentation of different anatomies. For network architecture design in training Deep Convolutional Neural Networks (DCNNs), the newly proposed ACNN is compared to traditional U-Net where max-pooling and deconvolutional layers are essential. Only convolutional layers are used in the proposed ACNN with different atrous rates and it has been shown that the method is able to provide a fully-covered receptive field with a minimum number of atrous convolutional layers. 
ACNN enhances the robustness and generalizability of the analysis scheme by cascading multiple atrous blocks. Validation results have shown the proposed method achieves comparable results to the U-Net in terms of medical image segmentation, whilst reducing the trainable parameters, thus improving the convergence and real-time instantiation speed. For 3D shape instantiation of soft and deforming organs during MIS, Sparse Principal Component Analysis (SPCA) has been used to analyse a 3D Statistical Shape Model (SSM) and to determine the most informative scan plane. Synchronized 2D images are then scanned at the most informative scan plane and are expressed in a 2D SSM. Kernel Partial Least Square Regression (KPLSR) has been applied to learn the relationship between the 2D and 3D SSM. It has been shown that the KPLSR-learned model developed in this thesis is able to predict the intra-operative 3D target shape from a single 2D projection or slice, thus permitting real-time 3D navigation. Validation results have shown the intrinsic accuracy achieved and the potential clinical value of the technique. The proposed 3D shape instantiation scheme is further applied to intra-operative stent graft deployment for the robot-assisted treatment of aortic aneurysms. Mathematical modelling is first used to simulate the stent graft characteristics. This is then followed by the Robust Perspective-n-Point (RPnP) method to instantiate the 3D pose of fiducial markers of the graft. Here, Equally-weighted Focal U-Net is proposed with a cross-entropy and an additional focal loss function. Detailed validation has been performed on patient-specific stent grafts with an accuracy between 1 and 3 mm. Finally, the relative merits and potential pitfalls of all the methods developed in this thesis are discussed, followed by potential future research directions and additional challenges that need to be tackled. Open Access
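
    The Instance-Layer Normalization step described in the abstract above can be read roughly as follows. This is a speculative NumPy sketch based only on that one-sentence description, not the thesis's implementation: the gate parameter `alpha`, the convex-combination form of the weighting, the group count, and all function names are assumptions, and the learnable scale and shift terms that normalization layers usually carry are left out.

```python
import numpy as np

def _normalize(x, axes, eps=1e-5):
    """Zero-mean, unit-variance normalization over the given axes."""
    mean = x.mean(axis=axes, keepdims=True)
    var = x.var(axis=axes, keepdims=True)
    return (x - mean) / np.sqrt(var + eps)

def instance_norm(x):
    # x has shape (N, C, H, W); statistics per sample and per channel
    return _normalize(x, (2, 3))

def layer_norm(x):
    # statistics per sample, over all channels and pixels
    return _normalize(x, (1, 2, 3))

def group_norm(x, groups):
    # statistics per sample and per group of channels
    n, c, h, w = x.shape
    if c % groups != 0:
        raise ValueError("channel count must be divisible by groups")
    grouped = x.reshape(n, groups, c // groups, h, w)
    return _normalize(grouped, (2, 3, 4)).reshape(n, c, h, w)

def instance_layer_norm(x, alpha=0.0, groups=2):
    """Assumed form: sigmoid-gated mix of IN and LN, followed by GN."""
    gate = 1.0 / (1.0 + np.exp(-alpha))   # sigmoid of an assumed learnable scalar
    mixed = gate * instance_norm(x) + (1.0 - gate) * layer_norm(x)
    return group_norm(mixed, groups)

rng = np.random.default_rng(0)
features = rng.normal(loc=3.0, scale=2.0, size=(2, 4, 8, 8))
out = instance_layer_norm(features, alpha=0.5, groups=2)
```

    Because group normalization is applied last, each channel group of the output has approximately zero mean and unit variance regardless of the gate value; the gate only changes how instance and layer statistics are blended before that final step.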

    IMAGE CLASSIFICATION USING INVARIANT LOCAL FEATURES AND CONTEXTUAL INFORMATION

    Ph.D. DOCTOR OF PHILOSOPHY

    Mathematical Problems in Rock Mechanics and Rock Engineering

    With increasing requirements for energy, resources and space, rock engineering projects are being constructed more often and are operated in large-scale environments with complex geology. Meanwhile, rock failures and rock instabilities occur more frequently, and severely threaten the safety and stability of rock engineering projects. It is well recognized that rock has multi-scale structures and involves multi-scale fracture processes. Rocks are also commonly subjected simultaneously to complex static stress and strong dynamic disturbance, providing a hotbed for the occurrence of rock failures. In addition, there are many multi-physics coupling processes in a rock mass. It is still difficult to understand these rock mechanics and characterize rock behavior under complex stress conditions, multi-physics processes, and multi-scale changes. Therefore, our understanding of rock mechanics and the prevention and control of failure and instability in rock engineering needs to be furthered. The primary aim of this Special Issue “Mathematical Problems in Rock Mechanics and Rock Engineering” is to bring together original research discussing innovative efforts regarding in situ observations, laboratory experiments and theoretical, numerical, and big-data-based methods to overcome the mathematical problems related to rock mechanics and rock engineering. It includes 12 manuscripts that illustrate the valuable efforts for addressing mathematical problems in rock mechanics and rock engineering.