20 research outputs found

    Bridging the gap between reconstruction and synthesis

    Get PDF
    Aplicat embargament des de la data de defensa fins el 15 de gener de 20223D reconstruction and image synthesis are two of the main pillars in computer vision. Early works focused on simple tasks such as multi-view reconstruction and texture synthesis. With the spur of Deep Learning, the field has rapidly progressed, making it possible to achieve more complex and high level tasks. For example, the 3D reconstruction results of traditional multi-view approaches are currently obtained with single view methods. Similarly, early pattern based texture synthesis works have resulted in techniques that allow generating novel high-resolution images. In this thesis we have developed a hierarchy of tools that cover all these range of problems, lying at the intersection of computer vision, graphics and machine learning. We tackle the problem of 3D reconstruction and synthesis in the wild. Importantly, we advocate for a paradigm in which not everything should be learned. Instead of applying Deep Learning naively we propose novel representations, layers and architectures that directly embed prior 3D geometric knowledge for the task of 3D reconstruction and synthesis. We apply these techniques to problems including scene/person reconstruction and photo-realistic rendering. We first address methods to reconstruct a scene and the clothed people in it while estimating the camera position. Then, we tackle image and video synthesis for clothed people in the wild. Finally, we bridge the gap between reconstruction and synthesis under the umbrella of a unique novel formulation. Extensive experiments conducted along this thesis show that the proposed techniques improve the performance of Deep Learning models in terms of the quality of the reconstructed 3D shapes / synthesised images, while reducing the amount of supervision and training data required to train them. In summary, we provide a variety of low, mid and high level algorithms that can be used to incorporate prior knowledge into different stages of the Deep Learning pipeline and improve performance in tasks of 3D reconstruction and image synthesis.La reconstrucció 3D i la síntesi d'imatges són dos dels pilars fonamentals en visió per computador. Els estudis previs es centren en tasques senzilles com la reconstrucció amb informació multi-càmera i la síntesi de textures. Amb l'aparició del "Deep Learning", aquest camp ha progressat ràpidament, fent possible assolir tasques molt més complexes. Per exemple, per obtenir una reconstrucció 3D, tradicionalment s'utilitzaven mètodes multi-càmera, en canvi ara, es poden obtenir a partir d'una sola imatge. De la mateixa manera, els primers treballs de síntesi de textures basats en patrons han donat lloc a tècniques que permeten generar noves imatges completes en alta resolució. En aquesta tesi, hem desenvolupat una sèrie d'eines que cobreixen tot aquest ventall de problemes, situats en la intersecció entre la visió per computador, els gràfics i l'aprenentatge automàtic. Abordem el problema de la reconstrucció i la síntesi 3D en el món real. És important destacar que defensem un paradigma on no tot s'ha d'aprendre. Enlloc d'aplicar el "Deep Learning" de forma naïve, proposem representacions novedoses i arquitectures que incorporen directament els coneixements geomètrics ja existents per a aconseguir la reconstrucció 3D i la síntesi d'imatges. Nosaltres apliquem aquestes tècniques a problemes com ara la reconstrucció d'escenes/persones i a la renderització d'imatges fotorealistes. Primer abordem els mètodes per reconstruir una escena, les persones vestides que hi ha i la posició de la càmera. A continuació, abordem la síntesi d'imatges i vídeos de persones vestides en situacions quotidianes. I finalment, aconseguim, a través d'una nova formulació única, connectar la reconstrucció amb la síntesi. Els experiments realitzats al llarg d'aquesta tesi demostren que les tècniques proposades milloren el rendiment dels models de "Deepp Learning" pel que fa a la qualitat de les reconstruccions i les imatges sintetitzades alhora que redueixen la quantitat de dades necessàries per entrenar-los. En resum, proporcionem una varietat d'algoritmes de baix, mitjà i alt nivell que es poden utilitzar per incorporar els coneixements previs a les diferents etapes del "Deep Learning" i millorar el rendiment en tasques de reconstrucció 3D i síntesi d'imatges.Postprint (published version

    Raw Depth Image Enhancement Using a Neural Network

    Get PDF
    The term image is often used to denote a data format that records information about a scene’s color. This dissertation object focuses on a similar format for recording distance information about a scene, “depth images”. Depth images have been used extensively in consumer-level applications, such as Apple’s Face ID, based on depth images for face recognition. However, depth images suffer from low precision and high errors, and some post-processing techniques need to be utilized to improve their quality. Deep learning, or neural networks, are frameworks that use a series of hierarchically arranged nonlinear networks to process input data. Although each layer of the network is limited in its capabilities, the learning capacity accumulated by the multilayer network becomes very powerful. This dissertation assembles two different deep learning frameworks to solve two different types of raw image preprocessing problems. The first network is the super-resolution network, a nonlinear interpolation of low-resolution deep images through the deep network to obtain high-resolution images. The second network is the inpainting network, which is used to mitigate the problem of losing specific pixel data in the original depth image for various reasons. This dissertation presents deep images processed by these two frameworks, and the quality of the processed images is significantly improved compared to the original images. The great potential of deep learning techniques in the field of deep image processing is shown

    Depth Assisted Background Modeling and Super-resolution of Depth Map

    Get PDF
    Background modeling is one of the fundamental tasks in the computer vision, which detects the foreground objects from the images. This is used in many applications such as object tracking, traffic analysis, scene understanding and other video applications. The easiest way to model the background is to obtain background image that does not include any moving objects. However, in some environment, the background may not be available and can be changed by the surrounding conditions like illumination changes (light switch on/off), object removed from the scene and objects with constant moving pattern (waving trees). The robustness and adaptation of the background are essential to this problem. Mixture of Gaussians (MOG) is one of the most widely used methods for background modeling using color information, whereas the depth map provides one more dimensional information of the images that is independent of the color. In this thesis, the color only based methods such as Gaussian Mixture Models (GMM), Hidden Markov Models (HMM), Kernel Density Estimation (KDE) are thoroughly reviewed firstly. Then the algorithm that jointly uses color and depth information is proposed, which uses MOG and single Gaussian model (SGM) to represent recent observations of the color and depth respectively. And the color-depth consistency check mechanism is also incorporated into the algorithm to improve the accuracy of the extracted background. The spatial resolution of the depth images captured from consumer depth camera is generally limited due to the element size of the senor. To overcome this limitation, depth image super-resolution is proposed to obtain the high resolution depth image from the low resolution depth image by making the inference on high frequency components. Deep convolution neural network has been widely successfully used in various computer vision tasks like image segmentation, classification and recognitions with remarkable performance. Recently, the residual network configuration has been proposed to further improve the performance. Inspired by this residual network, we redesign the popular deep model Super-Resolution Convolution Neural Network (SRCNN) for depth image super-resolution. Based on the idea of residual network and SRCNN structure, we proposed three neural network based approaches to address the problem of depth image super-resolution. In these approaches, we introduce the deconvolution layer into the network which enables the learning directly from original low resolution image to the desired high resolution image, instead of using conventional method like bicubic to interpolate the image before entering the network. Then in order to minimize the sharpness loss near the boundary regions, we add layers at the end of network to learn the residuals. The main contributions of this thesis are investigating the utilization of the depth information for background modeling and proposing three approaches on depth image super-resolution. For the first part, the property of depth image is exploited and added into the commonly used background models. By doing so, the background model can be constructed more efficiently and accurately because the depth information is not affected by the color information. During the investigation, we found that the depth image usually has two problems, which are spatial resolution and accuracy, which need to be addressed. Most of the depth images either have small resolution or the accuracy is very bad. In the second part of this thesis, we investigate three methods to obtain the accurate high resolution depth image from the low resolution one

    What's the Situation with Intelligent Mesh Generation: A Survey and Perspectives

    Full text link
    Intelligent Mesh Generation (IMG) represents a novel and promising field of research, utilizing machine learning techniques to generate meshes. Despite its relative infancy, IMG has significantly broadened the adaptability and practicality of mesh generation techniques, delivering numerous breakthroughs and unveiling potential future pathways. However, a noticeable void exists in the contemporary literature concerning comprehensive surveys of IMG methods. This paper endeavors to fill this gap by providing a systematic and thorough survey of the current IMG landscape. With a focus on 113 preliminary IMG methods, we undertake a meticulous analysis from various angles, encompassing core algorithm techniques and their application scope, agent learning objectives, data types, targeted challenges, as well as advantages and limitations. We have curated and categorized the literature, proposing three unique taxonomies based on key techniques, output mesh unit elements, and relevant input data types. This paper also underscores several promising future research directions and challenges in IMG. To augment reader accessibility, a dedicated IMG project page is available at \url{https://github.com/xzb030/IMG_Survey}

    Geometric Structure Extraction and Reconstruction

    Get PDF
    Geometric structure extraction and reconstruction is a long-standing problem in research communities including computer graphics, computer vision, and machine learning. Within different communities, it can be interpreted as different subproblems such as skeleton extraction from the point cloud, surface reconstruction from multi-view images, or manifold learning from high dimensional data. All these subproblems are building blocks of many modern applications, such as scene reconstruction for AR/VR, object recognition for robotic vision and structural analysis for big data. Despite its importance, the extraction and reconstruction of a geometric structure from real-world data are ill-posed, where the main challenges lie in the incompleteness, noise, and inconsistency of the raw input data. To address these challenges, three studies are conducted in this thesis: i) a new point set representation for shape completion, ii) a structure-aware data consolidation method, and iii) a data-driven deep learning technique for multi-view consistency. In addition to theoretical contributions, the algorithms we proposed significantly improve the performance of several state-of-the-art geometric structure extraction and reconstruction approaches, validated by extensive experimental results

    On Improving Generalization of CNN-Based Image Classification with Delineation Maps Using the CORF Push-Pull Inhibition Operator

    Get PDF
    Deployed image classification pipelines are typically dependent on the images captured in real-world environments. This means that images might be affected by different sources of perturbations (e.g. sensor noise in low-light environments). The main challenge arises by the fact that image quality directly impacts the reliability and consistency of classification tasks. This challenge has, hence, attracted wide interest within the computer vision communities. We propose a transformation step that attempts to enhance the generalization ability of CNN models in the presence of unseen noise in the test set. Concretely, the delineation maps of given images are determined using the CORF push-pull inhibition operator. Such an operation transforms an input image into a space that is more robust to noise before being processed by a CNN. We evaluated our approach on the Fashion MNIST data set with an AlexNet model. It turned out that the proposed CORF-augmented pipeline achieved comparable results on noise-free images to those of a conventional AlexNet classification model without CORF delineation maps, but it consistently achieved significantly superior performance on test images perturbed with different levels of Gaussian and uniform noise

    Autonomous Operation and Human-Robot Interaction on an Indoor Mobile Robot

    No full text
    MARVIN (Mobile Autonomous Robotic Vehicle for Indoor Navigation) was once the flagship of Victoria University’s mobile robotic fleet. However, over the years MARVIN has become obsolete. This thesis continues the the redevelopment of MARVIN, transforming it into a fully autonomous research platform for human-robot interaction (HRI). MARVIN utilises a Segway RMP, a self balancing mobility platform. This provides agile locomotion, but increases sensor processing complexity due to its dynamic pitch. MARVIN’s existing sensing systems (including a laser rangefinder and ultrasonic sensors) are augmented with tactile sensors and a Microsoft Kinect v2 RGB-D camera for 3D sensing. This allows the detection of the obstacles often found in MARVIN’s unmodified office-like operating environment. These sensors are processed using novel techniques to account for the Segway’s dynamic pitch. A newly developed navigation stack takes the processed sensor data to facilitate localisation, obstacle detection and motion planning. MARVIN’s inherited humanoid robotic torso is augmented with a touch screen and voice interface, enabling HRI. MARVIN’s HRI capabilities are demonstrated by implementing it as a robotic guide. This implementation is evaluated through a usability study and found to be successful. Through evaluations of MARVIN’s locomotion, sensing, localisation and motion planning systems, in addition to the usability study, MARVIN is found to be capable of both autonomous navigation and engaging HRI. These developed features open a diverse range of research directions and HRI tasks that MARVIN can be used to explore

    Gaze-Based Human-Robot Interaction by the Brunswick Model

    Get PDF
    We present a new paradigm for human-robot interaction based on social signal processing, and in particular on the Brunswick model. Originally, the Brunswick model copes with face-to-face dyadic interaction, assuming that the interactants are communicating through a continuous exchange of non verbal social signals, in addition to the spoken messages. Social signals have to be interpreted, thanks to a proper recognition phase that considers visual and audio information. The Brunswick model allows to quantitatively evaluate the quality of the interaction using statistical tools which measure how effective is the recognition phase. In this paper we cast this theory when one of the interactants is a robot; in this case, the recognition phase performed by the robot and the human have to be revised w.r.t. the original model. The model is applied to Berrick, a recent open-source low-cost robotic head platform, where the gazing is the social signal to be considered
    corecore