1,258 research outputs found

    Bridging the gap between reconstruction and synthesis

    Get PDF
    Aplicat embargament des de la data de defensa fins el 15 de gener de 20223D reconstruction and image synthesis are two of the main pillars in computer vision. Early works focused on simple tasks such as multi-view reconstruction and texture synthesis. With the spur of Deep Learning, the field has rapidly progressed, making it possible to achieve more complex and high level tasks. For example, the 3D reconstruction results of traditional multi-view approaches are currently obtained with single view methods. Similarly, early pattern based texture synthesis works have resulted in techniques that allow generating novel high-resolution images. In this thesis we have developed a hierarchy of tools that cover all these range of problems, lying at the intersection of computer vision, graphics and machine learning. We tackle the problem of 3D reconstruction and synthesis in the wild. Importantly, we advocate for a paradigm in which not everything should be learned. Instead of applying Deep Learning naively we propose novel representations, layers and architectures that directly embed prior 3D geometric knowledge for the task of 3D reconstruction and synthesis. We apply these techniques to problems including scene/person reconstruction and photo-realistic rendering. We first address methods to reconstruct a scene and the clothed people in it while estimating the camera position. Then, we tackle image and video synthesis for clothed people in the wild. Finally, we bridge the gap between reconstruction and synthesis under the umbrella of a unique novel formulation. Extensive experiments conducted along this thesis show that the proposed techniques improve the performance of Deep Learning models in terms of the quality of the reconstructed 3D shapes / synthesised images, while reducing the amount of supervision and training data required to train them. In summary, we provide a variety of low, mid and high level algorithms that can be used to incorporate prior knowledge into different stages of the Deep Learning pipeline and improve performance in tasks of 3D reconstruction and image synthesis.La reconstrucció 3D i la síntesi d'imatges són dos dels pilars fonamentals en visió per computador. Els estudis previs es centren en tasques senzilles com la reconstrucció amb informació multi-càmera i la síntesi de textures. Amb l'aparició del "Deep Learning", aquest camp ha progressat ràpidament, fent possible assolir tasques molt més complexes. Per exemple, per obtenir una reconstrucció 3D, tradicionalment s'utilitzaven mètodes multi-càmera, en canvi ara, es poden obtenir a partir d'una sola imatge. De la mateixa manera, els primers treballs de síntesi de textures basats en patrons han donat lloc a tècniques que permeten generar noves imatges completes en alta resolució. En aquesta tesi, hem desenvolupat una sèrie d'eines que cobreixen tot aquest ventall de problemes, situats en la intersecció entre la visió per computador, els gràfics i l'aprenentatge automàtic. Abordem el problema de la reconstrucció i la síntesi 3D en el món real. És important destacar que defensem un paradigma on no tot s'ha d'aprendre. Enlloc d'aplicar el "Deep Learning" de forma naïve, proposem representacions novedoses i arquitectures que incorporen directament els coneixements geomètrics ja existents per a aconseguir la reconstrucció 3D i la síntesi d'imatges. Nosaltres apliquem aquestes tècniques a problemes com ara la reconstrucció d'escenes/persones i a la renderització d'imatges fotorealistes. Primer abordem els mètodes per reconstruir una escena, les persones vestides que hi ha i la posició de la càmera. A continuació, abordem la síntesi d'imatges i vídeos de persones vestides en situacions quotidianes. I finalment, aconseguim, a través d'una nova formulació única, connectar la reconstrucció amb la síntesi. Els experiments realitzats al llarg d'aquesta tesi demostren que les tècniques proposades milloren el rendiment dels models de "Deepp Learning" pel que fa a la qualitat de les reconstruccions i les imatges sintetitzades alhora que redueixen la quantitat de dades necessàries per entrenar-los. En resum, proporcionem una varietat d'algoritmes de baix, mitjà i alt nivell que es poden utilitzar per incorporar els coneixements previs a les diferents etapes del "Deep Learning" i millorar el rendiment en tasques de reconstrucció 3D i síntesi d'imatges.Postprint (published version

    MORPH-DSLAM: Model Order Reduction for PHysics-based Deformable SLAM

    Get PDF
    We propose a new methodology to estimate the 3D displacement field of deformable objects from video sequences using standard monocular cameras. We solve in real time the complete (possibly visco-)hyperelasticity problem to properly describe the strain and stress fields that are consistent with the displacements captured by the images, constrained by real physics. We do not impose any ad-hoc prior or energy minimization in the external surface, since the real and complete mechanics problem is solved. This means that we can also estimate the internal state of the objects, even in occluded areas, just by observing the external surface and the knowledge of material properties and geometry. Solving this problem in real time using a realistic constitutive law, usually non-linear, is out of reach for current systems. To overcome this difficulty, we solve off-line a parametrized problem that considers each source of variability in the problem as a new parameter and, consequently, as a new dimension in the formulation. Model Order Reduction methods allow us to reduce the dimensionality of the problem, and therefore, its computational cost, while preserving the visualization of the solution in the high-dimensionality space. This allows an accurate estimation of the object deformations, improving also the robustness in the 3D points estimation

    Physics-Informed Computer Vision: A Review and Perspectives

    Full text link
    Incorporation of physical information in machine learning frameworks are opening and transforming many application domains. Here the learning process is augmented through the induction of fundamental knowledge and governing physical laws. In this work we explore their utility for computer vision tasks in interpreting and understanding visual data. We present a systematic literature review of formulation and approaches to computer vision tasks guided by physical laws. We begin by decomposing the popular computer vision pipeline into a taxonomy of stages and investigate approaches to incorporate governing physical equations in each stage. Existing approaches in each task are analyzed with regard to what governing physical processes are modeled, formulated and how they are incorporated, i.e. modify data (observation bias), modify networks (inductive bias), and modify losses (learning bias). The taxonomy offers a unified view of the application of the physics-informed capability, highlighting where physics-informed learning has been conducted and where the gaps and opportunities are. Finally, we highlight open problems and challenges to inform future research. While still in its early days, the study of physics-informed computer vision has the promise to develop better computer vision models that can improve physical plausibility, accuracy, data efficiency and generalization in increasingly realistic applications

    State of the Art in Dense Monocular Non-Rigid 3D Reconstruction

    Full text link
    3D reconstruction of deformable (or non-rigid) scenes from a set of monocular 2D image observations is a long-standing and actively researched area of computer vision and graphics. It is an ill-posed inverse problem, since--without additional prior assumptions--it permits infinitely many solutions leading to accurate projection to the input 2D images. Non-rigid reconstruction is a foundational building block for downstream applications like robotics, AR/VR, or visual content creation. The key advantage of using monocular cameras is their omnipresence and availability to the end users as well as their ease of use compared to more sophisticated camera set-ups such as stereo or multi-view systems. This survey focuses on state-of-the-art methods for dense non-rigid 3D reconstruction of various deformable objects and composite scenes from monocular videos or sets of monocular views. It reviews the fundamentals of 3D reconstruction and deformation modeling from 2D image observations. We then start from general methods--that handle arbitrary scenes and make only a few prior assumptions--and proceed towards techniques making stronger assumptions about the observed objects and types of deformations (e.g. human faces, bodies, hands, and animals). A significant part of this STAR is also devoted to classification and a high-level comparison of the methods, as well as an overview of the datasets for training and evaluation of the discussed techniques. We conclude by discussing open challenges in the field and the social aspects associated with the usage of the reviewed methods.Comment: 25 page
    corecore