8 research outputs found

    3D pose estimation using convolutional neural networks

    This Master's thesis describes a new pose estimation method based on convolutional neural networks (CNNs). The method divides three-dimensional space into several regions and, given an input image, returns the region where the camera is located. The first step is to create synthetic images of the object by simulating a camera placed at different points around it. The CNN is pre-trained with thousands of these synthetic images of the object model. We then compute the pose of the object in hundreds of real images and apply transfer learning with these labeled real images to the existing CNN, refining the network weights and improving its behaviour on real input images. Alongside this deep learning approach, other techniques are used to improve the quality of the results, such as the classical sliding window and a more recent class-generic object detector called objectness. The method is tested with a 2D model to ease the labeling of the real images. This document outlines all the steps followed to create and test the method, and finally compares it against a state-of-the-art method at different scales and levels of blurring.
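
    Read as pseudocode for the pipeline above, a minimal PyTorch sketch might look as follows. The backbone (ResNet-18), the number of viewpoint regions, and the dummy data loaders are assumptions for illustration; they stand in for the thesis's actual CNN and its synthetic/real image sets.

        import torch
        import torch.nn as nn
        from torchvision import models

        N_REGIONS = 64  # assumed number of discretized viewpoint regions

        def make_model():
            # Any image classifier works here; ResNet-18 stands in for the thesis's CNN.
            model = models.resnet18(weights=None)
            model.fc = nn.Linear(model.fc.in_features, N_REGIONS)
            return model

        def train(model, make_loader, epochs, lr):
            opt = torch.optim.Adam(model.parameters(), lr=lr)
            loss_fn = nn.CrossEntropyLoss()
            model.train()
            for _ in range(epochs):
                for images, region_labels in make_loader():
                    opt.zero_grad()
                    loss_fn(model(images), region_labels).backward()
                    opt.step()

        def dummy_loader(n_batches, batch=8):
            # Stand-in for the synthetic and real image sets described above.
            for _ in range(n_batches):
                yield torch.randn(batch, 3, 224, 224), torch.randint(0, N_REGIONS, (batch,))

        model = make_model()
        # 1) Pre-train on (thousands of) synthetic renders of the object model.
        train(model, lambda: dummy_loader(4), epochs=1, lr=1e-3)
        # 2) Transfer learning: fine-tune on (hundreds of) labeled real images at a
        #    lower learning rate so the synthetic features adapt to real imagery.
        train(model, lambda: dummy_loader(2), epochs=1, lr=1e-4)

    Inference then amounts to an argmax over the region scores for an input image; the predicted region gives the approximate camera location around the object.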

    Bridging the gap between reconstruction and synthesis

    Embargo applied from the date of defense until 15 January 2022.
    3D reconstruction and image synthesis are two of the main pillars of computer vision. Early works focused on simple tasks such as multi-view reconstruction and texture synthesis. Spurred by Deep Learning, the field has progressed rapidly, making more complex and higher-level tasks achievable. For example, 3D reconstructions once obtained with traditional multi-view approaches can now be obtained with single-view methods. Similarly, early pattern-based texture synthesis works have led to techniques that can generate novel high-resolution images. In this thesis we develop a hierarchy of tools that covers this whole range of problems, lying at the intersection of computer vision, graphics and machine learning. We tackle the problem of 3D reconstruction and synthesis in the wild. Importantly, we advocate a paradigm in which not everything should be learned. Instead of applying Deep Learning naively, we propose novel representations, layers and architectures that directly embed prior 3D geometric knowledge for the tasks of 3D reconstruction and synthesis. We apply these techniques to problems including scene/person reconstruction and photo-realistic rendering. We first address methods that reconstruct a scene and the clothed people in it while estimating the camera position. Then we tackle image and video synthesis for clothed people in the wild. Finally, we bridge the gap between reconstruction and synthesis under the umbrella of a single novel formulation. Extensive experiments conducted throughout this thesis show that the proposed techniques improve the performance of Deep Learning models in terms of the quality of the reconstructed 3D shapes and synthesised images, while reducing the amount of supervision and training data required to train them. In summary, we provide a variety of low-, mid- and high-level algorithms that can be used to incorporate prior knowledge into different stages of the Deep Learning pipeline and improve performance in tasks of 3D reconstruction and image synthesis.
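
    As a concrete illustration of the "not everything should be learned" paradigm, the sketch below embeds textbook pinhole-camera geometry as a fixed, differentiable PyTorch layer: a network that predicts 3D points can then be trained with a 2D reprojection loss, without the projection itself being learned. This is generic geometry chosen for illustration, not one of the thesis's actual layers or architectures.

        import torch
        import torch.nn as nn

        class PinholeProjection(nn.Module):
            """Analytic (non-learned) perspective projection u ~ K (R X + t)."""
            def __init__(self, K: torch.Tensor):
                super().__init__()
                self.register_buffer("K", K)  # fixed intrinsics, not a learnable parameter

            def forward(self, points3d, R, t):
                # points3d: (B, N, 3); R: (B, 3, 3); t: (B, 3)
                cam = points3d @ R.transpose(1, 2) + t.unsqueeze(1)  # world -> camera
                uvw = cam @ self.K.T                                 # camera -> image plane
                return uvw[..., :2] / uvw[..., 2:3]                  # perspective divide

        K = torch.tensor([[500., 0., 128.], [0., 500., 128.], [0., 0., 1.]])
        project = PinholeProjection(K)
        pts = torch.randn(2, 100, 3) + torch.tensor([0., 0., 5.])  # points in front of the camera
        R = torch.eye(3).expand(2, 3, 3)
        t = torch.zeros(2, 3)
        uv = project(pts, R, t)  # (2, 100, 2), differentiable w.r.t. pts, R and t

    Because the projection is exact rather than learned, a reprojection loss such as ||uv - uv_observed||^2 supervises the 3D prediction directly, which is one way that embedding prior geometric knowledge can reduce the training data a model needs.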

    The Synthesizability of Texture Examples

    Dai, D., Riemenschneider, H., Van Gool, L., "The Synthesizability of Texture Examples", 27th IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2014), pp. 3027-3034, June 23-28, 2014, Columbus, Ohio, USA.