134 research outputs found

    Deep Shape-from-Template: Single-image quasi-isometric deformable registration and reconstruction

    Shape-from-Template (SfT) solves 3D vision from a single image and a deformable 3D object model, called a template. Concretely, SfT computes registration (the correspondence between the template and the image) and reconstruction (the depth in camera frame). It constrains the object deformation to quasi-isometry. Real-time and automatic SfT remains an open problem for complex objects and imaging conditions. We present four contributions to address core unmet challenges to realise SfT with a Deep Neural Network (DNN). First, we propose a novel DNN called DeepSfT, which encodes the template in its weights and hence copes with highly complex templates. Second, we propose a semi-supervised training procedure to exploit real data. This is a practical solution to overcome the render gap that occurs when training only with simulated data. Third, we propose a geometry adaptation module to deal with different cameras at training and inference. Fourth, we combine statistical learning with physics-based reasoning. DeepSfT runs automatically and in real-time, and we show with numerous experiments and an ablation study that it consistently achieves a lower 3D error than previous work. It generalises better than previous work and performs well in terms of reconstruction and registration error under wide baselines, occlusions, illumination changes, weak texture and blur. Agencia Estatal de Investigación; Ministerio de Educació
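As a rough illustration of the quasi-isometry constraint mentioned in the abstract above, the sketch below measures how much the edges of a template mesh stretch or shrink under a deformation. It is a hedged sketch rather than DeepSfT's formulation: using mesh edge lengths as a proxy for isometry, and the names `edge_lengths` and `max_relative_stretch`, are assumptions introduced here for illustration.

```python
import numpy as np


def edge_lengths(vertices, edges):
    """Euclidean length of each mesh edge.

    vertices: (V, 3) array-like of 3D points.
    edges:    (E, 2) array-like of vertex index pairs.
    """
    v = np.asarray(vertices, dtype=float)
    e = np.asarray(edges, dtype=int)
    return np.linalg.norm(v[e[:, 0]] - v[e[:, 1]], axis=1)


def max_relative_stretch(template, deformed, edges):
    """Largest relative change in edge length between template and deformed mesh.

    A value near 0 means the deformation is close to isometric on the
    sampled edges (hypothetical proxy, not the paper's definition).
    """
    l0 = edge_lengths(template, edges)
    l1 = edge_lengths(deformed, edges)
    return float(np.max(np.abs(l1 - l0) / l0))


# A three-vertex strip: bending keeps edge lengths, stretching does not.
template = [[0, 0, 0], [1, 0, 0], [2, 0, 0]]
edges = [[0, 1], [1, 2]]
bent = [[0, 0, 0], [1, 0, 0], [1, 0, 1]]
stretched = [[0, 0, 0], [1, 0, 0], [2.2, 0, 0]]
print(max_relative_stretch(template, bent, edges))
print(max_relative_stretch(template, stretched, edges))
```

A quasi-isometric deformation would keep this value small but not necessarily zero; the threshold that counts as "small" depends on the material and is not specified in the abstract.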

    State of the Art in Dense Monocular Non-Rigid 3D Reconstruction

    3D reconstruction of deformable (or non-rigid) scenes from a set of monocular 2D image observations is a long-standing and actively researched area of computer vision and graphics. It is an ill-posed inverse problem, since--without additional prior assumptions--it permits infinitely many solutions leading to accurate projection to the input 2D images. Non-rigid reconstruction is a foundational building block for downstream applications like robotics, AR/VR, or visual content creation. The key advantage of using monocular cameras is their omnipresence and availability to the end users as well as their ease of use compared to more sophisticated camera set-ups such as stereo or multi-view systems. This survey focuses on state-of-the-art methods for dense non-rigid 3D reconstruction of various deformable objects and composite scenes from monocular videos or sets of monocular views. It reviews the fundamentals of 3D reconstruction and deformation modeling from 2D image observations. We then start from general methods--that handle arbitrary scenes and make only a few prior assumptions--and proceed towards techniques making stronger assumptions about the observed objects and types of deformations (e.g. human faces, bodies, hands, and animals). A significant part of this STAR is also devoted to classification and a high-level comparison of the methods, as well as an overview of the datasets for training and evaluation of the discussed techniques. We conclude by discussing open challenges in the field and the social aspects associated with the usage of the reviewed methods. Comment: 25 pages


    Shape analysis and description based on the isometric invariances of topological skeletonization

    In this dissertation, we explore the problem of how to describe the shape of an object in 2D and 3D with a set of features that are invariant to isometric transformations. We base our approach on the well-known Medial Axis Transform and its topological properties. We aim to study two problems. The first is how to find a shape representation of a segmented object that exhibits rotation, translation, and reflection invariance. The second problem is how to build a machine learning pipeline that uses the isometric invariance of the shape representation to do both classification and retrieval. Our proposed solution demonstrates competitive results compared to state-of-the-art approaches. We base our shape representation on the medial axis transform (MAT), sometimes called the topological skeleton. Accepted and well-studied properties of the medial axis include: homotopy preservation, rotation invariance, mediality, one pixel thickness, and the ability to fully reconstruct the object. These properties make the MAT a suitable input to create shape features; however, several problems arise because not all skeletonization methods satisfy all the above-mentioned properties at the same time. In general, skeletons based on thinning approaches preserve topology but are noise sensitive and do not allow a proper reconstruction. They are also not invariant to rotations. Voronoi skeletons also preserve topology and are rotation invariant, but do not have information about the thickness of the object, making reconstruction impossible. The Voronoi skeleton is an approximation of the real skeleton. The denser the sampling of the boundary, the better the approximation; however, a denser sampling makes the Voronoi diagram more computationally expensive. In contrast, distance transform methods allow the reconstruction of the original object by providing the distance from every pixel in the skeleton to the boundary.
Moreover, they exhibit an acceptable degree of the properties listed above, but noise sensitivity remains an issue. Therefore, we selected distance transform medial axis methods as our skeletonization strategy, and focused on creating a new noise-free approach to solve the contour noise problem. To effectively classify an object, or perform any other task with features based on its shape, the descriptor needs to be a normalized, compact form: Φ should map every shape Ω to the same vector space R^n. This is not possible with skeletonization methods because the skeletons of different objects have different numbers of branches and different numbers of points, even when they belong to the same category. Consequently, we developed a strategy to extract features from the skeleton through the map Φ, which we used as an input to a machine learning approach. After developing our method for robust skeletonization, the next step is to use that skeleton in the machine learning pipeline to classify objects into previously defined categories. We developed a set of skeletal features that were used as input data to the machine learning architectures. We ran experiments on the MPEG7 and ModelNet40 datasets to test our approach in both 2D and 3D. Our experiments show results comparable with the state-of-the-art in shape classification and retrieval. Our experiments also show that our pipeline and our skeletal features exhibit some degree of invariance to isometric transformations. In this study, we sought to design an isometric invariant shape descriptor through robust skeletonization enforced by a feature extraction pipeline that exploits such invariance through a machine learning methodology. We conducted a set of classification and retrieval experiments over well-known benchmarks to validate our proposed method.
(Taken from the source) Colciencias National Doctorates Scholarship (Beca para Doctorados Nacionales de Colciencias), call 725 of 2015. Doctorate. Doctor of Engineering. Computer vision and machine learning.
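The abstract above notes that distance transform methods permit reconstruction because each skeleton pixel stores its distance to the boundary. The sketch below illustrates that idea on a binary mask: every kept centre carries a squared radius, and the union of the resulting discs reproduces the shape exactly. It is a minimal sketch under stated assumptions, not the dissertation's skeletonization method: the centres are chosen by a simple greedy disc cover (not a true medial axis), the distance is computed by brute force, and all function names are hypothetical.

```python
import numpy as np


def squared_distance_to_background(mask):
    """Squared Euclidean distance from each foreground pixel to the nearest
    background pixel (brute force; assumes at least one background pixel)."""
    fg = np.argwhere(mask)
    bg = np.argwhere(~mask)
    diff = fg[:, None, :] - bg[None, :, :]
    d2 = (diff ** 2).sum(axis=2).min(axis=1)
    out = np.zeros(mask.shape, dtype=np.int64)
    out[fg[:, 0], fg[:, 1]] = d2
    return out


def greedy_disc_cover(mask):
    """Pick (row, col, squared_radius) discs, largest first, until every
    foreground pixel is covered. Integer squared radii avoid rounding issues."""
    d2 = squared_distance_to_background(mask)
    yy, xx = np.mgrid[:mask.shape[0], :mask.shape[1]]
    covered = np.zeros(mask.shape, dtype=bool)
    fg = np.argwhere(mask)
    order = np.argsort(-d2[fg[:, 0], fg[:, 1]], kind="stable")
    discs = []
    for cy, cx in fg[order]:
        if covered[cy, cx]:
            continue
        r2 = int(d2[cy, cx])
        # Open disc: strictly closer than the nearest background pixel.
        covered |= (yy - cy) ** 2 + (xx - cx) ** 2 < r2
        discs.append((int(cy), int(cx), r2))
    return discs


def reconstruct(shape, discs):
    """Union of open discs; recovers the original mask from centres and radii."""
    yy, xx = np.mgrid[:shape[0], :shape[1]]
    out = np.zeros(shape, dtype=bool)
    for cy, cx, r2 in discs:
        out |= (yy - cy) ** 2 + (xx - cx) ** 2 < r2
    return out
```

Each open disc excludes every background pixel by construction, and every foreground pixel is either covered by an earlier disc or becomes a centre itself, which is why the reconstruction is exact. Without the stored radii, as with the Voronoi skeletons described above, the centres alone would not determine the shape.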

    Bridging the gap between reconstruction and synthesis

    Embargo applied from the defence date until 15 January 2022. 3D reconstruction and image synthesis are two of the main pillars in computer vision. Early works focused on simple tasks such as multi-view reconstruction and texture synthesis. With the rise of Deep Learning, the field has rapidly progressed, making it possible to achieve more complex and high level tasks. For example, the 3D reconstruction results of traditional multi-view approaches are currently obtained with single view methods. Similarly, early pattern based texture synthesis works have resulted in techniques that allow generating novel high-resolution images. In this thesis we have developed a hierarchy of tools that covers this whole range of problems, lying at the intersection of computer vision, graphics and machine learning. We tackle the problem of 3D reconstruction and synthesis in the wild. Importantly, we advocate for a paradigm in which not everything should be learned. Instead of applying Deep Learning naively, we propose novel representations, layers and architectures that directly embed prior 3D geometric knowledge for the task of 3D reconstruction and synthesis. We apply these techniques to problems including scene/person reconstruction and photo-realistic rendering. We first address methods to reconstruct a scene and the clothed people in it while estimating the camera position. Then, we tackle image and video synthesis for clothed people in the wild. Finally, we bridge the gap between reconstruction and synthesis under the umbrella of a unique novel formulation. Extensive experiments conducted throughout this thesis show that the proposed techniques improve the performance of Deep Learning models in terms of the quality of the reconstructed 3D shapes / synthesised images, while reducing the amount of supervision and training data required to train them.
In summary, we provide a variety of low, mid and high level algorithms that can be used to incorporate prior knowledge into different stages of the Deep Learning pipeline and improve performance in tasks of 3D reconstruction and image synthesis. Postprint (published version)

    Automatic segmentation of the human thigh muscles in magnetic resonance imaging

    Advances in magnetic resonance imaging (MRI) and analysis techniques have improved diagnosis and patient treatment pathways. Typically, image analysis requires substantial technical and medical expertise, and MR images can suffer from artefacts, echo and intensity inhomogeneity due to gradient pulse eddy currents and inherent effects of pulse radiation on MRI radio frequency (RF) coils, which complicate the analysis. Processing and analysing serial sections of MRI scans to measure tissue volume is an additional challenge, as the shapes and the borders between neighbouring tissues change significantly by anatomical location. Medical imaging solutions are needed to avoid laborious manual segmentation of specified regions of interest (ROI) and operator errors. The work set out in this thesis has addressed this challenge with a specific focus on skeletal muscle segmentation of the thigh. The aim was to develop an MRI segmentation framework for the quadriceps muscles, femur and bone marrow. Four contributions of this research include: (1) the development of a semi-automatic segmentation framework for a single transverse-plane image; (2) automatic segmentation of a single transverse-plane image; (3) the automatic segmentation of multiple contiguous transverse-plane images from a full MRI thigh scan; and (4) the use of deep learning for MRI thigh quadriceps segmentation. Novel image processing, statistical analysis and machine learning algorithms were developed for all solutions and they were compared against current gold-standard manual segmentation. Frameworks (1) and (3) require minimal input from the user to delineate the muscle border. Overall, the frameworks in (1), (2) and (3) offer very good output performance, with mean segmentation accuracy (by JSI) and processing time, respectively, of: (1) 0.95 and 17 sec; (2) 0.85 and 22 sec; and (3) 0.93 and 3 sec.
For the framework in (4), the ImageNet-trained model was customized by replacing the fully-connected layers in its architecture with convolutional layers (hence the name Fully Convolutional Network (FCN)), and the pre-trained model was transferred to the ROI segmentation task. With post-processing for image filtering and morphology applied to the segmented ROI, we have established a new benchmark for thigh MRI analysis. The mean accuracy and processing time with this framework are 0.9502 (by JSI) and 0.117 sec per image, respectively.
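The accuracies above are reported by JSI. Assuming JSI denotes the Jaccard similarity index (intersection over union of the predicted and manual masks), a minimal sketch of the computation could look as follows. The function name `jaccard_index` and the convention for two empty masks are assumptions made here, not details taken from the thesis.

```python
import numpy as np


def jaccard_index(pred, truth):
    """Jaccard similarity index |A ∩ B| / |A ∪ B| of two binary masks."""
    pred = np.asarray(pred, dtype=bool)
    truth = np.asarray(truth, dtype=bool)
    union = np.logical_or(pred, truth).sum()
    if union == 0:
        # Assumed convention: two empty masks agree perfectly.
        return 1.0
    return float(np.logical_and(pred, truth).sum() / union)


# One of two predicted pixels matches the single ground-truth pixel.
predicted = [[1, 1], [0, 0]]
manual = [[1, 0], [0, 0]]
print(jaccard_index(predicted, manual))
```

A score of 1 means the automatic and manual segmentations coincide; the reported 0.95 for framework (1) would mean the overlap is 95% of the combined area, under the assumption above.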

    On Motion Analysis in Computer Vision with Deep Learning: Selected Case Studies

    Motion analysis is one of the essential enabling technologies in computer vision. Despite recent significant advances, image-based motion analysis remains a very challenging problem. This challenge arises because the motion features are extracted directly from a sequence of images without any other metadata. Extracting motion information (features) is inherently more difficult than in other computer vision disciplines. In a traditional approach, motion analysis is often formulated as an optimisation problem, with the motion model being hand-crafted to reflect our understanding of the problem domain. The critical element of these traditional methods is a prior assumption about the model of motion believed to represent a specific problem. A recent trend in data analytics is to replace hand-crafted prior assumptions with a model learned directly from observational data with no, or very limited, prior assumptions about that model. Although known for a long time, these approaches, based on machine learning, have been shown to be competitive only very recently, due to advances in so-called deep learning methodologies. This work's key aim has been to investigate novel approaches, utilising deep learning methodologies, for motion analysis where the motion model is learned directly from observed data. These new approaches have focused on investigating the deep network architectures suitable for the effective extraction of spatiotemporal information. Due to the volume and structure of the estimated motion parameters, it is frequently difficult or even impossible to obtain relevant ground truth data. Missing ground truth leads to the choice of unsupervised learning methodologies, which is usually a challenging choice for the already challenging high-dimensional motion representation of an image sequence.
The main challenge with unsupervised learning is to evaluate whether the algorithm can learn the data model directly from the data alone, without any prior knowledge presented to the deep learning model during training. In this project, emphasis has been put on unsupervised learning approaches. Owing to a broad spectrum of computer vision problems and applications related to motion analysis, the research reported in the thesis has focused on three specific motion analysis challenges and corresponding practical case studies. These include motion detection and recognition, as well as 2D and 3D motion field estimation. Eyeblink quantification has been used as a case study for the motion detection and recognition problem. The approach proposed for this problem consists of a novel network architecture processing weakly corresponded images in an action completion regime with learned spatiotemporal image features fused using cascaded recurrent networks. The stereo-vision disparity estimation task has been selected as a case study for the 2D motion field estimation problem. The proposed method directly estimates occlusion maps using a novel convolutional neural network architecture that is trained with a custom-designed loss function in an unsupervised manner. The volumetric data registration task has been chosen as a case study for the 3D motion field estimation problem. The proposed solution is based on a 3D CNN, with a novel architecture featuring a Generative Adversarial Network used during training to improve network performance for unseen data. All the proposed networks demonstrated state-of-the-art performance compared to other corresponding methods reported in the literature on a number of assessment metrics. In particular, the proposed architecture for 3D motion field estimation has been shown to outperform the previously reported manual expert-guided registration methodology.

    Neural function approximation on graphs: shape modelling, graph discrimination & compression

    Graphs serve as a versatile mathematical abstraction of real-world phenomena in numerous scientific disciplines. This thesis is part of the Geometric Deep Learning subject area, a family of learning paradigms that capitalise on the increasing volume of non-Euclidean data so as to solve real-world tasks in a data-driven manner. In particular, we focus on the topic of graph function approximation using neural networks, which lies at the heart of many relevant methods. In the first part of the thesis, we contribute to the understanding and design of Graph Neural Networks (GNNs). Initially, we investigate the problem of learning on signals supported on a fixed graph. We show that treating graph signals as general graph spaces is restrictive and conventional GNNs have limited expressivity. Instead, we expose a more enlightening perspective by drawing parallels between graph signals and signals on Euclidean grids, such as images and audio. Accordingly, we propose a permutation-sensitive GNN based on an operator analogous to shifts in grids and instantiate it on 3D meshes for shape modelling (Spiral Convolutions). Next, we focus on learning on general graph spaces and in particular on functions that are invariant to graph isomorphism. We identify a fundamental trade-off between invariance, expressivity and computational complexity, which we address with a symmetry-breaking mechanism based on substructure encodings (Graph Substructure Networks). Substructures are shown to be a powerful tool that provably improves expressivity while controlling computational complexity, and a useful inductive bias in network science and chemistry. In the second part of the thesis, we discuss the problem of graph compression, where we analyse the information-theoretic principles and the connections with graph generative models. We show that another inevitable trade-off surfaces, now between computational complexity and compression quality, due to graph isomorphism.
We propose a substructure-based dictionary coder - Partition and Code (PnC) - with theoretical guarantees that can be adapted to different graph distributions by estimating its parameters from observations. Additionally, contrary to the majority of neural compressors, PnC is parameter and sample efficient and is therefore of wide practical relevance. Finally, within this framework, substructures are further illustrated as a decisive archetype for learning problems on graph spaces. Open Access

    Data-driven shape analysis and processing

    Data-driven methods play an increasingly important role in discovering geometric, structural, and semantic relationships between shapes. In contrast to traditional approaches that process shapes in isolation from each other, data-driven methods aggregate information from 3D model collections to improve the analysis, modeling and editing of shapes. Through reviewing the literature, we provide an overview of the main concepts and components of these methods and discuss their application to classification, segmentation, matching, reconstruction, modeling and exploration, as well as scene analysis and synthesis. We conclude our report with ideas that can inspire future research in data-driven shape analysis and processing.