12 research outputs found
3D Geometry Reconstruction from Discrete Volumetric Data
Conversion of discrete volumetric data to a boundary representation is a fairly common operation today. The standard solution to this problem is the well-known Marching Cubes algorithm, which, although simple and robust, generates low-quality output that requires subsequent post-processing. This master's thesis studies alternative algorithms for extracting iso-surfaces from volumetric data. The reader is acquainted with the fundamentals of this problem and with the principles of the Hierarchical Iso-Surface Extraction method, an independent implementation of which was developed and tested as part of this work.
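The Marching Cubes baseline the thesis compares against classifies each voxel cell by an 8-bit case index. A minimal sketch of that classification step (not the thesis implementation; the 256-entry triangulation tables are omitted):

```python
# Sketch of the core Marching Cubes step: compare the 8 corner samples of one
# voxel cell against the iso-value; the resulting 8-bit index selects one of
# 256 triangulation cases from the (omitted) lookup tables.

def cell_index(corners, iso):
    """corners: 8 scalar samples of one cell -> case index in 0..255."""
    index = 0
    for bit, value in enumerate(corners):
        if value < iso:          # corner lies inside the iso-surface
            index |= 1 << bit
    return index

# A cell entirely inside (index 255) or entirely outside (index 0) produces
# no triangles; mixed indices mean the cell intersects the iso-surface.
```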
Single-view 3D body and cloth reconstruction under complex poses
Recent advances in 3D human shape reconstruction from single images have shown impressive results, leveraging deep networks that model the so-called implicit function to learn the occupancy status of arbitrarily dense 3D points in space. However, while current algorithms based on this paradigm, like PiFuHD (Saito et al., 2020), are able to estimate accurate geometry of the human shape and clothes, they require high-resolution input images and are not able to capture complex body poses. Most training and evaluation is performed on 1k-resolution images of humans standing in front of the camera under neutral body poses. In this paper, we leverage publicly available data to extend existing implicit function-based models to deal with images of humans that can have arbitrary poses and self-occluded limbs. We argue that the representation power of the implicit function is not sufficient to simultaneously model details of the geometry and of the body pose. We therefore propose a coarse-to-fine approach in which we first learn an implicit function that maps the input image to a 3D body shape with a low level of detail, but which correctly fits the underlying human pose, despite its complexity. We then learn a displacement map, conditioned on the smoothed surface and on the input image, which encodes the high-frequency details of the clothes and body. In the experimental section, we show that this coarse-to-fine strategy represents a very good trade-off between shape detail and pose correctness, comparing favorably to the most recent state-of-the-art approaches. Our code will be made publicly available. This work is supported by the Spanish government with the projects MoHuCo PID2020-120049RB-I00 and María de Maeztu Seal of Excellence MDM-2016-0656.
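The implicit-function paradigm this paper builds on maps any 3D point to an in/out occupancy value. A toy sketch of that interface (a unit sphere stands in for the learned, image-conditioned network; names are illustrative, not the paper's API):

```python
import numpy as np

# Toy occupancy function: in the paper this would be a deep network
# conditioned on the input image; here a unit sphere stands in for it.

def occupancy(points):
    """points: (N, 3) array -> (N,) in/out occupancy values."""
    return (np.linalg.norm(points, axis=1) < 1.0).astype(float)

# The surface can be queried at arbitrarily dense point sets:
pts = np.array([[0.0, 0.0, 0.0], [2.0, 0.0, 0.0]])
print(occupancy(pts))  # inside -> 1.0, outside -> 0.0
```

The coarse-to-fine idea then amounts to first fitting such a function at low detail and adding a learned displacement along the coarse surface afterwards.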
Multiple dataset visualization (MDV) framework for scalar volume data
Many applications require comparative analysis of multiple datasets representing different samples, conditions, time instants, or views in order to develop a better understanding of the scientific problem or system under consideration. One effective approach for such analysis is visualization of the data. In this PhD thesis, we propose an innovative multiple dataset visualization (MDV) approach in which two or more datasets of a given type are rendered concurrently in the same visualization. MDV is an important concept for cases where it is not possible to make an inference based on one dataset, and comparisons across many datasets are required to reveal cross-correlations among them. The proposed MDV framework, which deals with some fundamental issues that arise when several datasets are visualized together, follows a multithreaded architecture consisting of three core components: data preparation/loading, visualization, and rendering. The visualization module, the major focus of this study, currently deals with isosurface extraction and texture-based rendering techniques. For isosurface extraction, our all-in-memory approach keeps the datasets under consideration and the corresponding geometric data in memory, whereas the only-polygons-or-points-in-memory approach keeps only the geometric data in memory. To address the issues related to storage and computation, we develop adaptive data-coherency and multiresolution schemes. The inter-dataset coherency scheme exploits the similarities among datasets to approximate portions of the isosurfaces of datasets using the isosurface of one or more reference datasets, whereas the intra/inter-dataset multiresolution scheme processes the selected portions of each data volume at varying levels of resolution. The graphics hardware-accelerated approaches adopted for MDV include volume clipping, isosurface extraction, and volume rendering, which use 3D textures and advanced per-fragment operations.
With appropriate user-defined threshold criteria, we find that the various MDV techniques maintain a linear relationship between processing time and N, improve geometry generation and rendering time, and increase the maximum N that can be handled (N: number of datasets). Finally, we demonstrate the effectiveness and usefulness of the proposed MDV by visualizing 3D scalar data (representing electron density distributions in magnesium oxide and magnesium silicate) from parallel quantum mechanical simulations.
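The inter-dataset coherency scheme reuses a reference dataset's iso-surface wherever the datasets are sufficiently similar, re-extracting geometry only in dissimilar regions. A hedged sketch of that selection step (block size, threshold, and function names are illustrative, not the thesis API):

```python
import numpy as np

# Sketch of inter-dataset coherency: compare a dataset against a reference
# block by block; blocks whose mean absolute difference exceeds a threshold
# must have their iso-surface re-extracted, all others reuse the reference.

def blocks_to_reextract(volume, reference, block=4, tol=0.05):
    """Return origins (z, y, x) of block-sized sub-volumes whose mean
    absolute difference from the reference exceeds tol."""
    assert volume.shape == reference.shape
    out = []
    nz, ny, nx = volume.shape
    for z in range(0, nz, block):
        for y in range(0, ny, block):
            for x in range(0, nx, block):
                d = np.abs(volume[z:z+block, y:y+block, x:x+block]
                           - reference[z:z+block, y:y+block, x:x+block]).mean()
                if d > tol:
                    out.append((z, y, x))
    return out
```

With similar datasets, most blocks fall below the threshold, so most of the reference geometry is reused and only a small fraction is recomputed.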
Stability and Expressiveness of Deep Generative Models
In recent years, deep learning has revolutionized both machine learning and computer vision. Many classical computer vision tasks (e.g. object detection and semantic segmentation), which traditionally were very challenging, can now be solved using supervised deep learning techniques. While supervised learning is a powerful tool when labeled data is available and the task under consideration has a well-defined output, these conditions are not always satisfied. One promising approach in this case is given by generative modeling. In contrast to purely discriminative models, generative models can deal with uncertainty and learn powerful models even when labeled training data is not available. However, while current approaches to generative modeling achieve promising results, they suffer from two aspects that limit their expressiveness: (i) some of the most successful approaches to modeling image data are no longer trained using optimization algorithms, but instead employ algorithms whose dynamics are not well understood, and (ii) generative models are often limited by the memory requirements of the output representation. We address both problems in this thesis: in the first part, we introduce a theory which enables us to better understand the training dynamics of Generative Adversarial Networks (GANs), one of the most promising approaches to generative modeling.
We tackle this problem by introducing minimal example problems of GAN training which can be understood analytically. Subsequently, we gradually increase the complexity of these examples. By doing so, we gain new insights into the training dynamics of GANs and derive new regularizers that also work well for general GANs. Our new regularizers enable us, for the first time, to train a GAN at one-megapixel resolution without having to gradually increase the resolution of the training distribution. In the second part of this thesis, we consider output representations in 3D for generative models and 3D reconstruction techniques. By introducing implicit representations to deep learning, we are able to extend many techniques that work in 2D to the 3D domain without sacrificing their expressiveness.
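A minimal GAN example of the kind described above can be simulated directly. The sketch below (not taken from the thesis; the exact regularizer there may differ) uses a point-mass data distribution at 0, a one-parameter generator theta, and a linear discriminator D(x) = psi * x, then compares plain simultaneous gradient steps against steps with a simple gradient penalty on the discriminator:

```python
import math

# Toy minimal GAN: true data is a point mass at 0, the generator outputs a
# single point theta, the discriminator is linear, D(x) = psi * x.
# gamma is the weight of a simple gradient-penalty regularizer (psi**2 term);
# both the setup and the penalty form are illustrative sketches.

def train(steps=200, lr=0.1, gamma=0.0, psi=1.0, theta=1.0):
    for _ in range(steps):
        # simultaneous gradient steps on the bilinear game V = psi * theta
        new_psi = psi + lr * (theta - gamma * psi)  # ascend, minus penalty grad
        new_theta = theta - lr * psi                # descend
        psi, theta = new_psi, new_theta
    return math.hypot(psi, theta)   # distance from the equilibrium (0, 0)

print(train(gamma=0.0))  # unregularized: distance grows (spiralling out)
print(train(gamma=1.0))  # regularized: distance shrinks toward 0
```

Without regularization the parameters spiral away from the equilibrium; the penalty damps the rotation and makes the iteration converge, which is the qualitative behavior the analytic treatment explains.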
Representation, visualization, and manipulation of three-dimensional medical data: a study on the foundations of immersive surgical simulation
Dissertation (master's), Universidade Federal de Santa Catarina, Centro Tecnológico, Programa de Pós-Graduação em Ciência da Computação. Three-dimensional patient data are used in many medical and hospital settings, supporting diagnoses and guiding surgical procedures. However, although very useful, these data are quite inflexible and do not allow the user to interact with or manipulate them. Employing computer graphics and virtual reality techniques to represent these data would overcome these difficulties, generating individual representations adapted to each patient and enabling surgical planning and computer-assisted surgery, among other possibilities. The representation of these data and the means of manipulating them must contain a set of elements and satisfy certain requirements for the applications to achieve realism; otherwise, employing these techniques would bring little advantage. By analyzing these elements and requirements, a dependency graph is built that shows the techniques and computational structures needed to obtain realistic immersive virtual environments. This graph identifies data structures for solid representation as the key component of this kind of application. To meet these needs, a data structure capable of representing a broad class of spatial topologies is presented, allowing fast access to elements and their neighborhoods, together with methods for constructing it. An application for measuring arteries using the aforementioned structure and methods is also presented, along with the results it obtained.
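The abstract requires a solid representation with fast access to elements and their neighborhoods but does not name the structure used. A half-edge-style adjacency map is one classical way to meet that requirement; the sketch below (all names illustrative, not the dissertation's design) records, for each directed edge of a triangle mesh, its twin in the neighboring face:

```python
# Half-edge-style adjacency sketch for a triangle mesh given as vertex-index
# triples.  Each directed edge (u, v) maps to its twin (v, u) when a
# neighboring face exists, or to None on the mesh boundary.

def build_adjacency(triangles):
    """Map each directed half-edge (u, v) to its twin (v, u), if present."""
    edges = set()
    for a, b, c in triangles:
        edges.update({(a, b), (b, c), (c, a)})
    return {e: (e[1], e[0]) if (e[1], e[0]) in edges else None for e in edges}

def one_ring(adjacency, v):
    """Vertices adjacent to v (its one-ring neighborhood)."""
    nb = ({b for (a, b) in adjacency if a == v}
          | {a for (a, b) in adjacency if b == v})
    return sorted(nb)
```

Twin lookups and one-ring queries are exactly the neighborhood accesses an artery-measurement tool would issue repeatedly, which is why constant-time adjacency matters.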
Neural Scene Representations for 3D Reconstruction and Generative Modeling
With the increasing technologization of society, we use machines for more and more complex tasks, ranging from driving assistance to video conferencing, to exploring planets. The scene representation, i.e., how sensory data is converted to a compact description of the environment, is fundamental to both the success and the safety of such systems. A promising approach for developing robust, adaptive, and powerful scene representations is given by learning-based systems that can adapt themselves from observations. Indeed, deep learning has revolutionized computer vision in recent years. In particular, better model architectures, large amounts of training data, and more powerful computing devices have enabled deep learning systems with unprecedented performance, and they now set the state of the art in many benchmarks, ranging from image classification, to object detection, to semantic segmentation. Despite these successes, the way these systems operate is still fundamentally different from human cognition. In particular, most approaches operate in the 2D domain, while humans understand that images are projections of the three-dimensional world. In addition, they often do not follow a compositional understanding of scenes, which is fundamental to human reasoning. In this thesis, our goal is to develop scene representations that enable autonomous agents to navigate and act robustly and safely in complex environments while reasoning compositionally in 3D. To this end, we first propose a novel output representation for deep learning-based 3D reconstruction and generative modeling. We find that, compared to previous representations, our neural field-based approach does not require 3D space to be discretized, achieving reconstructions at arbitrary resolution with a constant memory footprint.
Next, we develop a differentiable rendering technique to infer these neural field-based 3D shape and texture representations from 2D observations and find that this allows us to scale to more complex, real-world scenarios. Subsequently, we combine our novel 3D shape representation with a spatially and temporally continuous vector field to model non-rigid shapes in motion. We observe that our novel 4D representation can be used for various discriminative and generative tasks, ranging from 4D reconstruction to 4D interpolation, to motion transfer. Finally, we develop an object-centric generative model that can generate 3D scenes in a compositional manner and that allows for photorealistic renderings of generated scenes. We find that our model not only improves image fidelity but also enables more controllable scene generation and image synthesis than prior work while training only from raw, unposed image collections.
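The "arbitrary resolution, constant memory" property of a neural field can be illustrated with a toy continuous shape function: the representation itself never changes size, only the sampling grid does. In this sketch an analytic signed distance function stands in for a trained network (names are illustrative):

```python
import numpy as np

# A neural field is a continuous function over 3D space; here an analytic
# signed distance to the unit sphere stands in for a trained network.

def sdf(points):
    """Signed distance from points (N, 3) to the unit sphere."""
    return np.linalg.norm(points, axis=1) - 1.0

def sample_grid(resolution, extent=1.5):
    """Evaluate the field on a resolution^3 grid spanning [-extent, extent]^3."""
    axis = np.linspace(-extent, extent, resolution)
    grid = np.stack(np.meshgrid(axis, axis, axis, indexing="ij"), axis=-1)
    return sdf(grid.reshape(-1, 3)).reshape(resolution, resolution, resolution)

# The same fixed-size representation queried at 16^3 and at 64^3:
low, high = sample_grid(16), sample_grid(64)
print(low.shape, high.shape)
```

A voxel grid at 64^3 costs 64x the memory of 16^3; the field costs the same in both cases, which is the contrast drawn above with discretized representations.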
ABSTRACT
Figure 1: First three levels and final result of our hierarchical iso-surface extraction algorithm.

In this paper we present a novel approach to iso-surface extraction which is based on a multiresolution volume data representation and hierarchically approximates the iso-surface with a semiregular mesh. After having generated a hierarchy of volumes, we extract the iso-surface from the coarsest resolution with a standard Marching Cubes algorithm, apply a simple mesh decimation strategy to improve the shape of the triangles, and use the result as a base mesh. Then we iteratively fit the mesh to the iso-surface at the finer volume levels, thereby subdividing it adaptively in order to be able to correctly reconstruct local features. We also take care of generating an even vertex distribution over the iso-surface so that the final result consists of triangles with good aspect ratios. The advantage of this approach, as opposed to the standard method of extracting the iso-surface from the finest resolution with Marching Cubes, is that it generates a mesh with subdivision connectivity.
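The first stage described above is a hierarchy of volumes from which the coarse base mesh is extracted. A hedged sketch of one way to build such a hierarchy, by repeated 2x downsampling with 2x2x2 averaging (the paper's actual filter may differ):

```python
import numpy as np

# Build a volume pyramid: level 0 is the input, each further level halves the
# resolution along every axis by averaging 2x2x2 blocks.  Marching Cubes runs
# on the coarsest level; the mesh is then fitted against the finer levels.

def build_hierarchy(volume, levels):
    """Return [full-res, half-res, quarter-res, ...] with `levels` entries."""
    pyramid = [volume]
    for _ in range(levels - 1):
        v = pyramid[-1]
        nz, ny, nx = (s // 2 for s in v.shape)
        v = (v[:2*nz, :2*ny, :2*nx]
             .reshape(nz, 2, ny, 2, nx, 2)
             .mean(axis=(1, 3, 5)))
        pyramid.append(v)
    return pyramid
```

Averaging keeps each coarse level a smoothed version of the fine data, so the iso-surface of a coarse level is a plausible starting point for fitting against the next finer level.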