3D GANs and Latent Space: A comprehensive survey
Generative Adversarial Networks (GANs) have emerged as a significant player
in generative modeling by mapping lower-dimensional random noise to
higher-dimensional spaces. These networks have been used to generate
high-resolution images and 3D objects. The efficient modeling of 3D objects and
human faces is crucial in the development process of 3D graphical environments
such as games or simulations. 3D GANs are a new type of generative model used
for 3D reconstruction, point cloud reconstruction, and 3D semantic scene
completion. The choice of distribution for noise is critical as it represents
the latent space. Understanding a GAN's latent space is essential for
fine-tuning the generated samples, as demonstrated by the morphing of
semantically meaningful parts of images. In this work, we explore the latent
space and 3D GANs, examine several GAN variants and training methods to gain
insights into improving 3D GAN training, and suggest potential future
directions for further research.
AUTO3D: Novel view synthesis through unsupervisely learned variational viewpoint and global 3D representation
This paper targets learning-based novel view synthesis from a single or a
limited number of 2D images without pose supervision. In viewer-centered
coordinates, we construct an end-to-end trainable conditional variational
framework to disentangle the relative pose/rotation, learned without
supervision, and the implicit global 3D representation (shape, texture, the
origin of the viewer-centered coordinates, etc.). The global appearance of the
3D object is given by several appearance-describing images taken from any
number of viewpoints. Our spatial correlation module extracts a global 3D
representation from the appearance-describing images in a
permutation-invariant manner. Our system can achieve implicit 3D understanding
without explicit 3D reconstruction. With a viewer-centered
relative-pose/rotation code learned without supervision, the decoder can
hallucinate the novel view continuously by sampling the relative pose from a
prior distribution. In various applications, we demonstrate that our model can
achieve comparable or even better results than pose/3D model-supervised
learning-based novel view synthesis (NVS) methods with any number of input
views. Comment: ECCV 202
Image Restoration using Automatic Damaged Regions Detection and Machine Learning-Based Inpainting Technique
In this dissertation we propose two novel image restoration schemes. The first pertains to automatic detection of damaged regions in old photographs and digital images of cracked paintings. In cases when inpainting mask generation cannot be completely automatic, our detection algorithm facilitates precise mask creation, which is particularly useful for images containing damage that is tedious to annotate or difficult to define geometrically. The main contribution of this dissertation is the development and utilization of a new inpainting technique, region hiding, to repair a single image by training a convolutional neural network on various transformations of that image. Region hiding is also effective in object removal tasks. Lastly, we present a segmentation system for distinguishing glands, stroma, and cells in slide images, in addition to current results, as one component of an ongoing project to aid in colon cancer prognostication.
Video Understanding: A Predictive Analytics Perspective
This dissertation includes a detailed study of video predictive understanding, an emerging perspective on video-based computer vision research. This direction explores machine vision techniques to fill in missing spatiotemporal information in videos (e.g., predict the future), which is of great importance for understanding real-world dynamics and benefits many applications. We investigate this direction in depth and breadth. Four emerging areas are considered and improved by our efforts: early action recognition, future activity prediction, trajectory prediction, and procedure planning. For each, our research presents innovative solutions based on machine learning techniques (deep learning in particular) while paying special attention to their interpretability, multi-modality, and efficiency, which we consider critical for next-generation Artificial Intelligence (AI). Finally, we conclude this dissertation by discussing current shortcomings as well as future directions.