Weakly supervised 3D Reconstruction with Adversarial Constraint
Supervised 3D reconstruction has witnessed significant progress through the
use of deep neural networks. However, this increase in performance requires
large-scale annotations of 2D/3D data. In this paper, we explore inexpensive 2D
supervision as an alternative for expensive 3D CAD annotation. Specifically, we
use foreground masks as weak supervision through a raytrace pooling layer that
enables perspective projection and backpropagation. Additionally, since the 3D
reconstruction from masks is an ill-posed problem, we propose to constrain the
3D reconstruction to the manifold of unlabeled realistic 3D shapes that match
mask observations. We demonstrate that learning a log-barrier solution to this
constrained optimization problem resembles the GAN objective, enabling the use
of existing tools for training GANs. We evaluate and analyze the manifold
constrained reconstruction on various datasets for single and multi-view
reconstruction of both synthetic and real images.
HumanNorm: Learning Normal Diffusion Model for High-quality and Realistic 3D Human Generation
Recent text-to-3D methods employing diffusion models have made significant
advancements in 3D human generation. However, these approaches face challenges
due to the limitations of text-to-image diffusion models, which lack an
understanding of 3D structures. Consequently, these methods struggle to achieve
high-quality human generation, resulting in smooth geometry and cartoon-like
appearances. In this paper, we propose HumanNorm, a novel approach for
high-quality and realistic 3D human generation. The main idea is to enhance the
model's 2D perception of 3D geometry by learning a normal-adapted diffusion
model and a normal-aligned diffusion model. The normal-adapted diffusion model
can generate high-fidelity normal maps corresponding to user prompts with
view-dependent and body-aware text. The normal-aligned diffusion model learns
to generate color images aligned with the normal maps, thereby transforming
physical geometry details into realistic appearance. Leveraging the proposed
normal diffusion model, we devise a progressive geometry generation strategy
and a multi-step Score Distillation Sampling (SDS) loss to enhance the
performance of 3D human generation. Comprehensive experiments substantiate
HumanNorm's ability to generate 3D humans with intricate geometry and realistic
appearances. HumanNorm outperforms existing text-to-3D methods in both geometry
and texture quality. The project page of HumanNorm is
https://humannorm.github.io/.
Adversarial Variational Embedding for Robust Semi-supervised Learning
Semi-supervised learning is sought for leveraging the unlabelled data when
labelled data is difficult or expensive to acquire. Deep generative models
(e.g., Variational Autoencoders (VAEs)) and semi-supervised Generative Adversarial
Networks (GANs) have recently shown promising performance in semi-supervised
classification owing to their excellent discriminative representation ability. However,
the latent code learned by the traditional VAE is not exclusive (repeatable)
for a specific input sample, which prevents it from excellent classification
performance. In particular, the learned latent representation depends on a
non-exclusive component which is stochastically sampled from the prior
distribution. Moreover, the semi-supervised GAN models generate data from
a pre-defined distribution (e.g., Gaussian noise) that is independent of the
input data distribution, which may obstruct convergence and makes it difficult to
control the distribution of the generated data. To address the aforementioned
issues, we propose a novel Adversarial Variational Embedding (AVAE) framework
for robust and effective semi-supervised learning to leverage both the
advantage of GAN as a high quality generative model and VAE as a posterior
distribution learner. The proposed approach first produces an exclusive latent
code by the model which we call VAE++, and meanwhile, provides a meaningful
prior distribution for the generator of GAN. The proposed approach is evaluated
over four different real-world applications and we show that our method
outperforms the state-of-the-art models, which confirms that the combination of
VAE++ and GAN can provide significant improvements in semi-supervised
classification.
Comment: 9 pages, Accepted by Research Track in KDD 201
A Generative Model of People in Clothing
We present the first image-based generative model of people in clothing for
the full body. We sidestep the commonly used complex graphics rendering
pipeline and the need for high-quality 3D scans of dressed people. Instead, we
learn generative models from a large image database. The main challenge is to
cope with the high variance in human pose, shape and appearance. For this
reason, pure image-based approaches have not been considered so far. We show
that this challenge can be overcome by splitting the generating process into two
parts. First, we learn to generate a semantic segmentation of the body and
clothing. Second, we learn a conditional model on the resulting segments that
creates realistic images. The full model is differentiable and can be
conditioned on pose, shape or color. The results are samples of people in
different clothing items and styles. The proposed model can generate entirely
new people with realistic clothing. In several experiments we present
encouraging results that suggest an entirely data-driven approach to people
generation is possible.
VideoForensicsHQ: Detecting High-quality Manipulated Face Videos
There are concerns that new approaches to the synthesis of high quality face
videos may be misused to manipulate videos with malicious intent. The research
community therefore developed methods for the detection of modified footage and
assembled benchmark datasets for this task. In this paper, we examine how the
performance of forgery detectors depends on the presence of artefacts that the
human eye can see. We introduce a new benchmark dataset for face video forgery
detection, of unprecedented quality. It allows us to demonstrate that existing
detection techniques have difficulties detecting fakes that reliably fool the
human eye. We thus introduce a new family of detectors that examine
combinations of spatial and temporal features and outperform existing
approaches both in terms of detection accuracy and generalization.
Comment: ICME 2021 camera-read