Learning Detailed Radiance Manifolds for High-Fidelity and 3D-Consistent Portrait Synthesis from Monocular Image
A key challenge for novel view synthesis of monocular portrait images is 3D
consistency under continuous pose variations. Most existing methods rely on 2D
generative models, which often lead to obvious 3D inconsistency artifacts. We
present a 3D-consistent novel view synthesis approach for monocular portrait
images based on a recently proposed 3D-aware GAN, namely Generative Radiance
Manifolds (GRAM), which has shown strong 3D consistency in multiview image
generation of virtual subjects via the radiance manifolds representation.
However, simply learning an encoder to map a real image into the latent space
of GRAM can only reconstruct coarse radiance manifolds without faithful fine
details, while improving the reconstruction fidelity via instance-specific
optimization is time-consuming. We introduce a novel detail manifolds
reconstructor to learn 3D-consistent fine details on the radiance manifolds
from monocular images, and combine them with the coarse radiance manifolds for
high-fidelity reconstruction. The 3D priors derived from the coarse radiance
manifolds are used to regulate the learned details to ensure reasonable
synthesized results at novel views. Trained on in-the-wild 2D images, our
method achieves high-fidelity and 3D-consistent portrait synthesis, largely
outperforming the prior art.
Comment: Project page: https://yudeng.github.io/GRAMInverter
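As we understand the radiance manifolds representation, a pixel is rendered by alpha-compositing the radiance found where the camera ray intersects a small set of learned surfaces, ordered near to far. The sketch below shows only that compositing step. How the detail manifolds are combined with the coarse ones is not specified in the abstract; modelling the details as an additive color residual (`detail_rgb`) is a hypothetical choice made purely for illustration, and the function name is ours.

```python
import numpy as np

def composite_manifold_samples(coarse_rgb, alpha, detail_rgb=None):
    """Alpha-composite radiance at ray/manifold intersections (near to far).

    coarse_rgb: (K, 3) colors sampled from the coarse radiance manifolds
    alpha:      (K,)   occupancy of each intersection point, in [0, 1]
    detail_rgb: (K, 3) hypothetical residual colors from a detail branch
    """
    rgb = np.asarray(coarse_rgb, dtype=float)
    if detail_rgb is not None:
        # Assumption (not from the paper): details act as an additive residual.
        rgb = np.clip(rgb + np.asarray(detail_rgb, dtype=float), 0.0, 1.0)
    alpha = np.asarray(alpha, dtype=float)
    # Transmittance reaching intersection i: prod_{j<i} (1 - alpha_j).
    trans = np.concatenate(([1.0], np.cumprod(1.0 - alpha)[:-1]))
    weights = trans * alpha
    return weights @ rgb
```

With an opaque first surface (`alpha = [1.0, 0.5]`) the pixel takes the first surface's color; with `alpha = [0.5, 1.0]` the two colors are blended equally.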
Image super-resolution using gradient profile prior
In this paper, we propose an image super-resolution approach using a novel generic image prior, the gradient profile prior, which is a parametric prior describing the shape and the sharpness of image gradients. Using the gradient profile prior learned from a large number of natural images, we can constrain the image gradients when estimating a high-resolution image from a low-resolution image. With this simple but very effective prior, we are able to produce state-of-the-art results. The reconstructed high-resolution image is sharp and has few ringing or jaggy artifacts.
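A parametric prior with separate "shape" and "sharpness" parameters can be illustrated with a generalized Gaussian density, where `lam` controls the shape of the profile and `sigma` its spread (smaller `sigma` means a sharper edge). This is a minimal sketch assuming that form; the exact parameterization used in the paper should be checked against the original, and the function names are ours.

```python
import math

def alpha(lam):
    # Normalizer chosen so that the density's variance equals sigma**2.
    return math.sqrt(math.gamma(3.0 / lam) / math.gamma(1.0 / lam))

def gradient_profile_density(x, sigma, lam):
    """Generalized Gaussian: shape lam, sharpness (standard deviation) sigma."""
    a = alpha(lam)
    norm = lam * a / (2.0 * sigma * math.gamma(1.0 / lam))
    return norm * math.exp(-((a * abs(x) / sigma) ** lam))
```

For `lam = 2` this reduces to the ordinary Gaussian with standard deviation `sigma`; smaller `lam` gives a heavier-tailed, more peaked profile.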
Reinforced Disentanglement for Face Swapping without Skip Connection
State-of-the-art (SOTA) face swap models still suffer from either leakage of
the target identity (i.e., shape) or failure to fully preserve the target
non-identity attributes (i.e., background, hair) in the final results. We show
that this insufficient disentanglement is caused by two flawed designs that
were commonly adopted in prior models: (1) relying on a single compressed
encoder to represent both the semantic-level non-identity facial
attributes (i.e., pose) and the pixel-level non-facial region details, two
requirements that cannot both be satisfied at once; (2) relying heavily on
long skip connections between the encoder and the final generator, which leak
a certain amount of target face identity into the result. To fix them, we introduce a new
face swap framework called 'WSC-swap' that gets rid of skip connections and
uses two target encoders to respectively capture the pixel-level non-facial
region attributes and the semantic non-identity attributes in the face region.
To further reinforce the disentanglement learning for the target encoders, we
employ both an identity removal loss via adversarial training (i.e., GAN) and a
non-identity preservation loss via prior 3DMM models such as [11]. Extensive
experiments on both FaceForensics++ and CelebA-HQ show that our results
significantly outperform previous work on a rich set of metrics, including a
novel metric for measuring identity consistency, which previous work had
completely neglected.
Comment: Accepted by ICCV 202
Real-time smoke rendering using compensated ray marching
We present a real-time algorithm called compensated ray marching for rendering smoke under dynamic low-frequency environment lighting. Our approach is based on a decomposition of the input smoke animation, represented as a sequence of volumetric density fields, into a set of radial basis functions (RBFs) and a sequence of residual fields. To expedite rendering, the source radiance distribution within the smoke is computed from only the low-frequency RBF approximation of the density fields, since the high-frequency residuals have little impact on global illumination under low-frequency environment lighting. Furthermore, in computing source radiances, the contributions from single and multiple scattering are evaluated only at the RBF centers and then approximated at other points in the volume using RBF-based interpolation. A slice-based integration of these source radiances along each view ray is then performed to render the final image. The high-frequency residual fields, which are a critical component in the local appearance of smoke, are compensated back into the radiance integral during this ray march to generate highly detailed images. The runtime algorithm, which includes both light transfer simulation and ray marching, can be easily implemented on the GPU, and thus allows for real-time manipulation of viewpoint and lighting, as well as interactive editing of smoke attributes such as extinction cross section, scattering albedo, and phase function. Only moderate preprocessing time and storage are needed. This approach is the first real-time smoke rendering method that includes single and multiple scattering while generating results comparable in quality to offline algorithms such as ray tracing.
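The core idea can be sketched in one dimension along a single view ray: the density is split into a Gaussian-RBF approximation plus a residual, source radiance is known only at the RBF centers and interpolated elsewhere, and the residual is added back to the density only inside the final ray-march integral. This is a toy under stated assumptions, not the paper's GPU implementation. The source radiance at the centers is taken as a given input (the actual lighting, single/multiple scattering, phase function, and slice-based 3D integration are omitted), the normalized-RBF interpolation is our choice, and all names are hypothetical.

```python
import numpy as np

def rbf_basis(xs, centers, radius):
    """B[i, k] = exp(-((x_i - c_k) / radius)^2): Gaussian RBF k at sample i."""
    d = xs[:, None] - centers[None, :]
    return np.exp(-(d / radius) ** 2)

def compensated_ray_march(xs, centers, weights, radius, source_at_centers,
                          residual, sigma_t=1.0):
    """1D toy of compensated ray marching along one view ray.

    xs:                evenly spaced sample positions along the ray
    centers, weights:  RBF approximation of the density (low frequency)
    source_at_centers: source radiance evaluated only at the RBF centers
    residual:          high-frequency density residual at each sample
    Returns (radiance reaching the eye, final transmittance).
    """
    B = rbf_basis(xs, centers, radius)
    rho_low = B @ weights
    # Interpolate source radiance from the RBF centers (normalized RBF weights).
    J = (B @ source_at_centers) / np.maximum(B.sum(axis=1), 1e-12)
    # Compensation: the residual enters only the ray-march integral.
    rho = np.maximum(rho_low + residual, 0.0)
    dt = xs[1] - xs[0]
    alpha = 1.0 - np.exp(-sigma_t * rho * dt)
    # Transmittance before sample i: prod_{j<i} (1 - alpha_j).
    trans = np.concatenate(([1.0], np.cumprod(1.0 - alpha)[:-1]))
    radiance = np.sum(trans * alpha * J)
    return float(radiance), float(np.prod(1.0 - alpha))
```

Because the expensive source-radiance term depends only on the smooth RBF part, it can be computed at a few centers, while the cheap per-sample residual still restores fine detail in the attenuation.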
Mask DINO: Towards A Unified Transformer-based Framework for Object Detection and Segmentation
In this paper we present Mask DINO, a unified object detection and
segmentation framework. Mask DINO extends DINO (DETR with Improved Denoising
Anchor Boxes) by adding a mask prediction branch which supports all image
segmentation tasks (instance, panoptic, and semantic). It takes the dot product
of the query embeddings from DINO with a high-resolution pixel embedding map
to predict a set of binary masks. Some key components in DINO are extended for
segmentation through a shared architecture and training process. Mask DINO is
simple, efficient, and scalable, and it can benefit from joint large-scale
detection and segmentation datasets. Our experiments show that Mask DINO
significantly outperforms all existing specialized segmentation methods, both
with a ResNet-50 backbone and with a pre-trained model using a SwinL backbone. Notably,
Mask DINO establishes the best results to date on instance segmentation (54.5
AP on COCO), panoptic segmentation (59.4 PQ on COCO), and semantic segmentation
(60.8 mIoU on ADE20K) among models under one billion parameters. Code is
available at https://github.com/IDEACVR/MaskDINO.
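The mask branch described above amounts to one dot product per (query, pixel) pair: each query embedding is compared with the embedding of every pixel, giving one mask logit map per query. A minimal NumPy sketch follows; the array shapes, the sigmoid, and the 0.5 threshold are assumptions for illustration, and the function name is ours.

```python
import numpy as np

def predict_masks(query_embed, pixel_embed, threshold=0.5):
    """query_embed: (Q, C) query embeddings; pixel_embed: (C, H, W) pixel map.

    Returns (Q, H, W) boolean masks and the raw mask logits.
    """
    # logits[q, h, w] = sum_c query_embed[q, c] * pixel_embed[c, h, w]
    logits = np.einsum("qc,chw->qhw", query_embed, pixel_embed)
    probs = 1.0 / (1.0 + np.exp(-logits))  # per-pixel mask probability
    return probs > threshold, logits
```

Since the same pixel embedding map is shared by all queries, adding more queries only adds one more dot product per pixel, which is what lets a detection-style query set drive segmentation cheaply.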
- …