258 research outputs found

Learning Detailed Radiance Manifolds for High-Fidelity and 3D-Consistent Portrait Synthesis from Monocular Image

Author: Deng Yu
Shum Heung-Yeung
Wang Baoyuan
Publication date: 25/11/2022

A key challenge for novel view synthesis of monocular portrait images is 3D consistency under continuous pose variations. Most existing methods rely on 2D generative models, which often leads to obvious 3D inconsistency artifacts. We present a 3D-consistent novel view synthesis approach for monocular portrait images based on a recently proposed 3D-aware GAN, namely Generative Radiance Manifolds (GRAM), which has shown strong 3D consistency in multiview image generation of virtual subjects via the radiance manifolds representation. However, simply learning an encoder to map a real image into the latent space of GRAM can only reconstruct coarse radiance manifolds without faithful fine details, while improving the reconstruction fidelity via instance-specific optimization is time-consuming. We introduce a novel detail manifolds reconstructor to learn 3D-consistent fine details on the radiance manifolds from monocular images, and combine them with the coarse radiance manifolds for high-fidelity reconstruction. The 3D priors derived from the coarse radiance manifolds are used to regulate the learned details to ensure reasonable synthesized results at novel views. Trained on in-the-wild 2D images, our method achieves high-fidelity and 3D-consistent portrait synthesis, largely outperforming the prior art.

Comment: Project page: https://yudeng.github.io/GRAMInverter

arXiv.org e-Print Archive

Image super-resolution using gradient profile prior

Author: Heung-yeung Shum
Jian Sun
Jian Sun
Zongben Xu
Publication date: 01/01/2008

In this paper, we propose an image super-resolution approach using a novel generic image prior, the gradient profile prior, which is a parametric prior describing the shape and the sharpness of the image gradients. Using the gradient profile prior learned from a large number of natural images, we can provide a constraint on image gradients when we estimate a high-resolution image from a low-resolution image. With this simple but very effective prior, we are able to produce state-of-the-art results. The reconstructed high-resolution image is sharp and has few ringing or jaggy artifacts.

CiteSeerX

Reinforced Disentanglement for Face Swapping without Skip Connection

Author: Chen Xingyu
Ren Xiaohang
Shum Heung-Yeung
Wang Baoyuan
Yao Pengfei
Publication date: 03/08/2023

The SOTA face swap models still suffer from the problem of either target identity (i.e., shape) being leaked or the target non-identity attributes (i.e., background, hair) failing to be fully preserved in the final results. We show that this insufficient disentanglement is caused by two flawed designs that were commonly adopted in prior models: (1) counting on only one compressed encoder to represent both the semantic-level non-identity facial attributes (i.e., pose) and the pixel-level non-facial region details, which are contradictory to satisfy at the same time; (2) relying heavily on long skip connections between the encoder and the final generator, leaking a certain amount of target face identity into the result. To fix them, we introduce a new face swap framework called 'WSC-swap' that gets rid of skip connections and uses two target encoders to respectively capture the pixel-level non-facial region attributes and the semantic non-identity attributes in the face region. To further reinforce the disentanglement learning for the target encoder, we employ both identity removal loss via adversarial training (i.e., GAN) and the non-identity preservation loss via prior 3DMM models like [11]. Extensive experiments on both FaceForensics++ and CelebA-HQ show that our results significantly outperform previous works on a rich set of metrics, including one novel metric for measuring identity consistency that was completely neglected before.

Comment: Accepted by ICCV 202

arXiv.org e-Print Archive

Real-time smoke rendering using compensated ray marching

Author: Baining Guo
Heung-yeung Shum
Hujun Bao
Kun Zhou
Stephen Lin
Zhong Ren
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 01/01/2008

We present a real-time algorithm called compensated ray marching for rendering of smoke under dynamic low-frequency environment lighting. Our approach is based on a decomposition of the input smoke animation, represented as a sequence of volumetric density fields, into a set of radial basis functions (RBFs) and a sequence of residual fields. To expedite rendering, the source radiance distribution within the smoke is computed from only the low-frequency RBF approximation of the density fields, since the high-frequency residuals have little impact on global illumination under low-frequency environment lighting. Furthermore, in computing source radiances the contributions from single and multiple scattering are evaluated at only the RBF centers and then approximated at other points in the volume using an RBF-based interpolation. A slice-based integration of these source radiances along each view ray is then performed to render the final image. The high-frequency residual fields, which are a critical component in the local appearance of smoke, are compensated back into the radiance integral during this ray march to generate images of high detail. The runtime algorithm, which includes both light transfer simulation and ray marching, can be easily implemented on the GPU, and thus allows for real-time manipulation of viewpoint and lighting, as well as interactive editing of smoke attributes such as extinction cross section, scattering albedo, and phase function. Only moderate preprocessing time and storage is needed. This approach provides the first method for real-time smoke rendering that includes single and multiple scattering while generating results comparable in quality to offline algorithms like ray tracing.
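The compensation step described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a one-dimensional ray, Gaussian-shaped RBFs, a standard emission-absorption accumulation, and a hypothetical `source_radiance` callable standing in for the radiance interpolated from the RBF centers. The clamp to non-negative density is also an assumption.

```python
import math

def rbf_density(x, centers, radii, weights):
    """Low-frequency density: weighted sum of Gaussian RBFs (assumed form, 1D)."""
    return sum(w * math.exp(-((x - c) / r) ** 2)
               for c, r, w in zip(centers, radii, weights))

def march_ray(xs, residuals, source_radiance, centers, radii, weights,
              sigma_t, dx):
    """March along a ray, adding the residual back into the density at each step.

    Returns (accumulated radiance, remaining transmittance).
    """
    radiance = 0.0
    transmittance = 1.0
    for x, res in zip(xs, residuals):
        # Compensation: full density = RBF approximation + high-frequency residual.
        density = max(0.0, rbf_density(x, centers, radii, weights) + res)
        # Opacity of this step under extinction cross section sigma_t.
        alpha = 1.0 - math.exp(-sigma_t * density * dx)
        radiance += transmittance * alpha * source_radiance(x)
        transmittance *= 1.0 - alpha
    return radiance, transmittance

# Illustrative inputs only: one RBF, a small sinusoidal residual, constant radiance.
dx = 0.01
xs = [i * dx for i in range(100)]
residuals = [0.1 * math.sin(40.0 * x) for x in xs]
radiance, transmittance = march_ray(
    xs, residuals, lambda x: 1.0,
    centers=[0.5], radii=[0.2], weights=[1.0], sigma_t=1.0, dx=dx)
```

With a constant source radiance of 1, the accumulated radiance and the remaining transmittance sum to 1, which is a quick sanity check on the accumulation loop.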

CiteSeerX

Crossref

Real-Time Bayesian 3-D Pose Tracking

Author: Heung-Yeung Shum
Qiang Wang
Weiwei Zhang
Xiaoou Tang
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'

Crossref

Mask DINO: Towards A Unified Transformer-based Framework for Object Detection and Segmentation

Author: Li Feng
Liu Shilong
Ni Lionel M.
Shum Heung-Yeung
Xu Huaizhe
Zhang Hao
Zhang Lei
Publication date: 12/12/2022

In this paper we present Mask DINO, a unified object detection and segmentation framework. Mask DINO extends DINO (DETR with Improved Denoising Anchor Boxes) by adding a mask prediction branch which supports all image segmentation tasks (instance, panoptic, and semantic). It makes use of the query embeddings from DINO to dot-product a high-resolution pixel embedding map to predict a set of binary masks. Some key components in DINO are extended for segmentation through a shared architecture and training process. Mask DINO is simple, efficient, and scalable, and it can benefit from joint large-scale detection and segmentation datasets. Our experiments show that Mask DINO significantly outperforms all existing specialized segmentation methods, both on a ResNet-50 backbone and a pre-trained model with SwinL backbone. Notably, Mask DINO establishes the best results to date on instance segmentation (54.5 AP on COCO), panoptic segmentation (59.4 PQ on COCO), and semantic segmentation (60.8 mIoU on ADE20K) among models under one billion parameters. Code is available at https://github.com/IDEACVR/MaskDINO.
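The mask prediction step described in this abstract (query embeddings dot-producted with a pixel embedding map) can be sketched in a few lines. This is an illustration only, not the Mask DINO code: the function name `predict_masks`, the sigmoid, and the 0.5 threshold are assumptions, and the inputs are tiny hand-made vectors rather than learned embeddings.

```python
import math

def predict_masks(queries, pixel_embeddings, threshold=0.5):
    """Return one binary mask per query.

    queries: list of N vectors of length C.
    pixel_embeddings: H x W grid whose entries are vectors of length C.
    The sigmoid and threshold are assumed post-processing, not from the abstract.
    """
    masks = []
    for q in queries:
        mask = []
        for row in pixel_embeddings:
            mask_row = []
            for pixel in row:
                # Dot product between the query and this pixel's embedding.
                logit = sum(a * b for a, b in zip(q, pixel))
                prob = 1.0 / (1.0 + math.exp(-logit))
                mask_row.append(1 if prob > threshold else 0)
            mask.append(mask_row)
        masks.append(mask)
    return masks

# Two queries over a 2 x 2 pixel map with 2-dimensional embeddings.
queries = [[1.0, 0.0], [0.0, 1.0]]
pixel_embeddings = [
    [[2.0, -2.0], [-2.0, 2.0]],
    [[2.0, -2.0], [2.0, 2.0]],
]
masks = predict_masks(queries, pixel_embeddings)
```

Each query selects the pixels whose embeddings align with it, so the output has shape N x H x W, one mask per query.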

arXiv.org e-Print Archive