56 research outputs found
Fast Preprocessing for Robust Face Sketch Synthesis
Exemplar-based face sketch synthesis methods commonly face the problem that input photos are captured under lighting conditions different from those of the training photos. The critical step that causes the failure is the search for similar patch candidates for an input photo patch. Conventional illumination-invariant patch distances are adopted instead of relying directly on pixel intensity differences, but they fail when the local contrast within a patch changes. In this paper, we propose a fast preprocessing method named Bidirectional Luminance Remapping (BLR), which interactively adjusts the lighting of training and input photos. Our method can be directly integrated into state-of-the-art exemplar-based methods to improve their robustness at negligible computational cost.
Comment: IJCAI 2017. Project page:
http://www.cs.cityu.edu.hk/~yibisong/ijcai17_sketch/index.htm
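The bidirectional lighting adjustment described above can be pictured with a much simpler stand-in: matching luminance statistics in both directions before patch search. The sketch below is not the paper's BLR algorithm; it only illustrates the kind of preprocessing involved, and the mean/std matching in `remap_luminance` is an illustrative assumption.

```python
# Minimal sketch of luminance remapping between an input photo and a training
# photo. NOT the paper's BLR algorithm; it stands in for the general idea of
# adjusting luminance in both directions so that patch matching is less
# sensitive to global lighting differences.
import numpy as np
import cv2  # used only for color-space conversion


def remap_luminance(src, ref):
    """Match the mean/std of src's luminance channel to ref's (src, ref: uint8 BGR)."""
    src_lab = cv2.cvtColor(src, cv2.COLOR_BGR2LAB).astype(np.float32)
    ref_lab = cv2.cvtColor(ref, cv2.COLOR_BGR2LAB).astype(np.float32)
    l_src, l_ref = src_lab[..., 0], ref_lab[..., 0]
    l_new = (l_src - l_src.mean()) / (l_src.std() + 1e-6) * l_ref.std() + l_ref.mean()
    src_lab[..., 0] = np.clip(l_new, 0, 255)
    return cv2.cvtColor(src_lab.astype(np.uint8), cv2.COLOR_LAB2BGR)


def bidirectional_remap(input_photo, training_photo):
    # Remap in both directions so the two photos meet at compatible
    # luminance statistics before searching for similar patches.
    adjusted_input = remap_luminance(input_photo, training_photo)
    adjusted_training = remap_luminance(training_photo, input_photo)
    return adjusted_input, adjusted_training
```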
MHP-VOS: Multiple Hypotheses Propagation for Video Object Segmentation
We address the problem of semi-supervised video object segmentation (VOS),
where the masks of objects of interest are given in the first frame of an
input video. To deal with challenging cases where objects are occluded or
missing, previous work relies on greedy data association strategies that make
decisions for each frame individually. In this paper, we propose a novel
approach to defer the decision making for a target object in each frame, until
a global view can be established with the entire video being taken into
consideration. Our approach is in the same spirit as Multiple Hypotheses
Tracking (MHT) methods, making several critical adaptations for the VOS
problem. We employ the bounding box (bbox) hypothesis for tracking tree
formation, and the multiple hypotheses are spawned by propagating the preceding
bbox into the detected bbox proposals within a gated region starting from the
initial object mask in the first frame. The gated region is determined by a
gating scheme which takes into account a more comprehensive motion model rather
than the simple Kalman filtering model in traditional MHT. To further design
more customized algorithms tailored for VOS, we develop a novel mask
propagation score instead of the appearance similarity score that could be
brittle due to large deformations. The mask propagation score, together with
the motion score, determines the affinity between the hypotheses during tree
pruning. Finally, a novel mask merging strategy is employed to handle mask
conflicts between objects. Extensive experiments on challenging datasets
demonstrate the effectiveness of the proposed method, especially in cases where objects go missing.
Comment: Accepted to CVPR 2019 as an oral presentation.
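As a rough illustration of the gating idea above, the sketch below spawns hypotheses only from detection proposals that fall inside a gated region around a motion prediction. It substitutes a constant-velocity prediction and an IoU gate for the paper's more comprehensive motion model; the function names and the threshold are illustrative assumptions.

```python
# Illustrative gating sketch: proposals spawn new hypotheses only if they fall
# inside a gated region around a predicted box. The paper uses a richer motion
# model; a constant-velocity prediction plus an IoU gate stands in for it here.
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def iou(a: Box, b: Box) -> float:
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    return inter / (area(a) + area(b) - inter + 1e-6)


def predict_next(prev: Box, velocity: Tuple[float, float]) -> Box:
    dx, dy = velocity
    return (prev[0] + dx, prev[1] + dy, prev[2] + dx, prev[3] + dy)


def spawn_hypotheses(prev_box: Box, velocity, proposals: List[Box], gate: float = 0.3) -> List[Box]:
    """Keep only proposals inside the gated region around the predicted box."""
    predicted = predict_next(prev_box, velocity)
    return [p for p in proposals if iou(predicted, p) >= gate]
```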
Learning to Hallucinate Face Images via Component Generation and Enhancement
We propose a two-stage method for face hallucination. First, we generate
facial components of the input image using CNNs. These components represent the
basic facial structures. Second, we synthesize fine-grained facial structures
from high-resolution training images. The details of these structures are
transferred into facial components for enhancement. Therefore, we generate
facial components to approximate ground truth global appearance in the first
stage and enhance them through recovering details in the second stage. The
experiments demonstrate that our method performs favorably against state-of-the-art methods.
Comment: IJCAI 2017. Project page:
http://www.cs.cityu.edu.hk/~yibisong/ijcai17_sr/index.htm
Stylizing Face Images via Multiple Exemplars
We address the problem of transferring the style of a headshot photo to face
images. Existing methods using a single exemplar lead to inaccurate results
when the exemplar does not contain sufficient stylized facial components for a
given photo. In this work, we propose an algorithm to stylize face images using
multiple exemplars containing different subjects in the same style. Patch
correspondences between an input photo and multiple exemplars are established
using a Markov Random Field (MRF), which enables accurate local energy transfer
via Laplacian stacks. As image patches from multiple exemplars are used, the
boundaries of facial components on the target image are inevitably
inconsistent. The artifacts are removed by a post-processing step using an
edge-preserving filter. Experimental results show that the proposed algorithm
consistently produces visually pleasing results.
Comment: In CVIU 2017. Project page:
http://www.cs.cityu.edu.hk/~yibisong/cviu17/index.htm
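The local energy transfer mentioned above operates on a multi-scale decomposition. A minimal sketch of a Laplacian stack (band-pass levels without downsampling, unlike a pyramid) is given below; the number of levels and the Gaussian scales are illustrative assumptions rather than the paper's settings.

```python
# Minimal sketch of a Laplacian stack for a color image of shape (H, W, 3):
# each level is a band-pass difference of Gaussians, and the levels plus the
# residual sum back to the original image.
import numpy as np
from scipy.ndimage import gaussian_filter


def laplacian_stack(image: np.ndarray, num_levels: int = 4):
    """Decompose an image into band-pass levels plus a low-frequency residual."""
    levels, previous = [], image.astype(np.float32)
    for i in range(num_levels):
        sigma = 2.0 ** (i + 1)                                # illustrative scales
        blurred = gaussian_filter(previous, sigma=(sigma, sigma, 0))
        levels.append(previous - blurred)                     # detail at this scale
        previous = blurred
    levels.append(previous)                                   # residual
    return levels


def reconstruct(levels):
    return np.sum(levels, axis=0)  # the stack sums back to the input image
```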
Self-supervised Spatio-temporal Representation Learning for Videos by Predicting Motion and Appearance Statistics
We address the problem of video representation learning without
human-annotated labels. While previous efforts address the problem by designing
novel self-supervised tasks using video data, the learned features are merely frame-level and are not applicable to many video analysis tasks where spatio-temporal features prevail. In this paper, we propose a
novel self-supervised approach to learn spatio-temporal features for video
representation. Inspired by the success of two-stream approaches in video
classification, we propose to learn visual features by regressing both motion
and appearance statistics along spatial and temporal dimensions, given only the
input video data. Specifically, we extract statistical concepts (fast-motion
region and the corresponding dominant direction, spatio-temporal color
diversity, dominant color, etc.) from simple patterns in both spatial and
temporal domains. Unlike prior puzzle-based tasks that are hard even for humans to solve, the proposed approach is consistent with humans' inherent visual habits and
therefore easy to answer. We conduct extensive experiments with C3D to validate
the effectiveness of our proposed approach. The experiments show that our
approach can significantly improve the performance of C3D when applied to video
classification tasks. Code is available at
https://github.com/laura-wang/video_repres_mas.
Comment: CVPR 2019
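To make the statistical labels above concrete, the sketch below derives a largest-motion block and its dominant direction from raw frames, using Farneback optical flow as a stand-in for the paper's motion statistics; the grid layout and angle quantization are assumptions made for illustration only.

```python
# Sketch of deriving motion-statistics labels (largest-motion block and its
# dominant direction) from raw frames. A stand-in for the paper's statistics,
# not its actual label construction.
import numpy as np
import cv2


def motion_statistics(frames, grid=4, num_bins=8):
    """frames: list of uint8 grayscale frames of identical size."""
    h, w = frames[0].shape
    accum = np.zeros((h, w, 2), np.float32)
    for prev, nxt in zip(frames[:-1], frames[1:]):
        flow = cv2.calcOpticalFlowFarneback(prev, nxt, None,
                                            0.5, 3, 15, 3, 5, 1.2, 0)
        accum += flow

    # Block with the largest average motion magnitude.
    magnitude = np.linalg.norm(accum, axis=-1)
    bh, bw = h // grid, w // grid
    block_means = magnitude[:grid * bh, :grid * bw].reshape(grid, bh, grid, bw).mean(axis=(1, 3))
    block_idx = int(block_means.argmax())

    # Dominant flow direction of that block, quantized into angle bins.
    by, bx = divmod(block_idx, grid)
    block_flow = accum[by * bh:(by + 1) * bh, bx * bw:(bx + 1) * bw]
    angles = np.arctan2(block_flow[..., 1], block_flow[..., 0])   # [-pi, pi]
    hist, _ = np.histogram(angles, bins=num_bins, range=(-np.pi, np.pi))
    return block_idx, int(hist.argmax())   # regression/classification targets
```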
FFHQ-UV: Normalized Facial UV-Texture Dataset for 3D Face Reconstruction
We present a large-scale facial UV-texture dataset that contains over 50,000
high-quality texture UV-maps with even illuminations, neutral expressions, and
cleaned facial regions, which are desired characteristics for rendering
realistic 3D face models under different lighting conditions. The dataset is
derived from the large-scale face image dataset FFHQ, with the help of our
fully automatic and robust UV-texture production pipeline. Our pipeline
utilizes the recent advances in StyleGAN-based facial image editing approaches
to generate multi-view normalized face images from single-image inputs. An
elaborated UV-texture extraction, correction, and completion procedure is then
applied to produce high-quality UV-maps from the normalized face images.
Compared with existing UV-texture datasets, our dataset has more diverse and
higher-quality texture maps. We further train a GAN-based texture decoder as
the nonlinear texture basis for parametric fitting based 3D face
reconstruction. Experiments show that our method improves the reconstruction
accuracy over state-of-the-art approaches, and more importantly, produces
high-quality texture maps that are ready for realistic renderings. The dataset,
code, and pre-trained texture decoder are publicly available at
https://github.com/csbhr/FFHQ-UV.
Comment: The dataset, code, and pre-trained texture decoder are publicly available at https://github.com/csbhr/FFHQ-UV
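The idea of a GAN-based texture decoder serving as a nonlinear texture basis can be sketched as optimizing the decoder's latent code instead of linear basis coefficients. The snippet below fits the latent against a target UV-map with a plain L1 objective; `decoder` is a placeholder module, and the paper's actual fitting runs through a full differentiable rendering pipeline rather than this simplified loss.

```python
# Sketch of parametric fitting with a nonlinear texture basis: the fitting
# variable is the decoder's latent code, optimized so the decoded UV-map
# matches an objective. `decoder` is a placeholder for a pre-trained model.
import torch


def fit_texture_latent(decoder, target_uv, latent_dim=512, steps=200, lr=0.05):
    z = torch.zeros(1, latent_dim, requires_grad=True)
    optimizer = torch.optim.Adam([z], lr=lr)
    for _ in range(steps):
        optimizer.zero_grad()
        uv = decoder(z)                                    # (1, 3, H, W) decoded texture
        loss = torch.nn.functional.l1_loss(uv, target_uv)  # simplified objective
        loss.backward()
        optimizer.step()
    return z.detach(), decoder(z).detach()
```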
MVF-Net: Multi-View 3D Face Morphable Model Regression
We address the problem of recovering the 3D geometry of a human face from a
set of facial images in multiple views. While recent studies have shown
impressive progress in 3D Morphable Model (3DMM) based facial reconstruction,
the settings are mostly restricted to a single view. There is an inherent
drawback in the single-view setting: the lack of reliable 3D constraints can
cause unresolvable ambiguities. In this paper, we explore 3DMM-based shape recovery in a different setting, where a set of multi-view facial images is
given as input. A novel approach is proposed to regress 3DMM parameters from
multi-view inputs with an end-to-end trainable Convolutional Neural Network
(CNN). Multi-view geometric constraints are incorporated into the network by
establishing dense correspondences between different views leveraging a novel
self-supervised view alignment loss. The main ingredient of the view alignment
loss is a differentiable dense optical flow estimator that can backpropagate
the alignment errors between an input view and a synthetic rendering from
another input view, which is projected to the target view through the 3D shape
to be inferred. Through minimizing the view alignment loss, better 3D shapes
can be recovered such that the synthetic projections from one view to another
can better align with the observed image. Extensive experiments demonstrate the
superiority of the proposed method over other 3DMM methods.
Comment: 2019 Conference on Computer Vision and Pattern Recognition
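A hedged sketch of the loss plumbing behind such a view alignment term follows: given a dense flow field mapping target-view pixels to the synthetic rendering projected from another view, warp the rendering and penalize the photometric difference. How the flow and the rendering are produced is specific to the paper; the code below only shows the warping and the L1 term, with the flow channel order assumed to be (dx, dy) in pixels.

```python
# Sketch of a view-alignment photometric term: warp a rendering from another
# view into the target view using a dense flow field, then take an L1 loss.
import torch
import torch.nn.functional as F


def view_alignment_loss(target_image, rendered_from_other_view, flow):
    """target_image, rendered: (B, 3, H, W); flow: (B, 2, H, W) in pixels, (dx, dy)."""
    b, _, h, w = target_image.shape
    # Identity pixel grid plus flow gives the sampling coordinates.
    ys, xs = torch.meshgrid(torch.arange(h), torch.arange(w), indexing="ij")
    grid = torch.stack((xs, ys), dim=0).float().to(flow.device)   # (2, H, W)
    coords = grid.unsqueeze(0) + flow                             # (B, 2, H, W)
    # Normalize to [-1, 1] as required by grid_sample.
    coords_x = 2.0 * coords[:, 0] / (w - 1) - 1.0
    coords_y = 2.0 * coords[:, 1] / (h - 1) - 1.0
    sample_grid = torch.stack((coords_x, coords_y), dim=-1)       # (B, H, W, 2)
    warped = F.grid_sample(rendered_from_other_view, sample_grid,
                           align_corners=True)
    return F.l1_loss(warped, target_image)
```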
Neural Point-based Volumetric Avatar: Surface-guided Neural Points for Efficient and Photorealistic Volumetric Head Avatar
Rendering photorealistic and dynamically moving human heads is crucial for
ensuring a pleasant and immersive experience in AR/VR and video conferencing
applications. However, existing methods often struggle to model challenging
facial regions (e.g., mouth interior, eyes, hair/beard), resulting in
unrealistic and blurry results. In this paper, we propose Neural Point-based Volumetric Avatar (NPVA), a method that adopts the neural point representation as well as the
neural volume rendering process and discards the predefined connectivity and
hard correspondence imposed by mesh-based approaches. Specifically, the neural
points are strategically constrained around the surface of the target
expression via a high-resolution UV displacement map, achieving increased
modeling capacity and more accurate control. We introduce three technical
innovations to improve the rendering and training efficiency: a patch-wise
depth-guided (shading point) sampling strategy, a lightweight radiance decoding
process, and a Grid-Error-Patch (GEP) ray sampling strategy during training. By
design, our NPVA is better equipped to handle topologically changing regions
and thin structures while also ensuring accurate expression control when
animating avatars. Experiments conducted on three subjects from the Multiface
dataset demonstrate the effectiveness of our designs, outperforming previous
state-of-the-art methods, especially in handling challenging facial regions
- …