271 research outputs found
Geometry-Aware Face Completion and Editing
Face completion is a challenging generation task because it requires
generating visually pleasing new pixels that are semantically consistent with
the unmasked face region. This paper proposes a geometry-aware Face Completion
and Editing NETwork (FCENet) by systematically studying facial geometry from
the unmasked region. Firstly, a facial geometry estimator is learned to
estimate facial landmark heatmaps and parsing maps from the unmasked face
image. Then, an encoder-decoder structure generator serves to complete a face
image and disentangle its mask areas conditioned on both the masked face image
and the estimated facial geometry images. Besides, since low-rank property
exists in manually labeled masks, a low-rank regularization term is imposed on
the disentangled masks, enforcing our completion network to manage occlusion
area with various shape and size. Furthermore, our network can generate diverse
results from the same masked input by modifying estimated facial geometry,
which provides a flexible mean to edit the completed face appearance. Extensive
experimental results qualitatively and quantitatively demonstrate that our
network is able to generate visually pleasing face completion results and edit
face attributes as well
A survey of face recognition techniques under occlusion
The limited capacity to recognize faces under occlusions is a long-standing
problem that presents a unique challenge for face recognition systems and even
for humans. The problem regarding occlusion is less covered by research when
compared to other challenges such as pose variation, different expressions,
etc. Nevertheless, occluded face recognition is imperative to exploit the full
potential of face recognition for real-world applications. In this paper, we
restrict the scope to occluded face recognition. First, we explore what the
occlusion problem is and what inherent difficulties can arise. As a part of
this review, we introduce face detection under occlusion, a preliminary step in
face recognition. Second, we present how existing face recognition methods cope
with the occlusion problem and classify them into three categories, which are
1) occlusion robust feature extraction approaches, 2) occlusion aware face
recognition approaches, and 3) occlusion recovery based face recognition
approaches. Furthermore, we analyze the motivations, innovations, pros and
cons, and the performance of representative approaches for comparison. Finally,
future challenges and method trends of occluded face recognition are thoroughly
discussed
Cross domain Image Transformation and Generation by Deep Learning
Compared with single domain learning, cross-domain learning is more challenging due to the large domain variation. In addition, cross-domain image synthesis is more difficult than other cross learning problems, including, for example, correlation analysis, indexing, and retrieval, because it needs to learn complex function which contains image details for photo-realism. This work investigates cross-domain image synthesis in two common and challenging tasks, i.e., image-to-image and non-image-to-image transfer/synthesis.The image-to-image transfer is investigated in Chapter 2, where we develop a method for transformation between face images and sketch images while preserving the identity. Different from existing works that conduct domain transfer in a one-pass manner, we design a recurrent bidirectional transformation network (r-BTN), which allows bidirectional domain transfer in an integrated framework. More importantly, it could perceptually compose partial inputs from two domains to simultaneously synthesize face and sketch images with consistent identity. Most existing works could well synthesize images from patches that cover at least 70% of the original image. The proposed r-BTN could yield appealing results from patches that cover less than 10% because of the recursive estimation of the missing region in an incremental manner. Extensive experiments have been conducted to demonstrate the superior performance of r-BTN as compared to existing solutions.Chapter 3 targets at image transformation/synthesis from non-image sources, i.e., generating talking face based on the audio input. Existing works either do not consider temporal dependency thus yielding abrupt facial/lip movement or are limited to the generation for a specific person thus lacking generalization capacity. A novel conditional recurrent generation network which incorporates image and audio features in the recurrent unit for temporal dependency is proposed such that smooth transition can be achieved for lip and facial movements. To achieve image- and video-realism, we adopt a pair of spatial-temporal discriminators. Accurate lip synchronization is essential to the success of talking face video generation where we construct a lip-reading discriminator to boost the accuracy of lip synchronization. Extensive experiments demonstrate the superiority of our framework over the state-of-the-arts in terms of visual quality, lip sync accuracy, and smooth transition regarding lip and facial movement
- …