825 research outputs found
Text-Guided Neural Image Inpainting
Image inpainting task requires filling the corrupted image with contents
coherent with the context. This research field has achieved promising progress
by using neural image inpainting methods. Nevertheless, there is still a
critical challenge in guessing the missed content with only the context pixels.
The goal of this paper is to fill the semantic information in corrupted images
according to the provided descriptive text. Unique from existing text-guided
image generation works, the inpainting models are required to compare the
semantic content of the given text and the remaining part of the image, then
find out the semantic content that should be filled for missing part. To
fulfill such a task, we propose a novel inpainting model named Text-Guided Dual
Attention Inpainting Network (TDANet). Firstly, a dual multimodal attention
mechanism is designed to extract the explicit semantic information about the
corrupted regions, which is done by comparing the descriptive text and
complementary image areas through reciprocal attention. Secondly, an image-text
matching loss is applied to maximize the semantic similarity of the generated
image and the text. Experiments are conducted on two open datasets. Results
show that the proposed TDANet model reaches new state-of-the-art on both
quantitative and qualitative measures. Result analysis suggests that the
generated images are consistent with the guidance text, enabling the generation
of various results by providing different descriptions. Codes are available at
https://github.com/idealwhite/TDANetComment: ACM MM'2020 (Oral). 9 pages, 4 tables, 7 figure
Geometry-Aware Face Completion and Editing
Face completion is a challenging generation task because it requires
generating visually pleasing new pixels that are semantically consistent with
the unmasked face region. This paper proposes a geometry-aware Face Completion
and Editing NETwork (FCENet) by systematically studying facial geometry from
the unmasked region. Firstly, a facial geometry estimator is learned to
estimate facial landmark heatmaps and parsing maps from the unmasked face
image. Then, an encoder-decoder structure generator serves to complete a face
image and disentangle its mask areas conditioned on both the masked face image
and the estimated facial geometry images. Besides, since low-rank property
exists in manually labeled masks, a low-rank regularization term is imposed on
the disentangled masks, enforcing our completion network to manage occlusion
area with various shape and size. Furthermore, our network can generate diverse
results from the same masked input by modifying estimated facial geometry,
which provides a flexible mean to edit the completed face appearance. Extensive
experimental results qualitatively and quantitatively demonstrate that our
network is able to generate visually pleasing face completion results and edit
face attributes as well
- …