3 research outputs found
Deep Generative Model for Image Inpainting with Local Binary Pattern Learning and Spatial Attention
Deep learning (DL) has demonstrated its powerful capabilities in the field of
image inpainting. The DL-based image inpainting approaches can produce visually
plausible results, but often generate various unpleasant artifacts, especially
in the boundary and highly textured regions. To tackle this challenge, in this
work, we propose a new end-to-end, two-stage (coarse-to-fine) generative model
that combines a local binary pattern (LBP) learning network with an actual
inpainting network. Specifically, the first network, an LBP learning network
with a U-Net architecture, is designed to accurately predict the structural information of
the missing region, which subsequently guides the second image inpainting
network in filling the missing pixels. Furthermore, an improved spatial
attention mechanism is integrated into the image inpainting network; it
considers consistency not only between the known region and the
generated one, but also within the generated region itself. Extensive
experiments on public datasets including CelebA-HQ, Places and Paris StreetView
demonstrate that our model generates better inpainting results than the
state-of-the-art competing algorithms, both quantitatively and qualitatively.
The source code and trained models will be made available at
https://github.com/HighwayWu/ImageInpainting
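To make the structural cue in the abstract above concrete, here is a minimal sketch of the classic 3x3 local binary pattern operator in plain Python. The abstract does not say which LBP variant the authors use, so the 8-neighbour layout, the clockwise bit order, and the names `lbp_code` and `lbp_map` are assumptions for illustration only. The sketch shows how an LBP map is computed from known pixels; it is not the learning network that predicts LBP for the missing region.

```python
def lbp_code(img, r, c):
    """8-neighbour LBP code of the pixel at (r, c); img is a list of rows."""
    center = img[r][c]
    # Neighbours visited clockwise, starting at the top-left pixel (assumed order).
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    code = 0
    for bit, (dr, dc) in enumerate(offsets):
        # A neighbour at least as bright as the centre sets its bit.
        if img[r + dr][c + dc] >= center:
            code |= 1 << bit
    return code


def lbp_map(img):
    """LBP codes for all interior pixels; border pixels are skipped."""
    h, w = len(img), len(img[0])
    return [[lbp_code(img, r, c) for c in range(1, w - 1)]
            for r in range(1, h - 1)]


# Toy grayscale patch with a single interior pixel (value 6).
patch = [
    [5, 9, 1],
    [4, 6, 7],
    [2, 6, 3],
]
codes = lbp_map(patch)  # one code, for the centre pixel
```

Because each code depends only on whether neighbours are brighter or darker than the centre, the map captures local structure such as edges and texture rather than absolute intensity.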
Grounded and Controllable Image Completion by Incorporating Lexical Semantics
In this paper, we present an approach, namely Lexical Semantic Image
Completion (LSIC), that may have potential applications in art, design, and
heritage conservation, among several others. Existing image completion
procedures are highly subjective because they consider only visual context, which may
produce unpredictable results that are plausible but not faithful to
grounded knowledge. To permit a completion process that is both grounded and
controllable, we advocate generating results faithful to both the visual and the lexical
semantic context, i.e., a description of the holes or blank regions in
the image (a hole description). One major challenge for LSIC comes from
modeling and aligning the structure of visual-semantic context and translating
across different modalities. We term this process structure completion,
which is realized by multi-grained reasoning blocks in our model. Another
challenge relates to unimodal bias, which occurs when the model generates
plausible results without using the textual description. This can happen because
the annotated captions for an image are often semantically equivalent in
existing datasets, and thus there is only one paired text for a masked image in
training. We devise an unsupervised unpaired-creation learning path alongside the
over-explored paired-reconstruction path, as well as a multi-stage training
strategy to mitigate the insufficiency of labeled data. We conduct extensive
quantitative and qualitative experiments as well as ablation studies, which
reveal the efficacy of our proposed LSIC.
Comment: 9 pages, 9 figures
Virtual Codec Supervised Re-Sampling Network for Image Compression
In this paper, we propose an image re-sampling compression method that learns
a virtual codec network (VCN) to resolve the non-differentiability of the
quantization function in image compression. Here, image re-sampling refers
not only to full-resolution re-sampling but also to low-resolution
re-sampling. We generalize this method to the standard-compliant image compression
(SCIC) framework and the deep neural network based compression (DNNC) framework.
Specifically, an input image is measured by a re-sampling network (RSN)
to get re-sampled vectors. Then, these vectors are directly quantized in the
feature space in SCIC, or discrete cosine transform coefficients of these
vectors are quantized to further improve coding efficiency in DNNC. At the
encoder, the quantized vectors or coefficients are losslessly compressed by
arithmetic coding. At the receiver, the decoded vectors are used by an image
decoder network (IDN) to restore the input image. To train the RSN and the
IDN together in an end-to-end fashion, our VCN imitates the
projection from the re-sampled vectors to the IDN-decoded image. As a result,
gradients from the IDN to the RSN can be approximated by the VCN's
gradient. Because dimension reduction can be further achieved by quantization
in some dimensional space after image re-sampling within an auto-encoder
architecture, we can initialize our networks well from pre-trained auto-encoder
networks. Extensive experiments and analysis verify that the
proposed method is more effective and versatile than many
state-of-the-art approaches.
Comment: 13 pages, 11 figures. Our project can be found at the website:
https://github.com/VirtualCodecNetwor
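The non-differentiability that motivates the virtual codec in the abstract above can be illustrated with a small numerical sketch. It assumes a uniform scalar quantizer, and it uses a plain identity function as a stand-in for a differentiable path. The paper's VCN is a learned network, not an identity map, so `virtual_path` and the other names here are hypothetical and serve only to show why a smooth surrogate yields a usable gradient where the quantizer does not.

```python
def quantize(x, step=1.0):
    """Uniform scalar quantization: snap x to the nearest multiple of step."""
    return step * round(x / step)


def virtual_path(x):
    """Hypothetical differentiable stand-in (identity); NOT the paper's learned VCN."""
    return x


def central_difference(f, x, eps=1e-4):
    """Numerical derivative of f at x via a symmetric finite difference."""
    return (f(x + eps) - f(x - eps)) / (2.0 * eps)


x = 0.3
# Away from a rounding boundary the quantizer is flat, so its slope is zero
# and no training signal can flow back through it.
grad_quantized = central_difference(quantize, x)
# The smooth stand-in has a non-zero slope that a backward pass could use.
grad_virtual = central_difference(virtual_path, x)
```

The same reasoning applies elementwise to re-sampled vectors or to their transform coefficients: wherever a rounding step sits between two networks, the backward pass needs some differentiable approximation of that step.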