Joint bilateral learning for real-time universal photorealistic style transfer
Photorealistic style transfer is the task of transferring the
artistic style of an image onto a content target, producing a result that
could plausibly have been taken with a camera. Recent approaches based on deep
neural networks produce impressive results but are either too slow to
run at practical resolutions or still contain objectionable artifacts. We
propose a new end-to-end model for photorealistic style transfer that is
both fast and inherently generates photorealistic results. The core of our
approach is a feed-forward neural network that learns local edge-aware
affine transforms that automatically obey the photorealism constraint.
When trained on a diverse set of images and a variety of styles, our
model can robustly apply style transfer to an arbitrary pair of input
images. Compared to the state of the art, our method produces visually
superior results and is three orders of magnitude faster, enabling real-
time performance at 4K on a mobile phone. We validate our method
with ablation and user studies.
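
The local edge-aware affine transforms can be pictured as an HDRNet-style
bilateral grid of 3x4 color matrices that is sliced with a guidance map and
applied per pixel. The sketch below is a rough illustration of that slicing
step, not the authors' code; the grid/guide shapes and the slice-and-apply
scheme are assumptions.

import torch
import torch.nn.functional as F

def slice_and_apply(grid, guide, image):
    """grid:  (B, 12, D, Hg, Wg) low-res affine coefficients (3x4 per cell)
       guide: (B, 1, H, W) guidance map in [0, 1], e.g. luminance
       image: (B, 3, H, W) full-resolution content image"""
    B, _, H, W = image.shape
    # Build a sampling lattice over (x, y, guide) for trilinear slicing.
    ys, xs = torch.meshgrid(
        torch.linspace(-1, 1, H, device=image.device),
        torch.linspace(-1, 1, W, device=image.device), indexing="ij")
    xs, ys = xs.expand(B, H, W), ys.expand(B, H, W)
    zs = guide.squeeze(1) * 2 - 1                            # guide -> [-1, 1]
    coords = torch.stack([xs, ys, zs], dim=-1).unsqueeze(1)  # (B, 1, H, W, 3)
    # Trilinear slice: one 3x4 affine matrix per output pixel.
    coeffs = F.grid_sample(grid, coords, align_corners=True)
    coeffs = coeffs.squeeze(2).reshape(B, 3, 4, H, W)
    # Apply the per-pixel affine color transform (with a bias term).
    rgb1 = torch.cat([image, torch.ones_like(image[:, :1])], dim=1)
    return torch.einsum("bijhw,bjhw->bihw", coeffs, rgb1)

Because each output pixel is an affine function of the input colors, sliced
from a low-resolution grid, the result stays edge-aware and photorealistic by
construction.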
Semantic Photo Manipulation with a Generative Image Prior
Despite the recent success of GANs in synthesizing images conditioned on
inputs such as a user sketch, text, or semantic labels, manipulating the
high-level attributes of an existing natural photograph with GANs is
challenging for two reasons. First, it is hard for GANs to precisely reproduce
an input image. Second, after manipulation, the newly synthesized pixels often
do not fit the original image. In this paper, we address these issues by
adapting the image prior learned by GANs to image statistics of an individual
image. Our method can accurately reconstruct the input image and synthesize new
content, consistent with the appearance of the input image. We demonstrate our
interactive system on several semantic image editing tasks, including
synthesizing new objects consistent with background, removing unwanted objects,
and changing the appearance of an object. Quantitative and qualitative
comparisons against several existing methods demonstrate the effectiveness of
our method.
Comment: SIGGRAPH 2019
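
The adaptation of the GAN prior to a single photograph can be sketched as a
two-stage loop: first invert the photo into the latent space, then lightly
fine-tune the generator so it reproduces that exact image. This is only an
illustrative outline, not the authors' implementation; the generator G, its
z_dim attribute, and the perceptual loss percep are assumed placeholders.

import torch
import torch.nn.functional as F

def adapt_gan_to_image(G, x, percep, latent_steps=500, gen_steps=200):
    # Stage 1: fit a latent code z so that G(z) approximates the photo x.
    z = torch.randn(1, G.z_dim, requires_grad=True)       # assumed attribute
    opt_z = torch.optim.Adam([z], lr=0.05)
    for _ in range(latent_steps):
        x_hat = G(z)
        loss = percep(x_hat, x) + F.l1_loss(x_hat, x)
        opt_z.zero_grad()
        loss.backward()
        opt_z.step()
    # Stage 2: briefly adapt the generator weights to this one image, so newly
    # synthesized content stays consistent with its appearance.
    opt_g = torch.optim.Adam(G.parameters(), lr=1e-4)
    for _ in range(gen_steps):
        x_hat = G(z)
        loss = percep(x_hat, x) + F.l1_loss(x_hat, x)
        opt_g.zero_grad()
        loss.backward()
        opt_g.step()
    return G, z.detach()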
Anycost GANs for Interactive Image Synthesis and Editing
Generative adversarial networks (GANs) have enabled photorealistic image
synthesis and editing. However, due to the high computational cost of
large-scale generators (e.g., StyleGAN2), it usually takes seconds to see the
results of a single edit on edge devices, prohibiting interactive user
experience. In this paper, we take inspiration from modern rendering software
and propose Anycost GAN for interactive natural image editing. We train the
Anycost GAN to support elastic resolutions and channels for faster image
generation at versatile speeds. Running subsets of the full generator produces
outputs that are perceptually similar to those of the full model, making them a good
proxy for preview. By using sampling-based multi-resolution training,
adaptive-channel training, and a generator-conditioned discriminator, the
anycost generator can be evaluated at various configurations while achieving
better image quality compared to separately trained models. Furthermore, we
develop new encoder training and latent code optimization techniques to
encourage consistency between the different sub-generators during image
projection. Anycost GAN can be executed at various cost budgets (up to 10x
computation reduction) and adapt to a wide range of hardware and latency
requirements. When deployed on desktop CPUs and edge devices, our model can
provide perceptually similar previews at 6-12x speedup, enabling interactive
image editing. The code and demo are publicly available at
https://github.com/mit-han-lab/anycost-gan.
Comment: Accepted to CVPR 2021.
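
One way to picture "running subsets of the full generator" is a convolution
whose output channels can be truncated at inference time, so the same weights
serve both a cheap preview and the full-quality pass. The sketch below is an
illustrative PyTorch layer, not the released mit-han-lab code.

import torch
import torch.nn as nn
import torch.nn.functional as F

class ElasticConv2d(nn.Conv2d):
    """A conv layer that can run with only the leading fraction of its
    output channels (default stride/padding and groups=1 assumed)."""
    def forward(self, x, width_ratio=1.0):
        out_ch = max(1, int(self.out_channels * width_ratio))
        in_ch = x.shape[1]                    # accept an already-sliced input
        w = self.weight[:out_ch, :in_ch]
        b = self.bias[:out_ch] if self.bias is not None else None
        return F.conv2d(x, w, b, self.stride, self.padding)

conv = ElasticConv2d(64, 128, kernel_size=3, padding=1)
x = torch.randn(1, 64, 32, 32)
preview = conv(x, width_ratio=0.25)   # 32 output channels, a cheaper preview
full = conv(x, width_ratio=1.0)       # all 128 channels, full quality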
IP-Adapter: Text Compatible Image Prompt Adapter for Text-to-Image Diffusion Models
Recent years have witnessed the strong power of large text-to-image diffusion
models and their impressive ability to create high-fidelity images. However,
generating the desired image with only a text prompt is tricky, as it often
involves complex prompt engineering. An alternative to the text prompt is the
image prompt; as the saying goes, "an image is worth a thousand words".
Although existing methods that directly fine-tune pretrained models are
effective, they require large computing resources and are not compatible with
other base models, text prompts, or structural controls. In this paper, we
present IP-Adapter, an effective and lightweight adapter to achieve image
prompt capability for pretrained text-to-image diffusion models. The key
design of our IP-Adapter is a decoupled cross-attention mechanism that
separates the cross-attention layers for text features and image features.
Despite the simplicity of our method, an IP-Adapter with only 22M parameters
achieves performance comparable to, or even better than, a fully fine-tuned
image prompt model.
As we freeze the pretrained diffusion model, the proposed IP-Adapter can be
generalized not only to other custom models fine-tuned from the same base
model, but also to controllable generation using existing controllable tools.
With the benefit of the decoupled cross-attention strategy, the image prompt
can also work well with the text prompt to achieve multimodal image generation.
The project page is available at https://ip-adapter.github.io.
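
The decoupled cross-attention idea, separate key/value projections for text
and image-prompt features whose attention outputs are summed, can be sketched
as follows. The single-head formulation and dimension names are simplifying
assumptions, not the paper's exact implementation.

import torch
import torch.nn as nn

class DecoupledCrossAttention(nn.Module):
    def __init__(self, dim, ctx_dim):
        super().__init__()
        self.to_q = nn.Linear(dim, dim, bias=False)
        # Projections of the original (frozen) text cross-attention branch.
        self.to_k_text = nn.Linear(ctx_dim, dim, bias=False)
        self.to_v_text = nn.Linear(ctx_dim, dim, bias=False)
        # New, trainable projections for the image-prompt branch.
        self.to_k_img = nn.Linear(ctx_dim, dim, bias=False)
        self.to_v_img = nn.Linear(ctx_dim, dim, bias=False)
        self.scale = dim ** -0.5

    def attend(self, q, k, v):
        attn = torch.softmax(q @ k.transpose(-2, -1) * self.scale, dim=-1)
        return attn @ v

    def forward(self, x, text_ctx, image_ctx, image_scale=1.0):
        q = self.to_q(x)
        out_text = self.attend(q, self.to_k_text(text_ctx), self.to_v_text(text_ctx))
        out_img = self.attend(q, self.to_k_img(image_ctx), self.to_v_img(image_ctx))
        # The two attention streams are computed separately and then summed.
        return out_text + image_scale * out_img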