Photo Editing with Face Selection and Replacement
This disclosure describes techniques that enable users to edit photos to include selected faces or facial expressions. A user selects a photo from a burst or other collection of photos. Detected faces in the selected photo are highlighted in a user interface that lets the user pick a face to modify. In response, a set of candidate faces suitable to replace the selected face is presented in the interface. With user permission, the candidate faces can be obtained and/or adapted from other accessible photos, such as other frames of the burst. Selecting a candidate face seamlessly replaces the chosen face in the displayed photo. The described interface allows users to quickly and easily replace undesired facial expressions in photos with preferred ones.
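The disclosure leaves the detection and compositing components unspecified, but the flow can be sketched with off-the-shelf pieces. Below is a minimal Python sketch assuming OpenCV's stock Haar cascade as the face detector and Poisson blending (cv2.seamlessClone) as the seamless-compositing step; the function names and box arguments are hypothetical, not from the disclosure.

```python
import cv2
import numpy as np

def detect_faces(img):
    """Return (x, y, w, h) boxes for faces found in a BGR image."""
    cascade = cv2.CascadeClassifier(
        cv2.data.haarcascades + "haarcascade_frontalface_default.xml")
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    return cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)

def replace_face(photo, candidate_photo, target_box, candidate_box):
    """Paste the candidate face over the selected face box and blend it in."""
    x, y, w, h = target_box
    cx, cy, cw, ch = candidate_box
    patch = cv2.resize(candidate_photo[cy:cy + ch, cx:cx + cw], (w, h))
    mask = 255 * np.ones(patch.shape[:2], dtype=np.uint8)
    center = (x + w // 2, y + h // 2)
    # Poisson blending hides the seam between the pasted face and the photo.
    return cv2.seamlessClone(patch, photo, mask, center, cv2.NORMAL_CLONE)
```

In the described UI, detect_faces would run on the displayed photo to highlight selectable faces and on the other burst frames to produce the candidate set, with replace_face applied to the user's choice.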
Prompt-to-Prompt Image Editing with Cross Attention Control
Recent large-scale text-driven synthesis models have attracted much attention thanks to their remarkable capability of generating highly diverse images that follow given text prompts. Such text-based synthesis methods are particularly appealing to humans, who are accustomed to describing their intent verbally. It is therefore natural to extend text-driven image synthesis to text-driven image editing. Editing is challenging for these generative models, since an innate property of an editing technique is to preserve most of the original image, while in text-based models even a small modification of the prompt often leads to a completely different outcome. State-of-the-art methods mitigate this by requiring users to provide a spatial mask to localize the edit, but this ignores the original structure and content within the masked region. In this paper, we pursue an intuitive prompt-to-prompt editing framework, where the edits are controlled by text only. To this end, we analyze a text-conditioned model in depth and observe that the cross-attention layers are the key to controlling the relation between the spatial layout of the image and each word in the prompt. With this observation, we present several applications that control the image synthesis by editing the textual prompt only. These include localized editing by replacing a word, global editing by adding a specification, and even delicate control over the extent to which a word is reflected in the image. We present results over diverse images and prompts, demonstrating high-quality synthesis and fidelity to the edited prompts.
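The core mechanism, injecting cached cross-attention maps, can be sketched with toy NumPy tensors rather than an actual diffusion model. In the sketch below, maps cached from the source-prompt pass are re-injected during the edited-prompt pass, so the spatial layout is preserved while the edited tokens' values carry the semantic change; all shapes and names are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(Q, K, V, probs_override=None):
    """Scaled dot-product cross-attention between image queries Q and
    prompt-token keys/values K, V; optionally inject precomputed maps."""
    probs = softmax(Q @ K.T / np.sqrt(Q.shape[-1]))
    if probs_override is not None:
        probs = probs_override  # reuse the source prompt's spatial layout
    return probs @ V, probs

rng = np.random.default_rng(0)
Q = rng.normal(size=(64, 32))                                      # 64 spatial locations
K_src, V_src = rng.normal(size=(8, 32)), rng.normal(size=(8, 32))  # 8 prompt tokens
# Pass 1: synthesize with the source prompt, caching its attention maps.
_, maps_src = cross_attention(Q, K_src, V_src)
# Pass 2: run the edited prompt (e.g. one word swapped), but inject the
# cached maps so layout stays fixed while the new values carry the edit.
K_edit, V_edit = rng.normal(size=(8, 32)), rng.normal(size=(8, 32))
out_edit, _ = cross_attention(Q, K_edit, V_edit, probs_override=maps_src)
```

For a word swap the token counts match, so the full map can be reused; the paper's other applications (adding a specification, reweighting a word) modify which maps are injected and how strongly.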
Optimized Image Resizing Using Seam Carving and Scaling
We present a novel method for content-aware image resizing based on optimization of a well-defined image distance function, which preserves both the important regions and the global visual effect (the background or other decorative objects) of an image. The method operates by joint use of seam carving and image scaling. The principle behind our method is the use of a bidirectional similarity function based on image Euclidean distance (IMED), cooperating with a dominant color descriptor (DCD) similarity and seam energy variation. The function is suitable for quantitative evaluation of the resizing result and for determining the best number of seams to carve. Different from previous simplex-mode approaches, our method takes advantage of both discrete and continuous methods. The technique is useful in image resizing for both reduction/retargeting and enlarging. We also show that this approach can be extended to indirect image resizing.
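The seam-carving half of the method is the classic dynamic program sketched below. The gradient-magnitude energy is a simple stand-in: the paper's actual criterion combines IMED, DCD similarity, and seam energy variation, and the joint scaling step is not shown.

```python
import numpy as np

def remove_vertical_seam(img):
    """Remove one minimum-energy vertical seam from a 2-D grayscale image."""
    h, w = img.shape
    gy, gx = np.gradient(img.astype(float))
    energy = np.abs(gx) + np.abs(gy)          # simple stand-in energy
    # Dynamic programming: cheapest seam cost ending at each pixel.
    cost = energy.copy()
    for i in range(1, h):
        left = np.r_[np.inf, cost[i - 1, :-1]]
        right = np.r_[cost[i - 1, 1:], np.inf]
        cost[i] += np.minimum(np.minimum(left, cost[i - 1]), right)
    # Backtrack the optimal seam from the bottom row upward.
    seam = np.empty(h, dtype=int)
    seam[-1] = int(np.argmin(cost[-1]))
    for i in range(h - 2, -1, -1):
        j = seam[i + 1]
        lo = max(j - 1, 0)
        seam[i] = lo + int(np.argmin(cost[i, lo:min(j + 2, w)]))
    keep = np.ones((h, w), dtype=bool)
    keep[np.arange(h), seam] = False
    return img[keep].reshape(h, w - 1)
```

The paper's contribution is deciding how many such seams to remove before switching to uniform scaling, using its image-distance function to score each candidate split.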
HyperDreamBooth: HyperNetworks for Fast Personalization of Text-to-Image Models
Personalization has emerged as a prominent aspect within the field of
generative AI, enabling the synthesis of individuals in diverse contexts and
styles, while retaining high-fidelity to their identities. However, the process
of personalization presents inherent challenges in terms of time and memory
requirements. Fine-tuning each personalized model needs considerable GPU time
investment, and storing a personalized model per subject can be demanding in
terms of storage capacity. To overcome these challenges, we propose
HyperDreamBooth, a hypernetwork capable of efficiently generating a small set of
personalized weights from a single image of a person. By composing these
weights into the diffusion model, coupled with fast finetuning, HyperDreamBooth
can generate a person's face in various contexts and styles, with high subject
details while also preserving the model's crucial knowledge of diverse styles
and semantic modifications. Our method achieves personalization on faces in
roughly 20 seconds, 25x faster than DreamBooth and 125x faster than Textual
Inversion, using as few as one reference image, with the same quality and style
diversity as DreamBooth. Our method also yields a model that is 10000x smaller
than a normal DreamBooth model. Project page: https://hyperdreambooth.github.io
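The central idea, a network that maps one face image to a small set of weight residuals composed into the frozen model, can be illustrated with a toy NumPy sketch. The real hypernetwork architecture, the relaxed low-rank parameterization, and the fast fine-tuning stage are not reproduced; all sizes and names below are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
D, RANK, EMB = 64, 4, 128   # layer width, residual rank, embedding size (toy)

W = rng.normal(size=(D, D)) * 0.02               # frozen base weight of one layer
H = rng.normal(size=(EMB, 2 * D * RANK)) * 0.02  # the (toy) hypernetwork

def personalized_weight(face_embedding):
    """Predict low-rank factors A, B from a face embedding and compose
    the residual A @ B into the frozen base weight."""
    packed = face_embedding @ H
    A = packed[:D * RANK].reshape(D, RANK)
    B = packed[D * RANK:].reshape(RANK, D)
    return W + A @ B

z = rng.normal(size=EMB)             # embedding of one reference photo
W_personal = personalized_weight(z)  # ready to swap into the base model
```

Because only the small factors are predicted and stored per subject, the per-person footprint stays tiny relative to saving a full fine-tuned model, which is the source of the reported storage savings.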
Megastereo: Constructing High-Resolution Stereo Panoramas
We present a solution for generating high-quality stereo panoramas at megapixel resolutions. While previous approaches introduced the basic principles, we show that those techniques do not generalise well to today's high image resolutions and lead to disturbing visual artefacts. As our first contribution, we describe the necessary correction steps and a compact representation for the input images in order to achieve a highly accurate approximation to the required ray space. Our second contribution is a flow-based upsampling of the available input rays which effectively resolves known aliasing issues like stitching artefacts. The required rays are generated on the fly to perfectly match the desired output resolution, even for small numbers of input images. In addition, the upsampling is real-time and enables direct interactive control over the desired stereoscopic depth effect. In combination, our contributions allow the generation of stereoscopic panoramas at high output resolutions that are virtually free of artefacts such as seams, stereo discontinuities, vertical parallax and other mono-/stereoscopic shape distortions. Our process is robust, and other types of multi-perspective panoramas, such as linear panoramas, can also benefit from our contributions. We show various comparisons and high-resolution results.
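The flow-based upsampling can be approximated as flow-guided view interpolation between two neighbouring input images: missing rays are synthesized by warping both neighbours toward the target position and blending. The NumPy sketch below uses nearest-neighbour sampling and naive linear blending for brevity; the function name and arguments are assumptions, not the paper's implementation.

```python
import numpy as np

def interpolate_rays(img_a, img_b, flow_ab, t):
    """Synthesize a view at fractional position t in [0, 1] between two
    neighbouring input images, guided by horizontal flow from a to b."""
    h, w = img_a.shape
    xs = np.tile(np.arange(w, dtype=float), (h, 1))
    # Sample a slightly "behind" and b slightly "ahead" of the target view.
    xa = np.clip(xs - t * flow_ab, 0, w - 1).round().astype(int)
    xb = np.clip(xs + (1 - t) * flow_ab, 0, w - 1).round().astype(int)
    rows = np.arange(h)[:, None]
    return (1 - t) * img_a[rows, xa] + t * img_b[rows, xb]
```

Generating these in-between rays at whatever density the output panorama requires is what suppresses the stitching and aliasing artefacts that plague direct mosaicking from sparse inputs.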
RealFill: Reference-Driven Generation for Authentic Image Completion
Recent advances in generative imagery have brought forth outpainting and
inpainting models that can produce high-quality, plausible image content in
unknown regions, but the content these models hallucinate is necessarily
inauthentic, since the models lack sufficient context about the true scene. In
this work, we propose RealFill, a novel generative approach for image
completion that fills in missing regions of an image with the content that
should have been there. RealFill is a generative inpainting model that is
personalized using only a few reference images of a scene. These reference
images do not have to be aligned with the target image, and can be taken with
drastically varying viewpoints, lighting conditions, camera apertures, or image
styles. Once personalized, RealFill is able to complete a target image with
visually compelling contents that are faithful to the original scene. We
evaluate RealFill on a new image completion benchmark that covers a set of
diverse and challenging scenarios, and find that it outperforms existing
approaches by a large margin. See more results on our project page:
https://realfill.github.io
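The personalization objective as described, training the model to fill randomly masked holes in the reference images, can be sketched as follows. The `model` callable is a hypothetical placeholder for the diffusion inpainting network that RealFill actually personalizes; the rectangular masking and pixel-space loss are simplifications for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

def random_mask(h, w, max_frac=0.5):
    """Random rectangular hole: 1 = visible, 0 = to be completed."""
    mh = rng.integers(1, int(h * max_frac))
    mw = rng.integers(1, int(w * max_frac))
    y, x = rng.integers(0, h - mh), rng.integers(0, w - mw)
    mask = np.ones((h, w))
    mask[y:y + mh, x:x + mw] = 0.0
    return mask

def personalization_loss(model, reference):
    """One training step's loss: mask a reference image at random and
    score the model's fill against the ground truth, inside the hole only."""
    mask = random_mask(*reference.shape)
    pred = model(reference * mask, mask)
    return np.mean(((pred - reference) * (1 - mask)) ** 2)

# Dummy stand-in "model" that simply returns the visible pixels.
dummy = lambda masked, mask: masked
print(personalization_loss(dummy, rng.random((32, 32))))
```

Once the model has been adapted to the scene this way, completing the target image is ordinary inpainting, but the fills now draw on scene content learned from the references rather than hallucinated priors.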