11,893 research outputs found
Semantic Photo Manipulation with a Generative Image Prior
Despite the recent success of GANs in synthesizing images conditioned on
inputs such as a user sketch, text, or semantic labels, manipulating the
high-level attributes of an existing natural photograph with GANs is
challenging for two reasons. First, it is hard for GANs to precisely reproduce
an input image. Second, after manipulation, the newly synthesized pixels often
do not fit the original image. In this paper, we address these issues by
adapting the image prior learned by GANs to image statistics of an individual
image. Our method can accurately reconstruct the input image and synthesize new
content, consistent with the appearance of the input image. We demonstrate our
interactive system on several semantic image editing tasks, including
synthesizing new objects consistent with background, removing unwanted objects,
and changing the appearance of an object. Quantitative and qualitative
comparisons against several existing methods demonstrate the effectiveness of
our method.Comment: SIGGRAPH 201
Calipso: Physics-based Image and Video Editing through CAD Model Proxies
We present Calipso, an interactive method for editing images and videos in a
physically-coherent manner. Our main idea is to realize physics-based
manipulations by running a full physics simulation on proxy geometries given by
non-rigidly aligned CAD models. Running these simulations allows us to apply
new, unseen forces to move or deform selected objects, change physical
parameters such as mass or elasticity, or even add entire new objects that
interact with the rest of the underlying scene. In Calipso, the user makes
edits directly in 3D; these edits are processed by the simulation and then
transfered to the target 2D content using shape-to-image correspondences in a
photo-realistic rendering process. To align the CAD models, we introduce an
efficient CAD-to-image alignment procedure that jointly minimizes for rigid and
non-rigid alignment while preserving the high-level structure of the input
shape. Moreover, the user can choose to exploit image flow to estimate scene
motion, producing coherent physical behavior with ambient dynamics. We
demonstrate Calipso's physics-based editing on a wide range of examples
producing myriad physical behavior while preserving geometric and visual
consistency.Comment: 11 page
Hierarchy Composition GAN for High-fidelity Image Synthesis
Despite the rapid progress of generative adversarial networks (GANs) in image
synthesis in recent years, the existing image synthesis approaches work in
either geometry domain or appearance domain alone which often introduces
various synthesis artifacts. This paper presents an innovative Hierarchical
Composition GAN (HIC-GAN) that incorporates image synthesis in geometry and
appearance domains into an end-to-end trainable network and achieves superior
synthesis realism in both domains simultaneously. We design an innovative
hierarchical composition mechanism that is capable of learning realistic
composition geometry and handling occlusions while multiple foreground objects
are involved in image composition. In addition, we introduce a novel attention
mask mechanism that guides to adapt the appearance of foreground objects which
also helps to provide better training reference for learning in geometry
domain. Extensive experiments on scene text image synthesis, portrait editing
and indoor rendering tasks show that the proposed HIC-GAN achieves superior
synthesis performance qualitatively and quantitatively.Comment: 11 pages, 8 figure
FocalDreamer: Text-driven 3D Editing via Focal-fusion Assembly
While text-3D editing has made significant strides in leveraging score
distillation sampling, emerging approaches still fall short in delivering
separable, precise and consistent outcomes that are vital to content creation.
In response, we introduce FocalDreamer, a framework that merges base shape with
editable parts according to text prompts for fine-grained editing within
desired regions. Specifically, equipped with geometry union and dual-path
rendering, FocalDreamer assembles independent 3D parts into a complete object,
tailored for convenient instance reuse and part-wise control. We propose
geometric focal loss and style consistency regularization, which encourage
focal fusion and congruent overall appearance. Furthermore, FocalDreamer
generates high-fidelity geometry and PBR textures which are compatible with
widely-used graphics engines. Extensive experiments have highlighted the
superior editing capabilities of FocalDreamer in both quantitative and
qualitative evaluations.Comment: Project website: https://focaldreamer.github.i
Freeform User Interfaces for Graphical Computing
報告番号: 甲15222 ; 学位授与年月日: 2000-03-29 ; 学位の種別: 課程博士 ; 学位の種類: 博士(工学) ; 学位記番号: 博工第4717号 ; 研究科・専攻: 工学系研究科情報工学専
Blending-NeRF: Text-Driven Localized Editing in Neural Radiance Fields
Text-driven localized editing of 3D objects is particularly difficult as
locally mixing the original 3D object with the intended new object and style
effects without distorting the object's form is not a straightforward process.
To address this issue, we propose a novel NeRF-based model, Blending-NeRF,
which consists of two NeRF networks: pretrained NeRF and editable NeRF.
Additionally, we introduce new blending operations that allow Blending-NeRF to
properly edit target regions which are localized by text. By using a pretrained
vision-language aligned model, CLIP, we guide Blending-NeRF to add new objects
with varying colors and densities, modify textures, and remove parts of the
original object. Our extensive experiments demonstrate that Blending-NeRF
produces naturally and locally edited 3D objects from various text prompts. Our
project page is available at https://seokhunchoi.github.io/Blending-NeRF/Comment: Accepted to ICCV 2023. The first two authors contributed equally to
this wor
Structural matching by discrete relaxation
This paper describes a Bayesian framework for performing relational graph matching by discrete relaxation. Our basic aim is to draw on this framework to provide a comparative evaluation of a number of contrasting approaches to relational matching. Broadly speaking there are two main aspects to this study. Firstly we locus on the issue of how relational inexactness may be quantified. We illustrate that several popular relational distance measures can be recovered as specific limiting cases of the Bayesian consistency measure. The second aspect of our comparison concerns the way in which structural inexactness is controlled. We investigate three different realizations ai the matching process which draw on contrasting control models. The main conclusion of our study is that the active process of graph-editing outperforms the alternatives in terms of its ability to effectively control a large population of contaminating clutter
TextureGAN: Controlling Deep Image Synthesis with Texture Patches
In this paper, we investigate deep image synthesis guided by sketch, color,
and texture. Previous image synthesis methods can be controlled by sketch and
color strokes but we are the first to examine texture control. We allow a user
to place a texture patch on a sketch at arbitrary locations and scales to
control the desired output texture. Our generative network learns to synthesize
objects consistent with these texture suggestions. To achieve this, we develop
a local texture loss in addition to adversarial and content loss to train the
generative network. We conduct experiments using sketches generated from real
images and textures sampled from a separate texture database and results show
that our proposed algorithm is able to generate plausible images that are
faithful to user controls. Ablation studies show that our proposed pipeline can
generate more realistic images than adapting existing methods directly.Comment: CVPR 2018 spotligh
Static scene illumination estimation from video with applications
We present a system that automatically recovers scene geometry and illumination from a video, providing a basis for various applications. Previous image based illumination estimation methods require either user interaction or external information in the form of a database. We adopt structure-from-motion and multi-view stereo for initial scene reconstruction, and then estimate an environment map represented by spherical harmonics (as these perform better than other bases). We also demonstrate several video editing applications that exploit the recovered geometry and illumination, including object insertion (e.g., for augmented reality), shadow detection, and video relighting
- …