33,644 research outputs found
A system for image-based modeling and photo editing
Thesis (Ph.D.)--Massachusetts Institute of Technology, Dept. of Architecture, 2002.Includes bibliographical references (p. 169-178).Traditionally in computer graphics, a scene is represented by geometric primitives composed of various materials and a collection of lights. Recently, techniques for modeling and rendering scenes from a set of pre-acquired images have emerged as an alternative approach, known as image-based modeling and rendering. Much of the research in this field has focused on reconstructing and rerendering from a set of photographs, while little work has been done to address the problem of editing and modifying these scenes. On the other hand, photo-editing systems, such as Adobe Photoshop, provide a powerful, intuitive, and practical means to edit images. However, these systems are limited by their two-dimensional nature. In this thesis, we present a system that extends photo editing to 3D. Starting from a single input image, the system enables the user to reconstruct a 3D representation of the captured scene, and edit it with the ease and versatility of 2D photo editing. The scene is represented as layers of images with depth, where each layer is an image that encodes both color and depth. A suite of user-assisted tools are employed, based on a painting metaphor, to extract layers and assign depths. The system enables editing from different viewpoints, extracting and grouping of image-based objects, and modifying the shape, color, and illumination of these objects. As part of the system, we introduce three powerful new editing tools. These include two new clone brushing tools: the non-distorted clone brush and the structure-preserving clone brush. They permit copying of parts of an image to another via a brush interface, but alleviate distortions due to perspective foreshortening and object geometry.(cont.) The non-distorted clone brush works on arbitrary 3D geometry, while the structure-preserving clone brush, a 2D version, assumes a planar surface, but has the added advantage of working directly in 2D photo-editing systems that lack depth information. The third tool, a texture-illuminance decoupling filter, discounts the effect of illumination on uniformly textured areas by decoupling large- and small-scale features via bilateral filtering. This tool is crucial for relighting and changing the materials of the scene. There are many applications for such a system, for example architectural, lighting and landscape design, entertainment and special effects, games, and virtual TV sets. The system allows the user to superimpose scaled architectural models into real environments, or to quickly paint a desired lighting scheme of an interior, while being able to navigate within the scene for a fully immersive 3D experience. We present examples and results of complex architectural scenes, 360-degree panoramas, and even paintings, where the user can change viewpoints, edit the geometry and materials, and relight the environment.by Byong Mok Oh.Ph.D
SINE: Semantic-driven Image-based NeRF Editing with Prior-guided Editing Field
Despite the great success in 2D editing using user-friendly tools, such as
Photoshop, semantic strokes, or even text prompts, similar capabilities in 3D
areas are still limited, either relying on 3D modeling skills or allowing
editing within only a few categories. In this paper, we present a novel
semantic-driven NeRF editing approach, which enables users to edit a neural
radiance field with a single image, and faithfully delivers edited novel views
with high fidelity and multi-view consistency. To achieve this goal, we propose
a prior-guided editing field to encode fine-grained geometric and texture
editing in 3D space, and develop a series of techniques to aid the editing
process, including cyclic constraints with a proxy mesh to facilitate geometric
supervision, a color compositing mechanism to stabilize semantic-driven texture
editing, and a feature-cluster-based regularization to preserve the irrelevant
content unchanged. Extensive experiments and editing examples on both
real-world and synthetic data demonstrate that our method achieves
photo-realistic 3D editing using only a single edited image, pushing the bound
of semantic-driven editing in 3D real-world scenes. Our project webpage:
https://zju3dv.github.io/sine/.Comment: Accepted to CVPR 2023. Project Page: https://zju3dv.github.io/sine
Calipso: Physics-based Image and Video Editing through CAD Model Proxies
We present Calipso, an interactive method for editing images and videos in a
physically-coherent manner. Our main idea is to realize physics-based
manipulations by running a full physics simulation on proxy geometries given by
non-rigidly aligned CAD models. Running these simulations allows us to apply
new, unseen forces to move or deform selected objects, change physical
parameters such as mass or elasticity, or even add entire new objects that
interact with the rest of the underlying scene. In Calipso, the user makes
edits directly in 3D; these edits are processed by the simulation and then
transfered to the target 2D content using shape-to-image correspondences in a
photo-realistic rendering process. To align the CAD models, we introduce an
efficient CAD-to-image alignment procedure that jointly minimizes for rigid and
non-rigid alignment while preserving the high-level structure of the input
shape. Moreover, the user can choose to exploit image flow to estimate scene
motion, producing coherent physical behavior with ambient dynamics. We
demonstrate Calipso's physics-based editing on a wide range of examples
producing myriad physical behavior while preserving geometric and visual
consistency.Comment: 11 page
Creating virtual models from uncalibrated camera views
The reconstruction of photorealistic 3D models from camera views is becoming an ubiquitous element in many applications that simulate physical interaction with the real world. In this paper, we present a low-cost, interactive pipeline aimed at non-expert users, that achieves 3D reconstruction from multiple views acquired with a standard digital camera. 3D models are amenable to access through diverse representation modalities that typically imply trade-offs between level of detail, interaction, and computational costs. Our approach allows users to selectively control the complexity of different surface regions, while requiring only simple 2D image editing operations. An initial reconstruction at coarse resolution is followed by an iterative refining of the surface areas corresponding to the selected regions
HeadOn: Real-time Reenactment of Human Portrait Videos
We propose HeadOn, the first real-time source-to-target reenactment approach
for complete human portrait videos that enables transfer of torso and head
motion, face expression, and eye gaze. Given a short RGB-D video of the target
actor, we automatically construct a personalized geometry proxy that embeds a
parametric head, eye, and kinematic torso model. A novel real-time reenactment
algorithm employs this proxy to photo-realistically map the captured motion
from the source actor to the target actor. On top of the coarse geometric
proxy, we propose a video-based rendering technique that composites the
modified target portrait video via view- and pose-dependent texturing, and
creates photo-realistic imagery of the target actor under novel torso and head
poses, facial expressions, and gaze directions. To this end, we propose a
robust tracking of the face and torso of the source actor. We extensively
evaluate our approach and show significant improvements in enabling much
greater flexibility in creating realistic reenacted output videos.Comment: Video: https://www.youtube.com/watch?v=7Dg49wv2c_g Presented at
Siggraph'1
High-Resolution Image Synthesis and Semantic Manipulation with Conditional GANs
We present a new method for synthesizing high-resolution photo-realistic
images from semantic label maps using conditional generative adversarial
networks (conditional GANs). Conditional GANs have enabled a variety of
applications, but the results are often limited to low-resolution and still far
from realistic. In this work, we generate 2048x1024 visually appealing results
with a novel adversarial loss, as well as new multi-scale generator and
discriminator architectures. Furthermore, we extend our framework to
interactive visual manipulation with two additional features. First, we
incorporate object instance segmentation information, which enables object
manipulations such as removing/adding objects and changing the object category.
Second, we propose a method to generate diverse results given the same input,
allowing users to edit the object appearance interactively. Human opinion
studies demonstrate that our method significantly outperforms existing methods,
advancing both the quality and the resolution of deep image synthesis and
editing.Comment: v2: CVPR camera ready, adding more results for edge-to-photo example
- …