NARRATE: A Normal Assisted Free-View Portrait Stylizer
In this work, we propose NARRATE, a novel pipeline that enables simultaneous editing of portrait lighting and perspective in a photorealistic manner. As a hybrid neural-physical face model, NARRATE leverages the complementary benefits of geometry-aware generative approaches and normal-assisted physical face models. In a nutshell, NARRATE first inverts the input portrait to a coarse geometry and employs neural rendering to generate images that resemble the input and exhibit convincing pose changes. However, the inversion step introduces a mismatch, yielding low-quality images with fewer facial details. We therefore estimate portrait normals to enhance the coarse geometry, creating a high-fidelity physical face model. In particular, we fuse the neural and physical renderings to compensate for the imperfect inversion, resulting in novel-perspective images that are both realistic and view-consistent. In the relighting stage, previous works focus on single-view portrait relighting and ignore consistency between different perspectives, leading to unstable and inconsistent lighting effects under view changes. We extend Total Relighting to fix this problem by unifying its multi-view input normal maps with the physical face model. NARRATE conducts relighting with consistent normal maps, imposing cross-view constraints and exhibiting stable and coherent illumination effects. We experimentally demonstrate that NARRATE achieves more photorealistic and reliable results than prior works. We further bridge NARRATE with animation and style transfer tools, supporting pose change, light change, facial animation, and style transfer, either separately or in combination, all at photographic quality. We showcase vivid free-view facial animations as well as 3D-aware relightable stylization, which help facilitate various AR/VR applications like virtual cinematography, 3D video conferencing, and post-production.
Comment: 14 pages, 13 figures. Video: https://youtu.be/mP4FV3evmy
Video Manipulation Techniques for the Protection of Privacy in Remote Presence Systems
Systems that give control of a mobile robot to a remote user raise privacy
concerns about what the remote user can see and do through the robot. We aim to
preserve some of that privacy by manipulating the video data that the remote
user sees. Through two user studies, we explore the effectiveness of different
video manipulation techniques at providing different types of privacy. We
simultaneously examine task performance in the presence of privacy protection.
In the first study, participants were asked to watch a video captured by a
robot exploring an office environment and to complete a series of observational
tasks under differing video manipulation conditions. Our results show that manipulating the video stream can reduce privacy violations across different privacy types. A second user study demonstrated that these privacy-protecting techniques remain effective without diminishing the task performance of the remote user.
Comment: 14 pages, 8 figures
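The abstract does not enumerate the video manipulation techniques the studies compare, but redaction by blurring is one plausible example. The OpenCV sketch below is purely illustrative; the function name, box format, and video file name are assumptions.

```python
import cv2

def redact_regions(frame, boxes, ksize=51):
    """Blur each (x, y, w, h) region of a BGR frame.

    Gaussian blur is only one candidate manipulation; the studies
    compare several that the abstract does not name.
    """
    out = frame.copy()
    for (x, y, w, h) in boxes:
        roi = out[y:y + h, x:x + w]
        out[y:y + h, x:x + w] = cv2.GaussianBlur(roi, (ksize, ksize), 0)
    return out

# Hypothetical usage on the robot's video stream (placeholder file name).
cap = cv2.VideoCapture("office_tour.mp4")
ok, frame = cap.read()
if ok:
    protected = redact_regions(frame, [(100, 80, 120, 90)])
cap.release()
```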
ClimateNeRF: Physically-based Neural Rendering for Extreme Climate Synthesis
Physical simulations produce excellent predictions of weather effects. Neural
radiance fields produce state-of-the-art scene models. We describe a novel NeRF-editing procedure that fuses physical simulations with NeRF models of scenes, producing realistic movies of physical phenomena in those scenes. Our application, ClimateNeRF, allows people to visualize how climate change outcomes will affect them. ClimateNeRF renders realistic weather effects, including smog, snow, and flood, and the results can be controlled with physically meaningful variables such as water level. Qualitative and quantitative studies show that our simulated results are significantly more realistic than those from state-of-the-art 2D image editing and 3D NeRF stylization.
Comment: project page: https://climatenerf.github.io
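As a rough illustration of how a physically meaningful variable like water level can drive a rendered effect, here is a minimal depth-test composite over a NeRF rendering. This is a stand-in sketch, not ClimateNeRF's method: it ignores the reflections, refraction, and simulation detail the paper models, and every name and shape below is an assumption.

```python
import numpy as np

def composite_flood(scene_rgb, depth, rays_o, rays_d,
                    water_level=0.0, water_rgb=(0.2, 0.35, 0.45)):
    """Overlay a flat water plane at z = water_level on a rendered view.

    scene_rgb: HxWx3 colors from the NeRF renderer.
    depth:     HxW   per-ray termination distances from the renderer.
    rays_o/d:  HxWx3 ray origins and unit directions.
    """
    dz = rays_d[..., 2]
    dz = np.where(np.abs(dz) < 1e-8, 1e-8, dz)        # avoid divide-by-zero
    t_water = (water_level - rays_o[..., 2]) / dz     # distance to the plane
    hit = (t_water > 0) & (t_water < depth)           # water seen before scene
    out = scene_rgb.copy()
    out[hit] = water_rgb
    return out

# Hypothetical usage with stand-in renderer outputs:
H, W = 64, 64
rgb = np.random.rand(H, W, 3)
depth = 2.0 + np.random.rand(H, W)
rays_o = np.zeros((H, W, 3))
rays_d = np.dstack([np.zeros((H, W)), np.zeros((H, W)), -np.ones((H, W))])
flooded = composite_flood(rgb, depth, rays_o, rays_d, water_level=-1.0)
```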
Plausible Shading Decomposition For Layered Photo Retouching
Photographers routinely compose multiple manipulated photos of the same scene (layers) into a single image that is better than any individual photo could be alone. Similarly, 3D artists set up rendering systems to produce layered images that each contain only an individual aspect of the light transport, which are composed into the final result in post-production. Regrettably, both approaches either take considerable time to capture or remain limited to synthetic scenes. In this paper, we propose a system that decomposes a single image into a plausible shading decomposition (PSD) approximating effects such as shadow, diffuse illumination, albedo, and specular shading. This decomposition can then be manipulated in any off-the-shelf image manipulation software and recomposited back. We perform such a decomposition by learning a convolutional neural network trained on synthetic data. We demonstrate the effectiveness of our decomposition on synthetic (i.e., rendered) and real data (i.e., photographs), and use it for common photo manipulations that are nearly impossible to perform otherwise from a single image.
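To make the recomposition step concrete, the sketch below recombines edited layers into an image. The compositing equation (albedo x diffuse x shadow + specular) is a common convention that we assume here; the paper's exact model may differ.

```python
import numpy as np

def recompose(albedo, diffuse, shadow, specular):
    """Recompose edited shading layers into a final image.

    All inputs are HxWx3 arrays in [0, 1]. The multiplicative/additive
    model below is an assumed convention, not the paper's exact equation.
    """
    return np.clip(albedo * diffuse * shadow + specular, 0.0, 1.0)

# Hypothetical edit: brighten the diffuse layer by 20%, keep the rest.
h, w = 128, 128
layers = {k: np.random.rand(h, w, 3)
          for k in ("albedo", "diffuse", "shadow", "specular")}
edited = recompose(layers["albedo"], 1.2 * layers["diffuse"],
                   layers["shadow"], layers["specular"])
```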
ISS++: Image as Stepping Stone for Text-Guided 3D Shape Generation
In this paper, we present a new text-guided 3D shape generation approach
(ISS++) that uses images as a stepping stone to bridge the gap between text and
shape modalities for generating 3D shapes without requiring paired text and 3D
data. The core of our approach is a two-stage feature-space alignment strategy
that leverages a pre-trained single-view reconstruction (SVR) model to map CLIP
features to shapes: to begin with, map the CLIP image feature to the
detail-rich 3D shape space of the SVR model, then map the CLIP text feature to
the 3D shape space through encouraging the CLIP-consistency between rendered
images and the input text. Besides, to extend beyond the generative capability
of the SVR model, we design a text-guided 3D shape stylization module that can
enhance the output shapes with novel structures and textures. Further, we
exploit pre-trained text-to-image diffusion models to enhance the generative
diversity, fidelity, and stylization capability. Our approach is generic,
flexible, and scalable, and it can be easily integrated with various SVR models
to expand the generative space and improve the generative fidelity. Extensive
experimental results demonstrate that our approach outperforms the
state-of-the-art methods in terms of generative quality and consistency with
the input text. Codes and models are released at
https://github.com/liuzhengzhe/ISS-Image-as-Stepping-Stone-for-Text-Guided-3D-Shape-Generation.Comment: Under review of TPAM
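The two-stage alignment can be sketched in a few lines of PyTorch. Everything below is illustrative: the mapper architecture, the 512/256 feature dimensions, and the random tensors standing in for CLIP and SVR features are our own assumptions, not the released code.

```python
import torch
import torch.nn as nn

# Stage 1: learn a mapper from CLIP image embeddings (512-d for ViT-B/32)
# into the latent shape space of a pre-trained SVR model.
mapper = nn.Sequential(nn.Linear(512, 512), nn.ReLU(), nn.Linear(512, 256))

clip_image_feat = torch.randn(8, 512)   # stand-in for CLIP(image)
svr_latent = torch.randn(8, 256)        # stand-in for the SVR encoder output
loss_stage1 = nn.functional.mse_loss(mapper(clip_image_feat), svr_latent)

# Stage 2: because CLIP image and text features share one embedding space,
# the mapper can be fine-tuned on text features by maximizing CLIP
# similarity between renderings of the predicted shape and the prompt.
clip_text_feat = torch.randn(8, 512)    # stand-in for CLIP(text)
rendered_feat = torch.randn(8, 512)     # stand-in for CLIP(render(shape))
cos = nn.functional.cosine_similarity(rendered_feat, clip_text_feat, dim=-1)
loss_stage2 = (1.0 - cos).mean()
```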
Visual Object Networks: Image Generation with Disentangled 3D Representation
Recent progress in deep generative models has led to tremendous breakthroughs
in image generation. However, while existing models can synthesize
photorealistic images, they lack an understanding of our underlying 3D world.
We present a new generative model, Visual Object Networks (VON), that synthesizes natural images of objects with a disentangled 3D representation. Inspired by
classic graphics rendering pipelines, we unravel our image formation process
into three conditionally independent factors---shape, viewpoint, and
texture---and present an end-to-end adversarial learning framework that jointly
models 3D shapes and 2D images. Our model first learns to synthesize 3D shapes
that are indistinguishable from real shapes. It then renders the object's 2.5D
sketches (i.e., silhouette and depth map) from its shape under a sampled
viewpoint. Finally, it learns to add realistic texture to these 2.5D sketches
to generate natural images. VON not only generates images that are more realistic than state-of-the-art 2D image synthesis methods, but also enables many 3D operations, such as changing the viewpoint of a generated image, editing shape and texture, linear interpolation in texture and shape space, and transferring appearance across different objects and viewpoints.
Comment: NeurIPS 2018. Code: https://github.com/junyanz/VON Website: http://von.csail.mit.edu
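The shape-viewpoint-texture factorization can be illustrated with a toy staged pipeline. The untrained networks, the 32^3 voxel resolution, and the crude orthographic first-hit projection below are placeholders of our own; VON's real architectures and differentiable renderer differ.

```python
import torch

# Untrained placeholder networks; VON's real architectures differ.
shape_net = torch.nn.Linear(200, 32 * 32 * 32)      # z_shape -> voxel grid
texture_net = torch.nn.Conv2d(2, 3, 3, padding=1)   # 2.5D sketch -> RGB

z_shape = torch.randn(1, 200)
voxels = torch.sigmoid(shape_net(z_shape)).view(1, 32, 32, 32)

# Crude orthographic projection to a 2.5D sketch (silhouette + depth);
# the real pipeline renders under a *sampled* viewpoint instead.
occ = (voxels > 0.5).float()
silhouette = occ.max(dim=1, keepdim=True).values          # any hit per ray
depth = occ.argmax(dim=1, keepdim=True).float() / 32.0    # first-hit index
sketch = torch.cat([silhouette, depth], dim=1)            # 1x2x32x32

rgb = torch.tanh(texture_net(sketch))  # stage 3: texture the 2.5D sketch
```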
Time-of-Day Neural Style Transfer for Architectural Photographs
Architectural photography is a genre of photography that focuses on capturing
a building or structure in the foreground with dramatic lighting in the
background. Inspired by recent successes in image-to-image translation methods,
we aim to perform style transfer for architectural photographs. However, the
special composition in architectural photography poses great challenges for
style transfer in this type of photograph. Existing neural style transfer methods treat an architectural image as a single entity, which generates mismatched chrominance and destroys geometric features of the original architecture, yielding unrealistic lighting, wrong color rendition, and visual artifacts such as ghosting, appearance distortion, or color mismatching. In
this paper, we specialize a neural style transfer method for architectural
photography. Our method addresses the composition of the foreground and background in an architectural photograph with a two-branch neural network that handles the style transfer of the foreground and the background separately. Our method comprises a segmentation module, a learning-based
image-to-image translation module, and an image blending optimization module.
We trained our image-to-image translation neural network with a new dataset of
unconstrained outdoor architectural photographs captured at different magic times of day, utilizing additional semantic information for better
chrominance matching and geometry preservation. Our experiments show that our
method can produce photorealistic lighting and color rendition on both the
foreground and background, and outperforms general image-to-image translation
and arbitrary style transfer baselines quantitatively and qualitatively. Our
code and data are available at
https://github.com/hkust-vgd/architectural_style_transfer.
Comment: Updated version with corrected equations. Paper published at the International Conference on Computational Photography (ICCP) 2022. 12 pages of content with 6 pages of supplementary material.
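A minimal sketch of the two-branch idea: segment the photo, run a separate transfer per branch, and blend along the mask. The simple alpha composite below stands in for the paper's blending optimization module, and the lambda "networks" in the usage example are placeholders.

```python
import numpy as np

def two_branch_transfer(image, mask, fg_transfer, bg_transfer):
    """Style-transfer foreground (building) and background (sky)
    separately, then blend along the segmentation mask.

    fg_transfer/bg_transfer stand in for the learned image-to-image
    translation networks; the real method optimizes the blend rather
    than alpha-compositing it.
    """
    m = mask[..., None].astype(np.float32)  # HxW -> HxWx1
    return m * fg_transfer(image) + (1.0 - m) * bg_transfer(image)

# Hypothetical usage with identity-like "networks":
img = np.random.rand(64, 64, 3).astype(np.float32)
seg = np.zeros((64, 64), dtype=np.float32)
seg[32:, :] = 1.0  # pretend the lower half is the building
out = two_branch_transfer(img, seg, lambda x: x * 0.9, lambda x: 1.0 - x)
```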