29 research outputs found
Novel Views of Objects from a Single Image
Taking an image of an object is at its core a lossy process. The rich information about the three-dimensional structure of the world is flattened to an image plane and decisions such as viewpoint and camera parameters are final and not easily revertible. As a consequence, possibilities of changing viewpoint are limited. Given a single image depicting an object, novel-view synthesis is the task of generating new images that render the object from a different viewpoint than the one given. The main difficulty is to synthesize the parts that are disoccluded; disocclusion occurs when parts of an object are hidden by the object itself under a specific viewpoint. In this work, we show how to improve novel-view synthesis by making use of the correlations observed in 3D models and applying them to new image instances. We propose a technique to use the structural information extracted from a 3D model that matches the image object in terms of viewpoint and shape. For the latter part, we propose an efficient 2D-to-3D alignment method that associates precisely the image appearance with the 3D model geometry with minimal user interaction. Our technique is able to simulate plausible viewpoint changes for a variety of object classes within seconds. Additionally, we show that our synthesized images can be used as additional training data that improves the performance of standard object detectors
What Is Around The Camera?
How much does a single image reveal about the environment it was taken in? In
this paper, we investigate how much of that information can be retrieved from a
foreground object, combined with the background (i.e. the visible part of the
environment). Assuming it is not perfectly diffuse, the foreground object acts
as a complexly shaped and far-from-perfect mirror. An additional challenge is
that its appearance confounds the light coming from the environment with the
unknown materials it is made of. We propose a learning-based approach to
predict the environment from multiple reflectance maps that are computed from
approximate surface normals. The proposed method allows us to jointly model the
statistics of environments and material properties. We train our system from
synthesized training data, but demonstrate its applicability to real-world
data. Interestingly, our analysis shows that the information obtained from
objects made out of multiple materials often is complementary and leads to
better performance.Comment: Accepted to ICCV. Project:
http://homes.esat.kuleuven.be/~sgeorgou/multinatillum
Joint Material and Illumination Estimation from Photo Sets in the Wild
Faithful manipulation of shape, material, and illumination in 2D Internet
images would greatly benefit from a reliable factorization of appearance into
material (i.e., diffuse and specular) and illumination (i.e., environment
maps). On the one hand, current methods that produce very high fidelity
results, typically require controlled settings, expensive devices, or
significant manual effort. To the other hand, methods that are automatic and
work on 'in the wild' Internet images, often extract only low-frequency
lighting or diffuse materials. In this work, we propose to make use of a set of
photographs in order to jointly estimate the non-diffuse materials and sharp
lighting in an uncontrolled setting. Our key observation is that seeing
multiple instances of the same material under different illumination (i.e.,
environment), and different materials under the same illumination provide
valuable constraints that can be exploited to yield a high-quality solution
(i.e., specular materials and environment illumination) for all the observed
materials and environments. Similar constraints also arise when observing
multiple materials in a single environment, or a single material across
multiple environments. The core of this approach is an optimization procedure
that uses two neural networks that are trained on synthetic images to predict
good gradients in parametric space given observation of reflected light. We
evaluate our method on a range of synthetic and real examples to generate
high-quality estimates, qualitatively compare our results against
state-of-the-art alternatives via a user study, and demonstrate
photo-consistent image manipulation that is otherwise very challenging to
achieve
Synthesizing Images of Humans in Unseen Poses
We address the computational problem of novel human pose synthesis. Given an
image of a person and a desired pose, we produce a depiction of that person in
that pose, retaining the appearance of both the person and background. We
present a modular generative neural network that synthesizes unseen poses using
training pairs of images and poses taken from human action videos. Our network
separates a scene into different body part and background layers, moves body
parts to new locations and refines their appearances, and composites the new
foreground with a hole-filled background. These subtasks, implemented with
separate modules, are trained jointly using only a single target image as a
supervised label. We use an adversarial discriminator to force our network to
synthesize realistic details conditioned on pose. We demonstrate image
synthesis results on three action classes: golf, yoga/workouts and tennis, and
show that our method produces accurate results within action classes as well as
across action classes. Given a sequence of desired poses, we also produce
coherent videos of actions.Comment: CVPR 201
Deep Appearance Maps
We propose a deep representation of appearance, i. e. the relation of color, surface orientation, viewer position, material and illumination. Previous approaches have used deep learning to extract classic appearance representations relating to reflectance model parameters (e. g. Phong) or illumination (e. g. HDR environment maps). We suggest to directly represent appearance itself as a network we call a deep appearance map (DAM). This is a 4D generalization over 2D reflectance maps, which held the view direction fixed. First, we show how a DAM can be learned from images or video frames and later be used to synthesize appearance, given new surface orientations and viewer positions. Second, we demonstrate how another network can be used to map from an image or video frames to a DAM network to reproduce this appearance, without using a lengthy optimization such as stochastic gradient descent (learning-to-learn). Finally, we generalize this to an appearance estimation-and-segmentation task, where we map from an image showing multiple materials to multiple networks reproducing their appearance, as well as per-pixel segmentation