Modeling Surface Appearance from a Single Photograph using Self-augmented Convolutional Neural Networks
We present a convolutional neural network (CNN) based solution for modeling
physically plausible spatially varying surface reflectance functions (SVBRDF)
from a single photograph of a planar material sample under unknown natural
illumination. Gathering a sufficiently large set of labeled training pairs,
each consisting of a photograph of an SVBRDF sample and its corresponding
reflectance parameters, is a difficult, labor-intensive process. To reduce the amount of
required labeled training data, we propose to leverage the appearance
information embedded in unlabeled images of spatially varying materials to
self-augment the training process. Starting from an initial approximate
network trained on a small set of labeled training pairs, we estimate
provisional model parameters for each unlabeled training exemplar. Given this
provisional reflectance estimate, we then synthesize a novel temporary labeled
training pair by rendering the exact corresponding image under a new lighting
condition. After refining the network using these additional training samples,
we re-estimate the provisional model parameters for the unlabeled data and
repeat the self-augmentation process until convergence. We demonstrate the
efficacy of the proposed network structure on spatially varying wood, metals,
and plastics, as well as thoroughly validate the effectiveness of the
self-augmentation training process.
Comment: Accepted to SIGGRAPH 2017.
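The self-augmentation loop described above reduces to a compact training procedure. Below is a minimal Python/PyTorch sketch of that loop, not the authors' code; `net`, `render`, and `sample_lighting` are hypothetical stand-ins for the SVBRDF regressor, the renderer, and the lighting sampler.

```python
import torch

def self_augment(net, labeled_pairs, unlabeled_images, render,
                 sample_lighting, optimizer, rounds=5):
    """Sketch of the self-augmentation loop (hypothetical API)."""
    for _ in range(rounds):
        # 1. Estimate provisional SVBRDF parameters for each unlabeled photo.
        net.eval()
        with torch.no_grad():
            provisional = [net(img) for img in unlabeled_images]
        # 2. Render each provisional estimate under a newly sampled lighting,
        #    producing a temporary labeled (image, parameters) pair.
        augmented = [(render(p, sample_lighting()), p) for p in provisional]
        # 3. Refine the network on the real and self-augmented pairs.
        net.train()
        for img, params in list(labeled_pairs) + augmented:
            optimizer.zero_grad()
            loss = torch.nn.functional.l1_loss(net(img), params)
            loss.backward()
            optimizer.step()
    return net
```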
Deep Shape from Polarization
This paper makes a first attempt to bring the Shape from Polarization (SfP)
problem to the realm of deep learning. The previous state-of-the-art methods
for SfP have been purely physics-based. We see value in these principled
models, and blend these physical models as priors into a neural network
architecture. This proposed approach achieves results that exceed the previous
state-of-the-art on a challenging dataset we introduce. This dataset consists
of polarization images taken over a range of object textures, paints, and
lighting conditions. We report that our proposed method achieves the lowest
test error on each tested condition in our dataset, showing the value of
blending data-driven and physics-driven approaches.
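As a concrete illustration of the physics side, the standard polarimetric relations below recover the degree and angle of linear polarization from four polarizer-angle captures; one plausible way to blend such priors into a network is to feed them as extra input channels. This is a sketch of the general idea, not the paper's exact architecture.

```python
import torch

def polarization_cues(I0, I45, I90, I135):
    """Stokes-derived cues from four polarizer-angle images
    (standard polarimetric relations; inputs are HxW tensors)."""
    s0 = 0.5 * (I0 + I45 + I90 + I135)              # total intensity
    s1 = I0 - I90
    s2 = I45 - I135
    dolp = torch.sqrt(s1**2 + s2**2) / (s0 + 1e-8)  # degree of polarization
    aolp = 0.5 * torch.atan2(s2, s1)                # angle of polarization
    return dolp, aolp

# A network can then consume the raw captures concatenated with these
# physics-derived channels, e.g.:
# x = torch.stack([I0, I45, I90, I135, dolp, aolp], dim=0).unsqueeze(0)
```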
PhotoShape: Photorealistic Materials for Large-Scale Shape Collections
Existing online 3D shape repositories contain thousands of 3D models but lack
photorealistic appearance. We present an approach to automatically assign
high-quality, realistic appearance models to large-scale 3D shape collections.
The key idea is to jointly leverage three types of online data -- shape
collections, material collections, and photo collections, using the photos as
reference to guide assignment of materials to shapes. By generating a large
number of synthetic renderings, we train a convolutional neural network to
classify materials in real photos, and employ 3D-2D alignment techniques to
transfer materials to different parts of each shape model. Our system produces
photorealistic, relightable 3D shapes (PhotoShapes).
Comment: To be presented at SIGGRAPH Asia 2018. Project page:
https://keunhong.com/publications/photoshape
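A minimal sketch of the material-classification component, assuming a standard torchvision backbone (the paper's actual architecture and training details may differ):

```python
import torch
import torchvision

def material_classifier(num_materials):
    """A plausible material classifier: a standard CNN backbone with its
    head replaced by a per-material output layer."""
    net = torchvision.models.resnet34(weights="IMAGENET1K_V1")
    net.fc = torch.nn.Linear(net.fc.in_features, num_materials)
    return net

# Training iterates over synthetic renderings labeled with the material used
# for each part; at test time, the predicted class for a 3D-2D aligned photo
# region selects the material assigned to that shape part.
```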
Coordinate-based Texture Inpainting for Pose-Guided Image Generation
We present a new deep learning approach to pose-guided resynthesis of human
photographs. At the heart of the new approach is the estimation of the complete
body surface texture based on a single photograph. Since the input photograph
always observes only a part of the surface, we suggest a new inpainting method
that completes the texture of the human body. Rather than working directly with
colors of texture elements, the inpainting network estimates an appropriate
source location in the input image for each element of the body surface. This
correspondence field between the input image and the texture is then further
warped into the target image coordinate frame based on the desired pose,
effectively establishing the correspondence between the source and the target
view even when the pose change is drastic. The final convolutional network then
uses the established correspondence and all other available information to
synthesize the output image. A fully-convolutional architecture with deformable
skip connections guided by the estimated correspondence field is used. We show
state-of-the-art result for pose-guided image synthesis. Additionally, we
demonstrate the performance of our system for garment transfer and pose-guided
face resynthesis.
Comment: Published in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2019.
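The coordinate-based formulation maps naturally onto differentiable bilinear sampling. A minimal sketch, assuming the inpainting network outputs a normalized source-coordinate field:

```python
import torch
import torch.nn.functional as F

def gather_texture(source_image, coord_field):
    """Fill a texture by looking up a source pixel for every texel.
    source_image: (B, 3, H, W); coord_field: (B, Ht, Wt, 2) with x, y
    in [-1, 1], as predicted by the (hypothetical) inpainting network."""
    return F.grid_sample(source_image, coord_field, align_corners=True)
```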
Textured Neural Avatars
We present a system for learning full-body neural avatars, i.e. deep networks
that produce full-body renderings of a person for varying body pose and camera
position. Our system takes the middle path between the classical graphics
pipeline and the recent deep learning approaches that generate images of humans
using image-to-image translation. In particular, our system estimates an
explicit two-dimensional texture map of the model surface. At the same time, it
abstains from explicit shape modeling in 3D. Instead, at test time, the system
uses a fully-convolutional network to directly map the configuration of body
feature points w.r.t. the camera to the 2D texture coordinates of individual
pixels in the image frame. We show that such a system is capable of learning to
generate realistic renderings while being trained on videos annotated with 3D
poses and foreground masks. We also demonstrate that maintaining an explicit
texture representation helps our system to achieve better generalization
compared to systems that use direct image-to-image translation.
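A minimal sketch of the test-time rendering pass, with illustrative names (`uv_net` standing in for the fully-convolutional coordinate predictor); the real system also predicts a foreground mask, omitted here:

```python
import torch
import torch.nn.functional as F

def render_avatar(texture, uv_net, pose_maps):
    """Map pose feature maps to per-pixel texture coordinates, then
    produce the output frame by sampling the learned texture.
    texture: (B, 3, Ht, Wt); pose_maps: (B, K, H, W)."""
    uv = uv_net(pose_maps)            # (B, 2, H, W), values in [-1, 1]
    grid = uv.permute(0, 2, 3, 1)     # grid_sample expects (B, H, W, 2)
    return F.grid_sample(texture, grid, align_corners=True)
```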
BRDF Estimation of Complex Materials with Nested Learning
The estimation of the optical properties of a material from RGB-images is an
important but extremely ill-posed problem in Computer Graphics. While recent
works have successfully approached this problem even from just a single
photograph, significant simplifications of the material model are assumed,
limiting the usability of such methods. The detection of complex material
properties such as anisotropy or Fresnel effect remains an unsolved challenge.
We propose a novel method that predicts the model parameters of an
artist-friendly, physically-based BRDF, from only two low-resolution shots of
the material. Thanks to a novel combination of deep neural networks in a nested
architecture, we are able to handle the ambiguities given by the
non-orthogonality and non-convexity of the parameter space. To train the
network, we generate a novel dataset of physically-based synthetic images. We
prove that our model can recover new properties like anisotropy, index of
refraction and a second reflectance color, for materials that have tinted
specular reflections or whose albedo changes at glancing angles.
Comment: Accepted to the IEEE Winter Conference on Applications of Computer Vision 2019 (WACV 2019).
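One way to picture the nested arrangement: an outer predictor whose output conditions an inner predictor of the more entangled parameters. The parameter split, sizes, and encoder below are invented for illustration and are not the paper's architecture:

```python
import torch
import torch.nn as nn

class NestedBRDFNet(nn.Module):
    """Toy nesting: an outer head predicts 'easy' parameters (e.g. diffuse
    color), which are fed back, together with the image features, into an
    inner head for the harder, entangled ones (anisotropy, index of
    refraction, a second specular color)."""
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(          # stand-in image encoder
            nn.Conv2d(6, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten())
        self.outer = nn.Linear(64, 3)           # e.g. diffuse albedo
        self.inner = nn.Linear(64 + 3, 5)       # conditioned on outer output

    def forward(self, two_shots):               # (B, 6, H, W): two RGB shots
        f = self.features(two_shots)
        easy = torch.sigmoid(self.outer(f))
        hard = self.inner(torch.cat([f, easy], dim=1))
        return easy, hard
```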
Inverse Transport Networks
We introduce inverse transport networks as a learning architecture for
inverse rendering problems where, given input image measurements, we seek to
infer physical scene parameters such as shape, material, and illumination.
During training, these networks are evaluated not only on how closely they
predict ground-truth parameters, but also on whether the parameters they
produce can be used, together with physically-accurate graphics renderers,
to reproduce the input image measurements. To enable training of
inverse transport networks using stochastic gradient descent, we additionally
create a general-purpose, physically-accurate differentiable renderer, which
can be used to estimate derivatives of images with respect to arbitrary
physical scene parameters. Our experiments demonstrate that inverse transport
networks can be trained efficiently using differentiable rendering, and that
they generalize to scenes with completely unseen geometry and illumination
better than networks trained without appearance-matching regularization.
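The training objective implied above combines a parameter term with an appearance-matching term routed through the differentiable renderer. A minimal sketch, with `diff_render` as a stand-in for the paper's physically-accurate differentiable renderer:

```python
import torch

def itn_loss(pred_params, gt_params, input_image, diff_render, lam=1.0):
    """Two-term objective: parameter regression plus appearance matching.
    `diff_render` must be differentiable w.r.t. the scene parameters so
    gradients flow back through the re-rendered image."""
    param_loss = torch.nn.functional.mse_loss(pred_params, gt_params)
    rerendered = diff_render(pred_params)
    appearance_loss = torch.nn.functional.l1_loss(rerendered, input_image)
    return param_loss + lam * appearance_loss
```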
Materials for Masses: SVBRDF Acquisition with a Single Mobile Phone Image
We propose a material acquisition approach to recover the spatially-varying
BRDF and normal map of a near-planar surface from a single image captured by a
handheld mobile phone camera. Our method images the surface under arbitrary
environment lighting with the flash turned on, thereby avoiding shadows while
simultaneously capturing high-frequency specular highlights. We train a CNN to
regress an SVBRDF and surface normals from this image. Our network is trained
using a large-scale SVBRDF dataset and designed to incorporate physical
insights for material estimation, including an in-network rendering layer to
model appearance and a material classifier to provide additional supervision
during training. We refine the results from the network using a dense CRF
module whose terms are designed specifically for our task. The framework is
trained end-to-end and produces high quality results for a variety of
materials. We provide extensive ablation studies to evaluate our network on
both synthetic and real data, while demonstrating significant improvements in
comparisons with prior works.
Comment: Submitted to the European Conference on Computer Vision.
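To make the in-network rendering layer concrete, here is a deliberately simplified stand-in for a collocated flash over a near-planar sample, using Blinn-Phong shading rather than the paper's actual model:

```python
import torch

def flash_render_layer(diffuse, normals, specular, roughness):
    """Simplified differentiable render under a collocated flash: light and
    view both point along +z, so the halfway vector is (0, 0, 1).
    diffuse/normals/specular: (B, 3, H, W); roughness: (B, 1, H, W)."""
    n = torch.nn.functional.normalize(normals, dim=1)
    n_dot_l = n[:, 2:3].clamp(min=0.0)                 # cosine with (0, 0, 1)
    shininess = (1.0 - roughness).clamp(min=1e-3) * 128.0
    spec = specular * n_dot_l.pow(shininess)
    return diffuse * n_dot_l + spec
```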
cvpaper.challenge in 2016: Futuristic Computer Vision through 1,600 Papers Survey
This paper presents the futuristic challenges discussed in the
cvpaper.challenge. In 2015 and 2016, we thoroughly studied 1,600+ papers
from several conferences and journals, including CVPR, ICCV, ECCV, NIPS,
PAMI, and IJCV.
Flexible SVBRDF Capture with a Multi-Image Deep Network
Empowered by deep learning, recent methods for material capture can estimate
a spatially-varying reflectance from a single photograph. Such lightweight
capture is in stark contrast with the tens or hundreds of pictures required by
traditional optimization-based approaches. However, a single image is often
simply not enough to observe the rich appearance of real-world materials. We
present a deep-learning method capable of estimating material appearance from a
variable number of uncalibrated and unordered pictures captured with a handheld
camera and flash. Thanks to an order-independent fusing layer, this
architecture extracts the most useful information from each picture, while
benefiting from strong priors learned from data. The method can handle both
view and light direction variation without calibration. We show how our method
improves its prediction with the number of input pictures, and reaches high
quality reconstructions with as few as 1 to 10 images -- a sweet spot
between existing single-image and complex multi-image approaches.
Comment: Accepted to EGSR 2019 in the CGF track.
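Order-independent fusion of a variable number of inputs is typically realized with a symmetric pooling operation. A minimal sketch along those lines (encoder/decoder details omitted; names illustrative):

```python
import torch
import torch.nn as nn

class OrderIndependentFusion(nn.Module):
    """Encode each input photo independently, then max-pool the feature
    maps across the image axis so the result is invariant to both the
    number of pictures and their order."""
    def __init__(self, encoder, decoder):
        super().__init__()
        self.encoder, self.decoder = encoder, decoder

    def forward(self, images):                      # (B, N, 3, H, W), N varies
        b, n = images.shape[:2]
        feats = self.encoder(images.flatten(0, 1))  # (B*N, C, h, w)
        feats = feats.view(b, n, *feats.shape[1:])
        fused = feats.max(dim=1).values             # pool over the N photos
        return self.decoder(fused)                  # SVBRDF maps
```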