Photometric Depth Super-Resolution
This study explores the use of photometric techniques (shape-from-shading and
uncalibrated photometric stereo) for upsampling the low-resolution depth map
from an RGB-D sensor to the higher resolution of the companion RGB image. A
single-shot variational approach is first put forward, which is effective as
long as the target's reflectance is piecewise-constant. It is then shown that
this dependency upon a specific reflectance model can be relaxed by focusing on
a specific class of objects (e.g., faces) and delegating reflectance estimation
to a deep neural network. A multi-shot strategy based on randomly varying
lighting conditions is finally discussed. It requires no training or prior
on the reflectance, yet this comes at the price of a dedicated acquisition
setup. Both quantitative and qualitative evaluations illustrate the
effectiveness of the proposed methods on synthetic and real-world scenarios.
Comment: IEEE Transactions on Pattern Analysis and Machine Intelligence (T-PAMI), 2019. First three authors contributed equally.
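The single-shot variational idea can be illustrated with a minimal numpy sketch: refine an upsampled depth map so that the Lambertian shading it induces matches the observed image. The orthographic camera, single directional light, and scalar albedo below are simplifying assumptions for illustration, not the paper's full model.

```python
import numpy as np

def normals_from_depth(z):
    """Finite-difference surface normals of an orthographic depth map."""
    zy, zx = np.gradient(z)
    n = np.dstack([-zx, -zy, np.ones_like(z)])
    return n / np.linalg.norm(n, axis=2, keepdims=True)

def shading_energy(z, image, albedo, light):
    """Data term of a shape-from-shading style variational energy:
    squared residual between Lambertian shading and the observed image."""
    n = normals_from_depth(z)
    shading = albedo * np.clip(n @ light, 0.0, None)
    return np.sum((shading - image) ** 2)

# Toy usage: a flat depth map lit from above reproduces a constant image exactly.
z = np.zeros((8, 8))                    # upsampled low-res depth (here: a plane)
light = np.array([0.0, 0.0, 1.0])
img = np.full((8, 8), 0.7)              # observed intensity
print(shading_energy(z, img, albedo=0.7, light=light))  # -> 0.0
```

A real solver would minimise this data term jointly with a smoothness prior over the high-resolution depth.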
DeLight-Net: Decomposing Reflectance Maps into Specular Materials and Natural Illumination
In this paper we are extracting surface reflectance and natural environmental
illumination from a reflectance map, i.e. from a single 2D image of a sphere of
one material under one illumination. This is a notoriously difficult problem,
yet key to various re-rendering applications. With the recent advances in
estimating reflectance maps from 2D images, their further decomposition has
become increasingly relevant.
To this end, we propose a Convolutional Neural Network (CNN) architecture to
reconstruct both material parameters (i.e. Phong) as well as illumination (i.e.
high-resolution spherical illumination maps), that is solely trained on
synthetic data. We demonstrate the decomposition of both synthetic and real
photographs of reflectance maps, in High Dynamic Range (HDR) and, for the
first time, in Low Dynamic Range (LDR). Results are compared to previous
approaches quantitatively as well as qualitatively in terms of re-renderings
where illumination, material, view or shape are changed.
Comment: Stamatios Georgoulis and Konstantinos Rematas contributed equally to this work.
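The forward model the network inverts can be sketched in a few lines: a reflectance map is the shading of a unit sphere of one material under one illumination. The sketch below renders such a map with a Lambertian plus Phong specular term; the single directional light and the particular coefficient values are simplifying assumptions standing in for the paper's environment-map illumination.

```python
import numpy as np

def phong_reflectance_map(light, kd=0.6, ks=0.4, shininess=20, res=64):
    """Render the reflectance map of a sphere: Lambertian + Phong specular
    shading under one directional light, viewed along +z."""
    u = np.linspace(-1, 1, res)
    x, y = np.meshgrid(u, u)
    mask = x**2 + y**2 <= 1.0
    z = np.sqrt(np.clip(1.0 - x**2 - y**2, 0.0, None))
    n = np.dstack([x, y, z])                           # sphere normals
    l = np.asarray(light, float)
    l /= np.linalg.norm(l)
    ndotl = np.clip(n @ l, 0.0, None)
    r = 2 * ndotl[..., None] * n - l                   # reflected light direction
    spec = np.clip(r[..., 2], 0.0, None) ** shininess  # viewer at (0, 0, 1)
    return np.where(mask, kd * ndotl + ks * spec, 0.0)

rmap = phong_reflectance_map(light=[0.0, 0.0, 1.0])
print(rmap.shape, rmap[32, 32] > rmap[32, 2])          # bright highlight at centre
```

The decomposition task is the inverse: recover `kd`, `ks`, `shininess`, and the illumination from `rmap` alone, which is what the proposed CNN is trained to do.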
Deep Reflectance Maps
Undoing the image formation process and therefore decomposing appearance into
its intrinsic properties is a challenging task due to the under-constrained
nature of this inverse problem. While significant progress has been made on
inferring shape, materials and illumination from images only, progress in an
unconstrained setting is still limited. We propose a convolutional neural
architecture to estimate reflectance maps of specular materials in natural
lighting conditions. We achieve this in an end-to-end learning formulation that
directly predicts a reflectance map from the image itself. We show how to
improve estimates by facilitating additional supervision in an indirect scheme
that first predicts surface orientation and afterwards predicts the reflectance
map by a learning-based sparse data interpolation.
In order to analyze performance on this difficult task, we propose a new
challenge of Specular MAterials on SHapes with complex IllumiNation (SMASHINg)
using both synthetic and real images. Furthermore, we show the application of
our method to a range of image-based editing tasks on real images.
Comment: project page: http://homes.esat.kuleuven.be/~krematas/DRM
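The indirect scheme can be caricatured without any learning: once per-pixel normals are predicted, each observed pixel colour can be scattered into a map indexed by the normal direction, leaving holes to be interpolated. The mean-colour hole filling below is a crude stand-in for the learning-based sparse data interpolation described in the abstract; resolution and binning are illustrative choices.

```python
import numpy as np

def sparse_reflectance_map(normals, colors, res=16):
    """Scatter observed pixel colours into a reflectance map indexed by the
    (nx, ny) components of the unit surface normal; fill unobserved bins
    with the mean observed colour (naive stand-in for learned interpolation)."""
    rmap = np.zeros((res, res))
    count = np.zeros((res, res))
    # Map nx, ny in [-1, 1] to grid indices.
    ij = np.clip(((normals[:, :2] + 1) / 2 * res).astype(int), 0, res - 1)
    for (i, j), c in zip(ij, colors):
        rmap[i, j] += c
        count[i, j] += 1
    observed = count > 0
    rmap[observed] /= count[observed]
    rmap[~observed] = colors.mean()     # naive hole filling
    return rmap

# Toy usage: two observations of an upward-facing normal land in one bin.
n = np.array([[0.0, 0.0, 1.0], [0.0, 0.0, 1.0]])
c = np.array([0.4, 0.6])
m = sparse_reflectance_map(n, c)
print(m[8, 8])                          # the bin for nx = ny = 0 holds the mean, 0.5
```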
Multilinear methods for disentangling variations with applications to facial analysis
Several factors contribute to the appearance of an object in a visual scene, including pose,
illumination, and deformation, among others. Each factor accounts for a source of variability
in the data. It is assumed that the multiplicative interactions of these factors emulate the
entangled variability, giving rise to the rich structure of visual object appearance. Disentangling
such unobserved factors from visual data is a challenging task, especially when the data have
been captured in uncontrolled recording conditions (also referred to as “in-the-wild”) and label
information is not available. The work presented in this thesis focuses on disentangling the
variations contained in visual data, in particular applied to 2D and 3D faces. The motivation
behind this work lies in recent developments in the field, such as (i) the creation of large, visual
databases for face analysis, with (ii) the need of extracting information without the use of labels
and (iii) the need to deploy systems under demanding, real-world conditions.
In the first part of this thesis, we present a method to synthesise plausible 3D expressions
that preserve the identity of a target subject. This method is supervised as the model uses
labels, in this case 3D facial meshes of people performing a defined set of facial expressions, to
learn. The ability to synthesise an entire facial rig from a single neutral expression has a large
range of applications both in computer graphics and computer vision, ranging from the efficient
and cost-effective creation of CG characters to scalable data generation for machine learning
purposes. Unlike previous methods based on multilinear models, the proposed approach is
capable of extrapolating well outside the sample pool, which allows it to accurately reproduce
the identity of the target subject and create artefact-free expression shapes while requiring
only a small input dataset. We introduce global-local multilinear models that leverage the
strengths of expression-specific and identity-specific local models combined with coarse motion
estimations from a global model. The expression-specific and identity-specific local models
are built from different slices of the patch-wise local multilinear model. Experimental results
show that we achieve high-quality, identity-preserving facial expression synthesis results that
outperform existing methods both quantitatively and qualitatively.
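The core multilinear idea can be sketched with a Tucker-style model: a core tensor is contracted with an identity coefficient vector and an expression coefficient vector to produce vertex coordinates. The toy sizes and random core below are placeholders; the thesis' global-local construction additionally combines patch-wise local models, which this sketch omits.

```python
import numpy as np

rng = np.random.default_rng(0)
n_vertices, n_id, n_expr = 30, 4, 3      # toy sizes, purely illustrative

# Tucker-style core: stacked xyz coordinates x identity modes x expression modes.
core = rng.standard_normal((3 * n_vertices, n_id, n_expr))

def synthesise(core, w_id, w_expr):
    """Multilinear face synthesis: contract the core with an identity
    coefficient vector and an expression coefficient vector."""
    return np.einsum('vie,i,e->v', core, w_id, w_expr)

w_id = rng.standard_normal(n_id)         # fixed identity of the target subject
neutral = synthesise(core, w_id, np.eye(n_expr)[0])
smile = synthesise(core, w_id, np.eye(n_expr)[1])
print(neutral.shape)                     # (90,): one mesh per expression vector
```

Sweeping `w_expr` while holding `w_id` fixed is what produces an identity-preserving expression rig from a single subject.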
In the second part of this thesis, we investigate how the modes of variations from visual data
can be extracted. Our assumption is that visual data has an underlying structure consisting of
factors of variation and their interactions. Finding this structure and the factors is important
as it would not only help us to better understand visual data but, once obtained, would also allow us to edit the factors for use in various applications. Shape from Shading and expression transfer are just two
of the potential applications. To extract the factors of variation, several supervised methods
have been proposed but they require both labels regarding the modes of variations and the same
number of samples under all modes of variations. Therefore, their applicability is limited to
well-organised data, usually captured in well-controlled conditions. We propose a novel general
multilinear matrix decomposition method that discovers the multilinear structure of possibly
incomplete sets of visual data in an unsupervised setting. We demonstrate the applicability of the
proposed method in several computer vision tasks, including Shape from Shading (SfS) (in the
wild and with occlusion removal), expression transfer, and estimation of surface normals from
images captured in the wild.
Finally, leveraging the unsupervised multilinear method proposed as well as recent advances in
deep learning, we propose a weakly supervised deep learning method for disentangling multiple
latent factors of variation in face images captured in-the-wild. To this end, we propose a deep
latent variable model, where we model the multiplicative interactions of multiple latent factors
of variation explicitly as a multilinear structure. We demonstrate that the proposed approach
indeed learns disentangled representations of facial expressions and pose, which can be used in
various applications, including face editing, as well as 3D face reconstruction and classification
of facial expression, identity and pose.
Learning to Reconstruct Texture-less Deformable Surfaces from a Single View
Recent years have seen the development of mature solutions for reconstructing
deformable surfaces from a single image, provided that they are relatively
well-textured. By contrast, recovering the 3D shape of texture-less surfaces
remains an open problem, and essentially relates to Shape-from-Shading. In this
paper, we introduce a data-driven approach to this problem. We introduce a
general framework that can predict diverse 3D representations, such as meshes,
normals, and depth maps. Our experiments show that meshes are ill-suited to
handle texture-less 3D reconstruction in our context. Furthermore, we
demonstrate that our approach generalizes well to unseen objects, and that it
yields higher-quality reconstructions than a state-of-the-art SfS technique,
particularly in terms of normal estimates. Our reconstructions accurately model
the fine details of the surfaces, such as the creases of a T-Shirt worn by a
person.
Comment: Accepted to 3DV 201
Pix2Vox: Context-aware 3D Reconstruction from Single and Multi-view Images
Recovering the 3D representation of an object from single-view or multi-view
RGB images by deep neural networks has attracted increasing attention in the
past few years. Several mainstream works (e.g., 3D-R2N2) use recurrent neural
networks (RNNs) to fuse multiple feature maps extracted from input images
sequentially. However, when given the same set of input images with different
orders, RNN-based approaches are unable to produce consistent reconstruction
results. Moreover, due to long-term memory loss, RNNs cannot fully exploit
input images to refine reconstruction results. To solve these problems, we
propose a novel framework for single-view and multi-view 3D reconstruction,
named Pix2Vox. By using a well-designed encoder-decoder, it generates a coarse
3D volume from each input image. Then, a context-aware fusion module is
introduced to adaptively select high-quality reconstructions for each part
(e.g., table legs) from different coarse 3D volumes to obtain a fused 3D
volume. Finally, a refiner further refines the fused 3D volume to generate the
final output. Experimental results on the ShapeNet and Pix3D benchmarks
indicate that the proposed Pix2Vox outperforms state-of-the-art methods by a large
margin. Furthermore, the proposed method is 24 times faster than 3D-R2N2 in
terms of backward inference time. The experiments on ShapeNet unseen 3D
categories have shown the superior generalization abilities of our method.
Comment: ICCV 201
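The context-aware fusion step can be sketched with plain numpy: each coarse volume comes with a per-voxel score map (produced by a small CNN in the paper), and the fused volume is a per-voxel softmax-weighted sum, so the best-reconstructed parts of each view dominate. The hand-set scores below are an illustrative assumption.

```python
import numpy as np

def context_aware_fusion(volumes, scores):
    """Fuse coarse voxel volumes with per-voxel softmax weights derived
    from score maps, stacked along axis 0 (one entry per input view)."""
    w = np.exp(scores - scores.max(axis=0, keepdims=True))
    w /= w.sum(axis=0, keepdims=True)
    return (w * volumes).sum(axis=0)

# Toy usage: view 0 is trusted everywhere except one voxel where view 1 wins.
vols = np.stack([np.zeros((2, 2, 2)), np.ones((2, 2, 2))])
scores = np.stack([np.full((2, 2, 2), 5.0), np.full((2, 2, 2), -5.0)])
scores[:, 0, 0, 0] = [-5.0, 5.0]        # flip confidence at one voxel
fused = context_aware_fusion(vols, scores)
print(round(fused[0, 0, 0], 3), round(fused[1, 1, 1], 3))  # 1.0 0.0
```

Because the softmax is taken per voxel over an unordered stack of views, the result is independent of input ordering, which is the property the RNN-based baselines lack.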
Shape from Shading through Shape Evolution
In this paper, we address the shape-from-shading problem by training deep
networks with synthetic images. Unlike conventional approaches that combine
deep learning and synthetic imagery, we propose an approach that does not need
any external shape dataset to render synthetic images. Our approach consists of
two synergistic processes: the evolution of complex shapes from simple
primitives, and the training of a deep network for shape-from-shading. The
evolution generates better shapes guided by the network training, while the
training improves by using the evolved shapes. We show that our approach
achieves state-of-the-art performance on a shape-from-shading benchmark.
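The evolutionary half of the loop can be sketched as mutate-and-select over height maps grown from simple primitives. In the paper the selection signal comes from the network being trained; the variance-of-gradients fitness below is a hypothetical stand-in so the sketch runs on its own.

```python
import numpy as np

rng = np.random.default_rng(1)

def mutate(shape):
    """Perturb a height-map 'shape' by adding a random smooth bump."""
    h, w = shape.shape
    y, x = np.mgrid[0:h, 0:w]
    cy, cx = rng.integers(0, h), rng.integers(0, w)
    bump = np.exp(-((y - cy) ** 2 + (x - cx) ** 2) / 8.0)
    return shape + rng.uniform(-0.5, 0.5) * bump

def fitness(shape):
    """Stand-in selection signal: surface complexity. In the paper the
    network's training behaviour guides the evolution instead."""
    gy, gx = np.gradient(shape)
    return float(np.mean(gx ** 2 + gy ** 2))

population = [np.zeros((16, 16)) for _ in range(8)]   # simple primitives
for generation in range(20):
    population += [mutate(s) for s in population]
    population = sorted(population, key=fitness, reverse=True)[:8]

print(fitness(population[0]) > 0.0)     # evolved shapes are no longer flat
```

Each surviving shape would then be rendered into a synthetic shading image and fed to the network, closing the loop between evolution and training.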
Recovering Intrinsic Images from a Single Image
We present an algorithm that uses multiple cues to recover shading and reflectance intrinsic images from a single image. Using both color information and a classifier trained to recognize gray-scale patterns, each image derivative is classified as being caused by shading or a change in the surface's reflectance. Generalized Belief Propagation is then used to propagate information from areas where the correct classification is clear to areas where it is ambiguous. We also show results on real images.
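The colour cue can be sketched directly: shading scales all colour channels equally, so a derivative across which chromaticity stays constant is attributed to shading, otherwise to reflectance. The one-row example and tolerance below are illustrative; the full algorithm additionally uses the trained grey-scale classifier and Generalized Belief Propagation to resolve ambiguous pixels.

```python
import numpy as np

def classify_derivatives(rgb_row, tol=1e-3):
    """Label each horizontal derivative of a row of RGB pixels as
    'shading' (chromaticity unchanged) or 'reflectance' (chromaticity
    changes), following the colour cue of the intrinsic-image algorithm."""
    chroma = rgb_row / rgb_row.sum(axis=1, keepdims=True)
    labels = []
    for a, b in zip(chroma[:-1], chroma[1:]):
        labels.append('shading' if np.abs(a - b).max() < tol else 'reflectance')
    return labels

row = np.array([[0.2, 0.2, 0.2],   # grey surface in shadow
                [0.4, 0.4, 0.4],   # same grey, brighter: a shading edge
                [0.8, 0.2, 0.2]])  # red paint: a reflectance edge
print(classify_derivatives(row))   # ['shading', 'reflectance']
```

Multiplying together only the derivatives assigned to one cause and reintegrating yields the corresponding intrinsic image.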