Towards deep unsupervised inverse graphics
A long-standing goal of computer vision is to infer the underlying 3D content in a scene from
a single photograph, a task known as inverse graphics. Machine learning has, in recent years,
enabled many approaches to make great progress towards solving this problem. However,
most approaches rely on 3D supervision data which is expensive and sometimes impossible
to obtain and therefore limits the learning capabilities of such work. In this work, we explore
the deep unsupervised inverse graphics training pipeline and propose two methods based on
distinct 3D representations and associated differentiable rendering algorithms: namely surfels
and a novel Voronoi-based representation. In the first method based on surfels, we show that,
while effective at maintaining view-consistency, producing view-dependent surfels using a
learned depth map results in ambiguities as the mapping between depth map and rendering
is non-bijective. In our second method, we introduce a novel 3D representation based on
Voronoi diagrams which models objects/scenes both explicitly and implicitly simultaneously,
thereby combining the benefits of both. We show how this representation can be used in both
a supervised and unsupervised context and discuss its advantages compared to traditional
3D representations.
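The dual explicit/implicit nature of a Voronoi-based representation can be illustrated with a toy sketch (not the paper's implementation; all values and names here are illustrative): a finite set of sites with occupancy labels is the explicit side, while nearest-site lookup induces an implicit occupancy function over all of space.

```python
import numpy as np

# Toy sketch of a Voronoi-based 3D representation: each site carries an
# occupancy label, so the model is explicit (a finite set of sites) while
# also inducing an implicit occupancy function via nearest-site lookup.
rng = np.random.default_rng(0)
sites = rng.uniform(-1.0, 1.0, size=(64, 3))        # explicit: Voronoi site positions
occupancy = np.linalg.norm(sites, axis=1) < 0.5     # label sites inside a toy sphere

def query_occupancy(points):
    """Implicit interface: occupancy of arbitrary 3D points via nearest site."""
    d = np.linalg.norm(points[:, None, :] - sites[None, :, :], axis=-1)
    nearest = np.argmin(d, axis=1)                  # Voronoi cell membership
    return occupancy[nearest]

inside = query_occupancy(np.array([[0.0, 0.0, 0.0], [0.9, 0.9, 0.9]]))
```

Because the sites are an ordinary point set, they can be edited or transformed directly, while downstream consumers can still query the representation like an implicit field.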
Towards Scalable Multi-View Reconstruction of Geometry and Materials
In this paper, we propose a novel method for joint recovery of camera pose,
object geometry and spatially-varying Bidirectional Reflectance Distribution
Function (svBRDF) of 3D scenes that exceed object-scale and hence cannot be
captured with stationary light stages. The inputs are high-resolution RGB-D
images captured by a mobile, hand-held capture system with point lights for
active illumination. Compared to previous works that jointly estimate geometry
and materials from a hand-held scanner, we formulate this problem using a
single objective function that can be minimized using off-the-shelf
gradient-based solvers. To facilitate scalability to large numbers of
observation views and optimization variables, we introduce a distributed
optimization algorithm that reconstructs 2.5D keyframe-based representations of
the scene. A novel multi-view consistency regularizer effectively synchronizes
neighboring keyframes such that the local optimization results allow for
seamless integration into a globally consistent 3D model. We provide a study on
the importance of each component in our formulation and show that our method
compares favorably to baselines. We further demonstrate that our method
accurately reconstructs various objects and materials and allows for expansion
to spatially larger scenes. We believe that this work represents a significant
step towards making geometry and material estimation from hand-held scanners
scalable.
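The idea of a single objective combining a per-keyframe data term with a multi-view consistency regularizer that synchronizes neighboring keyframes can be sketched in miniature (a toy quadratic problem with made-up numbers, not the paper's actual formulation), minimized with plain gradient descent:

```python
import numpy as np

# Toy sketch: one variable per keyframe, a data term pulling each variable
# toward its own observation, and a consistency regularizer pulling
# neighboring keyframes toward agreement. All values are illustrative.
obs = {0: 1.0, 1: 1.2, 2: 3.0}        # hypothetical per-keyframe observations
neighbors = [(0, 1), (1, 2)]          # keyframe adjacency
lam = 0.1                             # consistency weight

x = np.zeros(3)                       # optimization variables (one per keyframe)
for _ in range(500):                  # off-the-shelf gradient descent
    grad = np.zeros_like(x)
    for k, o in obs.items():          # data term: (x_k - o_k)^2
        grad[k] += 2 * (x[k] - o)
    for i, j in neighbors:            # consistency term: lam * (x_i - x_j)^2
        grad[i] += 2 * lam * (x[i] - x[j])
        grad[j] -= 2 * lam * (x[i] - x[j])
    x -= 0.1 * grad
```

The regularizer gently smooths neighboring estimates toward each other without forcing them to a single global value, which is the behavior that lets locally optimized keyframes integrate into one consistent model.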
Depth Super-Resolution Meets Uncalibrated Photometric Stereo
A novel depth super-resolution approach for RGB-D sensors is presented. It
disambiguates depth super-resolution through high-resolution photometric clues
and, symmetrically, it disambiguates uncalibrated photometric stereo through
low-resolution depth cues. To this end, an RGB-D sequence is acquired from the
same viewing angle, while illuminating the scene from various uncalibrated
directions. This sequence is handled by a variational framework which fits
high-resolution shape and reflectance, as well as lighting, to both the
low-resolution depth measurements and the high-resolution RGB ones. The key
novelty consists in a new PDE-based photometric stereo regularizer which
implicitly ensures surface regularity. This makes it possible to carry out depth
super-resolution in a purely data-driven manner, without the need for any
ad-hoc prior or material calibration. Real-world experiments are carried out
using an out-of-the-box RGB-D sensor and a hand-held LED light source.
Comment: International Conference on Computer Vision (ICCV) Workshop, 201
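The photometric clues in this setting rest on the standard Lambertian image-formation model, which can be stated in a few lines (an illustrative sketch of the general model, not the paper's variational framework): intensity is albedo times the clamped dot product of the surface normal and the light direction.

```python
import numpy as np

# Lambertian shading: I = albedo * max(n . l, 0), with unit normal n and
# unit light direction l. Values below are illustrative only.
def shade(albedo, normal, light):
    n = normal / np.linalg.norm(normal)
    l = light / np.linalg.norm(light)
    return albedo * max(float(np.dot(n, l)), 0.0)

i = shade(0.8, np.array([0.0, 0.0, 1.0]), np.array([0.0, 0.0, 2.0]))  # frontal light
```

Varying the (unknown, uncalibrated) light direction across the sequence is what makes shape and reflectance separable in principle, while the low-resolution depth cues resolve the remaining ambiguities.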
Automated inverse-rendering techniques for realistic 3D artefact compositing in 2D photographs
PhD Thesis
The process of acquiring images of a scene and modifying the defining structural features
of the scene through the insertion of artefacts is known in literature as compositing. The
process can take effect in the 2D domain (where the artefact originates from a 2D image
and is inserted into a 2D image), or in the 3D domain (the artefact is defined as a dense
3D triangulated mesh, with textures describing its material properties).
Compositing originated as a solution to enhancing, repairing, and more broadly editing
photographs and video data alike in the film industry as part of the post-production stage.
This is generally thought of as carrying out operations in a 2D domain (a single image
with a known width, height, and colour data). The operations involved are sequential and
entail separating the foreground from the background (matting), or identifying features
from contour (feature matching and segmentation) with the purpose of introducing new
data in the original. Since then, compositing techniques have gained more traction in the
emerging fields of Mixed Reality (MR), Augmented Reality (AR), robotics and machine
vision (scene understanding, scene reconstruction, autonomous navigation). When focusing
on the 3D domain, compositing can be translated into a pipeline [1] - the incipient stage
acquires the scene data, which then undergoes a number of processing steps aimed at
inferring structural properties that ultimately allow for the placement of 3D artefacts
anywhere within the scene, rendering a plausible and consistent result with regard to the
physical properties of the initial input.
This generic approach becomes challenging in the absence of user annotation and
labelling of scene geometry, light sources and their respective magnitude and orientation,
as well as a clear object segmentation and knowledge of surface properties. A single image,
a stereo pair, or even a short image stream may not hold enough information regarding
the shape or illumination of the scene; however, increasing the input data only incurs
an extensive time penalty, which is an established challenge in the field.
Recent state-of-the-art methods address the difficulty of inference in the absence of
data; nonetheless, they do not attempt to solve the challenge of compositing artefacts
between existing scene geometry, or cater for the inclusion of new geometry behind complex
surface materials such as translucent glass or in front of reflective surfaces.
[1] In the present document, the term pipeline refers to a software solution formed of stand-alone modules
or stages. It implies that the flow of execution runs in a single direction, and that each module has the
potential to be used on its own as part of other solutions. Moreover, each module is assumed to take an
input set and output data for the following stage, where each module addresses a single type of problem
only.
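The modular, single-direction pipeline notion defined in the footnote can be sketched as stand-alone callables composed one way (stage names and data fields are hypothetical placeholders, not the thesis's actual modules):

```python
from functools import reduce

# Each stage is a stand-alone callable: it takes the previous stage's output
# and produces input for the next, addressing a single type of problem.
def acquire(raw):                # stage 1: scene acquisition (placeholder)
    return {"image": raw}

def infer_structure(scene):      # stage 2: structural inference (placeholder)
    scene["geometry"] = "mesh"
    return scene

def composite(scene):            # stage 3: artefact placement (placeholder)
    scene["composited"] = True
    return scene

def pipeline(stages, data):
    """Run data through the stages in order; flow is one-directional."""
    return reduce(lambda d, stage: stage(d), stages, data)

result = pipeline([acquire, infer_structure, composite], "photo.png")
```

Because each stage is self-contained, any module can be reused in other solutions, which is exactly the property the footnote's definition requires.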
The present work focuses on compositing in the 3D domain and brings forth a
software framework [2] that contributes solutions to a number of challenges encountered in
the field, including the ability to render physically-accurate soft shadows in the absence
of user-annotated scene properties or RGB-D data. Another contribution is the
timely manner in which the framework achieves a believable result compared with other
compositing methods, which rely on offline rendering. Neither proprietary hardware
nor user expertise is required in order to achieve fast and reliable results within
the current framework.
Measuring and understanding light in real life scenarios
Lighting design and modelling (the efficient and aesthetic placement of luminaires in a virtual or real scene) and industrial applications like luminaire planning and commissioning (the luminaire's installation and evaluation process along with the scene's geometry and structure) rely heavily on high realism and physically correct simulations. The current typical approaches are based only on CAD modelling simulations and offline rendering, with long processing times and therefore inflexible workflows. In this thesis we examine whether different camera-aided light modelling and numerical optimization approaches could be used to accurately understand, model and measure the light distribution in real life scenarios within real world environments. We show that factorization techniques can play a semantic role in light decomposition and light source identification, and we contribute a novel benchmark dataset and metrics for it. Thereafter we adapt a well-known global illumination model (i.e. radiosity) and extend it to overcome some of its basic limitations, namely the assumption of point-based light sources only and the use of isotropic light perception sensors only. We show that this extended radiosity numerical model can challenge the state-of-the-art in obtaining accurate dense spatial light measurements over time and in different scenarios. Finally we combine the latter model with human-centric sensing information and present how this could be beneficial for smart lighting applications related to quality lighting and power efficiency.
Thus, with this work we contribute by setting the baselines for using an RGB-D camera input as the only requirement for light modelling methods that estimate light in real life scenarios, and we open a new range of applications in which illumination modelling can be turned into an interactive process, allowing for real-time modifications and immediate feedback on the spatial illumination of a scene over time, towards quality lighting and energy efficient solutions.
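The classical radiosity balance that the thesis extends can be sketched as the fixed-point equation B = E + rho * (F @ B), solved by simple iteration (toy patch count, reflectances, and form factors; the thesis's extensions beyond point lights and isotropic sensors are not reproduced here):

```python
import numpy as np

# Classical radiosity: radiosity B of each patch equals its emission E plus
# its reflectance rho times the light gathered from all other patches via
# form factors F[i, j]. All numbers below are illustrative.
E = np.array([1.0, 0.0, 0.0])                 # emission: patch 0 is a light source
rho = np.array([0.0, 0.5, 0.5])               # patch reflectances
F = np.array([[0.0, 0.3, 0.3],                # F[i, j]: fraction of light leaving
              [0.3, 0.0, 0.3],                # patch i that reaches patch j
              [0.3, 0.3, 0.0]])

B = E.copy()
for _ in range(100):                          # Jacobi-style iteration to the
    B = E + rho * (F @ B)                     # fixed point of the balance
```

Since the combined reflectance and form-factor magnitudes are well below one, the iteration is a contraction and converges quickly; richer light sources and sensor models enter through how E and the gathering term are defined.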
3D Morphable Face Models -- Past, Present and Future
In this paper, we provide a detailed survey of 3D Morphable Face Models over the 20 years since they were first proposed. The challenges in building and applying these models, namely capture, modeling, image formation, and image analysis, are still active research topics, and we review the state-of-the-art in each of these areas. We also look ahead, identifying unsolved challenges, proposing directions for future research and highlighting the broad range of current and future applications.