Material acquisition using deep learning
Texture, highlights, and shading are some of the many visual cues that allow humans to perceive material appearance in pictures. Designing algorithms able to leverage these cues to recover spatially-varying bi-directional reflectance distribution functions (SVBRDFs) from a few images has challenged computer graphics researchers for decades. We explore the use of deep learning to tackle lightweight appearance capture and make sense of these visual cues. Our networks are capable of recovering per-pixel normals, diffuse albedo, specular albedo and specular roughness from as little as one picture of a flat surface lit by a hand-held flash. We propose a method that improves its prediction with the number of input pictures and reaches high-quality reconstructions with up to 10 images -- a sweet spot between existing single-image and complex multi-image approaches. We introduce several innovations in training data acquisition and network design, bringing clear improvement over the state of the art for lightweight material capture.
Flexible SVBRDF Capture with a Multi-Image Deep Network
Empowered by deep learning, recent methods for material capture can estimate a spatially-varying reflectance from a single photograph. Such lightweight capture is in stark contrast with the tens or hundreds of pictures required by traditional optimization-based approaches. However, a single image is often simply not enough to observe the rich appearance of real-world materials. We present a deep-learning method capable of estimating material appearance from a variable number of uncalibrated and unordered pictures captured with a handheld camera and flash. Thanks to an order-independent fusing layer, this architecture extracts the most useful information from each picture, while benefiting from strong priors learned from data. The method can handle both view and light direction variation without calibration. We show how our method improves its prediction with the number of input pictures, and reaches high quality reconstructions with as little as 1 to 10 images -- a sweet spot between existing single-image and complex multi-image approaches.
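Such an order-independent fusing layer can be sketched as a symmetric pooling operation applied across per-image features. Below is a minimal PyTorch sketch under assumed (not the paper's) architecture and feature sizes; max-pooling over the picture axis makes the network indifferent to the order and number of inputs:

```python
import torch
import torch.nn as nn

class OrderIndependentFusion(nn.Module):
    """Encode each input photo independently, then fuse with a
    symmetric (order-independent) pooling over the image axis."""

    def __init__(self, feat_dim: int = 64):
        super().__init__()
        # Shared encoder applied to every picture (weights are tied).
        self.encoder = nn.Sequential(
            nn.Conv2d(3, feat_dim, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.Conv2d(feat_dim, feat_dim, kernel_size=3, padding=1),
        )
        # Decoder maps fused features to SVBRDF maps:
        # normal (3) + diffuse (3) + specular (3) + roughness (1) = 10 channels.
        self.decoder = nn.Conv2d(feat_dim, 10, kernel_size=3, padding=1)

    def forward(self, images: torch.Tensor) -> torch.Tensor:
        # images: (batch, n_pictures, 3, H, W); n_pictures may vary per call.
        b, n, c, h, w = images.shape
        feats = self.encoder(images.view(b * n, c, h, w)).view(b, n, -1, h, w)
        # Max over the picture axis: invariant to input order and count.
        fused, _ = feats.max(dim=1)
        return self.decoder(fused)

net = OrderIndependentFusion()
maps = net(torch.randn(2, 5, 3, 64, 64))  # five uncalibrated photos
print(maps.shape)  # torch.Size([2, 10, 64, 64])
```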
Node Graph Optimization Using Differentiable Proxies
Graph-based procedural materials are ubiquitous in content production industries. Procedural models allow the creation of photorealistic materials with parametric control for flexible editing of appearance. However, designing a specific material is a time-consuming process in terms of building a model and fine-tuning parameters. Previous work [Hu et al. 2022; Shi et al. 2020] introduced material graph optimization frameworks for matching target material samples. However, these previous methods were limited to optimizing differentiable functions in the graphs. In this paper, we propose a fully differentiable framework which enables end-to-end gradient-based optimization of material graphs, even if some functions of the graph are non-differentiable. We leverage the Differentiable Proxy, a differentiable approximator of a non-differentiable black-box function. We use our framework to match the structure and appearance of an output material to a target material through a multi-stage differentiable optimization. Differentiable Proxies offer a more general optimization solution to material appearance matching than previous work.
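To illustrate the idea of a Differentiable Proxy, the sketch below trains a small MLP to imitate a non-differentiable black-box node (here a hard threshold, a hypothetical stand-in), then optimizes the node's parameter through the proxy. The architecture and training setup are illustrative assumptions, not the paper's:

```python
import torch
import torch.nn as nn

def black_box_node(x, p):
    """Stand-in for a non-differentiable node: a hard threshold.
    Its gradient w.r.t. the parameter p is zero almost everywhere."""
    return (x > p).float()

# A tiny MLP proxy trained to imitate the black box; being smooth,
# it provides usable gradients for end-to-end graph optimization.
proxy = nn.Sequential(nn.Linear(2, 64), nn.ReLU(), nn.Linear(64, 1), nn.Sigmoid())
fit_opt = torch.optim.Adam(proxy.parameters(), lr=1e-2)
for _ in range(2000):  # 1) fit the proxy on random black-box samples
    x, p = torch.rand(256, 1), torch.rand(256, 1)
    loss = ((proxy(torch.cat([x, p], dim=1)) - black_box_node(x, p)) ** 2).mean()
    fit_opt.zero_grad()
    loss.backward()
    fit_opt.step()

# 2) optimize the node parameter THROUGH the frozen proxy
for q in proxy.parameters():
    q.requires_grad_(False)
x = torch.rand(1024, 1)
target = black_box_node(x, torch.tensor(0.3))    # ground-truth parameter 0.3
p_hat = torch.tensor([0.7], requires_grad=True)  # initial guess
param_opt = torch.optim.Adam([p_hat], lr=1e-2)
for _ in range(500):
    pred = proxy(torch.cat([x, p_hat.expand_as(x)], dim=1))
    loss = ((pred - target) ** 2).mean()
    param_opt.zero_grad()
    loss.backward()
    param_opt.step()
print(p_hat.item())  # moves toward 0.3 despite the hard threshold
```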
Generating Procedural Materials from Text or Image Prompts
Node graph systems are used ubiquitously for material design in computer graphics. They allow the use of visual programming to achieve desired effects without writing code. As high-level design tools they provide convenience and flexibility, but mastering the creation of node graphs usually requires professional training. We propose an algorithm capable of generating multiple node graphs from different types of prompts, significantly lowering the bar for users to explore a specific design space. Previous work was limited to unconditional generation of random node graphs, making the generation of an envisioned material challenging. We propose a multi-modal node graph generation neural architecture for high-quality procedural material synthesis which can be conditioned on different inputs (text or image prompts), using a CLIP-based encoder. We also create a substantially augmented material graph dataset, key to improving the generation quality. Finally, we generate high-quality graph samples using a regularized sampling process and improve the matching quality by differentiable optimization for top-ranked samples. We compare our methods to CLIP-based database search baselines (which are themselves novel) and achieve superior or similar performance without requiring massive data storage. We further show that our model can produce a set of material graphs unconditionally, conditioned on images, text prompts or partial graphs, serving as a tool for automatic visual programming completion.
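The CLIP-based database search baseline can be sketched as follows: embed the query (text or image) with CLIP and rank material graphs by cosine similarity against precomputed embeddings of their rendered previews. A minimal sketch using the open_clip package; the database paths and preview images are hypothetical:

```python
import torch
import open_clip
from PIL import Image

# Load a CLIP model (open_clip is one readily available implementation).
model, _, preprocess = open_clip.create_model_and_transforms(
    "ViT-B-32", pretrained="laion2b_s34b_b79k")
tokenizer = open_clip.get_tokenizer("ViT-B-32")

@torch.no_grad()
def embed_text(prompt: str) -> torch.Tensor:
    feats = model.encode_text(tokenizer([prompt]))
    return feats / feats.norm(dim=-1, keepdim=True)

@torch.no_grad()
def embed_image(path: str) -> torch.Tensor:
    img = preprocess(Image.open(path).convert("RGB")).unsqueeze(0)
    feats = model.encode_image(img)
    return feats / feats.norm(dim=-1, keepdim=True)

# Offline: embed one rendered preview per graph in the material database
# ('previews' maps a graph id to the path of its render; paths assumed).
previews = {"brick_01": "renders/brick_01.png", "wood_03": "renders/wood_03.png"}
db = torch.cat([embed_image(p) for p in previews.values()])

# Online: rank graphs by cosine similarity to the query embedding.
query = embed_text("weathered red brick wall")
scores = (db @ query.T).squeeze(1)
best = list(previews)[scores.argmax().item()]
print("closest graph:", best)
```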
ControlMat: A Controlled Generative Approach to Material Capture
Material reconstruction from a photograph is a key component of 3D content creation democratization. We propose to formulate this ill-posed problem as a controlled synthesis one, leveraging the recent progress in generative deep networks. We present ControlMat, a method which, given a single photograph with uncontrolled illumination as input, conditions a diffusion model to generate plausible, tileable, high-resolution physically-based digital materials. We carefully analyze the behavior of diffusion models for multi-channel outputs, adapt the sampling process to fuse multi-scale information and introduce rolled diffusion to enable both tileability and patched diffusion for high-resolution outputs. Our generative approach further permits exploration of a variety of materials which could correspond to the input image, mitigating the unknown lighting conditions. We show that our approach outperforms recent inference and latent-space-optimization methods, and carefully validate our diffusion process design choices. Supplemental materials and additional details are available at: https://gvecchio.com/controlmat/
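One plausible reading of rolled diffusion, sketched below, is to cyclically shift the latent between reverse-diffusion steps so that tile borders are repeatedly denoised at interior positions. This is an illustrative toy, not the authors' exact sampler, and `denoise_step` is an assumed callable performing one reverse step:

```python
import torch

@torch.no_grad()
def rolled_sampling(denoise_step, latent, timesteps, generator=None):
    """Toy sketch of rolled diffusion: cyclically shift the latent
    between denoising steps so seam regions are denoised at interior
    positions, encouraging a tileable result.
    denoise_step(latent, t) is assumed to perform one reverse step."""
    h, w = latent.shape[-2:]
    for t in timesteps:
        # Random cyclic shift; the wrap-around is exactly what tiling needs.
        dy = int(torch.randint(0, h, (1,), generator=generator))
        dx = int(torch.randint(0, w, (1,), generator=generator))
        latent = torch.roll(latent, shifts=(dy, dx), dims=(-2, -1))
        latent = denoise_step(latent, t)
        # Undo the shift so the image stays in a fixed frame.
        latent = torch.roll(latent, shifts=(-dy, -dx), dims=(-2, -1))
    return latent
```

For genuinely seamless wrap-around the denoiser would also need circular padding at its convolution borders; the sketch only shows the rolling schedule.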
PhotoMat: A Material Generator Learned from Single Flash Photos
Authoring high-quality digital materials is key to realism in 3D rendering. Previous generative models for materials have been trained exclusively on synthetic data; such data is limited in availability and has a visual gap to real materials. We circumvent this limitation by proposing PhotoMat: the first material generator trained exclusively on real photos of material samples captured using a cell phone camera with flash. Supervision on individual material maps is not available in this setting. Instead, we train a generator for a neural material representation that is rendered with a learned relighting module to create arbitrarily lit RGB images; these are compared against real photos using a discriminator. We then train a material maps estimator to decode material reflectance properties from the neural material representation. We train PhotoMat with a new dataset of 12,000 material photos captured with handheld phone cameras under flash lighting. We demonstrate that our generated materials have better visual quality than previous material generators trained on synthetic data. Moreover, we can fit analytical material models to closely match these generated neural materials, thus allowing for further editing and use in 3D rendering.
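The adversarial setup described above can be summarized as a data-flow sketch: a generator produces a neural material, a learned relighting module renders it under a sampled light, and a discriminator compares the result to real flash photos. All module names and architectures below are hypothetical stand-ins; only the data flow follows the abstract:

```python
import torch

def training_step(generator, relighter, discriminator,
                  d_opt, g_opt, real_photos, real_lights):
    """One schematic GAN step for a PhotoMat-style pipeline, using the
    non-saturating logistic losses common in StyleGAN-family training."""
    z = torch.randn(real_photos.shape[0], generator.latent_dim)
    neural_material = generator(z)                  # no per-map supervision
    fake = relighter(neural_material, real_lights)  # arbitrarily lit RGB

    # Discriminator: real flash photos vs. relit generated materials.
    d_loss = (torch.nn.functional.softplus(discriminator(fake.detach())).mean()
              + torch.nn.functional.softplus(-discriminator(real_photos)).mean())
    d_opt.zero_grad()
    d_loss.backward()
    d_opt.step()

    # Generator and relighting module are trained jointly to fool it.
    g_loss = torch.nn.functional.softplus(-discriminator(fake)).mean()
    g_opt.zero_grad()
    g_loss.backward()
    g_opt.step()
    return d_loss.item(), g_loss.item()
```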
Single-Image SVBRDF Capture with a Rendering-Aware Deep Network
Texture, highlights, and shading are some of the many visual cues that allow humans to perceive material appearance in single pictures. Yet, recovering spatially-varying bi-directional reflectance distribution functions (SVBRDFs) from a single image based on such cues has challenged researchers in computer graphics for decades. We tackle lightweight appearance capture by training a deep neural network to automatically extract and make sense of these visual cues. Once trained, our network is capable of recovering per-pixel normal, diffuse albedo, specular albedo and specular roughness from a single picture of a flat surface lit by a hand-held flash. We achieve this goal by introducing several innovations in training data acquisition and network design. For training, we leverage a large dataset of artist-created, procedural SVBRDFs which we sample and render under multiple lighting directions. We further amplify the data by material mixing to cover a wide diversity of shading effects, which allows our network to work across many material classes. Motivated by the observation that distant regions of a material sample often offer complementary visual cues, we design a network that combines an encoder-decoder convolutional track for local feature extraction with a fully-connected track for global feature extraction and propagation. Many important material effects are view-dependent, and as such ambiguous when observed in a single image. We tackle this challenge by defining the loss as a differentiable SVBRDF similarity metric that compares the renderings of the predicted maps against renderings of the ground truth from several lighting and viewing directions. Combined together, these novel ingredients bring clear improvement over state-of-the-art methods for single-shot capture of spatially-varying BRDFs.
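The rendering-aware loss amounts to comparing differentiable renderings of predicted and ground-truth maps under shared random lighting and viewing directions. The sketch below uses a simplified Lambertian-plus-Blinn-Phong shading stand-in rather than the full microfacet SVBRDF model of the paper:

```python
import torch

def render(normal, diffuse, specular, roughness, light_dir, view_dir):
    """Very simplified differentiable point-light shading
    (Lambertian diffuse + Blinn-Phong-style specular lobe).
    Map tensors are (B, C, H, W); directions are unit (B, 3, 1, 1)."""
    n = torch.nn.functional.normalize(normal, dim=1)
    half = torch.nn.functional.normalize(light_dir + view_dir, dim=1)
    n_dot_l = (n * light_dir).sum(1, keepdim=True).clamp(min=0.0)
    n_dot_h = (n * half).sum(1, keepdim=True).clamp(min=0.0)
    shininess = 2.0 / (roughness.clamp(min=1e-3) ** 2)
    return (diffuse / torch.pi + specular * n_dot_h ** shininess) * n_dot_l

def rendering_loss(pred_maps, gt_maps, n_dirs=9):
    """Compare renderings of predicted vs. ground-truth maps under
    several random directions; L1 in log space is a common choice
    to tame bright specular highlights."""
    loss = 0.0
    b = pred_maps[0].shape[0]
    for _ in range(n_dirs):
        d = torch.nn.functional.normalize(torch.randn(b, 3, 1, 1), dim=1)
        d[:, 2] = d[:, 2].abs()  # keep directions in the upper hemisphere
        r_pred = render(*pred_maps, d, d)  # colocated flash light and view
        r_gt = render(*gt_maps, d, d)
        loss = loss + (torch.log1p(r_pred) - torch.log1p(r_gt)).abs().mean()
    return loss / n_dirs
```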
Lightweight material acquisition using deep learning
Whether it is used for entertainment or industrial design, computer graphics is ever more present in our everyday life. Yet, reproducing a real scene's appearance in a virtual environment remains a challenging task, requiring long hours from trained artists. A good solution is the acquisition of geometries and materials directly from real-world examples, but this often comes at the cost of complex hardware and calibration processes. In this thesis, we focus on lightweight material appearance capture to simplify and accelerate the acquisition process and to solve industrial challenges such as result image resolution or calibration. Texture, highlights, and shading are some of the many visual cues that allow humans to perceive material appearance in pictures. Designing algorithms able to leverage these cues to recover spatially-varying bi-directional reflectance distribution functions (SVBRDFs) from a few images has challenged computer graphics researchers for decades. We explore the use of deep learning to tackle lightweight appearance capture and make sense of these visual cues. Once trained, our networks are capable of recovering per-pixel normals, diffuse albedo, specular albedo and specular roughness from as little as one picture of a flat surface lit by the environment or a hand-held flash. We show how our method improves its prediction with the number of input pictures to reach high-quality reconstructions with up to 10 images --- a sweet spot between existing single-image and complex multi-image approaches --- and allows the capture of large-scale, HD materials. We achieve this goal by introducing several innovations in training data acquisition and network design, bringing clear improvement over the state of the art for lightweight material capture.