On the Design and Analysis of Multiple View Descriptors
We propose an extension of popular descriptors based on gradient orientation
histograms (HOG, computed in a single image) to multiple views. It hinges on
interpreting HOG as a conditional density in the space of sampled images, where
the effects of nuisance factors such as viewpoint and illumination are
marginalized. However, such marginalization is performed with respect to a very
coarse approximation of the underlying distribution. Our extension leverages
the fact that multiple views of the same scene allow separating intrinsic from
nuisance variability, and thus afford better marginalization of the latter. The
result is a descriptor that has the same complexity as single-view HOG, and can
be compared in the same manner, but exploits multiple views to better trade off
insensitivity to nuisance variability with specificity to intrinsic
variability. We also introduce a novel multi-view wide-baseline matching
dataset, consisting of a mixture of real and synthetic objects with
ground-truth camera motion and dense three-dimensional geometry.
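The idea of pooling gradient-orientation statistics over several views can be illustrated with a minimal sketch. It assumes grayscale images stored as lists of rows and uses plain averaging of per-view histograms as a stand-in for the marginalization described in the abstract; the function names are hypothetical, and this is not the authors' descriptor. It only shows why the multi-view result keeps the same length as a single-view histogram and can be compared the same way.

```python
import math

def orientation_histogram(image, bins=8):
    """Magnitude-weighted histogram of unsigned gradient orientations for one view."""
    hist = [0.0] * bins
    height, width = len(image), len(image[0])
    for y in range(1, height - 1):
        for x in range(1, width - 1):
            gx = image[y][x + 1] - image[y][x - 1]  # central difference in x
            gy = image[y + 1][x] - image[y - 1][x]  # central difference in y
            magnitude = math.hypot(gx, gy)
            if magnitude == 0.0:
                continue
            angle = math.atan2(gy, gx) % math.pi    # unsigned orientation in [0, pi)
            index = min(int(angle / math.pi * bins), bins - 1)
            hist[index] += magnitude
    total = sum(hist)
    return [v / total for v in hist] if total > 0 else hist

def multi_view_histogram(views, bins=8):
    """Assumption: average the per-view histograms (a crude form of pooling over views)."""
    hists = [orientation_histogram(view, bins) for view in views]
    return [sum(column) / len(hists) for column in zip(*hists)]

# Two toy "views": a horizontal ramp and a vertical ramp.
ramp_x = [[float(x) for x in range(5)] for _ in range(5)]
ramp_y = [[float(y) for _ in range(5)] for y in range(5)]
descriptor = multi_view_histogram([ramp_x, ramp_y], bins=8)
print(descriptor)
```

The pooled descriptor has `bins` entries regardless of how many views contribute, which mirrors the abstract's claim that the multi-view descriptor has the same complexity as single-view HOG.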
Scene relighting and editing for improved object insertion
Abstract. The goal of this thesis is to develop a scene relighting and object insertion pipeline using Neural Radiance Fields (NeRF) to incorporate one or more objects into an outdoor environment scene. The output is a 3D mesh that embodies decomposed bidirectional reflectance distribution function (BRDF) characteristics, which interact with varying light source positions and strengths. To achieve this objective, the thesis is divided into two sub-tasks.
The first sub-task involves extracting visual information about the outdoor environment from a sparse set of corresponding images. A neural representation is constructed, providing a comprehensive understanding of the constituent elements, such as materials, geometry, illumination, and shadows. The second sub-task involves generating a neural representation of the inserted object using either real-world images or synthetic data.
To accomplish these objectives, the thesis draws on existing literature in computer vision and computer graphics. Different approaches are assessed to identify their advantages and disadvantages, and the chosen techniques are described in detail, with emphasis on how each contributes to the final result.
Overall, this thesis aims to provide a framework for compositing and relighting that is grounded in NeRF and allows for the seamless integration of objects into outdoor environments. The outcome of this work has potential applications in various domains, such as visual effects, gaming, and virtual reality.
Joint Material and Illumination Estimation from Photo Sets in the Wild
Faithful manipulation of shape, material, and illumination in 2D Internet
images would greatly benefit from a reliable factorization of appearance into
material (i.e., diffuse and specular) and illumination (i.e., environment
maps). On the one hand, current methods that produce very high-fidelity
results typically require controlled settings, expensive devices, or
significant manual effort. On the other hand, methods that are automatic and
work on 'in the wild' Internet images often extract only low-frequency
lighting or diffuse materials. In this work, we propose to make use of a set of
photographs in order to jointly estimate the non-diffuse materials and sharp
lighting in an uncontrolled setting. Our key observation is that seeing
multiple instances of the same material under different illumination (i.e.,
environment), and different materials under the same illumination provide
valuable constraints that can be exploited to yield a high-quality solution
(i.e., specular materials and environment illumination) for all the observed
materials and environments. Similar constraints also arise when observing
multiple materials in a single environment, or a single material across
multiple environments. The core of this approach is an optimization procedure
that uses two neural networks that are trained on synthetic images to predict
good gradients in parametric space given observation of reflected light. We
evaluate our method on a range of synthetic and real examples to generate
high-quality estimates, qualitatively compare our results against
state-of-the-art alternatives via a user study, and demonstrate
photo-consistent image manipulation that is otherwise very challenging to
achieve.
Survey on Controlable Image Synthesis with Deep Learning
Image synthesis has attracted growing research interest in the academic and
industrial communities. Deep learning technologies, especially generative
models, have greatly inspired controllable image synthesis approaches and
applications, which aim to generate particular visual content with latent
prompts. To further investigate the low-level controllable image synthesis
problem, which is crucial for fine image rendering and editing tasks, we present
a survey of recent works on 3D controllable image synthesis using deep
learning. We first introduce the datasets and evaluation indicators for 3D
controllable image synthesis. Then, we review the state-of-the-art research on
geometrically controllable image synthesis in two aspects: 1)
viewpoint/pose-controllable image synthesis; 2) structure/shape-controllable
image synthesis. Furthermore, photometrically controllable image synthesis
approaches are also reviewed for 3D relighting research. While the emphasis
is on 3D controllable image synthesis algorithms, the related applications,
products and resources are also briefly summarized for practitioners.
Comment: 19 pages, 17 figures
D-NeRF: Neural Radiance Fields for Dynamic Scenes
Paper presented at the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), held virtually from Nashville, TN (United States), June 20-25, 2021.
Neural rendering techniques combining machine learning with geometric reasoning have arisen as one of the most promising approaches for synthesizing novel views of a scene from a sparse set of images. Among these, Neural Radiance Fields (NeRF) stands out: it trains a deep network to map 5D input coordinates (representing spatial location and viewing direction) into a volume density and view-dependent emitted radiance. However, despite achieving an unprecedented level of photorealism in the generated images, NeRF is only applicable to static scenes, where the same spatial location can be queried from different images. In this paper we introduce D-NeRF, a method that extends neural radiance fields to a dynamic domain, making it possible to reconstruct and render novel images of objects under rigid and non-rigid motions. For this purpose we consider time as an additional input to the system, and split the learning process into two main stages: one that encodes the scene into a canonical space and another that maps this canonical representation into the deformed scene at a particular time. Both mappings are learned using fully-connected networks. Once the networks are trained, D-NeRF can render novel images, controlling both the camera view and the time variable, and thus the object movement. We demonstrate the effectiveness of our approach on scenes with objects under rigid, articulated and non-rigid motions.
Peer reviewed
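The two-stage structure described in this abstract can be sketched as a composition of two functions: one maps a point and a time to an offset into canonical space, the other maps a canonical point and a viewing direction to density and radiance. In the sketch below both trained networks are replaced by hand-written toy functions (a sphere that translates along the x axis), which is purely an illustrative assumption; `deformation`, `canonical_field`, `query` and `render_ray` are hypothetical names. The ray integration is the standard NeRF-style quadrature, which the abstract itself does not spell out.

```python
import math

def deformation(x, t):
    """Toy stand-in for the deformation network: offset taking (x, t) to canonical space.
    Assumption: the scene translates along +x at unit speed, so the offset is zero at t = 0."""
    return (-t, 0.0, 0.0)

def canonical_field(x, d):
    """Toy stand-in for the canonical network: (density, RGB radiance) at a canonical point.
    Assumption: a unit sphere of constant density and colour centred at the origin."""
    inside = sum(c * c for c in x) <= 1.0
    return (5.0 if inside else 0.0), (1.0, 0.5, 0.2)

def query(x, d, t):
    """Compose the two stages: deform the point to canonical space, then query the field."""
    offset = deformation(x, t)
    x_canonical = tuple(a + b for a, b in zip(x, offset))
    return canonical_field(x_canonical, d)

def render_ray(origin, direction, t, near=0.0, far=6.0, samples=64):
    """Accumulate colour along a ray with the usual alpha-compositing quadrature."""
    delta = (far - near) / samples
    transmittance = 1.0
    rgb = [0.0, 0.0, 0.0]
    for i in range(samples):
        s = near + (i + 0.5) * delta
        x = tuple(o + s * di for o, di in zip(origin, direction))
        sigma, color = query(x, direction, t)
        alpha = 1.0 - math.exp(-sigma * delta)
        weight = transmittance * alpha
        for k in range(3):
            rgb[k] += weight * color[k]
        transmittance *= 1.0 - alpha
    return rgb, 1.0 - transmittance  # colour and accumulated opacity

# The same camera ray sees the sphere at t = 0 but not at t = 3, after it has moved away.
print(render_ray((0.0, 0.0, -3.0), (0.0, 0.0, 1.0), t=0.0))
print(render_ray((0.0, 0.0, -3.0), (0.0, 0.0, 1.0), t=3.0))
```

Rendering with a fixed camera while varying `t`, or with a fixed `t` while moving the ray origin, corresponds to the separate control over time and viewpoint that the abstract describes.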
Learning Visual Appearance: Perception, Modeling and Editing.
Visual appearance determines how we understand an object or image and is therefore a fundamental aspect of digital content creation. It is a general term that encompasses others, such as material appearance, defined as the impression we have of a material, which involves a physical interaction between light and matter and the way our visual system is able to perceive it. However, computationally modeling the behavior of our visual system is a difficult task, among other reasons because there is no definitive, unified theory of human visual perception. Moreover, although we have developed algorithms capable of faithfully modeling the interaction between light and matter, there is a disconnect between the physical parameters these algorithms use and the perceptual parameters that the human visual system understands. This makes manipulating these physical representations, and their interactions, a tedious and costly task, even for expert users. This thesis seeks to improve our understanding of the perception of material appearance and to use that knowledge to improve existing algorithms for visual content generation. Specifically, the thesis makes contributions in three areas: proposing new computational models for measuring appearance similarity; investigating the interaction between illumination and geometry; and developing intuitive applications for appearance manipulation, in particular for human relighting and for editing material appearance.
The first part of the thesis explores methods for measuring appearance similarity. Being able to measure how similar two materials, or images, are is a classic problem in visual computing fields such as computer vision and computer graphics. We first address the problem of similarity in material appearance. We propose a deep-learning-based method that combines images with subjective judgments of material similarity, collected through user studies. We also explore the problem of similarity between icons. In this second case, we use Siamese neural networks, and the style and identity imparted by artists play a key role in the similarity measure.
The second part advances the understanding of how confounding factors affect our perception of material appearance. Two key confounding factors are the geometry of objects and the illumination of the scene. We begin by investigating the effect of these factors on material recognition through a range of experiments and statistical studies. We also investigate the effect of object motion on the perception of material appearance.
In the third part we explore intuitive applications for manipulating visual appearance. First, we address the problem of human relighting. We propose a new formulation of the problem and, building on it, design and train a model based on deep neural networks to relight a scene. Finally, we address the problem of intuitive material editing. To this end, we collect human judgments on the perception of different attributes and present a model, based on deep neural networks, capable of editing materials realistically simply by varying the value of the collected attributes.
FaceLit: Neural 3D Relightable Faces
We propose a generative framework, FaceLit, capable of generating a 3D face
that can be rendered at various user-defined lighting conditions and views,
learned purely from 2D images in-the-wild without any manual annotation. Unlike
existing works that require careful capture setup or human labor, we rely on
off-the-shelf pose and illumination estimators. With these estimates, we
incorporate the Phong reflectance model in the neural volume rendering
framework. Our model learns to generate shape and material properties of a face
that, when rendered according to the natural statistics of pose and
illumination, produce photorealistic face images with multiview 3D and
illumination consistency. Our method enables photorealistic generation of faces
with explicit illumination and view controls on multiple datasets: FFHQ,
MetFaces and CelebA-HQ. We show state-of-the-art photorealism among 3D-aware
GANs on the FFHQ dataset, achieving an FID score of 3.5.
Comment: CVPR 202
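For readers unfamiliar with the Phong reflectance model mentioned in this abstract, the following is a minimal sketch of the classic formulation: an ambient term plus a diffuse term proportional to the cosine between normal and light direction, plus a specular lobe around the mirror direction. The scalar (single-channel) form, the parameter names and the inclusion of an ambient term are assumptions for illustration; the abstract does not state how FaceLit parameterizes the model inside its volume renderer.

```python
import math

def normalize(v):
    """Return v scaled to unit length."""
    length = math.sqrt(sum(c * c for c in v))
    return tuple(c / length for c in v)

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def phong(normal, light_dir, view_dir, ambient, kd, ks, shininess):
    """Classic Phong shading for one light and one colour channel.
    light_dir and view_dir point away from the surface."""
    n, l, v = normalize(normal), normalize(light_dir), normalize(view_dir)
    n_dot_l = dot(n, l)
    if n_dot_l <= 0.0:
        return ambient  # light is behind the surface: no diffuse or specular term
    # Mirror reflection of the light direction about the normal: r = 2 (n.l) n - l
    r = tuple(2.0 * n_dot_l * nc - lc for nc, lc in zip(n, l))
    specular = max(dot(r, v), 0.0) ** shininess
    return ambient + kd * n_dot_l + ks * specular

# Head-on light and view give the full diffuse and specular contribution.
print(phong((0, 0, 1), (0, 0, 1), (0, 0, 1), ambient=0.1, kd=0.6, ks=0.3, shininess=16))
# Light from behind the surface leaves only the ambient term.
print(phong((0, 0, 1), (0, 0, -1), (0, 0, 1), ambient=0.1, kd=0.6, ks=0.3, shininess=16))
```

Because the light direction is an explicit input, changing it relights the same surface without touching its material parameters, which is the kind of explicit illumination control the abstract refers to.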