2,008 research outputs found
Joint Material and Illumination Estimation from Photo Sets in the Wild
Faithful manipulation of shape, material, and illumination in 2D Internet
images would greatly benefit from a reliable factorization of appearance into
material (i.e., diffuse and specular) and illumination (i.e., environment
maps). On the one hand, current methods that produce very high-fidelity
results typically require controlled settings, expensive devices, or
significant manual effort. On the other hand, methods that are automatic and
work on 'in the wild' Internet images often extract only low-frequency
lighting or diffuse materials. In this work, we propose to make use of a set of
photographs in order to jointly estimate the non-diffuse materials and sharp
lighting in an uncontrolled setting. Our key observation is that seeing
multiple instances of the same material under different illumination (i.e.,
environment), and different materials under the same illumination, provides
valuable constraints that can be exploited to yield a high-quality solution
(i.e., specular materials and environment illumination) for all the observed
materials and environments. Similar constraints also arise when observing
multiple materials in a single environment, or a single material across
multiple environments. The core of this approach is an optimization procedure
that uses two neural networks that are trained on synthetic images to predict
good gradients in parametric space given observation of reflected light. We
evaluate our method on a range of synthetic and real examples to generate
high-quality estimates, qualitatively compare our results against
state-of-the-art alternatives via a user study, and demonstrate
photo-consistent image manipulation that is otherwise very challenging to
achieve.
Master Texture Space: An Efficient Encoding for Projectively Mapped Objects
Projectively textured models are used in an increasingly large number of applications that dynamically combine images with a simple geometric surface in a viewpoint dependent way. These models can provide visual fidelity while retaining the effects afforded by geometric approximation such as shadow casting and accurate perspective distortion. However, the number of stored views can be quite large and novel views must be synthesized during the rendering process because no single view may correctly texture the entire object surface. This work introduces the Master Texture encoding and demonstrates that the encoding increases the utility of projectively textured objects by reducing render-time operations. Encoding involves three steps: 1) all image regions that correspond to the same geometric mesh element are extracted and warped to a facet of uniform size and shape, 2) an efficient packing of these facets into a new Master Texture image is computed, and 3) the visibility of each pixel in the new Master Texture data is guaranteed using a simple algorithm to discard occluded pixels in each view. Because the encoding implicitly represents the multi-view geometry of the multiple images, a single texture mesh is sufficient to render the view-dependent model. More importantly, every Master Texture image can correctly texture the entire surface of the object, removing expensive computations such as visibility analysis from the rendering algorithm. A benefit of this encoding is the support for pixel-wise view synthesis. The utility of pixel-wise view synthesis is demonstrated with a real-time Master Texture encoded VDTM application. Pixel-wise synthesis is also demonstrated with an algorithm that distills a set of Master Texture images to a single view-independent Master Texture image.
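The packing step in the abstract above can be illustrated with a minimal sketch. It assumes that every facet occupies a square cell of the same pixel size and that a plain row-major grid is an acceptable packing; the abstract does not specify the actual packing scheme, and the function names `grid_shape`, `atlas_size`, and `facet_origin` are hypothetical.

```python
import math


def grid_shape(num_facets):
    """Return (cols, rows) of a near-square grid with room for num_facets cells."""
    if num_facets < 1:
        raise ValueError("num_facets must be at least 1")
    cols = math.ceil(math.sqrt(num_facets))
    rows = math.ceil(num_facets / cols)
    return cols, rows


def atlas_size(num_facets, facet_size):
    """Return (width, height) in pixels of an atlas holding all facets.

    Assumption: each facet is stored in a square cell of facet_size pixels.
    """
    cols, rows = grid_shape(num_facets)
    return cols * facet_size, rows * facet_size


def facet_origin(index, num_facets, facet_size):
    """Return the top-left pixel (x, y) of facet `index`, in row-major order."""
    if not 0 <= index < num_facets:
        raise IndexError("facet index out of range")
    cols, _ = grid_shape(num_facets)
    row, col = divmod(index, cols)
    return col * facet_size, row * facet_size
```

Because every view would use the same layout in this sketch, a facet index maps to the same atlas location in each image, which is one way a single texture mesh could serve all views.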
Implicit 3D Orientation Learning for 6D Object Detection from RGB Images
We propose a real-time RGB-based pipeline for object detection and 6D pose
estimation. Our novel 3D orientation estimation is based on a variant of the
Denoising Autoencoder that is trained on simulated views of a 3D model using
Domain Randomization. This so-called Augmented Autoencoder has several
advantages over existing methods: It does not require real, pose-annotated
training data, generalizes to various test sensors and inherently handles
object and view symmetries. Instead of learning an explicit mapping from input
images to object poses, it provides an implicit representation of object
orientations defined by samples in a latent space. Our pipeline achieves
state-of-the-art performance on the T-LESS dataset both in the RGB and RGB-D
domain. We also evaluate on the LineMOD dataset where we can compete with other
synthetically trained approaches. We further increase performance by correcting
3D orientation estimates to account for perspective errors when the object
deviates from the image center and show extended results.
Comment: Code available at: https://github.com/DLR-RM/AugmentedAutoencode
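One way to read "an implicit representation of object orientations defined by samples in a latent space" is as a lookup against a table of latent codes. The sketch below assumes that such a table pairs each sampled orientation with its latent code and that cosine similarity is the matching criterion; neither detail is stated in the abstract, and `cosine_similarity` and `nearest_orientation` are hypothetical names, not functions from the linked repository.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def nearest_orientation(query_code, codebook):
    """Return the orientation whose stored latent code best matches query_code.

    codebook: non-empty list of (orientation, latent_code) pairs, assumed to be
    built offline by encoding rendered views at sampled orientations.
    """
    best_orientation, _ = max(
        codebook, key=lambda entry: cosine_similarity(query_code, entry[1])
    )
    return best_orientation
```

In this sketch the encoder never outputs a pose directly; the orientation comes from whichever stored sample is closest in latent space, so views that look identical under a symmetry simply map to nearby codes.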