CGIntrinsics: Better Intrinsic Image Decomposition through Physically-Based Rendering
Intrinsic image decomposition is a challenging, long-standing computer vision problem for which ground truth data is very difficult to acquire. We explore the use of synthetic data for training CNN-based intrinsic image decomposition models, then apply these learned models to real-world images. To that end, we present CGIntrinsics, a new, large-scale dataset of physically-based rendered images of scenes with full ground truth decompositions. The rendering process we use is carefully designed to yield high-quality, realistic images, which we find to be crucial for this problem domain. We also propose a new end-to-end training method that learns better decompositions by leveraging CGIntrinsics, and optionally IIW and SAW, two recent datasets of sparse annotations on real-world images. Surprisingly, we find that a decomposition network trained solely on our synthetic data outperforms the state of the art on both IIW and SAW, and performance improves even further when IIW and SAW data are added during training. Our work demonstrates the surprising effectiveness of carefully rendered synthetic data for the intrinsic images task.
Comment: Paper for 'CGIntrinsics: Better Intrinsic Image Decomposition through Physically-Based Rendering' published in ECCV, 2018
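For readers new to the task: the standard intrinsic image model underlying this and the next abstract (the usual formulation in this line of work, not spelled out in either abstract) factors an observed image into an illumination-invariant albedo layer and a shading layer,

\[ I(x) = A(x) \cdot S(x), \]

where I(x) is the observed intensity at pixel x, A(x) the albedo (reflectance), and S(x) the shading produced by geometry and lighting. Recovering A and S from I alone is ill-posed, which is why dense ground truth such as the CGIntrinsics renderings is so valuable.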
Joint Learning of Intrinsic Images and Semantic Segmentation
Semantic segmentation of outdoor scenes is problematic when there are variations in imaging conditions. It is known that albedo (reflectance) is invariant to all kinds of illumination effects. Thus, using reflectance images for the semantic segmentation task can be favorable. Additionally, not only may segmentation benefit from reflectance, but segmentation may in turn be useful for reflectance computation. Therefore, in this paper, the tasks of semantic segmentation and intrinsic image decomposition are considered as a combined process by exploring their mutual relationship in a joint fashion. To that end, we propose a supervised end-to-end CNN architecture to jointly learn intrinsic image decomposition and semantic segmentation. We analyze the gains of addressing those two problems jointly. Moreover, new cascade CNN architectures for intrinsic-for-segmentation and segmentation-for-intrinsic are proposed as single tasks. Furthermore, a dataset of 35K synthetic images of natural environments is created with corresponding albedo and shading (intrinsics), as well as semantic labels (segmentation) assigned to each object/scene. The experiments show that joint learning of intrinsic image decomposition and semantic segmentation is beneficial for both tasks for natural scenes. Dataset and models are available at: https://ivi.fnwi.uva.nl/cv/intrinseg
Comment: ECCV 2018
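A minimal sketch of the intrinsic-for-segmentation cascade idea described above. The module layout, layer sizes, and names are my illustrative assumptions, not the paper's architecture; the point is only the wiring, i.e. segmenting from the predicted illumination-invariant albedo:

import torch
import torch.nn as nn

class IntrinsicForSegmentation(nn.Module):
    """Toy cascade: first decompose, then segment from the albedo."""

    def __init__(self, num_classes: int):
        super().__init__()
        # Stage 1: intrinsic decomposition head (3 albedo + 3 shading channels).
        self.decompose = nn.Sequential(
            nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 6, 3, padding=1),
        )
        # Stage 2: segmentation head fed with the illumination-invariant albedo.
        self.segment = nn.Sequential(
            nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, num_classes, 3, padding=1),
        )

    def forward(self, image: torch.Tensor):
        intrinsics = self.decompose(image)
        albedo, shading = intrinsics[:, :3], intrinsics[:, 3:]
        logits = self.segment(albedo)  # segmentation sees reflectance, not illumination
        return albedo, shading, logits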
Learning Shape Priors for Single-View 3D Completion and Reconstruction
The problem of single-view 3D shape completion or reconstruction is challenging, because among the many possible shapes that explain an observation, most are implausible and do not correspond to natural objects. Recent research in the field has tackled this problem by exploiting the expressiveness of deep convolutional networks. However, there is another level of ambiguity that is often overlooked: among plausible shapes, there are still multiple shapes that fit the 2D image equally well; i.e., the ground truth shape is non-deterministic given a single-view input. Existing fully supervised approaches fail to address this issue, and often produce blurry mean shapes with smooth surfaces but no fine details.
In this paper, we propose ShapeHD, pushing the limit of single-view shape completion and reconstruction by integrating deep generative models with adversarially learned shape priors. The learned priors serve as a regularizer, penalizing the model only if its output is unrealistic, not if it deviates from the ground truth. Our design thus overcomes both of the aforementioned levels of ambiguity. Experiments demonstrate that ShapeHD outperforms the state of the art by a large margin in both shape completion and shape reconstruction on multiple real datasets.
Comment: ECCV 2018. The first two authors contributed equally to this work. Project page: http://shapehd.csail.mit.edu
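A hedged sketch of how an adversarially learned shape prior can act as a regularizer rather than a ground-truth match. The loss names, the discriminator interface, and the lam weight are my illustration; ShapeHD's exact naturalness loss may differ:

import torch
import torch.nn.functional as F

def shape_loss(pred_voxels, gt_voxels, discriminator, lam=0.1):
    """Combine a supervised reconstruction term with a realism term.

    pred_voxels holds occupancy probabilities in [0, 1]. The
    reconstruction term pulls the prediction toward the ground truth;
    the adversarial term penalizes the model only when the (hypothetical)
    discriminator judges the predicted shape unrealistic.
    """
    recon = F.binary_cross_entropy(pred_voxels, gt_voxels)
    realism = discriminator(pred_voxels)           # in (0, 1), higher = more real
    naturalness = -torch.log(realism + 1e-8).mean()
    return recon + lam * naturalness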
Learning Task-Specific Generalized Convolutions in the Permutohedral Lattice
Dense prediction tasks typically employ encoder-decoder architectures, but
the prevalent convolutions in the decoder are not image-adaptive and can lead
to boundary artifacts. Different generalized convolution operations have been
introduced to counteract this. We go beyond these by leveraging guidance data
to redefine their inherent notion of proximity. Our proposed network layer
builds on the permutohedral lattice, which performs sparse convolutions in a
high-dimensional space allowing for powerful non-local operations despite small
filters. Multiple features with different characteristics span this
permutohedral space. In contrast to prior work, we learn these features in a
task-specific manner by generalizing the basic permutohedral operations to
learnt feature representations. As the resulting objective is complex, a
carefully designed framework and learning procedure are introduced, yielding
rich feature embeddings in practice. We demonstrate the general applicability
of our approach in different joint upsampling tasks. When adding our network
layer to state-of-the-art networks for optical flow and semantic segmentation,
boundary artifacts are removed and the accuracy is improved.
Comment: To appear at GCPR 2019
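To make "guidance data redefines proximity" concrete, here is a minimal, brute-force sketch of guidance-driven filtering in the bilateral spirit (my own illustration; the permutohedral lattice makes this sparse and fast in high dimensions, and the guidance features are learned rather than fixed as here):

import numpy as np

def guided_filter_1d(signal, guidance, sigma=0.5):
    """Average each sample with neighbors that are close in guidance
    space rather than in pixel space: the features define proximity."""
    out = np.zeros_like(signal)
    for i in range(len(signal)):
        # Weights fall off with guidance-feature distance (e.g. color, depth).
        w = np.exp(-((guidance - guidance[i]) ** 2) / (2 * sigma**2))
        out[i] = np.sum(w * signal) / np.sum(w)
    return out

# A boundary in the guidance keeps the two regions from bleeding together:
signal = np.array([1.0, 1.1, 0.9, 5.0, 5.2, 4.9])
guidance = np.array([0.0, 0.0, 0.0, 1.0, 1.0, 1.0])
print(guided_filter_1d(signal, guidance))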
Decision Models and Technology Can Help Psychiatry Develop Biomarkers
Why is psychiatry unable to define clinically useful biomarkers? We explore this question from the vantage of data and decision science and consider biomarkers as a form of phenotypic data that resolves a well-defined clinical decision. We introduce a framework that systematizes different forms of phenotypic data and further introduce the concept of a decision model to describe the strategies a clinician uses to seek out, combine, and act on clinical data. Though many medical specialties rely on quantitative clinical data and operationalized decision models, we observe that, in psychiatry, clinical data are gathered and used in idiosyncratic decision models that exist solely in the clinician's mind and therefore are outside empirical evaluation. This, we argue, is a fundamental reason why psychiatry is unable to define clinically useful biomarkers: because psychiatry does not currently quantify clinical data, decision models cannot be operationalized and, in the absence of an operationalized decision model, it is impossible to define how a biomarker might be of use. Here, psychiatry might benefit from digital technologies that have recently emerged specifically to quantify clinically relevant facets of human behavior. We propose that digital tools might help psychiatry in two ways: first, by quantifying data already present in the standard clinical interaction and by allowing decision models to be operationalized and evaluated; second, by testing whether new forms of data might have value within an operationalized decision model. We reference successes from other medical specialties to illustrate how quantitative data and operationalized decision models improve patient care.
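Purely to illustrate what an "operationalized decision model" could look like in code (a toy rule with entirely hypothetical thresholds, not from the paper): because every input and cutoff is explicit and quantified, the rule can be evaluated empirically, and a candidate biomarker can be tested by adding it as an input.

def triage_decision(phq9_score: int, avg_sleep_hours: float) -> str:
    """Toy, fully explicit decision rule over quantified clinical data.

    PHQ-9 is a standard depression questionnaire (scored 0-27); the
    sleep feature stands in for passively sensed behavioral data.
    All thresholds here are hypothetical.
    """
    if phq9_score >= 20:
        return "refer for urgent evaluation"
    if phq9_score >= 10 and avg_sleep_hours < 5.0:
        return "schedule follow-up within one week"
    return "routine monitoring"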
AUTO3D: Novel view synthesis through unsupervisely learned variational viewpoint and global 3D representation
This paper targets learning-based novel view synthesis from a single or a limited number of 2D images, without pose supervision. In viewer-centered coordinates, we construct an end-to-end trainable conditional variational framework to disentangle the relative pose/rotation, learned without supervision, from an implicit global 3D representation (shape, texture, the origin of the viewer-centered coordinates, etc.). The global appearance of the 3D object is given by several appearance-describing images taken from any number of viewpoints. Our spatial correlation module extracts a global 3D representation from the appearance-describing images in a permutation-invariant manner. Our system achieves implicit 3D understanding without explicit 3D reconstruction. With the viewer-centered relative-pose/rotation code learned without supervision, the decoder can hallucinate novel views continuously by sampling the relative pose from a prior distribution. In various applications, we demonstrate that our model can achieve comparable or even better results than pose/3D-model-supervised learning-based novel view synthesis (NVS) methods with any number of input views.
Comment: ECCV 2020
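A minimal sketch of the sampling step the abstract describes: encode appearance once, then decode novel views from pose codes drawn from a prior. The encoder/decoder interfaces, the 6-dimensional pose code, and the Gaussian prior are my assumptions for illustration:

import torch

def synthesize_novel_views(encoder, decoder, images, n_views=8):
    """Sweep the relative-pose code while holding appearance fixed.

    encoder: maps appearance-describing images to a global 3D
             representation (permutation invariant over the inputs)
    decoder: renders one view from that representation plus a pose code
    Both are hypothetical stand-ins for the paper's modules.
    """
    global_repr = encoder(images)
    views = []
    for _ in range(n_views):
        pose_code = torch.randn(1, 6)   # sample a relative pose from the prior
        views.append(decoder(global_repr, pose_code))
    return views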
Self-supervised Outdoor Scene Relighting
Outdoor scene relighting is a challenging problem that requires a good understanding of the scene geometry, illumination, and albedo. Current techniques are completely supervised, requiring high-quality synthetic renderings to train a solution. Such renderings are synthesized using priors learned from limited data. In contrast, we propose a self-supervised approach for relighting. Our approach is trained only on corpora of images collected from the internet without any user supervision. This virtually endless source of training data allows training a general relighting solution. Our approach first decomposes an image into its albedo, geometry and illumination. A novel relighting is then produced by modifying the illumination parameters. Our solution captures shadows using a dedicated shadow prediction map, and does not rely on accurate geometry estimation. We evaluate our technique subjectively and objectively using a new dataset with ground-truth relighting. Results show the ability of our technique to produce photo-realistic and physically plausible results that generalize to unseen scenes.
Comment: Published in ECCV '20, http://gvv.mpi-inf.mpg.de/projects/SelfRelight
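A hedged sketch of the relight-by-editing-illumination step: re-render the decomposition under a new light. This uses a plain Lambertian model with a single directional light, which is my simplification; the paper's illumination model and learned shadow prediction are more sophisticated:

import numpy as np

def relight(albedo, normals, light_dir, shadow_map=None):
    """Re-render a decomposed scene under a new directional light.

    albedo:     (H, W, 3) reflectance from the decomposition step
    normals:    (H, W, 3) unit surface normals
    light_dir:  (3,) direction toward the new light source
    shadow_map: optional (H, W) attenuation predicted by a shadow network
    """
    l = light_dir / np.linalg.norm(light_dir)
    shading = np.clip(normals @ l, 0.0, None)    # Lambertian n.l term
    if shadow_map is not None:
        shading = shading * shadow_map           # darken occluded pixels
    return albedo * shading[..., None]           # the relit image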
Occlusion-aware 3D Morphable Models and an Illumination Prior for Face Image Analysis
Faces in natural images are often occluded by a variety of objects. We propose a fully automated, probabilistic and occlusion-aware 3D morphable face model adaptation framework following an analysis-by-synthesis setup. The key idea is to segment the image into regions explained by separate models. Our framework includes a 3D morphable face model, a prototype-based beard model and a simple model for occlusions and background regions. The segmentation and all the model parameters have to be inferred from the single target image. Face model adaptation and segmentation are solved jointly using an expectation-maximization-like procedure. During the E-step, we update the segmentation; during the M-step, we update the face model parameters. For face model adaptation we apply a stochastic sampling strategy based on the Metropolis-Hastings algorithm. For segmentation, we apply loopy belief propagation for inference in a Markov random field. Illumination estimation is critical for occlusion handling. Our combined segmentation and model adaptation needs a proper initialization of the illumination parameters. We propose a RANSAC-based robust illumination estimation technique. By applying this method to a large face image database we obtain a first empirical distribution of real-world illumination conditions. The obtained empirical distribution is made publicly available and can be used as a prior in probabilistic frameworks, for regularization, or to synthesize data for deep learning methods.
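A compact sketch of the EM-like alternation described above, written as a higher-order loop so it stays self-contained. The e_step and m_step callables are placeholders for the paper's loopy belief propagation and Metropolis-Hastings updates, respectively:

def fit_face_model(image, params, segmentation, e_step, m_step, n_iters=20):
    """Alternate pixel assignment (E) and model fitting (M).

    e_step: reassigns each pixel to face / beard / occlusion / background
    m_step: updates model parameters on the pixels they currently explain
    """
    for _ in range(n_iters):
        segmentation = e_step(image, params, segmentation)   # E-step
        params = m_step(image, params, segmentation)         # M-step
    return params, segmentation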
Polarimetric Multi-View Inverse Rendering
A polarization camera has great potential for 3D reconstruction since the
angle of polarization (AoP) of reflected light is related to an object's
surface normal. In this paper, we propose a novel 3D reconstruction method
called Polarimetric Multi-View Inverse Rendering (Polarimetric MVIR) that
effectively exploits geometric, photometric, and polarimetric cues extracted
from input multi-view color polarization images. We first estimate camera poses
and an initial 3D model by geometric reconstruction with a standard
structure-from-motion and multi-view stereo pipeline. We then refine the
initial model by optimizing photometric and polarimetric rendering errors using
multi-view RGB and AoP images, where we propose a novel polarimetric rendering
cost function that enables us to effectively constrain each estimated surface
vertex's normal while considering four possible ambiguous azimuth angles
derived from the AoP measurement. Experimental results using both synthetic and real data demonstrate that our Polarimetric MVIR can reconstruct a detailed 3D shape without assuming a specific, material-dependent type of polarized reflection.
Comment: Paper accepted in ECCV 2020
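For context on the "four possible ambiguous azimuth angles": the measured AoP phi constrains the azimuth alpha of the surface normal only up to a pi-ambiguity, plus a pi/2 shift between diffuse- and specular-dominant polarized reflection, giving the standard candidate set (the common polarimetric shape relation, not necessarily the paper's exact notation):

\[ \alpha \in \left\{ \phi,\ \phi + \tfrac{\pi}{2},\ \phi + \pi,\ \phi + \tfrac{3\pi}{2} \right\} \pmod{2\pi}. \]

The proposed polarimetric rendering cost therefore has to remain consistent with all four candidates rather than committing to one of them.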