SfSNet: Learning Shape, Reflectance and Illuminance of Faces in the Wild
We present SfSNet, an end-to-end learning framework for producing an accurate
decomposition of an unconstrained human face image into shape, reflectance and
illuminance. SfSNet is designed to reflect a physical Lambertian rendering
model. SfSNet learns from a mixture of labeled synthetic and unlabeled real
world images. This allows the network to capture low frequency variations from
synthetic and high frequency details from real images through the photometric
reconstruction loss. SfSNet consists of a new decomposition architecture with
residual blocks that learns a complete separation of albedo and normal. This is
used along with the original image to predict lighting. SfSNet produces
significantly better quantitative and qualitative results than state-of-the-art
methods for inverse rendering and independent normal and illumination
estimation.
Comment: Accepted to CVPR 2018 (Spotlight).
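Below is a minimal, illustrative sketch (not the authors' released code) of the physical Lambertian image formation and photometric reconstruction loss that a SfSNet-style decomposition relies on; the tensor layouts and the unnormalized second-order spherical-harmonic basis are simplifying assumptions.

# Minimal sketch of Lambertian rendering with spherical harmonic (SH) lighting
# and the photometric reconstruction loss used for self-supervision on real photos.
import torch

def sh_basis(normals):
    """Second-order SH basis (9 terms, constant factors omitted) at unit normals: (..., 3) -> (..., 9)."""
    x, y, z = normals[..., 0], normals[..., 1], normals[..., 2]
    ones = torch.ones_like(x)
    return torch.stack([
        ones, x, y, z,
        x * y, x * z, y * z,
        x * x - y * y, 3.0 * z * z - 1.0,
    ], dim=-1)

def lambertian_render(albedo, normals, sh_light):
    """albedo: (H, W, 3), normals: (H, W, 3), sh_light: (9, 3) -> rendered image (H, W, 3)."""
    shading = sh_basis(normals) @ sh_light      # per-channel diffuse shading
    return albedo * shading

def photometric_loss(albedo, normals, sh_light, image):
    """L1 loss between the re-rendered image and the observed image."""
    return (lambertian_render(albedo, normals, sh_light) - image).abs().mean()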
Measured Albedo in the Wild: Filling the Gap in Intrinsics Evaluation
Intrinsic image decomposition and inverse rendering are long-standing
problems in computer vision. To evaluate albedo recovery, most algorithms
report their quantitative performance with a mean Weighted Human Disagreement
Rate (WHDR) metric on the IIW dataset. However, WHDR focuses only on relative
albedo values and often fails to capture overall quality of the albedo. In
order to comprehensively evaluate albedo, we collect a new dataset, Measured
Albedo in the Wild (MAW), and propose three new metrics that complement WHDR:
intensity, chromaticity and texture metrics. We show that existing algorithms
often improve the WHDR metric but perform poorly on other metrics. We then
finetune different algorithms on our MAW dataset to significantly improve the
quality of the reconstructed albedo both quantitatively and qualitatively.
Since the proposed intensity, chromaticity, and texture metrics and the WHDR
are all complementary, we further introduce a relative performance measure
that captures average performance. By analysing existing algorithms, we show
that there is
significant room for improvement. Our dataset and evaluation metrics will
enable researchers to develop algorithms that improve albedo reconstruction.
Code and data available at: https://measuredalbedo.github.io/
Comment: Accepted into ICCP202
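For reference, a hedged sketch of how the WHDR metric mentioned above is typically computed from IIW-style pairwise reflectance judgements; the field names and the 10% ratio threshold are illustrative assumptions, not the exact dataset schema.

# Each judgement compares reflectance at two points: label "1" (point 1 darker),
# "2" (point 2 darker) or "E" (about equal), with a human confidence weight.
def whdr(albedo_intensity, judgements, delta=0.10):
    """albedo_intensity: dict point_id -> scalar reflectance predicted by the algorithm."""
    error, total = 0.0, 0.0
    for j in judgements:                      # j: {"p1", "p2", "label", "weight"}
        r1, r2 = albedo_intensity[j["p1"]], albedo_intensity[j["p2"]]
        # Turn the predicted reflectance ratio into the same 3-way label.
        if r1 / max(r2, 1e-10) > 1.0 + delta:
            pred = "2"                        # point 2 is darker
        elif r2 / max(r1, 1e-10) > 1.0 + delta:
            pred = "1"                        # point 1 is darker
        else:
            pred = "E"
        if pred != j["label"]:
            error += j["weight"]
        total += j["weight"]
    return error / max(total, 1e-10)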
Neural Deformation Fields for Approximately Diffeomorphic Medical Image Registration
This work proposes a neural deformation model which results in approximately
diffeomorphic transformations. In contrast to the predominant voxel-based
approaches, the proposed model represents deformations functionally, which
allows for memory-efficient training and inference. This is of particular
importance for large volumetric registrations. Further, while medical image
registration approaches representing transformation maps via multi-layer
perceptrons have been proposed, the proposed model facilitates both pairwise
optimization-based registration and learning-based registration via predicted
or optimized global and local latent codes. Lastly, as deformation regularity
is a highly desirable property for most medical image registration tasks, the
model makes use of gradient inverse consistency regularization, which
empirically results in approximately diffeomorphic transformations. We show
the performance of the model on two 2D synthetic datasets as well as on real
3D lung registration. Our results show that the model can achieve similar
accuracy to voxel-based representations in a single-resolution registration
setting while using less memory and allowing for faster instance optimization.
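A minimal sketch of the two ingredients described above, a coordinate MLP as the functional deformation representation and a gradient-inverse-consistency penalty; the network sizes, latent-code conditioning, and exact loss form are assumptions for illustration, not the paper's architecture.

import torch
import torch.nn as nn

class DeformationField(nn.Module):
    """Maps a 3D coordinate (plus a latent code) to a deformed coordinate."""
    def __init__(self, latent_dim=64, hidden=128):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(3 + latent_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 3),
        )

    def forward(self, x, z):
        # phi(x) = x + displacement(x, z); memory cost is independent of the voxel grid size
        return x + self.mlp(torch.cat([x, z.expand(x.shape[0], -1)], dim=-1))

def gradient_inverse_consistency(phi_ab, phi_ba, x, z_ab, z_ba):
    """Penalize deviation of the Jacobian of phi_AB(phi_BA(x)) from the identity."""
    x = x.requires_grad_(True)                               # x: sampled coordinates, (N, 3)
    comp = phi_ab(phi_ba(x, z_ba), z_ab)                     # composition, should stay close to x
    jac_rows = [torch.autograd.grad(comp[:, i].sum(), x, create_graph=True)[0]
                for i in range(3)]
    jac = torch.stack(jac_rows, dim=1)                       # per-point Jacobian, (N, 3, 3)
    eye = torch.eye(3, device=x.device).expand_as(jac)
    return ((jac - eye) ** 2).mean()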
Bringing Telepresence to Every Desk
In this paper, we work to bring telepresence to every desktop. Unlike
commercial systems, personal 3D video conferencing systems must render
high-quality videos while remaining financially and computationally viable for
the average consumer. To this end, we introduce a capturing and rendering
system that only requires 4 consumer-grade RGBD cameras and synthesizes
high-quality free-viewpoint videos of users as well as their environments.
Experimental results show that our system renders high-quality free-viewpoint
videos without using object templates or heavy pre-processing. While not
real-time, our system is fast and does not require per-video optimizations.
Moreover, our system is robust to complex hand gestures and clothing, and it
can generalize to new users. This work provides a strong basis for further
optimization, and it will help bring telepresence to every desk in the near
future. The code and dataset will be made available on our website
https://mcmvmc.github.io/PersonalTelepresence/
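As a rough illustration of the capture side (not the paper's rendering pipeline), the sketch below fuses frames from several calibrated RGBD cameras into a single colored point cloud; the intrinsics K and camera-to-world extrinsics T are assumed given.

import numpy as np

def unproject(depth, color, K, T):
    """depth: (H, W) in meters, color: (H, W, 3), K: (3, 3) intrinsics, T: (4, 4) cam-to-world."""
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    valid = depth > 0
    z = depth[valid]
    x = (u[valid] - K[0, 2]) * z / K[0, 0]
    y = (v[valid] - K[1, 2]) * z / K[1, 1]
    pts_cam = np.stack([x, y, z, np.ones_like(z)], axis=1)   # homogeneous camera-space points
    pts_world = (T @ pts_cam.T).T[:, :3]
    return pts_world, color[valid]

def fuse(views):
    """views: list of (depth, color, K, T) tuples, one per RGBD camera."""
    points, colors = zip(*[unproject(*v) for v in views])
    return np.concatenate(points), np.concatenate(colors)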
Universal Guidance for Diffusion Models
Typical diffusion models are trained to accept a particular form of
conditioning, most commonly text, and cannot be conditioned on other modalities
without retraining. In this work, we propose a universal guidance algorithm
that enables diffusion models to be controlled by arbitrary guidance modalities
without the need to retrain any use-specific components. We show that our
algorithm successfully generates quality images with guidance functions
including segmentation, face recognition, object detection, and classifier
signals. Code is available at
https://github.com/arpitbansal297/Universal-Guided-Diffusion
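The sketch below illustrates the general idea of guiding a pretrained diffusion model with an off-the-shelf loss evaluated on the predicted clean image, so no use-specific retraining is needed; the exact update rule, the function names, and the guidance scale are simplifying assumptions rather than the released implementation.

import torch

@torch.enable_grad()
def guided_step(x_t, t, denoiser, guidance_loss, alphas_cumprod, scale=1.0):
    """One guided denoising step: nudge the noise estimate with the guidance gradient."""
    x_t = x_t.detach().requires_grad_(True)
    eps = denoiser(x_t, t)                                   # pretrained model's noise prediction
    a_t = alphas_cumprod[t]
    x0_hat = (x_t - torch.sqrt(1 - a_t) * eps) / torch.sqrt(a_t)   # predicted clean image
    loss = guidance_loss(x0_hat)                             # e.g. segmentation / detection / face loss
    grad = torch.autograd.grad(loss, x_t)[0]
    eps_guided = eps + scale * torch.sqrt(1 - a_t) * grad    # guidance applied to the noise estimate
    return eps_guided.detach()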
rPPG-Toolbox: Deep Remote PPG Toolbox
Camera-based physiological measurement is a fast-growing field of computer
vision. Remote photoplethysmography (rPPG) utilizes imaging devices (e.g.,
cameras) to measure the peripheral blood volume pulse (BVP) via
photoplethysmography, and enables cardiac measurement via webcams and
smartphones. However, the task is non-trivial with important pre-processing,
modeling, and post-processing steps required to obtain state-of-the-art
results. Replication of results and benchmarking of new models is critical for
scientific progress; however, as with many other applications of deep learning,
reliable codebases are not easy to find or use. We present a comprehensive
toolbox, rPPG-Toolbox, that contains unsupervised and supervised rPPG models
with support for public benchmark datasets, data augmentation, and systematic
evaluation: https://github.com/ubicomplab/rPPG-Toolbox
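As an example of the unsupervised end of such a pipeline (illustrative only, not rPPG-Toolbox's API), the classic green-channel baseline averages a facial region per frame, band-pass filters the trace to the plausible heart-rate range, and reads the heart rate from the spectral peak.

import numpy as np
from scipy.signal import butter, filtfilt

def green_rppg(face_frames, fps):
    """face_frames: (T, H, W, 3) cropped face video in RGB; returns the BVP signal and HR in bpm."""
    signal = face_frames[..., 1].mean(axis=(1, 2))            # mean green value per frame
    signal = signal - signal.mean()
    b, a = butter(2, [0.7 / (fps / 2), 3.0 / (fps / 2)], btype="band")
    bvp = filtfilt(b, a, signal)                              # keep 0.7-3 Hz (42-180 bpm)
    freqs = np.fft.rfftfreq(len(bvp), d=1.0 / fps)
    spectrum = np.abs(np.fft.rfft(bvp))
    return bvp, 60.0 * freqs[spectrum.argmax()]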
Constraints and Priors for Inverse Rendering from Limited Observations
Inverse Rendering deals with recovering the underlying intrinsic components of an image, i.e., geometry, reflectance, illumination and the camera with which the image was captured. Inferring these intrinsic components of an image is a fundamental problem in Computer Vision. Solving Inverse Rendering unlocks a host of real-world applications in Augmented and Virtual Reality, Robotics, Computational Photography, and gaming. Researchers have made significant progress in solving Inverse Rendering from a large number of images of an object or a scene under relatively constrained settings. However, most real-life applications rely on a single or a small number of images captured in an unconstrained environment. Thus, in this thesis, we explore Inverse Rendering under limited observations from unconstrained images.
We consider two different approaches for solving Inverse Rendering under limited observations. First, we consider learning data-driven priors that can be used for Inverse Rendering from a single image. Our goal is to jointly learn all intrinsic components of an image, such that we can recombine them and train on unlabeled real data using a self-supervised reconstruction loss. A key component that enables self-supervision is a differentiable rendering module that can combine the intrinsic components to accurately regenerate the image. We show how such a self-supervised reconstruction loss can be used for Inverse Rendering of faces. While this is relatively straightforward for faces, complex appearance effects (e.g., inter-reflections, cast shadows, and near-field lighting) present in a scene cannot be captured with a differentiable rendering module. Thus, we also propose a deep CNN-based differentiable rendering module (Residual Appearance Renderer) that can capture these complex appearance effects and enable self-supervised learning. Another contribution is a novel Inverse Rendering architecture, SfSNet, that performs Inverse Rendering for faces and scenes.
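A minimal sketch of the idea behind the Residual Appearance Renderer described above: a learned residual term is added to a closed-form differentiable (e.g., Lambertian) render before applying the self-supervised photometric loss; the small CNN and the tensor layout here are placeholders, not the thesis architecture.

import torch
import torch.nn as nn

class ResidualAppearanceRenderer(nn.Module):
    """Predicts the non-Lambertian residual (inter-reflections, cast shadows, near-field lighting)."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(6, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 3, 3, padding=1),
        )

    def forward(self, albedo, normals):
        # albedo, normals: (B, 3, H, W); output: additive residual image (B, 3, H, W)
        return self.net(torch.cat([albedo, normals], dim=1))

def self_supervised_loss(lambertian_image, albedo, normals, image, rar):
    """lambertian_image: output of any differentiable closed-form renderer, same shape as image."""
    reconstruction = lambertian_image + rar(albedo, normals)  # base render + learned residual
    return (reconstruction - image).abs().mean()              # photometric loss on unlabeled real data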
Second, we consider enforcing low-rank multi-view constraints in an optimization framework to enable Inverse Rendering from a few images. To this end, we propose a novel multi-view rank constraint that connects all cameras capturing all the images in a scene and is enforced to ensure accurate camera recovery. We also jointly enforce a low-rank constraint and remove ambiguity to perform accurate Uncalibrated Photometric Stereo from a few images. In these problems, we formulate a constrained low-rank optimization problem in the presence of noisy estimates and missing data. Our proposed optimization framework can handle this non-convex optimization using the Alternating Direction Method of Multipliers (ADMM). Given a few images, enforcing low-rank constraints significantly improves Inverse Rendering.
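To make the ADMM machinery concrete, here is a hedged sketch of nuclear-norm-regularized low-rank recovery with missing data via singular value thresholding; the specific multi-view camera constraint of the thesis is not reproduced here.

import numpy as np

def svt(X, tau):
    """Singular value thresholding: proximal operator of the nuclear norm."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return U @ np.diag(np.maximum(s - tau, 0.0)) @ Vt

def low_rank_complete(M, mask, rho=1.0, iters=200):
    """Recover a low-rank matrix from the observed entries M[mask] using ADMM."""
    X = np.zeros_like(M)        # low-rank variable
    Z = np.zeros_like(M)        # copy constrained to match the observations
    U = np.zeros_like(M)        # scaled dual variable
    for _ in range(iters):
        X = svt(Z - U, 1.0 / rho)        # prox of the nuclear norm
        Z = X + U
        Z[mask] = M[mask]                # project onto the (possibly noisy) observations
        U = U + X - Z                    # dual update
    return X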
MVPSNet: Fast Generalizable Multi-view Photometric Stereo
We propose a fast and generalizable solution to Multi-view Photometric Stereo
(MVPS), called MVPSNet. The key to our approach is a feature extraction network
that effectively combines images from the same view captured under multiple
lighting conditions to extract geometric features from shading cues for stereo
matching. We demonstrate that these features, termed 'Light Aggregated Feature
Maps' (LAFM), are effective for feature matching even in textureless regions,
where
traditional multi-view stereo methods fail. Our method produces similar
reconstruction results to PS-NeRF, a state-of-the-art MVPS method that
optimizes a neural network per scene, while being 411x faster (105
seconds vs. 12 hours) in inference. Additionally, we introduce a new synthetic
dataset for MVPS, sMVPS, which is shown to be effective to train a
generalizable MVPS method.
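An illustrative sketch (not the MVPSNet architecture) of the aggregation idea behind Light Aggregated Feature Maps: per-light features extracted from the same viewpoint are pooled across lighting conditions, so shading variation becomes a geometric cue for later stereo matching even where texture is absent.

import torch
import torch.nn as nn

class LightAggregatedFeatures(nn.Module):
    def __init__(self, feat_dim=32):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(3, feat_dim, 3, padding=1), nn.ReLU(),
            nn.Conv2d(feat_dim, feat_dim, 3, padding=1),
        )

    def forward(self, images):
        """images: (L, 3, H, W), the same view captured under L lighting conditions."""
        feats = self.encoder(images)                 # (L, C, H, W) per-light features
        return feats.max(dim=0).values               # aggregate over lights -> (C, H, W)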