SfSNet: Learning Shape, Reflectance and Illuminance of Faces in the Wild
We present SfSNet, an end-to-end learning framework for producing an accurate
decomposition of an unconstrained human face image into shape, reflectance and
illuminance. SfSNet is designed to reflect a physical Lambertian rendering
model. SfSNet learns from a mixture of labeled synthetic and unlabeled real
world images. This allows the network to capture low frequency variations from
synthetic and high frequency details from real images through the photometric
reconstruction loss. SfSNet consists of a new decomposition architecture with
residual blocks that learns a complete separation of albedo and normal. This is
used along with the original image to predict lighting. SfSNet produces
significantly better quantitative and qualitative results than state-of-the-art
methods for inverse rendering and independent normal and illumination
estimation.
Comment: Accepted to CVPR 2018 (Spotlight).
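Below is a minimal, illustrative sketch (not the authors' released code) of the physical Lambertian image formation and photometric reconstruction loss that a SfSNet-style decomposition relies on; the tensor layouts and the unnormalized second-order spherical-harmonic basis are simplifying assumptions.

# Minimal sketch of Lambertian rendering with spherical harmonic (SH) lighting
# and the photometric reconstruction loss used for self-supervision on real photos.
import torch

def sh_basis(normals):
    """Second-order SH basis (9 terms, constant factors omitted) at unit normals: (..., 3) -> (..., 9)."""
    x, y, z = normals[..., 0], normals[..., 1], normals[..., 2]
    ones = torch.ones_like(x)
    return torch.stack([
        ones, x, y, z,
        x * y, x * z, y * z,
        x * x - y * y, 3.0 * z * z - 1.0,
    ], dim=-1)

def lambertian_render(albedo, normals, sh_light):
    """albedo: (H, W, 3), normals: (H, W, 3), sh_light: (9, 3) -> rendered image (H, W, 3)."""
    shading = sh_basis(normals) @ sh_light      # per-channel diffuse shading
    return albedo * shading

def photometric_loss(albedo, normals, sh_light, image):
    """L1 loss between the re-rendered image and the observed image."""
    return (lambertian_render(albedo, normals, sh_light) - image).abs().mean()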
Measured Albedo in the Wild: Filling the Gap in Intrinsics Evaluation
Intrinsic image decomposition and inverse rendering are long-standing
problems in computer vision. To evaluate albedo recovery, most algorithms
report their quantitative performance with a mean Weighted Human Disagreement
Rate (WHDR) metric on the IIW dataset. However, WHDR focuses only on relative
albedo values and often fails to capture overall quality of the albedo. In
order to comprehensively evaluate albedo, we collect a new dataset, Measured
Albedo in the Wild (MAW), and propose three new metrics that complement WHDR:
intensity, chromaticity and texture metrics. We show that existing algorithms
often improve the WHDR metric but perform poorly on other metrics. We then
finetune different algorithms on our MAW dataset to significantly improve the
quality of the reconstructed albedo both quantitatively and qualitatively.
Since the proposed intensity, chromaticity, and texture metrics and the WHDR
are all complementary, we further introduce a relative performance measure
that captures average performance. By analysing existing algorithms, we show
that there is
significant room for improvement. Our dataset and evaluation metrics will
enable researchers to develop algorithms that improve albedo reconstruction.
Code and data available at: https://measuredalbedo.github.io/
Comment: Accepted into ICCP202
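For reference, a hedged sketch of how the WHDR metric mentioned above is typically computed from IIW-style pairwise reflectance judgements; the field names and the 10% ratio threshold are illustrative assumptions, not the exact dataset schema.

# Each judgement compares reflectance at two points: label "1" (point 1 darker),
# "2" (point 2 darker) or "E" (about equal), with a human confidence weight.
def whdr(albedo_intensity, judgements, delta=0.10):
    """albedo_intensity: dict point_id -> scalar reflectance predicted by the algorithm."""
    error, total = 0.0, 0.0
    for j in judgements:                      # j: {"p1", "p2", "label", "weight"}
        r1, r2 = albedo_intensity[j["p1"]], albedo_intensity[j["p2"]]
        # Turn the predicted reflectance ratio into the same 3-way label.
        if r1 / max(r2, 1e-10) > 1.0 + delta:
            pred = "2"                        # point 2 is darker
        elif r2 / max(r1, 1e-10) > 1.0 + delta:
            pred = "1"                        # point 1 is darker
        else:
            pred = "E"
        if pred != j["label"]:
            error += j["weight"]
        total += j["weight"]
    return error / max(total, 1e-10)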
Neural Deformation Fields for Approximately Diffeomorphic Medical Image Registration
This work proposes a neural deformation model which results in approximately
diffeomorphic transformations. In contrast to the predominant voxel-based
approaches, the proposed model represents deformations functionally, which
allows for memory-efficient training and inference. This is of particular
importance for large volumetric registrations. Further, while medical image
registration approaches representing transformation maps via multi-layer
perceptrons have been proposed, the proposed model facilitates both pairwise
optimization-based registration and learning-based registration via predicted
or optimized global and local latent codes. Lastly, as deformation regularity
is a highly desirable property for most medical image registration tasks, the
model makes use of gradient inverse consistency regularization, which
empirically results in approximately diffeomorphic transformations. We show
the performance of the model on two 2D synthetic datasets as well as on real
3D lung registration. Our results show that the model can achieve similar
accuracy to voxel-based representations in a single-resolution registration
setting while using less memory and allowing for faster instance optimization.
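A minimal sketch of the two ingredients described above, a coordinate MLP as the functional deformation representation and a gradient-inverse-consistency penalty; the network sizes, latent-code conditioning, and exact loss form are assumptions for illustration, not the paper's architecture.

import torch
import torch.nn as nn

class DeformationField(nn.Module):
    """Maps a 3D coordinate (plus a latent code) to a deformed coordinate."""
    def __init__(self, latent_dim=64, hidden=128):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(3 + latent_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 3),
        )

    def forward(self, x, z):
        # phi(x) = x + displacement(x, z); memory cost is independent of the voxel grid size
        return x + self.mlp(torch.cat([x, z.expand(x.shape[0], -1)], dim=-1))

def gradient_inverse_consistency(phi_ab, phi_ba, x, z_ab, z_ba):
    """Penalize deviation of the Jacobian of phi_AB(phi_BA(x)) from the identity."""
    x = x.requires_grad_(True)                               # x: sampled coordinates, (N, 3)
    comp = phi_ab(phi_ba(x, z_ba), z_ab)                     # composition, should stay close to x
    jac_rows = [torch.autograd.grad(comp[:, i].sum(), x, create_graph=True)[0]
                for i in range(3)]
    jac = torch.stack(jac_rows, dim=1)                       # per-point Jacobian, (N, 3, 3)
    eye = torch.eye(3, device=x.device).expand_as(jac)
    return ((jac - eye) ** 2).mean()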
Bringing Telepresence to Every Desk
In this paper, we work to bring telepresence to every desktop. Unlike
commercial systems, personal 3D video conferencing systems must render
high-quality videos while remaining financially and computationally viable for
the average consumer. To this end, we introduce a capturing and rendering
system that only requires 4 consumer-grade RGBD cameras and synthesizes
high-quality free-viewpoint videos of users as well as their environments.
Experimental results show that our system renders high-quality free-viewpoint
videos without using object templates or heavy pre-processing. While not
real-time, our system is fast and does not require per-video optimizations.
Moreover, our system is robust to complex hand gestures and clothing, and it
can generalize to new users. This work provides a strong basis for further
optimization, and it will help bring telepresence to every desk in the near
future. The code and dataset will be made available on our website
https://mcmvmc.github.io/PersonalTelepresence/
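As a rough illustration of the capture side (not the paper's rendering pipeline), the sketch below fuses frames from several calibrated RGBD cameras into a single colored point cloud; the intrinsics K and camera-to-world extrinsics T are assumed given.

import numpy as np

def unproject(depth, color, K, T):
    """depth: (H, W) in meters, color: (H, W, 3), K: (3, 3) intrinsics, T: (4, 4) cam-to-world."""
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    valid = depth > 0
    z = depth[valid]
    x = (u[valid] - K[0, 2]) * z / K[0, 0]
    y = (v[valid] - K[1, 2]) * z / K[1, 1]
    pts_cam = np.stack([x, y, z, np.ones_like(z)], axis=1)   # homogeneous camera-space points
    pts_world = (T @ pts_cam.T).T[:, :3]
    return pts_world, color[valid]

def fuse(views):
    """views: list of (depth, color, K, T) tuples, one per RGBD camera."""
    points, colors = zip(*[unproject(*v) for v in views])
    return np.concatenate(points), np.concatenate(colors)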
Universal Guidance for Diffusion Models
Typical diffusion models are trained to accept a particular form of
conditioning, most commonly text, and cannot be conditioned on other modalities
without retraining. In this work, we propose a universal guidance algorithm
that enables diffusion models to be controlled by arbitrary guidance modalities
without the need to retrain any use-specific components. We show that our
algorithm successfully generates quality images with guidance functions
including segmentation, face recognition, object detection, and classifier
signals. Code is available at
https://github.com/arpitbansal297/Universal-Guided-Diffusion
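The sketch below illustrates the general idea of guiding a pretrained diffusion model with an off-the-shelf loss evaluated on the predicted clean image, so no use-specific retraining is needed; the exact update rule, the function names, and the guidance scale are simplifying assumptions rather than the released implementation.

import torch

@torch.enable_grad()
def guided_step(x_t, t, denoiser, guidance_loss, alphas_cumprod, scale=1.0):
    """One guided denoising step: nudge the noise estimate with the guidance gradient."""
    x_t = x_t.detach().requires_grad_(True)
    eps = denoiser(x_t, t)                                   # pretrained model's noise prediction
    a_t = alphas_cumprod[t]
    x0_hat = (x_t - torch.sqrt(1 - a_t) * eps) / torch.sqrt(a_t)   # predicted clean image
    loss = guidance_loss(x0_hat)                             # e.g. segmentation / detection / face loss
    grad = torch.autograd.grad(loss, x_t)[0]
    eps_guided = eps + scale * torch.sqrt(1 - a_t) * grad    # guidance applied to the noise estimate
    return eps_guided.detach()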
rPPG-Toolbox: Deep Remote PPG Toolbox
Camera-based physiological measurement is a fast-growing field of computer
vision. Remote photoplethysmography (rPPG) utilizes imaging devices (e.g.,
cameras) to measure the peripheral blood volume pulse (BVP) via
photoplethysmography, and enables cardiac measurement via webcams and
smartphones. However, the task is non-trivial with important pre-processing,
modeling, and post-processing steps required to obtain state-of-the-art
results. Replication of results and benchmarking of new models is critical for
scientific progress; however, as with many other applications of deep learning,
reliable codebases are not easy to find or use. We present a comprehensive
toolbox, rPPG-Toolbox, that contains unsupervised and supervised rPPG models
with support for public benchmark datasets, data augmentation, and systematic
evaluation: https://github.com/ubicomplab/rPPG-Toolbox
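As an example of the unsupervised end of such a pipeline (illustrative only, not rPPG-Toolbox's API), the classic green-channel baseline averages a facial region per frame, band-pass filters the trace to the plausible heart-rate range, and reads the heart rate from the spectral peak.

import numpy as np
from scipy.signal import butter, filtfilt

def green_rppg(face_frames, fps):
    """face_frames: (T, H, W, 3) cropped face video in RGB; returns the BVP signal and HR in bpm."""
    signal = face_frames[..., 1].mean(axis=(1, 2))            # mean green value per frame
    signal = signal - signal.mean()
    b, a = butter(2, [0.7 / (fps / 2), 3.0 / (fps / 2)], btype="band")
    bvp = filtfilt(b, a, signal)                              # keep 0.7-3 Hz (42-180 bpm)
    freqs = np.fft.rfftfreq(len(bvp), d=1.0 / fps)
    spectrum = np.abs(np.fft.rfft(bvp))
    return bvp, 60.0 * freqs[spectrum.argmax()]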
Constraints and Priors for Inverse Rendering from Limited Observations
Inverse Rendering deals with recovering the underlying intrinsic components of an image, i.e., geometry, reflectance, illumination and the camera with which the image was captured. Inferring these intrinsic components of an image is a fundamental problem in Computer Vision. Solving Inverse Rendering unlocks a host of real-world applications in Augmented and Virtual Reality, Robotics, Computational Photography, and gaming. Researchers have made significant progress in solving Inverse Rendering from a large number of images of an object or a scene under relatively constrained settings. However, most real-life applications rely on a single or a small number of images captured in an unconstrained environment. Thus, in this thesis, we explore Inverse Rendering under limited observations from unconstrained images.
We consider two different approaches for solving Inverse Rendering under limited observations. First, we consider learning data-driven priors that can be used for Inverse Rendering from a single image. Our goal is to jointly learn all intrinsic components of an image, such that we can recombine them and train on unlabeled real data using a self-supervised reconstruction loss. A key component that enables self-supervision is a differentiable rendering module that can combine the intrinsic components to accurately regenerate the image. We show how such a self-supervised reconstruction loss can be used for Inverse Rendering of faces. While this is relatively straightforward for faces, complex appearance effects (e.g., inter-reflections, cast shadows, and near-field lighting) present in a scene cannot be captured with a differentiable rendering module. Thus, we also propose a deep CNN-based differentiable rendering module (Residual Appearance Renderer) that can capture these complex appearance effects and enable self-supervised learning. Another contribution is a novel Inverse Rendering architecture, SfSNet, that performs Inverse Rendering for faces and scenes.
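A minimal sketch of the idea behind the Residual Appearance Renderer described above: a learned residual term is added to a closed-form differentiable (e.g., Lambertian) render before applying the self-supervised photometric loss; the small CNN and the tensor layout here are placeholders, not the thesis architecture.

import torch
import torch.nn as nn

class ResidualAppearanceRenderer(nn.Module):
    """Predicts the non-Lambertian residual (inter-reflections, cast shadows, near-field lighting)."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(6, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 3, 3, padding=1),
        )

    def forward(self, albedo, normals):
        # albedo, normals: (B, 3, H, W); output: additive residual image (B, 3, H, W)
        return self.net(torch.cat([albedo, normals], dim=1))

def self_supervised_loss(lambertian_image, albedo, normals, image, rar):
    """lambertian_image: output of any differentiable closed-form renderer, same shape as image."""
    reconstruction = lambertian_image + rar(albedo, normals)  # base render + learned residual
    return (reconstruction - image).abs().mean()              # photometric loss on unlabeled real data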
Second, we consider enforcing low-rank multi-view constraints in an optimization framework to enable Inverse Rendering from a few images. To this end, we propose a novel multi-view rank constraint that connects all cameras capturing all the images in a scene and is enforced to ensure accurate camera recovery. We also jointly enforce a low-rank constraint and remove ambiguity to perform accurate Uncalibrated Photometric Stereo from a few images. In these problems, we formulate a constrained low-rank optimization problem in the presence of noisy estimates and missing data. Our proposed optimization framework can handle this non-convex optimization using the Alternating Direction Method of Multipliers (ADMM). Given a few images, enforcing low-rank constraints significantly improves Inverse Rendering.
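To make the ADMM machinery concrete, here is a hedged sketch of nuclear-norm-regularized low-rank recovery with missing data via singular value thresholding; the specific multi-view camera constraint of the thesis is not reproduced here.

import numpy as np

def svt(X, tau):
    """Singular value thresholding: proximal operator of the nuclear norm."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return U @ np.diag(np.maximum(s - tau, 0.0)) @ Vt

def low_rank_complete(M, mask, rho=1.0, iters=200):
    """Recover a low-rank matrix from the observed entries M[mask] using ADMM."""
    X = np.zeros_like(M)        # low-rank variable
    Z = np.zeros_like(M)        # copy constrained to match the observations
    U = np.zeros_like(M)        # scaled dual variable
    for _ in range(iters):
        X = svt(Z - U, 1.0 / rho)        # prox of the nuclear norm
        Z = X + U
        Z[mask] = M[mask]                # project onto the (possibly noisy) observations
        U = U + X - Z                    # dual update
    return X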
MVPSNet: Fast Generalizable Multi-view Photometric Stereo
We propose a fast and generalizable solution to Multi-view Photometric Stereo
(MVPS), called MVPSNet. The key to our approach is a feature extraction network
that effectively combines images from the same view captured under multiple
lighting conditions to extract geometric features from shading cues for stereo
matching. We demonstrate that these features, termed 'Light Aggregated Feature
Maps' (LAFM), are effective for feature matching even in textureless regions,
where
traditional multi-view stereo methods fail. Our method produces similar
reconstruction results to PS-NeRF, a state-of-the-art MVPS method that
optimizes a neural network per scene, while being 411x faster (105
seconds vs. 12 hours) in inference. Additionally, we introduce a new synthetic
dataset for MVPS, sMVPS, which is shown to be effective to train a
generalizable MVPS method.
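An illustrative sketch (not the MVPSNet architecture) of the aggregation idea behind Light Aggregated Feature Maps: per-light features extracted from the same viewpoint are pooled across lighting conditions, so shading variation becomes a geometric cue for later stereo matching even where texture is absent.

import torch
import torch.nn as nn

class LightAggregatedFeatures(nn.Module):
    def __init__(self, feat_dim=32):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(3, feat_dim, 3, padding=1), nn.ReLU(),
            nn.Conv2d(feat_dim, feat_dim, 3, padding=1),
        )

    def forward(self, images):
        """images: (L, 3, H, W), the same view captured under L lighting conditions."""
        feats = self.encoder(images)                 # (L, C, H, W) per-light features
        return feats.max(dim=0).values               # aggregate over lights -> (C, H, W)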