    Self-supervised Outdoor Scene Relighting

    Outdoor scene relighting is a challenging problem that requires a good understanding of the scene geometry, illumination and albedo. Current techniques are fully supervised, requiring high-quality synthetic renderings to train a solution. Such renderings are synthesized using priors learned from limited data. In contrast, we propose a self-supervised approach for relighting. Our approach is trained only on corpora of images collected from the internet, without any user supervision. This virtually endless source of training data allows training a general relighting solution. Our approach first decomposes an image into its albedo, geometry and illumination. A novel relighting is then produced by modifying the illumination parameters. Our solution captures shadows using a dedicated shadow prediction map, and does not rely on accurate geometry estimation. We evaluate our technique subjectively and objectively using a new dataset with ground-truth relighting. Results show the ability of our technique to produce photo-realistic and physically plausible results that generalize to unseen scenes.
    Comment: Published in ECCV '20, http://gvv.mpi-inf.mpg.de/projects/SelfRelight
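    The decompose-then-recompose idea can be summarised with a simple multiplicative image-formation model. The sketch below is illustrative only: the function names, and the assumption that shading and shadow maps come from a network's predictions, are ours rather than the paper's actual pipeline.

        import numpy as np

        # Assumed image formation: I = albedo * shading * shadow.
        def decompose(image, shading, shadow, eps=1e-6):
            """Invert the image model to recover albedo from predicted
            shading and shadow maps (all arrays of shape (H, W, 3))."""
            return image / np.maximum(shading * shadow, eps)

        def relight(albedo, new_shading, new_shadow):
            """Recompose the scene under novel illumination parameters."""
            return albedo * new_shading * new_shadow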

    Physics-based vision meets deep learning

    Physics-based vision explores computer vision and graphics problems by applying methods based upon physical models. Deep learning, by contrast, is a learning-based technique in which a substantial number of observations are used to train an expressive yet hard-to-interpret neural network model. In this thesis, we propose the concept of a model-based decoder: a non-learnable, differentiable neural layer designed according to a physics-based model. Constructing neural networks with such model-based decoders affords the model strong learning capability as well as the potential to respect the underlying physics. We start the study by developing a toolbox of differentiable photometric layers ported from classical photometric techniques. This enables us to model the image formation process given geometry, illumination and a reflectance function. Applying these differentiable photometric layers to the training of a bidirectional reflectance distribution function (BRDF) estimation network, we show the network can be trained in a self-supervised manner without knowledge of ground-truth BRDFs. Next, in a more general setting, we solve inverse rendering problems in a self-supervised fashion by making use of model-based decoders. Here, an inverse rendering network decomposes a single image into normal and diffuse albedo maps and an illumination estimate. To achieve self-supervised training, we draw inspiration from multi-view stereo (MVS) and employ a Lambertian model and a cross-projection MVS model to generate model-based supervisory signals. Finally, we seek hybrids of a neural decoder and a model-based decoder on a pair of practical problems: image relighting, and fine-scale depth prediction with novel view synthesis. In contrast to using model-based decoders only to supervise training, the model-based decoder in our hybrid model serves to disentangle an intricate problem into a set of physically connected, solvable ones. In practice, we develop a hybrid model that can estimate a fine-scale depth map and synthesize novel views from a single image, using a physical subnet to combine results from an inverse rendering network with a monocular depth prediction network. For neural image relighting, we propose another hybrid model that uses a Lambertian renderer to generate initial relighting estimates, followed by a neural renderer that corrects deficits in the initial renderings. We demonstrate that the model-based decoder can significantly improve the quality of results and relax the demands for labelled data.
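    The core of the model-based decoder idea is a fixed, differentiable rendering layer whose reconstruction error against the input photograph supervises training, with no labelled reflectance data. Below is a minimal sketch of such a Lambertian decoder under a single directional light; the function names and the NumPy formulation are ours, for illustration only.

        import numpy as np

        def lambertian_decoder(normals, albedo, light_dir):
            """Fixed physics layer: normals (H, W, 3) unit vectors,
            albedo (H, W, 3), light_dir (3,). Returns a rendered image."""
            l = light_dir / np.linalg.norm(light_dir)
            shading = np.clip(normals @ l, 0.0, None)   # max(n . l, 0)
            return albedo * shading[..., None]

        def self_supervised_loss(image, normals, albedo, light_dir):
            """Photometric reconstruction loss: no ground-truth BRDFs needed."""
            rendered = lambertian_decoder(normals, albedo, light_dir)
            return np.mean((rendered - image) ** 2)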

    FastHuman: Reconstructing High-Quality Clothed Human in Minutes

    We propose an approach for optimizing high-quality clothed human body shapes in minutes, using multi-view posed images. While traditional neural rendering methods struggle to disentangle geometry and appearance using only a rendering loss, and are computationally intensive, our method uses a mesh-based patch warping technique to ensure multi-view photometric consistency, and spherical harmonics (SH) illumination to refine geometric details efficiently. We employ an oriented point cloud shape representation with SH shading, which significantly reduces optimization and rendering times compared to implicit methods. Our approach has demonstrated promising results on both synthetic and real-world datasets, making it an effective solution for rapidly generating high-quality human body shapes. Project page: https://l1346792580123.github.io/nccsfs/
    Comment: International Conference on 3D Vision, 3DV 202
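    For a diffuse surface, second-order SH illumination makes shading a cheap, closed-form function of each point's normal: a 9-term dot product between the lighting coefficients and fixed polynomial basis functions of the normal. A sketch of that shading step, using the standard real SH basis constants (our illustration, not the authors' code):

        import numpy as np

        def sh_basis(normals):
            """9 real second-order SH basis values per unit normal: (N, 3) -> (N, 9)."""
            x, y, z = normals[:, 0], normals[:, 1], normals[:, 2]
            return np.stack([
                0.282095 * np.ones_like(x),
                0.488603 * y, 0.488603 * z, 0.488603 * x,
                1.092548 * x * y, 1.092548 * y * z,
                0.315392 * (3.0 * z ** 2 - 1.0),
                1.092548 * x * z, 0.546274 * (x ** 2 - y ** 2),
            ], axis=1)

        def shade(albedo, normals, sh_coeffs):
            """albedo (N, 3), normals (N, 3), sh_coeffs (9,) -> colours (N, 3)."""
            return albedo * (sh_basis(normals) @ sh_coeffs)[:, None]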

    Automatic construction of robust spherical harmonic subspaces

    In this paper we propose a method to automatically recover a class-specific low-dimensional spherical harmonic basis from a set of in-the-wild facial images. We combine existing techniques for uncalibrated photometric stereo and low-rank matrix decompositions in order to robustly recover a combined model of shape and identity. We build this basis without aid from a 3D model and show how it can be combined with recent efficient sparse facial feature localisation techniques to recover dense 3D facial shape. Unlike previous works in the area, our method is very efficient and is an order of magnitude faster to train, taking only a few minutes to build a model with over 2000 images. Furthermore, it can be used for real-time recovery of facial shape.
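    The low-rank structure being exploited here is the classical result that images of a Lambertian object under arbitrary distant illumination lie close to a 9-dimensional spherical harmonic subspace. A plain truncated SVD captures the unrobust core of that idea (the paper adds robust low-rank decomposition on top); this sketch and its names are illustrative:

        import numpy as np

        def sh_subspace(images, rank=9):
            """images: (num_pixels, num_images), one vectorised image per
            column. Returns a (num_pixels, rank) illumination basis."""
            U, s, _ = np.linalg.svd(images, full_matrices=False)
            return U[:, :rank] * s[:rank]   # scale basis by singular values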

    3D Face Recognition
