72 research outputs found
Ear-to-ear Capture of Facial Intrinsics
We present a practical approach to capturing ear-to-ear face models
comprising both 3D meshes and intrinsic textures (i.e. diffuse and specular
albedo). Our approach is a hybrid of geometric and photometric methods and
requires no geometric calibration. Photometric measurements made in a
lightstage are used to estimate view dependent high resolution normal maps. We
overcome the problem of having a single photometric viewpoint by capturing in
multiple poses. We use uncalibrated multiview stereo to estimate a coarse base
mesh to which the photometric views are registered. We propose a novel approach
for robustly stitching surface normal and intrinsic texture data into a
seamless, complete and highly detailed face model. The resulting relightable
models provide photorealistic renderings in any view.
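The lightstage measurements described above lend themselves to classical Lambertian photometric stereo; the following is a minimal sketch of per-pixel normal and albedo recovery under that assumption (the setup and names are illustrative, not the authors' actual pipeline):

```python
import numpy as np

def photometric_stereo(light_dirs, intensities):
    """Estimate per-pixel surface normals and diffuse albedo.

    light_dirs:  (k, 3) unit light directions, one per photometric capture.
    intensities: (k, n) observed intensities of n pixels under each light.
    Assumes a Lambertian surface: I = albedo * (N . L).
    """
    # Least-squares solve for G = albedo * normal at every pixel.
    G, *_ = np.linalg.lstsq(light_dirs, intensities, rcond=None)  # (3, n)
    albedo = np.linalg.norm(G, axis=0)                            # (n,)
    normals = G / np.maximum(albedo, 1e-8)                        # unit normals
    return normals, albedo
```

With at least three non-coplanar light directions, the per-pixel system is well determined; the abstract's multi-pose capture then registers several such view-dependent normal maps to the coarse base mesh.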
SfSNet: Learning Shape, Reflectance and Illuminance of Faces in the Wild
We present SfSNet, an end-to-end learning framework for producing an accurate
decomposition of an unconstrained human face image into shape, reflectance and
illuminance. SfSNet is designed to reflect a physical Lambertian rendering
model. SfSNet learns from a mixture of labeled synthetic and unlabeled real
world images. This allows the network to capture low-frequency variations from
synthetic images and high-frequency details from real ones through the photometric
reconstruction loss. SfSNet consists of a new decomposition architecture with
residual blocks that learns a complete separation of albedo and normal. This is
used along with the original image to predict lighting. SfSNet produces
significantly better quantitative and qualitative results than state-of-the-art
methods for inverse rendering and independent normal and illumination
estimation. Comment: Accepted to CVPR 2018 (Spotlight).
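The decomposition SfSNet targets can be checked against a simple physical renderer: under the Lambertian assumption, a gray-scale image is the product of albedo and a shading term computed from normals and low-order spherical-harmonic lighting. A hedged sketch (the SH convention and function names are illustrative, not SfSNet's actual code):

```python
import numpy as np

def sh_basis(normals):
    """First 9 real spherical-harmonic basis functions evaluated at unit normals.

    normals: (n, 3). Constants follow the standard real SH convention.
    """
    x, y, z = normals[:, 0], normals[:, 1], normals[:, 2]
    return np.stack([
        np.full_like(x, 0.2821),                # l = 0
        0.4886 * y, 0.4886 * z, 0.4886 * x,     # l = 1
        1.0925 * x * y, 1.0925 * y * z,         # l = 2
        0.3154 * (3 * z * z - 1),
        1.0925 * x * z, 0.5463 * (x * x - y * y),
    ], axis=1)                                  # (n, 9)

def render_lambertian(albedo, normals, sh_coeffs):
    """Reconstruct a gray image as albedo * shading; a photometric
    reconstruction loss compares a rendering like this to the input."""
    shading = sh_basis(normals) @ sh_coeffs     # (n,)
    return albedo * shading
```

Because this forward model is differentiable, the reconstruction loss can supervise the network on unlabeled real images, which is what lets SfSNet mix synthetic and real training data.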
A Morphable Face Albedo Model
In this paper, we bring together two divergent strands of research:
photometric face capture and statistical 3D face appearance modelling. We
propose a novel lightstage capture and processing pipeline for acquiring
ear-to-ear, truly intrinsic diffuse and specular albedo maps that fully factor
out the effects of illumination, camera and geometry. Using this pipeline, we
capture a dataset of 50 scans and combine them with the only existing publicly
available albedo dataset (3DRFE) of 23 scans. This allows us to build the first
morphable face albedo model. We believe this is the first statistical analysis
of the variability of facial specular albedo maps. This model can be used as a
plug-in replacement for the texture model of the Basel Face Model (BFM) or
FLAME and we make the model publicly available. We ensure careful spectral
calibration such that our model is built in a linear sRGB space, suitable for
inverse rendering of images taken by typical cameras. We demonstrate our model
in a state-of-the-art analysis-by-synthesis 3DMM fitting pipeline, where we are
the first to integrate specular map estimation, and we outperform the BFM in
albedo reconstruction. Comment: CVPR 202
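A morphable albedo model of this kind is, at its core, a linear statistical model built from registered scans; a minimal PCA-style sketch follows (illustrative only, not the released model's actual format or API):

```python
import numpy as np

class LinearAlbedoModel:
    """Minimal linear statistical model in the spirit of a morphable
    albedo model: albedo = mean + coefficients @ components."""

    def __init__(self, scans, n_components=5):
        # scans: (m, p) flattened, registered albedo maps, one row per subject.
        self.mean = scans.mean(axis=0)
        centered = scans - self.mean
        _, s, Vt = np.linalg.svd(centered, full_matrices=False)
        self.components = Vt[:n_components]                     # (k, p)
        # Per-component standard deviations of the training data.
        self.stds = s[:n_components] / np.sqrt(max(len(scans) - 1, 1))

    def synthesize(self, coeffs):
        # coeffs: (k,) expressed in units of standard deviations.
        return self.mean + (coeffs * self.stds) @ self.components
```

In an analysis-by-synthesis fitting pipeline, the low-dimensional coefficients become the unknowns optimized so that the synthesized albedo, rendered with estimated geometry and lighting, matches the target photograph.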
Coping with Data Scarcity in Deep Learning and Applications for Social Good
Recent years have seen an extremely fast evolution of the Computer Vision and
Machine Learning fields: several application domains benefit from the newly developed
technologies, and industries are investing growing amounts of money in Artificial Intelligence.
Convolutional Neural Networks and Deep Learning have substantially contributed to the rise
and diffusion of AI-based solutions, creating the potential for many disruptive new businesses.
The effectiveness of Deep Learning models is grounded in the availability of huge
amounts of training data. Unfortunately, data collection and labeling are extremely
expensive in terms of both time and cost; moreover, they frequently require the
collaboration of domain experts.
In the first part of the thesis, I will investigate some methods for reducing the cost
of data acquisition for Deep Learning applications in the relatively constrained industrial
scenarios related to visual inspection. I will primarily assess the effectiveness of Deep
Neural Networks in comparison with several classical Machine Learning algorithms that
require less training data. Next, I will introduce a hardware-based data augmentation
approach that yields a considerable performance boost by taking advantage of a novel
illumination setup designed for this purpose. Finally, I will investigate the situation in
which acquiring a sufficient number of training samples is not possible, in particular the most
extreme situation: zero-shot learning (ZSL), which is the problem of multi-class classification
when no training data is available for some of the classes. Visual features designed for image
classification and trained offline have been shown to be useful for ZSL to generalize towards
classes not seen during training. Nevertheless, I will show that recognition performance
on unseen classes can be sharply improved by learning ad hoc semantic embeddings (the
pre-defined lists of present and absent attributes that represent a class) together with
visual features, so as to increase the correlation between the two geometric spaces and
ease the metric-learning process for ZSL.
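The idea of aligning visual features with attribute-based semantic embeddings can be sketched with a simple linear compatibility model: ridge regression from features to the attribute space, then nearest-attribute classification of unseen classes. All names below are illustrative stand-ins, not the thesis's actual method:

```python
import numpy as np

def fit_compatibility(X, Y, S, lam=1e-3):
    """Fit a linear map W from visual features to the attribute space.

    X: (n, d) visual features; Y: (n,) class indices of the seen classes;
    S: (c, a) per-class attribute vectors (the semantic embeddings).
    Ridge regression so that X @ W approximates each sample's class attributes.
    """
    T = S[Y]                                                   # (n, a) targets
    d = X.shape[1]
    W = np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ T)
    return W                                                   # (d, a)

def predict_zsl(X, W, S_unseen):
    """Assign each sample to the unseen class whose attribute vector is
    most similar (cosine) to the sample's projected features."""
    P = X @ W
    P = P / (np.linalg.norm(P, axis=1, keepdims=True) + 1e-8)
    Sn = S_unseen / (np.linalg.norm(S_unseen, axis=1, keepdims=True) + 1e-8)
    return np.argmax(P @ Sn.T, axis=1)
```

The key property is that `predict_zsl` never needs training images of the unseen classes, only their attribute vectors; increasing the correlation between the visual and semantic spaces, as the thesis proposes, makes this nearest-attribute search more reliable.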
In the second part of the thesis, I will present some successful applications of state-of-the-
art Computer Vision, Data Analysis and Artificial Intelligence methods. I will illustrate
some solutions developed during the 2020 Coronavirus Pandemic for controlling the disease
evolution and for reducing virus spreading. I will describe the first publicly available
dataset for the analysis of face-touching behavior that we annotated and distributed, and
I will illustrate an extensive evaluation of several computer vision methods applied to the
produced dataset. Moreover, I will describe the privacy-preserving solution we developed
for estimating the “Social Distance” and its violations, given a single uncalibrated image
in unconstrained scenarios. I will conclude the thesis with a Computer Vision solution
developed in collaboration with the Egyptian Museum of Turin for digitally unwrapping
mummies by analyzing their CT scans, to support archaeologists during mummy analysis
and to avoid the devastating and irreversible process of physically unwrapping the
bandages to remove amulets and jewels from the body.