Analysis of Hand Segmentation in the Wild
A large number of works in egocentric vision have concentrated on action and
object recognition. Detection and segmentation of hands in first-person videos,
however, have been less explored. For many applications in this domain, it is
necessary to accurately segment not only the hands of the camera wearer but
also the hands of the people they interact with. Here, we take an in-depth look
at the hand segmentation problem. In the quest for robust hand segmentation
methods, we evaluated the performance of state-of-the-art semantic segmentation
methods, both off-the-shelf and fine-tuned, on existing datasets. We fine-tune
RefineNet, a leading semantic segmentation method, for hand segmentation and
find that it substantially outperforms the best contenders. Existing hand
segmentation datasets were collected in laboratory settings. To overcome this
limitation, we contribute two new datasets: a) EgoYouTubeHands, consisting of
egocentric videos containing hands in the wild, and b) HandOverFace, for
analyzing the performance of our models in the presence of occluders of similar
appearance. We further explore whether conditional random fields can help
refine the generated hand segmentations. To demonstrate the benefit of accurate
hand maps, we train a CNN for hand-based activity recognition and achieve
higher accuracy when the network is trained on hand maps produced by the
fine-tuned RefineNet. Finally, we annotate a subset of the EgoHands dataset for
fine-grained action recognition and show that an accuracy of 58.6% can be
achieved from a single hand pose alone, which is much better than the
chance level (12.5%).
Comment: Accepted at CVPR 2018
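As a rough illustration of the fine-tuning setup described above, the following PyTorch sketch adapts an off-the-shelf semantic segmentation network to binary hand segmentation. RefineNet is not bundled with torchvision, so DeepLabV3 stands in for it here; the two-class head, learning rate, and training step are illustrative assumptions, not the paper's exact configuration.

```python
# Hedged sketch: fine-tune a pretrained segmentation network for
# binary (hand / background) segmentation. DeepLabV3 is a stand-in
# for RefineNet, which torchvision does not provide.
import torch
import torch.nn as nn
from torchvision.models.segmentation import deeplabv3_resnet50

model = deeplabv3_resnet50(weights="DEFAULT")
# Replace the 21-class Pascal VOC head with a 2-class head.
model.classifier[4] = nn.Conv2d(256, 2, kernel_size=1)

optimizer = torch.optim.SGD(model.parameters(), lr=1e-3, momentum=0.9)
criterion = nn.CrossEntropyLoss()

def train_step(images, masks):
    """images: (B, 3, H, W) float tensor; masks: (B, H, W) long tensor in {0, 1}."""
    model.train()
    optimizer.zero_grad()
    logits = model(images)["out"]  # (B, 2, H, W) per-pixel class scores
    loss = criterion(logits, masks)
    loss.backward()
    optimizer.step()
    return loss.item()
```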
Self-supervised Multi-level Face Model Learning for Monocular Reconstruction at over 250 Hz
The reconstruction of dense 3D models of face geometry and appearance from a
single image is highly challenging and ill-posed. To constrain the problem,
many approaches rely on strong priors, such as parametric face models learned
from limited 3D scan data. However, such priors restrict the models' ability to
capture the true diversity of facial geometry, skin reflectance and
illumination. To alleviate this problem, we present the first approach that
jointly learns 1) a regressor for face shape, expression, reflectance and
illumination on the basis of 2) a concurrently learned parametric face model.
Our multi-level face model combines the advantage of 3D Morphable Models for
regularization with the out-of-space generalization of a learned corrective
space. We train end-to-end on in-the-wild images without dense annotations by
fusing a convolutional encoder with a differentiable expert-designed renderer
and a self-supervised training loss, both defined at multiple detail levels.
Our approach compares favorably to the state of the art in terms of
reconstruction quality, generalizes better to real-world faces, and runs at
over 250 Hz.
Comment: CVPR 2018 (Oral). Project webpage:
https://gvv.mpi-inf.mpg.de/projects/FML
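The following sketch illustrates the flavor of the self-supervised, multi-level training loss described above: a masked photometric term evaluated on an image pyramid, plus a simple parameter regularizer. The encoder, the differentiable renderer, the loss weights, and the use of plain L1 are assumptions for illustration, not the authors' implementation.

```python
# Hedged sketch of a multi-level self-supervised loss. `encoder` and
# `renderer` are placeholders for the convolutional encoder and the
# differentiable expert-designed renderer described in the abstract.
import torch
import torch.nn.functional as F

def multilevel_photometric_loss(rendered, image, mask, levels=3):
    """Masked L1 photometric loss summed over an image pyramid."""
    loss = 0.0
    for _ in range(levels):
        diff = (mask * (rendered - image)).abs()
        loss = loss + diff.sum() / mask.sum().clamp(min=1.0)
        # Move to the next (coarser) detail level of the pyramid.
        rendered = F.avg_pool2d(rendered, 2)
        image = F.avg_pool2d(image, 2)
        mask = F.avg_pool2d(mask, 2)
    return loss

def training_loss(encoder, renderer, image, w_reg=1e-4):
    params = encoder(image)            # dict: shape, expression, reflectance, illumination, pose
    rendered, mask = renderer(params)  # differentiable render + face-region mask
    reg = w_reg * sum(p.pow(2).sum() for p in params.values())  # keep codes plausible
    return multilevel_photometric_loss(rendered, image, mask) + reg
```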
MoFA: Model-based Deep Convolutional Face Autoencoder for Unsupervised Monocular Reconstruction
In this work, we propose a novel model-based deep convolutional autoencoder
that addresses the highly challenging problem of reconstructing a 3D human face
from a single in-the-wild color image. To this end, we combine a convolutional
encoder network with an expert-designed generative model that serves as the
decoder. The core innovation is our new differentiable parametric decoder,
which encapsulates image formation analytically on the basis of a generative
model. Our decoder takes as input a code vector with exactly defined semantic
meaning that encodes detailed face pose, shape, expression, skin reflectance
and scene illumination. Thanks to this new way of combining CNN-based and
model-based face reconstruction, the CNN-based encoder learns to extract
semantically meaningful parameters from a single monocular input image. For the
first time, a CNN encoder and an expert-designed generative model can be
trained end-to-end in an unsupervised manner, which makes training on very
large (unlabeled) real-world data feasible. The obtained reconstructions
compare favorably to current state-of-the-art approaches in terms of quality
and richness of representation.
Comment: International Conference on Computer Vision (ICCV) 2017 (Oral), 13 pages
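As a hedged sketch of the autoencoder structure described above, the code below wires a CNN encoder that regresses a semantically meaningful code vector to a model-based decoder. The backbone, the code dimensions, and the `render_fn` placeholder are illustrative assumptions; the paper's decoder is an analytic image-formation model, abstracted here as a differentiable function.

```python
# Hedged sketch of a model-based autoencoder: CNN encoder -> semantic
# code vector -> differentiable model-based decoder. Dimensions and the
# render function are assumptions, not the paper's exact design.
import torch
import torch.nn as nn
import torchvision.models as models

CODE_DIMS = {"pose": 6, "shape": 80, "expression": 64,
             "reflectance": 80, "illumination": 27}  # assumed sizes

class ModelBasedAutoencoder(nn.Module):
    def __init__(self, render_fn):
        super().__init__()
        backbone = models.resnet18(weights=None)
        backbone.fc = nn.Linear(backbone.fc.in_features, sum(CODE_DIMS.values()))
        self.encoder = backbone
        self.render_fn = render_fn  # differentiable expert-designed decoder

    def forward(self, image):
        code = self.encoder(image)
        params, i = {}, 0
        for name, d in CODE_DIMS.items():  # split code into semantic parts
            params[name] = code[:, i:i + d]
            i += d
        return self.render_fn(params), params  # reconstruction + parameters
```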
Real-time deep hair matting on mobile devices
Augmented reality is an emerging technology in many application domains.
Among them is the beauty industry, where live virtual try-on of beauty products
is of great importance. In this paper, we address the problem of live hair
color augmentation. To achieve this goal, hair needs to be segmented quickly
and accurately. We show how a modified MobileNet CNN architecture can be used
to segment hair in real time. Instead of training this network on large
amounts of accurate segmentation data, which are difficult to obtain, we use
crowdsourced hair segmentation data. While such data are much easier to
obtain, their segmentations are noisy and coarse. Despite this, we show how
our system can produce accurate and finely detailed hair mattes while running at
over 30 fps on an iPad Pro tablet.
Comment: 7 pages, 7 figures, submitted to CRV 2018
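The sketch below shows a minimal network in the spirit of the modified MobileNet described above: a depthwise-separable-convolution encoder and a small decoder that outputs a single-channel hair matte. Layer counts and channel widths are assumptions for illustration, not the paper's architecture.

```python
# Hedged sketch of a MobileNet-style hair matting network. Channel
# widths, depths, and the decoder are illustrative assumptions.
import torch
import torch.nn as nn

def dw_sep(cin, cout, stride=1):
    """Depthwise-separable conv block, the core MobileNet building unit."""
    return nn.Sequential(
        nn.Conv2d(cin, cin, 3, stride, 1, groups=cin, bias=False),
        nn.BatchNorm2d(cin), nn.ReLU6(inplace=True),
        nn.Conv2d(cin, cout, 1, bias=False),
        nn.BatchNorm2d(cout), nn.ReLU6(inplace=True),
    )

class HairMatteNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 32, 3, 2, 1), nn.ReLU6(inplace=True),   # H/2
            dw_sep(32, 64), dw_sep(64, 128, stride=2),            # H/4
            dw_sep(128, 128),
        )
        self.decoder = nn.Sequential(
            nn.Upsample(scale_factor=4, mode="bilinear", align_corners=False),
            nn.Conv2d(128, 1, 1),  # single-channel matte logits
        )

    def forward(self, x):
        return torch.sigmoid(self.decoder(self.encoder(x)))  # matte in [0, 1]
```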
Fingertip skin models for analysis of the haptic perception of textiles
This paper presents finite element models of the fingertip skin, created to simulate contact between textile objects and the skin and thereby improve understanding of how textiles are perceived through the skin, the so-called hand of textiles. Many objective and subjective techniques have been developed for analysing the hand of textiles; however, none of them provides complete information about the sensation of textiles through the skin. Because human skin is a complex, heterogeneous, hyperelastic body composed of many components, some simplifications had to be made at this early stage of model building, while preserving the models' practical value. The models cover only mechanical loading of the skin. They predict a low deformation of the fingertip skin under the pressure of virtual heterogeneous materials: acrylic, coarse wool, and steel.
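As a small illustration of the hyperelastic material behavior such models capture, the snippet below evaluates the uniaxial Cauchy stress of an incompressible neo-Hookean solid, a common constitutive choice for skin in finite element work; the paper does not state its exact material law, and the shear modulus used here is an assumed value.

```python
# Hedged illustration of a hyperelastic skin material law. The paper's
# constitutive model is unspecified; incompressible neo-Hookean is shown
# as a common choice, with an assumed shear modulus.
def neo_hookean_uniaxial_stress(stretch, mu):
    """Cauchy stress (Pa) of an incompressible neo-Hookean solid under
    uniaxial stretch `stretch` (dimensionless), shear modulus `mu` (Pa):
    sigma = mu * (stretch**2 - 1/stretch)."""
    return mu * (stretch**2 - 1.0 / stretch)

# Example: 5% compression of skin with an assumed shear modulus of 20 kPa.
print(neo_hookean_uniaxial_stress(0.95, 20e3))  # ~ -3.0 kPa (compressive)
```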
AVEID: Automatic Video System for Measuring Engagement In Dementia
Engagement in dementia is typically measured using behavior observational
scales (BOS) that are tedious to annotate, involve intensive manual labor, and
are therefore not easily scalable. We propose AVEID, a low-cost, easy-to-use
video-based engagement measurement tool that determines the engagement level
of a person with dementia (PwD) during digital interaction. We show that the
objective behavioral measures computed via AVEID correlate well with
subjective expert impressions for the popular MPES and OME BOS, confirming its
viability and effectiveness. Moreover, AVEID measures can be obtained for a
variety of engagement designs, thereby facilitating large-scale studies with
PwD populations.
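A minimal sketch of the validation step described above, correlating automatic per-session engagement scores with expert BOS ratings; the scores, ratings, and scale are placeholders, not data from the paper.

```python
# Hedged sketch: correlate automatic engagement measures with expert
# ratings. All numbers below are illustrative placeholders.
import numpy as np
from scipy.stats import pearsonr

aveid_scores = np.array([0.62, 0.35, 0.80, 0.51, 0.44])  # automatic measures
expert_ratings = np.array([3.0, 2.0, 4.0, 3.0, 2.5])     # e.g., MPES-style ratings

r, p = pearsonr(aveid_scores, expert_ratings)
print(f"Pearson r = {r:.2f} (p = {p:.3f})")
```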