Age Progression and Regression with Spatial Attention Modules
Age progression and regression refer to aesthetically rendering a given
face image to present the effects of face aging and rejuvenation, respectively.
Although numerous studies have been conducted on this topic, there are two
major problems: 1) multiple models are usually trained to simulate different
age mappings, and 2) the photo-realism of generated face images is heavily
influenced by the variation of training images in terms of pose, illumination,
and background. To address these issues, in this paper, we propose a framework
based on conditional Generative Adversarial Networks (cGANs) to achieve age
progression and regression simultaneously. In particular, since face aging and
rejuvenation differ largely in their image translation patterns, we
model these two processes using two separate generators, each dedicated to one
age-changing process. In addition, we exploit spatial attention mechanisms to
limit image modifications to regions closely related to age changes, so that
images with high visual fidelity can be synthesized for in-the-wild cases.
Experiments on multiple datasets demonstrate the ability of our model to
synthesize lifelike face images at desired ages with personalized features
well preserved, while keeping age-irrelevant regions unchanged.
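A minimal PyTorch sketch of the attention-based blending described above (an illustrative reconstruction, not the authors' released architecture; all layer sizes and names are assumptions): the generator predicts a content image and a single-channel attention mask, and the output blends modified and original pixels so that only age-relevant regions change.

```python
import torch
import torch.nn as nn

class AttentionGenerator(nn.Module):
    """Minimal sketch of a spatial-attention generator (illustrative sizes,
    not the paper's exact architecture). One such generator would be trained
    per direction: one for aging, one for rejuvenation."""

    def __init__(self, channels=64):
        super().__init__()
        # Shared encoder-decoder trunk (drastically simplified here).
        self.trunk = nn.Sequential(
            nn.Conv2d(3, channels, 7, padding=3),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.ReLU(inplace=True),
        )
        # Content head: a full RGB image of the age-transformed face.
        self.content = nn.Sequential(nn.Conv2d(channels, 3, 7, padding=3), nn.Tanh())
        # Attention head: per-pixel weights in [0, 1] marking age-relevant regions.
        self.attention = nn.Sequential(nn.Conv2d(channels, 1, 7, padding=3), nn.Sigmoid())

    def forward(self, x):
        h = self.trunk(x)
        content = self.content(h)
        mask = self.attention(h)
        # Blend: modified pixels where mask ~ 1, original pixels where mask ~ 0,
        # which keeps age-irrelevant regions (background, pose) unchanged.
        return mask * content + (1.0 - mask) * x
```

Two such generators, one per translation direction, would then be trained adversarially against an age-conditional discriminator.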
Expressive Body Capture: 3D Hands, Face, and Body from a Single Image
To facilitate the analysis of human actions, interactions and emotions, we
compute a 3D model of human body pose, hand pose, and facial expression from a
single monocular image. To achieve this, we use thousands of 3D scans to train
a new, unified, 3D model of the human body, SMPL-X, that extends SMPL with
fully articulated hands and an expressive face. Learning to regress the
parameters of SMPL-X directly from images is challenging without paired images
and 3D ground truth. Consequently, we follow the approach of SMPLify, which
estimates 2D features and then optimizes model parameters to fit the features.
We improve on SMPLify in several significant ways: (1) we detect 2D features
corresponding to the face, hands, and feet and fit the full SMPL-X model to
these; (2) we train a new neural network pose prior using a large MoCap
dataset; (3) we define a new interpenetration penalty that is both fast and
accurate; (4) we automatically detect gender and the appropriate body models
(male, female, or neutral); (5) our PyTorch implementation achieves a speedup
of more than 8x over Chumpy. We use the new method, SMPLify-X, to fit SMPL-X to
both controlled images and images in the wild. We evaluate 3D accuracy on a new
curated dataset comprising 100 images with pseudo ground-truth. This is a step
towards automatic expressive human capture from monocular RGB data. The models,
code, and data are available for research purposes at
https://smpl-x.is.tue.mpg.de.
Comment: To appear in CVPR 2019.
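The optimization-based fitting stage follows the SMPLify pattern: detect 2D features, then minimize a confidence-weighted reprojection error plus priors over the model parameters. A hedged sketch, assuming hypothetical `smplx_forward`, `project`, and `pose_prior` helpers (the released SMPLify-X code differs in its energy terms, parameterization, and optimizer):

```python
import torch

def fit_smplx(keypoints_2d, conf, smplx_forward, project, pose_prior, steps=200):
    """Hypothetical sketch of SMPLify-style fitting: optimize pose/shape so
    projected model joints match detected 2D keypoints, regularized by priors.
    `smplx_forward` maps (pose, shape) -> 3D joints; `project` maps 3D -> 2D."""
    pose = torch.zeros(1, 63, requires_grad=True)   # body pose (illustrative size)
    shape = torch.zeros(1, 10, requires_grad=True)  # shape coefficients
    opt = torch.optim.Adam([pose, shape], lr=0.01)
    for _ in range(steps):
        opt.zero_grad()
        joints_3d = smplx_forward(pose, shape)
        joints_2d = project(joints_3d)
        # Confidence-weighted reprojection term over detected 2D features.
        reproj = (conf * (joints_2d - keypoints_2d).pow(2).sum(-1)).mean()
        # Priors keep the fit plausible; the paper uses a learned neural pose
        # prior and an interpenetration penalty, both abstracted away here.
        loss = reproj + 0.1 * pose_prior(pose) + 0.01 * shape.pow(2).sum()
        loss.backward()
        opt.step()
    return pose.detach(), shape.detach()
```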
E-government adoption in Qatar: An investigation of the citizens' perspective
Electronic government (e-government) initiatives are in their early stages in many developing countries and are
faced with various issues pertaining to their implementation, adoption and diffusion. Like many other developing
countries, the state of Qatar has faced a number of challenges with its e-government initiative since its inception in
2000. Using a survey-based study, this paper applies the Unified Theory of Acceptance and Use of Technology
(UTAUT) model to explore citizens' behavioural intention towards, and adoption of, e-government services in the
state of Qatar. A regression analysis was conducted to examine the influence of e-government adoption factors, and
the empirical data revealed that performance expectancy, effort expectancy, and social influence determine citizens'
behavioural intention towards e-government. Moreover, facilitating conditions and behavioural intention were found
to determine citizens' use of e-government services in the state of Qatar. Implications for practice and research are
discussed.
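The analysis pattern, regressing behavioural intention and use on the UTAUT constructs, can be sketched as follows; the `survey.csv` file and all column names are hypothetical placeholders, and no coefficients from the paper are reproduced:

```python
import pandas as pd
import statsmodels.api as sm

# Hypothetical survey data: one row per respondent, Likert-scale composite
# scores for each UTAUT construct (column names are illustrative only).
df = pd.read_csv("survey.csv")

# Model 1: behavioural intention ~ performance expectancy,
# effort expectancy, and social influence.
X1 = sm.add_constant(df[["performance_expectancy",
                         "effort_expectancy",
                         "social_influence"]])
print(sm.OLS(df["behavioural_intention"], X1).fit().summary())

# Model 2: actual use ~ facilitating conditions + behavioural intention.
X2 = sm.add_constant(df[["facilitating_conditions", "behavioural_intention"]])
print(sm.OLS(df["use_behaviour"], X2).fit().summary())
```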
FaceFilter: Audio-visual speech separation using still images
The objective of this paper is to separate a target speaker's speech from a
mixture of two speakers using a deep audio-visual speech separation network.
Unlike previous works that used lip movement on video clips or pre-enrolled
speaker information as an auxiliary conditional feature, we use a single face
image of the target speaker. In this task, the conditional feature is obtained
from facial appearance in a cross-modal biometric task, where audio and visual
identity representations are shared in a latent space. Identities learnt from
facial images force the network to isolate the matched speaker and extract that
voice from the mixed speech. This resolves the permutation problem caused by
swapped channel outputs, which frequently occurs in speech separation tasks. The proposed
method is far more practical than video-based speech separation since user
profile images are readily available on many platforms. Also, unlike
speaker-aware separation methods, it is applicable on separation with unseen
speakers who have never been enrolled before. We show strong qualitative and
quantitative results on challenging real-world examples.
Comment: Under submission as a conference paper. Video examples:
https://youtu.be/ku9xoLh62
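A minimal sketch of the conditioning idea (sizes and names are assumptions, not the FaceFilter implementation): the face-derived identity embedding is tiled over time, concatenated with the mixture spectrogram, and used to predict a soft mask that keeps only the matching speaker.

```python
import torch
import torch.nn as nn

class FaceConditionedSeparator(nn.Module):
    """Illustrative sketch: separate a target voice from a 2-speaker mixture,
    conditioned on an identity embedding from a single face image."""

    def __init__(self, freq_bins=257, emb_dim=128, hidden=256):
        super().__init__()
        self.rnn = nn.LSTM(freq_bins + emb_dim, hidden, batch_first=True,
                           bidirectional=True)
        self.mask = nn.Sequential(nn.Linear(2 * hidden, freq_bins), nn.Sigmoid())

    def forward(self, mix_spec, face_emb):
        # mix_spec: (B, T, F) magnitude spectrogram of the mixture.
        # face_emb: (B, E) identity embedding assumed to come from a
        # pre-trained cross-modal biometric network shared with audio.
        cond = face_emb.unsqueeze(1).expand(-1, mix_spec.size(1), -1)  # tile over time
        h, _ = self.rnn(torch.cat([mix_spec, cond], dim=-1))
        # The soft mask selects time-frequency bins of the target speaker,
        # pinning the output to one identity and avoiding channel permutation.
        return self.mask(h) * mix_spec
```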
Deep View-Sensitive Pedestrian Attribute Inference in an end-to-end Model
Pedestrian attribute inference is a demanding problem in visual surveillance
that can facilitate person retrieval, search and indexing. To exploit semantic
relations between attributes, recent research treats it as a multi-label image
classification task. The visual cues hinting at attributes can be strongly
localized, and inference of person attributes such as hair, backpack, shorts,
etc., is highly dependent on the acquired view of the pedestrian. In this
paper we model this dependence in an end-to-end learning framework and show
that a view-sensitive attribute inference is able to learn better attribute
predictions. Our proposed model jointly predicts the coarse pose (view) of the
pedestrian and learns specialized view-specific multi-label attribute
predictions. We show in an extensive evaluation on three challenging datasets
(PETA, RAP and WIDER) that our proposed end-to-end view-aware attribute
prediction model provides competitive performance and improves on the published
state-of-the-art on these datasets.
Comment: Accepted at BMVC 2017.
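A hedged sketch of such a joint model (backbone, number of views, and head sizes are assumptions, not the published network): a shared feature extractor feeds a coarse view classifier and one multi-label attribute head per view, with the final attribute logits weighted by the view posterior.

```python
import torch
import torch.nn as nn

class ViewSensitiveAttributeNet(nn.Module):
    """Illustrative sketch of view-sensitive attribute inference: predict a
    coarse view (e.g. front/back/side) and weight view-specific multi-label
    attribute predictions by the view posterior."""

    def __init__(self, feat_dim=512, num_views=3, num_attrs=35):
        super().__init__()
        # Stand-in backbone; a real model would use a CNN feature extractor.
        self.backbone = nn.Sequential(nn.Flatten(), nn.LazyLinear(feat_dim), nn.ReLU())
        self.view_head = nn.Linear(feat_dim, num_views)
        # One attribute head per coarse view.
        self.attr_heads = nn.ModuleList(
            nn.Linear(feat_dim, num_attrs) for _ in range(num_views))

    def forward(self, x):
        f = self.backbone(x)
        view_probs = self.view_head(f).softmax(dim=-1)                  # (B, V)
        per_view = torch.stack([h(f) for h in self.attr_heads], dim=1)  # (B, V, A)
        # Expected attribute logits under the predicted view distribution.
        attr_logits = (view_probs.unsqueeze(-1) * per_view).sum(dim=1)  # (B, A)
        return view_probs, attr_logits
```

Training such a model would plausibly combine a classification loss on the view prediction with binary cross-entropy on the combined attribute logits, so both tasks are learned end-to-end.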