Age Progression and Regression with Spatial Attention Modules

Authors: Li Qi; Liu Yunfan; Sun Zhenan
Publication date: 06/10/2019

Age progression and regression refer to aesthetically rendering a given face image to present the effects of face aging and rejuvenation, respectively. Although numerous studies have been conducted on this topic, two major problems remain: 1) multiple models are usually trained to simulate different age mappings, and 2) the photo-realism of generated face images is heavily influenced by the variation of training images in terms of pose, illumination, and background. To address these issues, in this paper, we propose a framework based on conditional Generative Adversarial Networks (cGANs) to achieve age progression and regression simultaneously. In particular, since face aging and rejuvenation differ substantially in their image translation patterns, we model the two processes with two separate generators, each dedicated to one age-changing process. In addition, we exploit spatial attention mechanisms to limit image modifications to regions closely related to age changes, so that images with high visual fidelity can be synthesized for in-the-wild cases. Experiments on multiple datasets demonstrate the ability of our model to synthesize lifelike face images at desired ages with personalized features well preserved, while keeping age-irrelevant regions unchanged.

arXiv.org e-Print Archive

Association for the Advancement of Artificial Intelligence: AAAI Publications

Expressive Body Capture: 3D Hands, Face, and Body from a Single Image

Authors: Black Michael J.; Bolkart Timo; Choutas Vasileios; Ghorbani Nima; Osman Ahmed A. A.; Pavlakos Georgios; Tzionas Dimitrios
Publication date: 01/01/2019

To facilitate the analysis of human actions, interactions and emotions, we compute a 3D model of human body pose, hand pose, and facial expression from a single monocular image. To achieve this, we use thousands of 3D scans to train a new, unified, 3D model of the human body, SMPL-X, that extends SMPL with fully articulated hands and an expressive face. Learning to regress the parameters of SMPL-X directly from images is challenging without paired images and 3D ground truth. Consequently, we follow the approach of SMPLify, which estimates 2D features and then optimizes model parameters to fit the features. We improve on SMPLify in several significant ways: (1) we detect 2D features corresponding to the face, hands, and feet and fit the full SMPL-X model to these; (2) we train a new neural network pose prior using a large MoCap dataset; (3) we define a new interpenetration penalty that is both fast and accurate; (4) we automatically detect gender and the appropriate body models (male, female, or neutral); (5) our PyTorch implementation achieves a speedup of more than 8x over Chumpy. We use the new method, SMPLify-X, to fit SMPL-X to both controlled images and images in the wild. We evaluate 3D accuracy on a new curated dataset comprising 100 images with pseudo ground-truth. This is a step towards automatic expressive human capture from monocular RGB data. The models, code, and data are available for research purposes at https://smpl-x.is.tue.mpg.de.

Comment: To appear in CVPR 201

arXiv.org e-Print Archive

Crossref

MPG.PuRe

E-government adoption in Qatar: An investigation of the citizens' perspective

Authors: Al-Shafi S; Irani Z; Lee H; Weerakkody V
Publication venue: DIGIT 2009
Publication date: 01/01/2009

Electronic government (e-government) initiatives are in their early stages in many developing countries and face various issues pertaining to their implementation, adoption and diffusion. Like many other developing countries, the e-government initiative in the state of Qatar has faced a number of challenges since its inception in 2000. Using a survey-based study, this paper describes citizens' behavioural intention and adoption, applying the Unified Theory of Acceptance and Use of Technology (UTAUT) model to explore the adoption and diffusion of e-government services in the state of Qatar. A regression analysis was conducted to examine the influence of e-government adoption factors, and the empirical data revealed that performance expectancy, effort expectancy, and social influences determine citizens' behavioural intention towards e-government. Moreover, facilitating conditions and behavioural intention were found to determine citizens' use of e-government services in the state of Qatar. Implications for practice and research are discussed.

Brunel University Research Archive

FaceFilter: Audio-visual speech separation using still images

Authors: Choe Soyeon; Chung Joon Son; Chung Soo-Whan; Kang Hong-Goo
Publication venue: International Speech Communication Association
Publication date: 14/05/2020

The objective of this paper is to separate a target speaker's speech from a mixture of two speakers using a deep audio-visual speech separation network. Unlike previous works that used lip movement in video clips or pre-enrolled speaker information as an auxiliary conditional feature, we use a single face image of the target speaker. In this task, the conditional feature is obtained from facial appearance in a cross-modal biometric task, where audio and visual identity representations are shared in latent space. Identities learnt from facial images force the network to isolate matched speakers and extract the voices from mixed speech. This solves the permutation problem caused by swapped channel outputs, which frequently occurs in speech separation tasks. The proposed method is far more practical than video-based speech separation since user profile images are readily available on many platforms. Also, unlike speaker-aware separation methods, it is applicable to separation with unseen speakers who have never been enrolled before. We show strong qualitative and quantitative results on challenging real-world examples.

Comment: Under submission as a conference paper. Video examples: https://youtu.be/ku9xoLh62

arXiv.org e-Print Archive

Crossref

Deep View-Sensitive Pedestrian Attribute Inference in an end-to-end Model

Authors: Sarfraz M. Saquib; Schumann Arne; Stiefelhagen Rainer; Wang Yan
Publication date: 01/01/2017

Pedestrian attribute inference is a demanding problem in visual surveillance that can facilitate person retrieval, search and indexing. To exploit semantic relations between attributes, recent research treats it as a multi-label image classification task. The visual cues hinting at attributes can be strongly localized, and inference of person attributes such as hair, backpack, shorts, etc., is highly dependent on the acquired view of the pedestrian. In this paper we assert this dependence in an end-to-end learning framework and show that view-sensitive attribute inference is able to learn better attribute predictions. Our proposed model jointly predicts the coarse pose (view) of the pedestrian and learns specialized view-specific multi-label attribute predictions. We show in an extensive evaluation on three challenging datasets (PETA, RAP and WIDER) that our proposed end-to-end view-aware attribute prediction model provides competitive performance and improves on the published state-of-the-art on these datasets.

Comment: accepted BMVC 201

arXiv.org e-Print Archive

Crossref

Fraunhofer-ePrints