Using Photorealistic Face Synthesis and Domain Adaptation to Improve Facial Expression Analysis
Cross-domain synthesis of realistic faces for learning deep models has attracted
increasing attention in facial expression analysis, as it helps improve
expression recognition accuracy when only a small number of real training
images is available. However, learning from synthetic face images can be
problematic due to the distribution discrepancy between low-quality synthetic
images and real face images, and the learned model may not achieve the desired
performance when applied to real-world scenarios. To this end, we propose a
new attribute-guided face image synthesis method that performs translation
between multiple image domains using a single model. In addition, we adopt the
proposed model to learn from synthetic faces by matching the feature
distributions between different domains while preserving each domain's
characteristics. We evaluate the effectiveness of the proposed approach on
several face datasets in terms of generating realistic face images. We
demonstrate that expression recognition performance can be enhanced by our
face synthesis model. Moreover, we conduct experiments on a near-infrared
dataset containing facial expression videos of drivers to assess performance
on in-the-wild data for driver emotion recognition.
Comment: 8 pages, 8 figures, 5 tables, accepted by FG 2019. arXiv admin note:
substantial text overlap with arXiv:1905.0028
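The abstract does not spell out how the feature distributions are matched; one common choice in domain adaptation is a maximum mean discrepancy (MMD) penalty between feature embeddings of the two domains. A minimal sketch of that idea, assuming a linear kernel and batch features of shape (samples, dims) — the function name and feature shapes here are illustrative, not from the paper:

```python
import numpy as np

def linear_mmd2(feat_syn, feat_real):
    """Squared maximum mean discrepancy (MMD) with a linear kernel:
    the squared distance between the mean feature embeddings of a
    synthetic batch and a real batch (rows = samples, cols = features).
    Minimising it pulls the two feature distributions together."""
    delta = feat_syn.mean(axis=0) - feat_real.mean(axis=0)
    return float(delta @ delta)

rng = np.random.default_rng(0)
syn = rng.normal(size=(64, 128))   # hypothetical synthetic-domain features
real = rng.normal(size=(64, 128))  # hypothetical real-domain features
gap = linear_mmd2(syn, real)       # non-negative; 0 iff the batch means match
```

In practice such a term would be added to the recognition loss so the encoder learns domain-invariant features while the per-domain decoders preserve each domain's characteristics.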
Single Image Super-Resolution Using Multi-Scale Deep Encoder-Decoder with Phase Congruency Edge Map Guidance
This paper presents an end-to-end multi-scale deep encoder (convolution) and decoder (deconvolution) network for single image super-resolution (SISR) guided by a phase congruency (PC) edge map. Our system starts with a single-scale symmetrical encoder-decoder structure for SISR, which is extended to a multi-scale model by integrating wavelet multi-resolution analysis into our network. The new multi-scale deep learning system allows the low-resolution (LR) input and its PC edge map to be combined so as to precisely predict the multi-scale super-resolved edge details with the guidance of the high-resolution (HR) PC edge map. In this way, the proposed deep model takes both the reconstruction of image pixels' intensities and the recovery of multi-scale edge details into consideration under the same framework. We evaluate the proposed model on benchmark datasets covering different data scenarios: Set14 and BSD100 (natural images), and Middlebury and New Tsukuba (depth images). Evaluations based on both PSNR and visual perception reveal that the proposed model is superior to the state-of-the-art methods.
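The wavelet multi-resolution analysis mentioned above can be illustrated with one level of a 2D Haar transform, which splits an image into a half-resolution approximation and three detail sub-bands; deeper levels repeat the split on the approximation. A minimal numpy sketch (the paper's exact wavelet basis is not stated, so Haar is an assumption):

```python
import numpy as np

def haar2d(img):
    """One level of a 2D Haar transform: returns the low-pass
    approximation and the three detail sub-bands, each at half the
    input resolution (H and W must be even)."""
    a = img[0::2, 0::2]; b = img[0::2, 1::2]
    c = img[1::2, 0::2]; d = img[1::2, 1::2]
    ll = (a + b + c + d) / 4.0   # approximation (low-pass)
    lh = (a + b - c - d) / 4.0   # horizontal details
    hl = (a - b + c - d) / 4.0   # vertical details
    hh = (a - b - c + d) / 4.0   # diagonal details
    return ll, lh, hl, hh

img = np.arange(16, dtype=float).reshape(4, 4)  # toy 4x4 "image"
ll, lh, hl, hh = haar2d(img)
```

A multi-scale SR network can then supervise each sub-band separately, which is one way edge detail at several resolutions enters the training objective.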
DGPose: Deep Generative Models for Human Body Analysis
Deep generative modelling for human body analysis is an emerging problem with
many interesting applications. However, the latent space learned by such
approaches is typically not interpretable, resulting in less flexibility. In
this work, we present deep generative models for human body analysis in which
the body pose and the visual appearance are disentangled. Such a
disentanglement allows independent manipulation of pose and appearance, and
hence enables applications such as pose-transfer without specific training for
such a task. Our proposed models, the Conditional-DGPose and the Semi-DGPose,
have different characteristics. In the first, body pose labels are taken as
conditioners, from a fully-supervised training set. In the second, our
structured semi-supervised approach allows for pose estimation to be performed
by the model itself and relaxes the need for labelled data. Therefore, the
Semi-DGPose aims for the joint understanding and generation of people in
images. It is not only capable of mapping images to interpretable latent
representations but also able to map these representations back to the image
space. We compare our models with relevant baselines, the ClothNet-Body and the
Pose Guided Person Generation networks, demonstrating their merits on the
Human3.6M, ChictopiaPlus and DeepFashion benchmarks.
Comment: IJCV 2020 special issue on 'Generating Realistic Visual Data of Human
Behavior' preprint. Keywords: deep generative models, semi-supervised
learning, human pose estimation, variational autoencoders, generative
adversarial network
Guiding CTC Posterior Spike Timings for Improved Posterior Fusion and Knowledge Distillation
Conventional automatic speech recognition (ASR) systems trained from
frame-level alignments can easily leverage posterior fusion to improve ASR
accuracy and build a better single model with knowledge distillation.
End-to-end ASR systems trained using the Connectionist Temporal Classification
(CTC) loss do not require frame-level alignment and hence simplify model
training. However, sparse and arbitrary posterior spike timings from CTC models
pose a new set of challenges in posterior fusion from multiple models and
knowledge distillation between CTC models. We propose a method to train a CTC
model so that its spike timings are guided to align with those of a pre-trained
guiding CTC model. As a result, all models that share the same guiding model
have aligned spike timings. We show the advantage of our method in various
scenarios including posterior fusion of CTC models and knowledge distillation
between CTC models with different architectures. With the 300-hour Switchboard
training data, the single word CTC model distilled from multiple models
improved the word error rates to 13.7%/23.1% from 14.9%/24.1% on the Hub5 2000
Switchboard/CallHome test sets without using any data augmentation, language
model, or complex decoder.
Comment: Accepted to Interspeech 201
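One plausible frame-level reading of the spike guidance described above (the paper's exact loss is not given in the abstract, so this formulation is an assumption): wherever the pre-trained guiding model emits a confident non-blank spike, penalise the student's log-probability of that label at the same frame, and add the penalty to the usual CTC loss. A hedged numpy sketch with per-utterance posteriors of shape (T, V):

```python
import numpy as np

def guided_spike_penalty(student_log_probs, guide_probs, blank=0):
    """At frames where the guiding CTC model's argmax is non-blank,
    take the student's negative log-probability of that same label;
    averaging these pulls the student's spike timings toward the
    guide's. Shapes: (T, V) for one utterance."""
    spikes = guide_probs.argmax(axis=1)  # guide's per-frame label
    mask = spikes != blank               # keep only spike frames
    if not mask.any():
        return 0.0
    return float(-student_log_probs[mask, spikes[mask]].mean())

# toy utterance: 3 frames, vocab {blank, A, B}; guide spikes A then B
guide = np.array([[0.9, 0.05, 0.05],
                  [0.1, 0.8, 0.1],
                  [0.1, 0.1, 0.8]])
student = np.log(np.array([[0.8, 0.1, 0.1],
                           [0.05, 0.9, 0.05],
                           [0.05, 0.05, 0.9]]))
penalty = guided_spike_penalty(student, guide)  # small: spikes agree
```

Once every student shares the same guiding model, their spikes land on the same frames, which is what makes frame-wise posterior fusion and distillation between CTC models well-posed.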
Interpretable classification of Alzheimer's disease pathologies with a convolutional neural network pipeline.
Neuropathologists assess vast brain areas to identify diverse and subtly differentiated morphologies. Standard semi-quantitative scoring approaches, however, are coarse-grained and lack precise neuroanatomic localization. We report a proof-of-concept deep learning pipeline that identifies specific neuropathologies (amyloid plaques and cerebral amyloid angiopathy) in immunohistochemically stained archival slides. Using automated segmentation of stained objects and a cloud-based interface, we annotate > 70,000 plaque candidates from 43 whole slide images (WSIs) to train and evaluate convolutional neural networks. Networks achieve strong plaque classification on a 10-WSI hold-out set (0.993 and 0.743 areas under the receiver operating characteristic and precision-recall curves, respectively). Prediction confidence maps visualize morphology distributions at high resolution. Resulting network-derived amyloid beta (Aβ) burden scores correlate well with established semi-quantitative scores on a 30-WSI blinded hold-out. Finally, saliency mapping demonstrates that networks learn patterns agreeing with accepted pathologic features. This scalable means to augment a neuropathologist's ability suggests a route to neuropathologic deep phenotyping.
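The 0.993 area under the ROC curve reported above is the standard threshold-free ranking metric; it can be computed from classifier scores via the rank-sum identity (the probability that a random positive plaque candidate outscores a random negative one). A small numpy sketch with made-up labels and scores, ties ignored for brevity:

```python
import numpy as np

def auroc(labels, scores):
    """ROC-AUC via the Mann-Whitney rank-sum identity: the fraction
    of positive/negative pairs where the positive is scored higher.
    Assumes no tied scores, which keeps the sketch short."""
    order = np.argsort(scores)
    ranks = np.empty(len(scores), dtype=float)
    ranks[order] = np.arange(1, len(scores) + 1)
    pos = labels == 1
    n_pos, n_neg = pos.sum(), (~pos).sum()
    return float((ranks[pos].sum() - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg))

labels = np.array([0, 0, 1, 1])          # hypothetical plaque ground truth
scores = np.array([0.1, 0.7, 0.6, 0.9])  # hypothetical network confidences
auc = auroc(labels, scores)              # one pos/neg pair mis-ranked
```

Reporting both ROC-AUC and precision-recall AUC, as the paper does, matters because plaque candidates are typically class-imbalanced and PR-AUC is the more sensitive of the two in that regime.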