20 research outputs found
Deep Impression: Audiovisual Deep Residual Networks for Multimodal Apparent Personality Trait Recognition
Here, we develop an audiovisual deep residual network for multimodal apparent
personality trait recognition. The network is trained end-to-end for predicting
the Big Five personality traits of people from their videos. That is, the
network does not require any feature engineering or visual analysis such as
face detection, face landmark alignment or facial expression recognition.
The network recently placed third in the ChaLearn First Impressions
Challenge, with a test accuracy of 0.9109.
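The end-to-end two-stream idea can be illustrated with a minimal NumPy sketch: one residual block per modality, late fusion of the audio and visual streams, and a sigmoid head mapping to five trait scores in [0, 1]. All dimensions, weights, and the fusion scheme below are illustrative assumptions, not the paper's actual architecture.

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(x):
    return np.maximum(x, 0.0)

def residual_block(x, w1, w2):
    # Identity shortcut around two linear+ReLU layers: the core idea of a
    # residual unit (y = x + F(x)).
    return relu(x + w2 @ relu(w1 @ x))

d = 16  # hypothetical feature dimension per modality
# Hypothetical pre-extracted audio and visual feature vectors for one video.
audio_feat = rng.standard_normal(d)
visual_feat = rng.standard_normal(d)

# One residual block per modality stream (weights would be learned end-to-end).
wa1, wa2 = rng.standard_normal((d, d)) * 0.1, rng.standard_normal((d, d)) * 0.1
wv1, wv2 = rng.standard_normal((d, d)) * 0.1, rng.standard_normal((d, d)) * 0.1
audio_h = residual_block(audio_feat, wa1, wa2)
visual_h = residual_block(visual_feat, wv1, wv2)

# Late fusion: concatenate both streams, then a linear head with a sigmoid
# squashes each of the Big Five trait predictions into [0, 1].
fused = np.concatenate([audio_h, visual_h])
w_out = rng.standard_normal((5, 2 * d)) * 0.1
traits = 1.0 / (1.0 + np.exp(-(w_out @ fused)))
print(traits.shape)  # (5,)
```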
TextureGAN: Controlling Deep Image Synthesis with Texture Patches
In this paper, we investigate deep image synthesis guided by sketch, color,
and texture. Previous image synthesis methods can be controlled by sketch and
color strokes, but we are the first to examine texture control. We allow a user
to place a texture patch on a sketch at arbitrary locations and scales to
control the desired output texture. Our generative network learns to synthesize
objects consistent with these texture suggestions. To achieve this, we develop
a local texture loss in addition to adversarial and content loss to train the
generative network. We conduct experiments using sketches generated from real
images and textures sampled from a separate texture database and results show
that our proposed algorithm is able to generate plausible images that are
faithful to user controls. Ablation studies show that our proposed pipeline can
generate more realistic images than adapting existing methods directly.
Comment: CVPR 2018 spotlight.
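One common way to realize a "local texture loss" is to compare Gram-matrix statistics of a local window of the generated feature map against the reference texture patch. The NumPy sketch below is an assumed formulation for illustration; the paper's exact loss may differ.

```python
import numpy as np

def gram_matrix(feats):
    # feats: (C, H, W) feature map. The Gram matrix captures channel
    # co-activation statistics, a standard stand-in for "texture".
    c, h, w = feats.shape
    f = feats.reshape(c, h * w)
    return (f @ f.T) / (c * h * w)

def local_texture_loss(gen_feats, ref_feats, y, x, size):
    # Compare texture statistics only inside a local window of the generated
    # feature map, at the location where the user placed the texture patch.
    patch = gen_feats[:, y:y + size, x:x + size]
    ref = ref_feats[:, :size, :size]
    g1, g2 = gram_matrix(patch), gram_matrix(ref)
    return float(np.mean((g1 - g2) ** 2))

rng = np.random.default_rng(1)
gen = rng.standard_normal((8, 32, 32))   # hypothetical generated feature map
ref = rng.standard_normal((8, 16, 16))   # hypothetical reference texture patch
print(local_texture_loss(gen, gen, 0, 0, 16))       # 0.0: identical statistics
print(local_texture_loss(gen, ref, 4, 4, 16) > 0.0)  # True: textures differ
```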
Sketch Plus Colorization Deep Convolutional Neural Networks for Photos Generation from Sketches
In this paper, we introduce a method to generate photos from sketches using Deep Convolutional Neural Networks (DCNN). This research proposes a method that combines a network that inverts sketches into photos (sketch inversion net) with a network that predicts color from grayscale images (colorization net). With this method, the generated photos are expected to be more similar to the actual photos. We first artificially constructed uncontrolled conditions for the dataset. The dataset, which consists of hand-drawn sketches and their corresponding photos, was pre-processed using several data augmentation techniques to train the models to address rotation, scaling, shape, noise, and positioning. Validation was measured using two types of similarity measurements: pixel-difference based metrics and human visual system (HVS) metrics, which mimic human perception in evaluating image quality. The pixel-difference based metrics consist of Mean Squared Error (MSE) and Peak Signal-to-Noise Ratio (PSNR), while the HVS metrics consist of the Universal Image Quality Index (UIQI) and Structural Similarity (SSIM). Our method gives the best quality of generated photos on all measures (844.04 for MSE, 19.06 for PSNR, 0.47 for UIQI, and 0.66 for SSIM).
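The pixel-difference metrics are straightforward to compute; here is a minimal NumPy sketch of MSE and PSNR, assuming 8-bit images with a peak value of 255 (the toy images below are illustrative, not the paper's data):

```python
import numpy as np

def mse(a, b):
    # Mean Squared Error over all pixels.
    return float(np.mean((a.astype(np.float64) - b.astype(np.float64)) ** 2))

def psnr(a, b, max_val=255.0):
    # Peak Signal-to-Noise Ratio in dB; higher means closer to the reference.
    m = mse(a, b)
    return float("inf") if m == 0 else 10.0 * np.log10(max_val ** 2 / m)

ref = np.full((4, 4), 100, dtype=np.uint8)   # toy reference image
gen = np.full((4, 4), 110, dtype=np.uint8)   # toy generated image, off by 10
print(mse(ref, gen))             # 100.0
print(round(psnr(ref, gen), 2))  # 28.13  (10 * log10(255^2 / 100))
```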
An improved Siamese network for face sketch recognition
Face sketch recognition identifies a face photo from a large face sketch dataset given a sketch. Traditional methods typically reduce the modality gap between face photos and sketches and achieve excellent recognition rates by matching against a pseudo-image synthesized from the corresponding face photo. However, these methods do not achieve high recognition rates across all face sketch datasets, because their extracted features cannot eliminate the effect of the modality difference between the images. Feature representations learned by deep convolutional neural networks offer a more widely applicable approach to identification: they can be adapted to extract features that suppress the difference between face photos and sketches, and networks that learn optimal local features achieve high recognition rates even when the input image exhibits geometric distortions. However, overfitting leads to unsatisfactory performance of deep learning methods on face sketch recognition tasks, and sketch images are too simple to yield effective features. This paper aims to increase the matching rate using a Siamese convolutional network architecture. The framework extracts useful features from each image pair to reduce the modality gap, and data augmentation is used to avoid overfitting. We explore the performance of three loss functions and compare the similarity between each image pair. The experimental results show that our framework performs well on a composite sketch dataset, and that data augmentation and the modified network structure reduce the influence of overfitting.
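A Siamese network is typically trained with a pairwise loss over the two embeddings. As one plausible choice (named here only as an assumption; the abstract does not specify which three losses are compared), a contrastive loss can be sketched as:

```python
import numpy as np

def contrastive_loss(emb1, emb2, same, margin=1.0):
    # Pull embeddings of matching photo/sketch pairs together; push
    # non-matching pairs at least `margin` apart in embedding space.
    d = np.linalg.norm(emb1 - emb2)
    if same:
        return 0.5 * d ** 2
    return 0.5 * max(0.0, margin - d) ** 2

# Toy 2-D embeddings for illustration.
photo = np.array([0.0, 0.0])
sketch_match = np.array([0.3, 0.4])   # distance 0.5 from photo
sketch_other = np.array([3.0, 4.0])   # distance 5.0 from photo
print(round(contrastive_loss(photo, sketch_match, True), 3))   # 0.125
print(contrastive_loss(photo, sketch_other, False))            # 0.0: already beyond margin
```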
Music SketchNet: Controllable Music Generation via Factorized Representations of Pitch and Rhythm
Drawing an analogy with automatic image completion systems, we propose Music
SketchNet, a neural network framework that allows users to specify partial
musical ideas guiding automatic music generation. We focus on generating the
missing measures in incomplete monophonic musical pieces, conditioned on
surrounding context, and optionally guided by user-specified pitch and rhythm
snippets. First, we introduce SketchVAE, a novel variational autoencoder that
explicitly factorizes rhythm and pitch contour to form the basis of our
proposed model. Then we introduce two discriminative architectures,
SketchInpainter and SketchConnector, that in conjunction perform the guided
music completion, filling in representations for the missing measures
conditioned on surrounding context and user-specified snippets. We evaluate
SketchNet on a standard dataset of Irish folk music and compare with models
from recent works. When used for music completion, our approach outperforms the
state-of-the-art both in terms of objective metrics and subjective listening
tests. Finally, we demonstrate that our model can successfully incorporate
user-specified snippets during the generation process.
Comment: 8 pages, 8 figures, Proceedings of the 21st International Society for
Music Information Retrieval Conference, ISMIR 2020.
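The pitch/rhythm factorization at the heart of SketchVAE can be illustrated on a toy token sequence. The token scheme below ("__" for hold, "R" for rest) and the function are hypothetical, not SketchVAE's actual encoding:

```python
# Toy factorization of a monophonic measure into separate pitch and rhythm
# streams: the pitch contour keeps only the note values, while the rhythm
# stream keeps only where onsets, holds, and rests fall.

def factorize(measure):
    pitches = [t for t in measure if t not in ("__", "R")]            # pitch contour
    rhythm = ["O" if t not in ("__", "R") else t for t in measure]    # onset pattern
    return pitches, rhythm

# MIDI note numbers with holds and a rest, one measure of eighth notes.
measure = [60, "__", 62, "R", 64, "__", "__", 65]
pitches, rhythm = factorize(measure)
print(pitches)  # [60, 62, 64, 65]
print(rhythm)   # ['O', '__', 'O', 'R', 'O', '__', '__', 'O']
```

Two separate streams like these can then be encoded independently, which is what lets a user sketch rhythm without committing to pitch, or vice versa.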
Synthesizing Images of Humans in Unseen Poses
We address the computational problem of novel human pose synthesis. Given an
image of a person and a desired pose, we produce a depiction of that person in
that pose, retaining the appearance of both the person and background. We
present a modular generative neural network that synthesizes unseen poses using
training pairs of images and poses taken from human action videos. Our network
separates a scene into different body part and background layers, moves body
parts to new locations and refines their appearances, and composites the new
foreground with a hole-filled background. These subtasks, implemented with
separate modules, are trained jointly using only a single target image as a
supervised label. We use an adversarial discriminator to force our network to
synthesize realistic details conditioned on pose. We demonstrate image
synthesis results on three action classes: golf, yoga/workouts and tennis, and
show that our method produces accurate results within action classes as well as
across action classes. Given a sequence of desired poses, we also produce
coherent videos of actions.
Comment: CVPR 2018.
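The layer-compositing step can be illustrated in NumPy: shift a masked body-part layer to a new location and alpha-blend it over a background whose hole has been filled. The shapes, the shift operation, and the blending below are illustrative assumptions, not the paper's learned modules:

```python
import numpy as np

def composite(bg, fg, alpha, dy, dx):
    # Shift the foreground body-part layer and its alpha mask by (dy, dx),
    # then alpha-blend the shifted layer over the (hole-filled) background.
    fg_shift = np.roll(np.roll(fg, dy, axis=0), dx, axis=1)
    a_shift = np.roll(np.roll(alpha, dy, axis=0), dx, axis=1)
    return a_shift * fg_shift + (1.0 - a_shift) * bg

bg = np.zeros((6, 6))                 # toy hole-filled background
fg = np.full((6, 6), 1.0)             # toy body-part appearance layer
alpha = np.zeros((6, 6))
alpha[1:3, 1:3] = 1.0                 # 2x2 body-part mask at its source location
out = composite(bg, fg, alpha, 2, 2)  # move the part down-right by (2, 2)
print(out[3:5, 3:5].sum())  # 4.0: the part now occupies the shifted region
print(out.sum())            # 4.0: everywhere else remains background
```

In the paper's setting each layer would carry refined appearance rather than constants, but the final composition is still a masked blend of moved foreground over filled background.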