20 research outputs found
Deep Impression: Audiovisual Deep Residual Networks for Multimodal Apparent Personality Trait Recognition
Here, we develop an audiovisual deep residual network for multimodal apparent
personality trait recognition. The network is trained end-to-end for predicting
the Big Five personality traits of people from their videos. That is, the
network does not require any feature engineering or visual analysis such as
face detection, face landmark alignment or facial expression recognition.
The network recently placed third in the ChaLearn First Impressions
Challenge, with a test accuracy of 0.9109.
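The end-to-end two-stream idea can be illustrated with a minimal NumPy sketch: one residual block per modality, late fusion of the audio and visual streams, and a sigmoid head mapping to five trait scores in [0, 1]. All dimensions, weights, and the fusion scheme below are illustrative assumptions, not the paper's actual architecture.

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(x):
    return np.maximum(x, 0.0)

def residual_block(x, w1, w2):
    # Identity shortcut around two linear+ReLU layers: the core idea of a
    # residual unit (y = x + F(x)).
    return relu(x + w2 @ relu(w1 @ x))

d = 16  # hypothetical feature dimension per modality
# Hypothetical pre-extracted audio and visual feature vectors for one video.
audio_feat = rng.standard_normal(d)
visual_feat = rng.standard_normal(d)

# One residual block per modality stream (weights would be learned end-to-end).
wa1, wa2 = rng.standard_normal((d, d)) * 0.1, rng.standard_normal((d, d)) * 0.1
wv1, wv2 = rng.standard_normal((d, d)) * 0.1, rng.standard_normal((d, d)) * 0.1
audio_h = residual_block(audio_feat, wa1, wa2)
visual_h = residual_block(visual_feat, wv1, wv2)

# Late fusion: concatenate both streams, then a linear head with a sigmoid
# squashes each of the Big Five trait predictions into [0, 1].
fused = np.concatenate([audio_h, visual_h])
w_out = rng.standard_normal((5, 2 * d)) * 0.1
traits = 1.0 / (1.0 + np.exp(-(w_out @ fused)))
print(traits.shape)  # (5,)
```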
TextureGAN: Controlling Deep Image Synthesis with Texture Patches
In this paper, we investigate deep image synthesis guided by sketch, color,
and texture. Previous image synthesis methods can be controlled by sketch and
color strokes, but we are the first to examine texture control. We allow a user
to place a texture patch on a sketch at arbitrary locations and scales to
control the desired output texture. Our generative network learns to synthesize
objects consistent with these texture suggestions. To achieve this, we develop
a local texture loss in addition to adversarial and content loss to train the
generative network. We conduct experiments using sketches generated from real
images and textures sampled from a separate texture database and results show
that our proposed algorithm is able to generate plausible images that are
faithful to user controls. Ablation studies show that our proposed pipeline can
generate more realistic images than adapting existing methods directly.
Comment: CVPR 2018 spotlight.
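One common way to realize a "local texture loss" is to compare Gram-matrix statistics of a local window of the generated feature map against the reference texture patch. The NumPy sketch below is an assumed formulation for illustration; the paper's exact loss may differ.

```python
import numpy as np

def gram_matrix(feats):
    # feats: (C, H, W) feature map. The Gram matrix captures channel
    # co-activation statistics, a standard stand-in for "texture".
    c, h, w = feats.shape
    f = feats.reshape(c, h * w)
    return (f @ f.T) / (c * h * w)

def local_texture_loss(gen_feats, ref_feats, y, x, size):
    # Compare texture statistics only inside a local window of the generated
    # feature map, at the location where the user placed the texture patch.
    patch = gen_feats[:, y:y + size, x:x + size]
    ref = ref_feats[:, :size, :size]
    g1, g2 = gram_matrix(patch), gram_matrix(ref)
    return float(np.mean((g1 - g2) ** 2))

rng = np.random.default_rng(1)
gen = rng.standard_normal((8, 32, 32))   # hypothetical generated feature map
ref = rng.standard_normal((8, 16, 16))   # hypothetical reference texture patch
print(local_texture_loss(gen, gen, 0, 0, 16))       # 0.0: identical statistics
print(local_texture_loss(gen, ref, 4, 4, 16) > 0.0)  # True: textures differ
```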
Sketch Plus Colorization Deep Convolutional Neural Networks for Photos Generation from Sketches
In this paper, we introduce a method to generate photos from sketches using Deep Convolutional Neural Networks (DCNN). This research proposes a method that combines a network that inverts sketches into photos (sketch inversion net) with a network that predicts color from grayscale images (colorization net). With this method, the generated photos are expected to be more similar to the actual photos. We first artificially constructed uncontrolled conditions for the dataset. The dataset, which consists of hand-drawn sketches and their corresponding photos, was pre-processed using several data augmentation techniques to train the models to address rotation, scaling, shape, noise, and positioning. Validation was measured using two types of similarity measurements: pixel-difference based metrics and human visual system (HVS) metrics, which mimic human perception in evaluating image quality. The pixel-difference based metrics consist of Mean Squared Error (MSE) and Peak Signal-to-Noise Ratio (PSNR), while the HVS metrics consist of the Universal Image Quality Index (UIQI) and Structural Similarity (SSIM). Our method gives the best quality of generated photos on all measures (844.04 for MSE, 19.06 for PSNR, 0.47 for UIQI, and 0.66 for SSIM).
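The pixel-difference metrics are straightforward to compute; here is a minimal NumPy sketch of MSE and PSNR, assuming 8-bit images with a peak value of 255 (the toy images below are illustrative, not the paper's data):

```python
import numpy as np

def mse(a, b):
    # Mean Squared Error over all pixels.
    return float(np.mean((a.astype(np.float64) - b.astype(np.float64)) ** 2))

def psnr(a, b, max_val=255.0):
    # Peak Signal-to-Noise Ratio in dB; higher means closer to the reference.
    m = mse(a, b)
    return float("inf") if m == 0 else 10.0 * np.log10(max_val ** 2 / m)

ref = np.full((4, 4), 100, dtype=np.uint8)   # toy reference image
gen = np.full((4, 4), 110, dtype=np.uint8)   # toy generated image, off by 10
print(mse(ref, gen))             # 100.0
print(round(psnr(ref, gen), 2))  # 28.13  (10 * log10(255^2 / 100))
```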
An improved Siamese network for face sketch recognition
Face sketch recognition identifies a face photo from a large face sketch dataset given a sketch. Traditional methods typically reduce the modality gap between face photos and sketches and achieve excellent recognition rates by matching against a pseudo-image synthesized from the corresponding face photo. However, these methods do not achieve high recognition rates across all face sketch datasets, because their extracted features cannot eliminate the effect of the modality difference between the images. Feature representations learned by deep convolutional neural networks offer a more widely applicable approach to identification: they can be adapted to extract features that suppress the difference between face photos and sketches, and networks that learn optimal local features achieve high recognition rates even when the input image exhibits geometric distortions. However, overfitting leads to unsatisfactory performance of deep learning methods on face sketch recognition tasks, and sketch images are too simple to yield effective features. This paper aims to increase the matching rate using a Siamese convolutional network architecture. The framework extracts useful features from each image pair to reduce the modality gap, and data augmentation is used to avoid overfitting. We explore the performance of three loss functions and compare the similarity between each image pair. The experimental results show that our framework performs well on a composite sketch dataset, and that data augmentation and the modified network structure reduce the influence of overfitting.
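A Siamese network is typically trained with a pairwise loss over the two embeddings. As one plausible choice (named here only as an assumption; the abstract does not specify which three losses are compared), a contrastive loss can be sketched as:

```python
import numpy as np

def contrastive_loss(emb1, emb2, same, margin=1.0):
    # Pull embeddings of matching photo/sketch pairs together; push
    # non-matching pairs at least `margin` apart in embedding space.
    d = np.linalg.norm(emb1 - emb2)
    if same:
        return 0.5 * d ** 2
    return 0.5 * max(0.0, margin - d) ** 2

# Toy 2-D embeddings for illustration.
photo = np.array([0.0, 0.0])
sketch_match = np.array([0.3, 0.4])   # distance 0.5 from photo
sketch_other = np.array([3.0, 4.0])   # distance 5.0 from photo
print(round(contrastive_loss(photo, sketch_match, True), 3))   # 0.125
print(contrastive_loss(photo, sketch_other, False))            # 0.0: already beyond margin
```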
Music SketchNet: Controllable Music Generation via Factorized Representations of Pitch and Rhythm
Drawing an analogy with automatic image completion systems, we propose Music
SketchNet, a neural network framework that allows users to specify partial
musical ideas guiding automatic music generation. We focus on generating the
missing measures in incomplete monophonic musical pieces, conditioned on
surrounding context, and optionally guided by user-specified pitch and rhythm
snippets. First, we introduce SketchVAE, a novel variational autoencoder that
explicitly factorizes rhythm and pitch contour to form the basis of our
proposed model. Then we introduce two discriminative architectures,
SketchInpainter and SketchConnector, that in conjunction perform the guided
music completion, filling in representations for the missing measures
conditioned on surrounding context and user-specified snippets. We evaluate
SketchNet on a standard dataset of Irish folk music and compare with models
from recent works. When used for music completion, our approach outperforms the
state-of-the-art both in terms of objective metrics and subjective listening
tests. Finally, we demonstrate that our model can successfully incorporate
user-specified snippets during the generation process.
Comment: 8 pages, 8 figures, Proceedings of the 21st International Society for
Music Information Retrieval Conference, ISMIR 2020.
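The pitch/rhythm factorization at the heart of SketchVAE can be illustrated on a toy token sequence. The token scheme below ("__" for hold, "R" for rest) and the function are hypothetical, not SketchVAE's actual encoding:

```python
# Toy factorization of a monophonic measure into separate pitch and rhythm
# streams: the pitch contour keeps only the note values, while the rhythm
# stream keeps only where onsets, holds, and rests fall.

def factorize(measure):
    pitches = [t for t in measure if t not in ("__", "R")]            # pitch contour
    rhythm = ["O" if t not in ("__", "R") else t for t in measure]    # onset pattern
    return pitches, rhythm

# MIDI note numbers with holds and a rest, one measure of eighth notes.
measure = [60, "__", 62, "R", 64, "__", "__", 65]
pitches, rhythm = factorize(measure)
print(pitches)  # [60, 62, 64, 65]
print(rhythm)   # ['O', '__', 'O', 'R', 'O', '__', '__', 'O']
```

Two separate streams like these can then be encoded independently, which is what lets a user sketch rhythm without committing to pitch, or vice versa.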
Synthesizing Images of Humans in Unseen Poses
We address the computational problem of novel human pose synthesis. Given an
image of a person and a desired pose, we produce a depiction of that person in
that pose, retaining the appearance of both the person and background. We
present a modular generative neural network that synthesizes unseen poses using
training pairs of images and poses taken from human action videos. Our network
separates a scene into different body part and background layers, moves body
parts to new locations and refines their appearances, and composites the new
foreground with a hole-filled background. These subtasks, implemented with
separate modules, are trained jointly using only a single target image as a
supervised label. We use an adversarial discriminator to force our network to
synthesize realistic details conditioned on pose. We demonstrate image
synthesis results on three action classes: golf, yoga/workouts and tennis, and
show that our method produces accurate results within action classes as well as
across action classes. Given a sequence of desired poses, we also produce
coherent videos of actions.
Comment: CVPR 2018.
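The layer-compositing step can be illustrated in NumPy: shift a masked body-part layer to a new location and alpha-blend it over a background whose hole has been filled. The shapes, the shift operation, and the blending below are illustrative assumptions, not the paper's learned modules:

```python
import numpy as np

def composite(bg, fg, alpha, dy, dx):
    # Shift the foreground body-part layer and its alpha mask by (dy, dx),
    # then alpha-blend the shifted layer over the (hole-filled) background.
    fg_shift = np.roll(np.roll(fg, dy, axis=0), dx, axis=1)
    a_shift = np.roll(np.roll(alpha, dy, axis=0), dx, axis=1)
    return a_shift * fg_shift + (1.0 - a_shift) * bg

bg = np.zeros((6, 6))                 # toy hole-filled background
fg = np.full((6, 6), 1.0)             # toy body-part appearance layer
alpha = np.zeros((6, 6))
alpha[1:3, 1:3] = 1.0                 # 2x2 body-part mask at its source location
out = composite(bg, fg, alpha, 2, 2)  # move the part down-right by (2, 2)
print(out[3:5, 3:5].sum())  # 4.0: the part now occupies the shifted region
print(out.sum())            # 4.0: everywhere else remains background
```

In the paper's setting each layer would carry refined appearance rather than constants, but the final composition is still a masked blend of moved foreground over filled background.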