20 research outputs found

    Deep Impression: Audiovisual Deep Residual Networks for Multimodal Apparent Personality Trait Recognition

    Here, we develop an audiovisual deep residual network for multimodal apparent personality trait recognition. The network is trained end-to-end to predict the Big Five personality traits of people from their videos. That is, the network does not require any feature engineering or visual analysis such as face detection, face landmark alignment, or facial expression recognition. The network recently took third place in the ChaLearn First Impressions Challenge with a test accuracy of 0.9109.
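
    A minimal sketch of what such a two-stream audiovisual model might look like in PyTorch is shown below. The ResNet-18 backbones, the late-fusion-by-concatenation scheme, and all layer sizes are illustrative assumptions, not the authors' exact architecture.

```python
# Hedged sketch: two residual streams (video frames, audio spectrograms)
# fused late and regressed onto five trait scores in [0, 1].
import torch
import torch.nn as nn
from torchvision.models import resnet18

class AudiovisualTraitNet(nn.Module):
    def __init__(self, num_traits=5):
        super().__init__()
        # Visual stream: ResNet-18 over RGB frames.
        self.visual = resnet18(weights=None)
        self.visual.fc = nn.Identity()          # expose 512-d features
        # Audio stream: ResNet-18 over 1-channel log-mel spectrograms.
        self.audio = resnet18(weights=None)
        self.audio.conv1 = nn.Conv2d(1, 64, kernel_size=7, stride=2,
                                     padding=3, bias=False)
        self.audio.fc = nn.Identity()
        # Late fusion by concatenation, then regress the five traits.
        self.head = nn.Sequential(
            nn.Linear(512 + 512, 256), nn.ReLU(),
            nn.Linear(256, num_traits), nn.Sigmoid()  # traits scored in [0, 1]
        )

    def forward(self, frames, spectrograms):
        v = self.visual(frames)        # (B, 512)
        a = self.audio(spectrograms)   # (B, 512)
        return self.head(torch.cat([v, a], dim=1))

model = AudiovisualTraitNet()
traits = model(torch.randn(2, 3, 224, 224), torch.randn(2, 1, 128, 128))
print(traits.shape)  # torch.Size([2, 5])
```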

    TextureGAN: Controlling Deep Image Synthesis with Texture Patches

    In this paper, we investigate deep image synthesis guided by sketch, color, and texture. Previous image synthesis methods can be controlled by sketch and color strokes, but we are the first to examine texture control. We allow a user to place a texture patch on a sketch at arbitrary locations and scales to control the desired output texture. Our generative network learns to synthesize objects consistent with these texture suggestions. To achieve this, we develop a local texture loss, in addition to adversarial and content losses, to train the generative network. We conduct experiments using sketches generated from real images and textures sampled from a separate texture database, and the results show that our proposed algorithm is able to generate plausible images that are faithful to user controls. Ablation studies show that our proposed pipeline can generate more realistic images than adapting existing methods directly. Comment: CVPR 2018 spotlight.
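
    The local texture loss is the paper's key addition. One plausible way to realize it, sketched below, is a Gram-matrix style loss restricted to a user-placed patch; the patch-cropping scheme and the plain Gram statistics are assumptions, and the adversarial and content terms of the full objective are omitted.

```python
# Hedged sketch of a "local texture loss": Gram-matrix (style) statistics
# compared only inside one patch region, so texture is matched locally
# rather than over the whole image.
import torch
import torch.nn.functional as F

def gram_matrix(feats):
    # feats: (B, C, H, W) feature maps from some encoder.
    b, c, h, w = feats.shape
    f = feats.reshape(b, c, h * w)
    return f @ f.transpose(1, 2) / (c * h * w)

def local_texture_loss(gen_feats, tex_feats, top, left, size):
    # Compare texture statistics inside one user-placed patch region.
    gen_patch = gen_feats[:, :, top:top + size, left:left + size]
    tex_patch = tex_feats[:, :, :size, :size]
    return F.mse_loss(gram_matrix(gen_patch), gram_matrix(tex_patch))

loss = local_texture_loss(torch.randn(1, 64, 64, 64),
                          torch.randn(1, 64, 64, 64), top=8, left=8, size=16)
```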

    Sketch Plus Colorization Deep Convolutional Neural Networks for Photos Generation from Sketches

    In this paper, we introduce a method to generate photos from sketches using deep convolutional neural networks (DCNNs). This research proposes a method that combines a network that inverts sketches into photos (sketch inversion net) with a network that predicts color given grayscale images (colorization net). With this method, the generated photos are expected to be more similar to the actual photos. We first artificially constructed uncontrolled conditions for the dataset. The dataset, which consists of hand-drawn sketches and their corresponding photos, was pre-processed using several data augmentation techniques to train the models to address issues of rotation, scaling, shape, noise, and positioning. Validation was measured using two types of similarity measurements: pixel-difference-based metrics and human visual system (HVS) metrics, which mimic human perception in evaluating image quality. The pixel-difference-based metrics are Mean Squared Error (MSE) and Peak Signal-to-Noise Ratio (PSNR), while the HVS metrics are the Universal Image Quality Index (UIQI) and Structural Similarity (SSIM). Our method gives the best quality of generated photos across all measures (844.04 for MSE, 19.06 for PSNR, 0.47 for UIQI, and 0.66 for SSIM).
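
    The pixel-difference metrics quoted above are standard; a short sketch of how MSE and PSNR are computed follows, with the 8-bit data range and image shapes assumed for illustration.

```python
# MSE and PSNR between a generated photo and its reference, using the
# standard formulas: PSNR = 10 * log10(MAX^2 / MSE).
import numpy as np

def mse(a, b):
    return np.mean((a.astype(np.float64) - b.astype(np.float64)) ** 2)

def psnr(a, b, max_val=255.0):
    m = mse(a, b)
    return float('inf') if m == 0 else 10.0 * np.log10(max_val ** 2 / m)

gen = np.random.randint(0, 256, (256, 256, 3), dtype=np.uint8)
ref = np.random.randint(0, 256, (256, 256, 3), dtype=np.uint8)
print(f"MSE={mse(gen, ref):.2f}  PSNR={psnr(gen, ref):.2f} dB")
```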

    An improved Siamese network for face sketch recognition

    Face sketch recognition matches a face sketch against a large dataset of face photos. Traditional methods typically reduce the modality gap between face photos and sketches, and achieve excellent recognition rates, by matching against a pseudo-image synthesized from the corresponding face photo. However, these methods cannot achieve high recognition rates on all face sketch datasets, because the extracted features cannot eliminate the effect of the modality difference between the images. Feature representations from deep convolutional neural networks are a feasible alternative for identification and have found wider application than other methods; they can be adapted to extract features that eliminate the difference between face photos and sketches. Networks built on learned optimal local features achieve high recognition rates even when the input image is geometrically distorted. However, overfitting leads to unsatisfactory performance of deep learning methods on face sketch recognition tasks, and sketch images are too simple to yield effective features on their own. This paper aims to increase the matching rate using a Siamese convolutional network architecture. The framework extracts useful features from each image pair to reduce the modality gap, and data augmentation is used to avoid overfitting. We explore the performance of three loss functions and compare the similarity between each image pair. The experimental results show that our framework is adequate for a composite sketch dataset, and that it reduces the influence of overfitting through data augmentation and a modified network structure.
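
    A hedged sketch of a Siamese photo-sketch matcher trained with a contrastive loss, one plausible member of the loss-function families such work compares, is shown below; the shared CNN, embedding size, and margin are illustrative assumptions.

```python
# Minimal Siamese matcher: both modalities pass through the *same*
# weights, which pulls photo and sketch features into a shared space.
import torch
import torch.nn as nn
import torch.nn.functional as F

class SiameseEncoder(nn.Module):
    def __init__(self, dim=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(1, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(64, dim),
        )

    def forward(self, photo, sketch):
        return self.net(photo), self.net(sketch)

def contrastive_loss(f1, f2, same, margin=1.0):
    # Pull matched pairs together, push mismatched pairs past the margin.
    d = F.pairwise_distance(f1, f2)
    return torch.mean(same * d ** 2 +
                      (1 - same) * torch.clamp(margin - d, min=0) ** 2)

enc = SiameseEncoder()
f_p, f_s = enc(torch.randn(4, 1, 64, 64), torch.randn(4, 1, 64, 64))
loss = contrastive_loss(f_p, f_s, same=torch.tensor([1., 0., 1., 0.]))
```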

    Music SketchNet: Controllable Music Generation via Factorized Representations of Pitch and Rhythm

    Drawing an analogy with automatic image completion systems, we propose Music SketchNet, a neural network framework that allows users to specify partial musical ideas guiding automatic music generation. We focus on generating the missing measures in incomplete monophonic musical pieces, conditioned on surrounding context, and optionally guided by user-specified pitch and rhythm snippets. First, we introduce SketchVAE, a novel variational autoencoder that explicitly factorizes rhythm and pitch contour to form the basis of our proposed model. Then we introduce two discriminative architectures, SketchInpainter and SketchConnector, that in conjunction perform the guided music completion, filling in representations for the missing measures conditioned on surrounding context and user-specified snippets. We evaluate SketchNet on a standard dataset of Irish folk music and compare with models from recent works. When used for music completion, our approach outperforms the state of the art in terms of both objective metrics and subjective listening tests. Finally, we demonstrate that our model can successfully incorporate user-specified snippets during the generation process. Comment: 8 pages, 8 figures, Proceedings of the 21st International Society for Music Information Retrieval Conference, ISMIR 2020.
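
    A speculative sketch of the factorization idea behind SketchVAE follows: two encoders produce separate pitch and rhythm latents that a single decoder consumes. The GRU encoders, latent sizes, and token vocabulary are assumptions, not the paper's configuration.

```python
# Hedged sketch of a VAE whose latent code is split into pitch and
# rhythm factors, each with its own encoder and reparameterized sample.
import torch
import torch.nn as nn

class FactorizedSketchVAE(nn.Module):
    def __init__(self, vocab=130, z_pitch=64, z_rhythm=64, hidden=256):
        super().__init__()
        self.embed = nn.Embedding(vocab, hidden)
        # Separate recurrent encoders for the two factors.
        self.enc_pitch = nn.GRU(hidden, hidden, batch_first=True)
        self.enc_rhythm = nn.GRU(hidden, hidden, batch_first=True)
        self.to_pitch = nn.Linear(hidden, 2 * z_pitch)    # mu, logvar
        self.to_rhythm = nn.Linear(hidden, 2 * z_rhythm)
        # A single decoder consumes the concatenated factors.
        self.dec = nn.GRU(z_pitch + z_rhythm, hidden, batch_first=True)
        self.out = nn.Linear(hidden, vocab)

    @staticmethod
    def sample(stats):
        mu, logvar = stats.chunk(2, dim=-1)
        return mu + torch.exp(0.5 * logvar) * torch.randn_like(mu)

    def forward(self, pitch_tokens, rhythm_tokens, steps=16):
        _, hp = self.enc_pitch(self.embed(pitch_tokens))
        _, hr = self.enc_rhythm(self.embed(rhythm_tokens))
        z = torch.cat([self.sample(self.to_pitch(hp[-1])),
                       self.sample(self.to_rhythm(hr[-1]))], dim=-1)
        # Feed the latent at every decoding step of the measure.
        h, _ = self.dec(z.unsqueeze(1).repeat(1, steps, 1))
        return self.out(h)  # (B, steps, vocab) token logits

vae = FactorizedSketchVAE()
logits = vae(torch.randint(0, 130, (2, 16)), torch.randint(0, 130, (2, 16)))
```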

    Synthesizing Images of Humans in Unseen Poses

    We address the computational problem of novel human pose synthesis. Given an image of a person and a desired pose, we produce a depiction of that person in that pose, retaining the appearance of both the person and the background. We present a modular generative neural network that synthesizes unseen poses using training pairs of images and poses taken from human action videos. Our network separates a scene into different body part and background layers, moves body parts to new locations and refines their appearances, and composites the new foreground with a hole-filled background. These subtasks, implemented with separate modules, are trained jointly using only a single target image as a supervised label. We use an adversarial discriminator to force our network to synthesize realistic details conditioned on pose. We demonstrate image synthesis results on three action classes: golf, yoga/workouts, and tennis, and show that our method produces accurate results within action classes as well as across action classes. Given a sequence of desired poses, we also produce coherent videos of actions. Comment: CVPR 2018.
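
    The final compositing step described above can be illustrated with a short sketch: a re-posed foreground layer is alpha-composited over a hole-filled background. A single soft-masked foreground layer is a simplifying assumption; the paper describes multiple body-part layers with learned modules for each stage.

```python
# Alpha compositing of a (re-posed) foreground over an inpainted
# background; mask is 1 where the person occludes the scene.
import torch

def composite(foreground, background, mask):
    # All tensors are (B, C, H, W); the background is assumed to be
    # already hole-filled where the person used to stand.
    return mask * foreground + (1.0 - mask) * background

fg = torch.rand(1, 3, 128, 128)
bg = torch.rand(1, 3, 128, 128)
mask = torch.rand(1, 1, 128, 128)   # soft mask, broadcast over channels
out = composite(fg, bg, mask)
```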