    Towards Fine-grained Human Pose Transfer with Detail Replenishing Network

    Human pose transfer (HPT) is an emerging research topic with huge potential in fashion design, media production, online advertising and virtual reality. For these applications, the visual realism of fine-grained appearance details is crucial for production quality and user engagement. However, existing HPT methods often suffer from three fundamental issues: detail deficiency, content ambiguity and style inconsistency, which severely degrade the visual quality and realism of generated images. Aiming towards real-world applications, we develop a more challenging yet practical HPT setting, termed Fine-grained Human Pose Transfer (FHPT), with a higher focus on semantic fidelity and detail replenishment. Concretely, we analyze the potential design flaws of existing methods via an illustrative example, and establish the core FHPT methodology by combining the ideas of content synthesis and feature transfer in a mutually-guided fashion. Thereafter, we substantiate the proposed methodology with a Detail Replenishing Network (DRN) and a corresponding coarse-to-fine model training scheme. Moreover, we build up a complete suite of fine-grained evaluation protocols to address the challenges of FHPT in a comprehensive manner, including semantic analysis, structural detection and perceptual quality assessment. Extensive experiments on the DeepFashion benchmark dataset have verified the power of the proposed benchmark against state-of-the-art works, with a 12%-14% gain on top-10 retrieval recall, 5% higher joint localization accuracy, and a nearly 40% gain on face identity preservation. Moreover, the evaluation results offer further insights into the subject matter, which could inspire many promising future works along this direction. Comment: IEEE TIP submission

    A Hybrid Model for Identity Obfuscation by Face Replacement

    As more and more personal photos are shared and tagged in social media, avoiding privacy risks such as unintended recognition becomes increasingly challenging. We propose a new hybrid approach to obfuscate identities in photos by head replacement. Our approach combines state-of-the-art parametric face synthesis with the latest advances in Generative Adversarial Networks (GANs) for data-driven image synthesis. On the one hand, the parametric part of our method gives us control over the facial parameters and allows for explicit manipulation of the identity. On the other hand, the data-driven aspects allow for adding fine details and overall realism as well as seamless blending into the scene context. In our experiments, we show highly realistic output of our system that improves over the previous state of the art in obfuscation rate while preserving a higher similarity to the original image content. Comment: ECCV'18, camera-ready version

    Adversarial Generation of Training Examples: Applications to Moving Vehicle License Plate Recognition

    Generative Adversarial Networks (GANs) have attracted much research attention recently, leading to impressive results for natural image generation. However, to date little success has been observed in using GAN-generated images to improve classification tasks. Here we explore, in the context of car license plate recognition, whether it is possible to generate synthetic training data using a GAN to improve recognition accuracy. With a carefully designed pipeline, we show that the answer is affirmative. First, a large-scale image set is generated using the generator of the GAN, without manual annotation. Then, these images are fed to a deep convolutional neural network (DCNN) followed by a bidirectional recurrent neural network (BRNN) with long short-term memory (LSTM), which performs the feature learning and sequence labelling. Finally, the pre-trained model is fine-tuned on real images. Our experimental results on a few data sets demonstrate the effectiveness of using GAN images: an improvement of 7.5% over a strong baseline when moderate-sized real data is available. We show that the proposed framework achieves competitive recognition accuracy on challenging test datasets. We also leverage depthwise separable convolution to construct a lightweight convolutional RNN, which is about half the size and 2x faster on CPU. Combining this framework and the proposed pipeline, we make progress in performing accurate recognition on mobile and embedded devices.

    An Adaptive Fuzzy-Based System to Simulate, Quantify and Compensate Color Blindness

    About 8% of the world's male population is affected by some type of color vision deficiency, ranging from partial to complete loss of the ability to distinguish certain colors. A considerable number of color-blind people live their whole lives without knowing they have color vision disabilities and abnormalities. Today, advances in information technology and computer science, specifically image processing techniques and computer graphics, can be fundamental in aiding the development of adaptive color blindness correction tools. This paper presents a software tool based on fuzzy logic to evaluate the type and degree of color blindness a person suffers from. In order to model several degrees of color blindness, in this work we modified the classical linear transform-based simulation method by the use of fuzzy parameters. We also propose four new methods to correct color blindness based on a fuzzy approach: Methods A and B, each with and without histogram equalization. All the methods are based on combinations of linear transforms and histogram operations. To evaluate the results, we implemented a web-based survey to determine which method best helps viewers distinguish different elements in an image. Results obtained from 40 volunteers showed that Method B with histogram equalization gave the best results for about 47% of volunteers.
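The abstract does not give the transform itself. As a minimal sketch of a linear-transform simulation scaled by a severity parameter, the Python below blends the identity matrix with a fixed color-deficiency matrix. The matrix values, the names `blend_matrix` and `simulate`, and the use of a single scalar `degree` in place of the paper's fuzzy parameters are all assumptions made for illustration, not the authors' method.

```python
# Illustrative placeholder matrix (assumption, not taken from the paper).
# Each row sums to 1 so that neutral grays are left unchanged.
DEFICIENCY = [
    [0.567, 0.433, 0.000],
    [0.558, 0.442, 0.000],
    [0.000, 0.242, 0.758],
]
IDENTITY = [
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
]


def blend_matrix(degree):
    """Interpolate between normal vision (0.0) and full deficiency (1.0)."""
    if not 0.0 <= degree <= 1.0:
        raise ValueError("degree must lie in [0, 1]")
    return [
        [(1.0 - degree) * IDENTITY[r][c] + degree * DEFICIENCY[r][c]
         for c in range(3)]
        for r in range(3)
    ]


def simulate(rgb, degree):
    """Apply the blended 3x3 transform to one RGB triple in [0, 1]."""
    m = blend_matrix(degree)
    return tuple(sum(m[r][c] * rgb[c] for c in range(3)) for r in range(3))
```

With `degree` set to 0 the color passes through unchanged; at 1 the full placeholder matrix is applied. A fuzzy system would presumably supply the degree from membership functions rather than a fixed scalar.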

    Using Contour Trees in the Analysis and Visualization of Radio Astronomy Data Cubes

    The current generation of radio and millimeter telescopes, particularly the Atacama Large Millimeter Array (ALMA), offers enormous advances in observing capabilities. While these advances represent an unprecedented opportunity to facilitate scientific understanding, the increased complexity in the spatial and spectral structure of these ALMA data cubes leads to challenges in their interpretation. In this paper, we perform a feasibility study for applying topological data analysis and visualization techniques never before tested by the ALMA community. Through techniques based on contour trees, we seek to improve upon existing analysis and visualization workflows of ALMA data cubes, in terms of accuracy and speed in feature extraction. We review our application development process in building effective analysis and visualization capabilities for astrophysicists. We also summarize effective design practices by identifying domain-specific needs of simplicity, integrability, and reproducibility, in order to best target and serve the large astrophysics community.

    Data-Driven Shape Analysis and Processing

    Data-driven methods play an increasingly important role in discovering geometric, structural, and semantic relationships between 3D shapes in collections, and applying this analysis to support intelligent modeling, editing, and visualization of geometric data. In contrast to traditional approaches, a key feature of data-driven approaches is that they aggregate information from a collection of shapes to improve the analysis and processing of individual shapes. In addition, they are able to learn models that reason about properties and relationships of shapes without relying on hard-coded rules or explicitly programmed instructions. We provide an overview of the main concepts and components of these techniques, and discuss their application to shape classification, segmentation, matching, reconstruction, modeling and exploration, as well as scene analysis and synthesis, by reviewing the literature and relating the existing works with both qualitative and numerical comparisons. We conclude our report with ideas that can inspire future research in data-driven shape analysis and processing. Comment: 10 pages, 19 figures

    A statistical multiresolution approach for face recognition using structural hidden Markov models

    This paper introduces a novel methodology that combines the multiresolution feature of the discrete wavelet transform (DWT) with the local interactions of the facial structures expressed through the structural hidden Markov model (SHMM). A range of wavelet filters such as Haar, biorthogonal 9/7, and Coiflet, as well as Gabor, have been implemented in order to search for the best performance. SHMMs perform a thorough probabilistic analysis of any sequential pattern by revealing both its inner and outer structures simultaneously. Unlike traditional HMMs, SHMMs do not assume state-conditional independence of the visible observation sequence. This is achieved via the concept of local structures introduced by the SHMMs. Therefore, the long-range dependency problem inherent to traditional HMMs has been drastically reduced. SHMMs have not previously been applied to the problem of face identification. The results reported in this application have shown that the SHMM outperforms the traditional hidden Markov model with a 73% increase in accuracy.
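To make the "multiresolution feature" of the DWT concrete, here is a minimal sketch of a Haar decomposition on a one-dimensional signal. It is an illustration only: the paper works on face images (a 2D transform) and with several filter families, and the function names `haar_step` and `haar_multiresolution` are hypothetical.

```python
import math


def haar_step(signal):
    """One level of the orthonormal Haar DWT on an even-length sequence.

    Returns (approximation, detail), each half the input length.
    """
    if len(signal) % 2 != 0:
        raise ValueError("signal length must be even")
    s = math.sqrt(2.0)
    approx = [(signal[i] + signal[i + 1]) / s for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) / s for i in range(0, len(signal), 2)]
    return approx, detail


def haar_multiresolution(signal, levels):
    """Repeatedly decompose the approximation band.

    Returns the coarsest approximation and the detail bands,
    ordered from finest to coarsest.
    """
    details = []
    current = list(signal)
    for _ in range(levels):
        current, detail = haar_step(current)
        details.append(detail)
    return current, details
```

Each level halves the resolution: the approximation band carries the coarse structure and the detail bands carry what was lost at each scale. For a 2D image the same step would be applied along rows and then columns.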

    Hybrid Distortion Aggregated Visual Comfort Assessment for Stereoscopic Image Retargeting

    Visual comfort is an important factor in 3D media services. Few research efforts have been carried out in this area, especially for 3D content retargeting, which may introduce more complicated visual distortions. In this paper, we propose a Hybrid Distortion Aggregated Visual Comfort Assessment (HDA-VCA) scheme for stereoscopic retargeted images (SRI), considering aggregation of hybrid distortions including structure distortion, information loss, binocular incongruity and semantic distortion. Specifically, a Local-SSIM feature is proposed to reflect the local structural distortion of SRI, and information loss is represented by a Dual Natural Scene Statistics (D-NSS) feature extracted from the binocular summation and difference channels. Regarding binocular incongruity, the visual comfort zone, window violation, binocular rivalry, and accommodation-vergence conflict of the human visual system (HVS) are evaluated. Finally, the semantic distortion is represented by the correlation distance of paired feature maps extracted from the original stereoscopic image and its retargeted image using a trained deep neural network. We validate the effectiveness of HDA-VCA on the published Stereoscopic Image Retargeting Database (SIRD) and two stereoscopic image databases, IEEE-SA and NBU 3D-VCA. The results demonstrate HDA-VCA's superior performance in handling hybrid distortions compared to state-of-the-art VCA schemes. Comment: 13 pages, 11 figures, 4 tables
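The abstract does not define "correlation distance". A common definition, assumed here, is one minus the Pearson correlation between two flattened feature vectors. The sketch below uses that definition; the function name `correlation_distance` and the idea of flattening each feature map into a plain list are assumptions for illustration.

```python
import math


def correlation_distance(u, v):
    """Return 1 - Pearson correlation of two equal-length sequences.

    0 means perfectly correlated, 1 uncorrelated, 2 perfectly anti-correlated.
    """
    if len(u) != len(v) or len(u) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    mean_u = sum(u) / len(u)
    mean_v = sum(v) / len(v)
    du = [x - mean_u for x in u]
    dv = [y - mean_v for y in v]
    numerator = sum(a * b for a, b in zip(du, dv))
    denominator = math.sqrt(sum(a * a for a in du)) * math.sqrt(sum(b * b for b in dv))
    if denominator == 0.0:
        raise ValueError("correlation is undefined for a constant input")
    return 1.0 - numerator / denominator
```

Under this assumed definition, a retargeted image whose features move in step with the original's gives a distance near 0, while larger values suggest greater semantic change.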

    Superimposition-guided Facial Reconstruction from Skull

    We develop a new algorithm to perform facial reconstruction from a given skull. This technique has forensic application in helping the identification of skeletal remains when other information is unavailable. Unlike most existing strategies that directly reconstruct the face from the skull, we utilize a database of portrait photos to create many face candidates, then perform a superimposition to get a well-matched face, and then revise it according to the superimposition. To support this pipeline, we build an effective autoencoder for image-based facial reconstruction, and a generative model for constrained face inpainting. Our experiments have demonstrated that the proposed pipeline is stable and accurate. Comment: 14 pages; 14 figures

    Understanding Image Virality

    Virality of online content on social networking websites is an important but esoteric phenomenon often studied in fields like marketing, psychology and data mining. In this paper we study viral images from a computer vision perspective. We introduce three new image datasets from Reddit, and define a virality score using Reddit metadata. We train classifiers with state-of-the-art image features to predict virality of individual images, relative virality in pairs of images, and the dominant topic of a viral image. We also compare machine performance to human performance on these tasks. We find that computers perform poorly with low-level features, and high-level information is critical for predicting virality. We encode semantic information through relative attributes. We identify the 5 key visual attributes that correlate with virality. We create an attribute-based characterization of images that can predict relative virality with 68.10% accuracy (SVM+Deep Relative Attributes), better than humans at 60.12%. Finally, we study how human prediction of image virality varies with different 'contexts' in which the images are viewed, such as the influence of neighbouring images, images recently viewed, as well as the image title or caption. This work is a first step in understanding the complex but important phenomenon of image virality. Our datasets and annotations will be made publicly available. Comment: Pre-print, IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 201