Towards Fine-grained Human Pose Transfer with Detail Replenishing Network
Human pose transfer (HPT) is an emerging research topic with huge potential
in fashion design, media production, online advertising and virtual reality.
For these applications, the visual realism of fine-grained appearance details
is crucial for production quality and user engagement. However, existing HPT
methods often suffer from three fundamental issues: detail deficiency, content
ambiguity and style inconsistency, which severely degrade the visual quality
and realism of generated images. Aiming towards real-world applications, we
develop a more challenging yet practical HPT setting, termed Fine-grained
Human Pose Transfer (FHPT), with a higher focus on semantic fidelity and detail
replenishment. Concretely, we analyze the potential design flaws of existing
methods via an illustrative example, and establish the core FHPT methodology by
combining the ideas of content synthesis and feature transfer in a
mutually guided fashion. Thereafter, we substantiate the proposed methodology
with a Detail Replenishing Network (DRN) and a corresponding coarse-to-fine
model training scheme. Moreover, we build up a complete suite of fine-grained
evaluation protocols to address the challenges of FHPT in a comprehensive
manner, including semantic analysis, structural detection and perceptual
quality assessment. Extensive experiments on the DeepFashion benchmark dataset
have verified the power of the proposed approach against state-of-the-art
works, with a 12%-14% gain on top-10 retrieval recall, 5% higher joint
localization accuracy, and a nearly 40% gain on face identity preservation.
Moreover, the evaluation results offer further insights into the subject
matter, which could inspire many promising future works along this direction.
Comment: IEEE TIP submission
A Hybrid Model for Identity Obfuscation by Face Replacement
As more and more personal photos are shared and tagged in social media,
avoiding privacy risks such as unintended recognition becomes increasingly
challenging. We propose a new hybrid approach to obfuscate identities in photos
by head replacement. Our approach combines state-of-the-art parametric face
synthesis with the latest advances in Generative Adversarial Networks (GAN) for
data-driven image synthesis. On the one hand, the parametric part of our method
gives us control over the facial parameters and allows for explicit
manipulation of the identity. On the other hand, the data-driven aspects allow
for adding fine details and overall realism as well as seamless blending into
the scene context. In our experiments, we show highly realistic output of our
system that improves over the previous state of the art in obfuscation rate
while preserving a higher similarity to the original image content.
Comment: ECCV'18, camera-ready version
Adversarial Generation of Training Examples: Applications to Moving Vehicle License Plate Recognition
Generative Adversarial Networks (GAN) have attracted much research attention
recently, leading to impressive results for natural image generation. However,
to date little success has been observed in using GAN-generated images to
improve classification tasks. Here we explore, in the context of car license
plate recognition, whether it is possible to generate synthetic training data
using GAN to improve recognition accuracy. With a carefully-designed pipeline,
we show that the answer is affirmative. First, a large-scale image set is
generated using the generator of GAN, without manual annotation. Then, these
images are fed to a deep convolutional neural network (DCNN) followed by a
bidirectional recurrent neural network (BRNN) with long short-term memory
(LSTM), which performs the feature learning and sequence labelling. Finally,
the pre-trained model is fine-tuned on real images. Our experimental results on
a few data sets demonstrate the effectiveness of using GAN-generated images: an
improvement of 7.5% over a strong baseline when only moderate-sized real data
are available. We show that the proposed framework achieves competitive
recognition accuracy on challenging test datasets. We also leverage depthwise
separable convolutions to construct a lightweight convolutional RNN, which is
about half the size and 2x faster on CPU. Combining this framework and the
proposed pipeline, we make progress in performing accurate recognition on
mobile and embedded devices.
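The lightweight model above rests on depthwise separable convolutions; the parameter arithmetic below (a minimal sketch, not the authors' exact architecture) shows where the size savings come from.

```python
def standard_conv_params(c_in, c_out, k):
    """Parameters of a standard k x k convolution (bias ignored)."""
    return c_in * c_out * k * k

def depthwise_separable_params(c_in, c_out, k):
    """A depthwise k x k convolution (one filter per input channel)
    followed by a 1 x 1 pointwise convolution."""
    depthwise = c_in * k * k
    pointwise = c_in * c_out
    return depthwise + pointwise

# Example: a 3x3 layer mapping 128 -> 128 channels.
std = standard_conv_params(128, 128, 3)        # 147456 parameters
sep = depthwise_separable_params(128, 128, 3)  # 17536 parameters
print(std, sep)
```

The per-layer reduction is much larger than 2x; the abstract's "about half the size" refers to the whole model, where fully connected and recurrent layers dilute the savings.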
An Adaptive Fuzzy-Based System to Simulate, Quantify and Compensate Color Blindness
About 8% of the male population of the world is affected by some type of
color vision deficiency, which varies from partial to complete
reduction of the ability to distinguish certain colors. A considerable number
of color blind people live their entire lives without knowing they have
color vision disabilities and abnormalities. Nowadays the evolution of
information technology and computer science, specifically image processing
techniques and computer graphics, can be fundamental to aid at the development
of adaptive color blindness correction tools. This paper presents a software
tool based on Fuzzy Logic to evaluate the type and the degree of color
blindness a person suffers from. In order to model several degrees of color
blindness, in this work we modify the classical linear transform-based
simulation method through the use of fuzzy parameters. We also propose four new
methods to correct color blindness based on a fuzzy approach: Methods A and B,
each with and without histogram equalization. All the methods are based on
combinations of linear transforms and histogram operations. To evaluate the
results, we implemented a web-based survey to determine which method best helps
users distinguish different elements in an image. Results obtained from 40
volunteers showed that Method B with histogram equalization produced the best
results for about 47% of the volunteers.
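The fuzzy modification of the classical linear transform-based simulation can be sketched as a degree-weighted blend between the identity and a full-deficiency matrix. The matrix values below are illustrative placeholders, not the paper's calibrated transform, and the degree parameter stands in for the fuzzy membership value.

```python
# Illustrative protanopia-style matrix (placeholder values, not the
# paper's calibrated transform).
PROTANOPIA = [
    [0.567, 0.433, 0.000],
    [0.558, 0.442, 0.000],
    [0.000, 0.242, 0.758],
]

IDENTITY = [[1.0 if i == j else 0.0 for j in range(3)] for i in range(3)]

def blend_matrices(degree):
    """Interpolate between identity (degree 0, normal vision) and the
    full-deficiency matrix (degree 1) -- the fuzzy degree parameter."""
    return [
        [(1 - degree) * IDENTITY[i][j] + degree * PROTANOPIA[i][j]
         for j in range(3)]
        for i in range(3)
    ]

def simulate(rgb, degree):
    """Apply the degree-weighted linear transform to one RGB pixel."""
    m = blend_matrices(degree)
    return tuple(sum(m[i][j] * rgb[j] for j in range(3)) for i in range(3))

print(simulate((1.0, 0.0, 0.0), 0.0))  # degree 0: pixel unchanged
print(simulate((1.0, 0.0, 0.0), 1.0))  # degree 1: full simulated deficiency
```

Correction methods then work in the opposite direction, composing linear transforms (and optionally histogram equalization) to push confusable colors apart.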
Using Contour Trees in the Analysis and Visualization of Radio Astronomy Data Cubes
The current generation of radio and millimeter telescopes, particularly the
Atacama Large Millimeter Array (ALMA), offers enormous advances in observing
capabilities. While these advances represent an unprecedented opportunity to
facilitate scientific understanding, the increased complexity in the spatial
and spectral structure of these ALMA data cubes leads to challenges in their
interpretation. In this paper, we perform a feasibility study for applying
topological data analysis and visualization techniques never before tested by
the ALMA community. Through techniques based on contour trees, we seek to
improve upon existing analysis and visualization workflows of ALMA data cubes,
in terms of accuracy and speed in feature extraction. We review our application
development process in building effective analysis and visualization
capabilities for the astrophysicists. We also summarize effective design
practices by identifying domain-specific needs of simplicity, integrability,
and reproducibility, in order to best target and service the large astrophysics
community.
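Contour trees generalize the following one-dimensional join-tree idea to full 3D data cubes. This sketch (pure Python, not the authors' implementation) counts components born at local maxima while sweeping from high to low function values; each leaf corresponds to one extracted feature, such as a spectral peak.

```python
def join_tree_leaf_count(values):
    """Count leaves of the join tree of a 1D scalar field: sweeping
    from high to low values, each component born at a local maximum
    contributes one leaf (one extracted 'feature')."""
    n = len(values)
    order = sorted(range(n), key=lambda i: -values[i])
    active = [False] * n
    leaves = 0
    for i in order:
        # A sample with no already-active neighbor starts a new component.
        if not any(0 <= j < n and active[j] for j in (i - 1, i + 1)):
            leaves += 1
        active[i] = True
    return leaves

# A 1D spectrum with two peaks yields two join-tree leaves.
print(join_tree_leaf_count([0, 3, 1, 4, 0]))
```

In 3D, neighbors come from the voxel grid instead of the two adjacent samples, and a union-find structure tracks component merges, but the sweep logic is the same.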
Data-Driven Shape Analysis and Processing
Data-driven methods play an increasingly important role in discovering
geometric, structural, and semantic relationships between 3D shapes in
collections, and applying this analysis to support intelligent modeling,
editing, and visualization of geometric data. In contrast to traditional
approaches, a key feature of data-driven approaches is that they aggregate
information from a collection of shapes to improve the analysis and processing
of individual shapes. In addition, they are able to learn models that reason
about properties and relationships of shapes without relying on hard-coded
rules or explicitly programmed instructions. We provide an overview of the main
concepts and components of these techniques, and discuss their application to
shape classification, segmentation, matching, reconstruction, modeling and
exploration, as well as scene analysis and synthesis, through reviewing the
literature and relating the existing works with both qualitative and numerical
comparisons. We conclude our report with ideas that can inspire future research
in data-driven shape analysis and processing.
Comment: 10 pages, 19 figures
A statistical multiresolution approach for face recognition using structural hidden Markov models
This paper introduces a novel methodology that combines the multiresolution feature of the discrete wavelet transform (DWT) with the local interactions of the facial structures expressed through the structural hidden Markov model (SHMM). A range of wavelet filters such as Haar, biorthogonal 9/7, and Coiflet, as well as Gabor, have been implemented in order to search for the best performance. SHMMs perform a thorough probabilistic analysis of any sequential pattern by revealing both its inner and outer structures simultaneously. Unlike traditional HMMs, SHMMs do not make the assumption of state conditional independence of the visible observation sequence. This is achieved via the concept of local structures introduced by the SHMMs. Therefore, the long-range dependency problem inherent to traditional HMMs has been drastically reduced. SHMMs have not previously been applied to the problem of face identification. The results reported in this application have shown that the SHMM outperforms the traditional hidden Markov model with a 73% increase in accuracy.
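The multiresolution features come from the DWT front end; a one-level orthonormal Haar transform over a 1D signal (a sketch of the wavelet step only, with the SHMM itself omitted) looks like this:

```python
import math

def haar_dwt_1d(signal):
    """One level of the orthonormal Haar DWT: scaled pairwise averages
    (approximation coefficients) and differences (detail coefficients)."""
    assert len(signal) % 2 == 0, "even-length input expected"
    s = math.sqrt(2)
    approx = [(signal[i] + signal[i + 1]) / s
              for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) / s
              for i in range(0, len(signal), 2)]
    return approx, detail

a, d = haar_dwt_1d([4.0, 4.0, 2.0, 0.0])
print(a)  # coarse approximation of the signal
print(d)  # local detail; zero where neighbors are equal
```

For face images the transform is applied separably (rows, then columns), and the resulting coefficient blocks form the observation sequences fed to the model.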
Hybrid Distortion Aggregated Visual Comfort Assessment for Stereoscopic Image Retargeting
Visual comfort is an important factor in 3D media services. Few research
efforts have been carried out in this area, especially in the case of 3D
content retargeting, which may introduce more complicated visual distortions.
In this
paper, we propose a Hybrid Distortion Aggregated Visual Comfort Assessment
(HDA-VCA) scheme for stereoscopic retargeted images (SRI), considering
aggregation of hybrid distortions including structure distortion, information
loss, binocular incongruity and semantic distortion. Specifically, a Local-SSIM
feature is proposed to reflect the local structural distortion of SRI, and
information loss is represented by Dual Natural Scene Statistics (D-NSS)
feature extracted from the binocular summation and difference channels.
Regarding binocular incongruity, visual comfort zone, window violation,
binocular rivalry, and accommodation-vergence conflict of human visual system
(HVS) are evaluated. Finally, the semantic distortion is represented by the
correlation distance of paired feature maps extracted from original
stereoscopic image and its retargeted image by using trained deep neural
network. We validate the effectiveness of HDA-VCA on published Stereoscopic
Image Retargeting Database (SIRD) and two stereoscopic image databases IEEE-SA
and NBU 3D-VCA. The results demonstrate HDA-VCA's superior performance in
handling hybrid distortions compared to state-of-the-art VCA schemes.
Comment: 13 pages, 11 figures, 4 tables
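The Local-SSIM feature builds on the standard SSIM index; a single-window sketch (standard SSIM constants for 8-bit pixel values, not the paper's exact local weighting) is:

```python
def ssim(x, y, c1=(0.01 * 255) ** 2, c2=(0.03 * 255) ** 2):
    """SSIM over a single window of pixel values. The Local-SSIM
    feature applies this per local window of the retargeted image."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    vx = sum((a - mx) ** 2 for a in x) / n
    vy = sum((b - my) ** 2 for b in y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / n
    return (((2 * mx * my + c1) * (2 * cov + c2)) /
            ((mx * mx + my * my + c1) * (vx + vy + c2)))

patch = [10, 20, 30, 40]
print(ssim(patch, patch))  # identical windows score exactly 1.0
```

Structure-distorted windows (same luminance statistics but scrambled layout) score below 1, which is what makes the per-window map a useful local structural-distortion feature.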
Superimposition-guided Facial Reconstruction from Skull
We develop a new algorithm to perform facial reconstruction from a given
skull. This technique has forensic application in helping the identification of
skeletal remains when other information is unavailable. Unlike most existing
strategies that directly reconstruct the face from the skull, we utilize a
database of portrait photos to create many face candidates, then perform a
superimposition to get a well matched face, and then revise it according to the
superimposition. To support this pipeline, we build an effective autoencoder
for image-based facial reconstruction, and a generative model for constrained
face inpainting. Our experiments have demonstrated that the proposed pipeline
is stable and accurate.
Comment: 14 pages; 14 figures
Understanding Image Virality
Virality of online content on social networking websites is an important but
esoteric phenomenon often studied in fields like marketing, psychology and data
mining. In this paper we study viral images from a computer vision perspective.
We introduce three new image datasets from Reddit, and define a virality score
using Reddit metadata. We train classifiers with state-of-the-art image
features to predict virality of individual images, relative virality in pairs
of images, and the dominant topic of a viral image. We also compare machine
performance to human performance on these tasks. We find that computers perform
poorly with low level features, and high level information is critical for
predicting virality. We encode semantic information through relative
attributes. We identify the 5 key visual attributes that correlate with
virality. We create an attribute-based characterization of images that can
predict relative virality with 68.10% accuracy (SVM+Deep Relative Attributes)
-- better than humans at 60.12%. Finally, we study how human prediction of
image virality varies with different `contexts' in which the images are viewed,
such as the influence of neighbouring images, images recently viewed, as well
as the image title or caption. This work is a first step in understanding the
complex but important phenomenon of image virality. Our datasets and
annotations will be made publicly available.
Comment: Pre-print, IEEE Conference on Computer Vision and Pattern Recognition
(CVPR), 2015