TransGaGa: Geometry-Aware Unsupervised Image-to-Image Translation
Unsupervised image-to-image translation aims at learning a mapping between
two visual domains. However, learning a translation across large geometry
variations typically ends in failure. In this work, we present a novel
disentangle-and-translate framework to tackle image-to-image translation
between complex objects. Instead of learning the mapping on the image
space directly, we disentangle image space into a Cartesian product of the
appearance and the geometry latent spaces. Specifically, we first introduce a
geometry prior loss and a conditional VAE loss to encourage the network to
learn independent but complementary representations. The translation is then
built on appearance and geometry space separately. Extensive experiments
demonstrate the superior performance of our method to other state-of-the-art
approaches, especially in the challenging near-rigid and non-rigid objects
translation tasks. In addition, by taking different exemplars as the appearance
references, our method also supports multimodal translation. Project page:
https://wywu.github.io/projects/TGaGa/TGaGa.html
Comment: Accepted to CVPR 2019.
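The disentangle-and-translate idea can be sketched in a few lines. The linear "encoders" below are toy stand-ins for the paper's learned networks, and all names and shapes are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy "encoders": project an image vector into appearance and geometry latents.
# The fixed random matrices stand in for learned networks (illustrative only).
W_app = rng.standard_normal((8, 32))   # appearance encoder
W_geo = rng.standard_normal((4, 32))   # geometry encoder

def disentangle(x):
    """Split image space into a Cartesian product of appearance x geometry."""
    return W_app @ x, W_geo @ x

def translate(x, geo_map):
    """Translate in geometry space only; appearance is carried over unchanged."""
    a, g = disentangle(x)
    return a, geo_map(g)

x = rng.standard_normal(32)
a, g_t = translate(x, geo_map=lambda g: -g)  # toy cross-domain geometry mapping
assert a.shape == (8,) and g_t.shape == (4,)
```

Because the appearance latent passes through untouched, swapping in a different exemplar's appearance code yields the multimodal translations the abstract mentions.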
Multi-View Kernels for Low-Dimensional Modeling of Seismic Events
The problem of learning from seismic recordings has been studied for years.
There is a growing interest in developing automatic mechanisms for identifying
the properties of a seismic event. One main motivation is the ability to
reliably identify man-made explosions. The availability of multiple
high-dimensional observations has increased the use of machine learning
techniques in a variety of fields. In this work, we propose to use a
kernel-fusion based dimensionality reduction framework for generating
meaningful seismic representations from raw data. The proposed method is tested
on 2023 events that were recorded in Israel and in Jordan. The method achieves
promising results in classification of event type as well as in estimating the
location of the event. The proposed fusion and dimensionality reduction tools
may be applied to other types of geophysical data.
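A generic kernel-fusion embedding along these lines (a Gaussian kernel per view, fused by element-wise product, then a diffusion-style spectral embedding) can be sketched as follows; the fusion rule and kernel choice here are assumptions for illustration, not the paper's exact method:

```python
import numpy as np

def gaussian_kernel(X, eps):
    # Pairwise squared distances -> Gaussian affinity matrix.
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / eps)

def fused_embedding(views, eps=1.0, dim=2):
    """Fuse per-view kernels by element-wise product, then embed spectrally."""
    K = np.ones((views[0].shape[0],) * 2)
    for X in views:
        K *= gaussian_kernel(X, eps)
    # Row-normalize to a diffusion-style operator, take leading eigenvectors.
    P = K / K.sum(axis=1, keepdims=True)
    vals, vecs = np.linalg.eig(P)
    order = np.argsort(-vals.real)[1:dim + 1]  # skip the trivial eigenvector
    return vecs.real[:, order] * vals.real[order]

rng = np.random.default_rng(1)
# Two "views" of the same 50 events, e.g. recordings from different sensors.
views = [rng.standard_normal((50, 3)), rng.standard_normal((50, 5))]
emb = fused_embedding(views)
assert emb.shape == (50, 2)
```

Multiplying the kernels keeps only affinities on which the views agree, which is one standard way to suppress view-specific noise before the low-dimensional embedding.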
Geometry-Contrastive GAN for Facial Expression Transfer
In this paper, we propose a Geometry-Contrastive Generative Adversarial
Network (GC-GAN) for transferring continuous emotions across different
subjects. Given an input face with certain emotion and a target facial
expression from another subject, GC-GAN can generate an identity-preserving
face with the target expression. Geometry information is introduced into cGANs
as continuous conditions to guide the generation of facial expressions. In
order to handle the misalignment across different subjects or emotions,
contrastive learning is used to transform geometry manifold into an embedded
semantic manifold of facial expressions. Therefore, the embedded geometry is
injected into the latent space of GANs and controls the emotion generation
effectively. Experimental results demonstrate that our proposed method can be
applied to facial expression transfer even when there are large differences in
facial shapes and expressions between different subjects.
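The contrastive component can be illustrated with the standard pairwise contrastive loss, which pulls embeddings of matching expressions together and pushes mismatched ones at least a margin apart; this is a generic sketch, not GC-GAN's exact formulation:

```python
import numpy as np

def contrastive_loss(e1, e2, same, margin=1.0):
    """Pairwise contrastive loss on geometry embeddings: pull same-expression
    pairs together, push different-expression pairs at least `margin` apart."""
    d = np.linalg.norm(e1 - e2)
    return 0.5 * d**2 if same else 0.5 * max(0.0, margin - d)**2

a = np.array([0.1, 0.2])   # toy geometry embeddings
b = np.array([0.1, 0.25])
c = np.array([2.0, 2.0])
assert contrastive_loss(a, b, True) < contrastive_loss(a, c, True)
assert contrastive_loss(a, c, False) == 0.0  # already farther than the margin
```

Training with such pairs is what aligns landmarks from different subjects onto a shared semantic manifold of expressions, handling the misalignment the abstract describes.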
Beyond Face Rotation: Global and Local Perception GAN for Photorealistic and Identity Preserving Frontal View Synthesis
Photorealistic frontal view synthesis from a single face image has a wide
range of applications in the field of face recognition. Although data-driven
deep learning methods have been proposed to address this problem by seeking
solutions from ample face data, this problem is still challenging because it is
intrinsically ill-posed. This paper proposes a Two-Pathway Generative
Adversarial Network (TP-GAN) for photorealistic frontal view synthesis by
simultaneously perceiving global structures and local details. Four landmark
located patch networks are proposed to attend to local textures in addition to
the commonly used global encoder-decoder network. Beyond the novel
architecture, we make this ill-posed problem well constrained by introducing a
combination of adversarial loss, symmetry loss and identity preserving loss.
The combined loss function leverages both frontal face distribution and
pre-trained discriminative deep face models to guide an identity preserving
inference of frontal views from profiles. Different from previous deep learning
methods that mainly rely on intermediate features for recognition, our method
directly leverages the synthesized identity preserving image for downstream
tasks like face recognition and attribute estimation. Experimental results
demonstrate that our method not only presents compelling perceptual results but
also outperforms state-of-the-art results on large pose face recognition.
Comment: Accepted at ICCV 2017, main paper & supplementary material, 11 pages.
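Of the three losses, the symmetry term is the most self-contained: a frontal face should roughly match its horizontal mirror. A minimal sketch, with illustrative weights rather than the paper's values:

```python
import numpy as np

def symmetry_loss(img):
    """L1 distance between a synthesized frontal face and its horizontal
    mirror -- frontal faces should be roughly left-right symmetric."""
    return np.abs(img - img[:, ::-1]).mean()

def combined_loss(adv, sym, idt, w_sym=0.3, w_idt=0.1):
    # Adversarial + symmetry + identity-preserving terms.
    # The weights here are illustrative assumptions, not the paper's.
    return adv + w_sym * sym + w_idt * idt

# A perfectly left-right symmetric toy "face" incurs zero symmetry loss.
sym_face = np.tile(np.array([1.0, 2.0, 2.0, 1.0]), (4, 1))
assert symmetry_loss(sym_face) == 0.0
assert combined_loss(1.0, 0.0, 0.0) == 1.0
```

The identity term would compare deep features of input and output under a pre-trained face model, which is what keeps the frontalized face recognizable.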
Embedding Text in Hyperbolic Spaces
Natural language text exhibits hierarchical structure in a variety of
respects. Ideally, we could incorporate our prior knowledge of this
hierarchical structure into unsupervised learning algorithms that work on text
data. Recent work by Nickel & Kiela (2017) proposed using hyperbolic instead of
Euclidean embedding spaces to represent hierarchical data and demonstrated
encouraging results when embedding graphs. In this work, we extend their method
with a re-parameterization technique that allows us to learn hyperbolic
embeddings of arbitrarily parameterized objects. We apply this framework to
learn word and sentence embeddings in hyperbolic space in an unsupervised
manner from text corpora. The resulting embeddings seem to encode certain
intuitive notions of hierarchy, such as word-context frequency and phrase
constituency. However, the implicit continuous hierarchy in the learned
hyperbolic space makes interrogating the model's learned hierarchies more
difficult than for models that learn explicit edges between items. The learned
hyperbolic embeddings show improvements over Euclidean embeddings in some --
but not all -- downstream tasks, suggesting that hierarchical organization is
more useful for some tasks than others.
Comment: TextGraphs 201
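The re-parameterization trick can be made concrete: map an unconstrained parameter vector into the open Poincaré ball via its norm, so ordinary gradient-based optimizers never leave the model. The mapping below is one common choice, not necessarily the paper's exact one:

```python
import numpy as np

def to_ball(v, eps=1e-9):
    """Re-parameterize an unconstrained vector into the open unit ball,
    so hyperbolic embeddings can be learned with standard optimizers."""
    n = np.linalg.norm(v) + eps
    return np.tanh(n) * v / n

def poincare_dist(x, y):
    """Geodesic distance in the Poincare ball model of hyperbolic space."""
    num = 2 * np.sum((x - y) ** 2)
    den = (1 - np.sum(x**2)) * (1 - np.sum(y**2))
    return np.arccosh(1 + num / den)

x = to_ball(np.array([0.3, 0.4]))
y = to_ball(np.array([3.0, 4.0]))   # large input still lands inside the ball
assert np.linalg.norm(x) < 1 and np.linalg.norm(y) < 1
# Hyperbolic distance grows much faster than Euclidean near the boundary,
# which is what lets the ball encode tree-like hierarchy.
assert poincare_dist(x, y) > np.linalg.norm(x - y)
```

Points near the boundary behave like leaves of a hierarchy, points near the origin like roots, which is the "implicit continuous hierarchy" the abstract refers to.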
Monocular Depth Estimation with Hierarchical Fusion of Dilated CNNs and Soft-Weighted-Sum Inference
Monocular depth estimation is a challenging task in complex compositions
depicting multiple objects of diverse scales. Despite the recent great
progress driven by deep convolutional neural networks (CNNs), state-of-the-art
monocular depth estimation methods still fall short in such challenging
real-world scenarios. In this paper, we propose a deep end-to-end learning
framework to tackle these challenges, which learns the direct mapping from a
color image to the corresponding depth map. First, we represent monocular depth
estimation as a multi-category dense labeling task, in contrast to the
regression based formulation. In this way, we could build upon the recent
progress in dense labeling such as semantic segmentation. Second, we fuse
different side-outputs from our front-end dilated convolutional neural network
in a hierarchical way to exploit the multi-scale depth cues for depth
estimation, which is critical to achieve scale-aware depth estimation. Third,
we propose to utilize soft-weighted-sum inference instead of the hard-max
inference, transforming the discretized depth score to continuous depth value.
Thus, we reduce the influence of quantization error and improve the robustness
of our method. Extensive experiments on the NYU Depth V2 and KITTI datasets
show the superiority of our method compared with current state-of-the-art
methods. Furthermore, experiments on the NYU V2 dataset reveal that our model
is able to learn the probability distribution of depth.
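The soft-weighted-sum inference step is easy to make concrete: softmax the per-bin depth scores and take the probability-weighted sum of the bin centers, rather than snapping to the arg-max bin. The bin values below are illustrative:

```python
import numpy as np

def soft_weighted_sum(scores, bin_centers):
    """Turn per-pixel discrete depth scores into a continuous depth value:
    softmax over the depth bins, then a probability-weighted sum of the
    bin centers (instead of hard-max picking the single best bin)."""
    e = np.exp(scores - scores.max())   # numerically stable softmax
    p = e / e.sum()
    return float(p @ bin_centers)

bins = np.array([1.0, 2.0, 3.0, 4.0])     # discretized depth values (meters)
scores = np.array([0.1, 3.0, 2.9, 0.1])   # network logits for one pixel
depth = soft_weighted_sum(scores, bins)
hard = bins[scores.argmax()]
assert 2.0 < depth < 3.0   # lands between the two competing bins
assert hard == 2.0         # hard-max would snap to a single bin
```

Averaging over bins is what smooths out the quantization error of the discretized formulation, as the abstract claims.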
CariGANs: Unpaired Photo-to-Caricature Translation
Facial caricature is an art form of drawing faces in an exaggerated way to
convey humor or sarcasm. In this paper, we propose the first Generative
Adversarial Network (GAN) for unpaired photo-to-caricature translation, which
we call "CariGANs". It explicitly models geometric exaggeration and appearance
stylization using two components: CariGeoGAN, which only models the
geometry-to-geometry transformation from face photos to caricatures, and
CariStyGAN, which transfers the style appearance from caricatures to face
photos without any geometry deformation. In this way, a difficult cross-domain
translation problem is decoupled into two easier tasks. The perceptual study
shows that caricatures generated by our CariGANs are closer to the hand-drawn
ones, and at the same time better preserve the identity, compared to
state-of-the-art methods. Moreover, our CariGANs allow users to control the
shape exaggeration degree and change the color/texture style by tuning the
parameters or giving an example caricature.
Comment: To appear at SIGGRAPH Asia 201
Face Recognition: From Traditional to Deep Learning Methods
Starting in the seventies, face recognition has become one of the most
researched topics in computer vision and biometrics. Traditional methods based
on hand-crafted features and traditional machine learning techniques have
recently been superseded by deep neural networks trained with very large
datasets. In this paper we provide a comprehensive and up-to-date literature
review of popular face recognition methods including both traditional
(geometry-based, holistic, feature-based and hybrid methods) and deep learning
methods.
Deep Semantic Ranking Based Hashing for Multi-Label Image Retrieval
With the rapid growth of web images, hashing has received increasing
interests in large scale image retrieval. Research efforts have been devoted to
learning compact binary codes that preserve semantic similarity based on
labels. However, most of these hashing methods are designed to handle simple
binary similarity. The complex multilevel semantic structure of images
associated with multiple labels has not yet been well explored. Here we
propose a deep semantic ranking based method for learning hash functions that
preserve multilevel semantic similarity between multi-label images. In our
approach, deep convolutional neural network is incorporated into hash functions
to jointly learn feature representations and mappings from them to hash codes,
which avoids the limitation of semantic representation power of hand-crafted
features. Meanwhile, a ranking list that encodes the multilevel similarity
information is employed to guide the learning of such deep hash functions. An
effective scheme based on surrogate loss is used to solve the intractable
optimization problem of nonsmooth and multivariate ranking measures involved in
the learning procedure. Experimental results show the superiority of our
proposed approach over several state-of-the-art hashing methods in terms of
ranking evaluation metrics when tested on multi-label image datasets.
Comment: CVPR 201
Intrinsic Isometric Manifold Learning with Application to Localization
Data living on manifolds commonly appear in many applications. Often this
results from an inherently latent low-dimensional system being observed through
higher dimensional measurements. We show that under certain conditions, it is
possible to construct an intrinsic and isometric data representation, which
respects an underlying latent intrinsic geometry. Namely, we view the observed
data only as a proxy and learn the structure of a latent unobserved intrinsic
manifold, whereas common practice is to learn the manifold of the observed
data. For this purpose, we build a new metric and propose a method for its
robust estimation by assuming mild statistical priors and by using artificial
neural networks as a mechanism for metric regularization and parametrization.
We show successful application to unsupervised indoor localization in ad-hoc
sensor networks. Specifically, we show that our proposed method facilitates
accurate localization of a moving agent from imaging data it collects.
Importantly, our method is applied in the same way to two different imaging
modalities, thereby demonstrating its intrinsic and modality-invariant
capabilities.
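One common way to build such an intrinsic metric (used in related intrinsic-geometry work; the paper's own estimator, with its neural-network regularization, is more involved) is the inverse local covariance of short observation bursts, which undoes the distortion introduced by the observation map:

```python
import numpy as np

def local_metric(samples):
    """Estimate a local Mahalanobis-style metric from a burst of nearby
    observations: the (pseudo-)inverse of the local sample covariance."""
    C = np.cov(samples.T)
    return np.linalg.pinv(C)

def intrinsic_dist(x, y, M):
    d = x - y
    return float(np.sqrt(d @ M @ d))

rng = np.random.default_rng(2)
# Observations stretched 10x along the first axis by the (unknown) sensor.
burst = rng.standard_normal((200, 2)) * np.array([10.0, 1.0])
M = local_metric(burst)
# The estimated metric rescales both axes back to comparable units, so equal
# intrinsic displacements get roughly equal distances despite the stretching.
dx = intrinsic_dist(np.array([10.0, 0.0]), np.zeros(2), M)
dy = intrinsic_dist(np.array([0.0, 1.0]), np.zeros(2), M)
assert abs(dx - dy) < 0.5
```

Because the metric is estimated from the observations' local statistics rather than their raw coordinates, the same recipe applies unchanged across different imaging modalities, which is the modality-invariance claimed above.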