11,604 research outputs found
Geometry-aware Deep Transform
Many recent efforts have been devoted to designing sophisticated deep
learning structures, obtaining revolutionary results on benchmark datasets. The
success of these deep learning methods mostly relies on an enormous volume of
labeled training samples to learn a huge number of parameters in a network;
therefore, understanding the generalization ability of a learned deep network
cannot be overlooked, especially when restricted to a small training set, which
is the case for many applications. In this paper, we propose a novel deep
learning objective formulation that unifies both the classification and metric
learning criteria. We then introduce a geometry-aware deep transform to enable
a non-linear discriminative and robust feature transform, which shows
competitive performance on small training sets for both synthetic and
real-world data. We further support the proposed framework with a formal
robustness analysis.
Comment: to appear in ICCV 2015, updated with minor revisions
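The unified objective described in this abstract can be sketched as a weighted sum of a classification loss and a pairwise metric-learning term. The contrastive form of the metric term, the weight `lam`, and all function names below are illustrative assumptions, not the paper's exact formulation:

```python
import numpy as np

def softmax_cross_entropy(logits, label):
    # Numerically stable softmax cross-entropy for one sample.
    z = logits - logits.max()
    log_probs = z - np.log(np.exp(z).sum())
    return -log_probs[label]

def contrastive_term(f1, f2, same_class, margin=1.0):
    # Pull same-class features together; push different-class
    # features apart up to a margin.
    d = np.linalg.norm(f1 - f2)
    return d ** 2 if same_class else max(0.0, margin - d) ** 2

def unified_loss(logits1, logits2, y1, y2, f1, f2, lam=0.5):
    # Classification criterion plus a metric-learning criterion
    # on the transformed features, combined in one objective.
    cls = softmax_cross_entropy(logits1, y1) + softmax_cross_entropy(logits2, y2)
    met = contrastive_term(f1, f2, y1 == y2)
    return cls + lam * met
```

The single scalar objective lets one optimizer shape features that are both separable (classification term) and well-spread (metric term), which is the stated motivation for the unified formulation.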
Riemannian joint dimensionality reduction and dictionary learning on symmetric positive definite manifold
Dictionary learning (DL) and dimensionality reduction (DR) are powerful tools
for analyzing high-dimensional noisy signals. This paper proposes a novel
Riemannian joint dimensionality reduction and dictionary learning
(R-JDRDL) on symmetric positive definite (SPD) manifolds for classification
tasks. The joint learning considers the interaction between dimensionality
reduction and dictionary learning procedures by connecting them into a unified
framework. We exploit a Riemannian optimization framework for solving DL and DR
problems jointly. Finally, we demonstrate that the proposed R-JDRDL outperforms
existing state-of-the-art algorithms when used for image classification tasks.
Comment: European Signal Processing Conference (EUSIPCO 2018)
TransGaGa: Geometry-Aware Unsupervised Image-to-Image Translation
Unsupervised image-to-image translation aims at learning a mapping between
two visual domains. However, learning a translation across large geometry
variations always ends up with failure. In this work, we present a novel
disentangle-and-translate framework to tackle image-to-image translation for
complex objects. Instead of learning the mapping in the image space directly,
we disentangle the image space into a Cartesian product of the
appearance and the geometry latent spaces. Specifically, we first introduce a
geometry prior loss and a conditional VAE loss to encourage the network to
learn independent but complementary representations. The translation is then
built on appearance and geometry space separately. Extensive experiments
demonstrate the superior performance of our method over other state-of-the-art
approaches, especially in the challenging near-rigid and non-rigid objects
translation tasks. In addition, by taking different exemplars as the appearance
references, our method also supports multimodal translation. Project page:
https://wywu.github.io/projects/TGaGa/TGaGa.html
Comment: Accepted to CVPR 2019.
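The Cartesian-product latent space in this abstract can be illustrated with a toy linear stand-in for the paper's networks. The matrices `We_app`, `We_geo`, and `Wd` below are random placeholders for learned encoders and a decoder; only the swap of geometry and appearance codes is the point:

```python
import numpy as np

rng = np.random.default_rng(0)

def encode(image, We_app, We_geo):
    # Split the image code into independent appearance and geometry latents.
    return We_app @ image, We_geo @ image

def decode(app, geo, Wd):
    # Reassemble an image from an (appearance, geometry) pair.
    return Wd @ np.concatenate([app, geo])

# Toy "images" from two visual domains.
x_a = rng.normal(size=16)
x_b = rng.normal(size=16)
We_app = rng.normal(size=(4, 16))
We_geo = rng.normal(size=(4, 16))
Wd = rng.normal(size=(16, 8))

app_a, geo_a = encode(x_a, We_app, We_geo)
app_b, geo_b = encode(x_b, We_app, We_geo)

# Translation keeps x_a's geometry but borrows x_b's appearance.
translated = decode(app_b, geo_a, Wd)
```

Because appearance and geometry live in separate factors of the latent space, swapping the appearance code against different exemplars yields the multimodal translation mentioned at the end of the abstract.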
Dimensionality Reduction on SPD Manifolds: The Emergence of Geometry-Aware Methods
Representing images and videos with Symmetric Positive Definite (SPD)
matrices, and considering the Riemannian geometry of the resulting space, has
been shown to yield high discriminative power in many visual recognition tasks.
Unfortunately, computation on the Riemannian manifold of SPD matrices,
especially high-dimensional ones, comes at a high cost that limits the
applicability of existing techniques. In this paper, we introduce algorithms
able to handle high-dimensional SPD matrices by constructing a
lower-dimensional SPD manifold. To this end, we propose to model the mapping
from the high-dimensional SPD manifold to the low-dimensional one with an
orthonormal projection. This lets us formulate dimensionality reduction as the
problem of finding a projection that yields a low-dimensional manifold either
with maximum discriminative power in the supervised scenario, or with maximum
variance of the data in the unsupervised one. We show that learning can be
expressed as an optimization problem on a Grassmann manifold and discuss fast
solutions for special cases. Our evaluation on several classification tasks
evidences that our approach leads to a significant accuracy gain over
state-of-the-art methods.
Comment: arXiv admin note: text overlap with arXiv:1407.112
Data-Driven Shape Analysis and Processing
Data-driven methods play an increasingly important role in discovering
geometric, structural, and semantic relationships between 3D shapes in
collections, and applying this analysis to support intelligent modeling,
editing, and visualization of geometric data. In contrast to traditional
approaches, a key feature of data-driven approaches is that they aggregate
information from a collection of shapes to improve the analysis and processing
of individual shapes. In addition, they are able to learn models that reason
about properties and relationships of shapes without relying on hard-coded
rules or explicitly programmed instructions. We provide an overview of the main
concepts and components of these techniques, and discuss their application to
shape classification, segmentation, matching, reconstruction, modeling and
exploration, as well as scene analysis and synthesis, through reviewing the
literature and relating the existing works with both qualitative and numerical
comparisons. We conclude our report with ideas that can inspire future research
in data-driven shape analysis and processing.
Comment: 10 pages, 19 figures
GAGAN: Geometry-Aware Generative Adversarial Networks
Deep generative models learned through adversarial training have become
increasingly popular for their ability to generate naturalistic image textures.
However, aside from their texture, the visual appearance of objects is
significantly influenced by their shape geometry; information which is not
taken into account by existing generative models. This paper introduces the
Geometry-Aware Generative Adversarial Networks (GAGAN) for incorporating
geometric information into the image generation process. Specifically, in GAGAN
the generator samples latent variables from the probability space of a
statistical shape model. By mapping the output of the generator to a canonical
coordinate frame through a differentiable geometric transformation, we enforce
the geometry of the objects and add an implicit connection from the prior to
the generated object. Experimental results on face generation indicate that the
GAGAN can generate realistic images of faces with arbitrary facial attributes
such as facial expression, pose, and morphology, that are of better quality
than current GAN-based methods. Our method can be used to augment any existing
GAN architecture and improve the quality of the generated images.
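The "probability space of a statistical shape model" that GAGAN samples from can be sketched as a PCA-style landmark model: a mean shape plus a few modes of variation, with coefficients drawn from a Gaussian prior. All model parameters below are random stand-ins, not a fitted face model:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical statistical shape model over 2D facial landmarks:
# mean shape plus orthonormal PCA modes with per-mode variances.
n_landmarks, n_modes = 68, 5
mean_shape = rng.normal(size=2 * n_landmarks)
modes = np.linalg.qr(rng.normal(size=(2 * n_landmarks, n_modes)))[0]
eigvals = np.array([5.0, 3.0, 2.0, 1.0, 0.5])

def sample_shape():
    # Draw shape coefficients from the model's Gaussian prior and
    # decode them into landmark coordinates.
    b = rng.normal(size=n_modes) * np.sqrt(eigvals)
    return mean_shape + modes @ b

z = sample_shape()                     # geometric latent for the generator
landmarks = z.reshape(n_landmarks, 2)  # plausible face geometry
```

Conditioning the generator on such samples, and mapping its output back to a canonical frame through a differentiable warp, is what ties the generated texture to a valid geometry in the abstract's description.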
A Proximity-Aware Hierarchical Clustering of Faces
In this paper, we propose an unsupervised face clustering algorithm called
"Proximity-Aware Hierarchical Clustering" (PAHC) that exploits the local
structure of deep representations. In the proposed method, a similarity measure
between deep features is computed by evaluating linear SVM margins. SVMs are
trained using nearest neighbors of sample data, and thus do not require any
external training data. Clusters are then formed by thresholding the similarity
scores. We evaluate the clustering performance using three challenging
unconstrained face datasets, including Celebrity in Frontal-Profile (CFP),
IARPA JANUS Benchmark A (IJB-A), and JANUS Challenge Set 3 (JANUS CS3)
datasets. Experimental results demonstrate that the proposed approach can
achieve significant improvements over state-of-the-art methods. Moreover, we
also show that the proposed clustering algorithm can be applied to curate a
large-scale, noisy training dataset while maintaining a sufficient number of
images and their variations due to nuisance factors. The face verification
performance on JANUS CS3 improves significantly by fine-tuning a DCNN model with
the curated MS-Celeb-1M dataset, which contains over three million face images.
Riemannian Dictionary Learning and Sparse Coding for Positive Definite Matrices
Data encoded as symmetric positive definite (SPD) matrices frequently arise
in many areas of computer vision and machine learning. While these matrices
form an open subset of the Euclidean space of symmetric matrices, viewing them
through the lens of non-Euclidean Riemannian geometry often turns out to be
better suited in capturing several desirable data properties. However,
formulating classical machine learning algorithms within such a geometry is
often non-trivial and computationally expensive. Inspired by the great success
of dictionary learning and sparse coding for vector-valued data, our goal in
this paper is to represent data in the form of SPD matrices as sparse conic
combinations of SPD atoms from a learned dictionary via a Riemannian geometric
approach. To that end, we formulate a novel Riemannian optimization objective
for dictionary learning and sparse coding in which the representation loss is
characterized via the affine invariant Riemannian metric. We also present a
computationally simple algorithm for optimizing our model. Experiments on
several computer vision datasets demonstrate superior classification and
retrieval performance using our approach when compared to sparse coding via
alternative non-Riemannian formulations.
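The affine invariant Riemannian metric that characterizes the representation loss above has the closed form d(X, Y) = ||logm(X^(-1/2) Y X^(-1/2))||_F. A minimal numpy implementation via eigendecomposition:

```python
import numpy as np

def spd_sqrt_inv(X):
    # Inverse square root of an SPD matrix via eigendecomposition.
    w, V = np.linalg.eigh(X)
    return V @ np.diag(1.0 / np.sqrt(w)) @ V.T

def airm_distance(X, Y):
    # Affine-invariant Riemannian distance between SPD matrices:
    # d(X, Y) = || logm(X^{-1/2} Y X^{-1/2}) ||_F,
    # computed from the eigenvalues of the whitened matrix.
    Xi = spd_sqrt_inv(X)
    w = np.linalg.eigvalsh(Xi @ Y @ Xi)
    return np.sqrt((np.log(w) ** 2).sum())
```

The defining property, invariance under any congruence X -> A X A^T with invertible A, is what makes the metric robust to affine changes of the feature coordinates; the sparse-coding objective in the abstract measures reconstruction error in exactly this geometry.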
Deep Metric Learning with Angular Loss
Modern image search systems require semantic understanding of images, and
a key yet under-addressed problem is to learn a good metric for measuring the
similarity between images. While deep metric learning has yielded impressive
performance gains by extracting high level abstractions from image data, a
proper objective loss function becomes the central issue to boost the
performance. In this paper, we propose a novel angular loss, which takes the
angle relationship into account, to learn a better similarity metric. Whereas
previous metric learning methods focus on optimizing the similarity
(contrastive loss) or relative similarity (triplet loss) of image pairs, our
proposed method aims at constraining the angle at the negative point of triplet
triangles. Several favorable properties are observed when compared with
conventional methods. First, scale invariance is introduced, improving the
robustness of objective against feature variance. Second, a third-order
geometric constraint is inherently imposed, capturing more of the local
structure of triplet triangles than the contrastive or triplet loss. Third,
better convergence has been demonstrated by experiments on three publicly
available datasets.
Comment: International Conference on Computer Vision 2017
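One common form of the angle constraint at the negative point, written here as a single-triplet hinge with the midpoint of the anchor-positive pair as the reference, can be sketched as:

```python
import numpy as np

def angular_loss(xa, xp, xn, alpha_deg=45.0):
    # Angular loss for one triplet (anchor, positive, negative):
    # penalize triplets whose angle at the negative exceeds alpha.
    # x_c is the midpoint of the anchor-positive pair; the hinge is
    # max(0, ||xa - xp||^2 - 4 tan^2(alpha) ||xn - xc||^2).
    xc = (xa + xp) / 2.0
    tan2 = np.tan(np.deg2rad(alpha_deg)) ** 2
    return max(0.0, np.sum((xa - xp) ** 2) - 4.0 * tan2 * np.sum((xn - xc) ** 2))
```

Because both squared norms scale identically under a rescaling of the embedding, the hinge is scale-invariant, which is the first of the three properties claimed in the abstract.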
An Automatic System for Unconstrained Video-Based Face Recognition
Although deep learning approaches have achieved performance surpassing humans
for still image-based face recognition, unconstrained video-based face
recognition is still a challenging task due to the large volume of data to be
processed and intra/inter-video variations on pose, illumination, occlusion,
scene, blur, video quality, etc. In this work, we consider challenging
scenarios for unconstrained video-based face recognition from multiple-shot
videos and surveillance videos with low-quality frames. To handle these
problems, we propose a robust and efficient system for unconstrained
video-based face recognition, which is composed of modules for face/fiducial
detection, face association, and face recognition. First, we use multi-scale
single-shot face detectors to efficiently localize faces in videos. The
detected faces are then grouped through carefully designed face association
methods, especially for multi-shot videos. Finally, the faces are
recognized by the proposed face matcher based on an unsupervised subspace
learning approach and a subspace-to-subspace similarity metric. Extensive
experiments on challenging video datasets, such as Multiple Biometric Grand
Challenge (MBGC), Face and Ocular Challenge Series (FOCS), IARPA Janus
Surveillance Video Benchmark (IJB-S) for low-quality surveillance videos and
IARPA JANUS Benchmark B (IJB-B) for multiple-shot videos, demonstrate that the
proposed system can accurately detect and associate faces from unconstrained
videos and effectively learn robust and discriminative features for
recognition.
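A subspace-to-subspace similarity metric of the kind the matcher above relies on can be sketched with principal angles: each video template becomes a linear subspace of its per-frame features, and two templates are compared through the singular values of the product of their bases. The specific basis construction and the mean-squared-cosine score below are illustrative choices, not necessarily the paper's exact metric:

```python
import numpy as np

rng = np.random.default_rng(0)

def subspace_basis(features, k):
    # Orthonormal basis of the top-k principal subspace of a stack of
    # per-frame face features (one row per frame).
    _, _, vt = np.linalg.svd(features - features.mean(0), full_matrices=False)
    return vt[:k].T

def subspace_similarity(U1, U2):
    # Mean squared cosine of the principal angles between two subspaces:
    # the singular values of U1^T U2 are exactly those cosines.
    s = np.linalg.svd(U1.T @ U2, compute_uv=False)
    return float(np.mean(s ** 2))

# Two video "templates": per-frame features of nearly the same face track.
v1 = rng.normal(size=(30, 16))
v2 = v1 + 0.01 * rng.normal(size=(30, 16))
U1, U2 = subspace_basis(v1, 4), subspace_basis(v2, 4)

sim = subspace_similarity(U1, U2)  # near 1 for near-identical templates
```

Comparing subspaces rather than single pooled vectors lets the matcher absorb intra-video variation in pose, blur, and illumination, which is the motivation the abstract gives for the unsupervised subspace-learning approach.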