MoCoGAN: Decomposing Motion and Content for Video Generation
Visual signals in a video can be divided into content and motion. While
content specifies which objects are in the video, motion describes their
dynamics. Based on this prior, we propose the Motion and Content decomposed
Generative Adversarial Network (MoCoGAN) framework for video generation. The
proposed framework generates a video by mapping a sequence of random vectors to
a sequence of video frames. Each random vector consists of a content part and a
motion part. While the content part is kept fixed, the motion part is realized
as a stochastic process. To learn motion and content decomposition in an
unsupervised manner, we introduce a novel adversarial learning scheme utilizing
both image and video discriminators. Extensive experimental results on several
challenging datasets, with qualitative and quantitative comparison to
state-of-the-art approaches, verify the effectiveness of the proposed framework.
In addition, we show that MoCoGAN allows one to generate videos with the same
content but different motion, as well as videos with different content and the same motion.
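The latent-space decomposition above can be sketched as follows. This is a minimal illustration of the structure only: the dimensions are made up, and a plain Gaussian random walk stands in for the learned recurrent network that MoCoGAN actually uses to produce the motion trajectory.

```python
import numpy as np

def sample_video_latents(num_frames, content_dim, motion_dim, rng):
    """Latent sequence for one video: a fixed content part plus a stochastic motion part."""
    # The content code z_c is drawn once and held fixed across all frames.
    z_c = rng.standard_normal(content_dim)
    # The motion codes z_m(t) follow a stochastic process; a Gaussian random walk
    # here, whereas MoCoGAN learns this trajectory with a recurrent network.
    z_m = np.cumsum(rng.standard_normal((num_frames, motion_dim)), axis=0)
    # Each frame's latent vector is the concatenation [z_c, z_m(t)].
    return np.hstack([np.tile(z_c, (num_frames, 1)), z_m])

latents = sample_video_latents(num_frames=16, content_dim=50, motion_dim=10,
                               rng=np.random.default_rng(0))
```

Keeping `z_c` fixed while resampling the walk yields "same content, different motion"; reusing the walk with a new `z_c` yields the converse.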
Survey of state-of-the-art mixed data clustering algorithms
Mixed data comprises both numeric and categorical features, and mixed
datasets occur frequently in many domains, such as health, finance, and
marketing. Clustering is often applied to mixed datasets to find structures and
to group similar objects for further analysis. However, clustering mixed data
is challenging because it is difficult to directly apply mathematical
operations, such as summation or averaging, to the feature values of these
datasets. In this paper, we present a taxonomy for the study of mixed data
clustering algorithms by identifying five major research themes. We then
present a state-of-the-art review of the research works within each research
theme. We analyze the strengths and weaknesses of these methods with pointers
for future research directions. Lastly, we present an in-depth analysis of the
overall challenges in this field, highlight open research questions and discuss
guidelines to make progress in the field.
Comment: 20 pages, 2 columns, 6 tables, 209 references
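The difficulty of applying summation or averaging to mixed features can be seen in the Gower-style dissimilarity that many surveyed methods build on: numeric features are compared by range-normalized difference, categorical features by simple mismatch. A minimal sketch; the feature layout and toy records below are illustrative, not drawn from any surveyed algorithm.

```python
import numpy as np

def gower_dissimilarity(a, b, numeric_idx, categorical_idx, ranges):
    """Gower-style dissimilarity for mixed data: range-normalized absolute
    difference for numeric features, 0/1 mismatch for categorical features,
    averaged over all features so the result lies in [0, 1]."""
    total = 0.0
    for i in numeric_idx:
        total += abs(a[i] - b[i]) / ranges[i]   # numeric: normalized difference
    for i in categorical_idx:
        total += 0.0 if a[i] == b[i] else 1.0   # categorical: simple mismatch
    return total / (len(numeric_idx) + len(categorical_idx))

# Hypothetical records: (age, income, marital_status)
x = (35, 50_000, "married")
y = (45, 70_000, "single")
d = gower_dissimilarity(x, y, numeric_idx=[0, 1], categorical_idx=[2],
                        ranges={0: 60, 1: 100_000})
```

A dissimilarity of this form can then drive medoid- or hierarchical-style clustering, which never needs to average a categorical value.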
Cross-Entropy Loss and Low-Rank Features Have Responsibility for Adversarial Examples
State-of-the-art neural networks are vulnerable to adversarial examples: they
can easily misclassify inputs that are imperceptibly different from their
training and test data. In this work, we establish that the use of the
cross-entropy loss function and the low-rank features of the training data are
responsible for the existence of these inputs. Based on this observation, we
suggest that addressing adversarial examples requires rethinking the use of the
cross-entropy loss function and looking for an alternative that is better suited
to minimization with low-rank features. In this direction, we present a
training scheme called differential training, which uses a loss function
defined on the differences between the features of points from opposite
classes. We show that differential training can ensure a large margin between
the decision boundary of the neural network and the points in the training
dataset. This larger margin increases the amount of perturbation needed to flip
the prediction of the classifier and makes it harder to find an adversarial
example with small perturbations. We test differential training on a binary
classification task with the CIFAR-10 dataset and demonstrate that it radically
reduces the ratio of images for which an adversarial example could be found --
not only in the training dataset, but in the test dataset as well.
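The idea of a loss defined on cross-class feature differences can be sketched as follows. The hinge form and the linear score are illustrative stand-ins, not the paper's exact differential training objective.

```python
import numpy as np

def differential_loss(feat_pos, feat_neg, w):
    """Loss on the *differences* between features of points from opposite
    classes: every cross-class pair (i, j) should satisfy
    w . (feat_pos[i] - feat_neg[j]) >= 1, so minimizing the hinge below pushes
    the decision boundary a large margin away from both classes."""
    diffs = feat_pos[:, None, :] - feat_neg[None, :, :]  # all cross-class differences
    margins = diffs @ w                                  # score of each difference
    return np.maximum(0.0, 1.0 - margins).mean()

# Toy features well separated along w's direction give zero loss.
w = np.array([1.0, 0.0])
pos = np.array([[3.0, 0.5], [2.5, -0.2]])
neg = np.array([[-3.0, 0.1], [-2.0, 0.4]])
loss = differential_loss(pos, neg, w)
```

Because every pair's difference must clear the margin, the closest pair, i.e. the margin of the classifier, controls the loss.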
Diverse feature visualizations reveal invariances in early layers of deep neural networks
Visualizing features in deep neural networks (DNNs) can help in understanding
their computations. Many previous studies aimed to visualize the selectivity of
individual units by finding meaningful images that maximize their activation.
However, comparably little attention has been paid to visualizing to what image
transformations units in DNNs are invariant. Here we propose a method to
discover invariances in the responses of hidden layer units of deep neural
networks. Our approach is based on simultaneously searching for a batch of
images that strongly activate a unit while at the same time being as distinct
from each other as possible. We find that even early convolutional layers in
VGG-19 exhibit various forms of response invariance: near-perfect phase
invariance in some units and invariance to local diffeomorphic transformations
in others. At the same time, we uncover representational differences with
ResNet-50 in its corresponding layers. We conclude that invariance
transformations are a major computational component learned by DNNs and we
provide a systematic method to study them.
Comment: Accepted for ECCV 2018
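The batch search the authors describe, maximizing a unit's activation while keeping batch members distinct, can be sketched with a toy differentiable unit. The quadratic unit and the mean-repulsion diversity term are stand-ins; the paper optimizes real DNN units with image-space regularizers.

```python
import numpy as np

def diverse_activation_maximization(act_grad, batch, steps=200, lr=0.05, div=0.2):
    """Gradient ascent on a whole batch: each input climbs the unit's activation
    while a diversity term pushes it away from the batch mean, so the batch can
    cover distinct maxima instead of collapsing onto one image."""
    x = batch.copy()
    for _ in range(steps):
        g = act_grad(x)  # gradient of the unit's activation w.r.t. each input
        x += lr * (g + div * (x - x.mean(axis=0, keepdims=True)))
    return x

# Toy "unit": activation a(x) = -||x - u||^2, maximized at u.
u = np.array([1.0, 2.0])
activation = lambda x: -((x - u) ** 2).sum(axis=1)
act_grad = lambda x: 2.0 * (u - x)

rng = np.random.default_rng(0)
start = rng.standard_normal((8, 2)) * 3.0
final = diverse_activation_maximization(act_grad, start)
```

For a unit with several distinct maxima, the diversity term is what lets different batch members settle into different ones, which is exactly how invariant transformations become visible.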
Deep Forest
Current deep learning models are mostly built upon neural networks, i.e.,
multiple layers of parameterized differentiable nonlinear modules that can be
trained by backpropagation. In this paper, we explore the possibility of
building deep models based on non-differentiable modules. We conjecture that
the mystery behind the success of deep neural networks owes much to three
characteristics, i.e., layer-by-layer processing, in-model feature
transformation, and sufficient model complexity. We propose the gcForest
approach, which generates a deep forest holding these characteristics.
This is a decision-tree ensemble approach with far fewer hyper-parameters than
deep neural networks, and its model complexity can be automatically determined
in a data-dependent way. Experiments show that its performance is quite robust
to hyper-parameter settings, so that in most cases, even across different
data from different domains, it achieves excellent performance using
the same default setting. This study opens the door to deep learning based on
non-differentiable modules and exhibits the possibility of constructing deep
models without using backpropagation.
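The layer-by-layer, feature-augmenting cascade can be sketched as follows. A nearest-centroid probability estimator stands in for the random forests of an actual gcForest level, purely to keep the sketch dependency-free; only the cascade wiring (each level's class-probability outputs concatenated with the raw features and fed onward) reflects the paper's design.

```python
import numpy as np

class CentroidEstimator:
    """Stand-in for a forest: class probabilities from distances to class centroids."""
    def fit(self, X, y):
        self.classes_ = np.unique(y)
        self.centroids_ = np.array([X[y == c].mean(axis=0) for c in self.classes_])
        return self
    def predict_proba(self, X):
        d = np.linalg.norm(X[:, None, :] - self.centroids_[None, :, :], axis=2)
        p = np.exp(-d)
        return p / p.sum(axis=1, keepdims=True)

def cascade_predict(X, y, X_test, levels=3):
    """gcForest-style cascade: each level is trained on the raw features
    concatenated with the previous level's class-probability outputs."""
    aug_tr, aug_te = X, X_test
    proba_te = None
    for _ in range(levels):
        level = CentroidEstimator().fit(aug_tr, y)
        proba_tr, proba_te = level.predict_proba(aug_tr), level.predict_proba(aug_te)
        aug_tr = np.hstack([X, proba_tr])  # in-model feature transformation
        aug_te = np.hstack([X_test, proba_te])
    return proba_te.argmax(axis=1)

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 0.3, (20, 2)), rng.normal(3.0, 0.3, (20, 2))])
y = np.repeat([0, 1], 20)
preds = cascade_predict(X, y, np.array([[0.2, -0.1], [2.9, 3.1]]))
```

In the real method, cross-validated level outputs and a validation-based stopping rule determine the cascade depth automatically, which is how model complexity becomes data-dependent.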
A Comprehensive Survey on Cross-modal Retrieval
In recent years, cross-modal retrieval has drawn much attention due to the
rapid growth of multimodal data. It takes one type of data as the query to
retrieve relevant data of another type. For example, a user can use a text to
retrieve relevant pictures or videos. Since the query and its retrieved results
can be of different modalities, how to measure the content similarity between
different modalities of data remains a challenge. Various methods have been
proposed to deal with such a problem. In this paper, we first review a number
of representative methods for cross-modal retrieval and classify them into two
main groups: 1) real-valued representation learning, and 2) binary
representation learning. Real-valued representation learning methods aim to
learn real-valued common representations for different modalities of data. To
speed up the cross-modal retrieval, a number of binary representation learning
methods are proposed to map different modalities of data into a common Hamming
space. Then, we introduce several multimodal datasets in the community, and
show the experimental results on two commonly used multimodal datasets. The
comparison reveals the characteristics of different kinds of cross-modal
retrieval methods, which is expected to benefit both practical applications and
future research. Finally, we discuss open problems and future research
directions.
Comment: 20 pages, 11 figures, 9 tables
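The binary-representation branch of the taxonomy reduces retrieval to Hamming-distance ranking in a common code space. A minimal sketch, assuming the codes have already been produced by some learned hash functions; the 8-bit codes below are made up.

```python
import numpy as np

def hamming_retrieve(query_code, gallery_codes, k=3):
    """Rank gallery items by Hamming distance to the query code. Because both
    modalities are mapped into one common Hamming space, the query can be a
    text code and the gallery image codes (or vice versa)."""
    dists = (gallery_codes != query_code).sum(axis=1)  # Hamming distance per item
    return np.argsort(dists, kind="stable")[:k]        # indices of the top-k matches

# Hypothetical 8-bit codes: a text query ranked against an image gallery.
text_query = np.array([1, 0, 1, 1, 0, 0, 1, 0])
image_gallery = np.array([
    [1, 0, 1, 1, 0, 0, 1, 0],   # identical code: distance 0
    [0, 1, 0, 0, 1, 1, 0, 1],   # complement: distance 8
    [1, 0, 1, 1, 0, 0, 1, 1],   # one bit differs: distance 1
])
top = hamming_retrieve(text_query, image_gallery)
```

The speed advantage over real-valued methods comes from exactly this step: Hamming distances are XOR-and-popcount operations on compact codes.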
Deep Cross Polarimetric Thermal-to-visible Face Recognition
In this paper, we present a deep coupled learning framework to address the
problem of matching polarimetric thermal face photos against a gallery of
visible faces. Polarization state information of thermal faces provides the
missing textural and geometric details of the thermal face imagery that exist
in the visible spectrum. We propose a coupled deep neural network architecture
which leverages relatively large visible and thermal datasets to overcome the
problem of overfitting, and we then train it on a polarimetric thermal face
dataset which is the first of its kind. The proposed architecture is able to
make full use of the polarimetric thermal information to train a deep model,
in contrast to conventional shallow thermal-to-visible face recognition
methods. The proposed coupled deep neural network also finds global
discriminative features in a nonlinear embedding space that relate the
polarimetric thermal faces to their corresponding visible faces. The results
show the superiority of our method compared to state-of-the-art cross
thermal-to-visible face recognition algorithms.
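The matching step the abstract describes, relating both modalities through a shared nonlinear embedding space, can be sketched with two modality-specific projections. The weights, dimensions, and single tanh layer are illustrative assumptions; the paper learns deep coupled branches end to end.

```python
import numpy as np

def coupled_match_distance(thermal_feat, visible_feat, W_thermal, W_visible):
    """Each modality gets its own projection into a shared embedding space;
    a thermal probe and a visible gallery face are compared by distance there."""
    e_t = np.tanh(thermal_feat @ W_thermal)  # thermal branch embedding
    e_v = np.tanh(visible_feat @ W_visible)  # visible branch embedding
    return float(np.linalg.norm(e_t - e_v))

rng = np.random.default_rng(0)
W_t, W_v = rng.standard_normal((16, 8)), rng.standard_normal((16, 8))
probe = rng.standard_normal(16)              # polarimetric-thermal probe features
gallery = rng.standard_normal((5, 16))       # visible gallery features
# Identify the gallery face whose embedding is closest to the thermal probe.
best = min(range(5), key=lambda i: coupled_match_distance(probe, gallery[i], W_t, W_v))
```

Training would adjust both branches so that genuine thermal/visible pairs land close in the embedding while impostor pairs land far apart.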
Where Is My Puppy? Retrieving Lost Dogs by Facial Features
A pet that goes missing is among many people's worst fears: a moment of
distraction is enough for a dog or a cat to wander off from home. Some measures
help match lost animals to their owners; but automated visual recognition is
one that - although convenient, highly available, and low-cost - is
surprisingly overlooked. In this paper, we inaugurate that promising avenue by
pursuing face recognition for dogs. We contrast four ready-to-use human facial
recognizers (EigenFaces, FisherFaces, LBPH, and a Sparse method) to two
original solutions based upon convolutional neural networks: BARK (inspired by
architecture-optimized networks employed for human facial recognition) and WOOF
(based upon off-the-shelf OverFeat features). Human facial recognizers perform
poorly for dogs (up to 60.5% accuracy), showing that dog facial recognition is
not a trivial extension of human facial recognition. The convolutional network
solutions work much better, with BARK attaining up to 81.1% accuracy, and WOOF,
89.4%. The tests were conducted in two datasets: Flickr-dog, with 42 dogs of
two breeds (pugs and huskies); and Snoopybook, with 18 mongrel dogs.
Comment: 17 pages, 8 figures, 1 table, Multimedia Tools and Applications
Modeling of Facial Aging and Kinship: A Survey
Computational facial models that capture properties of facial cues related to
aging and kinship increasingly attract the attention of the research community,
enabling the development of reliable methods for age progression, age
estimation, age-invariant facial characterization, and kinship verification
from visual data. In this paper, we review recent advances in modeling of
facial aging and kinship. In particular, we provide an up-to-date, complete
list of available annotated datasets and an in-depth analysis of geometric,
hand-crafted, and learned facial representations that are used for facial aging
and kinship characterization. Moreover, evaluation protocols and metrics are
reviewed and notable experimental results for each surveyed task are analyzed.
This survey allows us to identify challenges and discuss future research
directions for the development of robust facial models in real-world
conditions
Exponential Discriminative Metric Embedding in Deep Learning
With the remarkable success achieved by the Convolutional Neural Networks
(CNNs) in object recognition recently, deep learning is being widely used in
the computer vision community. Deep Metric Learning (DML), integrating deep
learning with conventional metric learning, has set new records in many fields,
especially in classification tasks. In this paper, we propose a replicable DML
method, called the Include and Exclude (IE) loss, which forces the distance
between a sample and its designated class center to be separated from the mean
distance of this sample to the other class centers by a large margin in the
exponential feature projection space. With the supervision of the IE loss, we can train CNNs to enhance
the intra-class compactness and inter-class separability, leading to great
improvements on several public datasets ranging from object recognition to face
verification. We conduct a comparative study of our algorithm with several
typical DML methods on three kinds of networks with different capacity.
Extensive experiments on three object recognition datasets and two face
recognition datasets demonstrate that the IE loss is consistently superior to other
mainstream DML methods and approaches state-of-the-art results.
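The include/exclude intuition, pull a sample toward its own class center and keep its distance to that center a large margin below its mean distance to the other centers, can be sketched as a plain hinge in Euclidean space. The exponential projection and the paper's exact formulation are omitted; everything below is an illustrative simplification.

```python
import numpy as np

def ie_style_loss(features, labels, centers, margin=1.0):
    """For each sample, hinge on (distance to own center) minus (mean distance
    to the other centers) plus margin: the own-center distance must be smaller
    than the mean by at least `margin` to contribute zero loss."""
    loss = 0.0
    for f, y in zip(features, labels):
        d = np.linalg.norm(centers - f, axis=1)  # distance to every class center
        d_own = d[y]                             # "include" term: own center
        d_rest = np.delete(d, y).mean()          # "exclude" term: other centers
        loss += max(0.0, d_own - d_rest + margin)
    return loss / len(features)

centers = np.array([[0.0, 0.0], [5.0, 0.0], [0.0, 5.0]])
tight = np.array([[0.1, 0.0], [5.1, 0.1], [0.0, 4.9]])  # samples near their own centers
labels = np.array([0, 1, 2])
loss_tight = ie_style_loss(tight, labels, centers)
```

Driving this loss to zero is precisely what enhances intra-class compactness (small `d_own`) and inter-class separability (large `d_rest`).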