Deep Adversarial Inconsistent Cognitive Sampling for Multi-view Progressive Subspace Clustering
Deep multi-view clustering methods have achieved remarkable performance.
However, none of them considers the difficulty labels (the uncertainty of the ground truth for training samples) over multi-view samples, which may cause the clustering network to get stuck in poor local optima during training; worse still, the difficulty labels of multi-view samples are often inconsistent across views, which makes the problem even more challenging. In this paper, we propose a novel Deep Adversarial Inconsistent Cognitive Sampling (DAICS) method for multi-view progressive subspace clustering. A multi-view binary classification (easy or difficult) loss and a feature similarity loss are proposed to jointly learn a binary classifier and a deep consistent feature embedding network through an adversarial minimax game over the difficulty labels of multi-view consistent samples. We develop a multi-view cognitive sampling strategy that selects input samples from easy to difficult for training the multi-view clustering network. However, because the distributions of easy and difficult samples are mixed together, this selection is not trivial. To resolve this, we define a sampling probability with a theoretical guarantee. Based on it, a golden-section mechanism is further designed to generate a sample-set boundary that progressively selects samples of varied difficulty via a gate unit, which is used to jointly learn a multi-view common progressive subspace and a clustering network for more efficient clustering. Experimental results on four real-world datasets demonstrate the superiority of DAICS over state-of-the-art methods.
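As a rough illustration of the easy-to-difficult selection described above, the following sketch grows an admitted sample set from a golden-section fraction of the easiest samples toward the full set over training; the difficulty scores, the linear growth schedule, and all names are assumptions for illustration, not the DAICS procedure itself.

    import numpy as np

    GOLDEN = (np.sqrt(5) - 1) / 2  # golden-section ratio, ~0.618

    def select_training_subset(difficulty_scores, epoch, num_epochs):
        """Return indices of samples admitted for training at this epoch.

        difficulty_scores: values in [0, 1], higher means harder (e.g. produced
        by an easy/difficult binary classifier). Hypothetical helper.
        """
        order = np.argsort(difficulty_scores)        # easiest samples first
        n = len(order)
        # Start from the golden-section fraction of the easiest samples and
        # linearly widen the admitted set until every sample passes the gate.
        frac = GOLDEN + (1.0 - GOLDEN) * (epoch / max(1, num_epochs - 1))
        boundary = int(np.ceil(frac * n))
        return order[:boundary]

    # Usage: the returned indices would feed the clustering-network training loop.
    scores = np.random.rand(1000)
    subset = select_training_subset(scores, epoch=0, num_epochs=50)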
TSViz: Demystification of Deep Learning Models for Time-Series Analysis
This paper presents a novel framework for demystification of convolutional
deep learning models for time-series analysis. This is a step towards making
informed/explainable decisions in the domain of time-series, powered by deep
learning. There have been numerous efforts to increase the interpretability of
image-centric deep neural network models, where the learned features are more
intuitive to visualize. Visualization in time-series domain is much more
complicated as there is no direct interpretation of the filters and inputs as
compared to the image modality. In addition, little attention has so far been devoted to developing such tools for the time-series domain. TSViz provides possibilities to explore and analyze a network from
different dimensions at different levels of abstraction which includes
identification of parts of the input that were responsible for a prediction
(including per filter saliency), importance of different filters present in the
network for a particular prediction, notion of diversity present in the network
through filter clustering, understanding of the main sources of variation
learnt by the network through inverse optimization, and analysis of the
network's robustness against adversarial noise. As a sanity check for the
computed influence values, we demonstrate results regarding pruning of neural
networks based on the computed influence information. These representations make the network's features understandable, which can enhance the acceptability of deep networks for time-series data. This is extremely important in domains like finance, Industry 4.0, self-driving cars, health care, and counter-terrorism, where the reasons for reaching a particular prediction are as important as the prediction itself. We assess the proposed framework for interpretability against a set of desirable properties essential for any such method.
Comment: 7 pages (6 + 1 for references), 7 figures
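As a hedged illustration of the per-filter influence idea, the sketch below computes gradient-based saliency for a toy 1-D convolutional time-series model; the architecture and the gradient-magnitude heuristic are assumptions, not the exact TSViz computation.

    import torch
    import torch.nn as nn

    # Toy 1-D convolutional regressor standing in for a time-series model.
    model = nn.Sequential(
        nn.Conv1d(1, 8, kernel_size=3, padding=1),
        nn.ReLU(),
        nn.AdaptiveAvgPool1d(1),
        nn.Flatten(),
        nn.Linear(8, 1),
    )

    x = torch.randn(1, 1, 128, requires_grad=True)   # one univariate series
    prediction = model(x)
    prediction.backward()

    # Input saliency: which time steps most influenced the prediction.
    input_saliency = x.grad.abs().squeeze()

    # A crude per-filter importance: gradient magnitude of the prediction with
    # respect to each first-layer filter, summed over its weights.
    first_conv = model[0]
    filter_importance = first_conv.weight.grad.abs().sum(dim=(1, 2))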
Learning Graph Embedding with Adversarial Training Methods
Graph embedding aims to transform a graph into vectors to facilitate subsequent graph analytics tasks such as link prediction and graph clustering. Most graph embedding approaches focus on preserving the graph structure or minimizing the reconstruction error for graph data. They have mostly overlooked the embedding distribution of the latent codes, which may unfortunately lead to inferior representations in many cases. In this paper, we present a
novel adversarially regularized framework for graph embedding. By employing the
graph convolutional network as an encoder, our framework embeds the topological
information and node content into a vector representation, from which a graph
decoder is further built to reconstruct the input graph. The adversarial
training principle is applied to enforce our latent codes to match a prior
Gaussian or Uniform distribution. Based on this framework, we derive two
variants of adversarial models, the adversarially regularized graph autoencoder
(ARGA) and its variational version, adversarially regularized variational graph
autoencoder (ARVGA), to learn the graph embedding effectively. We also investigate other potential variations of ARGA and ARVGA to gain a deeper understanding of our designs. Experimental comparisons against twelve algorithms for link prediction and twenty algorithms for graph clustering validate our solutions.
Comment: To appear in IEEE Transactions on Cybernetics. arXiv admin note: substantial text overlap with arXiv:1802.0440
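A minimal sketch of the ARGA idea, assuming a one-hidden-layer GCN-style encoder, an inner-product decoder, and a discriminator that pushes node embeddings toward a Gaussian prior; dimensions and names are illustrative, not the reference implementation.

    import torch
    import torch.nn as nn

    class GCNEncoder(nn.Module):
        """Two-step GCN-style encoder producing node embeddings Z."""
        def __init__(self, in_dim, hid_dim, emb_dim):
            super().__init__()
            self.w1 = nn.Linear(in_dim, hid_dim, bias=False)
            self.w2 = nn.Linear(hid_dim, emb_dim, bias=False)

        def forward(self, adj_norm, x):
            h = torch.relu(adj_norm @ self.w1(x))
            return adj_norm @ self.w2(h)

    class Discriminator(nn.Module):
        """Judges whether an embedding comes from the prior or the encoder."""
        def __init__(self, emb_dim):
            super().__init__()
            self.net = nn.Sequential(nn.Linear(emb_dim, 64), nn.ReLU(), nn.Linear(64, 1))

        def forward(self, z):
            return self.net(z)

    def decode(z):
        # Inner-product decoder: probabilities of reconstructed edges.
        return torch.sigmoid(z @ z.t())

    # Toy forward pass: 10 nodes, 5 features, normalized adjacency replaced by I.
    n, f, d = 10, 5, 16
    adj_norm, x = torch.eye(n), torch.randn(n, f)
    encoder, critic = GCNEncoder(f, 32, d), Discriminator(d)
    z = encoder(adj_norm, x)
    recon_loss = nn.functional.binary_cross_entropy(decode(z), torch.eye(n))
    # Encoder's adversarial objective: make the critic label its codes as "prior".
    adv_loss = nn.functional.binary_cross_entropy_with_logits(critic(z), torch.ones(n, 1))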
Sparse Label Smoothing Regularization for Person Re-Identification
Person re-identification (re-id) is a cross-camera retrieval task which
establishes a correspondence between images of a person from multiple cameras.
Deep Learning methods have been successfully applied to this problem and have
achieved impressive results. However, these methods require a large amount of
labeled training data. Currently labeled datasets in person re-id are limited
in their scale and manual acquisition of such large-scale datasets from
surveillance cameras is a tedious and labor-intensive task. In this paper, we propose a framework that performs intelligent data augmentation and assigns partially smoothed labels to the generated data. Our approach first exploits the
clustering property of existing person re-id datasets to create groups of
similar objects that model cross-view variations. Each group is then used to
generate realistic images through adversarial training. Our aim is to emphasize
feature similarity between generated samples and the original samples. Finally,
we assign a non-uniform label distribution to the generated samples and define
a regularized loss function for training. The proposed approach tackles two problems: (1) how to use the generated data efficiently, and (2) how to address the over-smoothing problem found in current regularization methods. Extensive experiments on four large-scale datasets show that our regularization method significantly improves re-id accuracy compared to existing methods.
Comment: 13 pages, 6 figures
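The sketch below illustrates one way the partial (sparse) label smoothing could look: generated images receive a soft target spread only over the identities of their source group, while real images keep standard one-hot cross-entropy. The uniform-within-group target and the loss combination are assumptions for illustration, not the paper's exact formulation.

    import torch
    import torch.nn.functional as F

    def sparse_smooth_target(num_classes, group_classes):
        """Soft label that is non-zero only on the identities in group_classes."""
        t = torch.zeros(num_classes)
        t[group_classes] = 1.0 / len(group_classes)
        return t

    def re_id_loss(logits_real, labels_real, logits_gen, gen_groups, num_classes):
        # Standard cross-entropy for real, labeled images.
        loss_real = F.cross_entropy(logits_real, labels_real)
        # Cross-entropy of generated images against their sparse smoothed targets.
        log_p = F.log_softmax(logits_gen, dim=1)
        targets = torch.stack(
            [sparse_smooth_target(num_classes, g) for g in gen_groups])
        loss_gen = -(targets * log_p).sum(dim=1).mean()
        return loss_real + loss_gen

    # Toy usage: 4 real and 2 generated images over 10 identities.
    loss = re_id_loss(torch.randn(4, 10), torch.tensor([0, 3, 7, 9]),
                      torch.randn(2, 10), [[0, 1, 2], [7, 8]], num_classes=10)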
Crossing Generative Adversarial Networks for Cross-View Person Re-identification
Person re-identification (re-id) refers to matching pedestrians across disjoint, non-overlapping camera views. The most effective way to match pedestrians undergoing significant visual variations is to seek reliably invariant features that describe the person of interest faithfully. Most existing methods are supervised, producing discriminative features by relying on labeled paired images in correspondence. However, annotating pair-wise images is prohibitively labor-intensive, and thus not practical in large-scale camera networks. Moreover,
seeking comparable representations across camera views demands a flexible model
to address the complex distributions of images. In this work, we study the
co-occurrence statistical patterns between pairs of images, and propose a crossing Generative Adversarial Network (Cross-GAN) for learning a joint distribution over cross-image representations in an unsupervised manner. Given a pair of person images, the proposed model consists of a variational auto-encoder that encodes the pair into respective latent variables, a cross-view alignment module that reduces the view disparity, and an adversarial layer to
seek the joint distribution of latent representations. The learned latent
representations are well-aligned to reflect the co-occurrence patterns of
paired images. We empirically evaluate the proposed model against challenging
datasets, and our results show the importance of joint invariant features in
improving matching rates of person re-id compared with semi-supervised and unsupervised state-of-the-art methods.
Comment: 12 pages. arXiv admin note: text overlap with arXiv:1702.03431 by other authors
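A minimal sketch of the paired setup, assuming a shared variational encoder and a simple mean-squared alignment term between the latent means of the two views; the adversarial layer on the latents is only indicated in a comment, and all dimensions are illustrative.

    import torch
    import torch.nn as nn

    class VarEncoder(nn.Module):
        """Shared variational encoder used for both images of a pair."""
        def __init__(self, in_dim=1024, z_dim=64):
            super().__init__()
            self.body = nn.Sequential(nn.Linear(in_dim, 256), nn.ReLU())
            self.mu = nn.Linear(256, z_dim)
            self.logvar = nn.Linear(256, z_dim)

        def forward(self, x):
            h = self.body(x)
            mu, logvar = self.mu(h), self.logvar(h)
            z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)  # reparameterization
            return z, mu, logvar

    encoder = VarEncoder()
    view_a, view_b = torch.randn(16, 1024), torch.randn(16, 1024)  # toy paired batch
    z_a, mu_a, _ = encoder(view_a)
    z_b, mu_b, _ = encoder(view_b)
    align_loss = ((mu_a - mu_b) ** 2).mean()  # cross-view alignment term
    # An adversarial layer on (z_a, z_b), as described above, would sit on top
    # of these latent codes to match their joint distribution.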
Deep Spectral Clustering using Dual Autoencoder Network
Clustering methods have recently attracted ever-increasing attention in learning and vision. Deep clustering combines embedding and clustering together
to obtain optimal embedding subspace for clustering, which can be more
effective compared with conventional clustering methods. In this paper, we
propose a joint learning framework for discriminative embedding and spectral
clustering. We first devise a dual autoencoder network, which enforces the
reconstruction constraint for the latent representations and their noisy
versions, to embed the inputs into a latent space for clustering. As such the
learned latent representations can be more robust to noise. Then the mutual
information estimation is utilized to provide more discriminative information
from the inputs. Furthermore, a deep spectral clustering method is applied to embed the latent representations into the eigenspace and subsequently cluster them, which fully exploits the relationships between inputs to achieve optimal clustering results. Experimental results on benchmark datasets show that our method significantly outperforms state-of-the-art clustering approaches.
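A simplified sketch of the overall pipeline under stated assumptions: an autoencoder trained to reconstruct the input from both clean and noisy latent codes (the dual constraint), followed by spectral clustering of the latent codes. The mutual-information term is omitted and all sizes are illustrative, so this is not the authors' full method.

    import torch
    import torch.nn as nn
    from sklearn.cluster import SpectralClustering

    encoder = nn.Sequential(nn.Linear(784, 256), nn.ReLU(), nn.Linear(256, 10))
    decoder = nn.Sequential(nn.Linear(10, 256), nn.ReLU(), nn.Linear(256, 784))
    optim = torch.optim.Adam(list(encoder.parameters()) + list(decoder.parameters()), lr=1e-3)

    x = torch.rand(512, 784)  # toy batch of flattened images
    for _ in range(100):
        z_clean = encoder(x)
        z_noisy = encoder(x + 0.1 * torch.randn_like(x))
        # Dual reconstruction constraint: both latent codes must rebuild the input.
        loss = ((decoder(z_clean) - x) ** 2).mean() + ((decoder(z_noisy) - x) ** 2).mean()
        optim.zero_grad()
        loss.backward()
        optim.step()

    with torch.no_grad():
        latent = encoder(x).numpy()
    labels = SpectralClustering(n_clusters=10, affinity="nearest_neighbors").fit_predict(latent)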
Deep Learning in Information Security
Machine learning has a long tradition of helping to solve complex information
security problems that are difficult to solve manually. Machine learning
techniques learn models from data representations to solve a task. These data
representations are hand-crafted by domain experts. Deep Learning is a
sub-field of machine learning, which uses models that are composed of multiple
layers. Consequently, representations that are used to solve a task are learned
from the data instead of being manually designed.
In this survey, we study the use of deep learning (DL) techniques within the domain of information security. We systematically reviewed 77 papers and present them
from a data-centric perspective. This data-centric perspective reflects one of
the most crucial advantages of DL techniques -- domain independence. If DL methods succeed in solving problems on a data type in one domain, they will most likely also succeed on similar data from another domain. Other advantages of DL methods are unrivaled scalability and efficiency, both in the number of examples that can be analyzed and in the dimensionality of the input data. DL methods are generally capable of achieving high performance and generalizing well.
However, information security is a domain with unique requirements and challenges. Based on an analysis of the reviewed papers, we point out shortcomings of DL methods with respect to those requirements and discuss further research opportunities.
Geodesic Clustering in Deep Generative Models
Deep generative models are tremendously successful in learning
low-dimensional latent representations that describe the data well. These representations, however, tend to distort relationships between points considerably, i.e. pairwise distances tend not to reflect semantic similarities well. This
renders unsupervised tasks, such as clustering, difficult when working with the
latent representations. We demonstrate that taking the geometry of the
generative model into account is sufficient to make simple clustering
algorithms work well over latent representations. Building on the recent finding that deep generative models constitute stochastically immersed Riemannian manifolds, we propose an efficient algorithm for computing geodesics (shortest paths) and distances in the latent space while taking its distortion
into account. We further propose a new architecture for modeling uncertainty in
variational autoencoders, which is essential for understanding the geometry of
deep generative models. Experiments show that the geodesic distance is very
likely to reflect the internal structure of the data.
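One common way to approximate such geodesic distances, sketched below under the assumption of a deterministic decoder, is to weight a k-nearest-neighbour graph over latent codes by distances measured through the decoder and then take graph shortest paths; this discrete approximation is not the authors' exact algorithm.

    import numpy as np
    from scipy.sparse import csr_matrix
    from scipy.sparse.csgraph import shortest_path
    from sklearn.neighbors import kneighbors_graph

    def geodesic_distances(latents, decoder, k=10):
        """Approximate pairwise geodesic distances between latent codes.

        latents: (n, d) array; decoder: maps latent codes to data space.
        """
        knn = kneighbors_graph(latents, n_neighbors=k, mode="connectivity")
        rows, cols = knn.nonzero()
        decoded = decoder(latents)
        # Edge weights: distances measured through the decoder, in data space.
        weights = np.linalg.norm(decoded[rows] - decoded[cols], axis=1)
        graph = csr_matrix((weights, (rows, cols)), shape=knn.shape)
        return shortest_path(graph, directed=False)

    # Toy usage with a fixed linear map standing in for the decoder.
    z = np.random.randn(200, 2)
    W = np.random.randn(2, 5)
    D = geodesic_distances(z, decoder=lambda v: v @ W)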
Logo Synthesis and Manipulation with Clustered Generative Adversarial Networks
Designing a logo for a new brand is a lengthy and tedious back-and-forth
process between a designer and a client. In this paper we explore to what
extent machine learning can solve the creative task of the designer. For this,
we build a dataset -- LLD -- of 600k+ logos crawled from the world wide web.
Training Generative Adversarial Networks (GANs) for logo synthesis on such
multi-modal data is not straightforward and results in mode collapse for some
state-of-the-art methods. We propose the use of synthetic labels obtained
through clustering to disentangle and stabilize GAN training. We are able to
generate a high diversity of plausible logos and we demonstrate latent space
exploration techniques to ease the logo design task in an interactive manner.
Moreover, we validate the proposed clustered GAN training on CIFAR 10,
achieving state-of-the-art Inception scores when using synthetic labels
obtained by clustering the features of an ImageNet classifier. GANs can thus cope with multi-modal data by means of synthetic labels obtained through clustering, and our results show the creative potential of such techniques for logo
synthesis and manipulation. Our dataset and models will be made publicly
available at https://data.vision.ee.ethz.ch/cvl/lld/
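A hedged sketch of the synthetic-label step: cluster pooled features from a pretrained ImageNet classifier and use the cluster indices as conditioning labels for GAN training. The backbone, the feature layer, the cluster count, and the torchvision weights string are assumptions for illustration.

    import torch
    from torchvision import models
    from sklearn.cluster import KMeans

    # Assumes a recent torchvision that accepts a weights enum/string.
    backbone = models.resnet18(weights="IMAGENET1K_V1")
    backbone.fc = torch.nn.Identity()  # expose the 512-d pooled features
    backbone.eval()

    @torch.no_grad()
    def synthetic_labels(images, n_clusters=64):
        """images: (N, 3, 224, 224) tensor normalized for ImageNet models."""
        feats = backbone(images).numpy()
        return KMeans(n_clusters=n_clusters, n_init=10).fit_predict(feats)

    # The resulting cluster indices would replace real class labels when training
    # a conditional GAN on the otherwise unlabeled logo images.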
Learning to Align Multi-Camera Domains using Part-Aware Clustering for Unsupervised Video Person Re-Identification
Most video person re-identification (re-ID) methods are based on supervised learning, which requires cross-camera ID labeling. Since the cost of
labeling increases dramatically as the number of cameras increases, it is
difficult to apply the re-identification algorithm to a large camera network.
In this paper, we address the scalability issue by presenting deep
representation learning without ID information across multiple cameras.
Technically, we train neural networks to generate both ID-discriminative and
camera-invariant features. To achieve the ID discrimination ability of the
embedding features, we maximize feature distances between different person IDs
within a camera by using a metric learning approach. At the same time,
considering each camera as a different domain, we apply adversarial learning
across multiple camera domains for generating camera-invariant features. We
also propose a part-aware adaptation module, which effectively performs
multi-camera domain invariant feature learning in different spatial regions. We
carry out comprehensive experiments on three public re-ID datasets (i.e.,
PRID-2011, iLIDS-VID, and MARS). Our method outperforms state-of-the-art methods by a large margin of about 20% in rank-1 accuracy on the large-scale MARS dataset.
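As a rough sketch of the camera-adversarial component, the code below uses a gradient reversal layer in front of a camera classifier, so the backbone is pushed toward camera-invariant features while the classifier still learns to predict the source camera; sizes, names, and the reversal coefficient are assumptions, and the part-aware module and within-camera metric-learning loss are omitted.

    import torch
    import torch.nn as nn

    class GradReverse(torch.autograd.Function):
        """Identity in the forward pass, negated (scaled) gradient in the backward pass."""
        @staticmethod
        def forward(ctx, x, lam):
            ctx.lam = lam
            return x.view_as(x)

        @staticmethod
        def backward(ctx, grad_output):
            return -ctx.lam * grad_output, None

    backbone = nn.Sequential(nn.Linear(2048, 512), nn.ReLU())  # stand-in feature extractor
    camera_head = nn.Linear(512, 6)                            # e.g. 6 camera domains

    pooled = torch.randn(32, 2048)                             # toy batch of pooled frame features
    cam_ids = torch.randint(0, 6, (32,))

    features = backbone(pooled)
    cam_logits = camera_head(GradReverse.apply(features, 1.0))
    adv_loss = nn.functional.cross_entropy(cam_logits, cam_ids)
    adv_loss.backward()  # camera head gets normal gradients, backbone gets reversed ones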