429 research outputs found
Learning Disentangled Representation with Mutual Information Maximization for Real-Time UAV Tracking
Efficiency has been a critical problem in UAV tracking due to limitations in
computation resources, battery capacity, and unmanned aerial vehicle maximum
load. Although discriminative correlation filters (DCF)-based trackers prevail
in this field for their favorable efficiency, some recently proposed
lightweight deep learning (DL)-based trackers using model compression
demonstrated quite remarkable CPU efficiency as well as precision.
Unfortunately, the model compression methods utilized by these works, though
simple, are still unable to achieve satisfying tracking precision with higher
compression rates. This paper aims to exploit disentangled representation
learning with mutual information maximization (DR-MIM) to further improve
DL-based trackers' precision and efficiency for UAV tracking. The proposed
disentangled representation separates the feature into an identity-related
feature and an identity-unrelated feature. Only the latter is used, which enhances the
effectiveness of the feature representation for subsequent classification and
regression tasks. Extensive experiments on four UAV benchmarks, including
UAV123@10fps, DTB70, UAVDT and VisDrone2018, show that our DR-MIM tracker
significantly outperforms state-of-the-art UAV tracking methods.
Video Infringement Detection via Feature Disentanglement and Mutual Information Maximization
The self-media era provides us with a tremendous number of high-quality videos. Unfortunately,
frequent video copyright infringements are now seriously damaging the interests
and enthusiasm of video creators. Identifying infringing videos is therefore a
compelling task. Current state-of-the-art methods tend to simply feed
high-dimensional mixed video features into deep neural networks and count on
the networks to extract useful representations. Despite its simplicity, this
paradigm heavily relies on the original entangled features and lacks
constraints guaranteeing that useful task-relevant semantics are extracted from
the features.
In this paper, we seek to tackle the above challenges from two aspects: (1)
We propose to disentangle an original high-dimensional feature into multiple
sub-features, explicitly disentangling the feature into exclusive
lower-dimensional components. We expect the sub-features to encode
non-overlapping semantics of the original feature and remove redundant
information.
(2) On top of the disentangled sub-features, we further learn an auxiliary
feature to enhance the sub-features. We theoretically analyze the mutual
information between the label and the disentangled features, arriving at a loss
that maximizes the extraction of task-relevant information from the original
feature.
Extensive experiments on two large-scale benchmark datasets (i.e., SVD and
VCSL) demonstrate that our method achieves 90.1% TOP-100 mAP on the large-scale
SVD dataset and also sets the new state-of-the-art on the VCSL benchmark
dataset. Our code and model have been released at
https://github.com/yyyooooo/DMI/, hoping to contribute to the community.
Comment: This paper is accepted by ACM MM 202
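As a rough illustration of the two ideas in the abstract above (splitting a feature into non-overlapping sub-features, and measuring mutual information between a feature and a label), here is a minimal sketch. It is not the authors' method or loss: the function names `split_feature` and `mutual_information` are hypothetical, the split is a plain equal-width slice, and mutual information is computed exactly for a small discrete joint table rather than estimated from deep features.

```python
import numpy as np


def split_feature(x, k):
    """Split a 1-D feature vector into k equal, non-overlapping sub-features.

    Assumption for illustration only: a plain slice, not a learned
    disentanglement. The length of x must be divisible by k.
    """
    return np.split(np.asarray(x), k)


def mutual_information(joint):
    """Mutual information (in nats) of a discrete joint table p(z, y)."""
    joint = np.asarray(joint, dtype=float)
    joint = joint / joint.sum()            # normalise to a probability table
    pz = joint.sum(axis=1, keepdims=True)  # marginal p(z), shape (n, 1)
    py = joint.sum(axis=0, keepdims=True)  # marginal p(y), shape (1, m)
    mask = joint > 0                       # treat 0 * log 0 as 0
    ratio = joint[mask] / (pz @ py)[mask]  # p(z, y) / (p(z) p(y))
    return float(np.sum(joint[mask] * np.log(ratio)))


if __name__ == "__main__":
    parts = split_feature(np.arange(6), 3)
    print([p.tolist() for p in parts])

    # A sub-feature that is independent of the label carries no information,
    # while one that determines a binary label carries log(2) nats.
    print(mutual_information([[0.25, 0.25], [0.25, 0.25]]))
    print(mutual_information([[0.5, 0.0], [0.0, 0.5]]))
```

In this toy setting, a loss that "maximizes the extraction of task-relevant information" would favour sub-features whose joint table with the label looks like the second case rather than the first.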
Generative Adversarial Networks (GANs): Challenges, Solutions, and Future Directions
Generative Adversarial Networks (GANs) are a novel class of deep generative
models which have recently gained significant attention. GANs learn complex and
high-dimensional distributions implicitly over images, audio, and data.
However, there exist major challenges in the training of GANs, i.e., mode
collapse, non-convergence and instability, due to inappropriate design of
network architecture, use of objective function and selection of optimization
algorithm. Recently, to address these challenges, several solutions for better
design and optimization of GANs have been investigated based on techniques of
re-engineered network architectures, new objective functions and alternative
optimization algorithms. To the best of our knowledge, there is no existing
survey that has particularly focused on broad and systematic developments of
these solutions. In this study, we perform a comprehensive survey of the
advancements in GANs design and optimization solutions proposed to handle GANs
challenges. We first identify key research issues within each design and
optimization technique and then propose a new taxonomy to structure solutions
by key research issues. In accordance with the taxonomy, we provide a detailed
discussion on different GANs variants proposed within each solution and their
relationships. Finally, based on the insights gained, we present the promising
research directions in this rapidly growing field.
Comment: 42 pages, Figure 13, Table
Disentangled Autoencoder for Cross-Stain Feature Extraction in Pathology Image Analysis
A novel deep autoencoder architecture is proposed for the analysis of histopathology images. Its purpose is to produce a disentangled latent representation in which the structure and colour information are confined to different subspaces so that stain-independent models may be learned. For this, we introduce two constraints on the representation which are implemented as a classifier and an adversarial discriminator. We show how they can be used for learning a latent representation across haematoxylin-eosin and a number of immune stains. Finally, we demonstrate the utility of the proposed representation in the context of matching image patches for registration applications and for learning a bag of visual words for whole slide image summarization.
Disentangled Representation Learning
Disentangled Representation Learning (DRL) aims to learn a model capable of
identifying and disentangling the underlying factors hidden in the observable
data in representation form. The process of separating underlying factors of
variation into variables with semantic meaning benefits the learning of explainable
representations of data, which imitates the meaningful understanding process of
humans when observing an object or relation. As a general learning strategy,
DRL has demonstrated its power in improving the model explainability,
controllability, robustness, as well as generalization capacity in a wide range
of scenarios such as computer vision, natural language processing, data mining,
etc. In this article, we comprehensively review DRL from various aspects
including motivations, definitions, methodologies, evaluations, applications
and model designs. We discuss works on DRL based on two well-recognized
definitions, i.e., Intuitive Definition and Group Theory Definition. We further
categorize the methodologies for DRL into four groups, i.e., Traditional
Statistical Approaches, Variational Auto-encoder Based Approaches, Generative
Adversarial Networks Based Approaches, Hierarchical Approaches and Other
Approaches. We also analyze principles to design different DRL models that may
benefit different tasks in practical applications. Finally, we point out
challenges in DRL as well as potential research directions deserving future
investigations. We believe this work may provide insights for promoting the DRL
research in the community.
Comment: 22 pages, 9 figure