FD-GAN: Pose-guided Feature Distilling GAN for Robust Person Re-identification
Person re-identification (reID) is an important task that requires retrieving
a person's images from an image dataset, given one image of the person of
interest. For learning robust person features, the pose variation of person
images is one of the key challenges. Existing works targeting this problem
either perform human alignment or learn human-region-based representations,
and generally require extra pose information and computational cost at
inference. To address this issue, a Feature Distilling Generative Adversarial
Network (FD-GAN) is proposed for learning identity-related and pose-unrelated
representations. It is a novel framework based on a Siamese structure with
multiple novel discriminators on human poses and identities. In addition to the
discriminators, a novel same-pose loss is also integrated, which requires the
appearances of the same person's generated images to be similar. After learning
pose-unrelated person features with pose guidance, no auxiliary pose
information or additional computational cost is required during testing. Our
proposed FD-GAN achieves state-of-the-art performance on three person reID
datasets, which demonstrates the effectiveness and robust feature-distilling
capability of the proposed FD-GAN.
Comment: Accepted in Proceedings of the 32nd Conference on Neural Information
Processing Systems (NeurIPS 2018). Code available:
https://github.com/yxgeee/FD-GA
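The same-pose loss idea can be pictured at a toy level: two images of the same
person, generated under one target pose, should look alike. The following is a
minimal sketch under stated assumptions; the tiny nested-list "images" and the
plain L1 distance are illustrative stand-ins, not the exact loss FD-GAN applies
inside its Siamese network.

```python
# Hedged sketch of a same-pose appearance loss: two generated images of the
# same identity under the same target pose should have similar appearance.
# The toy "images" and the plain L1 distance are illustrative assumptions.

def same_pose_loss(gen_a, gen_b):
    """Mean absolute pixel difference between two generated images."""
    flat_a = [p for row in gen_a for p in row]
    flat_b = [p for row in gen_b for p in row]
    return sum(abs(a - b) for a, b in zip(flat_a, flat_b)) / len(flat_a)

img_same = [[0.2, 0.4], [0.6, 0.8]]
img_identical = [[0.2, 0.4], [0.6, 0.8]]
img_shifted = [[0.3, 0.5], [0.7, 0.9]]

print(same_pose_loss(img_same, img_identical))  # 0.0: identical appearance
print(same_pose_loss(img_same, img_shifted))    # positive: appearance drift
```

In the real model this penalty acts on generated images (or feature maps), not
raw pixel lists, but the intent is the same: drive generations of one identity
toward a shared, pose-independent appearance.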
Crossing Generative Adversarial Networks for Cross-View Person Re-identification
Person re-identification (\textit{re-id}) refers to matching pedestrians
across disjoint, non-overlapping camera views. The most effective way to
match these pedestrians under significant visual variations is to seek
reliably invariant features that can describe the person of interest
faithfully. Most existing methods are presented in a supervised manner to
produce discriminative features by relying on labeled paired images in
correspondence. However, annotating pair-wise images is prohibitively
labor-expensive, and thus not practical in large-scale camera networks.
Moreover, seeking comparable representations across camera views demands a
flexible model to address the complex distributions of images. In this work,
we study the co-occurrence statistical patterns between pairs of images, and
propose a crossing Generative Adversarial Network (Cross-GAN) for learning a
joint distribution for cross-image representations in an unsupervised manner.
Given a
pair of person images, the proposed model consists of a variational
auto-encoder that encodes the pair into respective latent variables, a
cross-view alignment that reduces the view disparity, and an adversarial layer
that seeks the joint distribution of the latent representations. The learned
latent representations are well aligned to reflect the co-occurrence patterns
of paired images. We empirically evaluate the proposed model on challenging
datasets, and our results show the importance of joint invariant features in
improving matching rates of person re-id in comparison with semi-/unsupervised
state-of-the-art methods.
Comment: 12 pages. arXiv admin note: text overlap with arXiv:1702.03431 by
another author
When Autonomous Systems Meet Accuracy and Transferability through AI: A Survey
With widespread applications of artificial intelligence (AI), the
capabilities of perception, understanding, decision-making and control for
autonomous systems have improved significantly in recent years. When
autonomous systems must deliver both accuracy and transferability, several AI
methods, such as adversarial learning, reinforcement learning (RL) and
meta-learning, have shown strong performance. Here, we review the
learning-based approaches in autonomous systems from the perspectives of
accuracy and transferability. Accuracy means that a well-trained model shows
good results during the testing phase, in which the testing set shares the
same task or data distribution as the training set. Transferability means that
when a well-trained model is transferred to other testing domains, its
accuracy remains good. Firstly, we introduce some basic concepts of transfer
learning
and then present some preliminaries of adversarial learning, RL and
meta-learning. Secondly, we review accuracy, transferability, or both, to
show the advantages of adversarial learning, like generative
adversarial networks (GANs), in typical computer vision tasks in autonomous
systems, including image style transfer, image superresolution, image
deblurring/dehazing/rain removal, semantic segmentation, depth estimation,
pedestrian detection and person re-identification (re-ID). Then, we review
the performance of RL and meta-learning, again from the perspectives of
accuracy, transferability, or both, in autonomous systems, covering pedestrian
tracking, robot navigation and robotic manipulation. Finally, we discuss
several challenges and future topics for using adversarial learning, RL and
meta-learning in autonomous systems.
M2M-GAN: Many-to-Many Generative Adversarial Transfer Learning for Person Re-Identification
Cross-domain transfer learning (CDTL) is an extremely challenging task for
person re-identification (ReID). Given a source domain with annotations and
a target domain without annotations, CDTL seeks an effective method to
transfer the knowledge from the source domain to the target domain. However,
such a simple two-domain transfer setting does not hold for person ReID,
because the source/target domain consists of several sub-domains, e.g.,
camera-based sub-domains. To address this intractable problem, we propose a
novel Many-to-Many Generative Adversarial Transfer Learning method (M2M-GAN)
that takes multiple source sub-domains and multiple target sub-domains into
consideration and performs each sub-domain mapping from the source domain to
the target domain in a unified optimization process. The proposed
method first translates the image styles of source sub-domains into that of
target sub-domains, and then performs the supervised learning by using the
transferred images and the corresponding annotations from the source domain.
As the domain gap is reduced, M2M-GAN achieves promising results for
cross-domain person ReID. Experimental results on three benchmark datasets,
Market-1501, DukeMTMC-reID and MSMT17, show the effectiveness of our M2M-GAN.
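The many-to-many structure above can be sketched as enumerating one
style-transfer direction per (source camera, target camera) pair instead of a
single domain-level mapping. This is a toy illustration with hypothetical
camera names, not M2M-GAN's actual optimization.

```python
# Illustrative sketch: with camera-based sub-domains, one coarse
# source-to-target mapping is replaced by a mapping per (source camera,
# target camera) pair, all handled jointly in M2M-GAN's unified
# optimization. The camera names here are made up.
from itertools import product

source_cams = ["src_cam_1", "src_cam_2", "src_cam_3"]
target_cams = ["tgt_cam_1", "tgt_cam_2"]

# Every source sub-domain maps to every target sub-domain.
mappings = list(product(source_cams, target_cams))
print(len(mappings))  # 6 sub-domain mappings instead of 1 coarse mapping
```

The combinatorial growth in mappings is exactly why handling them in a single
unified process, rather than training one GAN per pair, matters.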
AnonymousNet: Natural Face De-Identification with Measurable Privacy
With billions of personal images being generated from social media and
cameras of all sorts on a daily basis, security and privacy are unprecedentedly
challenged. Although extensive attempts have been made, existing face image
de-identification techniques are either insufficient in photo-reality or
incapable of balancing privacy and usability qualitatively and quantitatively,
i.e., they fail to answer counterfactual questions such as "is it private
now?", "how private is it?", and "can it be more private?" In this paper, we
propose a novel framework called AnonymousNet, with an effort to address these
issues systematically, balance usability, and enhance privacy in a natural and
measurable manner. The framework encompasses four stages: facial attribute
estimation, privacy-metric-oriented face obfuscation, directed natural image
synthesis, and adversarial perturbation. Not only do we achieve
state-of-the-art results in terms of image quality and attribute prediction
accuracy, but we are also the first to show that facial privacy is measurable,
can be factorized, and can accordingly be manipulated in a photo-realistic
fashion to fulfill different requirements and application scenarios.
Experiments further demonstrate the effectiveness of the proposed framework.
Comment: CVPR-19 Workshop on Computer Vision: Challenges and Opportunities for
Privacy and Security (CV-COPS 2019)
Domain Adaptive Person Re-Identification via Camera Style Generation and Label Propagation
Unsupervised domain adaptation in person re-identification resorts to labeled
source data to promote model training on the target domain, facing the
dilemmas caused by large domain shift and large camera variations. The
non-overlapping-labels challenge, i.e., that the source domain and the target
domain contain entirely different persons, further increases the
re-identification difficulty. In this paper, we
propose a novel algorithm to narrow such domain gaps. We derive a camera style
adaptation framework to learn the style-based mappings between different camera
views, from the target domain to the source domain, and then we can transfer
the identity-based distribution from the source domain to the target domain on
the camera level. To overcome the non-overlapping labels challenge and guide
the person re-identification model to narrow the gap further, an efficient and
effective soft-labeling method is proposed to mine the intrinsic local
structure of the target domain through building the connection between
GAN-translated source domain and the target domain. Experimental results
conducted on real benchmark datasets indicate that our method achieves
state-of-the-art results.
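One way to picture the soft-labeling step is as turning a target sample's
similarities to labeled, GAN-translated source samples into a class
distribution. The sketch below is a hedged stand-in: the exponential
weighting and the function name are assumptions, not the paper's exact mining
procedure.

```python
# Hedged sketch of similarity-based soft labeling: a target sample's label
# distribution is built from its similarity to labeled neighbors from the
# GAN-translated source domain. The exponential weighting is an assumption.
import math

def soft_labels(similarities, neighbor_labels, num_classes):
    """Convert neighbor similarities into a normalized soft label vector."""
    weights = [math.exp(s) for s in similarities]
    total = sum(weights)
    dist = [0.0] * num_classes
    for w, y in zip(weights, neighbor_labels):
        dist[y] += w / total
    return dist

# Two close neighbors of class 0 and one weaker neighbor of class 1.
dist = soft_labels([2.0, 1.0, 0.5], [0, 0, 1], num_classes=3)
print(dist)  # class 0 dominates; class 2, with no neighbors, stays at 0.0
```

Such soft targets let the re-id model train on the unlabeled target domain
without committing to hard, possibly wrong, identity assignments.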
Sparse Label Smoothing Regularization for Person Re-Identification
Person re-identification (re-id) is a cross-camera retrieval task which
establishes a correspondence between images of a person from multiple cameras.
Deep Learning methods have been successfully applied to this problem and have
achieved impressive results. However, these methods require a large amount of
labeled training data. Currently, labeled datasets in person re-id are limited
in their scale and manual acquisition of such large-scale datasets from
surveillance cameras is a tedious and labor-intensive task. In this paper, we
propose a framework that performs intelligent data augmentation and assigns
partial smoothing label to generated data. Our approach first exploits the
clustering property of existing person re-id datasets to create groups of
similar objects that model cross-view variations. Each group is then used to
generate realistic images through adversarial training. Our aim is to emphasize
feature similarity between generated samples and the original samples. Finally,
we assign a non-uniform label distribution to the generated samples and define
a regularized loss function for training. The proposed approach tackles two
problems: (1) how to efficiently use the generated data and (2) how to address
the over-smoothness problem found in current regularization methods. Extensive
experiments on four large-scale datasets show that our regularization method
significantly improves the re-id accuracy compared to existing methods.
Comment: 13 pages, 6 figures
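The non-uniform ("sparse") label assignment can be pictured as spreading the
label mass only over the identities in the cluster a generated sample came
from, rather than over all identities. The uniform within-cluster split below
is an illustrative simplification, assuming cluster membership is known.

```python
# Hedged sketch of sparse label smoothing: a generated sample receives a
# label distribution that is non-zero only for identities in its source
# cluster. The uniform within-cluster split is a simplifying assumption.

def sparse_smoothed_label(num_classes, cluster_ids):
    """Spread label mass uniformly over the cluster, zero elsewhere."""
    dist = [0.0] * num_classes
    mass = 1.0 / len(cluster_ids)
    for c in cluster_ids:
        dist[c] = mass
    return dist

# A generated image from a cluster containing identities 1 and 4,
# in a toy dataset with 6 identities overall.
label = sparse_smoothed_label(6, cluster_ids=[1, 4])
print(label)  # [0.0, 0.5, 0.0, 0.0, 0.5, 0.0]
```

Confining the mass to the cluster is what counters the over-smoothness of
regularizers that spread label probability across every identity.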
Person Transfer GAN to Bridge Domain Gap for Person Re-Identification
Although the performance of person Re-Identification (ReID) has been
significantly boosted, many challenging issues in real scenarios have not been
fully investigated, e.g., the complex scenes and lighting variations, viewpoint
and pose changes, and the large number of identities in a camera network. To
facilitate the research towards conquering those issues, this paper contributes
a new dataset called MSMT17 with many important features, e.g., 1) the raw
videos are taken by a 15-camera network deployed in both indoor and outdoor
scenes, 2) the videos cover a long period of time and present complex lighting
variations, and 3) it contains currently the largest number of annotated
identities, i.e., 4,101 identities and 126,441 bounding boxes. We also observe
that a domain gap commonly exists between datasets, which essentially causes a
severe performance drop when training and testing on different datasets. As a
result, available training data cannot be effectively leveraged for new
testing domains. To relieve the expensive costs of annotating new training
samples, we propose a Person Transfer Generative Adversarial Network (PTGAN)
to bridge the domain gap. Comprehensive experiments show that the domain gap
can be substantially narrowed down by PTGAN.
Comment: 10 pages, 9 figures; accepted in CVPR 201
Joint Discriminative and Generative Learning for Person Re-identification
Person re-identification (re-id) remains challenging due to significant
intra-class variations across different cameras. Recently, there has been a
growing interest in using generative models to augment training data and
enhance the invariance to input changes. The generative pipelines in existing
methods, however, stay relatively separate from the discriminative re-id
learning stages. Accordingly, re-id models are often trained in a
straightforward manner on the generated data. In this paper, we seek to improve
learned re-id embeddings by better leveraging the generated data. To this end,
we propose a joint learning framework that couples re-id learning and data
generation end-to-end. Our model involves a generative module that separately
encodes each person into an appearance code and a structure code, and a
discriminative module that shares the appearance encoder with the generative
module. By switching the appearance or structure codes, the generative module
is able to generate high-quality cross-id composed images, which are fed back
online to the appearance encoder and used to improve the discriminative
module. The proposed joint learning framework yields significant improvement
over the baseline without using generated data, leading to state-of-the-art
performance on several benchmark datasets.
Comment: CVPR 2019 (Oral)
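The appearance/structure code swap can be sketched at a toy level. The
dictionary "images", string codes, and function names below are purely
illustrative placeholders for the learned encoders and generator.

```python
# Toy sketch of cross-id composition: each image decomposes into an
# appearance code (identity, clothing) and a structure code (pose); swapping
# codes between two identities yields a new composed sample. Every name
# here is a hypothetical stand-in for the learned modules.

def encode(image):
    """Hypothetical split of an image into (appearance, structure) codes."""
    return image["appearance"], image["structure"]

def compose(appearance, structure):
    """Hypothetical generator: recombine codes into a new 'image'."""
    return {"appearance": appearance, "structure": structure}

img_a = {"appearance": "identity_A_clothing", "structure": "pose_1"}
img_b = {"appearance": "identity_B_clothing", "structure": "pose_2"}

app_a, _ = encode(img_a)
_, struct_b = encode(img_b)

# Identity A's appearance rendered in identity B's pose.
swapped = compose(app_a, struct_b)
print(swapped)
```

Feeding such composed samples back to the shared appearance encoder is what
couples the generative and discriminative halves end-to-end.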
ReshapeGAN: Object Reshaping by Providing A Single Reference Image
The aim of this work is learning to reshape the object in an input image to
an arbitrary new shape, by just simply providing a single reference image with
an object instance in the desired shape. We propose a new Generative
Adversarial Network (GAN) architecture for such an object reshaping problem,
named ReshapeGAN. The network can be tailored for handling all kinds of problem
settings, including both within-domain (or single-dataset) reshaping and
cross-domain (typically across multiple datasets) reshaping, with paired or
unpaired training data. The appearance of the input object is preserved in all
cases, and thus it is still identifiable after reshaping, which has never been
achieved as far as we are aware. We present the tailored models of the proposed
ReshapeGAN for all the problem settings, and have them tested on 8 kinds of
reshaping tasks with 13 different datasets, demonstrating the ability of
ReshapeGAN on generating convincing and superior results for object reshaping.
To the best of our knowledge, we are the first to be able to make one GAN
framework work on all such object reshaping tasks, especially the cross-domain
tasks on handling multiple diverse datasets. We present both ablation studies
on our proposed ReshapeGAN models and comparisons with state-of-the-art models
when they can be made comparable, using all applicable metrics that we are
aware of.
Comment: 25 pages, 23 figures