Joint Progressive Knowledge Distillation and Unsupervised Domain Adaptation
Currently, the divergence between the distributions of design and operational
data, together with high computational complexity, limits the adoption of CNNs
in real-world applications. For instance, person re-identification systems
typically rely on a distributed set of cameras, where each camera has different
capture conditions. This can translate to a considerable shift between source
(e.g. lab setting) and target (e.g. operational camera) domains. Given the cost
of annotating image data captured for fine-tuning in each target domain,
unsupervised domain adaptation (UDA) has become a popular approach to adapt
CNNs. Moreover, state-of-the-art deep learning models that provide a high level
of accuracy often rely on architectures that are too complex for real-time
applications. Although several compression and UDA approaches have recently
been proposed to overcome these limitations, they do not allow optimizing a CNN
to simultaneously address both. In this paper, we propose an unexplored
direction -- the joint optimization of CNNs to provide a compressed model that
is adapted to perform well for a given target domain. In particular, the
proposed approach performs unsupervised knowledge distillation (KD) from a
complex teacher model to a compact student model, by leveraging both source and
target data. It also improves upon existing UDA techniques by progressively
teaching the student about domain-invariant features, instead of directly
adapting a compact model on target domain data. Our method is compared against
state-of-the-art compression and UDA techniques, using two popular
classification datasets for UDA -- Office31 and ImageClef-DA. On both datasets,
results indicate that our method can achieve the highest level of accuracy
while requiring a comparable or lower time complexity. Comment: Accepted to WCCI/IJCNN 2020
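The knowledge distillation (KD) step outlined in the abstract can be illustrated with a minimal, framework-free sketch. This is a generic temperature-based soft-target loss in the style of Hinton et al., not the paper's actual joint KD/UDA objective; the function names and the temperature value are illustrative assumptions:

```python
import math

def softmax(logits, T=1.0):
    # Temperature-scaled softmax; a higher T produces a softer
    # distribution, exposing the teacher's "dark knowledge".
    exps = [math.exp(z / T) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(student_logits, teacher_logits, T=4.0):
    # KL divergence between the temperature-softened teacher and
    # student outputs, scaled by T^2 so gradients keep a comparable
    # magnitude across temperatures (hypothetical sketch).
    p = softmax(teacher_logits, T)  # teacher (soft targets)
    q = softmax(student_logits, T)  # student
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))
    return (T ** 2) * kl
```

In a full training loop, this term would be combined with a task loss on labeled source data, while the progressive teaching described above would additionally align student features across source and target domains.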
Exploiting Prunability for Person Re-Identification
Recent years have witnessed a substantial increase in the deep learning
(DL) architectures proposed for visual recognition tasks like person
re-identification, where individuals must be recognized over multiple
distributed cameras. Although these architectures have greatly improved the
state-of-the-art accuracy, the computational complexity of the CNNs commonly
used for feature extraction remains an issue, hindering their deployment on
platforms with limited resources, or in applications with real-time constraints.
There is an obvious advantage to accelerating and compressing DL models without
significantly decreasing their accuracy. However, the source (pruning) domain
differs from operational (target) domains, and the domain shift between image
data captured with different non-overlapping camera viewpoints leads to lower
recognition accuracy. In this paper, we investigate the prunability of these
architectures under different design scenarios. This paper first revisits
pruning techniques that are suitable for reducing the computational complexity
of deep CNNs applied to person re-identification. Then, these techniques
are analysed according to their pruning criteria and strategy, and according to
different scenarios for exploiting pruning methods to fine-tune networks to
target domains. Experimental results obtained using DL models with ResNet
feature extractors, and multiple benchmark re-identification datasets, indicate
that pruning can considerably reduce network complexity while maintaining a high
level of accuracy. In scenarios where pruning is performed with large
pre-training or fine-tuning datasets, the number of FLOPS required by ResNet
architectures is reduced by half, while maintaining a comparable rank-1 accuracy
(within 1% of the original model). Pruning while training a larger CNN can also
provide significantly better performance than fine-tuning a smaller one. Comment: Accepted for EURASIP Journal on Image and Video Processing
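A common family of pruning criteria discussed in work like this is magnitude-based filter pruning: rank each convolutional filter by the L1 norm of its weights and keep only the strongest ones. The sketch below is a deliberately simplified, pure-Python illustration of that criterion, not the paper's specific method; the function names and keep ratio are hypothetical:

```python
def l1_filter_norms(filters):
    # Each filter is represented as a flat list of weights;
    # its importance score is the L1 norm (sum of absolute values).
    return [sum(abs(w) for w in f) for f in filters]

def prune_filters(filters, keep_ratio=0.5):
    # Magnitude-based criterion: keep the keep_ratio fraction of
    # filters with the largest L1 norm, dropping the rest.
    norms = l1_filter_norms(filters)
    k = max(1, int(len(filters) * keep_ratio))
    keep = sorted(range(len(filters)),
                  key=lambda i: norms[i], reverse=True)[:k]
    return sorted(keep)  # indices of surviving filters
```

Halving the number of filters in each layer roughly halves the FLOPS of the convolutions, which matches the scale of reduction the abstract reports; in practice the surviving network is then fine-tuned to recover accuracy.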