Dual-Triplet Metric Learning for Unsupervised Domain Adaptation in Video-Based Face Recognition
The scalability and complexity of deep learning models remain a key issue in
many visual recognition applications, e.g., video surveillance, where
fine-tuning with labeled image data from each new camera is required to reduce
the domain shift between videos captured in the source domain, e.g., a
laboratory setting, and the target domain, i.e., an operational environment. In
many video surveillance applications, like face recognition (FR) and person
re-identification, a pair-wise matcher is used to assign a query image captured
using a video camera to the corresponding reference images in a gallery. The
different configurations and operational conditions of video cameras can
introduce significant shifts in the pair-wise distance distributions, resulting
in degraded recognition performance for new cameras. In this paper, a new deep
domain adaptation (DA) method is proposed to adapt the CNN embedding of a
Siamese network using unlabeled tracklets captured with a new video camera. To
this end, a dual-triplet loss is introduced for metric learning, where two
triplets are constructed using video data from a source camera, and a new
target camera. In order to constitute the dual triplets, a mutual-supervised
learning approach is introduced where the source camera acts as a teacher,
providing the target camera with an initial embedding. Then, the student relies
on the teacher to iteratively label the positive and negative pairs collected
during, e.g., initial camera calibration. Both source and target embeddings
continue to simultaneously learn such that their pair-wise distance
distributions become aligned. For validation, the proposed metric learning
technique is used to train deep Siamese networks under different training
scenarios, and is compared to state-of-the-art techniques for still-to-video FR
on the COX-S2V dataset and a private video-based FR dataset.
Comment: Submitted to IJCNN202
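The dual-triplet idea described above can be illustrated with a minimal sketch. The abstract does not give the loss formula, so this sketch assumes the standard margin-based triplet loss and simply sums it over one source-camera triplet and one target-camera triplet; the function names, the Euclidean distance, and the margin value are all illustrative assumptions, not the authors' implementation.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two embedding vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def triplet_loss(anchor, positive, negative, margin=1.0):
    """Standard margin-based triplet loss (assumed form, not from the paper)."""
    return max(0.0, euclidean(anchor, positive) - euclidean(anchor, negative) + margin)


def dual_triplet_loss(source_triplet, target_triplet, margin=1.0):
    """Hypothetical dual-triplet loss: one triplet per domain, losses summed.

    Each triplet is an (anchor, positive, negative) tuple of embeddings.
    """
    return (triplet_loss(*source_triplet, margin=margin)
            + triplet_loss(*target_triplet, margin=margin))


# Source triplet is already well separated; target triplet is not.
source = ((0.0, 0.0), (0.0, 1.0), (3.0, 4.0))
target = ((0.0, 0.0), (3.0, 4.0), (0.0, 1.0))
loss = dual_triplet_loss(source, target)
```

In this toy case only the target triplet contributes to the loss, which matches the intuition that the new camera's embedding is the one that still needs to be pulled into alignment.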
Unsupervised Multi-Target Domain Adaptation Through Knowledge Distillation
Unsupervised domain adaptation (UDA) seeks to alleviate the problem of domain
shift between the distribution of unlabeled data from the target domain and
that of labeled data from the source domain. While the single-target UDA scenario is
well studied in the literature, Multi-Target Domain Adaptation (MTDA) remains
largely unexplored despite its practical importance, e.g., in multi-camera
video-surveillance applications. The MTDA problem can be addressed by adapting
one specialized model per target domain, although this solution is too costly
in many real-world applications. Blending multiple targets for MTDA has been
proposed, yet this solution may lead to a reduction in model specificity and
accuracy. In this paper, we propose a novel unsupervised MTDA approach to train
a CNN that can generalize well across multiple target domains. Our
Multi-Teacher MTDA (MT-MTDA) method relies on multi-teacher knowledge
distillation (KD) to iteratively distill target domain knowledge from multiple
teachers to a common student. The KD process is performed in a progressive
manner, where the student is trained by each teacher on how to perform UDA for
a specific target, instead of directly learning domain adapted features.
Finally, instead of combining the knowledge from each teacher, MT-MTDA
alternates between teachers that distill knowledge, thereby preserving the
specificity of each target (teacher) when learning to adapt to the student.
MT-MTDA is compared against state-of-the-art methods on several challenging UDA
benchmarks, and empirical results show that our proposed model can provide a
considerably higher level of accuracy across multiple target domains. Our code
is available at: https://github.com/LIVIAETS/MT-MTDA
Comment: Accepted for WACV202
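The alternating multi-teacher distillation described above can be sketched in a few lines. The abstract states only that MT-MTDA alternates between teachers rather than combining them; the round-robin schedule, the temperature-softened KL divergence, and every name below are assumptions made for illustration and are not taken from the linked repository.

```python
import math
from itertools import cycle, islice


def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax, shifted by the max for numerical stability."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kd_loss(student_logits, teacher_logits, temperature=2.0):
    """KL(teacher || student) on softened outputs (a common KD loss, assumed here)."""
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def teacher_schedule(teacher_names, num_steps):
    """Hypothetical round-robin schedule: one teacher distills per training step."""
    return list(islice(cycle(teacher_names), num_steps))


# One teacher per target domain; the student sees them one at a time.
schedule = teacher_schedule(["target_A", "target_B", "target_C"], 5)
```

Visiting one teacher per step, instead of averaging all teacher outputs, is one simple way to keep each target's signal separate, which is the property the abstract attributes to alternating.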