Towards Robust Video Instance Segmentation with Temporal-Aware Transformer
Most existing transformer-based video instance segmentation methods extract
per-frame features independently, which makes it challenging to handle
appearance deformation. In this paper, we observe that temporal
information is equally important, and we propose TAFormer to aggregate
spatio-temporal features in both the transformer encoder and decoder. Specifically,
in the transformer encoder, we propose a novel spatio-temporal joint multi-scale
deformable attention module that dynamically integrates spatial and
temporal information to obtain enriched spatio-temporal features. In the
transformer decoder, we introduce a temporal self-attention module to enhance
the frame-level box queries with temporal relations. Moreover, TAFormer
adopts an instance-level contrastive loss to increase the discriminability of
instance query embeddings, which reduces the tracking errors caused by visually
similar instances. Experimental results show that TAFormer
effectively leverages the spatial and temporal information to obtain
context-aware feature representation and outperforms state-of-the-art methods
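The temporal self-attention over frame-level queries mentioned in the TAFormer abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a single attention head, omits the learned query/key/value projections, and handles one instance slot at a time. The function name `temporal_self_attention` and the residual connection are illustrative assumptions.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def temporal_self_attention(frame_queries):
    """Let the query of one instance slot attend to itself across frames.

    frame_queries: list of T vectors, one per frame, for the same slot.
    Returns (updated_queries, attention_weights).
    Assumption: no learned projections; scaled dot-product attention only.
    """
    dim = len(frame_queries[0])
    scale = 1.0 / math.sqrt(dim)
    updated, weights = [], []
    for q in frame_queries:
        # Similarity of this frame's query to the query in every frame.
        scores = [scale * sum(a * b for a, b in zip(q, k)) for k in frame_queries]
        w = softmax(scores)
        # Weighted sum of the per-frame queries (the temporal context).
        context = [
            sum(w[t] * frame_queries[t][d] for t in range(len(frame_queries)))
            for d in range(dim)
        ]
        # Residual connection (an assumption of this sketch).
        updated.append([qi + ci for qi, ci in zip(q, context)])
        weights.append(w)
    return updated, weights


queries = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]  # toy data: 3 frames, 2 dims
updated, weights = temporal_self_attention(queries)
```

In this toy input, the first frame's query puts more weight on the second frame than on the third, because the first two vectors point in nearly the same direction.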
Cluster Contrast for Unsupervised Person Re-Identification
State-of-the-art unsupervised re-ID methods train the neural networks using a
memory-based non-parametric softmax loss. Instance feature vectors stored in
memory are assigned pseudo-labels by clustering and updated at instance level.
However, varying cluster sizes lead to inconsistency in the updating
progress of each cluster. To solve this problem, we present Cluster Contrast
which stores feature vectors and computes contrast loss at the cluster level.
Our approach employs a unique cluster representation to describe each cluster,
resulting in a cluster-level memory dictionary. In this way, the consistency of
clustering can be effectively maintained throughout the pipeline and the GPU
memory consumption can be significantly reduced. Thus, our method can solve the
problem of cluster inconsistency and be applicable to larger data sets. In
addition, we adopt different clustering algorithms to demonstrate the
robustness and generalization of our framework. The application of Cluster
Contrast to a standard unsupervised re-ID pipeline achieves considerable
improvements of 9.9%, 8.3%, and 12.1% compared to state-of-the-art purely
unsupervised re-ID methods and 5.5%, 4.8%, and 4.4% mAP compared to the
state-of-the-art unsupervised domain adaptation re-ID methods on the Market,
Duke, and MSMT17 datasets, respectively. Code is available at
https://github.com/alibaba/cluster-contrast
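The cluster-level memory dictionary and contrast loss described in the Cluster Contrast abstract can be sketched roughly as follows. This is an illustrative approximation and not the code from the linked repository: it assumes each cluster is represented by the normalized mean of its features, the loss is an InfoNCE-style softmax over cluster representations, and the memory is refreshed with a momentum update. The function names, the temperature of 0.05, and the momentum of 0.1 are hypothetical choices.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length (zero vectors are left unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]


def init_cluster_memory(features, pseudo_labels):
    """Build one representation per cluster: the normalized mean feature."""
    sums, counts = {}, {}
    for feat, label in zip(features, pseudo_labels):
        acc = sums.setdefault(label, [0.0] * len(feat))
        for i, x in enumerate(feat):
            acc[i] += x
        counts[label] = counts.get(label, 0) + 1
    return {
        label: l2_normalize([x / counts[label] for x in acc])
        for label, acc in sums.items()
    }


def cluster_contrast_loss(query, label, memory, temperature=0.05):
    """InfoNCE-style loss of a query against all cluster representations."""
    q = l2_normalize(query)
    logits = {
        k: sum(a * b for a, b in zip(q, c)) / temperature
        for k, c in memory.items()
    }
    peak = max(logits.values())
    log_denominator = peak + math.log(
        sum(math.exp(v - peak) for v in logits.values())
    )
    return log_denominator - logits[label]


def momentum_update(memory, query, label, momentum=0.1):
    """Blend the query into its cluster's entry, then re-normalize (assumed rule)."""
    q = l2_normalize(query)
    mixed = [momentum * c + (1.0 - momentum) * x for c, x in zip(memory[label], q)]
    memory[label] = l2_normalize(mixed)


features = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]  # toy features
pseudo_labels = [0, 0, 1, 1]  # toy clustering result
memory = init_cluster_memory(features, pseudo_labels)
loss_correct = cluster_contrast_loss([1.0, 0.0], 0, memory)
loss_wrong = cluster_contrast_loss([1.0, 0.0], 1, memory)
momentum_update(memory, [1.0, 0.0], 0)
```

Because the memory holds one vector per cluster rather than one per instance, its size grows with the number of clusters, and every cluster is updated in the same way regardless of how many instances it contains.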