72 research outputs found
Identity-Seeking Self-Supervised Representation Learning for Generalizable Person Re-identification
This paper aims to learn a domain-generalizable (DG) person re-identification
(ReID) representation from large-scale videos \textbf{without any annotation}.
Prior DG ReID methods employ limited labeled data for training due to the high
cost of annotation, which restricts further advances. To overcome the barriers
of data and annotation, we propose to utilize large-scale unsupervised data for
training. The key issue lies in how to mine identity information. To this end,
we propose an Identity-seeking Self-supervised Representation learning (ISR)
method. ISR constructs positive pairs from inter-frame images by modeling the
instance association as a maximum-weight bipartite matching problem. A
reliability-guided contrastive loss is further presented to suppress the
adverse impact of noisy positive pairs, ensuring that reliable positive pairs
dominate the learning process. The training cost of ISR scales approximately
linearly with the data size, making it feasible to utilize large-scale data for
training. The learned representation exhibits superior generalization ability.
\textbf{Without human annotation and fine-tuning, ISR achieves 87.0\% Rank-1 on
Market-1501 and 56.4\% Rank-1 on MSMT17}, outperforming the best supervised
domain-generalizable method by 5.0\% and 19.5\%, respectively. In the
pre-training and fine-tuning scenario, ISR achieves state-of-the-art
performance, with 88.4\% Rank-1 on MSMT17. The code is at
\url{https://github.com/dcp15/ISR_ICCV2023_Oral}.
Comment: ICCV 2023 Oral
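The core idea of the matching step can be illustrated in isolation. The sketch below is a hypothetical, brute-force stand-in (not the paper's implementation, which scales to large data with efficient solvers): given a similarity matrix between person detections in two frames, it finds the maximum-weight bipartite matching, and the matched pairs serve as positives for contrastive learning.

```python
from itertools import permutations

def max_weight_matching(sim):
    """Brute-force maximum-weight bipartite matching for a square
    similarity matrix (rows: detections in frame t, cols: frame t+1).
    Only suitable for toy sizes; real systems use the Hungarian algorithm."""
    n = len(sim)
    best_perm, best_score = None, float("-inf")
    for perm in permutations(range(n)):
        score = sum(sim[i][perm[i]] for i in range(n))
        if score > best_score:
            best_perm, best_score = perm, score
    return list(best_perm), best_score

# Toy cosine-similarity matrix between 3 detections in two frames.
sim = [
    [0.9, 0.1, 0.2],
    [0.2, 0.8, 0.1],
    [0.1, 0.3, 0.7],
]
match, score = max_weight_matching(sim)
# match[i] = j pairs detection i in frame t with detection j in frame t+1;
# these cross-frame pairs are the positives fed to the contrastive loss.
```

The reliability-guided weighting described in the abstract would then down-weight pairs whose match score is low, so noisy associations do not dominate training.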
Softmax Dissection: Towards Understanding Intra- and Inter-class Objective for Embedding Learning
The softmax loss and its variants are widely used as objectives for embedding
learning, especially in applications like face recognition. However, the intra-
and inter-class objectives in the softmax loss are entangled, therefore a
well-optimized inter-class objective leads to relaxation on the intra-class
objective, and vice versa. In this paper, we propose to dissect the softmax
loss into independent intra- and inter-class objectives (D-Softmax). With
D-Softmax as the objective, we gain a clear understanding of both the intra-
and inter-class objectives, so it is straightforward to tune each part to
its best state. Furthermore, we find that the computation of the inter-class
objective is redundant and propose two sampling-based variants of D-Softmax to
reduce the computation cost. When training with regular-scale data,
experiments on face verification show that D-Softmax compares favorably with
existing losses such as SphereFace and ArcFace. When training with
massive-scale data, experiments show that the fast variants of D-Softmax
significantly accelerate the training process (e.g., by 64x) with only a minor
sacrifice in performance, outperforming existing softmax acceleration methods
in both performance and efficiency.
Comment: Accepted to AAAI-2020, Oral presentation
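The entanglement the abstract describes can be seen numerically. In the standard softmax cross-entropy, the gradient on the target logit is p_y - 1, which shrinks as the negative (inter-class) logits fall: once the inter-class objective is well optimized, pressure on the intra-class objective nearly vanishes. The snippet below is a minimal illustration of this coupling, with hypothetical function names, not the D-Softmax formulation itself.

```python
import math

def softmax_ce(logits, y):
    """Standard softmax cross-entropy for one sample (log-sum-exp trick)."""
    m = max(logits)
    lse = m + math.log(sum(math.exp(s - m) for s in logits))
    return lse - logits[y]

def grad_wrt_target(logits, y):
    """d(CE)/d(s_y) = p_y - 1: the pull on the target logit."""
    exps = [math.exp(s) for s in logits]
    p_y = exps[y] / sum(exps)
    return p_y - 1.0

tight = [2.0, 1.5, 1.4]    # negatives still close to the target logit
loose = [2.0, -3.0, -3.0]  # negatives already well separated
# With well-separated negatives the gradient on s_y nearly vanishes,
# so the intra-class objective is relaxed -- the entanglement D-Softmax
# removes by optimizing the two objectives independently.
```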
How to Synthesize a Large-Scale and Trainable Micro-Expression Dataset?
This paper does not contain technical novelty but introduces our key
discoveries in a data generation protocol, a database and insights. We aim to
address the lack of large-scale datasets in micro-expression (MiE) recognition
due to the prohibitive cost of data collection, which renders large-scale
training less feasible. To this end, we develop a protocol to automatically
synthesize large-scale MiE training data that allows us to train improved
recognition models for real-world test data. Specifically, we discover three
types of Action Units (AUs) that can constitute trainable MiEs. These AUs come
from real-world MiEs, early frames of macro-expression videos, and the
relationship between AUs and expression categories defined by human expert
knowledge. With these AUs, our protocol then employs large numbers of face
images of various identities and an off-the-shelf face generator for MiE
synthesis, yielding the MiE-X dataset. MiE recognition models are trained or
pre-trained on MiE-X and evaluated on real-world test sets, where very
competitive accuracy is obtained. Experimental results not only validate the
effectiveness of the discovered AUs and MiE-X dataset but also reveal some
interesting properties of MiEs: they generalize across faces, are close to
early-stage macro-expressions, and can be manually defined.
Comment: European Conference on Computer Vision 2022
Generalizable Re-Identification from Videos with Cycle Association
In this paper, we are interested in learning a generalizable person
re-identification (re-ID) representation from unlabeled videos. Compared with
1) the popular unsupervised re-ID setting where the training and test sets are
typically under the same domain, and 2) the popular domain generalization (DG)
re-ID setting where the training samples are labeled, our novel scenario
combines their key challenges: the training samples are unlabeled, and
collected from various domains which do not align with the test domain. In other
words, we aim to learn a representation in an unsupervised manner and directly
use the learned representation for re-ID in novel domains. To fulfill this
goal, we make two main contributions: First, we propose Cycle Association
(CycAs), a scalable self-supervised learning method for re-ID with low training
complexity; and second, we construct a large-scale unlabeled re-ID dataset
named LMP-video, tailored for the proposed method. Specifically, CycAs learns
re-ID features by enforcing cycle consistency of instance association between
temporally successive video frame pairs, and the training cost is merely linear
to the data size, making large-scale training possible. On the other hand, the
LMP-video dataset is extremely large, containing 50 million unlabeled person
images cropped from over 10K YouTube videos, and is therefore sufficient to serve
as fertile soil for self-supervised learning. Trained on LMP-video, we show
that CycAs learns good generalization towards novel domains. The achieved
results sometimes even outperform supervised domain generalizable models.
Remarkably, CycAs achieves 82.2% Rank-1 on Market-1501 and 49.0% Rank-1 on
MSMT17 with zero human annotation, surpassing state-of-the-art supervised DG
re-ID methods. Moreover, we also demonstrate the superiority of CycAs under the
canonical unsupervised re-ID and the pretrain-and-finetune scenarios.
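The cycle-consistency idea can be sketched with a small stand-in (hypothetical names, argmax association instead of the paper's full training machinery): associate each instance forward from frame t to frame t+1 by maximum similarity, associate backward, and keep only the pairs whose round trip returns to the starting instance.

```python
def cycle_consistent_pairs(sim):
    """sim[i][j]: similarity between instance i in frame t and instance j
    in frame t+1. Associate forward by argmax, then backward; keep pairs
    whose round trip returns to the start (cycle consistency)."""
    n, m = len(sim), len(sim[0])
    fwd = [max(range(m), key=lambda j: sim[i][j]) for i in range(n)]
    bwd = [max(range(n), key=lambda i: sim[i][j]) for j in range(m)]
    return [(i, fwd[i]) for i in range(n) if bwd[fwd[i]] == i]

sim = [
    [0.9, 0.2, 0.1],
    [0.8, 0.4, 0.5],  # ambiguous instance: its best forward match is taken
    [0.1, 0.6, 0.2],
]
pairs = cycle_consistent_pairs(sim)
# Instance 1's forward match (column 0) maps back to instance 0, so its
# association fails the cycle check and is discarded as unreliable.
```

Because each frame pair is processed independently, the cost of this association step grows linearly with the number of frame pairs, matching the abstract's claim of training cost linear in data size.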
MetaBEV: Solving Sensor Failures for BEV Detection and Map Segmentation
Perception systems in modern autonomous driving vehicles typically take
inputs from complementary multi-modal sensors, e.g., LiDAR and cameras.
However, in real-world applications, sensor corruptions and failures lead to
inferior performances, thus compromising autonomous safety. In this paper, we
propose a robust framework, called MetaBEV, to address extreme real-world
environments involving six kinds of sensor corruption and two extreme
sensor-missing situations. In MetaBEV, signals from multiple sensors are first
processed by modal-specific encoders. Subsequently, a set of dense BEV queries
are initialized, termed meta-BEV. These queries are then processed iteratively
by a BEV-Evolving decoder, which selectively aggregates deep features from
either LiDAR, cameras, or both modalities. The updated BEV representations are
further leveraged for multiple 3D prediction tasks. Additionally, we introduce
a new M2oE structure to alleviate the performance drop on distinct tasks in
multi-task joint learning. Finally, MetaBEV is evaluated on the nuScenes
dataset with 3D object detection and BEV map segmentation tasks. Experiments
show MetaBEV outperforms prior arts by a large margin on both full and
corrupted modalities. For instance, when the LiDAR signal is missing, MetaBEV
improves detection NDS by 35.5% and segmentation mIoU by 17.7% over the vanilla
BEVFusion model; and when the camera signal is absent, MetaBEV still achieves
69.2% NDS and 53.7% mIoU, which is even higher than previous works that operate
on full modalities. Moreover, MetaBEV performs favorably against previous
methods in both canonical perception and multi-task learning settings,
refreshing the state of the art in nuScenes BEV map segmentation with 70.4% mIoU.
Comment: Project page: https://chongjiange.github.io/metabev.html
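The graceful-degradation behavior can be sketched abstractly. The toy function below (hypothetical names and scalar "features"; not the actual BEV-Evolving decoder, which uses cross-attention) shows the key property: BEV queries aggregate only from the modalities actually present, so a missing sensor changes the aggregation rather than breaking it.

```python
def evolve_bev_queries(queries, modal_feats):
    """Toy stand-in for one BEV-Evolving step: each query mixes in features
    only from the modalities that are available (value is not None)."""
    available = {k: v for k, v in modal_feats.items() if v is not None}
    if not available:
        return queries  # fall back to the learned meta-BEV queries alone
    out = []
    for q_idx, q in enumerate(queries):
        agg = sum(feats[q_idx] for feats in available.values())
        out.append(0.5 * q + 0.5 * agg / len(available))
    return out

queries = [1.0, 2.0]                           # two scalar "BEV queries"
feats = {"lidar": None, "camera": [3.0, 5.0]}  # LiDAR sensor has dropped out
evolved = evolve_bev_queries(queries, feats)   # camera features still used
```

The same call works with both sensors present, one absent, or both absent, which is the sensor-failure robustness the abstract evaluates.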
Extracellular Vesicle-Mediated Communication Within Host-Parasite Interactions
Extracellular vesicles (EVs) are small membrane-surrounded structures released by different kinds of cells (normal, diseased, and transformed cells) in vivo and in vitro. In an evolutionarily conserved manner, they carry large amounts of important substances such as lipids, proteins, metabolites, DNA, RNA, and non-coding RNA (ncRNA), including miRNA, lncRNA, tRNA, rRNA, snoRNA, and scaRNA. EVs, including exosomes, play a role in the transmission of information and substances between cells that is increasingly being recognized as important. In some infectious diseases, such as parasitic diseases, EVs have emerged as a ubiquitous mechanism for mediating communication during host-parasite interactions. EVs enable multiple modes of transferring virulence factors and effector molecules from parasites to hosts, thereby regulating host gene expression and immune responses and, consequently, mediating the pathogenic process, which has made us rethink our understanding of the host-parasite interface. Here, we review the present findings regarding EVs (especially exosomes) and their role in host-parasite interactions. We hope that a better understanding of the mechanisms of parasite-derived EVs may provide new insights for the development of diagnostic biomarkers, vaccines, and therapeutics
A Sir2-Like Protein Participates in Mycobacterial NHEJ
In eukaryotic cells, repair of DNA double-strand breaks (DSBs) by the nonhomologous end-joining (NHEJ) pathway is critical for genome stability. In contrast to the complex eukaryotic repair system, the bacterial NHEJ apparatus consists of only two proteins, Ku and a multifunctional DNA ligase (LigD), whose functional mechanism has not been fully clarified. We show here for the first time that Sir2 is involved in the mycobacterial NHEJ repair pathway. Using tandem affinity purification (TAP) screening, we identified an NAD-dependent deacetylase in mycobacteria that is a homologue of the eukaryotic Sir2 protein and interacts directly with Ku. Results from an in vitro glutathione S-transferase (GST) pull-down assay suggest that Sir2 also interacts directly with LigD. Plasmid-based end-joining assays revealed that the efficiency of DSB repair in a sir2 deletion mutant was reduced 2-fold. Moreover, the Δsir2 strain was about 10-fold more sensitive to ionizing radiation (IR) in the stationary phase than the wild type. Our results suggest that Sir2 may function closely together with Ku and LigD in the nonhomologous end-joining pathway in mycobacteria
- …