A Novel Normalized-Cut Solver with Nearest Neighbor Hierarchical Initialization
Normalized-Cut (N-Cut) is a well-known spectral clustering model. Traditional
N-Cut solvers are two-stage: 1) compute the continuous spectral embedding of
the normalized Laplacian matrix; 2) discretize it via $K$-means or spectral
rotation. However, this paradigm brings two vital problems: 1) two-stage
methods solve a relaxed version of the original problem, so they cannot obtain
good solutions to the original N-Cut problem; 2) solving the relaxed problem
requires eigenvalue decomposition, which has $\mathcal{O}(n^3)$ time
complexity ($n$ is the number of nodes). To address these problems, we propose
a novel N-Cut solver based on the well-known coordinate descent method. Since
the vanilla coordinate descent method also has $\mathcal{O}(n^3)$ time
complexity, we design various accelerating strategies to reduce the time
complexity to $\mathcal{O}(|E|)$ ($|E|$ is the number of edges). To avoid
reliance on random initialization, which brings uncertainty to clustering, we
propose an efficient initialization method that gives deterministic outputs.
Extensive experiments on several benchmark datasets demonstrate that the
proposed solver obtains larger objective values of N-Cut while achieving
better clustering performance than traditional solvers.
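The two-stage paradigm this abstract critiques is easy to sketch. The snippet below is illustrative only (the function name and the plain Lloyd's k-means are my own choices, not the paper's method): stage 1 eigendecomposes the symmetric normalized Laplacian, which is the cubic-cost step, and stage 2 discretizes the relaxed embedding with k-means.

```python
import numpy as np

def two_stage_ncut(W, k, n_init=10, seed=0):
    """Sketch of a traditional two-stage N-Cut solver (illustrative,
    not the paper's method): (1) continuous spectral embedding from
    the normalized Laplacian; (2) discretization via k-means."""
    d = W.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(np.maximum(d, 1e-12))
    # Symmetric normalized Laplacian: L = I - D^{-1/2} W D^{-1/2}
    L = np.eye(len(W)) - d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]
    # Stage 1: eigendecomposition -- the cubic-time step the paper avoids.
    vals, vecs = np.linalg.eigh(L)        # eigenvalues in ascending order
    U = vecs[:, :k]                       # k smallest eigenvectors
    U = U / np.linalg.norm(U, axis=1, keepdims=True)  # row-normalize
    # Stage 2: discretize the relaxed solution with Lloyd's k-means.
    rng = np.random.default_rng(seed)
    best_labels, best_cost = None, np.inf
    for _ in range(n_init):
        centers = U[rng.choice(len(U), k, replace=False)]
        for _ in range(50):
            dists = ((U[:, None, :] - centers[None]) ** 2).sum(-1)
            labels = dists.argmin(1)
            for j in range(k):
                if (labels == j).any():
                    centers[j] = U[labels == j].mean(0)
        cost = dists.min(1).sum()
        if cost < best_cost:
            best_cost, best_labels = cost, labels
    return best_labels
```

Because the relaxed embedding is only an approximation of the discrete indicator matrix, the k-means step can land far from the true N-Cut optimum, which is exactly the gap the proposed coordinate descent solver targets.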
Deep Intra-Image Contrastive Learning for Weakly Supervised One-Step Person Search
Weakly supervised person search aims to perform joint pedestrian detection
and re-identification (re-id) with only person bounding-box annotations.
Recently, the idea of contrastive learning has been applied to weakly
supervised person search, where two common contrast strategies are memory-based
contrast and intra-image contrast. We argue that current intra-image contrast
is shallow, which suffers from spatial-level and occlusion-level variance. In
this paper, we present a novel deep intra-image contrastive learning using a
Siamese network. Its two key modules are spatial-invariant contrast (SIC) and
occlusion-invariant contrast (OIC). SIC performs many-to-one contrasts between
the two branches of the Siamese network and dense prediction contrasts within
one branch. With these many-to-one and dense contrasts, SIC tends to learn
discriminative scale-invariant and location-invariant features to solve
spatial-level variance. OIC enhances feature consistency with the masking
strategy to learn occlusion-invariant features. Extensive experiments are
performed on two person search datasets, CUHK-SYSU and PRW. Our
method achieves state-of-the-art performance among weakly supervised one-step
person search approaches. We hope that our simple intra-image contrastive
learning can provide more paradigms for weakly supervised person search. The
source code is available at \url{https://github.com/jiabeiwangTJU/DICL}.
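As a rough illustration of the many-to-one contrast idea behind SIC, the snippet below implements a generic InfoNCE-style loss in which several query features (e.g. dense predictions for one person) share a single positive key from the other branch. This is a textbook contrastive loss under my own naming, not the actual DICL implementation.

```python
import numpy as np

def many_to_one_infonce(queries, key, negatives, tau=0.1):
    """Generic many-to-one InfoNCE sketch (illustrative, not the
    paper's code): every query is pulled toward one shared positive
    key and pushed away from a bank of negatives."""
    def norm(x):
        return x / np.linalg.norm(x, axis=-1, keepdims=True)
    q, k, neg = norm(queries), norm(key), norm(negatives)
    pos = q @ k / tau                 # (m,) positive logits
    negs = q @ neg.T / tau            # (m, n_neg) negative logits
    logits = np.concatenate([pos[:, None], negs], axis=1)
    # Cross-entropy with the positive always at index 0.
    logits -= logits.max(axis=1, keepdims=True)
    log_prob = logits[:, 0] - np.log(np.exp(logits).sum(axis=1))
    return -log_prob.mean()
```

The loss decreases as all queries align with the shared key, which is the mechanism by which a many-to-one contrast can encourage scale- and location-invariant features.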
Variational Clustering: Leveraging Variational Autoencoders for Image Clustering
Recent advances in deep learning have shown their ability to learn strong
feature representations for images. The task of image clustering naturally
requires good feature representations to capture the distribution of the data
and subsequently differentiate data points from one another. Often these two
aspects are dealt with independently and thus traditional feature learning
alone does not suffice in partitioning the data meaningfully. Variational
Autoencoders (VAEs) naturally lend themselves to learning data distributions in
a latent space. Since we wish to efficiently discriminate between different
clusters in the data, we propose a method based on VAEs where we use a Gaussian
Mixture prior to help cluster the images accurately. We jointly learn the
parameters of both the prior and the posterior distributions. Our method
represents a true Gaussian Mixture VAE. This way, our method simultaneously
learns a prior that captures the latent distribution of the images and a
posterior to help discriminate well between data points. We also propose a
novel reparametrization of the latent space consisting of a mixture of discrete
and continuous variables. One key takeaway is that our method generalizes
better across different datasets without using any pre-training or learnt
models, unlike existing methods, allowing it to be trained from scratch in an
end-to-end manner. We verify the efficacy and generalizability of our method
experimentally by achieving state-of-the-art results among unsupervised methods
on a variety of datasets. To the best of our knowledge, we are the first to
pursue image clustering using VAEs in a purely unsupervised manner on real
image datasets.
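The proposed reparametrization mixes discrete and continuous latent variables; a common way to keep the discrete part differentiable is a Gumbel-softmax relaxation. The sketch below shows one plausible (assumed) form of such a mixed sample in plain numpy — the paper's exact parametrization may differ, and all names here are illustrative.

```python
import numpy as np

def sample_mixture_latent(logits, mus, log_vars, tau=0.5, rng=None):
    """Assumed sketch of a mixed discrete/continuous latent sample:
    a relaxed one-hot cluster variable c (Gumbel-softmax) softly
    selects the Gaussian mixture component used for the continuous
    reparametrized sample z = mu_c + sigma_c * eps."""
    rng = rng or np.random.default_rng()
    # Relaxed cluster assignment: c ~ GumbelSoftmax(logits, tau)
    g = -np.log(-np.log(rng.uniform(size=logits.shape)))
    y = (logits + g) / tau
    y -= y.max()                      # numerical stability
    c = np.exp(y) / np.exp(y).sum()   # soft one-hot, sums to 1
    # Continuous part, with mixture parameters softly selected by c.
    mu = c @ mus
    sigma = np.exp(0.5 * (c @ log_vars))
    eps = rng.normal(size=mu.shape)
    return c, mu + sigma * eps
```

Because both the Gumbel and Gaussian noise enter through deterministic transforms, gradients can flow through the cluster logits and the mixture parameters, which is what allows prior and posterior to be learned jointly.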
Unified and Dynamic Graph for Temporal Character Grouping in Long Videos
Video temporal character grouping locates appearing moments of major
characters within a video according to their identities. To this end, recent
works have evolved from unsupervised clustering to graph-based supervised
clustering. However, graph methods are built on the premise of a fixed affinity
graph, which introduces many inexact connections. Besides, they extract
multi-modal features with several separate models, which is unfriendly to
deployment. In this
paper, we present a unified and dynamic graph (UniDG) framework for temporal
character grouping. This is accomplished, firstly, by a unified representation
network that learns representations of multiple modalities within the same
space while still preserving each modality's uniqueness. Secondly, we present
dynamic graph clustering, in which a varying number of neighbors is constructed
for each node via a cyclic matching strategy, leading to a more reliable
affinity graph. Thirdly, a progressive
association method is introduced to exploit spatial and temporal contexts among
different modalities, allowing multi-modal clustering results to be well fused.
As current datasets only provide pre-extracted features, we evaluate our UniDG
method on a collected dataset named MTCG, which contains face and body clips
and speaking voice tracks for each character. We also evaluate our key
components on existing clustering and retrieval datasets to verify their
generalization ability. Experimental results show that our method achieves
promising results and outperforms several state-of-the-art approaches.
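One way to read the "dynamic neighbors" idea is that an edge survives only when it is confirmed from both endpoints, so each node naturally ends up with a different neighbor count. The sketch below uses reciprocal top-k matching as a simplified stand-in for the cyclic matching strategy, whose exact form is not specified in this abstract; all names are my own.

```python
import numpy as np

def reciprocal_knn_graph(feats, k=3):
    """Simplified stand-in for dynamic neighbor construction: keep
    edge (i, j) only when i and j each appear in the other's top-k
    list by cosine similarity, so nodes get varying neighbor counts."""
    f = feats / np.linalg.norm(feats, axis=1, keepdims=True)
    sim = f @ f.T
    np.fill_diagonal(sim, -np.inf)        # exclude self-matches
    topk = np.argsort(-sim, axis=1)[:, :k]
    neighbors = {i: set(row) for i, row in enumerate(topk)}
    edges = set()
    for i in range(len(feats)):
        for j in neighbors[i]:
            if i in neighbors[j]:         # mutual top-k => keep edge
                edges.add((min(i, int(j)), max(i, int(j))))
    return sorted(edges)
```

Compared with a fixed k-nearest-neighbor graph, the mutual check prunes one-sided matches, which is the kind of "inexact connection" the abstract attributes to fixed affinity graphs.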