Unsupervised Object Discovery and Localization in the Wild: Part-based Matching with Bottom-up Region Proposals
This paper addresses unsupervised discovery and localization of dominant
objects from a noisy image collection with multiple object classes. The setting
of this problem is fully unsupervised, without even image-level annotations or
any assumption of a single dominant class. This is far more general than
typical colocalization, cosegmentation, or weakly-supervised localization
tasks. We tackle the discovery and localization problem using a part-based
region matching approach: We use off-the-shelf region proposals to form a set
of candidate bounding boxes for objects and object parts. These regions are
efficiently matched across images using a probabilistic Hough transform that
evaluates the confidence for each candidate correspondence considering both
appearance and spatial consistency. Dominant objects are discovered and
localized by comparing the scores of candidate regions and selecting those that
stand out over other regions containing them. Extensive experimental
evaluations on standard benchmarks demonstrate that the proposed approach
significantly outperforms the current state of the art in colocalization, and
achieves robust object discovery in challenging mixed-class datasets. Comment: CVPR 201
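The matching and selection steps above can be illustrated with a minimal sketch. It assumes each region is a `(box, feature)` pair with boxes as `(x0, y0, x1, y1)`, uses clamped cosine similarity as the appearance term, and uses a Gaussian-smoothed vote over 2-D translations as the spatial term. All of these are hypothetical simplifications, not the paper's exact probabilistic Hough formulation. The "stand out" step is approximated as a region's score minus the best score of any region containing it.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length feature vectors (0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def center(box):
    """Center (x, y) of a box given as (x0, y0, x1, y1)."""
    return ((box[0] + box[2]) / 2, (box[1] + box[3]) / 2)


def hough_match_confidence(regions_a, regions_b, sigma=10.0):
    """Confidence for every candidate match (i, j) between two images' regions.

    A match's confidence is its appearance similarity times the Hough support
    for its translation: every candidate match votes for its own offset,
    weighted by appearance and smoothed by a Gaussian of width sigma.
    """
    matches = []
    for i, (box_a, feat_a) in enumerate(regions_a):
        for j, (box_b, feat_b) in enumerate(regions_b):
            appearance = max(0.0, cosine(feat_a, feat_b))  # hypothetical choice
            (xa, ya), (xb, yb) = center(box_a), center(box_b)
            matches.append((i, j, appearance, (xb - xa, yb - ya)))

    def support(offset):
        # Appearance-weighted, kernel-smoothed vote count at this offset.
        total = 0.0
        for _, _, appearance, (ox, oy) in matches:
            d2 = (offset[0] - ox) ** 2 + (offset[1] - oy) ** 2
            total += appearance * math.exp(-d2 / (2 * sigma ** 2))
        return total

    return {(i, j): app * support(off) for i, j, app, off in matches}


def contains(outer, inner):
    """True if box outer fully contains box inner."""
    return (outer[0] <= inner[0] and outer[1] <= inner[1]
            and outer[2] >= inner[2] and outer[3] >= inner[3])


def standout_scores(boxes, scores):
    """Each box's score minus the best score of any other box containing it.

    Boxes with no container are compared against 0 (hypothetical choice).
    """
    result = []
    for k, (box, score) in enumerate(zip(boxes, scores)):
        containers = [scores[m] for m, other in enumerate(boxes)
                      if m != k and contains(other, box)]
        result.append(score - max(containers, default=0.0))
    return result
```

Matches that agree on a common translation reinforce each other, so a visually similar pair with an inconsistent offset scores lower than one supported by its neighbours.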
Semi-supervised Semantic Segmentation with Error Localization Network
This paper studies semi-supervised learning of semantic segmentation, which
assumes that only a small portion of training images are labeled and the others
remain unlabeled. The unlabeled images are usually assigned pseudo labels for
training, which, however, risks performance degradation due to confirmation
bias toward errors in the pseudo labels.
We present a novel method that resolves this chronic issue of pseudo labeling.
At the heart of our method lies error localization network (ELN), an auxiliary
module that takes an image and its segmentation prediction as input and
identifies pixels whose pseudo labels are likely to be wrong. ELN enables
semi-supervised learning to be robust against inaccurate pseudo labels by
disregarding label noise during training, and it can be naturally integrated with
self-training and contrastive learning. Moreover, we introduce a new learning
strategy for ELN that simulates plausible and diverse segmentation errors
during training of ELN to enhance its generalization. Our method is evaluated
on PASCAL VOC 2012 and Cityscapes, where it outperforms all existing methods in
every evaluation setting.
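The idea of disregarding likely-wrong pseudo labels can be sketched as a masked cross-entropy. This assumes the ELN-like module outputs, per pixel, a probability that the pseudo label is wrong, and that pixels above a hypothetical `threshold` are simply excluded; the paper's actual use of the ELN output may differ. Pixels are flattened into Python lists for clarity.

```python
import math


def masked_pseudo_label_loss(probs, pseudo_labels, error_probs, threshold=0.5):
    """Mean cross-entropy over pixels not flagged as likely mislabeled.

    probs: per-pixel class-probability lists from the segmentation model.
    pseudo_labels: per-pixel pseudo-label class indices.
    error_probs: per-pixel probability (from an ELN-like module) that the
        pseudo label is wrong; pixels with error_probs >= threshold are ignored.
    """
    total, kept = 0.0, 0
    for p, y, e in zip(probs, pseudo_labels, error_probs):
        if e >= threshold:
            continue  # likely-wrong pseudo label: excluded from the loss
        total -= math.log(max(p[y], 1e-12))  # clamp to avoid log(0)
        kept += 1
    return total / kept if kept else 0.0
```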
Unsupervised Object Discovery and Tracking in Video Collections
This paper addresses the problem of automatically localizing dominant objects
as spatio-temporal tubes in a noisy collection of videos with minimal or even
no supervision. We formulate the problem as a combination of two complementary
processes: discovery and tracking. The first one establishes correspondences
between prominent regions across videos, and the second one associates
successive similar object regions within the same video. Interestingly, our
algorithm also discovers the implicit topology of frames associated with
instances of the same object class across different videos, a role normally
left to supervisory information in the form of class labels in conventional
image and video understanding methods. Indeed, as demonstrated by our
experiments, our method can handle video collections featuring multiple object
classes, and substantially outperforms the state of the art in colocalization,
even though it tackles a broader problem with much less supervision.
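The tracking process, which associates successive similar object regions within one video, can be illustrated by greedy frame-to-frame linking. This sketch uses only box overlap (IoU) with a hypothetical `min_iou` cutoff; the described method also relies on region similarity and on the discovery step, so treat this as an illustration of tube construction rather than the paper's tracker.

```python
def iou(a, b):
    """Intersection-over-union of two boxes given as (x0, y0, x1, y1)."""
    def area(r):
        return max(0.0, r[2] - r[0]) * max(0.0, r[3] - r[1])
    inter = area((max(a[0], b[0]), max(a[1], b[1]),
                  min(a[2], b[2]), min(a[3], b[3])))
    union = area(a) + area(b) - inter
    return inter / union if union > 0 else 0.0


def link_tube(frames, start, min_iou=0.3):
    """Greedily grow a spatio-temporal tube from box frames[0][start].

    frames: one list of candidate boxes per frame. At each frame, the box
    overlapping most with the tube's last box is appended; linking stops
    when no candidate reaches min_iou or a frame has no candidates.
    """
    tube = [frames[0][start]]
    for candidates in frames[1:]:
        best = max(candidates, key=lambda b: iou(tube[-1], b), default=None)
        if best is None or iou(tube[-1], best) < min_iou:
            break
        tube.append(best)
    return tube
```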
HIER: Metric Learning Beyond Class Labels via Hierarchical Regularization
Supervision for metric learning has long been given in the form of
equivalence between human-labeled classes. Although this type of supervision
has been a basis of metric learning for decades, we argue that it hinders
further advances in the field. In this regard, we propose a new regularization
method, dubbed HIER, to discover the latent semantic hierarchy of training
data, and to deploy the hierarchy to provide richer and more fine-grained
supervision than inter-class separability induced by common metric learning
losses. HIER achieves this goal without any annotation of the semantic hierarchy,
instead learning hierarchical proxies in hyperbolic space. The hierarchical
proxies are learnable parameters, and each of them is trained to serve as an
ancestor of a group of data or other proxies to approximate the semantic
hierarchy among them. HIER embeds the proxies together with the data in hyperbolic
space because the geometry of that space is well suited to representing
hierarchical structure. The efficacy of HIER was evaluated on four
standard benchmarks, where it consistently improved the performance of conventional
methods when integrated with them and consequently achieved the best results,
surpassing even the existing hyperbolic metric learning technique, in almost
all settings.
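Why hyperbolic space suits hierarchies can be seen from its distance function. The sketch below implements the standard geodesic distance in the Poincaré ball with curvature -1; which hyperbolic model and curvature HIER actually uses is an assumption here. Distances grow rapidly toward the boundary, which leaves room for many well-separated "leaf" points under a shared ancestor near the origin.

```python
import math


def poincare_distance(x, y, eps=1e-9):
    """Geodesic distance between points x, y inside the unit Poincare ball.

    d(x, y) = arcosh(1 + 2 * |x - y|^2 / ((1 - |x|^2) * (1 - |y|^2)))
    eps guards against division by zero for points on the boundary.
    """
    def sq_norm(v):
        return sum(c * c for c in v)

    diff = sq_norm([a - b for a, b in zip(x, y)])
    denom = max((1 - sq_norm(x)) * (1 - sq_norm(y)), eps)
    return math.acosh(1 + 2 * diff / denom)
```

For example, the same Euclidean gap of 0.1 is far longer near the boundary (`[0.85, 0]` to `[0.95, 0]`) than near the origin (`[0, 0]` to `[0.1, 0]`).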
Improving Cross-Modal Retrieval with Set of Diverse Embeddings
Cross-modal retrieval across image and text modalities is a challenging task
due to its inherent ambiguity: An image often exhibits various situations, and
a caption can be coupled with diverse images. Set-based embedding has been
studied as a solution to this problem. It seeks to encode a sample into a set
of different embedding vectors that capture different semantics of the sample.
In this paper, we present a novel set-based embedding method, which is distinct
from previous work in two aspects. First, we present a new similarity function
called smooth-Chamfer similarity, which is designed to alleviate the side
effects of existing similarity functions for set-based embedding. Second, we
propose a novel set prediction module to produce a set of embedding vectors
that effectively captures diverse semantics of the input via the slot attention
mechanism. Our method is evaluated on the COCO and Flickr30K datasets across
different visual backbones, where it outperforms existing methods including
ones that demand substantially more computation at inference. Comment: Accepted to CVPR 2023 (Highlight)
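A smooth Chamfer-style set similarity can be sketched as follows. This assumes the form in which the max over the other set in the (symmetric, averaged) Chamfer similarity is replaced by a LogSumExp scaled by a temperature `alpha`, with cosine similarity between elements. The exact definition and the value of `alpha` should be taken from the paper; the defaults here are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def logsumexp(values):
    """Numerically stable log(sum(exp(v) for v in values))."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def smooth_chamfer(set1, set2, alpha=16.0):
    """Smooth-Chamfer similarity between two sets of embeddings (assumed form).

    Each directed term replaces Chamfer's max over the other set with
    LogSumExp / alpha; larger alpha approaches the hard max.
    """
    sims = [[cosine(x, y) for y in set2] for x in set1]
    forward = sum(logsumexp([alpha * s for s in row]) for row in sims)
    backward = sum(logsumexp([alpha * s for s in col]) for col in zip(*sims))
    return forward / (2 * alpha * len(set1)) + backward / (2 * alpha * len(set2))
```

Unlike the hard max, every element pair contributes a gradient, which is one plausible reading of how smoothing alleviates the side effects of max-based set similarities.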
Cross-Domain Ensemble Distillation for Domain Generalization
Domain generalization is the task of learning models that generalize to
unseen target domains. We propose a simple yet effective method for domain
generalization, named cross-domain ensemble distillation (XDED), that learns
domain-invariant features while encouraging the model to converge to flat
minima, which recently turned out to be a sufficient condition for domain
generalization. To this end, our method generates an ensemble of the output
logits from training data with the same label but from different domains and
then penalizes each output for the mismatch with the ensemble. Also, we present
a de-stylization technique that standardizes features to encourage the model to
produce style-consistent predictions even in an arbitrary target domain. Our
method greatly improves generalization capability in public benchmarks for
cross-domain image classification, cross-dataset person re-ID, and
cross-dataset semantic segmentation. Moreover, we show that models learned by
our method are robust against adversarial attacks and image corruptions. Comment: Accepted to ECCV 2022. Code is available at
http://github.com/leekyungmoon/XDE
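The two ingredients described above, ensemble distillation across same-label samples and feature de-stylization, can be sketched in plain Python. The sketch assumes the mini-batch already mixes domains, forms the ensemble by averaging the logits of all samples sharing a label, and penalizes each sample with KL(teacher || student) on temperature-softened outputs. The temperature value and KL direction are assumptions. De-stylization is approximated as per-sample, per-channel standardization (instance normalization without affine parameters).

```python
import math
from collections import defaultdict


def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kl_div(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions."""
    return sum(pi * math.log(pi / max(qi, eps)) for pi, qi in zip(p, q) if pi > 0)


def xded_loss(logits, labels, temperature=4.0):
    """Ensemble-distillation term for one mixed-domain mini-batch (sketch).

    Logits of samples sharing a label are averaged into an ensemble teacher;
    each sample's softened prediction is penalized for deviating from it.
    In a real framework the teacher would be detached from the gradient.
    """
    groups = defaultdict(list)
    for z, y in zip(logits, labels):
        groups[y].append(z)
    total = 0.0
    for z, y in zip(logits, labels):
        members = groups[y]
        mean_logits = [sum(col) / len(members) for col in zip(*members)]
        teacher = softmax(mean_logits, temperature)
        student = softmax(z, temperature)
        total += kl_div(teacher, student)
    return total / len(logits)


def destylize(feature_map, eps=1e-5):
    """Standardize each channel of one sample's features to zero mean, unit variance.

    feature_map: list of channels, each a flat list of activations.
    """
    out = []
    for channel in feature_map:
        mu = sum(channel) / len(channel)
        var = sum((v - mu) ** 2 for v in channel) / len(channel)
        out.append([(v - mu) / math.sqrt(var + eps) for v in channel])
    return out
```

When all same-label samples already agree, the distillation term vanishes; disagreement across domains yields a positive penalty that pulls each prediction toward the shared ensemble.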
Universal Metric Learning with Parameter-Efficient Transfer Learning
A common practice in metric learning is to train and test an embedding model
for each dataset. This dataset-specific approach fails to simulate real-world
scenarios that involve multiple heterogeneous distributions of data. In this
regard, we introduce a novel metric learning paradigm, called Universal Metric
Learning (UML), which learns a unified distance metric capable of capturing
relations across multiple data distributions. UML presents new challenges, such
as imbalanced data distribution and bias towards dominant distributions. To
address these challenges, we propose Parameter-efficient Universal Metric
leArning (PUMA), which consists of a pre-trained frozen model and two
additional modules, a stochastic adapter and a prompt pool. These modules enable the model to
capture dataset-specific knowledge while avoiding bias towards dominant
distributions. Additionally, we compile a new universal metric learning
benchmark with a total of 8 different datasets. PUMA outperformed the
state-of-the-art dataset-specific models while using about 69 times fewer
trainable parameters.
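One common way a prompt pool supplies dataset-specific knowledge to a frozen model is key-query matching: each prompt has a learnable key, and the prompts whose keys best match a query feature from the frozen backbone are selected. Whether PUMA's prompt pool works exactly this way is an assumption; the sketch below only shows top-k selection by cosine similarity, with prompts represented as strings for readability instead of learnable token embeddings.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def select_prompts(query, pool, k=2):
    """Return the k prompts whose keys are most similar to the query.

    query: feature vector from the frozen backbone (hypothetical input).
    pool: list of (key_vector, prompt) pairs.
    """
    ranked = sorted(pool, key=lambda kp: cosine(query, kp[0]), reverse=True)
    return [prompt for _, prompt in ranked[:k]]
```

Because only the keys, prompts, and adapters would be trained while the backbone stays frozen, the number of trainable parameters stays small.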