740 research outputs found
Adversarial Attacks on Video Object Segmentation with Hard Region Discovery
Video object segmentation has been applied to various computer vision tasks,
such as video editing, autonomous driving, and human-robot interaction.
However, methods based on deep neural networks are vulnerable to
adversarial examples, i.e., inputs altered by almost human-imperceptible
perturbations with which an adversary (i.e., attacker) fools the
segmentation model into making incorrect pixel-level predictions. This
raises security issues in highly demanding tasks, because small perturbations
to the input video create potential attack risks. Though adversarial
examples have been extensively studied for classification, they are rarely
studied in video object segmentation. Existing related methods in computer vision either
require prior knowledge of categories or cannot be directly applied due to the
special design for certain tasks, failing to consider the pixel-wise region
attack. Hence, this work develops an object-agnostic adversary that attacks
video object segmentation (VOS) models by perturbing the first frame, guided
by hard region discovery. In particular, the gradients from the segmentation
model are exploited to discover the easily confused region, in which object
pixels are difficult to distinguish from the background in a frame. This provides a hardness map
that helps to generate perturbations with a stronger adversarial power for
attacking the first frame. Empirical studies on three benchmarks indicate that
our attacker significantly degrades the performance of several state-of-the-art
video object segmentation models.
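The idea of weighting a first-frame perturbation by a gradient-derived hardness map can be illustrated with a minimal sketch. This is not the paper's algorithm: it assumes, purely for illustration, that the hardness map is the normalized absolute gradient and that the perturbation is a sign-gradient step scaled by that map. The function names `hardness_map` and `perturb_first_frame` and the parameter `eps` are hypothetical.

```python
def sign(x):
    """Return -1, 0, or 1 depending on the sign of x."""
    return (x > 0) - (x < 0)


def hardness_map(grad):
    """Assumed hardness proxy: absolute gradient, normalized to [0, 1]."""
    peak = max(abs(g) for row in grad for g in row)
    if peak == 0:
        return [[0.0 for _ in row] for row in grad]
    return [[abs(g) / peak for g in row] for row in grad]


def perturb_first_frame(frame, grad, eps):
    """Sign-gradient step on each pixel, scaled by the hardness map.

    frame: 2D list of pixel intensities in [0, 1]
    grad:  2D list with the loss gradient w.r.t. each pixel (same shape)
    eps:   maximum perturbation magnitude per pixel
    """
    hard = hardness_map(grad)
    perturbed = []
    for f_row, g_row, h_row in zip(frame, grad, hard):
        perturbed.append([
            min(1.0, max(0.0, f + eps * h * sign(g)))  # keep pixels in [0, 1]
            for f, g, h in zip(f_row, g_row, h_row)
        ])
    return perturbed
```

In this sketch, pixels with larger gradient magnitude receive a larger share of the perturbation budget, while pixels with zero gradient are left unchanged.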
Which Model to Transfer? A Survey on Transferability Estimation
Transfer learning methods endeavor to leverage relevant knowledge from
existing source pre-trained models or datasets to solve downstream target
tasks. With the increase in the scale and quantity of available pre-trained
models nowadays, it becomes critical to assess in advance whether they are
suitable for a specific target task. Model transferability estimation is an
emerging and growing area of interest, aiming to propose a metric to quantify
this suitability without training them individually, which is computationally
prohibitive. Despite extensive recent advances in this area, existing
works use their own terminology and experimental settings. In this
survey, we present the first review of existing advances in this area and
categorize them into two separate realms: source-free model transferability
estimation and source-dependent model transferability estimation. Each category
is systematically defined, accompanied by a comprehensive taxonomy. Besides, we
address challenges and outline future research directions, intending to provide
a comprehensive guide to aid researchers and practitioners.
Exploring Model Transferability through the Lens of Potential Energy
Transfer learning has become crucial in computer vision tasks due to the vast
availability of pre-trained deep learning models. However, selecting the
optimal pre-trained model from a diverse pool for a specific downstream task
remains a challenge. Existing methods for measuring the transferability of
pre-trained models rely on statistical correlations between encoded static
features and task labels, but they overlook the impact of underlying
representation dynamics during fine-tuning, leading to unreliable results,
especially for self-supervised models. In this paper, we present an insightful
physics-inspired approach named PED to address these challenges. We reframe the
challenge of model selection through the lens of potential energy and directly
model the interaction forces that influence fine-tuning dynamics. By capturing
the motion of dynamic representations to reduce the potential energy within a
force-driven physical model, we can acquire an enhanced and more stable
observation for estimating transferability. The experimental results on 10
downstream tasks and 12 self-supervised models demonstrate that our approach
can seamlessly integrate into existing ranking techniques and enhance their
performance, revealing its effectiveness for the model selection task and its
potential for understanding the mechanism in transfer learning. Code will be
available at https://github.com/lixiaotong97/PED.
Comment: Accepted by ICCV 202
Building a Winning Team: Selecting Source Model Ensembles using a Submodular Transferability Estimation Approach
Estimating the transferability of publicly available pretrained models to a
target task has assumed an important place for transfer learning tasks in
recent years. Existing efforts propose metrics that allow a user to choose one
model from a pool of pre-trained models without having to fine-tune each model
individually and identify one explicitly. With the growth in the number of
available pre-trained models and the popularity of model ensembles, it also
becomes essential to study the transferability of multiple-source models for a
given target task. The few existing efforts study transferability in such
multi-source ensemble settings using just the outputs of the classification
layer and neglect possible domain or task mismatch. Moreover, they overlook the
most important factor while selecting the source models, viz., the cohesiveness
factor between them, which can impact the performance and confidence in the
prediction of the ensemble. To address these gaps, we propose a novel Optimal
tranSport-based suBmOdular tRaNsferability metric (OSBORN) to estimate the
transferability of an ensemble of models to a downstream task. OSBORN
collectively accounts for image domain difference, task difference, and
cohesiveness of models in the ensemble to provide reliable estimates of
transferability. We gauge the performance of OSBORN on both image
classification and semantic segmentation tasks. Our setup includes 28 source
datasets, 11 target datasets, 5 model architectures, and 2 pre-training
methods. We benchmark our method against current state-of-the-art metrics
MS-LEEP and E-LEEP, and outperform them consistently using the proposed
approach.
Comment: To appear at ICCV 202
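Submodular objectives are commonly maximized with a greedy procedure, which can be sketched as follows. This is a generic illustration and not OSBORN itself: the `score` callable is supplied by the caller, and the coverage-style score in the usage note is a hypothetical stand-in for a real transferability estimate.

```python
def greedy_select(candidates, score, k):
    """Greedily pick up to k models by largest marginal gain in score.

    candidates: list of model names
    score:      callable mapping a list of chosen models to a number
    k:          maximum ensemble size
    """
    chosen = []
    remaining = list(candidates)
    while remaining and len(chosen) < k:
        base = score(chosen)
        # Candidate whose addition increases the score the most.
        best = max(remaining, key=lambda m: score(chosen + [m]) - base)
        if score(chosen + [best]) - base <= 0:
            break  # no remaining candidate improves the ensemble
        chosen.append(best)
        remaining.remove(best)
    return chosen
```

As a toy usage example, suppose each hypothetical model covers a set of target classes and the score is the size of the union: with `a` covering {1, 2, 3}, `b` covering {3, 4}, and `c` covering {1, 2}, the procedure picks `a` first and then `b`, and stops because `c` adds nothing new.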
Ranking Neural Checkpoints
This paper is concerned with ranking many pre-trained deep neural networks
(DNNs), called checkpoints, for transfer learning to a downstream task.
Thanks to the broad use of DNNs, we may easily collect hundreds of checkpoints
from various sources. Which of them transfers the best to our downstream task
of interest? Striving to answer this question thoroughly, we establish a neural
checkpoint ranking benchmark (NeuCRaB) and study some intuitive ranking
measures. These measures are generic, applying to checkpoints of different
output types without knowledge of how, or on which dataset, the checkpoints
were pre-trained. They also incur low computation cost, making them practically
meaningful. Our results suggest that the linear separability of the features
extracted by the checkpoints is a strong indicator of transferability. We also
arrive at a new ranking measure, NLEEP, which gives rise to the best
performance in the experiments.
Comment: Accepted to CVPR 202
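The observation that linear separability of extracted features indicates transferability can be illustrated with a crude proxy. The sketch below is neither NLEEP nor any measure from the benchmark: it assumes, for illustration only, that separability is approximated by the training accuracy of a nearest-class-mean classifier (a simple linear classifier). The names `ncm_accuracy` and `rank_checkpoints` are hypothetical.

```python
def class_means(features, labels):
    """Mean feature vector per class."""
    sums, counts = {}, {}
    for x, y in zip(features, labels):
        if y not in sums:
            sums[y] = [0.0] * len(x)
            counts[y] = 0
        sums[y] = [s + v for s, v in zip(sums[y], x)]
        counts[y] += 1
    return {y: [s / counts[y] for s in sums[y]] for y in sums}


def ncm_accuracy(features, labels):
    """Accuracy of a nearest-class-mean classifier on the given features."""
    means = class_means(features, labels)
    correct = 0
    for x, y in zip(features, labels):
        # Predict the class whose mean is closest in squared Euclidean distance.
        pred = min(
            means,
            key=lambda c: sum((a - b) ** 2 for a, b in zip(x, means[c])),
        )
        correct += pred == y
    return correct / len(labels)


def rank_checkpoints(feature_sets, labels):
    """Order checkpoint names from highest to lowest separability score.

    feature_sets: dict mapping checkpoint name -> features it extracts
                  for the same downstream examples
    """
    scores = {name: ncm_accuracy(f, labels) for name, f in feature_sets.items()}
    return sorted(scores, key=scores.get, reverse=True)
```

A checkpoint whose features cluster by class scores higher than one whose features mix the classes, so it is ranked first without any fine-tuning.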
Transferability-Guided Cross-Domain Cross-Task Transfer Learning
We propose two novel transferability metrics F-OTCE (Fast Optimal Transport
based Conditional Entropy) and JC-OTCE (Joint Correspondence OTCE) to evaluate
how much the source model (task) can benefit the learning of the target task
and to learn more transferable representations for cross-domain cross-task
transfer learning. Unlike the existing metric that requires evaluating the
empirical transferability on auxiliary tasks, our metrics are auxiliary-free
such that they can be computed much more efficiently. Specifically, F-OTCE
estimates transferability by first solving an Optimal Transport (OT) problem
between source and target distributions, and then using the optimal coupling to
compute the Negative Conditional Entropy between source and target labels. It
can also serve as a loss function to maximize the transferability of the source
model before finetuning on the target task. Meanwhile, JC-OTCE improves the
transferability robustness of F-OTCE by including label distances in the OT
problem, though it may incur additional computation cost. Extensive experiments
demonstrate that F-OTCE and JC-OTCE outperform state-of-the-art auxiliary-free
metrics by 18.85% and 28.88%, respectively, in correlation coefficient with the
ground-truth transfer accuracy. By eliminating the training cost of auxiliary
tasks, the two metrics reduce the total computation time of the previous
method from 43 minutes to 9.32s and 10.78s, respectively, for a pair of tasks.
When used as a loss function, F-OTCE shows consistent improvements on the
transfer accuracy of the source model in few-shot classification experiments,
with up to 4.41% accuracy gain.
Comment: This work has been submitted to the IEEE for possible publication.
Copyright may be transferred without notice, after which this version may no
longer be accessible
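The second step described for F-OTCE, computing the negative conditional entropy between source and target labels from a coupling, can be sketched as below. The sketch assumes the coupling matrix has already been produced by some OT solver (that step is not shown) and uses the standard definition of negative conditional entropy, -H(Y_t | Y_s). The function name and argument layout are hypothetical and not taken from the authors' code.

```python
import math
from collections import defaultdict


def negative_conditional_entropy(coupling, src_labels, tgt_labels):
    """Compute -H(Y_t | Y_s) from a sample-level coupling matrix.

    coupling:   2D list; coupling[i][j] is the mass between source
                sample i and target sample j (entries sum to 1)
    src_labels: label of each source sample
    tgt_labels: label of each target sample
    """
    joint = defaultdict(float)         # P(y_s, y_t)
    src_marginal = defaultdict(float)  # P(y_s)
    for i, row in enumerate(coupling):
        for j, mass in enumerate(row):
            joint[(src_labels[i], tgt_labels[j])] += mass
            src_marginal[src_labels[i]] += mass

    nce = 0.0
    for (ys, _yt), p in joint.items():
        if p > 0:
            nce += p * math.log(p / src_marginal[ys])
    return nce
```

The score is at most zero. It equals zero when the coupling maps each source label to a single target label, and it becomes more negative as the source labels carry less information about the target labels.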