Harvesting Discriminative Meta Objects with Deep CNN Features for Scene Classification
Recent work on scene classification still makes use of generic CNN features
in a rudimentary manner. In this ICCV 2015 paper, we present a novel pipeline
built upon deep CNN features to harvest discriminative visual objects and parts
for scene classification. We first use a region proposal technique to generate
a set of high-quality patches potentially containing objects, and apply a
pre-trained CNN to extract generic deep features from these patches. Then we
perform both unsupervised and weakly supervised learning to screen these
patches and discover discriminative ones representing category-specific objects
and parts. We further apply discriminative clustering enhanced with local CNN
fine-tuning to aggregate similar objects and parts into groups, called meta
objects. A scene image representation is constructed by pooling the feature
response maps of all the learned meta objects at multiple spatial scales. We
have confirmed that the scene image representation obtained using this new
pipeline is capable of delivering state-of-the-art performance on two popular
scene benchmark datasets, MIT Indoor 67~\cite{MITIndoor67} and
Sun397~\cite{Sun397}. Comment: To appear in ICCV 2015.
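The final pooling step described above can be sketched as follows. This is a minimal illustration, assuming max-pooling over a spatial pyramid; the pyramid levels and the helper names (`pyramid_max_pool`, `scene_representation`) are assumptions, not the paper's exact configuration.

```python
import numpy as np

def pyramid_max_pool(response_map, levels=(1, 2, 4)):
    """Max-pool a 2-D detector response map over a spatial pyramid.

    At pyramid level L the map is split into an L x L grid and the
    maximum response inside each cell is kept; the per-level vectors
    are concatenated into one descriptor.
    """
    H, W = response_map.shape
    pooled = []
    for L in levels:
        for i in range(L):
            for j in range(L):
                cell = response_map[i * H // L:(i + 1) * H // L,
                                    j * W // L:(j + 1) * W // L]
                pooled.append(cell.max())
    return np.asarray(pooled)

def scene_representation(response_maps, levels=(1, 2, 4)):
    """Concatenate the pyramid-pooled responses of all meta-object detectors."""
    return np.concatenate([pyramid_max_pool(m, levels) for m in response_maps])

# Toy example: 3 meta-object detectors, each producing an 8x8 response map.
rng = np.random.default_rng(0)
maps = [rng.random((8, 8)) for _ in range(3)]
feat = scene_representation(maps)  # 3 * (1 + 4 + 16) = 63 dimensions
```

Each detector contributes 21 values here (1 + 4 + 16 cells), so the scene descriptor grows linearly in the number of learned meta objects.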
Constrained K-means with General Pairwise and Cardinality Constraints
In this work, we study constrained clustering, where constraints are utilized
to guide the clustering process. In existing works, two categories of
constraints have been widely explored, namely pairwise and cardinality
constraints. Pairwise constraints enforce the cluster labels of two instances
to be the same (must-link constraints) or different (cannot-link constraints).
Cardinality constraints encourage cluster sizes to satisfy a user-specified
distribution. However, most existing constrained clustering models can only
utilize one category of constraints at a time. In this paper, we enforce the
above two categories into a unified clustering model starting with the integer
program formulation of the standard K-means. As these two categories provide
useful information at different levels, utilizing both of them is expected to
allow for better clustering performance. However, the optimization is difficult
due to the binary and quadratic constraints in the proposed unified
formulation. To alleviate this difficulty, we utilize two techniques: one is
equivalently replacing the binary constraints by the intersection of two
continuous constraints; the other is transforming the quadratic constraints
into bi-linear constraints by introducing extra variables. Then we derive an
equivalent continuous reformulation with simple constraints, which can be
efficiently solved by the Alternating Direction Method of Multipliers (ADMM)
algorithm. Extensive experiments on both synthetic and real data demonstrate:
(1) when utilizing a single category of constraint, the proposed model is
superior to or competitive with state-of-the-art constrained clustering models,
and (2) when utilizing both categories of constraints jointly, the proposed
model shows better performance than when using either category alone.
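For the first technique, one standard equivalence of this kind writes the binary feasible set as the intersection of a box and a sphere (a sketch of the idea; the paper's exact choice of continuous constraints may differ):

```latex
% Binary assignment variables as a box-sphere intersection:
% each (y_i - 1/2)^2 <= 1/4 on the box, with equality iff y_i in {0,1},
% so the sphere constraint forces every coordinate to be binary.
y \in \{0,1\}^{n}
\;\Longleftrightarrow\;
y \in [0,1]^{n}
\;\cap\;
\Bigl\{\, y \,:\, \bigl\| y - \tfrac{1}{2}\mathbf{1} \bigr\|_{2}^{2} = \tfrac{n}{4} \,\Bigr\}
```

Both sets on the right are described by simple continuous constraints, which is what makes the subsequent ADMM splitting tractable.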
Shared Adversarial Unlearning: Backdoor Mitigation by Unlearning Shared Adversarial Examples
Backdoor attacks are serious security threats to machine learning models
where an adversary can inject poisoned samples into the training set, yielding a
backdoored model that predicts samples carrying particular triggers as
particular target classes while behaving normally on benign samples. In this
paper, we explore the task of purifying a backdoored model using a small clean
dataset. By establishing the connection between backdoor risk and adversarial
risk, we derive a novel upper bound for backdoor risk, which mainly captures
the risk on the shared adversarial examples (SAEs) between the backdoored model
and the purified model. This upper bound further suggests a novel bi-level
optimization problem for mitigating backdoor using adversarial training
techniques. To solve it, we propose Shared Adversarial Unlearning (SAU).
Specifically, SAU first generates SAEs and then unlearns them so that they are
either correctly classified by the purified model or classified differently by
the two models; as a result, the backdoor effect of the backdoored model is
mitigated in the purified model. Experiments on
various benchmark datasets and network architectures show that our proposed
method achieves state-of-the-art performance for backdoor defense.
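The notion of a shared adversarial example can be illustrated with a toy sketch, assuming linear softmax classifiers and a single fast-gradient-sign step in place of the networks and the attack used in the paper; all names here are illustrative.

```python
import numpy as np

def fgsm(x, w, b, y, eps):
    """One fast-gradient-sign step on a linear softmax classifier w @ x + b."""
    logits = w @ x + b
    p = np.exp(logits - logits.max())
    p /= p.sum()
    grad_x = w.T @ (p - np.eye(len(b))[y])  # gradient of cross-entropy w.r.t. x
    return x + eps * np.sign(grad_x)

def is_shared_adversarial(x, y, backdoored, purified, eps=0.3):
    """True if one perturbation fools both models into the same wrong label,
    i.e. x yields a shared adversarial example (SAE) of the two models."""
    x_adv = fgsm(x, *purified, y, eps)
    pred_b = np.argmax(backdoored[0] @ x_adv + backdoored[1])
    pred_p = np.argmax(purified[0] @ x_adv + purified[1])
    return bool(pred_b == pred_p and pred_p != y)

# Toy check: two identical 2-class linear models and a weakly class-0 input,
# which a 0.3-sized perturbation pushes to the same wrong label on both models.
w, b = np.eye(2), np.zeros(2)
shared = is_shared_adversarial(np.array([0.2, 0.0]), 0, (w, b), (w, b))
```

Unlearning then penalizes exactly these examples, either pushing the purified model to classify them correctly or decoupling its predictions from the backdoored model's.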
UCF: Uncovering Common Features for Generalizable Deepfake Detection
Deepfake detection remains a challenging task due to the difficulty of
generalizing to new types of forgeries. This problem primarily stems from the
overfitting of existing detection methods to forgery-irrelevant features and
method-specific patterns. The latter has been rarely studied and not well
addressed by previous works. This paper presents a novel approach to address
the two types of overfitting issues by uncovering common forgery features.
Specifically, we first propose a disentanglement framework that decomposes
image information into three distinct components: forgery-irrelevant,
method-specific forgery, and common forgery features. To ensure the decoupling
of method-specific and common forgery features, a multi-task learning strategy
is employed, including a multi-class classification that predicts the category
of the forgery method and a binary classification that distinguishes the real
from the fake. Additionally, a conditional decoder is designed to utilize
forgery features as a condition along with forgery-irrelevant features to
generate reconstructed images. Furthermore, a contrastive regularization
technique is proposed to encourage the disentanglement of the common and
specific forgery features. Ultimately, we only utilize the common forgery
features for the purpose of generalizable deepfake detection. Extensive
evaluations demonstrate that our framework generalizes better than current
state-of-the-art methods.
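Schematically, the training objective combines the four components described above; the weights and loss names below are illustrative assumptions, not the paper's notation:

```latex
\mathcal{L}
= \lambda_{1}\,\mathcal{L}_{\text{method}}     % multi-class: which forgery method
+ \lambda_{2}\,\mathcal{L}_{\text{real/fake}}  % binary: real vs. fake
+ \lambda_{3}\,\mathcal{L}_{\text{rec}}        % conditional-decoder reconstruction
+ \lambda_{4}\,\mathcal{L}_{\text{con}}        % contrastive regularization
```

The first term is what drives method-specific features apart from common ones, while the last discourages the common branch from absorbing method-specific information.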
VDC: Versatile Data Cleanser for Detecting Dirty Samples via Visual-Linguistic Inconsistency
The role of data in building AI systems has recently been emphasized by the
emerging concept of data-centric AI. Unfortunately, in the real-world, datasets
may contain dirty samples, such as poisoned samples from backdoor attack, noisy
labels in crowdsourcing, and even hybrids of them. The presence of such dirty
samples makes DNNs vulnerable and unreliable. Hence, it is critical to detect
dirty samples to improve the quality and reliability of a dataset. Existing
detectors focus only on detecting poisoned samples or noisy labels, and are
often prone to weak generalization when dealing with dirty samples from other
domains. In this paper, we find that a commonality of various dirty samples is
visual-linguistic inconsistency between images and associated labels. To
capture the semantic inconsistency between modalities, we propose versatile
data cleanser (VDC), leveraging the strong capabilities of multimodal large
language models (MLLMs) in cross-modal alignment and reasoning. It consists of
three consecutive modules: a visual question generation module that generates
insightful questions about the image; a visual question answering module that
acquires the semantics of the visual content by answering the questions with
the MLLM; and a visual answer evaluation module that evaluates the
inconsistency. Extensive experiments demonstrate its superior performance and
generalization to various categories and types of dirty samples. Comment: 22 pages, 5 figures, 17 tables.
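The three-module flow can be sketched as below. This only mirrors the control structure: the callables are hypothetical stand-ins for MLLM prompts, and the toy "image" is just a set of visible concepts.

```python
def detect_dirty(image, label, gen_questions, answer_fn, threshold=0.5):
    """Flag a sample as dirty when answers about the image disagree with
    its assigned label (question generation -> answering -> evaluation)."""
    questions = gen_questions(label)                    # visual question generation
    answers = [answer_fn(image, q) for q in questions]  # visual question answering
    consistency = sum(answers) / len(answers)           # visual answer evaluation
    return consistency < threshold

# Toy stand-ins: the "MLLM" answers yes iff the queried concept is visible.
gen_questions = lambda label: [f"Is there a {label} in the image?"]
answer_fn = lambda image, q: any(concept in q for concept in image)

clean = detect_dirty({"dog"}, "dog", gen_questions, answer_fn)  # consistent label
dirty = detect_dirty({"dog"}, "cat", gen_questions, answer_fn)  # mislabeled
```

Because the check depends only on image-label agreement, the same flow covers poisoned samples, noisy labels, and hybrids of the two without per-domain detectors.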