18 research outputs found
Multi-scale Diffusion Denoised Smoothing
Along with recent diffusion models, randomized smoothing has become one of a
few tangible approaches that offers adversarial robustness to models at scale,
e.g., those of large pre-trained models. Specifically, one can perform
randomized smoothing on any classifier via a simple "denoise-and-classify"
pipeline, so-called denoised smoothing, given that an accurate denoiser is
available, such as a diffusion model. In this paper, we present scalable methods
to address the current trade-off between certified robustness and accuracy in
denoised smoothing. Our key idea is to "selectively" apply smoothing among
multiple noise scales, coined multi-scale smoothing, which can be efficiently
implemented with a single diffusion model. This approach also suggests a new
objective to compare the collective robustness of multi-scale smoothed
classifiers, and questions which representation of a diffusion model would
maximize the objective. To address this, we propose to further fine-tune the
diffusion model (a) to perform consistent denoising whenever the original image
is recoverable, but (b) to generate rather diverse outputs otherwise. Our
experiments show that the proposed multi-scale smoothing scheme combined with
diffusion fine-tuning enables the strong certified robustness available at high
noise levels while maintaining accuracy close to that of non-smoothed classifiers.
Comment: Published as a conference paper at NeurIPS 2023; Code is available at
https://github.com/jh-jeong/smoothing-multiscal
Modality-Agnostic Self-Supervised Learning with Meta-Learned Masked Auto-Encoder
Despite its practical importance across a wide range of modalities, recent
advances in self-supervised learning (SSL) have been primarily focused on a few
well-curated domains, e.g., vision and language, often relying on their
domain-specific knowledge. For example, Masked Auto-Encoder (MAE) has become
one of the popular architectures in these domains, but its potential in other
modalities has been less explored. In this paper, we develop MAE as a unified,
modality-agnostic SSL framework. In turn, we argue that meta-learning is key to
interpreting MAE as a modality-agnostic learner, and propose enhancements to
MAE from the motivation to jointly improve its SSL across diverse modalities,
coined MetaMAE as a result. Our key idea is to view the mask reconstruction of
MAE as a meta-learning task: masked tokens are predicted by adapting the
Transformer meta-learner through the amortization of unmasked tokens. Based on
this novel interpretation, we propose to integrate two advanced meta-learning
techniques. First, we adapt the amortized latent of the Transformer encoder
using gradient-based meta-learning to enhance the reconstruction. Then, we
maximize the alignment between amortized and adapted latents through task
contrastive learning which guides the Transformer encoder to better encode the
task-specific knowledge. Our experiment demonstrates the superiority of MetaMAE
in the modality-agnostic SSL benchmark (called DABS), significantly
outperforming prior baselines. Code is available at
https://github.com/alinlab/MetaMAE.
Comment: Accepted to NeurIPS 2023. The first two authors contributed equall
M2m: Imbalanced Classification via Major-to-minor Translation
In most real-world scenarios, labeled training datasets are highly
class-imbalanced, and deep neural networks struggle to generalize to a
balanced testing criterion. In this paper, we explore a novel yet simple way to
alleviate this issue by augmenting less-frequent classes via translating
samples (e.g., images) from more-frequent classes. This simple approach enables
a classifier to learn more generalizable features of minority classes, by
transferring and leveraging the diversity of the majority information. Our
experimental results on a variety of class-imbalanced datasets show that the
proposed method improves the generalization on minority classes significantly
compared to other existing re-sampling or re-weighting methods. The performance
of our method even surpasses those of previous state-of-the-art methods for the
imbalanced classification.
Comment: 12 pages; CVPR 202
WinCLIP: Zero-/Few-Shot Anomaly Classification and Segmentation
Visual anomaly classification and segmentation are vital for automating
industrial quality inspection. The focus of prior research in the field has
been on training custom models for each quality inspection task, which requires
task-specific images and annotation. In this paper we move away from this
regime, addressing zero-shot and few-normal-shot anomaly classification and
segmentation. Recently CLIP, a vision-language model, has shown revolutionary
generality with competitive zero-/few-shot performance in comparison to
full-supervision. But CLIP falls short on anomaly classification and
segmentation tasks. Hence, we propose window-based CLIP (WinCLIP) with (1) a
compositional ensemble on state words and prompt templates and (2) efficient
extraction and aggregation of window/patch/image-level features aligned with
text. We also propose its few-normal-shot extension WinCLIP+, which uses
complementary information from normal images. In MVTec-AD (and VisA), without
further tuning, WinCLIP achieves 91.8%/85.1% (78.1%/79.6%) AUROC in zero-shot
anomaly classification and segmentation while WinCLIP+ does 93.1%/95.2%
(83.8%/96.4%) in 1-normal-shot, surpassing state-of-the-art by large margins.
Comment: Accepted to Conference on Computer Vision and Pattern Recognition
(CVPR) 202
Collaborative Score Distillation for Consistent Visual Synthesis
Generative priors of large-scale text-to-image diffusion models enable a wide
range of new generation and editing applications on diverse visual modalities.
However, when adapting these priors to complex visual modalities, often
represented as multiple images (e.g., video), achieving consistency across a
set of images is challenging. In this paper, we address this challenge with a
novel method, Collaborative Score Distillation (CSD). CSD is based on the Stein
Variational Gradient Descent (SVGD). Specifically, we propose to consider
multiple samples as "particles" in the SVGD update and combine their score
functions to distill generative priors over a set of images synchronously.
Thus, CSD facilitates seamless integration of information across 2D images,
leading to a consistent visual synthesis across multiple samples. We show the
effectiveness of CSD in a variety of tasks, encompassing the visual editing of
panorama images, videos, and 3D scenes. Our results underline the competency of
CSD as a versatile method for enhancing inter-sample consistency, thereby
broadening the applicability of text-to-image diffusion models.
Comment: Project page with visuals: https://subin-kim-cv.github.io/CSD
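The SVGD update that CSD builds on can be illustrated on a toy problem. The following is a minimal sketch of plain SVGD in one dimension, not the CSD method itself: the Gaussian target, the RBF kernel with fixed bandwidth, the step size, and the function names `svgd_step` and `run_svgd` are all assumptions made for illustration. Each particle moves along a kernel-weighted average of the other particles' scores plus a repulsive term that keeps the set from collapsing.

```python
import math

def svgd_step(particles, score, step=0.1, bandwidth=1.0):
    """One SVGD update for 1-D particles (illustrative sketch only)."""
    n = len(particles)
    updated = []
    for xi in particles:
        phi = 0.0
        for xj in particles:
            # RBF kernel k(xj, xi) with a fixed, assumed bandwidth.
            k = math.exp(-((xj - xi) ** 2) / (2.0 * bandwidth ** 2))
            # Driving term: kernel-weighted score of particle xj.
            phi += k * score(xj)
            # Repulsive term: gradient of k(xj, xi) with respect to xj.
            phi += -(xj - xi) / (bandwidth ** 2) * k
        updated.append(xi + step * phi / n)
    return updated

def run_svgd(particles, score, n_iters=500, step=0.1):
    """Apply svgd_step repeatedly and return the final particles."""
    for _ in range(n_iters):
        particles = svgd_step(particles, score, step=step)
    return particles

# Toy target: a unit-variance Gaussian centered at 2, whose score is -(x - 2).
final = run_svgd([-1.0, -0.5, 0.0, 0.5, 1.0], lambda x: -(x - 2.0))
```

In this toy setting the particles drift toward the target mean while the repulsive term keeps them spread out; combining the score functions of several samples in one update is the part of SVGD that the abstract refers to.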
Confidence-Aware Training of Smoothed Classifiers for Certified Robustness
Any classifier can be "smoothed out" under Gaussian noise to build a new classifier that is provably robust to l2-adversarial perturbations, viz., by averaging its predictions over the noise via randomized smoothing. Under the smoothed classifiers, the fundamental trade-off between accuracy and (adversarial) robustness has been well evidenced in the literature: i.e., increasing the robustness of a classifier for an input can be at the expense of decreased accuracy for some other inputs. In this paper, we propose a simple training method leveraging this trade-off to obtain robust smoothed classifiers, in particular, through a sample-wise control of robustness over the training samples. We make this control feasible by using "accuracy under Gaussian noise" as an easy-to-compute proxy of adversarial robustness for an input. Specifically, we differentiate the training objective depending on this proxy to filter out samples that are unlikely to benefit from the worst-case (adversarial) objective. Our experiments show that the proposed method, despite its simplicity, consistently exhibits improved certified robustness upon state-of-the-art training methods. Somewhat surprisingly, we find these improvements persist even for other notions of robustness, e.g., to various types of common corruptions. Code is available at https://github.com/alinlab/smoothing-catrs
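The two quantities named in this abstract, the smoothed prediction and "accuracy under Gaussian noise", can both be estimated by Monte Carlo sampling. The sketch below is an illustration under stated assumptions, not the authors' training code: `toy_classifier`, `smoothed_predict`, and `accuracy_under_noise` are hypothetical names, the input is a plain list of floats, and the vote frequencies are raw estimates with no statistical confidence bound, so they are not a certificate.

```python
import random
from collections import Counter

def toy_classifier(x):
    """Hypothetical base classifier: class 1 if the features sum to > 0."""
    return 1 if sum(x) > 0 else 0

def smoothed_predict(base_classifier, x, sigma, n_samples=1000, seed=0):
    """Majority vote of the base classifier over Gaussian-perturbed inputs."""
    rng = random.Random(seed)
    votes = Counter()
    for _ in range(n_samples):
        noisy = [xi + rng.gauss(0.0, sigma) for xi in x]
        votes[base_classifier(noisy)] += 1
    label, count = votes.most_common(1)[0]
    return label, count / n_samples

def accuracy_under_noise(base_classifier, x, y, sigma, n_samples=1000, seed=0):
    """Fraction of noisy copies of x that are classified as the true label y."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_samples):
        noisy = [xi + rng.gauss(0.0, sigma) for xi in x]
        hits += base_classifier(noisy) == y
    return hits / n_samples

# An input far from the decision boundary keeps its label under noise,
# while one on the boundary is correct only about half of the time.
label, freq = smoothed_predict(toy_classifier, [2.0, 2.0], sigma=0.5)
proxy = accuracy_under_noise(toy_classifier, [0.0, 0.0], y=1, sigma=0.5)
```

A low value of the proxy marks a sample that is unlikely to be classified robustly, which is the kind of per-sample signal the abstract describes using to decide which training objective to apply.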
Consistency Regularization for Adversarial Robustness
Adversarial training (AT) is currently one of the most successful methods to
obtain the adversarial robustness of deep neural networks. However, the
phenomenon of robust overfitting, i.e., the robustness starts to decrease
significantly during AT, has been problematic, not only making practitioners
consider a bag of tricks for successful training, e.g., early stopping, but
also incurring a significant generalization gap in the robustness. In this
paper, we propose an effective regularization technique that prevents robust
overfitting by optimizing an auxiliary "consistency" regularization loss during
AT. Specifically, we discover that data augmentation is a quite effective tool
to mitigate the overfitting in AT, and develop a regularization that forces the
predictive distributions after attacking from two different augmentations of
the same instance to be similar to each other. Our experimental results
demonstrate that such a simple regularization technique brings significant
improvements in the test robust accuracy of a wide range of AT methods. More
remarkably, we also show that our method could significantly help the model to
generalize its robustness against unseen adversaries, e.g., other types or
larger perturbations compared to those used during training. Code is available
at https://github.com/alinlab/consistency-adversarial.
Comment: Published as a conference proceeding for AAAI 202
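A regularizer that pulls two predictive distributions toward each other can be sketched as follows. The abstract does not name the similarity measure, so the Jensen-Shannon divergence used here is an assumption, as are the function names `softmax`, `kl_divergence`, and `js_consistency`; the two logit vectors stand in for the model outputs on adversarial examples built from two augmentations of the same instance.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def kl_divergence(p, q):
    """KL(p || q), skipping terms where p is zero."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def js_consistency(logits_a, logits_b):
    """Jensen-Shannon divergence between two predictive distributions."""
    p, q = softmax(logits_a), softmax(logits_b)
    mid = [(pi + qi) / 2.0 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, mid) + 0.5 * kl_divergence(q, mid)

# Identical predictions incur no penalty; disagreeing ones are penalized.
same = js_consistency([2.0, 0.5, -1.0], [2.0, 0.5, -1.0])
diff = js_consistency([2.0, 0.5, -1.0], [-1.0, 0.5, 2.0])
```

In training, a term like this would be added to the adversarial training loss with a weighting coefficient; the weighting and the attack used to produce the two inputs are outside the scope of this sketch.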