Fast Sampling of Diffusion Models via Operator Learning
Diffusion models have found widespread adoption in various areas. However,
sampling from them is slow because it involves emulating a reverse process with
hundreds to thousands of network evaluations. Inspired by the success of neural
operators in accelerating the solution of differential equations, we approach this
problem by solving the underlying neural differential equation from an operator
learning perspective. We examine probability flow ODE trajectories in diffusion
models and observe a compact energy spectrum that can be learned efficiently in
Fourier space. With this insight, we propose the diffusion Fourier neural operator
(DFNO), which uses temporal convolution in Fourier space to parameterize the
operator that maps the initial condition to the solution trajectory, a continuous
function of time. DFNO can be applied to any diffusion model and generates
high-quality samples in a single model forward call. Our method achieves a
state-of-the-art FID of 4.72 on CIFAR-10 using only one model evaluation.
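To make the core idea concrete, the following is a minimal sketch (not the authors' code) of a temporal convolution applied in Fourier space, as used to map an initial noise sample to a trajectory evaluated at several time points. All module and parameter names (SpectralTemporalConv, n_modes, the channel/time layout) are hypothetical illustrations of the Fourier-space convolution described in the abstract.

import torch
import torch.nn as nn


class SpectralTemporalConv(nn.Module):
    """Keep the lowest `n_modes` Fourier modes along the time axis and mix
    channels with learned complex weights, following the FNO recipe."""

    def __init__(self, channels: int, n_modes: int):
        super().__init__()
        self.n_modes = n_modes
        scale = 1.0 / channels
        self.weight = nn.Parameter(
            scale * torch.randn(channels, channels, n_modes, dtype=torch.cfloat)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, channels, time) -- a trajectory sampled at discrete times
        x_ft = torch.fft.rfft(x, dim=-1)                       # to Fourier space
        out_ft = torch.zeros_like(x_ft)
        m = min(self.n_modes, x_ft.shape[-1])
        # channel mixing restricted to the retained low-frequency modes
        out_ft[..., :m] = torch.einsum(
            "bct,oct->bot", x_ft[..., :m], self.weight[..., :m]
        )
        return torch.fft.irfft(out_ft, n=x.shape[-1], dim=-1)  # back to time domain


if __name__ == "__main__":
    batch, channels, n_times = 4, 8, 16
    # broadcast an initial condition (e.g., Gaussian noise) across the query times
    x0 = torch.randn(batch, channels, 1).expand(-1, -1, n_times).contiguous()
    layer = SpectralTemporalConv(channels, n_modes=6)
    trajectory = layer(x0)
    print(trajectory.shape)  # torch.Size([4, 8, 16])

In a full model, several such layers (interleaved with pointwise nonlinearities) would refine the trajectory, and the output at the final time would be the generated sample.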
Knowledge-Aware Prompt Tuning for Generalizable Vision-Language Models
Pre-trained vision-language models such as CLIP, working with manually
designed prompts, have demonstrated a great capacity for transfer learning.
Recently, learnable prompts have achieved state-of-the-art performance; however,
they are prone to overfitting to seen classes and fail to generalize to unseen
classes. In this paper, we propose a Knowledge-Aware Prompt Tuning (KAPT)
framework for vision-language models. Our approach takes inspiration from human
intelligence, in which external knowledge is usually incorporated when
recognizing novel categories of objects. Specifically, we design two
complementary types of knowledge-aware prompts for the text encoder to leverage
the distinctive characteristics of category-related external knowledge: the
discrete prompt extracts key information from descriptions of an object
category, and the learned continuous prompt captures overall contexts. We
further design an adaptation head for the visual encoder to aggregate salient
attentive visual cues, establishing discriminative and task-aware visual
representations. We conduct extensive experiments on 11 widely used benchmark
datasets, and the results verify the effectiveness of KAPT in few-shot image
classification, especially in generalizing to unseen categories. Compared with
the state-of-the-art CoCoOp method, KAPT exhibits favorable performance and
achieves an absolute gain of 3.22% on new classes and 2.57% in terms of
harmonic mean.
Comment: Accepted by ICCV 2023
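As a rough illustration of the prompt construction described above, here is a minimal sketch (not the authors' implementation) of combining a learned continuous prompt with a knowledge-derived discrete prompt before a frozen CLIP-style text encoder. The names KnowledgePrompt, n_ctx, and the tensor layout are hypothetical, and the sketch omits the visual adaptation head.

import torch
import torch.nn as nn


class KnowledgePrompt(nn.Module):
    def __init__(self, embed_dim: int = 512, n_ctx: int = 8):
        super().__init__()
        # learned continuous prompt: n_ctx free context vectors shared by all classes
        self.ctx = nn.Parameter(torch.randn(n_ctx, embed_dim) * 0.02)

    def forward(self, knowledge_embeds: torch.Tensor,
                class_embeds: torch.Tensor) -> torch.Tensor:
        """
        knowledge_embeds: (n_cls, n_know, dim) token embeddings of the discrete
                          prompt, i.e. key phrases extracted from external
                          category descriptions
        class_embeds:     (n_cls, n_name, dim) token embeddings of class names
        returns:          (n_cls, n_ctx + n_know + n_name, dim) prompt sequences
                          to feed into the frozen text encoder
        """
        n_cls = class_embeds.shape[0]
        ctx = self.ctx.unsqueeze(0).expand(n_cls, -1, -1)
        return torch.cat([ctx, knowledge_embeds, class_embeds], dim=1)


if __name__ == "__main__":
    prompt = KnowledgePrompt(embed_dim=512, n_ctx=8)
    know = torch.randn(10, 12, 512)   # 10 classes, 12 knowledge tokens each
    names = torch.randn(10, 4, 512)   # class-name token embeddings
    print(prompt(know, names).shape)  # torch.Size([10, 24, 512])

Only the continuous context vectors (and, in the paper, the visual adaptation head) would be trained; the pre-trained text and image encoders stay frozen.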