Pairwise Quantization
We consider the task of lossy compression of high-dimensional vectors through
quantization. We propose an approach that learns quantization parameters by
minimizing the distortion of scalar products and squared distances between
pairs of points. This is in contrast to previous works that obtain these
parameters through the minimization of the reconstruction error of individual
points. The proposed approach proceeds by finding a linear transformation of
the data that effectively reduces the minimization of the pairwise distortions
to the minimization of individual reconstruction errors. After such a
transformation, any previously proposed quantization approach can be
used. Despite the simplicity of this transformation, the experiments
demonstrate that it achieves a considerable reduction of the pairwise
distortions compared to applying quantization directly to the untransformed
data.
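To make the pipeline concrete, here is a minimal NumPy sketch of the overall recipe: linearly transform the data, quantize with any off-the-shelf method, and measure pairwise distortion. The orthogonal PCA rotation below is only an illustrative stand-in for the transformation derived in the paper, the per-dimension uniform quantizer stands in for any standard quantizer, and the helper names and random-pair evaluation are assumptions made for this example.

```python
import numpy as np

rng = np.random.default_rng(0)

def pca_rotation(X):
    """Illustrative stand-in for the learned linear transform:
    an orthogonal rotation into the eigenbasis of the data covariance."""
    Xc = X - X.mean(axis=0)
    cov = np.cov(Xc, rowvar=False)
    _, eigvecs = np.linalg.eigh(cov)
    return eigvecs  # columns are orthonormal directions

def uniform_scalar_quantize(X, bits=4):
    """Plain per-dimension uniform quantizer; any existing quantization
    method (e.g. product quantization) could be plugged in here instead."""
    lo, hi = X.min(axis=0), X.max(axis=0)
    step = (hi - lo) / (2 ** bits - 1)
    codes = np.round((X - lo) / step)
    return codes * step + lo  # dequantized reconstruction

def mean_pairwise_dot_distortion(X, X_hat, n_pairs=2000):
    """Average squared error of scalar products over random pairs of points."""
    i = rng.integers(0, len(X), n_pairs)
    j = rng.integers(0, len(X), n_pairs)
    true = np.einsum("nd,nd->n", X[i], X[j])
    approx = np.einsum("nd,nd->n", X_hat[i], X_hat[j])
    return np.mean((true - approx) ** 2)

# Correlated synthetic data.
A = rng.normal(size=(32, 32))
X = rng.normal(size=(5000, 32)) @ A

# Baseline: quantize the raw vectors directly.
err_raw = mean_pairwise_dot_distortion(X, uniform_scalar_quantize(X))

# Transform, quantize, map back (the rotation is orthogonal, so R^-1 = R^T).
R = pca_rotation(X)
X_hat = uniform_scalar_quantize(X @ R) @ R.T
err_rot = mean_pairwise_dot_distortion(X, X_hat)

print(f"pairwise dot-product distortion, raw data:        {err_raw:.3f}")
print(f"pairwise dot-product distortion, transformed data: {err_rot:.3f}")
```

Which transform actually minimizes the pairwise distortions is exactly what the paper derives; the sketch only shows where that transform slots into an otherwise standard quantization pipeline.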
Your Student is Better Than Expected: Adaptive Teacher-Student Collaboration for Text-Conditional Diffusion Models
Knowledge distillation methods have recently been shown to be a promising
direction for speeding up the synthesis of large-scale diffusion models by requiring
only a few inference steps. While several powerful distillation methods were
recently proposed, the overall quality of student samples is typically lower
than that of the teacher, which hinders their practical use. In this
work, we investigate the relative quality of samples produced by the teacher
text-to-image diffusion model and its distilled student version. As our main
empirical finding, we discover that a noticeable portion of student samples
exhibit superior fidelity compared to the teacher ones, despite the
"approximate" nature of the student. Based on this finding, we propose an
adaptive collaboration between student and teacher diffusion models for
effective text-to-image synthesis. Specifically, the distilled model produces
the initial sample, and then an oracle decides whether it needs further
improvements with a slow teacher model. Extensive experiments demonstrate that
the designed pipeline surpasses state-of-the-art text-to-image alternatives for
various inference budgets in terms of human preference. Furthermore, the
proposed approach can be naturally used in popular applications such as
text-guided image editing and controllable generation.
Comment: CVPR2024 camera ready
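As a hedged illustration of the adaptive collaboration described above, the sketch below wires the three components together with plain callables. The interfaces and names (`student`, `teacher`, `oracle`, `threshold`) and the simple accept-or-refine rule are assumptions made for the example; the paper's oracle and refinement procedure may be defined differently.

```python
from dataclasses import dataclass
from typing import Any, Callable

@dataclass
class AdaptiveCollaboration:
    # Placeholder components (assumed interfaces, not the paper's exact API):
    student: Callable[[str], Any]        # fast distilled few-step sampler
    teacher: Callable[[str, Any], Any]   # slow teacher refining the student draft
    oracle: Callable[[str, Any], float]  # scores fidelity / prompt alignment
    threshold: float = 0.5               # hypothetical acceptance threshold

    def generate(self, prompt: str) -> Any:
        draft = self.student(prompt)              # cheap initial sample
        if self.oracle(prompt, draft) >= self.threshold:
            return draft                          # oracle accepts the student sample
        return self.teacher(prompt, draft)        # otherwise spend the teacher budget

# Toy usage with dummy components, just to show the control flow.
pipeline = AdaptiveCollaboration(
    student=lambda p: f"student_sample({p})",
    teacher=lambda p, d: f"teacher_refined({d})",
    oracle=lambda p, d: 0.3,   # pretend the draft scores poorly
)
print(pipeline.generate("a photo of a corgi astronaut"))
```

Skipping the teacher whenever the oracle accepts the student's draft is what lets such a pipeline trade sample quality against inference budget.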