Improving the Improved Training of Wasserstein GANs: A Consistency Term and Its Dual Effect
Despite being impactful on a variety of problems and applications,
generative adversarial nets (GANs) are remarkably difficult to train. This
issue is formally analyzed by \cite{arjovsky2017towards}, who also propose an
alternative direction to avoid the caveats in the minimax two-player training of
GANs. The corresponding algorithm, called Wasserstein GAN (WGAN), hinges on the
1-Lipschitz continuity of the discriminator. In this paper, we propose a novel
approach to enforcing Lipschitz continuity in the training procedure of
WGANs. Our approach seamlessly connects WGAN with one of the recent
semi-supervised learning methods. As a result, it yields not only more
photo-realistic samples than previous methods but also state-of-the-art
semi-supervised learning results. In particular, our approach achieves an
inception score above 5.0 with only 1,000 CIFAR-10 images and, to the best of
our knowledge, is the first to exceed 90% accuracy on the CIFAR-10 dataset
using only 4,000 labeled images.
Comment: Accepted as a conference paper at the International Conference on
Learning Representations (ICLR). Xiang Wei and Boqing Gong contributed equally
to this work.
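The abstract does not spell out how the consistency term enforces Lipschitz continuity. As a rough illustration of the general idea (penalizing Lipschitz violations measured on pairs of nearby points around real data), the sketch below computes a hinge penalty on finite-difference slopes of a critic. The function names, the Gaussian perturbation scheme, and the default bound of 1.0 are assumptions chosen for illustration; this is not the paper's formulation of the consistency term.

```python
import math
import random
from typing import Callable, List, Sequence, Tuple

Pair = Tuple[List[float], List[float]]


def l2_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def perturbed_pairs(points: Sequence[Sequence[float]], sigma: float,
                    rng: random.Random) -> List[Pair]:
    """Pair each point with two independent Gaussian perturbations of itself.

    Assumption: Gaussian noise is a stand-in for whatever perturbation
    the paper actually uses.
    """
    pairs = []
    for x in points:
        x1 = [v + rng.gauss(0.0, sigma) for v in x]
        x2 = [v + rng.gauss(0.0, sigma) for v in x]
        pairs.append((x1, x2))
    return pairs


def lipschitz_hinge_penalty(critic: Callable[[Sequence[float]], float],
                            pairs: Sequence[Pair],
                            bound: float = 1.0) -> float:
    """Mean hinge penalty on finite-difference slopes above `bound`.

    |D(x1) - D(x2)| / ||x1 - x2|| is a lower bound on the critic's
    Lipschitz constant, so only the part of each slope above `bound`
    contributes to the penalty (hypothetical helper, not the paper's API).
    """
    if not pairs:
        return 0.0
    total = 0.0
    for x1, x2 in pairs:
        dist = l2_distance(x1, x2)
        if dist == 0.0:
            continue  # identical points give no slope information
        slope = abs(critic(x1) - critic(x2)) / dist
        total += max(0.0, slope - bound)
    return total / len(pairs)


if __name__ == "__main__":
    steep = lambda x: 3.0 * x[0]   # Lipschitz constant 3, violates bound 1
    gentle = lambda x: 0.5 * x[0]  # Lipschitz constant 0.5, within bound
    print(lipschitz_hinge_penalty(steep, [([0.0, 0.0], [1.0, 0.0])]))  # 2.0
    rng = random.Random(0)
    pairs = perturbed_pairs([[0.0, 1.0], [2.0, -1.0]], sigma=0.1, rng=rng)
    print(lipschitz_hinge_penalty(gentle, pairs))  # 0.0
```

In a real training loop such a penalty would be added, with some weight, to the critic loss; a hinge is used so that critics already within the bound are left unconstrained.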
On Discrete Prompt Optimization for Diffusion Models
This paper introduces the first gradient-based framework for prompt
optimization in text-to-image diffusion models. We formulate prompt engineering
as a discrete optimization problem over the language space. Two major
challenges arise in efficiently finding a solution to this problem: (1)
Enormous Domain Space: Setting the domain to the entire language space poses
significant difficulty to the optimization process. (2) Text Gradient:
Efficiently computing the text gradient is challenging, as it requires
backpropagating through the inference steps of the diffusion model and a
non-differentiable embedding lookup table. Beyond the problem formulation, our
main technical contributions lie in solving the above challenges. First, we
design a family of dynamically generated compact subspaces composed of only
the words most relevant to the user input, substantially restricting the domain
space. Second, we introduce "Shortcut Text Gradient", an effective
replacement for the text gradient that can be obtained with constant memory and
runtime. Empirical evaluation on prompts collected from diverse sources
(DiffusionDB, ChatGPT, COCO) suggests that our method can discover prompts that
substantially improve (prompt enhancement) or destroy (adversarial attack) the
faithfulness of images generated by the text-to-image diffusion model.
Comment: ICML 2024. Code available at
https://github.com/ruocwang/dpo-diffusio
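The restricted-subspace idea can be illustrated with a small sketch: for each prompt position, keep only a handful of candidate words judged most relevant, then search that much smaller discrete space. This is not the paper's method; it swaps in a gradient-free greedy coordinate search in place of the Shortcut Text Gradient, and `build_subspace`, `greedy_coordinate_search`, the `relevance` function, and the `score` function (a stand-in for an image-faithfulness metric) are all hypothetical.

```python
from typing import Callable, List, Sequence


def build_subspace(prompt: Sequence[str], vocabulary: Sequence[str],
                   relevance: Callable[[str, str], float],
                   k: int) -> List[List[str]]:
    """For each prompt position, keep the original word plus the k
    vocabulary words with the highest (hypothetical) relevance to it."""
    subspace = []
    for word in prompt:
        # sorted() is stable even with reverse=True, so ties keep vocabulary order
        ranked = sorted((w for w in vocabulary if w != word),
                        key=lambda w: relevance(word, w), reverse=True)
        subspace.append([word] + ranked[:k])
    return subspace


def greedy_coordinate_search(prompt: Sequence[str],
                             subspace: List[List[str]],
                             score: Callable[[List[str]], float],
                             rounds: int = 2) -> List[str]:
    """Sweep positions, replacing each word with the candidate from its
    subspace that strictly increases `score`; stop early if a full sweep
    brings no improvement."""
    current = list(prompt)
    best = score(current)
    for _ in range(rounds):
        improved = False
        for i, candidates in enumerate(subspace):
            for cand in candidates:
                trial = current[:i] + [cand] + current[i + 1:]
                value = score(trial)
                if value > best:
                    current, best, improved = trial, value, True
        if not improved:
            break
    return current


if __name__ == "__main__":
    related = {"cat": ("kitten",), "photo": ("painting",), "a": ("the",)}
    relevance = lambda w, c: 1.0 if c in related.get(w, ()) else 0.0
    vocab = ["cat", "kitten", "dog", "photo", "painting", "a", "the"]
    prompt = ["a", "cat", "photo"]
    space = build_subspace(prompt, vocab, relevance, k=1)
    toy_score = lambda ws: ws.count("kitten") + ws.count("painting")
    print(space)  # [['a', 'the'], ['cat', 'kitten'], ['photo', 'painting']]
    print(greedy_coordinate_search(prompt, space, toy_score))
    # ['a', 'kitten', 'painting']
```

Negating `score` flips the objective from enhancement toward an adversarial direction, mirroring the two uses named in the abstract; the subspace size `k` controls how much the search space shrinks.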
Automatic facial expression recognition on a single 3D face by exploring shape deformation
Video Timeline Modeling For News Story Understanding
In this paper, we present a novel problem, namely video timeline modeling.
Our objective is to create a video-associated timeline from a set of videos
related to a specific topic, thereby facilitating understanding of the content
and structure of the story being told. This problem has significant potential
in various real-world applications, such as news story summarization. To
bootstrap research in this area, we curate a realistic benchmark dataset,
YouTube-News-Timeline, consisting of over k timelines and k YouTube
news videos. Additionally, we propose a set of quantitative metrics as the
protocol to comprehensively evaluate and compare methodologies. With such a
testbed, we further develop and benchmark exploratory deep learning approaches
to tackle this problem. We anticipate that this exploratory work will pave the
way for further research in video timeline modeling. The assets are available
via
https://github.com/google-research/google-research/tree/master/video_timeline_modeling.
Comment: Accepted as a spotlight at NeurIPS 2023, Track on Datasets and
Benchmarks.
