
140 research outputs found

Improving the Improved Training of Wasserstein GANs: A Consistency Term and Its Dual Effect

Author: Gong Boqing
Liu Zixia
Lu Wei
Wang Liqiang
Wei Xiang
Publication date: 01/01/2018

Despite being impactful on a variety of problems and applications, the generative adversarial nets (GANs) are remarkably difficult to train. This issue is formally analyzed by \cite{arjovsky2017towards}, who also propose an alternative direction to avoid the caveats in the minmax two-player training of GANs. The corresponding algorithm, called Wasserstein GAN (WGAN), hinges on the 1-Lipschitz continuity of the discriminator. In this paper, we propose a novel approach to enforcing the Lipschitz continuity in the training procedure of WGANs. Our approach seamlessly connects WGAN with one of the recent semi-supervised learning methods. As a result, it gives rise to not only better photo-realistic samples than the previous methods but also state-of-the-art semi-supervised learning results. In particular, our approach gives rise to the inception score of more than 5.0 with only 1,000 CIFAR-10 images and is the first that exceeds the accuracy of 90% on the CIFAR-10 dataset using only 4,000 labeled images, to the best of our knowledge.

Comment: Accepted as a conference paper at the International Conference on Learning Representations (ICLR). Xiang Wei and Boqing Gong contributed equally to this work.

arXiv.org e-Print Archive

University of Central Florida (UCF): STARS (Showcase of Text, Archives, Research & Scholarship)

On Discrete Prompt Optimization for Diffusion Models

Author: Gong Boqing
Hsieh Cho-Jui
Liu Ting
Wang Ruochen
Publication date: 26/06/2024

This paper introduces the first gradient-based framework for prompt optimization in text-to-image diffusion models. We formulate prompt engineering as a discrete optimization problem over the language space. Two major challenges arise in efficiently finding a solution to this problem: (1) Enormous Domain Space: Setting the domain to the entire language space poses significant difficulty to the optimization process. (2) Text Gradient: Efficiently computing the text gradient is challenging, as it requires backpropagating through the inference steps of the diffusion model and a non-differentiable embedding lookup table. Beyond the problem formulation, our main technical contributions lie in solving the above challenges. First, we design a family of dynamically generated compact subspaces comprised of only the most relevant words to user input, substantially restricting the domain space. Second, we introduce "Shortcut Text Gradient" -- an effective replacement for the text gradient that can be obtained with constant memory and runtime. Empirical evaluation on prompts collected from diverse sources (DiffusionDB, ChatGPT, COCO) suggests that our method can discover prompts that substantially improve (prompt enhancement) or destroy (adversarial attack) the faithfulness of images generated by the text-to-image diffusion model.

Comment: ICML 2024. Code available at https://github.com/ruocwang/dpo-diffusio

arXiv.org e-Print Archive

Automatic facial expression recognition on a single 3D face by exploring shape deformation

Author: Boqing Gong
Jianzhuang Liu
Xiaoou Tang
Yueming Wang
Publication venue: Association for Computing Machinery (ACM)

Crossref

Improving the Improved Training of Wasserstein GANs: A Consistency Term and Its Dual Effect

Author: Wei Xiang
Gong Boqing
Liu Zixia
Lu Wei
Wang Liqiang
Publication date: 01/01/2009

arXiv.org e-Print Archive

Memoria Académica

Video Timeline Modeling For News Story Understanding

Author: Dai Hanjun
Feng Zheyun
Gong Boqing
Ji Shuiwang
Liu Jialu
Liu Meng
Yang Ming-Hsuan
Zhang Mingda
Publication date: 23/09/2023

In this paper, we present a novel problem, namely video timeline modeling. Our objective is to create a video-associated timeline from a set of videos related to a specific topic, thereby facilitating the content and structure understanding of the story being told. This problem has significant potential in various real-world applications, such as news story summarization. To bootstrap research in this area, we curate a realistic benchmark dataset, YouTube-News-Timeline, consisting of over 12k timelines and 300k YouTube news videos. Additionally, we propose a set of quantitative metrics as the protocol to comprehensively evaluate and compare methodologies. With such a testbed, we further develop and benchmark exploratory deep learning approaches to tackle this problem. We anticipate that this exploratory work will pave the way for further research in video timeline modeling. The assets are available via https://github.com/google-research/google-research/tree/master/video_timeline_modeling.

Comment: Accepted as a spotlight by NeurIPS 2023, Track on Datasets and Benchmarks

arXiv.org e-Print Archive