Mix-of-Show: Decentralized Low-Rank Adaptation for Multi-Concept Customization of Diffusion Models
Public large-scale text-to-image diffusion models, such as Stable Diffusion,
have gained significant attention from the community. These models can be
easily customized for new concepts using low-rank adaptations (LoRAs). However,
the utilization of multiple concept LoRAs to jointly support multiple
customized concepts presents a challenge. We refer to this scenario as
decentralized multi-concept customization, which involves single-client concept
tuning and center-node concept fusion. In this paper, we propose a new
framework called Mix-of-Show that addresses the challenges of decentralized
multi-concept customization, including concept conflicts resulting from
existing single-client LoRA tuning and identity loss during model fusion.
Mix-of-Show adopts an embedding-decomposed LoRA (ED-LoRA) for single-client
tuning and gradient fusion for the center node to preserve the in-domain
essence of single concepts and support theoretically limitless concept fusion.
Additionally, we introduce regionally controllable sampling, which extends
spatially controllable sampling (e.g., ControlNet and T2I-Adapter) to address
attribute binding and missing object problems in multi-concept sampling.
Extensive experiments demonstrate that Mix-of-Show is capable of composing
multiple customized concepts with high fidelity, including characters, objects,
and scenes.
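The two ingredients named above, per-concept ED-LoRA tuning on single clients and gradient fusion at the center node, reduce at their core to learning a low-rank update per concept and then solving for one merged weight that reproduces each concept's output features. Below is a minimal NumPy sketch of that idea under assumed shapes and a closed-form least-squares fusion step; it is an illustration, not the authors' implementation (the real method also learns a decomposed embedding per concept, which is omitted here).

```python
# Minimal NumPy sketch of per-concept LoRA tuning and center-node "gradient fusion".
# Shapes, rank, and the closed-form least-squares fusion step are assumptions for
# illustration; this is not the authors' implementation of Mix-of-Show / ED-LoRA.
import numpy as np

rng = np.random.default_rng(0)
d_out, d_in, rank, n_tokens = 64, 32, 4, 128

W0 = rng.normal(size=(d_out, d_in))  # frozen pretrained weight of one layer


def lora_delta():
    """One concept's low-rank update B @ A (random here, standing in for tuned weights)."""
    A = rng.normal(scale=0.01, size=(rank, d_in))
    B = rng.normal(scale=0.01, size=(d_out, rank))
    return B @ A


# Each single client contributes its LoRA delta and activations from its own concept data.
concepts = [
    {"delta": lora_delta(), "X": rng.normal(size=(d_in, n_tokens))}
    for _ in range(3)
]

# Center-node fusion: find one weight W that reproduces each concept's output
# features (W0 + delta_i) @ X_i on that concept's own activations, i.e. solve
#   min_W  sum_i || W X_i - (W0 + delta_i) X_i ||_F^2   in closed form.
lhs = sum(c["X"] @ c["X"].T for c in concepts)                       # sum_i X_i X_i^T
rhs = sum((W0 + c["delta"]) @ c["X"] @ c["X"].T for c in concepts)   # sum_i T_i X_i^T
W_fused = rhs @ np.linalg.pinv(lhs)

for i, c in enumerate(concepts):
    target = (W0 + c["delta"]) @ c["X"]
    err = np.linalg.norm(W_fused @ c["X"] - target) / np.linalg.norm(target)
    print(f"concept {i}: relative feature error after fusion = {err:.3f}")
```

Because the fusion step only needs each concept's weight delta and activations on its own data, adding another concept just adds one more term to each sum, which is what makes the fusion step scale to many concepts.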
CVPR 2023 Text Guided Video Editing Competition
Humans watch more than a billion hours of video per day. Most of this video
was edited manually, which is a tedious process. However, AI-enabled
video generation and video editing are on the rise. Building on text-to-image
models like Stable Diffusion and Imagen, generative AI has improved
dramatically on video tasks. But it's hard to evaluate progress in these video
tasks because there is no standard benchmark. So, we propose a new dataset for
text-guided video editing (TGVE), and we run a competition at CVPR to evaluate
models on our TGVE dataset. In this paper we present a retrospective on the
competition and describe the winning method. The competition dataset is
available at https://sites.google.com/view/loveucvpr23/track4.
Symbolic Replay: Scene Graph as Prompt for Continual Learning on VQA Task
VQA is an ambitious task aiming to answer any image-related question. However, in reality, it is hard to build such a system once and for all, since the needs of users are continuously updated and the system has to implement new functions. Thus, Continual Learning (CL) ability is a must in developing advanced VQA systems. Recently, a pioneering work split a VQA dataset into disjoint answer sets to study this topic. However, CL on VQA involves not only the expansion of label sets (new Answer sets). It is crucial to study how to answer questions when deploying VQA systems to new environments (new Visual scenes) and how to answer questions requiring new functions (new Question types). Thus, we propose CLOVE, a benchmark for Continual Learning On Visual quEstion answering, which contains scene- and function-incremental settings for the two aforementioned CL scenarios. In terms of methodology, the main difference between CL on VQA and classification is that the former additionally involves expanding and preventing forgetting of reasoning mechanisms, while the latter focuses on class representation. Thus, we propose a real-data-free replay-based method tailored for CL on VQA, named Scene Graph as Prompt for Symbolic Replay. Using a piece of the scene graph as a prompt, it replays pseudo scene graphs to represent past images, along with correlated QA pairs. A unified VQA model is also proposed to utilize the current and replayed data to enhance its QA ability. Finally, experimental results reveal challenges in CLOVE and demonstrate the effectiveness of our method. Code and data
are available at https://github.com/showlab/CLVQA
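The replay mechanism described in this abstract, serializing scene-graph triples into a text prompt that stands in for a past image and mixing generated pseudo QA pairs into current-task training, can be illustrated as follows. This is a hedged sketch: the data layout, prompt format, replay ratio, and the `symbolic_generator` object with `expand`/`make_qa` methods are assumptions for exposition, not the interface of the released code.

```python
# Hedged sketch of "scene graph as prompt" symbolic replay (not the released code).
# The prompt format, replay ratio, and the symbolic_generator API are assumptions.
from dataclasses import dataclass
import random


@dataclass
class PseudoSample:
    scene_graph: list[tuple[str, str, str]]  # (subject, relation, object) triples
    question: str
    answer: str


def graph_to_prompt(triples):
    """Serialize scene-graph triples into a flat text prompt standing in for the image."""
    return " ; ".join(f"{s} {r} {o}" for s, r, o in triples)


def generate_pseudo_sample(symbolic_generator, seed_triple):
    """Expand a seed triple into a pseudo scene graph plus a correlated QA pair.

    `symbolic_generator` is a hypothetical object with `expand` and `make_qa`
    methods, standing in for whatever model produces the replayed symbolic data.
    """
    graph = symbolic_generator.expand(seed_triple)
    question, answer = symbolic_generator.make_qa(graph)
    return PseudoSample(graph, question, answer)


def build_training_batch(current_batch, replay_buffer, replay_ratio=0.25):
    """Mix current-task samples with replayed pseudo samples so a single unified
    VQA model keeps seeing (symbolically reconstructed) old data."""
    k = max(1, int(len(current_batch) * replay_ratio))
    replayed = random.sample(replay_buffer, min(k, len(replay_buffer)))
    return current_batch + [
        (graph_to_prompt(p.scene_graph), p.question, p.answer) for p in replayed
    ]
```

The point of the sketch is that no real past images are stored: only compact symbolic graphs and QA pairs are replayed, which is what makes the method "real-data-free".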