38 research outputs found

ImageBrush: Learning Visual In-Context Instructions for Exemplar-Based Image Manipulation

Author: Hu Han
Koike Hideki
Peng Houwen
Qiu Lili
Shen Yifei
Sun Yasheng
Yang Yifan
Yang Yuqing
Publication date: 01/08/2023

While language-guided image manipulation has made remarkable progress, the challenge of how to instruct the manipulation process faithfully reflecting human intentions persists. An accurate and comprehensive description of a manipulation task using natural language is laborious and sometimes even impossible, primarily due to the inherent uncertainty and ambiguity present in linguistic expressions. Is it feasible to accomplish image manipulation without resorting to external cross-modal language information? If this possibility exists, the inherent modality gap would be effortlessly eliminated. In this paper, we propose a novel manipulation methodology, dubbed ImageBrush, that learns visual instructions for more accurate image editing. Our key idea is to employ a pair of transformation images as visual instructions, which not only precisely captures human intention but also facilitates accessibility in real-world scenarios. Capturing visual instructions is particularly challenging because it involves extracting the underlying intentions solely from visual demonstrations and then applying this operation to a new image. To address this challenge, we formulate visual instruction learning as a diffusion-based inpainting problem, where the contextual information is fully exploited through an iterative process of generation. A visual prompting encoder is carefully devised to enhance the model's capacity in uncovering human intent behind the visual instructions. Extensive experiments show that our method generates engaging manipulation results conforming to the transformations entailed in demonstrations. Moreover, our model exhibits robust generalization capabilities on various downstream tasks such as pose transfer, image translation and video inpainting.

arXiv.org e-Print Archive

Sub-Character Tokenization for Chinese Pretrained Language Models

Author: Chen Yingfa
Liu Qun
Liu Zhiyuan
Qi Fanchao
Si Chenglei
Sun Maosong
Wang Xiaozhi
Wang Yasheng
Zhang Zhengyan
Publication date: 22/12/2021

Tokenization is fundamental to pretrained language models (PLMs). Existing tokenization methods for Chinese PLMs typically treat each character as an indivisible token. However, they ignore the unique feature of the Chinese writing system where additional linguistic information exists below the character level, i.e., at the sub-character level. To utilize such information, we propose sub-character (SubChar for short) tokenization. Specifically, we first encode the input text by converting each Chinese character into a short sequence based on its glyph or pronunciation, and then construct the vocabulary based on the encoded text with sub-word tokenization. Experimental results show that SubChar tokenizers have two main advantages over existing tokenizers: 1) They can tokenize inputs into much shorter sequences, thus improving the computational efficiency. 2) Pronunciation-based SubChar tokenizers can encode Chinese homophones into the same transliteration sequences and produce the same tokenization output, hence being robust to all homophone typos. At the same time, models trained with SubChar tokenizers perform competitively on downstream tasks. We release our code at https://github.com/thunlp/SubCharTokenization to facilitate future work. Comment: This draft supersedes the previous version named "SHUOWEN-JIEZI: Linguistically Informed Tokenizers For Chinese Language Model Pretraining".
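
The encoding step described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the tiny pronunciation table `TOY_PINYIN`, the separator `SEP`, and the function `encode_pronunciation` are hypothetical names chosen for illustration, and the sub-word vocabulary construction that would follow is omitted. The point is only that homophones collapse to the same encoded string, so any tokenizer run on the encoded text sees identical input.

```python
# Toy pronunciation table (hypothetical; a real system would cover the full
# character set, and the paper's exact encoding format may differ).
TOY_PINYIN = {
    "他": "ta1",
    "她": "ta1",
    "它": "ta1",
    "在": "zai4",
    "再": "zai4",
    "家": "jia1",
}

SEP = "#"  # assumed marker for character boundaries


def encode_pronunciation(text: str) -> str:
    """Replace each known character with its transliteration plus a separator.

    Characters missing from the table are passed through unchanged.
    """
    return "".join(
        TOY_PINYIN[ch] + SEP if ch in TOY_PINYIN else ch for ch in text
    )


# A sentence and a homophone-typo variant encode to the same sequence,
# so a sub-word tokenizer built on the encoded text cannot tell them apart.
correct = encode_pronunciation("他在家")
typo = encode_pronunciation("她再家")
print(correct)          # ta1#zai4#jia1#
print(correct == typo)  # True
```

In a fuller pipeline, the encoded corpus would then be passed to a standard sub-word vocabulary learner, which is the second step the abstract mentions.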

arXiv.org e-Print Archive

Directory of Open Access Journals

Red Alarm for Pre-trained Models: Universal Vulnerability to Neuron-Level Backdoor Attacks

Author: Jiang Xin
Li Yongwei
Liu Zhiyuan
Lv Tian
Qi Fanchao
Sun Maosong
Wang Yasheng
Xiao Guangxuan
Zhang Zhengyan
Publication date: 13/06/2021

Pre-trained models (PTMs) have been widely used in various downstream tasks. The parameters of PTMs are distributed on the Internet and may suffer backdoor attacks. In this work, we demonstrate the universal vulnerability of PTMs, where fine-tuned PTMs can be easily controlled by backdoor attacks in arbitrary downstream tasks. Specifically, attackers can add a simple pre-training task, which restricts the output representations of trigger instances to pre-defined vectors, namely neuron-level backdoor attack (NeuBA). If the backdoor functionality is not eliminated during fine-tuning, the triggers can make the fine-tuned model predict fixed labels by pre-defined vectors. In the experiments of both natural language processing (NLP) and computer vision (CV), we show that NeuBA absolutely controls the predictions for trigger instances without any knowledge of downstream tasks. Finally, we apply several defense methods to NeuBA and find that model pruning is a promising direction to resist NeuBA by excluding backdoored neurons. Our findings sound a red alarm for the wide use of PTMs. Our source code and models are available at https://github.com/thunlp/NeuBA.

arXiv.org e-Print Archive

Masked Lip-Sync Prediction by Audio-Visual Contextual Exploitation in Transformers

Author: Ding Errui
Hong Zhibin
Koike Hideki
Liu Jingtuo
Liu Ziwei
Sun Yasheng
Wang Jingdong
Wang Kaisiyuan
Wu Qianyi
Zhou Hang
Publication date: 09/12/2022

Previous studies have explored generating accurately lip-synced talking faces for arbitrary targets given audio conditions. However, most of them deform or generate the whole facial area, leading to non-realistic results. In this work, we delve into the formulation of altering only the mouth shapes of the target person. This requires masking a large percentage of the original image and seamlessly inpainting it with the aid of audio and reference frames. To this end, we propose the Audio-Visual Context-Aware Transformer (AV-CAT) framework, which produces accurate lip-sync with photo-realistic quality by predicting the masked mouth shapes. Our key insight is to exploit desired contextual information provided in audio and visual modalities thoroughly with delicately designed Transformers. Specifically, we propose a convolution-Transformer hybrid backbone and design an attention-based fusion strategy for filling the masked parts. It uniformly attends to the textural information on the unmasked regions and the reference frame. Then the semantic audio information is involved in enhancing the self-attention computation. Additionally, a refinement network with audio injection improves both image and lip-sync quality. Extensive experiments validate that our model can generate high-fidelity lip-synced results for arbitrary subjects. Comment: Accepted to SIGGRAPH Asia 2022 (Conference Proceedings). Project page: https://hangz-nju-cuhk.github.io/projects/AV-CA

arXiv.org e-Print Archive

Make Your Brief Stroke Real and Stereoscopic: 3D-Aware Simplified Sketch to Portrait Generation

Author: Ding Errui
He Dongliang
Hu Tianshu
Koike Hideki
Liao Chen-Chieh
Liu Jingtuo
Liu Ziwei
Miyafuji Shio
Sun Yasheng
Wang Jingdong
Wang Kaisiyuan
Wu Qianyi
Zhou Hang
Publication date: 14/02/2023

Creating photo-realistic versions of people's sketched portraits is useful for various entertainment purposes. Existing studies only generate portraits in the 2D plane with fixed views, making the results less vivid. In this paper, we present Stereoscopic Simplified Sketch-to-Portrait (SSSP), which explores the possibility of creating stereoscopic 3D-aware portraits from simple contour sketches by involving 3D generative models. Our key insight is to design sketch-aware constraints that can fully exploit the prior knowledge of a tri-plane-based 3D-aware generative model. Specifically, our designed region-aware volume rendering strategy and global consistency constraint further enhance detail correspondences during sketch encoding. Moreover, in order to facilitate use by lay users, we propose a Contour-to-Sketch module with vector quantized representations, so that easily drawn contours can directly guide the generation of 3D portraits. Extensive comparisons show that our method generates high-quality results that match the sketch. Our usability study verifies that our system is greatly preferred by users. Comment: Project Page on https://hangz-nju-cuhk.github.io

arXiv.org e-Print Archive

Cyclin D-CDK4 kinase destabilizes PD-L1 via Cul3SPOP to control cancer immune surveillance

Author: Bu Xia
Ci Yanpeng
Dai Xiangpeng
Fan Caoqi
Freeman Gordon J.
Geng Yan
Guo Jianping
Huang Yu-Han
Nihira Naoe Taira
Ren Shancheng
Sicinski Piotr
Sun Yinghao
Tan Yuyong
Wang Haizhen
Wei Wenyi
Wu Fei
Zhang Jinfang
Zhu Yasheng
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 27/06/2018

Harvard University - DASH

Speech2Talking-Face: Inferring and Driving a Face with Synchronized Audio-Visual Representation

Author: Koike Hideki
Sun Yasheng
小池英樹
Publication venue: 'International Joint Conferences on Artificial Intelligence'
Publication date: 27/06/2023

Institutional Repositories DataBase (IRDB)

Masked Lip-Sync Prediction by Audio-Visual Contextual Exploitation in Transformers

Author: Koike Hideki
Sun Yasheng
小池英樹
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 27/06/2023

Institutional Repositories DataBase (IRDB)

Delay-Sensitive Service Provisioning in Software-Defined Low-Earth-Orbit Satellite Networks

Author: Chenhua Sun
Feihu Dong
Guiyu Liu
Hongzhi Yu
Yasheng Zhang
Publication venue: MDPI AG
Publication date: 01/08/2023

With the advancement of space technology and satellite communications, low-Earth-orbit (LEO) satellite networks have experienced rapid development in the past decade. In the vision of 6G, LEO satellite networks play an important role in future 6G networks. On the other hand, a variety of applications, including many delay-sensitive applications, are continuously emerging. Due to the highly dynamic nature of LEO satellite networks, supporting time-deterministic services in such networks is challenging. However, we can provide latency guarantees for most delay-sensitive applications through data plane traffic shaping and control plane routing optimization. This paper addresses the routing optimization problem for time-sensitive (TS) flows in software-defined low-Earth-orbit (LEO) satellite networks. We model the problem as an integer linear programming (ILP) model aiming to minimize path handovers and maximum link utilization while meeting TS flow latency constraints. Since this problem is NP-hard, we design an efficient longest continuous path (LCP) approximation algorithm. LCP selects the longest valid path in each topology snapshot that satisfies delay constraints. An auxiliary graph then determines the routing sequence with minimized handovers. We implement an LEO satellite network testbed with Open vSwitch (OVS) and an Open Network Operating System (ONOS) controller to evaluate LCP. The results show that LCP reduces the number of path handovers by up to 31.7% and keeps the maximum link utilization lowest for more than 75% of the time compared to benchmark algorithms. In summary, LCP achieves excellent path handover optimization and load balancing performance under TS flow latency constraints.
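
The handover-minimization idea in this abstract can be sketched in simplified form. The code below is a greedy stand-in, not the paper's LCP algorithm or its auxiliary-graph construction, and it ignores link utilization entirely. It assumes the delay-feasible paths for each topology snapshot have already been computed; the names `greedy_min_handover`, `count_handovers`, and the path labels are hypothetical.

```python
from typing import Hashable, List, Sequence, Set


def greedy_min_handover(valid: Sequence[Set[Hashable]]) -> List[Hashable]:
    """Pick one path per snapshot, staying on a path as long as possible.

    valid[t] is the set of path ids that meet the delay bound in snapshot t
    (assumed to be precomputed). At each switch point the path that remains
    feasible for the most consecutive snapshots is chosen.
    """
    schedule: List[Hashable] = []
    t = 0
    while t < len(valid):
        if not valid[t]:
            raise ValueError(f"no feasible path in snapshot {t}")
        best_path, best_run = None, 0
        # Sort candidates so ties are broken deterministically.
        for path in sorted(valid[t], key=repr):
            run = 0
            while t + run < len(valid) and path in valid[t + run]:
                run += 1
            if run > best_run:
                best_path, best_run = path, run
        schedule.extend([best_path] * best_run)
        t += best_run
    return schedule


def count_handovers(schedule: Sequence[Hashable]) -> int:
    """Number of snapshot boundaries where the selected path changes."""
    return sum(1 for a, b in zip(schedule, schedule[1:]) if a != b)


# Five snapshots; path "B" stays feasible longest from the start.
snapshots = [{"A", "B"}, {"A", "B"}, {"B", "C"}, {"C"}, {"C"}]
schedule = greedy_min_handover(snapshots)
print(schedule)                   # ['B', 'B', 'B', 'C', 'C']
print(count_handovers(schedule))  # 1
```

Choosing "A" first would force a second handover at snapshot 2, which is why the sketch prefers the path with the longest continuous feasibility.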

Directory of Open Access Journals