Likelihood-Based Diffusion Language Models
Despite a growing interest in diffusion-based language models, existing work
has not shown that these models can attain nontrivial likelihoods on standard
language modeling benchmarks. In this work, we take the first steps towards
closing the likelihood gap between autoregressive and diffusion-based language
models, with the goal of building and releasing a diffusion model which
outperforms a small but widely-known autoregressive model. We pursue this goal
through algorithmic improvements, scaling laws, and increased compute. On the
algorithmic front, we introduce several methodological improvements for the
maximum-likelihood training of diffusion language models. We then study scaling
laws for our diffusion models and find compute-optimal training regimes which
differ substantially from autoregressive models. Using our methods and scaling
analysis, we train and release Plaid 1B, a large diffusion language model which
outperforms GPT-2 124M in likelihood on benchmark datasets and generates fluent
samples in unconditional and zero-shot control settings.
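The maximum-likelihood training the abstract refers to builds on the standard continuous-diffusion objective: noise clean embeddings with a forward Gaussian process and train a network to undo the noise. The toy sketch below shows one Monte-Carlo term of that denoising loss; the shapes, schedule, and zero-predictor stand-in are our own illustration, not Plaid's actual objective.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy setup: a "sentence" as a sequence of word embeddings.
seq_len, dim, T = 8, 16, 100
x0 = rng.normal(size=(seq_len, dim))      # clean embeddings
betas = np.linspace(1e-4, 0.02, T)        # linear noise schedule (illustrative)
alphas_bar = np.cumprod(1.0 - betas)

def q_sample(x0, t, noise):
    """Forward process: x_t = sqrt(abar_t) * x0 + sqrt(1 - abar_t) * eps."""
    return np.sqrt(alphas_bar[t]) * x0 + np.sqrt(1.0 - alphas_bar[t]) * noise

# One Monte-Carlo term of the denoising objective. The "model" here is a
# placeholder that predicts zero noise; a real model would be trained to
# predict eps from (x_t, t).
t = rng.integers(T)
eps = rng.normal(size=x0.shape)
x_t = q_sample(x0, t, eps)
eps_hat = np.zeros_like(eps)              # stand-in predictor, no learning
loss = np.mean((eps - eps_hat) ** 2)
print(x_t.shape, round(float(loss), 3))
```

Summing such terms over timesteps (with appropriate weights) yields a variational bound on log-likelihood, which is what lets diffusion models be compared against autoregressive models on likelihood benchmarks.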
SimSwap: An Efficient Framework For High Fidelity Face Swapping
We propose an efficient framework, called Simple Swap (SimSwap), aiming for
generalized and high fidelity face swapping. In contrast to previous approaches
that either lack the ability to generalize to arbitrary identity or fail to
preserve attributes like facial expression and gaze direction, our framework is
capable of transferring the identity of an arbitrary source face into an
arbitrary target face while preserving the attributes of the target face. We
overcome the above defects in the following two ways. First, we present the ID
Injection Module (IIM) which transfers the identity information of the source
face into the target face at feature level. By using this module, we extend the
architecture of an identity-specific face swapping algorithm to a framework for
arbitrary face swapping. Second, we propose the Weak Feature Matching Loss
which efficiently helps our framework to preserve the facial attributes in an
implicit way. Extensive experiments on wild faces demonstrate that our SimSwap
is able to achieve competitive identity performance while preserving attributes
better than previous state-of-the-art methods. The code is already available on
github: https://github.com/neuralchen/SimSwap.
Comment: Accepted by ACMMM 202
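The Weak Feature Matching Loss can be pictured as an L1 match on discriminator feature maps that deliberately skips the early layers: shallow features are left unconstrained so identity can change, while deep features keep the target's attributes. The function below is our sketch of that idea (names and shapes are ours, not the paper's code).

```python
import numpy as np

def weak_feature_matching_loss(feats_fake, feats_real, start_layer):
    """L1 match on discriminator features, but only from `start_layer`
    onward -- the "weak" variant leaves early layers unconstrained so the
    generator is free to alter identity while deep features implicitly
    preserve attributes such as expression and gaze."""
    total = 0.0
    for f_fake, f_real in zip(feats_fake[start_layer:], feats_real[start_layer:]):
        total += np.mean(np.abs(f_fake - f_real))
    return total / max(1, len(feats_fake) - start_layer)

# Illustrative feature pyramids: 5 layers, fake = real shifted by 0.1.
rng = np.random.default_rng(1)
feats_real = [rng.normal(size=(4, 4)) for _ in range(5)]
feats_fake = [f + 0.1 for f in feats_real]
print(round(weak_feature_matching_loss(feats_fake, feats_real, start_layer=3), 3))
```

With a constant 0.1 offset on the last two layers, the averaged per-layer L1 distance comes out to 0.1, confirming that only layers from `start_layer` onward contribute.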
Attacking Recommender Systems with Augmented User Profiles
Recommendation Systems (RS) have become an essential part of many online
services. Given their pivotal role in guiding customers toward purchases,
there is a natural motivation for unscrupulous parties to spoof RS for profit.
In this paper, we study the shilling attack: a long-standing and profitable attack
where an adversarial party injects a number of user profiles to promote or
demote a target item. Conventional shilling attack models are based on simple
heuristics that can be easily detected, or directly adopt adversarial attack
methods without any design specialized for RS. Moreover, the literature lacks
studies of shilling-attack impact on deep learning based RS, leaving the
effectiveness of such attacks against real RS in doubt. We present a novel
Augmented Shilling Attack framework (AUSH) and implement it with the idea of a
Generative Adversarial Network (GAN). AUSH is capable of tailoring attacks against RS
according to budget and complex attack goals, such as targeting a specific user
group. We experimentally show that the attack impact of AUSH is noticeable on a
wide range of RS including both classic and modern deep learning based RS,
while it is virtually undetectable by the state-of-the-art attack detection
model.
Comment: CIKM 2020. 10 pages, 2 figures.
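The "conventional heuristics" the abstract contrasts against look roughly like the sampler below: each fake profile gives the target item the maximum rating and rates a handful of random filler items with plausible values. AUSH's contribution is to replace such a hand-crafted sampler with a GAN generator trained to mimic real profiles; the function names and rating scale here are our own illustration.

```python
import numpy as np

rng = np.random.default_rng(2)

def make_shill_profile(n_items, target_item, n_filler, rating_mean, rating_std):
    """One fake user profile for a push attack on a 1-5 rating scale:
    maximum rating on the target item plus plausible ratings on random
    filler items. A simple heuristic baseline, not AUSH itself."""
    profile = np.zeros(n_items)  # 0 = unrated
    candidates = [i for i in range(n_items) if i != target_item]
    fillers = rng.choice(candidates, size=n_filler, replace=False)
    profile[fillers] = np.clip(
        rng.normal(rating_mean, rating_std, n_filler), 1.0, 5.0)
    profile[target_item] = 5.0   # promote the target
    return profile

p = make_shill_profile(n_items=50, target_item=7, n_filler=10,
                       rating_mean=3.6, rating_std=1.1)
print(p[7], int((p > 0).sum()))  # prints 5.0 11
```

Because every such profile shares the same rigid structure (one maxed item, i.i.d. fillers), detectors pick them up easily, which motivates learning the filler distribution adversarially as AUSH does.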
Drum Synthesis and Rhythmic Transformation with Adversarial Autoencoders
Creative rhythmic transformation of musical audio refers to automated methods for manipulating temporally relevant sounds in time. This paper presents a method for joint synthesis and rhythmic transformation of drum sounds using adversarial autoencoders (AAEs). Users may navigate both the timbre and rhythm of drum patterns in audio recordings through expressive control over a low-dimensional latent space. The model is based on an AAE with Gaussian-mixture latent distributions that introduce rhythmic-pattern conditioning to represent a wide variety of drum performances. The AAE is trained on a dataset of bar-length segments of percussion recordings, along with their clustered rhythmic-pattern labels. The decoder is conditioned during adversarial training to mix data-driven rhythmic and timbral properties. The system is trained on over 500,000 bars from 5,418 tracks in popular datasets covering various musical genres. In an evaluation using real percussion recordings, the reconstruction accuracy and latent-space interpolation between drum performances are investigated for audio generation conditioned on target rhythmic patterns.
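The Gaussian-mixture latent prior at the heart of this design gives each rhythmic-pattern cluster its own mode in latent space; the adversarial critic then pushes encoder outputs toward that prior. The snippet below sketches only the class-conditioned prior sampling (dimensions, means, and variances are illustrative, not the paper's).

```python
import numpy as np

rng = np.random.default_rng(3)

# Class-conditioned Gaussian-mixture prior: each rhythmic-pattern cluster k
# gets its own mode in latent space (toy dimensions).
n_clusters, latent_dim = 4, 2
means = rng.normal(scale=3.0, size=(n_clusters, latent_dim))

def sample_prior(label, n):
    """Draw n latents from the mixture component of rhythm cluster `label`."""
    return means[label] + rng.normal(scale=0.5, size=(n, latent_dim))

z = sample_prior(label=2, n=256)
print(z.shape)
# In the full AAE, a critic would be trained to tell encoder outputs from
# these prior samples, and the decoder would consume (z, label) so that z
# carries timbre while the label carries the rhythmic pattern.
```

Interpolating z within one mode morphs timbre at a fixed rhythm, while switching the label moves between rhythmic patterns, which is the navigation behavior the abstract describes.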
Diffusion-LM Improves Controllable Text Generation
Controlling the behavior of language models (LMs) without re-training is a
major open problem in natural language generation. While recent works have
demonstrated success in controlling simple sentence attributes (e.g.,
sentiment), there has been little progress on complex, fine-grained controls
(e.g., syntactic structure). To address this challenge, we develop a new
non-autoregressive language model based on continuous diffusions that we call
Diffusion-LM. Building upon the recent successes of diffusion models in
continuous domains, Diffusion-LM iteratively denoises a sequence of Gaussian
vectors into word vectors, yielding a sequence of intermediate latent
variables. The continuous, hierarchical nature of these intermediate variables
enables a simple gradient-based algorithm to perform complex, controllable
generation tasks. We demonstrate successful control of Diffusion-LM for six
challenging fine-grained control tasks, significantly outperforming prior work.
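The reason continuous latents enable "a simple gradient-based algorithm" is that a control objective can be minimized by gradient descent on the latent vectors themselves, something impossible with discrete tokens. The toy below illustrates just that mechanism, with a stand-in quadratic loss in place of a trained classifier's log-probability.

```python
import numpy as np

rng = np.random.default_rng(4)
seq_len, dim = 6, 8
x_t = rng.normal(size=(seq_len, dim))   # continuous latents at one denoising step
target = rng.normal(size=dim)           # hypothetical control direction

def control_loss(x):
    """Stand-in control objective: distance of each position's vector to a
    target direction. Diffusion-LM would use a classifier's log-prob here."""
    return float(np.mean((x - target) ** 2))

before = control_loss(x_t)
for _ in range(50):
    x_t = x_t - 0.1 * (x_t - target)    # gradient step on the latents
after = control_loss(x_t)
print(before > after)                    # prints True: control loss decreases
```

In the real system this control gradient is interleaved with the denoising updates at every diffusion step, so the final word vectors satisfy both fluency (from the diffusion model) and the control (from the classifier).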