Likelihood-Based Diffusion Language Models
Despite a growing interest in diffusion-based language models, existing work
has not shown that these models can attain nontrivial likelihoods on standard
language modeling benchmarks. In this work, we take the first steps towards
closing the likelihood gap between autoregressive and diffusion-based language
models, with the goal of building and releasing a diffusion model which
outperforms a small but widely-known autoregressive model. We pursue this goal
through algorithmic improvements, scaling laws, and increased compute. On the
algorithmic front, we introduce several methodological improvements for the
maximum-likelihood training of diffusion language models. We then study scaling
laws for our diffusion models and find compute-optimal training regimes which
differ substantially from autoregressive models. Using our methods and scaling
analysis, we train and release Plaid 1B, a large diffusion language model which
outperforms GPT-2 124M in likelihood on benchmark datasets and generates fluent
samples in unconditional and zero-shot control settings.
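The maximum-likelihood training the abstract refers to builds on the standard continuous-diffusion objective: noise clean embeddings with a forward Gaussian process and train a network to undo the noise. The toy sketch below shows one Monte-Carlo term of that denoising loss; the shapes, schedule, and zero-predictor stand-in are our own illustration, not Plaid's actual objective.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy setup: a "sentence" as a sequence of word embeddings.
seq_len, dim, T = 8, 16, 100
x0 = rng.normal(size=(seq_len, dim))      # clean embeddings
betas = np.linspace(1e-4, 0.02, T)        # linear noise schedule (illustrative)
alphas_bar = np.cumprod(1.0 - betas)

def q_sample(x0, t, noise):
    """Forward process: x_t = sqrt(abar_t) * x0 + sqrt(1 - abar_t) * eps."""
    return np.sqrt(alphas_bar[t]) * x0 + np.sqrt(1.0 - alphas_bar[t]) * noise

# One Monte-Carlo term of the denoising objective. The "model" here is a
# placeholder that predicts zero noise; a real model would be trained to
# predict eps from (x_t, t).
t = rng.integers(T)
eps = rng.normal(size=x0.shape)
x_t = q_sample(x0, t, eps)
eps_hat = np.zeros_like(eps)              # stand-in predictor, no learning
loss = np.mean((eps - eps_hat) ** 2)
print(x_t.shape, round(float(loss), 3))
```

Summing such terms over timesteps (with appropriate weights) yields a variational bound on log-likelihood, which is what lets diffusion models be compared against autoregressive models on likelihood benchmarks.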
SimSwap: An Efficient Framework For High Fidelity Face Swapping
We propose an efficient framework, called Simple Swap (SimSwap), aiming for
generalized and high fidelity face swapping. In contrast to previous approaches
that either lack the ability to generalize to arbitrary identity or fail to
preserve attributes like facial expression and gaze direction, our framework is
capable of transferring the identity of an arbitrary source face into an
arbitrary target face while preserving the attributes of the target face. We
overcome the above defects in the following two ways. First, we present the ID
Injection Module (IIM) which transfers the identity information of the source
face into the target face at feature level. By using this module, we extend the
architecture of an identity-specific face swapping algorithm to a framework for
arbitrary face swapping. Second, we propose the Weak Feature Matching Loss
which efficiently helps our framework to preserve the facial attributes in an
implicit way. Extensive experiments on wild faces demonstrate that our SimSwap
is able to achieve competitive identity performance while preserving attributes
better than previous state-of-the-art methods. The code is already available on
github: https://github.com/neuralchen/SimSwap.
Comment: Accepted by ACMMM 202
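The Weak Feature Matching Loss can be pictured as an L1 match on discriminator feature maps that deliberately skips the early layers: shallow features are left unconstrained so identity can change, while deep features keep the target's attributes. The function below is our sketch of that idea (names and shapes are ours, not the paper's code).

```python
import numpy as np

def weak_feature_matching_loss(feats_fake, feats_real, start_layer):
    """L1 match on discriminator features, but only from `start_layer`
    onward -- the "weak" variant leaves early layers unconstrained so the
    generator is free to alter identity while deep features implicitly
    preserve attributes such as expression and gaze."""
    total = 0.0
    for f_fake, f_real in zip(feats_fake[start_layer:], feats_real[start_layer:]):
        total += np.mean(np.abs(f_fake - f_real))
    return total / max(1, len(feats_fake) - start_layer)

# Illustrative feature pyramids: 5 layers, fake = real shifted by 0.1.
rng = np.random.default_rng(1)
feats_real = [rng.normal(size=(4, 4)) for _ in range(5)]
feats_fake = [f + 0.1 for f in feats_real]
print(round(weak_feature_matching_loss(feats_fake, feats_real, start_layer=3), 3))
```

With a constant 0.1 offset on the last two layers, the averaged per-layer L1 distance comes out to 0.1, confirming that only layers from `start_layer` onward contribute.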
Attacking Recommender Systems with Augmented User Profiles
Recommendation Systems (RS) have become an essential part of many online
services. Given their pivotal role in guiding customers toward purchases,
there is a natural motivation for unscrupulous parties to spoof RS for profit.
In this paper, we study the shilling attack: a long-standing and profitable attack
where an adversarial party injects a number of user profiles to promote or
demote a target item. Conventional shilling attack models are based on simple
heuristics that can be easily detected, or directly adopt adversarial attack
methods without any design specialized for RS. Moreover, the literature lacks
studies of shilling-attack impact on deep learning based RS, leaving the
effectiveness of such attacks against real RS in doubt. We present a novel
Augmented Shilling Attack framework (AUSH) and implement it with the idea of a
Generative Adversarial Network (GAN). AUSH is capable of tailoring attacks against RS
according to budget and complex attack goals, such as targeting a specific user
group. We experimentally show that the attack impact of AUSH is noticeable on a
wide range of RS including both classic and modern deep learning based RS,
while it is virtually undetectable by the state-of-the-art attack detection
model.
Comment: CIKM 2020. 10 pages, 2 figures.
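The "conventional heuristics" the abstract contrasts against look roughly like the sampler below: each fake profile gives the target item the maximum rating and rates a handful of random filler items with plausible values. AUSH's contribution is to replace such a hand-crafted sampler with a GAN generator trained to mimic real profiles; the function names and rating scale here are our own illustration.

```python
import numpy as np

rng = np.random.default_rng(2)

def make_shill_profile(n_items, target_item, n_filler, rating_mean, rating_std):
    """One fake user profile for a push attack on a 1-5 rating scale:
    maximum rating on the target item plus plausible ratings on random
    filler items. A simple heuristic baseline, not AUSH itself."""
    profile = np.zeros(n_items)  # 0 = unrated
    candidates = [i for i in range(n_items) if i != target_item]
    fillers = rng.choice(candidates, size=n_filler, replace=False)
    profile[fillers] = np.clip(
        rng.normal(rating_mean, rating_std, n_filler), 1.0, 5.0)
    profile[target_item] = 5.0   # promote the target
    return profile

p = make_shill_profile(n_items=50, target_item=7, n_filler=10,
                       rating_mean=3.6, rating_std=1.1)
print(p[7], int((p > 0).sum()))  # prints 5.0 11
```

Because every such profile shares the same rigid structure (one maxed item, i.i.d. fillers), detectors pick them up easily, which motivates learning the filler distribution adversarially as AUSH does.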
Drum Synthesis and Rhythmic Transformation with Adversarial Autoencoders
Creative rhythmic transformation of musical audio refers to automated methods for manipulating temporally relevant sounds in time. This paper presents a method for joint synthesis and rhythmic transformation of drum sounds using adversarial autoencoders (AAEs). Users may navigate both the timbre and rhythm of drum patterns in audio recordings through expressive control over a low-dimensional latent space. The model is based on an AAE with Gaussian-mixture latent distributions that introduce rhythmic-pattern conditioning to represent a wide variety of drum performances. The AAE is trained on a dataset of bar-length segments of percussion recordings, along with their clustered rhythmic-pattern labels. The decoder is conditioned during adversarial training to mix data-driven rhythmic and timbral properties. The system is trained on over 500,000 bars from 5,418 tracks in popular datasets covering various musical genres. In an evaluation using real percussion recordings, the reconstruction accuracy and latent-space interpolation between drum performances are investigated for audio generation conditioned on target rhythmic patterns.
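The Gaussian-mixture latent prior at the heart of this design gives each rhythmic-pattern cluster its own mode in latent space; the adversarial critic then pushes encoder outputs toward that prior. The snippet below sketches only the class-conditioned prior sampling (dimensions, means, and variances are illustrative, not the paper's).

```python
import numpy as np

rng = np.random.default_rng(3)

# Class-conditioned Gaussian-mixture prior: each rhythmic-pattern cluster k
# gets its own mode in latent space (toy dimensions).
n_clusters, latent_dim = 4, 2
means = rng.normal(scale=3.0, size=(n_clusters, latent_dim))

def sample_prior(label, n):
    """Draw n latents from the mixture component of rhythm cluster `label`."""
    return means[label] + rng.normal(scale=0.5, size=(n, latent_dim))

z = sample_prior(label=2, n=256)
print(z.shape)
# In the full AAE, a critic would be trained to tell encoder outputs from
# these prior samples, and the decoder would consume (z, label) so that z
# carries timbre while the label carries the rhythmic pattern.
```

Interpolating z within one mode morphs timbre at a fixed rhythm, while switching the label moves between rhythmic patterns, which is the navigation behavior the abstract describes.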
Diffusion-LM Improves Controllable Text Generation
Controlling the behavior of language models (LMs) without re-training is a
major open problem in natural language generation. While recent works have
demonstrated success in controlling simple sentence attributes (e.g.,
sentiment), there has been little progress on complex, fine-grained controls
(e.g., syntactic structure). To address this challenge, we develop a new
non-autoregressive language model based on continuous diffusions that we call
Diffusion-LM. Building upon the recent successes of diffusion models in
continuous domains, Diffusion-LM iteratively denoises a sequence of Gaussian
vectors into word vectors, yielding a sequence of intermediate latent
variables. The continuous, hierarchical nature of these intermediate variables
enables a simple gradient-based algorithm to perform complex, controllable
generation tasks. We demonstrate successful control of Diffusion-LM for six
challenging fine-grained control tasks, significantly outperforming prior work.
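The reason continuous latents enable "a simple gradient-based algorithm" is that a control objective can be minimized by gradient descent on the latent vectors themselves, something impossible with discrete tokens. The toy below illustrates just that mechanism, with a stand-in quadratic loss in place of a trained classifier's log-probability.

```python
import numpy as np

rng = np.random.default_rng(4)
seq_len, dim = 6, 8
x_t = rng.normal(size=(seq_len, dim))   # continuous latents at one denoising step
target = rng.normal(size=dim)           # hypothetical control direction

def control_loss(x):
    """Stand-in control objective: distance of each position's vector to a
    target direction. Diffusion-LM would use a classifier's log-prob here."""
    return float(np.mean((x - target) ** 2))

before = control_loss(x_t)
for _ in range(50):
    x_t = x_t - 0.1 * (x_t - target)    # gradient step on the latents
after = control_loss(x_t)
print(before > after)                    # prints True: control loss decreases
```

In the real system this control gradient is interleaved with the denoising updates at every diffusion step, so the final word vectors satisfy both fluency (from the diffusion model) and the control (from the classifier).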