One-Shot Unsupervised Cross Domain Translation
Given a single image x from domain A and a set of images from domain B, our
task is to generate the analog of x in B. We argue that this task could be a
key AI capability that underlies the ability of cognitive agents to act in the
world, and we present empirical evidence that existing unsupervised domain
translation methods fail on this task. Our method follows a two-step process.
First, a variational autoencoder for domain B is trained. Then, given the new
sample x, we create a variational autoencoder for domain A by adapting the
layers that are close to the image in order to directly fit x, adapting the
other layers only indirectly. Our experiments indicate that, trained on the
single sample x, the new method does as well as existing domain transfer
methods do when they enjoy a multitude of training samples from domain A. Our
code is publicly available at https://github.com/sagiebenaim/OneShotTranslation
Comment: Published at NIPS 2018
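A minimal Python sketch of the selective-adaptation step described above (not
the authors' released code, which lives at the linked repository; the module
names enc_low/enc_high/dec_low/dec_high and the VAE's forward signature are
assumptions made for illustration):

    import copy
    import torch
    import torch.nn as nn

    def adapt_to_one_sample(vae_b, x, steps=200, lr=1e-4):
        """Clone the trained domain-B VAE and fit only its image-adjacent
        layers to the single sample x, freezing the shared deeper layers."""
        vae_a = copy.deepcopy(vae_b)
        for p in vae_a.enc_high.parameters():   # layers far from the pixels
            p.requires_grad = False
        for p in vae_a.dec_high.parameters():
            p.requires_grad = False
        params = list(vae_a.enc_low.parameters()) + list(vae_a.dec_low.parameters())
        opt = torch.optim.Adam(params, lr=lr)
        for _ in range(steps):
            opt.zero_grad()
            recon = vae_a(x)  # assumes forward returns a reconstruction of x
            loss = nn.functional.mse_loss(recon, x)
            loss.backward()
            opt.step()
        return vae_a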
LNMap: Departures from Isomorphic Assumption in Bilingual Lexicon Induction Through Non-Linear Mapping in Latent Space
Most of the successful and predominant methods for bilingual lexicon
induction (BLI) are mapping-based, where a linear mapping function is learned
with the assumption that the word embedding spaces of different languages
exhibit similar geometric structures (i.e., approximately isomorphic). However,
several recent studies have criticized this simplified assumption showing that
it does not hold in general even for closely related languages. In this work,
we propose a novel semi-supervised method to learn cross-lingual word
embeddings for BLI. Our model is independent of the isomorphic assumption and
uses nonlinear mapping in the latent space of two independently trained
auto-encoders. Through extensive experiments on fifteen (15) different language
pairs (in both directions) comprising resource-rich and low-resource languages
from two different datasets, we demonstrate that our method outperforms
existing models by a good margin. Ablation studies show the importance of
different model components and the necessity of non-linear mapping.
Comment: 10 pages, 1 figure
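The general shape of the non-linear latent-space mapping can be sketched as
follows (a rough illustration assuming two pre-trained word-embedding
autoencoders and a seed dictionary of word pairs; this is not the authors'
exact architecture):

    import torch
    import torch.nn as nn

    class NonLinearMapper(nn.Module):
        """Maps source-language latent codes to target-language latent codes."""
        def __init__(self, dim=300, hidden=400):
            super().__init__()
            self.net = nn.Sequential(
                nn.Linear(dim, hidden), nn.ReLU(), nn.Linear(hidden, dim))

        def forward(self, z):
            return self.net(z)

    def mapping_loss(mapper, enc_src, enc_tgt, x_src, x_tgt):
        # x_src, x_tgt: embeddings of seed-dictionary word pairs.
        z_src, z_tgt = enc_src(x_src), enc_tgt(x_tgt)
        return nn.functional.mse_loss(mapper(z_src), z_tgt)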
Adversarial Text Generation via Feature-Mover's Distance
Generative adversarial networks (GANs) have achieved significant success in
generating real-valued data. However, the discrete nature of text hinders the
application of GAN to text-generation tasks. Instead of using the standard GAN
objective, we propose to improve text-generation GAN via a novel approach
inspired by optimal transport. Specifically, we consider matching the latent
feature distributions of real and synthetic sentences using a novel metric,
termed the feature-mover's distance (FMD). This formulation leads to a highly
discriminative critic and easy-to-optimize objective, overcoming the
mode-collapsing and brittle-training problems in existing methods. Extensive
experiments are conducted on a variety of tasks to evaluate the proposed model
empirically, including unconditional text generation, style transfer from
non-parallel text, and unsupervised cipher cracking. The proposed model yields
superior performance, demonstrating wide applicability and effectiveness.
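To make the feature-matching objective concrete, here is a generic
Sinkhorn-style estimate of an earth-mover-type cost between two batches of
latent sentence features; the paper's exact FMD solver may differ, so treat
this as an illustration of the optimal-transport idea rather than the
proposed algorithm:

    import torch

    def feature_mover_distance(f_real, f_fake, eps=0.1, iters=50):
        # f_real: (n, d) and f_fake: (m, d) latent feature matrices.
        cost = torch.cdist(f_real, f_fake)          # pairwise L2 costs
        K = torch.exp(-cost / eps)                  # entropic kernel
        u = torch.full((f_real.size(0),), 1.0 / f_real.size(0))
        v = torch.full((f_fake.size(0),), 1.0 / f_fake.size(0))
        a, b = u.clone(), v.clone()
        for _ in range(iters):                      # Sinkhorn iterations
            a = u / (K @ b)
            b = v / (K.t() @ a)
        transport = torch.diag(a) @ K @ torch.diag(b)
        return (transport * cost).sum()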
Learning Inverse Mapping by Autoencoder based Generative Adversarial Nets
The inverse mapping of a GAN's (Generative Adversarial Net's) generator has
great potential value. Hence, some works have attempted to construct the
inverse function of the generator, either by direct learning or by adversarial
learning. While the results are encouraging, the problem is highly challenging,
and the existing ways of training inverse models of GANs have many
disadvantages, such as being hard to train or performing poorly. For these
reasons, we propose a new approach that uses an inverse-generator model as the
encoder and a pre-trained generator as the decoder of an autoencoder network.
In the proposed model, the difference between the input and output of the
autoencoder, which are both generated images of the pre-trained GAN's
generator, is directly minimized. This optimization method overcomes the
difficulty of training an inverse model of a non-one-to-one function. We also
applied the inverse models of GANs' generators to image searching and
translation. The experimental results show that the proposed approach works
better than traditional approaches in image searching.
Comment: 10 pages, 5 figures
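The training objective described above reduces to a few lines: an inverse
model E serves as the encoder and the frozen pre-trained generator G as the
decoder, and the reconstruction error is measured directly between generated
images (a minimal sketch, assuming the optimizer covers only E's parameters):

    import torch
    import torch.nn as nn

    def train_step(E, G, z, opt):
        # z: a batch of latent codes; G is the frozen pre-trained generator.
        with torch.no_grad():
            x = G(z)             # generated image used as the autoencoder input
        recon = G(E(x))          # encode with E, decode through the frozen G
        loss = nn.functional.mse_loss(recon, x)
        opt.zero_grad()
        loss.backward()          # gradients flow back through G to update E
        opt.step()
        return loss.item()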
Unsupervised End-to-End Learning of Discrete Linguistic Units for Voice Conversion
We present an unsupervised end-to-end training scheme where we discover
discrete subword units from speech without using any labels. The discrete
subword units are learned under an ASR-TTS autoencoder reconstruction setting,
where an ASR-Encoder is trained to discover a set of common linguistic units
given a variety of speakers, and a TTS-Decoder trained to project the
discovered units back to the designated speech. We propose a discrete encoding
method, Multilabel-Binary Vectors (MBV), to make the ASR-TTS autoencoder
differentiable. We found that the proposed encoding method offers automatic
extraction of speech content from speaker style, and is sufficient to cover
full linguistic content in a given language. Therefore, the TTS-Decoder can
synthesize speech with the same content as the input of ASR-Encoder but with
different speaker characteristics, which achieves voice conversion (VC). We
further improve the quality of VC using adversarial training, where we train a
TTS-Patcher that augments the output of TTS-Decoder. Objective and subjective
evaluations show that the proposed approach offers strong VC results as it
eliminates speaker identity while preserving content within speech. In the
ZeroSpeech 2019 Challenge, we achieved outstanding performance in terms of low
bitrate.
Comment: Accepted by Interspeech 2019, Graz, Austria
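A discrete bottleneck of this kind is often made differentiable with a
straight-through estimator; the sketch below shows one plausible form of a
multilabel binary code in that spirit (the authors' exact MBV formulation may
differ):

    import torch

    def multilabel_binary(logits):
        probs = torch.sigmoid(logits)
        hard = (probs > 0.5).float()   # discrete 0/1 code vector
        # Straight-through: the forward pass uses the hard code,
        # the backward pass uses the sigmoid's gradients.
        return hard + probs - probs.detach()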
Sentiment Transfer using Seq2Seq Adversarial Autoencoders
Expressing oneself in language is subjective. Everyone has a different style
of reading and writing; apparently it all boils down to the way their mind
understands things (in a specific format). Language style transfer is a way to
preserve the meaning of a text while changing the way it is expressed. Progress
in language style transfer has lagged behind other domains, such as computer
vision, mainly because of the lack of parallel data, use cases, and reliable
evaluation metrics. In response to the challenge of lacking parallel data, we
explore learning style transfer from non-parallel data. We propose a model
combining seq2seq, autoencoders, and an adversarial loss to achieve this goal.
The key idea behind the proposed models is to learn separate content
representations and style representations using adversarial networks.
Considering the problem of evaluating style transfer tasks, we frame the
problem as sentiment transfer and evaluate using a sentiment classifier to
measure how many sentiments the model was able to transfer. We report our
results on several kinds of models.
Comment: Report built as part of a project for CSYE7245 at Northeastern
University under Prof. Nik Brown. arXiv admin note: text overlap with
arXiv:1711.06861, arXiv:1409.3215, arXiv:1705.07663 by other authors
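The adversarial disentanglement idea can be sketched schematically: the
content encoder is trained so that a style classifier cannot recover the style
from the content code (all module names here are hypothetical placeholders,
not the report's code):

    import torch
    import torch.nn as nn

    def adversarial_losses(content_enc, style_clf, x, style_labels):
        c = content_enc(x)                 # content representation
        logits = style_clf(c.detach())     # train the classifier on fixed codes
        clf_loss = nn.functional.cross_entropy(logits, style_labels)
        adv_logits = style_clf(c)          # the encoder tries to fool it
        enc_adv_loss = -nn.functional.cross_entropy(adv_logits, style_labels)
        return clf_loss, enc_adv_loss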
Eval all, trust a few, do wrong to none: Comparing sentence generation models
In this paper, we study recent neural generative models for text generation
related to variational autoencoders. Previous works have employed various
techniques to control the prior distribution of the latent codes in these
models, which is important for sampling performance, but little attention has
been paid to reconstruction error. In our study, we follow a rigorous
evaluation protocol using a large set of previously used and novel automatic
and human evaluation metrics, applied to both generated samples and
reconstructions. We hope that it will become the new evaluation standard when
comparing neural generative models for text.
Comment: 12 pages (3-page appendix); v2: added hyperparameter settings,
clarification
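One representative reconstruction metric of the kind such a protocol applies
is corpus BLEU between inputs and their reconstructions (a standard choice
shown here for concreteness, not necessarily one of the paper's exact
metrics):

    from nltk.translate.bleu_score import corpus_bleu

    def reconstruction_bleu(inputs, reconstructions):
        # inputs, reconstructions: lists of token lists.
        references = [[toks] for toks in inputs]  # one reference per sentence
        return corpus_bleu(references, reconstructions)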
Towards Unsupervised Speech-to-Text Translation
We present a framework for building speech-to-text translation (ST) systems
using only monolingual speech and text corpora, in other words, speech
utterances from a source language and independent text from a target language.
As opposed to traditional cascaded systems and end-to-end architectures, our
system does not require any labeled data (i.e., transcribed source audio or
parallel source and target text corpora) during training, making it especially
applicable to language pairs with very few or even zero bilingual resources.
The framework initializes the ST system with a cross-modal bilingual dictionary
inferred from the monolingual corpora, which maps every source speech segment
corresponding to a spoken word to its target text translation. For unseen
source speech utterances, the system first performs word-by-word translation on
each speech segment in the utterance. The translation is improved by leveraging
a language model and a sequence denoising autoencoder to provide prior
knowledge about the target language. Experimental results show that our
unsupervised system achieves comparable BLEU scores to supervised end-to-end
models despite the lack of supervision. We also provide an ablation analysis to
examine the utility of each component in our system.
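The word-by-word translation step plus language-model rescoring can be
illustrated as follows (dictionary.top_k and lm_score are placeholders for the
inferred cross-modal dictionary and the target-language LM; this is a toy
sketch, not the paper's system):

    import itertools

    def translate_utterance(speech_segments, dictionary, lm_score, k=3):
        # Each speech segment maps to its top-k candidate target words.
        candidates = [dictionary.top_k(seg, k) for seg in speech_segments]
        # Keep the candidate sequence the target LM finds most fluent.
        best = max(itertools.product(*candidates), key=lm_score)
        return list(best)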
Deconvolutional Paragraph Representation Learning
Learning latent representations from long text sequences is an important
first step in many natural language processing applications. Recurrent Neural
Networks (RNNs) have become a cornerstone for this challenging task. However,
the quality of sentences during RNN-based decoding (reconstruction) decreases
with the length of the text. We propose a sequence-to-sequence, purely
convolutional and deconvolutional autoencoding framework that is free of the
above issue, while also being computationally efficient. The proposed method is
simple, easy to implement and can be leveraged as a building block for many
applications. We show empirically that compared to RNNs, our framework is
better at reconstructing and correcting long paragraphs. Quantitative
evaluation on semi-supervised text classification and summarization tasks
demonstrates the potential for better utilization of long unlabeled text data.
Comment: Accepted by NIPS 2017
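The overall shape of such a purely convolutional/deconvolutional text
autoencoder might look like this (layer sizes and strides are illustrative,
not the paper's; the output length may need padding adjustments to match the
input):

    import torch.nn as nn

    class ConvDeconvAE(nn.Module):
        def __init__(self, vocab=10000, emb=300, hid=500):
            super().__init__()
            self.embed = nn.Embedding(vocab, emb)
            self.encoder = nn.Sequential(       # strided convs compress the text
                nn.Conv1d(emb, hid, kernel_size=5, stride=2), nn.ReLU(),
                nn.Conv1d(hid, hid, kernel_size=5, stride=2), nn.ReLU())
            self.decoder = nn.Sequential(       # deconvs expand it back
                nn.ConvTranspose1d(hid, hid, kernel_size=5, stride=2), nn.ReLU(),
                nn.ConvTranspose1d(hid, emb, kernel_size=5, stride=2))

        def forward(self, tokens):
            h = self.embed(tokens).transpose(1, 2)  # (batch, emb, length)
            return self.decoder(self.encoder(h))    # reconstructed embeddings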
A Tutorial on Deep Latent Variable Models of Natural Language
There has been much recent, exciting work on combining the complementary
strengths of latent variable models and deep learning. Latent variable modeling
makes it easy to explicitly specify model constraints through conditional
independence properties, while deep learning makes it possible to parameterize
these conditional likelihoods with powerful function approximators. While these
"deep latent variable" models provide a rich, flexible framework for modeling
many real-world phenomena, difficulties exist: deep parameterizations of
conditional likelihoods usually make posterior inference intractable, and
latent variable objectives often complicate backpropagation by introducing
points of non-differentiability. This tutorial explores these issues in depth
through the lens of variational inference.
Comment: EMNLP 2018 Tutorial
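For reference, the central quantity in that lens is the standard evidence
lower bound (ELBO), which variational inference maximizes:

    \log p_\theta(x) \;\ge\; \mathbb{E}_{q_\phi(z \mid x)}\bigl[\log p_\theta(x \mid z)\bigr] \;-\; \mathrm{KL}\bigl(q_\phi(z \mid x) \,\|\, p(z)\bigr)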