4,837 research outputs found
Adversarial Contrastive Estimation
Learning by contrasting positive and negative samples is a general strategy
adopted by many methods. Noise contrastive estimation (NCE) for word embeddings
and translating embeddings for knowledge graphs are examples in NLP employing
this approach. In this work, we view contrastive learning as an abstraction of
all such methods and augment the negative sampler into a mixture distribution
containing an adversarially learned sampler. The resulting adaptive sampler
finds harder negative examples, which forces the main model to learn a better
representation of the data. We evaluate our proposal on learning word
embeddings, order embeddings and knowledge graph embeddings and observe both
faster convergence and improved results on multiple metrics. Comment: Association for Computational Linguistics, 2018.
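To make the mixture-sampler idea concrete, here is a minimal PyTorch sketch. All names (model.score, the sampler's interface, the mixture weight lam) are illustrative assumptions rather than the paper's code, and the adversary gets a REINFORCE-style update because the negatives are discrete.

```python
# Hedged sketch of adversarial contrastive estimation for word embeddings.
import torch
import torch.nn.functional as F

def ace_losses(model, sampler, words, contexts, unigram_probs, k=5, lam=0.5):
    """Returns (embedding_loss, sampler_loss) for one batch."""
    B = len(words)
    pos_score = model.score(words, contexts)                         # (B,)
    # k negatives per positive from the fixed unigram noise distribution.
    noise_neg = torch.multinomial(unigram_probs, B * k, replacement=True).view(B, k)
    # k negatives per positive from the learned adversarial sampler.
    adv_logits = sampler(words)                                      # (B, V)
    adv_neg = torch.multinomial(adv_logits.softmax(-1), k)           # (B, k)
    # Mixture: each negative slot is adversarial with probability lam.
    gate = torch.rand(B, k, device=words.device) < lam
    negatives = torch.where(gate, adv_neg, noise_neg)
    neg_score = model.score(words.unsqueeze(1).expand(-1, k), negatives)
    # Main model: raise positive scores, lower sampled-negative scores.
    emb_loss = -F.logsigmoid(pos_score).mean() - F.logsigmoid(-neg_score).mean()
    # Adversary: reward proposals the current model scores highly (hard negatives).
    logp = adv_logits.log_softmax(-1).gather(1, adv_neg)             # (B, k)
    sampler_loss = -(logp * neg_score.detach()).mean()
    return emb_loss, sampler_loss
```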
Variational Inference using Implicit Distributions
Generative adversarial networks (GANs) have given us a great tool to fit
implicit generative models to data. Implicit distributions are ones we can
sample from easily and whose samples we can differentiate with respect to
model parameters. These models are highly expressive, and we argue they can prove just
as useful for variational inference (VI) as they are for generative modelling.
Several papers have proposed GAN-like algorithms for inference; however, their
connections to the theory of VI are not always well understood. This paper
first provides a unifying review of existing algorithms, establishing
connections between variational autoencoders, adversarially learned inference,
operator VI, GAN-based image reconstruction, and more. Second, the paper
provides a framework for building new algorithms: depending on the way the
variational bound is expressed, we introduce prior-contrastive and
joint-contrastive methods, and show practical inference algorithms based on
either density ratio estimation or denoising.
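As a concrete instance, a prior-contrastive method with density ratio estimation can be sketched as follows. The names (T, decoder.log_prob) are assumptions for illustration, and the ratio estimate is only exact at the discriminator's optimum.

```python
# Prior-contrastive sketch: a discriminator T(x, z) separates posterior
# samples z ~ q(z|x) from prior samples z ~ p(z); its logit estimates the
# density ratio log q(z|x) - log p(z) that the ELBO needs.
import torch
import torch.nn.functional as F

def ratio_discriminator_loss(T, x, z_q, z_p):
    # Classify z_q ~ q(z|x) as 1 and z_p ~ p(z) as 0.
    logit_q, logit_p = T(x, z_q), T(x, z_p)
    return (F.binary_cross_entropy_with_logits(logit_q, torch.ones_like(logit_q))
            + F.binary_cross_entropy_with_logits(logit_p, torch.zeros_like(logit_p)))

def elbo_estimate(decoder, T, x, z_q):
    # ELBO ~= E_q[log p(x|z)] - E_q[log q(z|x) - log p(z)], with the second
    # term replaced by the (detached or adversarially trained) logit of T.
    return (decoder.log_prob(x, z_q) - T(x, z_q)).mean()
```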
Exponential Family Estimation via Adversarial Dynamics Embedding
We present an efficient algorithm for maximum likelihood estimation (MLE) of
exponential family models, with a general parametrization of the energy
function that includes neural networks. We exploit the primal-dual view of the
MLE with a kinetics-augmented model to obtain an estimate associated with an
adversarial dual sampler. To represent this sampler, we introduce a novel
neural architecture, dynamics embedding, that generalizes Hamiltonian
Monte Carlo (HMC). The proposed approach inherits the flexibility of HMC while
enabling tractable entropy estimation for the augmented model. By learning both
a dual sampler and the primal model simultaneously, and sharing parameters
between them, we obviate the requirement to design a separate sampling
procedure once the model has been trained, leading to more effective learning.
We show that many existing estimators, such as contrastive divergence,
pseudo/composite-likelihood, score matching, minimum Stein discrepancy
estimator, non-local contrastive objectives, noise-contrastive estimation, and
minimum probability flow, are special cases of the proposed approach, each
expressed by a different (fixed) dual sampler. An empirical investigation shows
that adapting the sampler during MLE can significantly improve on
state-of-the-art estimators. Comment: Appearing in NeurIPS 2019, Vancouver, Canada; a preliminary version
published in the NeurIPS 2018 Bayesian Deep Learning Workshop.
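The primal-dual view the abstract alludes to can be written compactly. For $p_\theta(x) \propto \exp(f_\theta(x))$, the log-partition function has the Fenchel-dual form $\log Z(\theta) = \max_q \mathbb{E}_{x\sim q}[f_\theta(x)] + H(q)$, where $H$ is the entropy, so MLE becomes a saddle-point problem over the model and a dual sampler $q$ (the augmentation with kinetics variables is omitted here):

$$
\max_\theta \, \mathbb{E}_{x \sim p_{\mathrm{data}}}\!\left[f_\theta(x)\right] - \log Z(\theta)
\;=\; \max_\theta \, \min_q \; \mathbb{E}_{x \sim p_{\mathrm{data}}}\!\left[f_\theta(x)\right] - \mathbb{E}_{x \sim q}\!\left[f_\theta(x)\right] - H(q).
$$

Fixing $q$ to a particular sampler recovers the special cases listed above; learning $q$ adversarially, alongside $\theta$, gives the proposed estimator.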
Semi-supervised Learning with Contrastive Predictive Coding
Semi-supervised learning (SSL) provides a powerful framework for leveraging
unlabeled data when labels are limited or expensive to obtain. SSL algorithms
based on deep neural networks have recently proven successful on standard
benchmark tasks. However, many of them have thus far been either inflexible,
inefficient, or non-scalable. This paper explores the recently developed
contrastive predictive coding (CPC) technique to improve the discriminative
power of deep learning models when a large portion of the labels is absent.
Two models, cpc-SSL and a class-conditional variant (ccpc-SSL), are presented. They effectively exploit
the unlabeled data by extracting shared information between different parts of
the (high-dimensional) data. The proposed approaches are inductive, and scale
well to very large datasets like ImageNet, making them good candidates in
real-world, large-scale applications. Comment: 6 pages, 4 figures, conference.
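The core CPC ingredient is an InfoNCE-style loss; a minimal version is sketched below. This is the generic loss, not the exact cpc-SSL or ccpc-SSL objective, and the bilinear map W is an assumption.

```python
# Minimal InfoNCE sketch: a context vector c must identify its matching
# encoding z among the other items in the batch, which act as negatives.
import torch
import torch.nn.functional as F

def info_nce(c, z, W):
    """c: (B, d_c) context vectors; z: (B, d_z) encodings of the matching
    parts of the data; W: (d_c, d_z) bilinear scoring map."""
    logits = c @ W @ z.t()                        # (B, B) similarity scores
    labels = torch.arange(len(c), device=c.device)
    return F.cross_entropy(logits, labels)        # positives on the diagonal
```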
Adversarial Defense Framework for Graph Neural Network
Graph neural networks (GNNs), as powerful representation learning models on
graph data, attract much attention across various disciplines. However, recent
studies show that GNNs are vulnerable to adversarial attacks. How can we make
GNNs more robust? What are their key vulnerabilities? How can we address these
vulnerabilities and defend GNNs against adversarial attacks? In this paper,
we propose DefNet, an effective adversarial defense framework for GNNs. In
particular, we first investigate the latent vulnerabilities in every layer of
GNNs and propose corresponding strategies including dual-stage aggregation and
bottleneck perceptron. Then, to cope with the scarcity of training data, we
propose an adversarial contrastive learning method to train the GNN in a
conditional GAN manner by leveraging the high-level graph representation.
Extensive experiments on three public datasets demonstrate the effectiveness of
DefNet in improving the robustness of popular GNN variants, such as Graph
Convolutional Network and GraphSAGE, under various types of adversarial
attacks.
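The abstract does not spell out the contrastive objective, but a common pattern for contrastive learning with a high-level graph representation looks roughly like the following DGI-style sketch; DefNet's conditional-GAN formulation will differ in detail, and all names here are assumptions.

```python
# Hedged sketch: contrast (node, graph-summary) pairs from the real graph
# against pairs built from a corrupted graph.
import torch
import torch.nn.functional as F

def graph_contrastive_loss(encoder, discriminator, x, adj, x_corrupt):
    h = encoder(x, adj)                  # (N, d) node embeddings, real graph
    h_neg = encoder(x_corrupt, adj)      # (N, d) embeddings, corrupted graph
    s = torch.sigmoid(h.mean(dim=0))     # (d,) high-level graph representation
    pos = discriminator(h, s)            # (N,) scores for real pairs
    neg = discriminator(h_neg, s)        # (N,) scores for fake pairs
    return (F.binary_cross_entropy_with_logits(pos, torch.ones_like(pos))
            + F.binary_cross_entropy_with_logits(neg, torch.zeros_like(neg)))
```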
Learning Determinantal Point Processes by Corrective Negative Sampling
Determinantal Point Processes (DPPs) have attracted significant interest from
the machine-learning community due to their ability to elegantly and tractably
model the delicate balance between quality and diversity of sets. DPPs are
commonly learned from data using maximum likelihood estimation (MLE). While
fitting observed sets well, MLE for DPPs may also assign high likelihoods to
unobserved sets that are far from the true generative distribution of the data.
To address this issue, which reduces the quality of the learned model, we
introduce a novel optimization problem, Contrastive Estimation (CE), which
encodes information about "negative" samples into the basic learning model. CE
is grounded in the successful use of negative information in machine vision and
language modeling. Depending on the chosen negative distribution (which may be
static or evolve during optimization), CE assumes two different forms, which we
analyze theoretically and experimentally. We evaluate our new model on
real-world datasets; on a challenging dataset, CE learning delivers a
considerable improvement in predictive performance over a DPP learned without
using contrastive information. Comment: Will appear in AISTATS 2019.
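For reference, a DPP with kernel $L$ assigns a set $Y$ the likelihood below, and CE can be read schematically as contrasting observed sets against negatives drawn from the chosen negative distribution (the $\lambda$-weighted form is an illustrative simplification, not the paper's exact objective):

$$
P_L(Y) = \frac{\det(L_Y)}{\det(L + I)}, \qquad
\max_L \; \sum_i \log P_L(Y_i) \;-\; \lambda \sum_j \log P_L(Y_j^{-}),
$$

where the $Y_i$ are observed sets and the $Y_j^{-}$ are "negative" sets sampled from a static or evolving negative distribution.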
COBRA: Contrastive Bi-Modal Representation Algorithm
There is a wide range of applications involving multi-modal data, such as
cross-modal retrieval, visual question-answering, and image captioning. Such
applications are primarily dependent on aligned distributions of the different
constituent modalities. Existing approaches generate latent embeddings for each
modality in a joint fashion by representing them in a common manifold. However,
these joint embedding spaces fail to sufficiently reduce the modality gap,
which affects the performance in downstream tasks. We hypothesize that these
embeddings retain the intra-class relationships but are unable to preserve the
inter-class dynamics. In this paper, we present COBRA, a novel framework that
aims to train two modalities (image and text) in a joint fashion, inspired by
the Contrastive Predictive Coding (CPC) and Noise Contrastive Estimation (NCE)
paradigms, and that preserves both inter- and intra-class relationships. We
empirically show that this framework reduces the modality gap significantly and
generates a robust and task agnostic joint-embedding space. We outperform
existing work on four diverse downstream tasks spanning across seven benchmark
cross-modal datasets. Comment: 13 pages, 6 figures, and 10 tables.
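A minimal symmetric cross-modal NCE loss, the basic ingredient such a framework builds on, can be sketched as follows; COBRA's full objective additionally preserves class structure, which this sketch omits, and the temperature tau is an assumed hyperparameter.

```python
# Symmetric cross-modal contrastive sketch: each image must pick out its own
# caption among the batch, and vice versa.
import torch
import torch.nn.functional as F

def cross_modal_nce(img_emb, txt_emb, tau=0.1):
    """img_emb, txt_emb: (B, d) L2-normalized embeddings of paired items."""
    logits = img_emb @ txt_emb.t() / tau           # (B, B) similarities
    labels = torch.arange(len(img_emb), device=img_emb.device)
    return 0.5 * (F.cross_entropy(logits, labels)        # image -> text
                  + F.cross_entropy(logits.t(), labels)) # text -> image
```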
Learning Deep Energy Models: Contrastive Divergence vs. Amortized MLE
We propose a number of new algorithms for learning deep energy models and
demonstrate their properties. We show that our SteinCD performs well in terms
of test likelihood, while SteinGAN performs well in terms of generating
realistic-looking images. Our results suggest promising directions for learning
better models by combining GAN-style methods with traditional energy-based learning.
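Both families of methods estimate the same maximum-likelihood gradient for an energy model $p_\theta(x) \propto \exp(-E_\theta(x))$; they differ in where the model samples come from: a short MCMC chain for contrastive divergence, versus a trained generator network for amortized MLE.

$$
\nabla_\theta \, \mathbb{E}_{x \sim p_{\mathrm{data}}}\!\left[\log p_\theta(x)\right]
= \mathbb{E}_{x \sim p_\theta}\!\left[\nabla_\theta E_\theta(x)\right] - \mathbb{E}_{x \sim p_{\mathrm{data}}}\!\left[\nabla_\theta E_\theta(x)\right].
$$

The first expectation is intractable, and each estimator is characterized by how it approximates samples from $p_\theta$.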
Approximate Inference with Amortised MCMC
We propose a novel approximate inference algorithm that approximates a target
distribution by amortising the dynamics of a user-selected MCMC sampler. The
idea is to initialise MCMC using samples from an approximation network, apply
the MCMC operator to improve these samples, and finally use the samples to
update the approximation network, thereby improving its quality. This provides a
new generic framework for approximate inference, allowing us to deploy highly
complex, or implicitly defined approximation families with intractable
densities, including approximations produced by warping a source of randomness
through a deep neural network. Experiments consider image modelling with deep
generative models as a challenging test for the method. Deep models trained
using amortised MCMC are shown to generate realistic-looking samples as well as
to produce diverse imputations for images with regions of missing pixels.
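The loop described above can be sketched in a few lines. The names and the squared-error update rule are assumptions for illustration; the paper's actual update distills the improved samples back into the network, for which other losses are possible.

```python
# One training step of amortised MCMC (illustrative sketch, assumed names).
import torch

def amortised_mcmc_step(net, mcmc_operator, opt, noise_dim, batch=64, steps=5):
    eps = torch.randn(batch, noise_dim)
    z = net(eps).detach()                # 1) initialise from the approximation net
    for _ in range(steps):
        z = mcmc_operator(z)             # 2) improve samples with the MCMC operator
    loss = ((net(eps) - z) ** 2).mean()  # 3) pull the network toward the improved
    opt.zero_grad()                      #    samples (one simple choice of update)
    loss.backward()
    opt.step()
    return loss.item()
```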
A Review of Learning with Deep Generative Models from Perspective of Graphical Modeling
This document aims to provide a review of learning with deep generative
models (DGMs), a highly active area in machine learning and, more generally,
artificial intelligence. This review is not meant to be a tutorial,
but when necessary, we provide self-contained derivations for completeness.
This review has two features. First, though there are different perspectives to
classify DGMs, we choose to organize this review from the perspective of
graphical modeling, because the learning methods for directed DGMs and
undirected DGMs are fundamentally different. Second, we differentiate model
definitions from model learning algorithms, since different learning algorithms
can be applied to solve the learning problem on the same model, and an
algorithm can be applied to learn different models. We thus separate model
definition and model learning, with more emphasis on reviewing, differentiating
and connecting different learning algorithms. We also discuss promising future
research directions. Comment: add SN-GANs, SA-GANs, conditional generation (cGANs, AC-GANs). arXiv
admin note: text overlap with arXiv:1606.00709, arXiv:1801.03558 by other
authors.