Adversarial Out-domain Examples for Generative Models
Deep generative models are rapidly becoming a common tool for researchers and
developers. However, as exhaustively shown for the family of discriminative
models, the test-time inference of deep neural networks cannot be fully
controlled and erroneous behaviors can be induced by an attacker. In the
present work, we show how a malicious user can force a pre-trained generator to
reproduce arbitrary data instances by feeding it suitable adversarial inputs.
Moreover, we show that these adversarial latent vectors can be shaped so as to
be statistically indistinguishable from the set of genuine inputs. The proposed
attack technique is evaluated against various GAN image generators with different
architectures and training processes, in both conditional and unconditional setups.
Comment: accepted in proceedings of the Workshop on Machine Learning for
Cyber-Crime Investigation and Cybersecurity
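The attack described above can be sketched with a toy stand-in for the generator. Everything below (the linear map `W`, the sizes, the learning rate) is illustrative, not the paper's setup, which targets real pre-trained GANs; the loop is the same idea, though: gradient descent on the latent input until the generator emits a chosen instance.

```python
import numpy as np

# Toy "pre-trained generator": a fixed random linear map from an
# 8-dim latent space to a 16-dim output space (illustrative only).
rng = np.random.default_rng(0)
W = rng.normal(size=(16, 8)) / 4.0

def G(z):
    return W @ z

# Arbitrary data instance the attacker wants the generator to reproduce
# (chosen to be reachable, i.e. in the generator's range).
x_target = G(rng.normal(size=8))

# Gradient descent on the latent vector: find an adversarial input z
# such that G(z) matches x_target.
z = np.zeros(8)
lr = 0.2
for _ in range(5000):
    grad = 2.0 * W.T @ (G(z) - x_target)   # gradient of ||G(z) - x_target||^2
    z -= lr * grad

print(float(np.abs(G(z) - x_target).max()))  # near-zero reconstruction error
```

With a real GAN one would backpropagate through the network instead of using the closed-form gradient, but the optimization target is identical.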
Likelihood Ratios and Generative Classifiers for Unsupervised Out-of-Domain Detection In Task Oriented Dialog
The task of identifying out-of-domain (OOD) input examples directly at
test-time has seen renewed interest recently due to increased real world
deployment of models. In this work, we focus on OOD detection for natural
language sentence inputs to task-based dialog systems. Our findings are
three-fold: First, we curate and release ROSTD (Real Out-of-Domain Sentences
From Task-oriented Dialog) - a dataset of 4K OOD examples for the publicly
available dataset from (Schuster et al. 2019). In contrast to existing settings
which synthesize OOD examples by holding out a subset of classes, our examples
were authored by annotators with a priori instructions to be out-of-domain with
respect to the sentences in an existing dataset. Second, we explore likelihood
ratio based approaches as an alternative to currently prevalent paradigms.
Specifically, we reformulate and apply these approaches to natural language
inputs. We find that they match or outperform the latter on all datasets, with
larger improvements on non-artificial OOD benchmarks such as our dataset. Our
ablations validate that specifically using likelihood ratios rather than plain
likelihood is necessary to discriminate well between OOD and in-domain data.
Third, we propose learning a generative classifier and computing a marginal
likelihood (ratio) for OOD detection. This allows us to use a principled
likelihood while at the same time exploiting training-time labels. We find that
this approach outperforms both simple likelihood (ratio) based and other prior
approaches. To the best of our knowledge, we are the first to investigate the use of
generative classifiers for OOD detection at test time.
Comment: Accepted for AAAI-2020 Main Track
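As a rough illustration of the likelihood-ratio idea, the sketch below scores sentences by the log-ratio between an in-domain language model and a background model. The unigram models, the add-one smoothing, and the tiny corpora are all stand-in assumptions; the paper works with neural language models over task-oriented dialog data.

```python
from collections import Counter
import math

# Made-up toy corpora: a music-control domain vs. a broader background.
in_domain = ["play some jazz music", "play the next song", "pause the music"]
background = ["play some jazz music", "what is the weather", "book a table",
              "pause the music", "tell me a joke"]

def unigram(corpus):
    """Add-one-smoothed unigram model: returns p(word)."""
    counts = Counter(w for s in corpus for w in s.split())
    total = sum(counts.values())
    vocab = len(counts) + 1  # +1 slot for unseen words
    return lambda w: (counts[w] + 1) / (total + vocab)

p_in, p_bg = unigram(in_domain), unigram(background)

def llr(sentence):
    # log p_in(s) - log p_bg(s): higher for in-domain, lower for OOD,
    # so thresholding this score gives an OOD detector.
    return sum(math.log(p_in(w)) - math.log(p_bg(w)) for w in sentence.split())

print(llr("play some music"))       # in-domain-ish: higher ratio
print(llr("what is the weather"))   # OOD w.r.t. the music domain: lower ratio
```

The ablation finding in the abstract corresponds to comparing this ratio against thresholding `log p_in(s)` alone, which conflates "out of domain" with "generically improbable" (e.g. long or rare-word sentences).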
GOGGLES: Automatic Image Labeling with Affinity Coding
Generating large labeled training data is becoming the biggest bottleneck in
building and deploying supervised machine learning models. Recently, the data
programming paradigm has been proposed to reduce the human cost in labeling
training data. However, data programming relies on designing labeling functions
which still requires significant domain expertise. Also, it is prohibitively
difficult to write labeling functions for image datasets as it is hard to
express domain knowledge using raw features for images (pixels).
We propose affinity coding, a new domain-agnostic paradigm for automated
training data labeling. The core premise of affinity coding is that the
affinity scores of instance pairs belonging to the same class on average should
be higher than those of pairs belonging to different classes, according to some
affinity functions. We build the GOGGLES system that implements affinity coding
for labeling image datasets by designing a novel set of reusable affinity
functions for images, and propose a novel hierarchical generative model for
class inference using a small development set.
We compare GOGGLES with existing data programming systems on 5 image labeling
tasks from diverse domains. GOGGLES achieves labeling accuracies ranging from a
minimum of 71% to a maximum of 98% without requiring any extensive human
annotation. In terms of end-to-end performance, GOGGLES outperforms the
state-of-the-art data programming system Snuba by 21% and a state-of-the-art
few-shot learning technique by 5%, and is only 7% away from the fully
supervised upper bound.
Comment: Published at the 2020 ACM SIGMOD International Conference on Management
of Data
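The affinity-coding premise can be checked on toy data. In the sketch below, 2D Gaussian clusters stand in for image classes and an RBF kernel stands in for GOGGLES's learned affinity functions; the development-set step is reduced to nearest-prototype assignment, which is far simpler than the paper's hierarchical generative model but shows how high same-class affinities enable labeling.

```python
import numpy as np

# Two well-separated 2D clusters play the role of two image classes.
rng = np.random.default_rng(1)
a = rng.normal(loc=0.0, scale=0.5, size=(50, 2))   # class 0
b = rng.normal(loc=3.0, scale=0.5, size=(50, 2))   # class 1
X = np.vstack([a, b])
y = np.array([0] * 50 + [1] * 50)

def affinity(u, v):
    return np.exp(-np.sum((u - v) ** 2))           # RBF affinity function

A = np.array([[affinity(X[i], X[j]) for j in range(100)] for i in range(100)])

# Core premise: same-class pairs score higher on average than
# different-class pairs (diagonal self-pairs excluded).
same = A[np.equal.outer(y, y) & ~np.eye(100, dtype=bool)]
diff = A[~np.equal.outer(y, y)]
print(same.mean() > diff.mean())   # True

# Tiny "development set": one labeled example per class; label every
# instance by its highest-affinity development example.
dev_idx = np.array([0, 50])
dev_lab = np.array([0, 1])
pred = dev_lab[np.argmax(A[:, dev_idx], axis=1)]
print((pred == y).mean())          # labeling accuracy
```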
A Very Brief Introduction to Machine Learning With Applications to Communication Systems
Given the unprecedented availability of data and computing resources, there
is widespread renewed interest in applying data-driven machine learning methods
to problems for which the development of conventional engineering solutions is
challenged by modelling or algorithmic deficiencies. This tutorial-style paper
starts by addressing the questions of why and when such techniques can be
useful. It then provides a high-level introduction to the basics of supervised
and unsupervised learning. For both supervised and unsupervised learning,
exemplifying applications to communication networks are discussed by
distinguishing tasks carried out at the edge and at the cloud segments of the
network at different layers of the protocol stack.
NAM: Non-Adversarial Unsupervised Domain Mapping
Several methods were recently proposed for the task of translating images
between domains without prior knowledge in the form of correspondences. The
existing methods apply adversarial learning to ensure that the distribution of
the mapped source domain is indistinguishable from the target domain, which
suffers from known stability issues. In addition, most methods rely heavily on
`cycle' relationships between the domains, which enforce a one-to-one mapping.
In this work, we introduce an alternative method: Non-Adversarial Mapping
(NAM), which separates the task of target domain generative modeling from the
cross-domain mapping task. NAM relies on a pre-trained generative model of the
target domain, and aligns each source image with an image synthesized from the
target domain, while jointly optimizing the domain mapping function. It has
several key advantages: higher quality and resolution image translations,
simpler and more stable training and reusable target models. Extensive
experiments are presented validating the advantages of our method.
Comment: ECCV 201
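A minimal linear sketch of the non-adversarial objective, assuming toy matrices in place of real networks: a fixed pre-trained target-domain generator, a learned mapping, and one latent vector per source image are optimized jointly against a plain reconstruction loss, with no discriminator anywhere. All sizes, matrices, and learning rates below are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(2)
d_z, d_img, n = 3, 6, 20
G = rng.normal(size=(d_img, d_z)) / np.sqrt(d_z)      # fixed "pre-trained" target generator
M = rng.normal(size=(d_img, d_img)) / np.sqrt(d_img)  # unknown true target->source map
X = M @ G @ rng.normal(size=(d_z, n))                 # observed source images (columns)

T = np.eye(d_img)          # learned mapping, initialized at identity
Z = np.zeros((d_z, n))     # one latent vector per source image

def loss():
    # Non-adversarial alignment loss: sum_i ||T(G(z_i)) - x_i||^2
    return float(np.sum((T @ G @ Z - X) ** 2))

l0 = loss()
for _ in range(3000):
    R = T @ G @ Z - X                       # residuals, one column per image
    T -= 0.01 / n * (2 * R @ (G @ Z).T)     # gradient step on the shared mapping
    Z -= 0.01 * (2 * G.T @ T.T @ R)         # gradient step on the per-image latents

# The translation of source image x_i into the target domain is G @ Z[:, i].
print(loss() / l0)   # fraction of the initial reconstruction loss remaining
```

The point of the sketch is the structure of the optimization, not the linear models: because the objective is an ordinary reconstruction loss, training avoids the minimax instability of adversarial mapping methods.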