HoloDetect: Few-Shot Learning for Error Detection
We introduce a few-shot learning framework for error detection. We show that
data augmentation (a form of weak supervision) is key to training high-quality,
ML-based error detection models that require minimal human involvement. Our
framework consists of two parts: (1) an expressive model to learn rich
representations that capture the inherent syntactic and semantic heterogeneity
of errors; and (2) a data augmentation model that, given a small seed of clean
records, uses dataset-specific transformations to automatically generate
additional training data. Our key insight is to learn data augmentation
policies from the noisy input dataset in a weakly supervised manner. We show
that our framework detects errors with an average precision of ~94% and an
average recall of ~93% across a diverse array of datasets that exhibit
different types and amounts of errors. We compare our approach to a
comprehensive collection of error detection methods, ranging from traditional
rule-based methods to ensemble-based and active learning approaches. We show
that data augmentation yields an average improvement of 20 F1 points while it
requires access to 3x fewer labeled examples compared to other ML approaches.
Comment: 18 pages
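As a rough illustration of the augmentation idea in the abstract above (a small seed of clean records expanded into labeled training data through transformations), the sketch below corrupts clean string values to produce synthetic error examples. It is not the HoloDetect implementation: the abstract states that augmentation policies are learned from the noisy dataset, whereas here the transformations `drop_char` and `swap_adjacent`, the function `augment`, and its parameters are hypothetical and hard-coded for illustration only.

```python
import random

# Hypothetical, hand-written transformations. The paper learns its
# augmentation policy from the noisy data; this sketch does not.
def drop_char(value, rng):
    """Delete one randomly chosen character."""
    if len(value) < 2:
        return value
    i = rng.randrange(len(value))
    return value[:i] + value[i + 1:]

def swap_adjacent(value, rng):
    """Swap one randomly chosen pair of adjacent characters."""
    if len(value) < 2:
        return value
    i = rng.randrange(len(value) - 1)
    return value[:i] + value[i + 1] + value[i] + value[i + 2:]

TRANSFORMS = [drop_char, swap_adjacent]

def augment(clean_values, n_per_value=2, seed=0):
    """Return (value, label) pairs: label 0 = clean, label 1 = synthetic error."""
    rng = random.Random(seed)
    clean_set = set(clean_values)
    examples = [(v, 0) for v in clean_values]
    for v in clean_values:
        for _ in range(n_per_value):
            corrupted = rng.choice(TRANSFORMS)(v, rng)
            # Skip corruptions that happen to reproduce a clean value,
            # e.g. swapping two identical adjacent characters.
            if corrupted not in clean_set:
                examples.append((corrupted, 1))
    return examples

examples = augment(["Chicago", "60601"])
```

The resulting pairs could then be fed to any binary classifier; the choice of transformations is the part a learned policy would replace.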
A Very Brief Introduction to Machine Learning With Applications to Communication Systems
Given the unprecedented availability of data and computing resources, there
is widespread renewed interest in applying data-driven machine learning methods
to problems for which the development of conventional engineering solutions is
challenged by modelling or algorithmic deficiencies. This tutorial-style paper
starts by addressing the questions of why and when such techniques can be
useful. It then provides a high-level introduction to the basics of supervised
and unsupervised learning. For both supervised and unsupervised learning,
exemplifying applications to communication networks are discussed by
distinguishing tasks carried out at the edge and at the cloud segments of the
network at different layers of the protocol stack.
Learning Privacy Preserving Encodings through Adversarial Training
We present a framework to learn privacy-preserving encodings of images that
inhibit inference of chosen private attributes, while allowing recovery of
other desirable information. Rather than simply inhibiting a given fixed
pre-trained estimator, our goal is that an estimator be unable to learn to
accurately predict the private attributes even with knowledge of the encoding
function. We use a natural adversarial optimization-based formulation for
this---training the encoding function against a classifier for the private
attribute, with both modeled as deep neural networks. The key contribution of
our work is a stable and convergent optimization approach that is successful at
learning an encoder with our desired properties---maintaining utility while
inhibiting inference of private attributes, not just within the adversarial
optimization, but also by classifiers that are trained after the encoder is
fixed. We adopt a rigorous experimental protocol for verification wherein
classifiers are trained exhaustively until saturation on the fixed encoders. We
evaluate our approach on tasks of real-world complexity---learning
high-dimensional encodings that inhibit detection of different scene
categories---and find that it yields encoders that are resilient at maintaining
privacy.
Comment: To appear in WACV 201