Adversarial Structured Prediction for Multivariate Measures
Many predicted structured objects (e.g., sequences, matchings, trees) are
evaluated using the F-score, alignment error rate (AER), or other multivariate
performance measures. Since inductively optimizing these measures using
training data is typically computationally difficult, empirical risk
minimization of surrogate losses is employed, using, e.g., the hinge loss for
(structured) support vector machines. These approximations often introduce a
mismatch between the learner's objective and the desired application
performance, leading to inconsistency. We take a different approach:
adversarially approximate training data while optimizing the exact F-score or
AER. Structured predictions under this formulation result from solving zero-sum
games between a predictor seeking the best performance and an adversary seeking
the worst while required to (approximately) match certain structured properties
of the training data. We explore this approach for word alignment (AER
evaluation) and named entity recognition (F-score evaluation) with linear-chain
constraints.
Deep Value Networks Learn to Evaluate and Iteratively Refine Structured Outputs
We approach structured output prediction by optimizing a deep value network
(DVN) to precisely estimate the task loss on different output configurations
for a given input. Once the model is trained, we perform inference by gradient
descent on the continuous relaxations of the output variables to find outputs
with promising scores from the value network. When applied to image
segmentation, the value network takes an image and a segmentation mask as
inputs and predicts a scalar estimating the intersection over union between the
input and ground truth masks. For multi-label classification, the DVN's
objective is to correctly predict the F1 score for any potential label
configuration. The DVN framework achieves state-of-the-art results on
multi-label prediction and image segmentation benchmarks.
Comment: Published at ICML 201
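The DVN inference procedure described above, gradient ascent on a continuous relaxation of the output to maximize a predicted score, can be sketched under toy assumptions. Everything below is illustrative: the "value network" is replaced by a fixed differentiable surrogate with a hypothetical target, not the paper's trained model.

```python
# Toy sketch of DVN-style inference (not the paper's model): the "value
# network" here is a fixed differentiable surrogate v(y) scoring a relaxed
# output y in [0,1]^n, and inference is plain projected gradient ascent on y.

def value(y, target):
    # Surrogate score: 1 minus mean squared error to a hypothetical target.
    return 1.0 - sum((yi - ti) ** 2 for yi, ti in zip(y, target)) / len(y)

def value_grad(y, target):
    # Analytic gradient of the surrogate score w.r.t. the relaxed output.
    return [-2.0 * (yi - ti) / len(y) for yi, ti in zip(y, target)]

def infer(target, n_steps=200, lr=0.1):
    # Start from an uninformative relaxed output and ascend the score,
    # clipping back into [0, 1] after every step.
    y = [0.5] * len(target)
    for _ in range(n_steps):
        g = value_grad(y, target)
        y = [min(1.0, max(0.0, yi + lr * gi)) for yi, gi in zip(y, g)]
    return y

target = [1.0, 0.0, 1.0, 1.0]
y_hat = infer(target)
```

In the actual method the gradient comes from backpropagating through the trained value network rather than from a closed-form surrogate.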
Adversarial Phenomenon in the Eyes of Bayesian Deep Learning
Deep Learning models are vulnerable to adversarial examples, i.e.\ images
obtained via deliberate imperceptible perturbations, such that the model
misclassifies them with high confidence. However, class confidence by itself is
an incomplete picture of uncertainty. We therefore use principled Bayesian
methods to capture model uncertainty in prediction for observing adversarial
misclassification. We provide an extensive study with different Bayesian neural
networks attacked in both white-box and black-box setups. The behaviour of the
networks for noise, attacks and clean test data is compared. We observe that
Bayesian neural networks are uncertain in their predictions for adversarial
perturbations, a behaviour similar to the one observed for random Gaussian
perturbations. Thus, we conclude that Bayesian neural networks can be
considered for detecting adversarial examples.
Comment: 13 pages, 7 figures
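The detection idea above, using Bayesian predictive uncertainty to flag suspicious inputs, can be sketched in miniature. This is an assumed setup, not the paper's networks: a toy two-class "model" with multiplicative dropout noise stands in for a Bayesian neural network, and the entropy of the Monte Carlo average of its class probabilities is the uncertainty score.

```python
import math
import random

# Hedged sketch of uncertainty-based screening: run T stochastic forward
# passes of a dropout "model" and use the entropy of the averaged class
# probabilities as an uncertainty score; high entropy flags the input.

random.seed(0)

def stochastic_predict(x, drop_p=0.5):
    # Toy 2-class "network": a logistic score over x with inverted dropout.
    keep = [0.0 if random.random() < drop_p else 1.0 / (1.0 - drop_p)
            for _ in x]
    s = sum(xi * ki for xi, ki in zip(x, keep))
    p1 = 1.0 / (1.0 + math.exp(-s))
    return [1.0 - p1, p1]

def predictive_entropy(x, T=200):
    # Average the T Monte Carlo class distributions, then take the entropy.
    mean = [0.0, 0.0]
    for _ in range(T):
        p = stochastic_predict(x)
        mean = [m + pi / T for m, pi in zip(mean, p)]
    return -sum(p * math.log(p) for p in mean if p > 0)

clean = [3.0, 3.0, 3.0]        # confidently classified input
ambiguous = [0.3, -0.2, 0.1]   # near the decision boundary

clean_score = predictive_entropy(clean)
attack_score = predictive_entropy(ambiguous)
```

The boundary-hugging input receives a markedly higher entropy, mirroring the paper's observation that adversarial perturbations land in high-uncertainty regions.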
Augmenting correlation structures in spatial data using deep generative models
State-of-the-art deep learning methods have shown a remarkable capacity to
model complex data domains, but struggle with geospatial data. In this paper,
we introduce SpaceGAN, a novel generative model for geospatial domains that
learns neighbourhood structures through spatial conditioning. We propose to
enhance spatial representation beyond mere spatial coordinates, by conditioning
each data point on feature vectors of its spatial neighbours, thus allowing for
a more flexible representation of the spatial structure. To overcome issues of
training convergence, we employ a metric capturing the loss in local spatial
autocorrelation between real and generated data as stopping criterion for
SpaceGAN parametrization. This way, we ensure that the generator produces
synthetic samples faithful to the spatial patterns observed in the input.
SpaceGAN is successfully applied to data augmentation and outperforms other
methods of synthetic spatial data generation. Finally, we propose an
ensemble learning framework for the geospatial domain, taking augmented
SpaceGAN samples as training data for a set of ensemble learners. We
empirically show the superiority of this approach over conventional ensemble
learning approaches and rivaling spatial data augmentation methods, using
synthetic and real-world prediction tasks. Our findings suggest that SpaceGAN
can be used as a tool for (1) artificially inflating sparse geospatial data and
(2) improving generalization of geospatial models.
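The stopping criterion above compares local spatial autocorrelation between real and generated data. A minimal sketch, under toy assumptions (a 1-D grid with binary adjacent-cell weights rather than the paper's spatial neighbourhoods), uses global Moran's I as the autocorrelation statistic and the real-versus-generated gap as the stopping metric:

```python
# Sketch of a SpaceGAN-style stopping metric: measure Moran's I (spatial
# autocorrelation) of real and generated values on a 1-D grid and stop
# training when the gap between the two statistics is small.

def morans_i(values):
    n = len(values)
    mean = sum(values) / n
    num, w_sum = 0.0, 0.0
    for i in range(n):
        for j in (i - 1, i + 1):        # binary adjacency weights on a line
            if 0 <= j < n:
                num += (values[i] - mean) * (values[j] - mean)
                w_sum += 1.0
    den = sum((v - mean) ** 2 for v in values)
    return (n / w_sum) * num / den

real = [0, 0, 1, 1, 2, 2, 3, 3]        # smooth: positive autocorrelation
generated = [0, 3, 0, 3, 0, 3, 0, 3]   # alternating: negative autocorrelation

gap = abs(morans_i(real) - morans_i(generated))
```

A generator whose samples alternate sharply where the real data varies smoothly produces a large gap, so training would continue until the statistics align.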
Identify Susceptible Locations in Medical Records via Adversarial Attacks on Deep Predictive Models
The surging availability of electronic health records (EHR) leads to
increased research interest in medical predictive modeling. Recently, many deep
learning based predictive models have been developed for EHR data and
demonstrated impressive performance. However, a series of recent studies showed
that these deep models are not safe: they suffer from certain vulnerabilities.
In short, a well-trained deep network can be extremely sensitive to inputs with
negligible changes. These inputs are referred to as adversarial examples. In
the context of medical informatics, such attacks could alter the result of a
high performance deep predictive model by slightly perturbing a patient's
medical records. Such instability not only reflects the weakness of deep
architectures; more importantly, it offers guidance for detecting susceptible
parts of the inputs. In this paper, we propose an efficient and effective
framework that learns a time-preferential minimum attack targeting the LSTM
model with EHR inputs, and we leverage this attack strategy to screen patients'
medical records and identify susceptible events and measurements. The efficient
screening procedure can help decision makers pay extra attention to the
locations that can cause severe consequences if not measured correctly. We
conduct extensive empirical studies on a real-world urgent care cohort and
demonstrate the effectiveness of the proposed screening approach.
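The screening idea, scoring each time step of a record by how easily a small perturbation there moves the prediction, with a preference for recent steps, can be sketched with stand-in assumptions. The toy logistic model, its weights, and the decay schedule below are all hypothetical; the paper attacks a trained LSTM instead.

```python
import math

# Illustrative susceptibility screen (assumed toy model, not the paper's
# LSTM): score each time step by the gradient magnitude of a logistic
# model's output, discounted so that later measurements score higher,
# and flag the highest-scoring location as most susceptible.

WEIGHTS = [0.2, -0.1, 1.5, 0.05]   # hypothetical per-step model weights

def prediction(record):
    s = sum(w * x for w, x in zip(WEIGHTS, record))
    return 1.0 / (1.0 + math.exp(-s))

def susceptibility(record, decay=0.9):
    p = prediction(record)
    scores = []
    for t, w in enumerate(WEIGHTS):
        grad = abs(w) * p * (1.0 - p)               # |dp/dx_t| for logistic
        time_pref = decay ** (len(record) - 1 - t)  # prefer recent steps
        scores.append(grad * time_pref)
    return scores

record = [1.0, 0.5, 0.2, 0.8]
scores = susceptibility(record)
most_susceptible = max(range(len(scores)), key=scores.__getitem__)
```

Here the third measurement dominates because the model weighs it most heavily, so a clinician would be directed to double-check that entry.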
Parametric Adversarial Divergences are Good Task Losses for Generative Modeling
Generative modeling of high dimensional data like images is a notoriously
difficult and ill-defined problem. In particular, how to evaluate a learned
generative model is unclear. In this position paper, we argue that adversarial
learning, pioneered with generative adversarial networks (GANs), provides an
interesting framework to implicitly define more meaningful task losses for
generative modeling tasks, such as for generating "visually realistic" images.
We refer to those task losses as parametric adversarial divergences and we give
two main reasons why we think parametric divergences are good learning
objectives for generative modeling. Additionally, we unify the processes of
choosing a good structured loss (in structured prediction) and choosing a
discriminator architecture (in generative modeling) using statistical decision
theory; we are then able to formalize and quantify the intuition that "weaker"
losses are easier to learn from, in a specific setting. Finally, we propose two
new challenging tasks to evaluate parametric and nonparametric divergences: a
qualitative task of generating very high-resolution digits, and a quantitative
task of learning data that satisfies high-level algebraic constraints. We use
two common divergences to train a generator and show that the parametric
divergence outperforms the nonparametric divergence on both the qualitative and
the quantitative task.
Comment: 22 pages
Semi-supervised Deep Kernel Learning: Regression with Unlabeled Data by Minimizing Predictive Variance
Large amounts of labeled data are typically required to train deep learning
models. For many real-world problems, however, acquiring additional data can be
expensive or even impossible. We present semi-supervised deep kernel learning
(SSDKL), a semi-supervised regression model based on minimizing predictive
variance in the posterior regularization framework. SSDKL combines the
hierarchical representation learning of neural networks with the probabilistic
modeling capabilities of Gaussian processes. By leveraging unlabeled data, we
show improvements on a diverse set of real-world regression tasks over
supervised deep kernel learning and semi-supervised methods such as VAT and
mean teacher adapted for regression.
Comment: In Proceedings of Neural Information Processing Systems (NeurIPS) 201
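The quantity SSDKL minimizes on unlabeled data, the Gaussian process posterior predictive variance, can be illustrated with a toy GP. This is a hedged sketch: the RBF kernel, its length scale, and the noise level are fixed illustrative values, with no neural feature extractor as in the actual model.

```python
import math

# Toy GP posterior variance: at an unlabeled point, the predictive
# variance shrinks near the labeled data, which is what the SSDKL-style
# variance regularizer rewards.

def rbf(a, b, ls=1.0):
    return math.exp(-((a - b) ** 2) / (2.0 * ls ** 2))

def posterior_variance(x_star, x_labeled, noise=0.1):
    # K + noise*I for two labeled points, inverted in closed form (2x2).
    a = rbf(x_labeled[0], x_labeled[0]) + noise
    b = rbf(x_labeled[0], x_labeled[1])
    d = rbf(x_labeled[1], x_labeled[1]) + noise
    det = a * d - b * b
    inv = [[d / det, -b / det], [-b / det, a / det]]
    k_star = [rbf(x_star, x_labeled[0]), rbf(x_star, x_labeled[1])]
    quad = sum(k_star[i] * inv[i][j] * k_star[j]
               for i in range(2) for j in range(2))
    return rbf(x_star, x_star) + noise - quad

x_labeled = [0.0, 1.0]
near = posterior_variance(0.5, x_labeled)   # close to the labeled data
far = posterior_variance(5.0, x_labeled)    # far from the labeled data
```

Minimizing this variance over unlabeled points pulls the learned representation toward placing them near labeled data, which is the intuition behind the posterior-regularization objective.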
Recent Advances in Autoencoder-Based Representation Learning
Learning useful representations with little or no supervision is a key
challenge in artificial intelligence. We provide an in-depth review of recent
advances in representation learning with a focus on autoencoder-based models.
To organize these results we make use of meta-priors believed useful for
downstream tasks, such as disentanglement and hierarchical organization of
features. In particular, we uncover three main mechanisms to enforce such
properties, namely (i) regularizing the (approximate or aggregate) posterior
distribution, (ii) factorizing the encoding and decoding distribution, or (iii)
introducing a structured prior distribution. While there are some promising
results, implicit or explicit supervision remains a key enabler and all current
methods use strong inductive biases and modeling assumptions. Finally, we
provide an analysis of autoencoder-based representation learning through the
lens of rate-distortion theory and identify a clear tradeoff between the amount
of prior knowledge available about the downstream tasks, and how useful the
representation is for this task.
Comment: Presented at the third workshop on Bayesian Deep Learning (NeurIPS 2018)
Probabilistic Video Generation using Holistic Attribute Control
Videos express highly structured spatio-temporal patterns of visual data. A
video can be thought of as being governed by two factors: (i) temporally
invariant (e.g., person identity), or slowly varying (e.g., activity),
attribute-induced appearance, encoding the persistent content of each frame,
and (ii) an inter-frame motion or scene dynamics (e.g., encoding evolution of
the person executing the action). Based on this intuition, we propose a
generative framework for video generation and future prediction. The proposed
framework generates a video (short clip) by decoding samples sequentially drawn
from a latent space distribution into full video frames. Variational
Autoencoders (VAEs) are used as a means of encoding/decoding frames into/from
the latent space, and an RNN as a way to model the dynamics in the latent space. We
improve the video generation consistency through temporally-conditional
sampling and quality by structuring the latent space with attribute controls;
ensuring that attributes can be both inferred and conditioned on during
learning/generation. As a result, given attributes and/or the first frame, our
model is able to generate diverse but highly consistent sets of video sequences,
accounting for the inherent uncertainty in the prediction task. Experimental
results on Chair CAD, Weizmann Human Action, and MIT-Flickr datasets, along
with detailed comparison to the state-of-the-art, verify effectiveness of the
framework.
Generative Imputation and Stochastic Prediction
In many machine learning applications, we are faced with incomplete datasets.
In the literature, missing data imputation techniques have been mostly
concerned with filling in missing values. However, missing values introduce
uncertainty not only over the distribution of the missing values themselves but
also over the target class assignments, and both require careful
consideration. In this paper, we propose a simple and effective method for
imputing missing features and estimating the distribution of target assignments
given incomplete data. In order to make imputations, we train a simple and
effective generator network to generate imputations that a discriminator
network is tasked to distinguish. Following this, a predictor network is
trained using the imputed samples from the generator network to capture the
classification uncertainties and make predictions accordingly. The proposed
method is evaluated on CIFAR-10 and MNIST image datasets as well as five
real-world tabular classification datasets, under different missingness rates
and structures. Our experimental results show the effectiveness of the proposed
method in generating imputations as well as providing estimates for the class
uncertainties in a classification task when faced with missing values.
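The stochastic-prediction step above, turning imputation uncertainty into a distribution over class labels, can be sketched with stand-in assumptions. Instead of the paper's trained generator, missing entries are sampled from the observed values of the same feature, and a fixed rule classifier plays the role of the predictor network; `DATA` and `classify` below are hypothetical.

```python
import random

# Minimal sketch of stochastic prediction under missingness: draw many
# imputed copies of an incomplete row, classify each, and report the
# class frequencies as an estimate of the predictive distribution.

random.seed(0)

DATA = [[0.1, 0.9], [0.2, 0.8], [0.9, 0.2], [0.8, 0.1]]  # complete rows

def classify(row):
    # Hypothetical classifier: class 1 if the first feature dominates.
    return 1 if row[0] > row[1] else 0

def stochastic_predict(row, mask, n_samples=500):
    # mask[j] is True where row[j] is observed; sample the rest from the
    # observed values of the same feature (a stand-in generator).
    counts = [0, 0]
    for _ in range(n_samples):
        imputed = [row[j] if mask[j] else random.choice(DATA)[j]
                   for j in range(len(row))]
        counts[classify(imputed)] += 1
    return [c / n_samples for c in counts]

probs = stochastic_predict([None, 0.5], [False, True])
```

With the first feature missing and the observed value sitting between the two clusters, the predicted class distribution stays close to uniform, exposing the class uncertainty that a single imputation would hide.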