Using Synthetic Data to Train Neural Networks is Model-Based Reasoning
We draw a formal connection between using synthetic training data to optimize
neural network parameters and approximate, Bayesian, model-based reasoning. In
particular, training a neural network using synthetic data can be viewed as
learning a proposal distribution generator for approximate inference in the
synthetic-data generative model. We demonstrate this connection in a recognition task, developing a novel Captcha-breaking architecture and training it with synthetic data to achieve both state-of-the-art performance and a way of computing task-specific posterior uncertainty. Using a neural network trained this way, we also successfully break real-world Captchas currently used by Facebook and Wikipedia. Reasoning from these
empirical results and drawing connections with Bayesian modeling, we discuss
the robustness of synthetic data results and suggest important considerations
for ensuring good neural network generalization when training with synthetic
data.
Comment: 8 pages, 4 figures
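A minimal sketch of the idea (ours, not the paper's Captcha architecture): draw latents from a toy generative model, render synthetic observations, and train a network to output a Gaussian proposal over the latents, i.e. an amortized inference network q(z | x). The simulator and all names below are illustrative assumptions.

```python
# Training on synthetic (latent, data) pairs drawn from a generative model
# amounts to learning an amortized proposal q(z | x) for inference in that
# model. The generative model here is a toy stand-in: z ~ N(0, 1), x = z + noise.
import torch
import torch.nn as nn

simulator = lambda z: z + 0.1 * torch.randn_like(z)   # toy p(x | z)

# q(z | x): predicts the mean and log-variance of a Gaussian proposal.
net = nn.Sequential(nn.Linear(1, 32), nn.ReLU(), nn.Linear(32, 2))
opt = torch.optim.Adam(net.parameters(), lr=1e-2)

for step in range(2000):
    z = torch.randn(256, 1)          # sample latents from the prior p(z)
    x = simulator(z)                 # render synthetic observations
    mu, log_var = net(x).chunk(2, dim=1)
    # Maximize log q(z | x): Gaussian negative log-likelihood up to a constant.
    loss = (0.5 * ((z - mu) ** 2 / log_var.exp() + log_var)).mean()
    opt.zero_grad(); loss.backward(); opt.step()

# At test time, net(x) parameterizes a proposal over the latents; its predicted
# variance is one simple way to expose task-specific posterior uncertainty.
```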
Bayesian Updating, Model Class Selection and Robust Stochastic Predictions of Structural Response
A fundamental issue when predicting structural response by using mathematical models is how to treat both modeling and excitation uncertainty. A general framework for this is presented which uses probability as a multi-valued
conditional logic for quantitative plausible reasoning in the presence of uncertainty due to incomplete information. The
fundamental probability models that represent the structure’s uncertain behavior are specified by the choice of a stochastic
system model class: a set of input-output probability models for the structure and a prior probability distribution over this set
that quantifies the relative plausibility of each model. A model class can be constructed from a parameterized deterministic
structural model by stochastic embedding utilizing Jaynes’ Principle of Maximum Information Entropy. Robust predictive
analyses use the entire model class with the probabilistic predictions of each model being weighted by its prior probability, or if
structural response data is available, by its posterior probability from Bayes’ Theorem for the model class. Additional robustness
to modeling uncertainty comes from combining the robust predictions of each model class in a set of competing candidates
weighted by the prior or posterior probability of the model class, the latter being computed from Bayes' Theorem. This higher-level application of Bayes' Theorem automatically applies a quantitative Ockham's razor that penalizes the data-fit of more complex model classes that extract more information from the data. Robust predictive analyses involve integrals over high-dimensional spaces that usually must be evaluated numerically. Published applications have used Laplace's method of asymptotic approximation or Markov Chain Monte Carlo algorithms.
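In symbols (our notation, consistent with but not taken from the abstract): given data D and candidate model classes M_1, ..., M_J, the posterior plausibility of each class and the resulting hyper-robust prediction of a response quantity x are

```latex
P(\mathcal{M}_j \mid \mathcal{D})
  = \frac{p(\mathcal{D} \mid \mathcal{M}_j)\, P(\mathcal{M}_j)}
         {\sum_{i=1}^{J} p(\mathcal{D} \mid \mathcal{M}_i)\, P(\mathcal{M}_i)},
\qquad
p(x \mid \mathcal{D})
  = \sum_{j=1}^{J} p(x \mid \mathcal{D}, \mathcal{M}_j)\, P(\mathcal{M}_j \mid \mathcal{D}),
\quad \text{where} \quad
p(\mathcal{D} \mid \mathcal{M}_j)
  = \int p(\mathcal{D} \mid \theta_j, \mathcal{M}_j)\, p(\theta_j \mid \mathcal{M}_j)\, d\theta_j .
```

The evidence p(D | M_j) is the high-dimensional integral referred to above; its implicit trade-off between data fit and the information a model class extracts from the data is what implements the quantitative Ockham's razor.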
t-Exponential Memory Networks for Question-Answering Machines
Recent advances in deep learning have brought to the fore models that can
make multiple computational steps in the service of completing a task; these
are capable of describing long-term dependencies in sequential data. Novel
recurrent attention models over possibly large external memory modules
constitute the core mechanisms that enable these capabilities. Our work
addresses learning subtler and more complex underlying temporal dynamics in
language modeling tasks that deal with sparse sequential data. To this end, we
improve upon these recent advances, by adopting concepts from the field of
Bayesian statistics, namely variational inference. Our proposed approach
consists in treating the network parameters as latent variables with a prior
distribution imposed over them. Our statistical assumptions go beyond the
standard practice of postulating Gaussian priors. Indeed, to allow for handling
outliers, which are prevalent in long observed sequences of multivariate data,
multivariate t-exponential distributions are imposed. On this basis, we proceed
to infer corresponding posteriors; these can be used for inference and
prediction at test time, in a way that accounts for the uncertainty in the
available sparse training data. Specifically, to allow our approach to best exploit the merits of the t-exponential family, our method considers a new
t-divergence measure, which generalizes the concept of the Kullback-Leibler
divergence. We perform an extensive experimental evaluation of our approach,
using challenging language modeling benchmarks, and illustrate its superiority
over existing state-of-the-art techniques.
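As an illustrative stand-in (not the paper's method): the sketch below places a heavy-tailed Student-t prior over variational weight parameters and fits a Gaussian posterior with a Monte Carlo KL estimate. The paper's multivariate t-exponential priors and its t-divergence objective are replaced here by these simpler surrogates, which only convey the heavy-tailed-prior idea.

```python
# Heavy-tailed prior for robustness to outliers: a Student-t prior over a
# variational weight vector, trained by a Monte Carlo estimate of KL(q || prior).
# This KL surrogate is NOT the paper's t-divergence.
import torch
import torch.distributions as dist

mu = torch.zeros(10, requires_grad=True)           # variational mean
rho = torch.full((10,), -3.0, requires_grad=True)  # parameterizes the std
prior = dist.StudentT(df=3.0)                      # heavy tails absorb outliers

def mc_kl(n_samples=64):
    """Monte Carlo estimate of KL(q || prior) for q = N(mu, softplus(rho)^2)."""
    sigma = torch.nn.functional.softplus(rho)
    q = dist.Normal(mu, sigma)
    w = q.rsample((n_samples,))                    # reparameterized samples
    return (q.log_prob(w) - prior.log_prob(w)).mean()

opt = torch.optim.Adam([mu, rho], lr=1e-2)
for _ in range(500):
    loss = mc_kl()  # a full model would add the expected negative log-likelihood
    opt.zero_grad(); loss.backward(); opt.step()
```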