21,088 research outputs found
Using Synthetic Data to Train Neural Networks is Model-Based Reasoning
We draw a formal connection between using synthetic training data to optimize
neural network parameters and approximate, Bayesian, model-based reasoning. In
particular, training a neural network using synthetic data can be viewed as
learning a proposal distribution generator for approximate inference in the
synthetic-data generative model. We demonstrate this connection in a
recognition task where we develop a novel Captcha-breaking architecture and
train it using synthetic data, demonstrating both state-of-the-art performance
and a way of computing task-specific posterior uncertainty. Using a neural
network trained this way, we also demonstrate successful breaking of real-world
Captchas currently used by Facebook and Wikipedia. Reasoning from these
empirical results and drawing connections with Bayesian modeling, we discuss
the robustness of synthetic data results and suggest important considerations
for ensuring good neural network generalization when training with synthetic
data.Comment: 8 pages, 4 figure
- …