63,522 research outputs found

    Using Synthetic Data to Train Neural Networks is Model-Based Reasoning

    Full text link
    We draw a formal connection between using synthetic training data to optimize neural network parameters and approximate, Bayesian, model-based reasoning. In particular, training a neural network using synthetic data can be viewed as learning a proposal distribution generator for approximate inference in the synthetic-data generative model. We demonstrate this connection in a recognition task where we develop a novel Captcha-breaking architecture and train it using synthetic data, demonstrating both state-of-the-art performance and a way of computing task-specific posterior uncertainty. Using a neural network trained this way, we also demonstrate successful breaking of real-world Captchas currently used by Facebook and Wikipedia. Reasoning from these empirical results and drawing connections with Bayesian modeling, we discuss the robustness of synthetic data results and suggest important considerations for ensuring good neural network generalization when training with synthetic data.Comment: 8 pages, 4 figure

    Bayesian Updating, Model Class Selection and Robust Stochastic Predictions of Structural Response

    Get PDF
    A fundamental issue when predicting structural response by using mathematical models is how to treat both modeling and excitation uncertainty. A general framework for this is presented which uses probability as a multi-valued conditional logic for quantitative plausible reasoning in the presence of uncertainty due to incomplete information. The fundamental probability models that represent the structure’s uncertain behavior are specified by the choice of a stochastic system model class: a set of input-output probability models for the structure and a prior probability distribution over this set that quantifies the relative plausibility of each model. A model class can be constructed from a parameterized deterministic structural model by stochastic embedding utilizing Jaynes’ Principle of Maximum Information Entropy. Robust predictive analyses use the entire model class with the probabilistic predictions of each model being weighted by its prior probability, or if structural response data is available, by its posterior probability from Bayes’ Theorem for the model class. Additional robustness to modeling uncertainty comes from combining the robust predictions of each model class in a set of competing candidates weighted by the prior or posterior probability of the model class, the latter being computed from Bayes’ Theorem. This higherlevel application of Bayes’ Theorem automatically applies a quantitative Ockham razor that penalizes the data-fit of more complex model classes that extract more information from the data. Robust predictive analyses involve integrals over highdimensional spaces that usually must be evaluated numerically. Published applications have used Laplace's method of asymptotic approximation or Markov Chain Monte Carlo algorithms

    t-Exponential Memory Networks for Question-Answering Machines

    Full text link
    Recent advances in deep learning have brought to the fore models that can make multiple computational steps in the service of completing a task; these are capable of describ- ing long-term dependencies in sequential data. Novel recurrent attention models over possibly large external memory modules constitute the core mechanisms that enable these capabilities. Our work addresses learning subtler and more complex underlying temporal dynamics in language modeling tasks that deal with sparse sequential data. To this end, we improve upon these recent advances, by adopting concepts from the field of Bayesian statistics, namely variational inference. Our proposed approach consists in treating the network parameters as latent variables with a prior distribution imposed over them. Our statistical assumptions go beyond the standard practice of postulating Gaussian priors. Indeed, to allow for handling outliers, which are prevalent in long observed sequences of multivariate data, multivariate t-exponential distributions are imposed. On this basis, we proceed to infer corresponding posteriors; these can be used for inference and prediction at test time, in a way that accounts for the uncertainty in the available sparse training data. Specifically, to allow for our approach to best exploit the merits of the t-exponential family, our method considers a new t-divergence measure, which generalizes the concept of the Kullback-Leibler divergence. We perform an extensive experimental evaluation of our approach, using challenging language modeling benchmarks, and illustrate its superiority over existing state-of-the-art techniques
    corecore