8,518 research outputs found

    ABC random forests for Bayesian parameter inference

    This preprint has been reviewed and recommended by Peer Community In Evolutionary Biology (http://dx.doi.org/10.24072/pci.evolbiol.100036). Approximate Bayesian computation (ABC) has grown into a standard methodology that manages Bayesian inference for models associated with intractable likelihood functions. Most ABC implementations require the preliminary selection of a vector of informative statistics summarizing raw data. Furthermore, in almost all existing implementations, the tolerance level that separates acceptance from rejection of simulated parameter values needs to be calibrated. We propose to conduct likelihood-free Bayesian inference about parameters with no prior selection of the relevant components of the summary statistics and bypassing the derivation of the associated tolerance level. The approach relies on the random forest methodology of Breiman (2001) applied in a (non-parametric) regression setting. We advocate the derivation of a new random forest for each component of the parameter vector of interest. When compared with earlier ABC solutions, this method offers significant gains in robustness to the choice of the summary statistics, does not depend on any type of tolerance level, and provides a good trade-off between point-estimator precision and credible-interval quality for a given computing time. We illustrate the performance of our methodological proposal and compare it with earlier ABC methods on a Normal toy example and a population genetics example dealing with human population evolution. All methods designed here have been incorporated in the R package abcrf (version 1.7) available on CRAN. Comment: Main text: 24 pages, 6 figures; Supplementary Information: 14 pages, 5 figures.
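
    The regression flavour of this idea can be sketched with a generic random-forest library. The snippet below is a minimal illustration, assuming scikit-learn in Python rather than the authors' abcrf R package, with a toy Normal model, an arbitrary uniform prior, and hand-picked summary statistics; one forest is grown per parameter component, here just the mean.

```python
# Illustrative ABC random-forest regression sketch (not the abcrf package):
# learn a map from summary statistics to a parameter with a random forest,
# then read off a point estimate at the observed summaries.
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
n_sim, n_obs = 20_000, 50

# Prior draws for the mean of a Normal model (toy example).
mu = rng.uniform(-5.0, 5.0, size=n_sim)

# Simulate datasets and reduce them to summary statistics.
data = rng.normal(loc=mu[:, None], scale=1.0, size=(n_sim, n_obs))
summaries = np.column_stack([data.mean(axis=1), data.std(axis=1),
                             np.median(data, axis=1)])

# One forest per parameter component (here just mu).
forest = RandomForestRegressor(n_estimators=500, min_samples_leaf=5, n_jobs=-1)
forest.fit(summaries, mu)

# "Observed" data with true mu = 1.3, reduced to the same summaries.
obs = rng.normal(1.3, 1.0, size=n_obs)
obs_summaries = np.array([[obs.mean(), obs.std(), np.median(obs)]])
print("posterior point estimate of mu:", forest.predict(obs_summaries)[0])
```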

    Learning and Designing Stochastic Processes from Logical Constraints

    Stochastic processes offer a flexible mathematical formalism to model and reason about systems. Most analysis tools, however, start from the premise that models are fully specified, so that any parameters controlling the system's dynamics must be known exactly. As this is seldom the case, many methods have been devised over the last decade to infer (learn) such parameters from observations of the state of the system. In this paper, we depart from this approach by assuming that our observations are qualitative properties encoded as the satisfaction of linear temporal logic formulae, as opposed to quantitative observations of the state of the system. An important feature of this approach is that it naturally unifies the system identification and system design problems, where the properties, instead of observations, represent requirements to be satisfied. We develop a principled statistical estimation procedure based on maximising the likelihood of the system's parameters, using recent ideas from statistical machine learning. We demonstrate the efficacy and broad applicability of our method on a range of simple but non-trivial examples, including rumour spreading in social networks and hybrid models of gene regulation.
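
    A stripped-down version of the inference-from-qualitative-observations idea can be written in a few lines: simulate the process, check a temporal property on each run, and maximise the likelihood of the observed yes/no satisfaction outcomes over the parameter. The sketch below assumes a toy SI-type rumour model, a reachability property, plain Monte Carlo estimation of the satisfaction probability, and a grid search; none of this is intended to reproduce the paper's actual statistical machinery.

```python
# Sketch: infer a rate parameter from qualitative observations, i.e. whether
# simulated runs satisfy a temporal property (here: "the rumour reaches at
# least half the network within time T"), by maximising a Monte Carlo estimate
# of the likelihood of the observed yes/no outcomes.
import numpy as np

rng = np.random.default_rng(1)
N, T = 50, 5.0  # network size, time horizon

def property_satisfied(beta):
    """Simulate a stochastic SI rumour spread (Gillespie) and check the property."""
    informed, t = 1, 0.0
    while informed < N:
        rate = beta * informed * (N - informed) / N
        t += rng.exponential(1.0 / rate)
        if t > T:
            break
        informed += 1
    return informed >= N // 2

def log_likelihood(beta, observations, n_mc=400):
    p = np.mean([property_satisfied(beta) for _ in range(n_mc)])
    p = min(max(p, 1e-6), 1 - 1e-6)  # keep the Bernoulli likelihood finite
    k = sum(observations)
    return k * np.log(p) + (len(observations) - k) * np.log(1 - p)

# Qualitative data: 20 yes/no satisfaction outcomes generated at beta = 0.8.
observations = [property_satisfied(0.8) for _ in range(20)]

grid = np.linspace(0.1, 2.0, 20)
best = max(grid, key=lambda b: log_likelihood(b, observations))
print("maximum-likelihood rate estimate:", best)
```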

    Regularizing Portfolio Optimization

    The optimization of large portfolios displays an inherent instability to estimation error. This poses a fundamental problem, because solutions that are not stable under sample fluctuations may look optimal for a given sample, but are, in effect, very far from optimal with respect to the average risk. In this paper, we approach the problem from the point of view of statistical learning theory. The occurrence of the instability is intimately related to over-fitting, which can be avoided using known regularization methods. We show how regularized portfolio optimization with the expected shortfall as a risk measure is related to support vector regression. The budget constraint dictates a modification. We present the resulting optimization problem and discuss the solution. The L2 norm of the weight vector is used as a regularizer, which corresponds to a diversification "pressure". This means that diversification, besides counteracting downward fluctuations in some assets by upward fluctuations in others, is also crucial because it improves the stability of the solution. The approach we provide here allows for the simultaneous treatment of optimization and diversification in one framework that enables the investor to trade off between the two, depending on the size of the available data set.
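
    The resulting optimization problem can be prototyped directly as a convex program. The sketch below uses the Rockafellar-Uryasev formulation of expected shortfall with an L2 penalty and a budget constraint, written with cvxpy on simulated returns; the return model, confidence level, and regularization strength are illustrative assumptions, and the paper's support-vector-regression view is not reproduced here.

```python
# Sketch of L2-regularised expected-shortfall portfolio optimisation via the
# Rockafellar-Uryasev formulation, on a toy sample of simulated returns.
import numpy as np
import cvxpy as cp

rng = np.random.default_rng(2)
T_obs, n_assets = 250, 20
returns = rng.normal(0.0005, 0.01, size=(T_obs, n_assets))  # toy return sample

q = 0.95          # expected-shortfall confidence level
lam = 0.1         # L2 "diversification pressure"

w = cp.Variable(n_assets)
alpha = cp.Variable()          # VaR-like auxiliary variable
losses = -returns @ w

es = alpha + cp.sum(cp.pos(losses - alpha)) / ((1 - q) * T_obs)
objective = cp.Minimize(es + lam * cp.sum_squares(w))
constraints = [cp.sum(w) == 1]  # budget constraint

cp.Problem(objective, constraints).solve()
print("regularised weights:", np.round(w.value, 3))
```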

    Local Tomography of Large Networks under the Low-Observability Regime

    This article studies the problem of reconstructing the topology of a network of interacting agents via observations of the state-evolution of the agents. We focus on the large-scale network setting with the additional constraint of partial observations, where only a small fraction of the agents can be feasibly observed. The goal is to infer the underlying subnetwork of interactions, and we refer to this problem as local tomography. In order to study the large-scale setting, we adopt a proper stochastic formulation where the unobserved part of the network is modeled as an Erdős–Rényi random graph, while the observable subnetwork is left arbitrary. The main result of this work is establishing that, under this setting, local tomography is actually possible with high probability, provided that certain conditions on the network model are met (such as stability and symmetry of the network combination matrix). Remarkably, this conclusion is established under the low-observability regime, where the cardinality of the observable subnetwork is fixed, while the size of the overall network scales to infinity. Comment: To appear in IEEE Transactions on Information Theory.
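
    The flavour of the estimation problem can be illustrated with a small simulation: generate a diffusion x(t+1) = A x(t) + noise over a partly Erdős–Rényi network, observe only a few agents, and estimate the interaction submatrix from lag-one and lag-zero sample covariances of the observed states. This is a generic Granger-style sketch under assumed parameters, not the paper's exact estimator or its guarantees.

```python
# Illustrative sketch: topology inference over an observed subset of a large
# network from state observations of x(t+1) = A x(t) + noise, via a lag-one /
# lag-zero covariance estimator restricted to the observed agents. The graph
# model, weights and sample size are toy assumptions.
import numpy as np

rng = np.random.default_rng(3)
N, n_obs_nodes, p_er, T = 100, 8, 0.08, 30_000

# Symmetric Erdos-Renyi support turned into a stable combination matrix A.
adj = np.triu((rng.random((N, N)) < p_er).astype(float), 1)
adj = adj + adj.T
A = 0.6 * adj / (adj.sum(axis=1).max() + 1.0)
np.fill_diagonal(A, 0.3)            # Gershgorin: spectral radius below 0.9

# Simulate the state evolution, keeping only the first n_obs_nodes agents.
x = np.zeros(N)
obs = np.empty((T, n_obs_nodes))
for t in range(T):
    x = A @ x + rng.normal(size=N)
    obs[t] = x[:n_obs_nodes]

# Granger-style estimator over the observed subnetwork.
R0 = obs[:-1].T @ obs[:-1] / (T - 1)
R1 = obs[1:].T @ obs[:-1] / (T - 1)
A_hat = R1 @ np.linalg.inv(R0)

print("true observed-subnetwork support:\n", adj[:n_obs_nodes, :n_obs_nodes].astype(int))
print("estimated combination submatrix:\n", np.round(A_hat, 2))
```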

    Robust Singular Smoothers For Tracking Using Low-Fidelity Data

    Tracking underwater autonomous platforms is often difficult because of noisy, biased, and discretized input data. Classic filters and smoothers based on standard assumptions of Gaussian white noise break down when presented with any of these challenges. Robust models (such as the Huber loss) and constraints (e.g. maximum velocity) are used to attenuate these issues. Here, we consider robust smoothing with singular covariance, which covers bias and correlated noise, as well as many specific model types, such as those used in navigation. In particular, we show how to combine singular covariance models with robust losses and state-space constraints in a unified framework that can handle very low-fidelity data. A noisy, biased, and discretized navigation dataset from a submerged, low-cost inertial measurement unit (IMU) package, with ultra-short baseline (USBL) data for ground truth, provides an opportunity to stress-test the proposed framework with promising results. We show how robust modeling elements improve our ability to analyze the data, and present batch processing results for 10 minutes of data with three different frequencies of available USBL position fixes (gaps of 30 seconds, 1 minute, and 2 minutes). The results suggest that the framework can be extended to real-time tracking using robust windowed estimation. Comment: 9 pages, 9 figures; to be included in Robotics: Science and Systems 201
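
    The combination of a robust loss with state constraints can be prototyped as a single convex program. The sketch below smooths a 1-D constant-velocity toy track with a Huber measurement loss and a maximum-velocity constraint using cvxpy; the singular-covariance machinery of the paper is not reproduced, and all weights and noise levels are assumptions.

```python
# Sketch of robust constrained smoothing for a 1-D constant-velocity track:
# Huber measurement loss + quadratic process term + a maximum-velocity
# constraint, solved as one convex program.
import numpy as np
import cvxpy as cp

rng = np.random.default_rng(4)
T, dt, v_max = 100, 0.1, 2.0

# Ground-truth track and noisy, occasionally-outlying position measurements.
t_grid = np.arange(T) * dt
truth = np.sin(t_grid)
y = truth + 0.05 * rng.normal(size=T)
y[::17] += 1.0                      # sporadic large outliers

pos = cp.Variable(T)
vel = cp.Variable(T)

process = cp.sum_squares(pos[1:] - pos[:-1] - dt * vel[:-1]) \
        + cp.sum_squares(vel[1:] - vel[:-1])
measurement = cp.sum(cp.huber(y - pos, M=0.1))

problem = cp.Problem(cp.Minimize(measurement + 10.0 * process),
                     [cp.abs(vel) <= v_max])
problem.solve()
print("RMSE of smoothed positions:", np.sqrt(np.mean((pos.value - truth) ** 2)))
```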

    Information processing and signal integration in bacterial quorum sensing

    Bacteria communicate using secreted chemical signaling molecules called autoinducers in a process known as quorum sensing. The quorum-sensing network of the marine bacterium Vibrio harveyi employs three autoinducers, each known to encode distinct ecological information. Yet how cells integrate and interpret the information contained within the three autoinducer signals remains a mystery. Here, we develop a new framework for analyzing signal integration based on information theory and use it to analyze quorum sensing in V. harveyi. We quantify how much the cells can learn about individual autoinducers and explain the experimentally observed input-output relation of the V. harveyi quorum-sensing circuit. Our results suggest that the need to limit interference between input signals places strong constraints on the architecture of bacterial signal-integration networks, and that bacteria likely have evolved active strategies for minimizing this interference. Here we analyze two such strategies: manipulation of autoinducer production and feedback on receptor number ratios. Comment: Supporting information is in appendix.
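
    The central quantity in such an analysis, the mutual information between an individual autoinducer and the integrated circuit output, can be estimated numerically. The snippet below assumes a toy two-input Hill-type response with additive noise and a simple histogram (plug-in) estimator; it is meant only to show the kind of calculation involved, not the paper's actual circuit model.

```python
# Numerical mutual-information estimate for a toy two-autoinducer integration
# circuit: both inputs feed one response, so each acts as interference when
# the cell tries to read out the other.
import numpy as np

rng = np.random.default_rng(5)
n_samples, n_bins = 200_000, 30

# Two autoinducer concentrations, log-uniform over an assumed range.
a1 = 10 ** rng.uniform(-2, 2, n_samples)
a2 = 10 ** rng.uniform(-2, 2, n_samples)

hill = lambda x, K: x / (x + K)
output = 0.5 * hill(a1, 1.0) + 0.5 * hill(a2, 1.0) + 0.05 * rng.normal(size=n_samples)

def mutual_information(x, y, bins):
    """Plug-in MI estimate (in bits) from a 2-D histogram."""
    joint, _, _ = np.histogram2d(x, y, bins=bins)
    p_xy = joint / joint.sum()
    p_x = p_xy.sum(axis=1, keepdims=True)
    p_y = p_xy.sum(axis=0, keepdims=True)
    mask = p_xy > 0
    return np.sum(p_xy[mask] * np.log2(p_xy[mask] / (p_x @ p_y)[mask]))

print("I(autoinducer 1; output):",
      round(mutual_information(np.log10(a1), output, n_bins), 2), "bits")
print("I(autoinducer 2; output):",
      round(mutual_information(np.log10(a2), output, n_bins), 2), "bits")
```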

    Validating Predictions of Unobserved Quantities

    The ultimate purpose of most computational models is to make predictions, commonly in support of some decision-making process (e.g., for design or operation of some system). The quantities that need to be predicted (the quantities of interest or QoIs) are generally not experimentally observable before the prediction, since otherwise no prediction would be needed. Assessing the validity of such extrapolative predictions, which is critical to informed decision-making, is challenging. In classical approaches to validation, model outputs for observed quantities are compared to observations to determine if they are consistent. By itself, this consistency only ensures that the model can predict the observed quantities under the conditions of the observations. This limitation dramatically reduces the utility of the validation effort for decision making because it implies nothing about predictions of unobserved QoIs or for scenarios outside of the range of observations. However, there is no agreement in the scientific community today regarding best practices for validation of extrapolative predictions made using computational models. The purpose of this paper is to propose and explore a validation and predictive assessment process that supports extrapolative predictions for models with known sources of error. The process includes stochastic modeling, calibration, validation, and predictive assessment phases where representations of known sources of uncertainty and error are built, informed, and tested. The proposed methodology is applied to an illustrative extrapolation problem involving a misspecified nonlinear oscillator.
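
    The phases of such a process can be caricatured in a few lines: calibrate a deliberately misspecified model together with a discrepancy term, test the resulting prediction intervals on held-out data, and only then extrapolate to the unobserved QoI. The sketch below replaces the paper's nonlinear-oscillator problem with a toy static response, and the discrepancy-growth rule is an arbitrary assumption; its purpose is to show the workflow, including how extrapolation can still fail.

```python
# Schematic calibrate -> validate -> extrapolate workflow with a misspecified
# model and an explicit discrepancy term (toy stand-in, not the paper's
# oscillator problem).
import numpy as np

rng = np.random.default_rng(6)
true_response = lambda x: 2.0 * x + 0.15 * x ** 3    # "reality": hidden cubic term
model = lambda x, k: k * x                           # misspecified linear model

# 1. Calibration at low forcing: fit the stiffness and a discrepancy spread.
x_cal = np.linspace(0.1, 1.0, 30)
y_cal = true_response(x_cal) + 0.05 * rng.normal(size=x_cal.size)
k_hat = np.sum(x_cal * y_cal) / np.sum(x_cal ** 2)   # least-squares stiffness
sigma_d = np.std(y_cal - model(x_cal, k_hat))        # discrepancy + noise spread

# Assumed (ad hoc) growth rule for the discrepancy away from the calibration range.
band = lambda x: 3 * sigma_d * (x / x_cal.max()) ** 2

# 2. Validation at moderate forcing: check prediction-interval coverage.
x_val = np.linspace(1.0, 1.5, 20)
y_val = true_response(x_val) + 0.05 * rng.normal(size=x_val.size)
covered = np.abs(y_val - model(x_val, k_hat)) < band(x_val)
print("validation coverage:", covered.mean())

# 3. Extrapolative prediction of the unobserved QoI, with its uncertainty band.
x_qoi = 2.0
print("QoI prediction: %.2f +/- %.2f   (truth: %.2f)"
      % (model(x_qoi, k_hat), band(x_qoi), true_response(x_qoi)))
```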

    Reliable ABC model choice via random forests

    Approximate Bayesian computation (ABC) methods provide an elaborate approach to Bayesian inference on complex models, including model choice. Both theoretical arguments and simulation experiments indicate, however, that model posterior probabilities may be poorly evaluated by standard ABC techniques. We propose a novel approach based on a machine learning tool named random forests to conduct selection among the highly complex models covered by ABC algorithms. We thus modify the way Bayesian model selection is both understood and operated, in that we rephrase the inferential goal as a classification problem, first predicting the model that best fits the data with random forests and postponing the approximation of the posterior probability of the predicted MAP to a second stage also relying on random forests. Compared with earlier implementations of ABC model choice, the ABC random forest approach offers several potential improvements: (i) it often has a larger discriminative power among the competing models, (ii) it is more robust against the number and choice of statistics summarizing the data, (iii) the computing effort is drastically reduced (with a gain in computation efficiency of at least a factor of fifty), and (iv) it includes an approximation of the posterior probability of the selected model. The call to random forests will undoubtedly extend the range of dataset sizes and model complexities that ABC can handle. We illustrate the power of this novel methodology by analyzing controlled experiments as well as genuine population genetics datasets. The proposed methodologies are implemented in the R package abcrf available on CRAN. Comment: 39 pages, 15 figures, 6 tables.
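
    The two-stage classification idea can be mimicked with a generic random-forest library. The sketch below assumes scikit-learn in Python rather than the abcrf R package, with two toy candidate models (Normal vs. Laplace with matched variance) and hand-picked summaries: a first forest picks the MAP model, and a second forest regresses the out-of-bag misclassification indicator on the summaries to approximate the posterior probability of the selected model.

```python
# Sketch of ABC model choice with random forests: classify the best-fitting
# model from simulated summaries, then estimate its posterior probability
# from out-of-bag errors.
import numpy as np
from sklearn.ensemble import RandomForestClassifier, RandomForestRegressor

rng = np.random.default_rng(7)
n_sim, n_obs = 10_000, 100

def summaries(data):
    return np.column_stack([data.mean(1), data.std(1), np.median(data, 1),
                            ((data - data.mean(1, keepdims=True)) ** 4).mean(1)])

# Reference table: simulate from both candidate models with random scales.
scale = rng.uniform(0.5, 2.0, n_sim)
x_normal = rng.normal(0, scale[:, None], (n_sim, n_obs))
x_laplace = rng.laplace(0, scale[:, None] / np.sqrt(2), (n_sim, n_obs))
X = np.vstack([summaries(x_normal), summaries(x_laplace)])
y = np.r_[np.zeros(n_sim, int), np.ones(n_sim, int)]   # 0 = Normal, 1 = Laplace

clf = RandomForestClassifier(n_estimators=500, oob_score=True, n_jobs=-1)
clf.fit(X, y)

# Observed data (actually Laplace) reduced to the same summaries.
obs = summaries(rng.laplace(0, 1 / np.sqrt(2), (1, n_obs)))
map_model = clf.predict(obs)[0]

# Second stage: regress the out-of-bag misclassification indicator on the
# summaries and read off 1 - error at the observed summaries.
oob_pred = clf.oob_decision_function_.argmax(1)
err = RandomForestRegressor(n_estimators=500, n_jobs=-1)
err.fit(X, (oob_pred != y).astype(float))

print("selected model:", "Laplace" if map_model else "Normal")
print("approx. posterior probability:", 1 - err.predict(obs)[0])
```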