ABC random forests for Bayesian parameter inference
This preprint has been reviewed and recommended by Peer Community In
Evolutionary Biology (http://dx.doi.org/10.24072/pci.evolbiol.100036).
Approximate Bayesian computation (ABC) has grown into a standard methodology
that handles Bayesian inference for models associated with intractable
likelihood functions. Most ABC implementations require the preliminary
selection of a vector of informative statistics summarizing raw data.
Furthermore, in almost all existing implementations, the tolerance level that
separates acceptance from rejection of simulated parameter values needs to be
calibrated. We propose to conduct likelihood-free Bayesian inferences about
parameters with no prior selection of the relevant components of the summary
statistics and bypassing the derivation of the associated tolerance level. The
approach relies on the random forest methodology of Breiman (2001) applied in a
(nonparametric) regression setting. We advocate the derivation of a new random
forest for each component of the parameter vector of interest. When compared
with earlier ABC solutions, this method offers significant gains in terms of
robustness to the choice of the summary statistics, does not depend on any type
of tolerance level, and strikes a good trade-off between point-estimator
precision and the quality of credible-interval estimates for a given computing
time. We illustrate the performance of our methodological proposal and compare
it with earlier ABC methods on a Normal toy example and a population genetics
example dealing with human population evolution. All methods designed here have
been incorporated in the R package abcrf (version 1.7) available on CRAN.
Comment: Main text: 24 pages, 6 figures. Supplementary Information: 14 pages, 5 figures.
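To make the per-parameter regression idea concrete, here is a minimal sketch using scikit-learn's RandomForestRegressor rather than the authors' abcrf R package; the Normal toy model, the priors, and the summaries function are illustrative assumptions, not the paper's exact setup.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)

# Toy Normal model: data are n i.i.d. N(mu, sigma^2) draws; the summaries
# mix informative and deliberately uninformative statistics.
def summaries(x):
    return np.array([x.mean(), x.var(), np.median(x),
                     np.percentile(x, 90), (x ** 3).mean()])

n_obs, n_train = 50, 10_000
x_obs = rng.normal(1.5, 2.0, n_obs)        # pseudo-observed data
s_obs = summaries(x_obs)[None, :]

# Reference table: draw (mu, sigma) from the prior, simulate, summarize.
mu = rng.normal(0.0, 5.0, n_train)         # prior on mu
sigma = rng.uniform(0.1, 5.0, n_train)     # prior on sigma
S = np.array([summaries(rng.normal(m, s, n_obs)) for m, s in zip(mu, sigma)])

# One regression forest per component of the parameter vector, mirroring
# the abstract's proposal; no summary selection or tolerance level needed.
for name, theta in {"mu": mu, "sigma": sigma}.items():
    rf = RandomForestRegressor(n_estimators=200, min_samples_leaf=5).fit(S, theta)
    print(name, "estimate:", rf.predict(s_obs)[0])
```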
Learning and Designing Stochastic Processes from Logical Constraints
Stochastic processes offer a flexible mathematical formalism to model and
reason about systems. Most analysis tools, however, start from the premise
that models are fully specified, so that any parameters controlling the
system's dynamics must be known exactly. As this is seldom the case, many
methods have been devised over the last decade to infer (learn) such parameters
from observations of the state of the system. In this paper, we depart from
this approach by assuming that our observations are qualitative
properties encoded as satisfaction of linear temporal logic formulae, as
opposed to quantitative observations of the state of the system. An important
feature of this approach is that it naturally unifies the system identification
and the system design problems, where the properties, instead of observations,
represent requirements to be satisfied. We develop a principled statistical
estimation procedure based on maximising the likelihood of the system's
parameters, using recent ideas from statistical machine learning. We
demonstrate the efficacy and broad applicability of our method on a range of
simple but non-trivial examples, including rumour spreading in social networks
and hybrid models of gene regulation.
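As a rough sketch of what learning from qualitative observations means, the toy below treats each observation as a Boolean outcome of checking a temporal property and maximises the resulting Bernoulli likelihood over a parameter grid, with the satisfaction probability estimated by plain Monte Carlo; the rumour model, the property, and the grid search are invented stand-ins for the paper's smoother statistical machinery.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy rumour-spreading (SI) model on N agents with contact rate k.
# The qualitative property (a stand-in for an LTL formula) is
# "at least 80% of agents are informed by time T".
def property_holds(k, N=50, T=5.0):
    informed, t = 1, 0.0
    while informed < N:
        rate = k * informed * (N - informed) / N
        t += rng.exponential(1.0 / rate)
        if t > T:
            break
        informed += 1
    return informed >= 0.8 * N

def sat_probability(k, n_runs=200):
    return np.mean([property_holds(k) for _ in range(n_runs)])

# Qualitative observations: outcomes of checking the property on the
# real system, synthesized here with true k = 0.8.
y = np.array([property_holds(0.8) for _ in range(40)])

# Maximise the Bernoulli log-likelihood over a parameter grid.
def loglik(k):
    p = np.clip(sat_probability(k), 1e-6, 1 - 1e-6)
    return y.sum() * np.log(p) + (len(y) - y.sum()) * np.log(1 - p)

k_hat = max(np.linspace(0.2, 2.0, 19), key=loglik)
print("maximum-likelihood estimate of k:", k_hat)
```

For the design problem, the same likelihood is read in reverse: the Boolean outcomes become requirements, and the parameters are chosen to make their satisfaction probability high.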
Regularizing Portfolio Optimization
The optimization of large portfolios displays an inherent instability to
estimation error. This poses a fundamental problem, because solutions that are
not stable under sample fluctuations may look optimal for a given sample, but
are, in effect, very far from optimal with respect to the average risk. In this
paper, we approach the problem from the point of view of statistical learning
theory. The occurrence of the instability is intimately related to over-fitting
which can be avoided using known regularization methods. We show how
regularized portfolio optimization with the expected shortfall as a risk
measure is related to support vector regression. The budget constraint dictates
a modification. We present the resulting optimization problem and discuss the
solution. The L2 norm of the weight vector is used as a regularizer, which
corresponds to a diversification "pressure". This means that diversification,
besides counteracting downward fluctuations in some assets by upward
fluctuations in others, is also crucial because it improves the stability of
the solution. The approach we provide here allows for the simultaneous
treatment of optimization and diversification in one framework that enables the
investor to trade-off between the two, depending on the size of the available
data set.
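A minimal sketch of the optimization the abstract describes, assuming the standard Rockafellar-Uryasev sample formulation of expected shortfall, an L2 penalty on the weights, and a budget constraint; cvxpy and the simulated return scenarios are illustrative choices, not the paper's code.

```python
import cvxpy as cp
import numpy as np

rng = np.random.default_rng(2)
N, T, alpha, lam = 20, 250, 0.95, 0.1    # assets, scenarios, ES level, penalty
R = rng.normal(0.0005, 0.01, (T, N))     # simulated return scenarios

w = cp.Variable(N)   # portfolio weights
t = cp.Variable()    # VaR-like auxiliary variable

# Rockafellar-Uryasev sample estimate of expected shortfall, plus an
# L2 penalty on the weights acting as "diversification pressure".
shortfall = t + cp.sum(cp.pos(-R @ w - t)) / ((1 - alpha) * T)
problem = cp.Problem(cp.Minimize(shortfall + lam * cp.sum_squares(w)),
                     [cp.sum(w) == 1])   # budget constraint
problem.solve()
print("regularized weights:", np.round(w.value, 3))
```

Increasing lam trades in-sample optimality for stability of the weights, which is precisely the optimization-versus-diversification trade-off the abstract describes.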
The Credit Problem in parametric stress: A probabilistic approach
In this paper, we introduce a novel domain-general, statistical learning model for P&P grammars: the Expectation Driven Parameter Learner (EDPL). We show that the EDPL provides a mathematically principled solution to the Credit Problem (Dresher 1999). We present the first systematic tests of the EDPL and an existing and closely related model, the Naïve Parameter Learner (NPL), on a full stress typology, the one generated by Dresher & Kaye's (1990) stress parameter framework. This framework has figured prominently in the debate about the necessity of domain-specific mechanisms for learning of parametric stress. The essential difference between the two learning models is that the EDPL incorporates a mechanism that directly tackles the Credit Problem, while the NPL does not. We find that the NPL fails to cope with the ambiguity of this stress system both in terms of learning success and data complexity, while the EDPL performs well on both metrics. Based on these results, we argue that probabilistic inference provides a viable domain-general approach to parametric stress learning, but only when learning involves an inferential process that directly addresses the Credit Problem. We also present in-depth analyses of the learning outcomes, showing how learning outcomes depend crucially on the structural ambiguities posited by a particular phonological theory, and how these learning difficulties correspond to typological gaps
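To see where the Credit Problem bites, here is a hypothetical sketch contrasting an NPL-style update, which rewards every parameter value in whatever grammar happened to be sampled, with an expectation-driven update that credits each parameter by its posterior given the datum; the `parses` predicate and the toy data are invented stand-ins for a real parametric stress grammar.

```python
import numpy as np

rng = np.random.default_rng(3)
n_params = 5

# Hypothetical stand-in for a P&P grammar: a datum is compatible with a
# parameter setting iff the setting matches the datum's mask wherever the
# mask is specified (None = either value works). A real implementation
# would call a parser for the stress system instead.
def parses(setting, datum):
    return all(m is None or s == m for s, m in zip(setting, datum))

data = [(1, None, 0, None, None), (1, 1, 0, None, 0)]  # toy ambiguous data
p = np.full(n_params, 0.5)        # P(parameter_i = 1)

def npl_update(p, datum, rate=0.05):
    # NPL: sample one grammar; on success, nudge every sampled value,
    # relevant or not -- this is where the Credit Problem arises.
    g = (rng.random(n_params) < p).astype(int)
    return p + rate * (g - p) if parses(g, datum) else p

def edpl_update(p, datum, rate=0.05):
    # EDPL-style: credit each parameter by its posterior expectation
    # given that the datum is parsed (computed here by enumeration).
    settings = [(i >> np.arange(n_params)) & 1 for i in range(2 ** n_params)]
    good = [s for s in settings if parses(s, datum)]
    w = [np.prod(np.where(s == 1, p, 1 - p)) for s in good]
    post = np.average(good, axis=0, weights=w)
    return p + rate * (post - p)

for _ in range(500):
    for d in data:
        p = edpl_update(p, d)     # swap in npl_update to compare
print("learned P(param=1):", np.round(p, 2))
```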
Local Tomography of Large Networks under the Low-Observability Regime
This article studies the problem of reconstructing the topology of a network
of interacting agents via observations of the state-evolution of the agents. We
focus on the large-scale network setting with the additional constraint of
partial observations, where only a small fraction of the agents can be
feasibly observed. The goal is to infer the underlying subnetwork of
interactions, and we refer to this problem as local tomography. In order to
study the large-scale setting, we adopt a proper stochastic formulation where
the unobserved part of the network is modeled as an Erdős-Rényi random
graph, while the observable subnetwork is left arbitrary. The main result of
this work is establishing that, under this setting, local tomography is
actually possible with high probability, provided that certain conditions on
the network model are met (such as stability and symmetry of the network
combination matrix). Remarkably, such a conclusion is established under the
low-observability regime, where the cardinality of the observable
subnetwork is fixed, while the size of the overall network scales to infinity.
Comment: To appear in IEEE Transactions on Information Theory.
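A toy numerical illustration of the setting, assuming first-order linear diffusion dynamics, a least-squares regression of observed next-states on observed current states, and a crude threshold; the paper's estimators and its conditions on the combination matrix are considerably more refined.

```python
import numpy as np

rng = np.random.default_rng(4)
N, S, T, p_er = 200, 15, 20_000, 0.05   # agents, observed, samples, ER prob

# Latent network: hidden part Erdős-Rényi, then scaled for stability.
A = (rng.random((N, N)) < p_er).astype(float)
np.fill_diagonal(A, 0.0)
A = 0.9 * A / max(1.0, np.abs(np.linalg.eigvals(A)).max())

# First-order diffusion dynamics x(t+1) = A x(t) + noise.
X = np.zeros((T, N))
for t in range(T - 1):
    X[t + 1] = A @ X[t] + rng.normal(0, 1, N)

# Local tomography: regress observed agents' next states on observed
# current states only, then threshold to recover the subnetwork support.
Xo = X[:, :S]
A_hat, *_ = np.linalg.lstsq(Xo[:-1], Xo[1:], rcond=None)
support = np.abs(A_hat.T) > 0.5 * np.abs(A_hat).max()
print("support agreement on the observed block:",
      (support == (A[:S, :S] > 0)).mean())
```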
Robust Singular Smoothers For Tracking Using Low-Fidelity Data
Tracking underwater autonomous platforms is often difficult because of noisy,
biased, and discretized input data. Classic filters and smoothers based on
standard assumptions of Gaussian white noise break down when presented with any
of these challenges. Robust models (such as the Huber loss) and constraints
(e.g. maximum velocity) are used to attenuate these issues. Here, we consider
robust smoothing with singular covariance, which covers bias and correlated
noise, as well as many specific model types, such as those used in navigation.
In particular, we show how to combine singular covariance models with robust
losses and state-space constraints in a unified framework that can handle very
low-fidelity data. A noisy, biased, and discretized navigation dataset from a
submerged, low-cost inertial measurement unit (IMU) package, with ultra short
baseline (USBL) data for ground truth, provides an opportunity to stress-test
the proposed framework with promising results. We show how robust modeling
elements improve our ability to analyze the data, and present batch processing
results for 10 minutes of data with three different frequencies of available
USBL position fixes (gaps of 30 seconds, 1 minute, and 2 minutes). The results
suggest that the framework can be extended to real-time tracking using robust
windowed estimation.
Comment: 9 pages, 9 figures, to be included in Robotics: Science and Systems 2019.
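As a simplified sketch of the robust-smoothing ingredient alone (Huber loss, no singular covariance or state constraints), the whole trajectory can be estimated as one batch nonlinear least-squares problem; the constant-velocity model, noise scales, and scipy solver below are illustrative assumptions.

```python
import numpy as np
from scipy.optimize import least_squares

rng = np.random.default_rng(5)
T, dt = 200, 0.1
# Constant-velocity ground truth; occasional large outliers stand in for
# biased, low-fidelity sensor data.
truth = np.cumsum(np.full(T, 1.0) * dt)
y = truth + rng.normal(0, 0.05, T)
y[rng.choice(T, 15, replace=False)] += rng.normal(0, 2.0, 15)

def residuals(z):
    pos, vel = z[:T], z[T:]
    r_meas = (y - pos) / 0.05                   # measurement residuals
    r_proc = np.concatenate(                    # process-model residuals
        [(pos[1:] - pos[:-1] - vel[:-1] * dt) / 0.01,
         (vel[1:] - vel[:-1]) / 0.1])
    return np.concatenate([r_meas, r_proc])

# Huber loss downweights the outliers that would wreck a Gaussian smoother.
fit = least_squares(residuals, np.concatenate([y, np.zeros(T)]),
                    loss="huber", f_scale=2.0)
pos_hat = fit.x[:T]
print("RMSE vs truth:", np.sqrt(np.mean((pos_hat - truth) ** 2)))
```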
Information processing and signal integration in bacterial quorum sensing
Bacteria communicate using secreted chemical signaling molecules called
autoinducers in a process known as quorum sensing. The quorum-sensing network
of the marine bacterium Vibrio harveyi employs three autoinducers, each
known to encode distinct ecological information. Yet how cells integrate and
interpret the information contained within the three autoinducer signals
remains a mystery. Here, we develop a new framework for analyzing signal
integration based on Information Theory and use it to analyze quorum sensing in
V. harveyi. We quantify how much the cells can learn about individual
autoinducers and explain the experimentally observed input-output relation of
the V. harveyi quorum-sensing circuit. Our results suggest that the need
to limit interference between input signals places strong constraints on the
architecture of bacterial signal-integration networks, and that bacteria likely
have evolved active strategies for minimizing this interference. Here we
analyze two such strategies: manipulation of autoinducer production and
feedback on receptor number ratios.
Comment: Supporting information is in the appendix.
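A cartoon of the information-theoretic analysis: estimate the mutual information between each autoinducer input and a noisy integrated output from the empirical joint distribution; the binary inputs, the channel model, and the thresholds below are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(6)
n = 200_000

# Two binary autoinducer inputs feeding a noisy, discretized integrated
# response -- a cartoon of a shared signal-integration channel.
a1 = rng.integers(0, 2, n)
a2 = rng.integers(0, 2, n)
out = np.digitize(a1 + 0.6 * a2 + rng.normal(0, 0.4, n), [0.3, 0.8, 1.3])

def mutual_info(x, y):
    # Plug-in estimate of I(X;Y) in bits from the empirical joint.
    joint = np.histogram2d(x, y, bins=(np.unique(x).size,
                                       np.unique(y).size))[0]
    p = joint / joint.sum()
    px, py = p.sum(1, keepdims=True), p.sum(0, keepdims=True)
    nz = p > 0
    return float((p[nz] * np.log2(p[nz] / (px @ py)[nz])).sum())

# How much does the shared output say about each signal separately?
print("I(output; AI-1) =", round(mutual_info(out, a1), 3), "bits")
print("I(output; AI-2) =", round(mutual_info(out, a2), 3), "bits")
```

Interference in this cartoon shows up as the two mutual informations trading off against each other as the weights and thresholds change.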
Validating Predictions of Unobserved Quantities
The ultimate purpose of most computational models is to make predictions,
commonly in support of some decision-making process (e.g., for design or
operation of some system). The quantities that need to be predicted (the
quantities of interest or QoIs) are generally not experimentally observable
before the prediction, since otherwise no prediction would be needed. Assessing
the validity of such extrapolative predictions, which is critical to informed
decision-making, is challenging. In classical approaches to validation, model
outputs for observed quantities are compared to observations to determine if
they are consistent. By itself, this consistency only ensures that the model
can predict the observed quantities under the conditions of the observations.
This limitation dramatically reduces the utility of the validation effort for
decision making because it implies nothing about predictions of unobserved QoIs
or for scenarios outside of the range of observations. However, there is no
agreement in the scientific community today regarding best practices for
validation of extrapolative predictions made using computational models. The
purpose of this paper is to propose and explore a validation and predictive
assessment process that supports extrapolative predictions for models with
known sources of error. The process includes stochastic modeling, calibration,
validation, and predictive assessment phases where representations of known
sources of uncertainty and error are built, informed, and tested. The proposed
methodology is applied to an illustrative extrapolation problem involving a
misspecified nonlinear oscillator.
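The flavor of the proposed process can be sketched with a deliberately misspecified model: calibrate on the observed regime, check consistency there, then observe that propagated parameter uncertainty alone understates the error at an extrapolative QoI. The oscillator, noise levels, and QoI below are invented toys, not the paper's example.

```python
import numpy as np
from scipy.optimize import curve_fit

rng = np.random.default_rng(7)

def true_system(t):                      # the "real" weakly nonlinear system
    return np.sin(t) + 0.15 * np.sin(3 * t)

def model(t, a, w):                      # the analyst's misspecified model
    return a * np.sin(w * t)

# Calibration phase: fit on the observed window [0, 4].
t_cal = np.linspace(0, 4, 60)
y_cal = true_system(t_cal) + rng.normal(0, 0.05, t_cal.size)
theta, cov = curve_fit(model, t_cal, y_cal, p0=[1.0, 1.0])

# Validation phase: held-out data in the same regime look consistent.
t_val = np.linspace(0.1, 3.9, 30)
y_val = true_system(t_val) + rng.normal(0, 0.05, t_val.size)
print("validation RMS:", np.sqrt(np.mean((y_val - model(t_val, *theta)) ** 2)))

# Predictive assessment: at an extrapolative QoI, parameter uncertainty
# alone understates the true error of the misspecified model.
t_qoi = 9.0
preds = np.array([model(t_qoi, *d)
                  for d in rng.multivariate_normal(theta, cov, 2000)])
print("QoI:", preds.mean(), "+/-", preds.std(), "truth:", true_system(t_qoi))
```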
Reliable ABC model choice via random forests
Approximate Bayesian computation (ABC) methods provide an elaborate approach
to Bayesian inference on complex models, including model choice. Both
theoretical arguments and simulation experiments indicate, however, that model
posterior probabilities may be poorly evaluated by standard ABC techniques. We
propose a novel approach based on a machine learning tool named random forests
to conduct selection among the highly complex models covered by ABC algorithms.
We thus modify the way Bayesian model selection is both understood and
operated, in that we rephrase the inferential goal as a classification problem,
first predicting the model that best fits the data with random forests and
postponing the approximation of the posterior probability of the predicted MAP model
for a second stage also relying on random forests. Compared with earlier
implementations of ABC model choice, the ABC random forest approach offers
several potential improvements: (i) it often has a larger discriminative power
among the competing models, (ii) it is more robust against the number and
choice of statistics summarizing the data, (iii) the computing effort is
drastically reduced (with a gain in computational efficiency of at least a factor of fifty),
and (iv) it includes an approximation of the posterior probability of the
selected model. The call to random forests will undoubtedly extend the range
of dataset sizes and model complexities that ABC can handle. We illustrate
the power of this novel methodology by analyzing controlled experiments as well
as genuine population genetics datasets. The proposed methodologies are
implemented in the R package abcrf available on CRAN.
Comment: 39 pages, 15 figures, 6 tables.
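A minimal sketch of the two-stage idea using scikit-learn rather than abcrf: a classification forest picks the model that best fits the data, and a regression forest fit to the out-of-bag misclassification indicator approximates the posterior probability of that choice; the two toy models and the summaries are illustrative assumptions, not the paper's experiments.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier, RandomForestRegressor

rng = np.random.default_rng(8)
n_per_model, n_obs = 5000, 30

def summaries(x):
    return np.array([x.mean(), x.std(), np.abs(x - np.median(x)).mean()])

# Reference table for two competing models (Normal vs Laplace location).
S, labels = [], []
for m, sampler in enumerate([rng.normal, rng.laplace]):
    for _ in range(n_per_model):
        theta = rng.uniform(-2, 2)          # prior on the location
        S.append(summaries(sampler(theta, 1.0, n_obs)))
        labels.append(m)
S, labels = np.array(S), np.array(labels)

# Stage 1: a classification forest picks the MAP model -- no tolerance level.
clf = RandomForestClassifier(n_estimators=500, oob_score=True).fit(S, labels)
s_obs = summaries(rng.laplace(0.5, 1.0, n_obs))[None, :]
map_model = clf.predict(s_obs)[0]

# Stage 2: a regression forest trained on the out-of-bag misclassification
# indicator estimates the local error rate, and hence the posterior
# probability of the selected model.
oob_error = (clf.oob_decision_function_.argmax(1) != labels).astype(float)
reg = RandomForestRegressor(n_estimators=500).fit(S, oob_error)
print("MAP model:", map_model,
      " approx. posterior probability:", 1 - reg.predict(s_obs)[0])
```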