ABC random forests for Bayesian parameter inference
This preprint has been reviewed and recommended by Peer Community In
Evolutionary Biology (http://dx.doi.org/10.24072/pci.evolbiol.100036).
Approximate Bayesian computation (ABC) has grown into a standard methodology
that handles Bayesian inference for models associated with intractable
likelihood functions. Most ABC implementations require the preliminary
selection of a vector of informative statistics summarizing raw data.
Furthermore, in almost all existing implementations, the tolerance level that
separates acceptance from rejection of simulated parameter values needs to be
calibrated. We propose to conduct likelihood-free Bayesian inferences about
parameters with no prior selection of the relevant components of the summary
statistics and bypassing the derivation of the associated tolerance level. The
approach relies on the random forest methodology of Breiman (2001) applied in a
(nonparametric) regression setting. We advocate the derivation of a new random
forest for each component of the parameter vector of interest. When compared
with earlier ABC solutions, this method offers significant gains in terms of
robustness to the choice of the summary statistics, does not depend on any type
of tolerance level, and strikes a good trade-off between point-estimator
precision and the quality of credible-interval estimates for a given computing
time. We illustrate the performance of our methodological proposal and compare
it with earlier ABC methods on a Normal toy example and a population genetics
example dealing with human population evolution. All methods designed here have
been incorporated in the R package abcrf (version 1.7) available on CRAN.
Comment: Main text: 24 pages, 6 figures. Supplementary Information: 14 pages, 5 figures.
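To make the per-parameter regression idea concrete, here is a minimal sketch using scikit-learn's RandomForestRegressor rather than the authors' abcrf R package; the Normal toy model, the priors, and the summaries function are illustrative assumptions, not the paper's exact setup.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)

# Toy Normal model: data are n i.i.d. N(mu, sigma^2) draws; the summaries
# mix informative and deliberately uninformative statistics.
def summaries(x):
    return np.array([x.mean(), x.var(), np.median(x),
                     np.percentile(x, 90), (x ** 3).mean()])

n_obs, n_train = 50, 10_000
x_obs = rng.normal(1.5, 2.0, n_obs)        # pseudo-observed data
s_obs = summaries(x_obs)[None, :]

# Reference table: draw (mu, sigma) from the prior, simulate, summarize.
mu = rng.normal(0.0, 5.0, n_train)         # prior on mu
sigma = rng.uniform(0.1, 5.0, n_train)     # prior on sigma
S = np.array([summaries(rng.normal(m, s, n_obs)) for m, s in zip(mu, sigma)])

# One regression forest per component of the parameter vector, mirroring
# the abstract's proposal; no summary selection or tolerance level needed.
for name, theta in {"mu": mu, "sigma": sigma}.items():
    rf = RandomForestRegressor(n_estimators=200, min_samples_leaf=5).fit(S, theta)
    print(name, "estimate:", rf.predict(s_obs)[0])
```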
Learning and Designing Stochastic Processes from Logical Constraints
Stochastic processes offer a flexible mathematical formalism to model and
reason about systems. Most analysis tools, however, start from the premise
that models are fully specified, so that any parameters controlling the
system's dynamics must be known exactly. As this is seldom the case, many
methods have been devised over the last decade to infer (learn) such parameters
from observations of the state of the system. In this paper, we depart from
this approach by assuming that our observations are qualitative
properties encoded as satisfaction of linear temporal logic formulae, as
opposed to quantitative observations of the state of the system. An important
feature of this approach is that it naturally unifies the system identification
and the system design problems, where the properties, instead of observations,
represent requirements to be satisfied. We develop a principled statistical
estimation procedure based on maximising the likelihood of the system's
parameters, using recent ideas from statistical machine learning. We
demonstrate the efficacy and broad applicability of our method on a range of
simple but non-trivial examples, including rumour spreading in social networks
and hybrid models of gene regulation.
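As a rough sketch of what learning from qualitative observations means, the toy below treats each observation as a Boolean outcome of checking a temporal property and maximises the resulting Bernoulli likelihood over a parameter grid, with the satisfaction probability estimated by plain Monte Carlo; the rumour model, the property, and the grid search are invented stand-ins for the paper's smoother statistical machinery.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy rumour-spreading (SI) model on N agents with contact rate k.
# The qualitative property (a stand-in for an LTL formula) is
# "at least 80% of agents are informed by time T".
def property_holds(k, N=50, T=5.0):
    informed, t = 1, 0.0
    while informed < N:
        rate = k * informed * (N - informed) / N
        t += rng.exponential(1.0 / rate)
        if t > T:
            break
        informed += 1
    return informed >= 0.8 * N

def sat_probability(k, n_runs=200):
    return np.mean([property_holds(k) for _ in range(n_runs)])

# Qualitative observations: outcomes of checking the property on the
# real system, synthesized here with true k = 0.8.
y = np.array([property_holds(0.8) for _ in range(40)])

# Maximise the Bernoulli log-likelihood over a parameter grid.
def loglik(k):
    p = np.clip(sat_probability(k), 1e-6, 1 - 1e-6)
    return y.sum() * np.log(p) + (len(y) - y.sum()) * np.log(1 - p)

k_hat = max(np.linspace(0.2, 2.0, 19), key=loglik)
print("maximum-likelihood estimate of k:", k_hat)
```

For the design problem, the same likelihood is read in reverse: the Boolean outcomes become requirements, and the parameters are chosen to make their satisfaction probability high.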
Regularizing Portfolio Optimization
The optimization of large portfolios displays an inherent instability to
estimation error. This poses a fundamental problem, because solutions that are
not stable under sample fluctuations may look optimal for a given sample, but
are, in effect, very far from optimal with respect to the average risk. In this
paper, we approach the problem from the point of view of statistical learning
theory. The occurrence of the instability is intimately related to over-fitting
which can be avoided using known regularization methods. We show how
regularized portfolio optimization with the expected shortfall as a risk
measure is related to support vector regression. The budget constraint dictates
a modification. We present the resulting optimization problem and discuss the
solution. The L2 norm of the weight vector is used as a regularizer, which
corresponds to a diversification "pressure". This means that diversification,
besides counteracting downward fluctuations in some assets by upward
fluctuations in others, is also crucial because it improves the stability of
the solution. The approach we provide here allows for the simultaneous
treatment of optimization and diversification in one framework that enables the
investor to trade-off between the two, depending on the size of the available
data set.
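A minimal sketch of the optimization the abstract describes, assuming the standard Rockafellar-Uryasev sample formulation of expected shortfall, an L2 penalty on the weights, and a budget constraint; cvxpy and the simulated return scenarios are illustrative choices, not the paper's code.

```python
import cvxpy as cp
import numpy as np

rng = np.random.default_rng(2)
N, T, alpha, lam = 20, 250, 0.95, 0.1    # assets, scenarios, ES level, penalty
R = rng.normal(0.0005, 0.01, (T, N))     # simulated return scenarios

w = cp.Variable(N)   # portfolio weights
t = cp.Variable()    # VaR-like auxiliary variable

# Rockafellar-Uryasev sample estimate of expected shortfall, plus an
# L2 penalty on the weights acting as "diversification pressure".
shortfall = t + cp.sum(cp.pos(-R @ w - t)) / ((1 - alpha) * T)
problem = cp.Problem(cp.Minimize(shortfall + lam * cp.sum_squares(w)),
                     [cp.sum(w) == 1])   # budget constraint
problem.solve()
print("regularized weights:", np.round(w.value, 3))
```

Increasing lam trades in-sample optimality for stability of the weights, which is precisely the optimization-versus-diversification trade-off the abstract describes.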
The Credit Problem in parametric stress: A probabilistic approach
In this paper, we introduce a novel domain-general, statistical learning model for P&P grammars: the Expectation Driven Parameter Learner (EDPL). We show that the EDPL provides a mathematically principled solution to the Credit Problem (Dresher 1999). We present the first systematic tests of the EDPL and an existing and closely related model, the Naïve Parameter Learner (NPL), on a full stress typology, the one generated by Dresher & Kaye's (1990) stress parameter framework. This framework has figured prominently in the debate about the necessity of domain-specific mechanisms for learning of parametric stress. The essential difference between the two learning models is that the EDPL incorporates a mechanism that directly tackles the Credit Problem, while the NPL does not. We find that the NPL fails to cope with the ambiguity of this stress system both in terms of learning success and data complexity, while the EDPL performs well on both metrics. Based on these results, we argue that probabilistic inference provides a viable domain-general approach to parametric stress learning, but only when learning involves an inferential process that directly addresses the Credit Problem. We also present in-depth analyses of the learning outcomes, showing how learning outcomes depend crucially on the structural ambiguities posited by a particular phonological theory, and how these learning difficulties correspond to typological gaps
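To see where the Credit Problem bites, here is a hypothetical sketch contrasting an NPL-style update, which rewards every parameter value in whatever grammar happened to be sampled, with an expectation-driven update that credits each parameter by its posterior given the datum; the `parses` predicate and the toy data are invented stand-ins for a real parametric stress grammar.

```python
import numpy as np

rng = np.random.default_rng(3)
n_params = 5

# Hypothetical stand-in for a P&P grammar: a datum is compatible with a
# parameter setting iff the setting matches the datum's mask wherever the
# mask is specified (None = either value works). A real implementation
# would call a parser for the stress system instead.
def parses(setting, datum):
    return all(m is None or s == m for s, m in zip(setting, datum))

data = [(1, None, 0, None, None), (1, 1, 0, None, 0)]  # toy ambiguous data
p = np.full(n_params, 0.5)        # P(parameter_i = 1)

def npl_update(p, datum, rate=0.05):
    # NPL: sample one grammar; on success, nudge every sampled value,
    # relevant or not -- this is where the Credit Problem arises.
    g = (rng.random(n_params) < p).astype(int)
    return p + rate * (g - p) if parses(g, datum) else p

def edpl_update(p, datum, rate=0.05):
    # EDPL-style: credit each parameter by its posterior expectation
    # given that the datum is parsed (computed here by enumeration).
    settings = [(i >> np.arange(n_params)) & 1 for i in range(2 ** n_params)]
    good = [s for s in settings if parses(s, datum)]
    w = [np.prod(np.where(s == 1, p, 1 - p)) for s in good]
    post = np.average(good, axis=0, weights=w)
    return p + rate * (post - p)

for _ in range(500):
    for d in data:
        p = edpl_update(p, d)     # swap in npl_update to compare
print("learned P(param=1):", np.round(p, 2))
```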
Local Tomography of Large Networks under the Low-Observability Regime
This article studies the problem of reconstructing the topology of a network
of interacting agents via observations of the state-evolution of the agents. We
focus on the large-scale network setting with the additional constraint of
partial observations, where only a small fraction of the agents can be
feasibly observed. The goal is to infer the underlying subnetwork of
interactions, and we refer to this problem as local tomography. In order to
study the large-scale setting, we adopt a proper stochastic formulation where
the unobserved part of the network is modeled as an Erdős-Rényi random
graph, while the observable subnetwork is left arbitrary. The main result of
this work is establishing that, under this setting, local tomography is
actually possible with high probability, provided that certain conditions on
the network model are met (such as stability and symmetry of the network
combination matrix). Remarkably, such a conclusion is established under the
low-observability regime, where the cardinality of the observable
subnetwork is fixed, while the size of the overall network scales to infinity.
Comment: To appear in IEEE Transactions on Information Theory.
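A toy numerical illustration of the setting, assuming first-order linear diffusion dynamics, a least-squares regression of observed next-states on observed current states, and a crude threshold; the paper's estimators and its conditions on the combination matrix are considerably more refined.

```python
import numpy as np

rng = np.random.default_rng(4)
N, S, T, p_er = 200, 15, 20_000, 0.05   # agents, observed, samples, ER prob

# Latent network: hidden part Erdős-Rényi, then scaled for stability.
A = (rng.random((N, N)) < p_er).astype(float)
np.fill_diagonal(A, 0.0)
A = 0.9 * A / max(1.0, np.abs(np.linalg.eigvals(A)).max())

# First-order diffusion dynamics x(t+1) = A x(t) + noise.
X = np.zeros((T, N))
for t in range(T - 1):
    X[t + 1] = A @ X[t] + rng.normal(0, 1, N)

# Local tomography: regress observed agents' next states on observed
# current states only, then threshold to recover the subnetwork support.
Xo = X[:, :S]
A_hat, *_ = np.linalg.lstsq(Xo[:-1], Xo[1:], rcond=None)
support = np.abs(A_hat.T) > 0.5 * np.abs(A_hat).max()
print("support agreement on the observed block:",
      (support == (A[:S, :S] > 0)).mean())
```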
Robust Singular Smoothers For Tracking Using Low-Fidelity Data
Tracking underwater autonomous platforms is often difficult because of noisy,
biased, and discretized input data. Classic filters and smoothers based on
standard assumptions of Gaussian white noise break down when presented with any
of these challenges. Robust models (such as the Huber loss) and constraints
(e.g. maximum velocity) are used to attenuate these issues. Here, we consider
robust smoothing with singular covariance, which covers bias and correlated
noise, as well as many specific model types, such as those used in navigation.
In particular, we show how to combine singular covariance models with robust
losses and state-space constraints in a unified framework that can handle very
low-fidelity data. A noisy, biased, and discretized navigation dataset from a
submerged, low-cost inertial measurement unit (IMU) package, with ultra short
baseline (USBL) data for ground truth, provides an opportunity to stress-test
the proposed framework with promising results. We show how robust modeling
elements improve our ability to analyze the data, and present batch processing
results for 10 minutes of data with three different frequencies of available
USBL position fixes (gaps of 30 seconds, 1 minute, and 2 minutes). The results
suggest that the framework can be extended to real-time tracking using robust
windowed estimation.
Comment: 9 pages, 9 figures, to be included in Robotics: Science and Systems 2019.
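As a simplified sketch of the robust-smoothing ingredient alone (Huber loss, no singular covariance or state constraints), the whole trajectory can be estimated as one batch nonlinear least-squares problem; the constant-velocity model, noise scales, and scipy solver below are illustrative assumptions.

```python
import numpy as np
from scipy.optimize import least_squares

rng = np.random.default_rng(5)
T, dt = 200, 0.1
# Constant-velocity ground truth; occasional large outliers stand in for
# biased, low-fidelity sensor data.
truth = np.cumsum(np.full(T, 1.0) * dt)
y = truth + rng.normal(0, 0.05, T)
y[rng.choice(T, 15, replace=False)] += rng.normal(0, 2.0, 15)

def residuals(z):
    pos, vel = z[:T], z[T:]
    r_meas = (y - pos) / 0.05                   # measurement residuals
    r_proc = np.concatenate(                    # process-model residuals
        [(pos[1:] - pos[:-1] - vel[:-1] * dt) / 0.01,
         (vel[1:] - vel[:-1]) / 0.1])
    return np.concatenate([r_meas, r_proc])

# Huber loss downweights the outliers that would wreck a Gaussian smoother.
fit = least_squares(residuals, np.concatenate([y, np.zeros(T)]),
                    loss="huber", f_scale=2.0)
pos_hat = fit.x[:T]
print("RMSE vs truth:", np.sqrt(np.mean((pos_hat - truth) ** 2)))
```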
Information processing and signal integration in bacterial quorum sensing
Bacteria communicate using secreted chemical signaling molecules called
autoinducers in a process known as quorum sensing. The quorum-sensing network
of the marine bacterium Vibrio harveyi employs three autoinducers, each
known to encode distinct ecological information. Yet how cells integrate and
interpret the information contained within the three autoinducer signals
remains a mystery. Here, we develop a new framework for analyzing signal
integration based on Information Theory and use it to analyze quorum sensing in
V. harveyi. We quantify how much the cells can learn about individual
autoinducers and explain the experimentally observed input-output relation of
the V. harveyi quorum-sensing circuit. Our results suggest that the need
to limit interference between input signals places strong constraints on the
architecture of bacterial signal-integration networks, and that bacteria likely
have evolved active strategies for minimizing this interference. Here we
analyze two such strategies: manipulation of autoinducer production and
feedback on receptor number ratios.
Comment: Supporting information is in the appendix.
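A cartoon of the information-theoretic analysis: estimate the mutual information between each autoinducer input and a noisy integrated output from the empirical joint distribution; the binary inputs, the channel model, and the thresholds below are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(6)
n = 200_000

# Two binary autoinducer inputs feeding a noisy, discretized integrated
# response -- a cartoon of a shared signal-integration channel.
a1 = rng.integers(0, 2, n)
a2 = rng.integers(0, 2, n)
out = np.digitize(a1 + 0.6 * a2 + rng.normal(0, 0.4, n), [0.3, 0.8, 1.3])

def mutual_info(x, y):
    # Plug-in estimate of I(X;Y) in bits from the empirical joint.
    joint = np.histogram2d(x, y, bins=(np.unique(x).size,
                                       np.unique(y).size))[0]
    p = joint / joint.sum()
    px, py = p.sum(1, keepdims=True), p.sum(0, keepdims=True)
    nz = p > 0
    return float((p[nz] * np.log2(p[nz] / (px @ py)[nz])).sum())

# How much does the shared output say about each signal separately?
print("I(output; AI-1) =", round(mutual_info(out, a1), 3), "bits")
print("I(output; AI-2) =", round(mutual_info(out, a2), 3), "bits")
```

Interference in this cartoon shows up as the two mutual informations trading off against each other as the weights and thresholds change.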
Validating Predictions of Unobserved Quantities
The ultimate purpose of most computational models is to make predictions,
commonly in support of some decision-making process (e.g., for design or
operation of some system). The quantities that need to be predicted (the
quantities of interest or QoIs) are generally not experimentally observable
before the prediction, since otherwise no prediction would be needed. Assessing
the validity of such extrapolative predictions, which is critical to informed
decision-making, is challenging. In classical approaches to validation, model
outputs for observed quantities are compared to observations to determine if
they are consistent. By itself, this consistency only ensures that the model
can predict the observed quantities under the conditions of the observations.
This limitation dramatically reduces the utility of the validation effort for
decision making because it implies nothing about predictions of unobserved QoIs
or for scenarios outside of the range of observations. However, there is no
agreement in the scientific community today regarding best practices for
validation of extrapolative predictions made using computational models. The
purpose of this paper is to propose and explore a validation and predictive
assessment process that supports extrapolative predictions for models with
known sources of error. The process includes stochastic modeling, calibration,
validation, and predictive assessment phases where representations of known
sources of uncertainty and error are built, informed, and tested. The proposed
methodology is applied to an illustrative extrapolation problem involving a
misspecified nonlinear oscillator.
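The flavor of the proposed process can be sketched with a deliberately misspecified model: calibrate on the observed regime, check consistency there, then observe that propagated parameter uncertainty alone understates the error at an extrapolative QoI. The oscillator, noise levels, and QoI below are invented toys, not the paper's example.

```python
import numpy as np
from scipy.optimize import curve_fit

rng = np.random.default_rng(7)

def true_system(t):                      # the "real" weakly nonlinear system
    return np.sin(t) + 0.15 * np.sin(3 * t)

def model(t, a, w):                      # the analyst's misspecified model
    return a * np.sin(w * t)

# Calibration phase: fit on the observed window [0, 4].
t_cal = np.linspace(0, 4, 60)
y_cal = true_system(t_cal) + rng.normal(0, 0.05, t_cal.size)
theta, cov = curve_fit(model, t_cal, y_cal, p0=[1.0, 1.0])

# Validation phase: held-out data in the same regime look consistent.
t_val = np.linspace(0.1, 3.9, 30)
y_val = true_system(t_val) + rng.normal(0, 0.05, t_val.size)
print("validation RMS:", np.sqrt(np.mean((y_val - model(t_val, *theta)) ** 2)))

# Predictive assessment: at an extrapolative QoI, parameter uncertainty
# alone understates the true error of the misspecified model.
t_qoi = 9.0
preds = np.array([model(t_qoi, *d)
                  for d in rng.multivariate_normal(theta, cov, 2000)])
print("QoI:", preds.mean(), "+/-", preds.std(), "truth:", true_system(t_qoi))
```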
Reliable ABC model choice via random forests
Approximate Bayesian computation (ABC) methods provide an elaborate approach
to Bayesian inference on complex models, including model choice. Both
theoretical arguments and simulation experiments indicate, however, that model
posterior probabilities may be poorly evaluated by standard ABC techniques. We
propose a novel approach based on a machine learning tool named random forests
to conduct selection among the highly complex models covered by ABC algorithms.
We thus modify the way Bayesian model selection is both understood and
operated, in that we rephrase the inferential goal as a classification problem,
first predicting the model that best fits the data with random forests and
postponing the approximation of the posterior probability of the predicted MAP model
for a second stage also relying on random forests. Compared with earlier
implementations of ABC model choice, the ABC random forest approach offers
several potential improvements: (i) it often has a larger discriminative power
among the competing models, (ii) it is more robust against the number and
choice of statistics summarizing the data, (iii) the computing effort is
drastically reduced (with a gain in computational efficiency of at least a factor of fifty),
and (iv) it includes an approximation of the posterior probability of the
selected model. The call to random forests will undoubtedly extend the range
of dataset sizes and model complexities that ABC can handle. We illustrate
the power of this novel methodology by analyzing controlled experiments as well
as genuine population genetics datasets. The proposed methodologies are
implemented in the R package abcrf available on CRAN.
Comment: 39 pages, 15 figures, 6 tables.
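A minimal sketch of the two-stage idea using scikit-learn rather than abcrf: a classification forest picks the model that best fits the data, and a regression forest fit to the out-of-bag misclassification indicator approximates the posterior probability of that choice; the two toy models and the summaries are illustrative assumptions, not the paper's experiments.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier, RandomForestRegressor

rng = np.random.default_rng(8)
n_per_model, n_obs = 5000, 30

def summaries(x):
    return np.array([x.mean(), x.std(), np.abs(x - np.median(x)).mean()])

# Reference table for two competing models (Normal vs Laplace location).
S, labels = [], []
for m, sampler in enumerate([rng.normal, rng.laplace]):
    for _ in range(n_per_model):
        theta = rng.uniform(-2, 2)          # prior on the location
        S.append(summaries(sampler(theta, 1.0, n_obs)))
        labels.append(m)
S, labels = np.array(S), np.array(labels)

# Stage 1: a classification forest picks the MAP model -- no tolerance level.
clf = RandomForestClassifier(n_estimators=500, oob_score=True).fit(S, labels)
s_obs = summaries(rng.laplace(0.5, 1.0, n_obs))[None, :]
map_model = clf.predict(s_obs)[0]

# Stage 2: a regression forest trained on the out-of-bag misclassification
# indicator estimates the local error rate, and hence the posterior
# probability of the selected model.
oob_error = (clf.oob_decision_function_.argmax(1) != labels).astype(float)
reg = RandomForestRegressor(n_estimators=500).fit(S, oob_error)
print("MAP model:", map_model,
      " approx. posterior probability:", 1 - reg.predict(s_obs)[0])
```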