ABC random forests for Bayesian parameter inference
This preprint has been reviewed and recommended by Peer Community In
Evolutionary Biology (http://dx.doi.org/10.24072/pci.evolbiol.100036).
Approximate Bayesian computation (ABC) has grown into a standard methodology
that manages Bayesian inference for models associated with intractable
likelihood functions. Most ABC implementations require the preliminary
selection of a vector of informative statistics summarizing raw data.
Furthermore, in almost all existing implementations, the tolerance level that
separates acceptance from rejection of simulated parameter values needs to be
calibrated. We propose to conduct likelihood-free Bayesian inferences about
parameters with no prior selection of the relevant components of the summary
statistics and bypassing the derivation of the associated tolerance level. The
approach relies on the random forest methodology of Breiman (2001) applied in a
(nonparametric) regression setting. We advocate the derivation of a new random
forest for each component of the parameter vector of interest. When compared
with earlier ABC solutions, this method offers significant gains in terms of
robustness to the choice of the summary statistics, does not depend on any type
of tolerance level, and is a good trade-off in terms of quality of point
estimator precision and credible interval estimations for a given computing
time. We illustrate the performance of our methodological proposal and compare
it with earlier ABC methods on a Normal toy example and a population genetics
example dealing with human population evolution. All methods designed here have
been incorporated in the R package abcrf (version 1.7) available on CRAN.
Comment: Main text: 24 pages, 6 figures; Supplementary Information: 14 pages, 5 figures
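For contrast with the tolerance-free approach summarized above, the classic rejection scheme that it aims to replace can be sketched on a Normal toy problem. This is a minimal illustration, not the abcrf implementation: the uniform prior, the use of the sample mean as the only summary statistic, the tolerance value, and the function names `simulate` and `abc_rejection` are all assumptions made for the example.

```python
import random
import statistics


def simulate(mu, n, rng):
    """Draw n observations from Normal(mu, 1)."""
    return [rng.gauss(mu, 1.0) for _ in range(n)]


def abc_rejection(observed, n_sims, tolerance, rng):
    """Keep prior draws whose simulated summary lies within `tolerance`
    of the observed summary (here, the sample mean)."""
    obs_summary = statistics.fmean(observed)
    accepted = []
    for _ in range(n_sims):
        mu = rng.uniform(-5.0, 5.0)  # assumed prior on the mean
        sim_summary = statistics.fmean(simulate(mu, len(observed), rng))
        if abs(sim_summary - obs_summary) <= tolerance:
            accepted.append(mu)
    return accepted


rng = random.Random(0)                # fixed seed for reproducibility
observed = simulate(1.0, 20, rng)     # pseudo-observed data, true mean 1.0
posterior_sample = abc_rejection(observed, n_sims=5000, tolerance=0.2, rng=rng)
print(len(posterior_sample), statistics.fmean(posterior_sample))
```

Both the summary statistic and the tolerance are chosen by hand here, which is exactly the calibration burden the abstract says the random forest approach avoids.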
A Likelihood-Free Inference Framework for Population Genetic Data using Exchangeable Neural Networks
An explosion of high-throughput DNA sequencing in the past decade has led to
a surge of interest in population-scale inference with whole-genome data.
Recent work in population genetics has centered on designing inference methods
for relatively simple model classes, and few scalable general-purpose inference
techniques exist for more realistic, complex models. To achieve this, two
inferential challenges need to be addressed: (1) population data are
exchangeable, calling for methods that efficiently exploit the symmetries of
the data, and (2) computing likelihoods is intractable as it requires
integrating over a set of correlated, extremely high-dimensional latent
variables. These challenges are traditionally tackled by likelihood-free
methods that use scientific simulators to generate datasets and reduce them to
hand-designed, permutation-invariant summary statistics, often leading to
inaccurate inference. In this work, we develop an exchangeable neural network
that performs summary statistic-free, likelihood-free inference. Our framework
can be applied in a black-box fashion across a variety of simulation-based
tasks, both within and outside biology. We demonstrate the power of our
approach on the recombination hotspot testing problem, outperforming the
state-of-the-art.
Comment: 9 pages, 8 figures
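The exchangeability requirement can be illustrated with a small permutation-invariant function: a shared map is applied to every individual, and the results are pooled with a symmetric operation. This is only a sketch of the general idea. The weights, the two-feature input, and the names `phi` and `exchangeable_summary` are hypothetical and do not reproduce the architecture of the paper.

```python
import math


def phi(row):
    """Feature map shared by all individuals (hypothetical fixed weights)."""
    a, b = row
    return (math.tanh(0.5 * a - 0.25 * b), math.tanh(0.1 * a + 0.3 * b))


def exchangeable_summary(rows):
    """Apply phi to each row, average the features, then apply a final map.
    Averaging is symmetric, so the output ignores the order of the rows."""
    feats = [phi(r) for r in rows]
    n = len(feats)
    # math.fsum returns a correctly rounded sum, independent of ordering
    pooled = [math.fsum(f[k] for f in feats) / n for k in range(2)]
    return math.tanh(0.7 * pooled[0] - 0.4 * pooled[1])


sample = [(0.0, 1.0), (1.0, 1.0), (1.0, 0.0), (0.0, 0.0)]
shuffled = [sample[2], sample[0], sample[3], sample[1]]
print(exchangeable_summary(sample), exchangeable_summary(shuffled))
```

Reordering the individuals leaves the output unchanged, which is the symmetry that hand-designed permutation-invariant summary statistics otherwise have to enforce.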
Inferring hidden states in Langevin dynamics on large networks: Average case performance
We present average performance results for dynamical inference problems in
large networks, where a set of nodes is hidden while the time trajectories of
the others are observed. Examples of this scenario can occur in signal
transduction and gene regulation networks. We focus on the linear stochastic
dynamics of continuous variables interacting via random Gaussian couplings of
generic symmetry. We analyze the inference error, given by the variance of the
posterior distribution over hidden paths, in the thermodynamic limit and as a
function of the system parameters and the ratio α between the number of
hidden and observed nodes. By applying Kalman filter recursions we find that
the posterior dynamics is governed by an "effective" drift that incorporates
the effect of the observations. We present two approaches for characterizing
the posterior variance that allow us to tackle, respectively, equilibrium and
nonequilibrium dynamics. The first appeals to Random Matrix Theory and reveals
average spectral properties of the inference error and typical posterior
relaxation times, the second is based on dynamical functionals and yields the
inference error as the solution of an algebraic equation.
Comment: 20 pages, 5 figures
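As a toy illustration of how a Kalman recursion can lead to an algebraic equation for the posterior variance, consider a single hidden variable in discrete time, observed with noise. This is a deliberately simplified sketch and not the continuous-time network setting of the paper. The model x_{t+1} = a x_t + noise of variance q, y_t = x_t + noise of variance r, and the function names below are assumptions made for the example.

```python
import math


def kalman_variance(a, q, r, p0=1.0, steps=200):
    """Iterate the scalar Kalman variance recursion and return the
    filtered (posterior) variance after `steps` updates."""
    p = p0
    for _ in range(steps):
        p_pred = a * a * p + q          # predict: propagate the variance
        gain = p_pred / (p_pred + r)    # Kalman gain
        p = (1.0 - gain) * p_pred       # update: condition on the observation
    return p


def stationary_variance(a, q, r):
    """Positive root of a^2 P^2 + (q + r - a^2 r) P - q r = 0, the fixed
    point of the recursion above (requires a != 0)."""
    b = q + r - a * a * r
    return (-b + math.sqrt(b * b + 4.0 * a * a * q * r)) / (2.0 * a * a)


print(kalman_variance(0.9, 0.5, 1.0), stationary_variance(0.9, 0.5, 1.0))
```

The iterated variance settles at the root of the quadratic, so in this scalar case the long-time inference error is available without running the filter.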
Training deep neural density estimators to identify mechanistic models of neural dynamics
Mechanistic modeling in neuroscience aims to explain observed phenomena in terms of underlying causes. However, determining which model parameters agree with complex and stochastic neural data presents a significant challenge. We address this challenge with a machine learning tool that uses deep neural density estimators, trained using model simulations, to carry out Bayesian inference and retrieve the full space of parameters compatible with raw data or selected data features. Our method is scalable in parameters and data features, and can rapidly analyze new data after initial training. We demonstrate the power and flexibility of our approach on receptive fields, ion channels, and Hodgkin-Huxley models. We also characterize the space of circuit configurations giving rise to rhythmic activity in the crustacean stomatogastric ganglion, and use these results to derive hypotheses for underlying compensation mechanisms. Our approach will help close the gap between data-driven and theory-driven models of neural dynamics.
netgwas: An R Package for Network-Based Genome-Wide Association Studies
Graphical models are powerful tools for modeling and making statistical
inferences regarding complex associations among variables in multivariate data.
In this paper we introduce the R package netgwas, which is designed based on
undirected graphical models to accomplish three important and interrelated
goals in genetics: constructing linkage maps, reconstructing linkage
disequilibrium (LD) networks from multi-loci genotype data, and detecting
high-dimensional genotype-phenotype networks. The netgwas package deals with
species with any chromosome copy number in a unified way, unlike other
software. It implements recent improvements in both linkage map construction
(Behrouzi and Wit, 2018) and the reconstruction of conditional independence networks
for non-Gaussian continuous data, discrete data, and mixed
discrete-and-continuous data (Behrouzi and Wit, 2017). Such datasets, for
example genotype data and genotype-phenotype data, routinely occur in genetics
and genomics. We demonstrate the value of our package functionality by applying it to
various multivariate example datasets taken from the literature. We show, in
particular, that our package allows a more realistic analysis of data, as it
adjusts for the effect of all other variables while performing pairwise
associations. This feature controls for spurious associations between variables
that can arise from the classical multiple testing approach. This paper includes a
brief overview of the statistical methods which have been implemented in the
package. The main body of the paper explains how to use the package. The
package uses a parallelization strategy on multi-core processors to speed up
computations for large datasets. In addition, it contains several functions for
simulation and visualization. The netgwas package is freely available at
https://cran.r-project.org/web/packages/netgwas
Comment: 32 pages, 9 figures; due to the limitation "The abstract field cannot
be longer than 1,920 characters", the abstract appearing here is slightly
shorter than that in the PDF file
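The point about adjusting for other variables can be illustrated with a partial correlation on synthetic data. This sketch is not the netgwas algorithm and does not use the package. The three-variable setup, the noise levels, and the helper names `pearson` and `partial_corr` are assumptions chosen to show how a shared driver produces a pairwise association that disappears after conditioning.

```python
import math
import random


def pearson(u, v):
    """Sample Pearson correlation of two equal-length sequences."""
    n = len(u)
    mean_u, mean_v = sum(u) / n, sum(v) / n
    cov = sum((a - mean_u) * (b - mean_v) for a, b in zip(u, v))
    var_u = sum((a - mean_u) ** 2 for a in u)
    var_v = sum((b - mean_v) ** 2 for b in v)
    return cov / math.sqrt(var_u * var_v)


def partial_corr(x, y, z):
    """Correlation of x and y after adjusting both for z."""
    rxy, rxz, ryz = pearson(x, y), pearson(x, z), pearson(y, z)
    return (rxy - rxz * ryz) / math.sqrt((1 - rxz ** 2) * (1 - ryz ** 2))


rng = random.Random(1)  # fixed seed for reproducibility
z = [rng.gauss(0, 1) for _ in range(2000)]      # shared driver
x = [zi + 0.5 * rng.gauss(0, 1) for zi in z]    # depends on z only
y = [zi + 0.5 * rng.gauss(0, 1) for zi in z]    # depends on z only
marginal = pearson(x, y)
conditional = partial_corr(x, y, z)
print(marginal, conditional)
```

Here x and y are strongly correlated pairwise only because both depend on z. Conditioning on z removes the association, which is the kind of spurious edge a conditional independence network is meant to exclude.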