Efficient Sampling in Stochastic Biological Models
Even when the underlying dynamics are known, studying the emergent behavior of stochastic biological systems in silico can be computationally intractable, due to the difficulty of comprehensively sampling these models.
This thesis presents the study of two techniques for efficiently sampling models of complex biological systems.
First, the weighted ensemble enhanced sampling technique is adapted for use in sampling chemical kinetics simulations, as well as spatially resolved stochastic reaction-diffusion kinetics.
The technique is shown to scale to large, cell-scale simulations, and to accelerate the sampling of observables by orders of magnitude in some cases.
Second, I study the free energy estimates of peptides and proteins using Markov random fields.
These graphical models are constructed from physics-based forcefields, uniformly sampled at different densities in dihedral angle space, and free energy estimates are computed using loopy belief propagation.
The effect of sample density on the free energy estimates provided by loopy belief propagation is assessed, and it is found that in most cases a modest increase in sample density leads to significant improvement in convergence.
Additionally, the approximate free energies from loopy belief propagation are compared to statistically exact computations and are confirmed to be both accurate and orders of magnitude faster than traditional methods in the models assessed.
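As an illustration of the weighted ensemble idea named in the abstract above, the following is a minimal sketch of one resampling step, not the implementation used in the thesis. It assumes walkers are `(state, weight)` tuples, that a hypothetical `bin_of` function maps a state to a bin, and that every occupied bin is brought to a fixed `target` count by splitting heavy walkers and merging light ones. The toy states and weights are invented for the example.

```python
import random
from collections import defaultdict

def resample_bin(walkers, target, rng):
    """Split/merge the walkers of one bin until exactly `target` remain.

    Each walker is a (state, weight) tuple; the bin's total weight is conserved.
    """
    walkers = list(walkers)
    # Split: replace the heaviest walker by two copies carrying half its weight.
    while len(walkers) < target:
        walkers.sort(key=lambda w: w[1])
        state, weight = walkers.pop()
        walkers += [(state, weight / 2), (state, weight / 2)]
    # Merge: combine the two lightest walkers. The surviving state is chosen
    # with probability proportional to weight, which keeps averages unbiased.
    while len(walkers) > target:
        walkers.sort(key=lambda w: w[1])
        (s1, w1), (s2, w2) = walkers[0], walkers[1]
        survivor = s1 if rng.random() < w1 / (w1 + w2) else s2
        walkers = [(survivor, w1 + w2)] + walkers[2:]
    return walkers

def resample(walkers, bin_of, target, rng):
    """Group walkers by bin, then resample every occupied bin independently."""
    bins = defaultdict(list)
    for walker in walkers:
        bins[bin_of(walker[0])].append(walker)
    resampled = []
    for members in bins.values():
        resampled.extend(resample_bin(members, target, rng))
    return resampled

rng = random.Random(0)
# Toy ensemble (assumed data): the state is a molecule count, and most of the
# probability weight sits at low counts.
walkers = [(1, 0.25), (2, 0.20), (3, 0.15), (4, 0.15), (5, 0.15),
           (12, 0.09), (27, 0.01)]
bin_of = lambda count: count // 10  # hypothetical binning: width-10 bins
resampled = resample(walkers, bin_of, target=4, rng=rng)

total_weight = sum(weight for _, weight in resampled)
counts_per_bin = defaultdict(int)
for state, _ in resampled:
    counts_per_bin[bin_of(state)] += 1
```

After resampling, the rarely visited bins hold as many walkers as the common one, while the total weight is unchanged. That redistribution of effort toward rare regions is what allows rare-event observables to be sampled more efficiently than with unweighted trajectories.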
Atlas of Transcription Factor Binding Sites from ENCODE DNase Hypersensitivity Data across 27 Tissue Types.
Characterizing the tissue-specific binding sites of transcription factors (TFs) is essential to reconstruct gene regulatory networks and predict functions for non-coding genetic variation. DNase-seq footprinting enables the prediction of genome-wide binding sites for hundreds of TFs simultaneously. Despite the public availability of high-quality DNase-seq data from hundreds of samples, a comprehensive, up-to-date resource for the locations of genomic footprints is lacking. Here, we develop a scalable footprinting workflow using two state-of-the-art algorithms: Wellington and HINT. We apply our workflow to detect footprints in 192 ENCODE DNase-seq experiments and predict the genomic occupancy of 1,515 human TFs in 27 human tissues. We validate that these footprints overlap true-positive TF binding sites from ChIP-seq. We demonstrate that the locations, depth, and tissue specificity of footprints predict effects of genetic variants on gene expression and capture a substantial proportion of genetic risk for complex traits.
Systematic Testing of Belief-Propagation Estimates for Absolute Free Energies in Atomistic Peptides and Proteins
Motivated by the extremely high computing costs associated with estimates of free energies for biological systems using molecular simulations, we further the exploration of existing "belief propagation" (BP) algorithms for fixed-backbone peptide and protein systems. The precalculation of pairwise interactions among discretized libraries of side-chain conformations, along with representation of protein side chains as nodes in a graphical model, enables direct application of the BP approach, which requires only ∼1 s of single-processor run time after the precalculation stage. We use a "loopy BP" algorithm, which can be seen as an approximate generalization of the transfer-matrix approach to highly connected (i.e., loopy) graphs, and it has previously been applied to protein calculations. We examine the application of loopy BP to several peptides as well as the binding site of the T4 lysozyme L99A mutant. The present study reports on (i) the comparison of the approximate BP results with estimates from unbiased estimators based on the Amber99SB force field; (ii) investigation of the effects of varying library size on BP predictions; and (iii) a theoretical discussion of the discretization effects that can arise in BP calculations. The data suggest that, despite their approximate nature, BP free-energy estimates are highly accurate; indeed, they never fall outside confidence intervals from unbiased estimators for the systems where independent results could be obtained. Furthermore, we find that libraries of sufficiently fine discretization (which diminish library-size sensitivity) can be obtained with standard computing resources in most cases. Altogether, the extremely low computing times and accurate results suggest the BP approach warrants further study.
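To make the transfer-matrix connection mentioned in the abstract above concrete, here is a minimal sketch, not the authors' code. It assumes a simple chain of nodes with no loops, where passing messages from one end to the other gives the exact free energy F = -kT ln Z. The pairwise energy tables are random placeholders standing in for precalculated interactions between discretized conformations, and `n_nodes`, `n_states`, and `kT` are arbitrary illustrative values. On a loopy graph the same kind of message update would be iterated and the result would only be approximate.

```python
import itertools
import numpy as np

rng = np.random.default_rng(0)
n_nodes, n_states, kT = 5, 3, 0.6  # assumed toy sizes and energy scale

# Placeholder pairwise energies: one (n_states x n_states) table for each
# pair of neighboring nodes along the chain.
pair_energy = rng.normal(size=(n_nodes - 1, n_states, n_states))

def free_energy_transfer_matrix(pair_energy, kT):
    """Exact F = -kT ln Z for a chain, by passing a message along it."""
    message = np.ones(pair_energy.shape[1])
    log_z = 0.0
    for energy in pair_energy:
        # Propagate through the Boltzmann-factor matrix of this pair.
        message = message @ np.exp(-energy / kT)
        # Normalize to avoid overflow and accumulate ln Z from the norms.
        norm = message.sum()
        log_z += np.log(norm)
        message = message / norm
    return -kT * log_z

def free_energy_brute_force(pair_energy, kT):
    """Reference value: sum Boltzmann factors over every configuration."""
    n_pairs, n_states, _ = pair_energy.shape
    z = 0.0
    for config in itertools.product(range(n_states), repeat=n_pairs + 1):
        energy = sum(pair_energy[i, config[i], config[i + 1]]
                     for i in range(n_pairs))
        z += np.exp(-energy / kT)
    return -kT * np.log(z)

f_messages = free_energy_transfer_matrix(pair_energy, kT)
f_exact = free_energy_brute_force(pair_energy, kT)
```

The message-passing cost grows linearly with the number of nodes, while the brute-force sum grows exponentially, which is the basic reason such estimates can be fast once the pairwise energies have been tabulated.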