560 research outputs found

    Coupling sample paths to the thermodynamic limit in Monte Carlo estimators with applications to gene expression

    Many biochemical systems appearing in applications have a multiscale structure so that they converge to piecewise deterministic Markov processes in a thermodynamic limit. The statistics of the piecewise deterministic process can be obtained much more efficiently than those of the exact process. We explore the possibility of coupling sample paths of the exact model to the piecewise deterministic process in order to reduce the variance of their difference. We then apply this coupling to reduce the computational complexity of a Monte Carlo estimator. Motivated by the rigorous results in [1], we show how this method can be applied to realistic biological models with nontrivial scalings
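
The variance-reduction idea in this abstract can be written as a short identity. This is only a sketch under assumed notation, none of which comes from the abstract: $X$ is a sample path of the exact model, $Z$ is a path of the piecewise deterministic limit, $f$ is the observable, and $(X^{(j)}, Z^{(j)})$ are coupled pairs, while $Z'^{(i)}$ are independent cheap samples.

```latex
\begin{align*}
\mathbb{E}[f(X)] &= \mathbb{E}[f(Z)] + \mathbb{E}[f(X) - f(Z)], \\
\hat{\mu} &= \frac{1}{N_Z}\sum_{i=1}^{N_Z} f\bigl(Z'^{(i)}\bigr)
  + \frac{1}{N_X}\sum_{j=1}^{N_X} \Bigl[f\bigl(X^{(j)}\bigr) - f\bigl(Z^{(j)}\bigr)\Bigr], \\
\operatorname{Var}[f(X) - f(Z)] &= \operatorname{Var}[f(X)] + \operatorname{Var}[f(Z)]
  - 2\operatorname{Cov}[f(X), f(Z)].
\end{align*}
```

The last line shows why coupling matters: the more strongly correlated the paired paths, the smaller the variance of the correction term, so fewer expensive exact samples $N_X$ should be needed for a given accuracy.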

    Efficient Finite-difference Methods for Sensitivity Analysis of Stiff Stochastic Discrete Models of Biochemical Systems

    In the study of Systems Biology it is necessary to simulate cellular processes and chemical reactions that comprise biochemical systems. This is achieved through a range of mathematical modeling approaches. Standard methods use deterministic differential equations, but because many biological processes are inherently probabilistic, stochastic models must be used to capture the random fluctuations observed in these systems. The presence of noise in a system can be a significant factor in determining its behavior. The Chemical Master Equation is a valuable stochastic model of biochemical kinetics. Models based on this formalism rely on physically motivated parameters, but often these parameters are not well constrained by experiments. One important tool in the study of biochemical systems is sensitivity analysis, which aims to quantify the dependence of a system's dynamics on model parameters. Several approaches to sensitivity analysis of these models have been developed. We proposed novel methods for estimating sensitivities of discrete stochastic models of biochemical reaction systems. We used finite-difference approximations and adaptive tau-leaping strategies to estimate the sensitivities for stiff stochastic biochemical kinetics models, resulting in significant speed-up in comparison with previously published approaches for a similar accuracy. We also developed an approach for estimating sensitivity coefficients involving adaptive implicit tau-leaping strategies. We provide a comparison of these methodologies in order to identify which approach is most efficient depending on the features of the model. These results can facilitate efficient sensitivity analysis, which can serve as a foundation for the formulation, characterization, verification and reduction of models as well as further applications to identifiability analysis.
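
To make the finite-difference idea concrete, here is a minimal sketch and not the authors' method. It uses fixed-step explicit tau-leaping rather than the adaptive or implicit strategies named in the abstract, and a toy birth-death system (production at rate `k`, degradation at rate `g * s`). All function names and parameter values are hypothetical. Nominal and perturbed paths reuse the same seed (common random numbers), a standard heuristic for reducing the variance of the difference.

```python
import math
import random


def poisson(rng, lam):
    """Knuth's Poisson sampler; adequate for the small means used here."""
    limit = math.exp(-lam)
    count, prod = 0, rng.random()
    while prod > limit:
        count += 1
        prod *= rng.random()
    return count


def tau_leap_birth_death(k, g, t_end, tau, seed):
    """Fixed-step explicit tau-leaping for 0 -> S (rate k), S -> 0 (rate g*s)."""
    rng = random.Random(seed)
    s = 0
    for _ in range(round(t_end / tau)):
        births = poisson(rng, k * tau)
        # Clamp so the population cannot become negative.
        deaths = min(s, poisson(rng, g * s * tau))
        s += births - deaths
    return s


def fd_sensitivity(k, g, t_end, tau, h, n_paths):
    """Forward-difference estimate of d E[S(t_end)] / dk with shared seeds."""
    total = 0.0
    for seed in range(n_paths):
        perturbed = tau_leap_birth_death(k + h, g, t_end, tau, seed)
        nominal = tau_leap_birth_death(k, g, t_end, tau, seed)
        total += perturbed - nominal
    return total / (n_paths * h)


# Assumed toy parameters; the mean is linear in k, so h adds no bias here.
estimate = fd_sensitivity(k=10.0, g=1.0, t_end=2.0, tau=0.02, h=1.0, n_paths=2000)
# Analytic sensitivity of the exact model starting from s = 0: (1 - exp(-g*t)) / g.
exact = (1.0 - math.exp(-1.0 * 2.0)) / 1.0
print(f"finite-difference estimate: {estimate:.3f}, analytic value: {exact:.3f}")
```

For this toy model the estimate can be checked against the analytic sensitivity. For a stiff system, a fixed explicit step like this would be either unstable or very slow, which is the cost that adaptive and implicit tau-leaping are intended to reduce.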

    Configuration Path Integral Monte Carlo: Ab Initio Simulations of Fermions in the Warm Dense Matter Regime

    Recent advances in warm dense matter physics, e.g. laser compressed matter, lead to an increasing interest in the description of correlated, degenerate electrons at finite temperatures. Path integral Monte Carlo (PIMC) methods cannot correctly describe weakly to moderately coupled and strongly degenerate Fermi systems due to the so-called fermion sign problem. The Configuration Path Integral Monte Carlo (CPIMC) approach greatly reduces the sign problem and allows for the exact computation of thermodynamic properties in this regime. In addition, the first successful implementation of the diagrammatic worm algorithm for a general Hamiltonian in Fock space with arbitrary pair interactions gives direct access to the Matsubara Green function. This thesis demonstrates the capabilities of the CPIMC approach for a model system of Coulomb interacting fermions in a two-dimensional harmonic trap. The correctness of the CPIMC implementation is verified by rigorous comparisons with an exact diagonalization method. Benchmark results are presented, which reveal large errors of the Hartree-Fock approximation in open shell configurations even for weak coupling and a significant deviation of multi-level blocking PIMC data in the complete basis set limit. The application of the CPIMC method to the warm dense homogeneous electron gas (HEG) quantifies the accuracy of recently published restricted PIMC (RPIMC) results, which have been the basis for the construction of exchange-correlation free energy functionals to be used in finite-temperature density functional theory calculations of warm dense matter. It is shown that the errors of the RPIMC data exceed 10 % at intermediate densities. Additionally, highly accurate data for the exchange-correlation energy at high densities, which are inaccessible by the RPIMC method, are provided. 
These results are needed to significantly increase the quality of future exchange-correlation functionals to be used in finite-temperature applications.

Owing to major advances in the field of warm dense matter, such as laser compression, the accurate description of correlated, degenerate electrons at finite temperatures is of growing importance. Path integral Monte Carlo (PIMC) methods cannot correctly describe weakly to moderately coupled and strongly degenerate fermions because of the so-called fermion sign problem. The Configuration Path Integral Monte Carlo (CPIMC) approach reduces the sign problem and allows the exact computation of thermodynamic properties in this regime. The first successful implementation of the diagrammatic worm algorithm for a general Hamiltonian in Fock space with arbitrary pair interaction gives direct access to the Matsubara Green function. This thesis demonstrates the capabilities of the CPIMC approach for a model system of Coulomb-interacting fermions in a 2D harmonic trap. The correctness of the CPIMC implementation is verified by comparisons with an exact diagonalization method. The benchmark results presented demonstrate large errors of the Hartree-Fock approximation for open-shell systems even at weak interaction strengths, and a significant deviation of multi-level blocking PIMC data in the complete basis set limit. Applying the CPIMC method to the warm dense homogeneous electron gas (HEG) demonstrates systematic inaccuracies in recently published restricted PIMC (RPIMC) results. The relative deviations of the RPIMC data from the exact CPIMC results exceed 10% at intermediate densities.
The highly accurate results produced in this work for the exchange-correlation energy at high densities, which are inaccessible to the RPIMC method, can help to significantly increase the accuracy of future exchange-correlation functionals for finite-temperature density functional theory.

    Molecular Dynamics Simulation

    Condensed matter systems, ranging from simple fluids and solids to complex multicomponent materials and even biological matter, are governed by well understood laws of physics, within the formal theoretical framework of quantum theory and statistical mechanics. On the relevant scales of length and time, the appropriate ‘first-principles’ description needs only the Schroedinger equation together with Gibbs averaging over the relevant statistical ensemble. However, this program cannot be carried out straightforwardly—dealing with electron correlations is still a challenge for the methods of quantum chemistry. Similarly, standard statistical mechanics makes precise explicit statements only on the properties of systems for which the many-body problem can be effectively reduced to one of independent particles or quasi-particles. [...

    Prediction of Properties of Liquid Systems Using Deep Learning

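    Thesis (Ph.D.), Seoul National University Graduate School, College of Natural Sciences, Department of Chemistry, 2020. 2. 정연준.

    The rapid recent development of machine learning techniques and their application to chemistry is accelerating the development of predictive models, based on quantitative structure-property relationships, for a variety of chemical properties. The solvation free energy is one such application of machine learning and is a fundamental property that plays an important role in chemical reactions in various solvents. In this work, we introduce a new deep-learning-based solvation model that obtains the target solvation free energy from interatomic interactions. In the proposed model, encoding functions for the solvent and solute molecules extract vector representations of the structural properties of each atom and molecule, and on this basis the interatomic interactions are obtained as simple inner products between vectors rather than through a complex perceptron network. Cross-validation on 6,493 experimental values covering 952 organic solutes and 147 organic solvents gives very high accuracy, at the level of 0.2 kcal/mol in mean absolute error. Scaffold-based cross-validation likewise gives about 0.6 kcal/mol, showing good accuracy even for predictions on relatively new molecular structures that can be classed as extrapolation. Furthermore, because the proposed model is not specialized to a particular solvent by construction, it is highly transferable and lends itself to increasing the amount of training data. Analysis of the interatomic interactions shows that the proposed deep learning model reproduces group contributions to the solvation free energy well, and we can expect machine learning to enable a more detailed physicochemical understanding beyond merely predicting the target property.

    Recent advances in machine learning technologies and their chemical applications have led to the development of diverse structure-property relationship based prediction models for various chemical properties; the free energy of solvation is one of them and plays a dominant role as a fundamental measure of solvation chemistry. Here, we introduce a novel machine learning-based solvation model, which calculates the target solvation free energy from pairwise atomistic interactions. The novelty of our proposed solvation model lies in its rather simple architecture: two encoding functions extract vector representations of the atomic and the molecular features from the given chemical structure, while the inner product between two atomistic features calculates their interactions, instead of black-boxed perceptron networks. The cross-validation result on 6,493 experimental measurements for 952 organic solutes and 147 organic solvents achieves an outstanding performance, which is 0.2 kcal/mol in MUE. 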
The scaffold-based split method yields 0.6 kcal/mol, which shows that the proposed model retains reasonable accuracy even for extrapolated cases. Moreover, the proposed model shows excellent transferability when the training data are enlarged, owing to its solvent-non-specific nature. Analysis of the atomistic interaction map suggests that the proposed model reproduces group contributions to the solvation energy, which leads us to believe that the model not only predicts the target property but also gives more detailed physicochemical insights.

    Contents:
    1. Introduction
    2. Delfos: Deep Learning Model for Prediction of Solvation Free Energies in Generic Organic Solvents
        2.1. Methods
            2.1.1. Embedding of Chemical Contexts
            2.1.2. Encoder-Predictor Network
        2.2. Results and Discussions
            2.2.1. Computational Setup and Results
            2.2.2. Transferability of the Model for New Compounds
            2.2.3. Visualization of Attention Mechanism
    3. Group Contribution Method for the Solvation Energy Estimation with Vector Representations of Atom
        3.1. Model Description
            3.1.1. Word Embedding
            3.1.2. Network Architecture
        3.2. Results and Discussions
            3.2.1. Computational Details
            3.2.2. Prediction Accuracy
            3.2.3. Model Transferability
            3.2.4. Group Contributions of Solvation Energy
    4. Empirical Structure-Property Relationship Model for Liquid Transport Properties
    5. Concluding Remarks
    A. Analyzing Kinetic Trapping as a First-Order Dynamical Phase Transition in the Ensemble of Stochastic Trajectories
        A1. Introduction
        A2. Theory
        A3. Lattice Gas Model
        A4. Mathematical Model
        A5. Dynamical Phase Transitions
        A6. Conclusion
    B. Reaction-Path Thermodynamics of the Michaelis-Menten Kinetics
        B1. Introduction
        B2. Reaction Path Thermodynamics
        B3. Fixed Observation Time
        B4. Conclusions
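
The inner-product interaction described in this abstract can be illustrated with a few lines of code. This is a sketch under stated assumptions, not the thesis model. The feature vectors below are made up rather than produced by learned encoders, the function names are hypothetical, and summing over all solute-solvent atom pairs is assumed here as the simplest way to combine pairwise terms into one number.

```python
def dot(u, v):
    """Inner product of two equal-length feature vectors."""
    return sum(a * b for a, b in zip(u, v))


def pairwise_interaction_sum(solute_atoms, solvent_atoms):
    """Sum of inner products over every (solute atom, solvent atom) pair."""
    return sum(dot(u, v) for u in solute_atoms for v in solvent_atoms)


# Hypothetical 2-dimensional atomic features; real encoders would learn these.
solute = [[1.0, 0.0], [0.5, 0.5]]
solvent = [[0.0, 2.0], [1.0, 1.0]]

# Interaction map: one entry per atom pair, which is what makes the
# per-group contributions easy to inspect.
interaction_map = [[dot(u, v) for v in solvent] for u in solute]
total = pairwise_interaction_sum(solute, solvent)
print(interaction_map, total)
```

Because each pair contributes a single scalar, the interaction map can be read directly. That is presumably what makes the group-contribution analysis mentioned in the abstract possible without opening up a perceptron network.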

    Methods for Reconstructing Networks with Incomplete Information.

    Network representations of complex systems are widespread and reconstructing unknown networks from data has been intensively researched in statistical and scientific communities more broadly. Two challenges in network reconstruction problems include having insufficient data to illuminate the full structure of the network and needing to combine information from different data sources. Addressing these challenges, this thesis contributes methodology for network reconstruction in three respects. First, we consider sequentially choosing interventions to discover structure in directed networks, focusing on learning a partial order over the nodes. This focus leads to a new model for intervention data under which nodal variables depend on the lengths of paths separating them from intervention targets rather than on parent sets. Taking a Bayesian approach, we present partial-order based priors and develop a novel Markov chain Monte Carlo (MCMC) method for computing posterior expectations over directed acyclic graphs. The utility of the MCMC approach comes from designing new proposals for the Metropolis algorithm that move locally among partial orders while independently sampling graphs from each partial order. The resulting Markov chains mix rapidly and are ergodic. We also adapt an existing strategy for active structure learning, develop an efficient Monte Carlo procedure for estimating the resulting decision function, and evaluate the proposed methods numerically using simulations and benchmark datasets. We next study penalized likelihood methods using incomplete order information as arising from intervention data. To make the notion of incomplete information precise, we introduce and formally define incomplete partial orders, which subsume the important special case of a known total ordering of the nodes. 
This special case lies along an information lattice and we study the reconstruction performance of penalized likelihood methods at different points along this lattice. Finally, we present a method for ranking a network's potential edges using time-course data. The novelty is our development of a nonparametric gradient-matching procedure and a related summary statistic for measuring the strength of relationships among components in dynamic systems. Simulation studies demonstrate that, given sufficient signal, using this procedure to move from linear to additive approximations leads to improved rankings of potential edges.
PhD, Statistics, University of Michigan, Horace H. Rackham School of Graduate Studies. http://deepblue.lib.umich.edu/bitstream/2027.42/113316/1/jbhender_1.pd

    Bayesian inference for protein signalling networks

    Cellular response to a changing chemical environment is mediated by a complex system of interactions involving molecules such as genes, proteins and metabolites. In particular, genetic and epigenetic variation ensure that cellular response is often highly specific to individual cell types, or to different patients in the clinical setting. Conceptually, cellular systems may be characterised as networks of interacting components together with biochemical parameters specifying rates of reaction. Taken together, the network and parameters form a predictive model of cellular dynamics which may be used to simulate the effect of hypothetical drug regimens. In practice, however, both network topology and reaction rates remain partially or entirely unknown, depending on individual genetic variation and environmental conditions. Prediction under parameter uncertainty is a classical statistical problem. Yet, doubly uncertain prediction, where both parameters and the underlying network topology are unknown, leads to highly non-trivial probability distributions which currently require gross simplifying assumptions to analyse. Recent advances in molecular assay technology now permit high-throughput data-driven studies of cellular dynamics. This thesis sought to develop novel statistical methods in this context, focussing primarily on the problems of (i) elucidating biochemical network topology from assay data and (ii) prediction of dynamical response to therapy when both network and parameters are uncertain

    Physics of epigenetic landscapes and statistical inference by cells

    Biology is currently in the midst of a revolution. Great technological advances have led to unprecedented quantitative data at the whole genome level. However, new techniques are needed to deal with this deluge of high-dimensional data. Therefore, statistical physics has the potential to help develop systems biology level models that can incorporate complex data. Additionally, physicists have made great strides in understanding non-equilibrium thermodynamics. However, the consequences of these advances have yet to be fully incorporated into biology. There are three specific problems that I address in my dissertation. First, a common metaphor for describing development is a rugged "epigenetic landscape" where cell fates are represented as attracting valleys resulting from a complex regulatory network. I introduce a framework for explicitly constructing epigenetic landscapes that combines genomic data with techniques from spin-glass physics. The model reproduces known reprogramming protocols and identifies candidate transcription factors for reprogramming to novel cell fates, suggesting epigenetic landscapes are a powerful paradigm for understanding cellular identity. Second, I examine the dynamics of cellular reprogramming. By reanalyzing all available time-series data, I show that gene expression dynamics during reprogramming follow a simple one-dimensional reaction coordinate that is independent of both the time and the details of the experimental protocol used. I show that such a reaction coordinate emerges naturally from epigenetic landscape models of cell identity where cellular reprogramming is viewed as a "barrier-crossing" between the starting and ending cell fates. Overall, the analysis and model suggest that gene expression dynamics during reprogramming follow a canonical trajectory consistent with the idea of an "optimal path" in gene expression space for reprogramming. 
Third, an important task of cells is to perform complex computations in response to external signals. Intricate networks are required to sense and process signals, and since cells are inherently non-equilibrium systems, these networks naturally consume energy. Since there is a deep connection between thermodynamics, computation, and information, a natural question is what constraints thermodynamics places on statistical estimation and learning. I modeled a single chemical receptor and established the first fundamental relationship between the energy consumption and statistical accuracy of a receptor in a cell.

    Statistical physics methods in computational biology

    The interest of statistical physics in combinatorial optimization is not new; one need only think of a well-known tool such as simulated annealing. Recently, statistical physics has also turned to statistical inference to address some "hard" optimization problems, developing a new class of message-passing algorithms. Three applications to computational biology are presented in this thesis, namely: 1) Boolean networks, a model for gene regulatory networks; 2) haplotype inference, to study the genetic information present in a population; 3) clustering, a general machine learning tool.