Bayesian modelling of DNA secondary structure kinetics: revisiting path space approximations and posterior inference in exponentially large state spaces

Abstract

Nucleic acid strands, which react by forming and breaking Watson-Crick base pairs, can be designed to form complex nanoscale structures or devices. Controlling such systems requires accurate predictions of the reaction rates and folding pathways of the interacting strands. These kinetic properties can be modelled using continuous-time Markov chains (CTMCs), whose states and transitions correspond to secondary structures and elementary base pair changes, respectively. The transient dynamics of a CTMC are determined by a kinetic model, which assigns transition rates to pairs of states. The rate of a reaction can be estimated using its CTMC's mean first passage time (MFPT), which can be computed exactly by solving a linear system, or approximated via Monte Carlo simulation. However, both approaches may be computationally infeasible for rare event reactions in larger systems. This limitation can be addressed by constructing truncated CTMCs, which only include a small subset of states and transitions, selected either manually or through simulation. In recent work, posterior inference in an Arrhenius-type kinetic model was performed using a fixed set of manually truncated CTMCs and a small experimental dataset of DNA reaction rates. We extend this Bayesian approach, using a larger dataset of 1105 reactions, a new prior model that is directly motivated by the physical meaning of the parameters and is compatible with experimental measurements of elementary rates, and larger truncated state spaces, constructed stochastically using the recently introduced pathway elaboration method. Despite a significantly higher computational cost, we find that the larger state spaces do not necessarily lead to more accurate rate predictions than the small, manually designed state spaces. For posterior approximation, we apply the standard random walk Metropolis algorithm and the gradient-based No-U-Turn Sampler (NUTS).
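The two ways of obtaining an MFPT mentioned above can be illustrated on a toy chain. The following is a minimal sketch, not the thesis software: the three-state rate matrix, the function names `mfpt_linear` and `mfpt_monte_carlo`, and the choice of NumPy are all assumptions made purely for illustration. The exact route solves Q_TT τ = -1, where Q_TT is the rate matrix restricted to non-target states; the Monte Carlo route averages simulated first passage times.

```python
import numpy as np

# Hypothetical toy CTMC: 0 <-> 1 -> 2, with state 2 as the target (absorbing).
# Off-diagonal entries are transition rates; each row sums to zero.
Q = np.array([
    [-2.0,  2.0, 0.0],
    [ 1.0, -4.0, 3.0],
    [ 0.0,  0.0, 0.0],
])
TARGETS = {2}


def mfpt_linear(Q, targets):
    """Exact MFPT from every non-target state, via one linear solve."""
    transient = [i for i in range(Q.shape[0]) if i not in targets]
    Q_tt = Q[np.ix_(transient, transient)]
    tau = np.linalg.solve(Q_tt, -np.ones(len(transient)))
    return {state: float(t) for state, t in zip(transient, tau)}


def mfpt_monte_carlo(Q, start, targets, n_paths, rng):
    """Estimate the MFPT from `start` by averaging simulated trajectories."""
    total = 0.0
    for _ in range(n_paths):
        state, t = start, 0.0
        while state not in targets:
            rates = Q[state].copy()
            rates[state] = 0.0                      # keep only outgoing rates
            exit_rate = rates.sum()
            t += rng.exponential(1.0 / exit_rate)   # exponential holding time
            state = int(rng.choice(len(rates), p=rates / exit_rate))
        total += t
    return total / n_paths


exact = mfpt_linear(Q, TARGETS)
estimate = mfpt_monte_carlo(Q, 0, TARGETS, 5000, np.random.default_rng(0))
print("exact MFPT from state 0:", exact[0])
print("Monte Carlo estimate:   ", estimate)
```

For this toy chain the linear system gives an MFPT of 1.0 from state 0 and 0.5 from state 1, and the simulated average should land close to the former. The linear solve scales with the number of states and the simulation with the length of typical trajectories, which is the difficulty described above for rare events in large state spaces.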
Our posterior approximations, which are often multimodal, recover an expected correlation structure among the kinetic parameters. However, we also uncover severe numerical instability in the MFPT computations. Due to numerous design limitations in the legacy software, a significant refactoring effort was required to implement the above extensions, which also improved performance and reproducibility.

Science, Faculty of; Computer Science, Department of; Graduat
