118 research outputs found

    Overcoming the timescale barrier in molecular dynamics: Transfer operators, variational principles and machine learning

    One of the main challenges in molecular dynamics is overcoming the ‘timescale barrier’: in many realistic molecular systems, biologically important rare transitions occur on timescales that are not accessible to direct numerical simulation, even on the largest or specifically dedicated supercomputers. This article discusses how to circumvent the timescale barrier by a collection of transfer operator-based techniques that have emerged from dynamical systems theory, numerical mathematics and machine learning over the last two decades. We will focus on how transfer operators can be used to approximate the dynamical behaviour on long timescales, review the introduction of this approach into molecular dynamics, and outline the respective theory, as well as the algorithmic development, from the early numerics-based methods, via variational reformulations, to modern data-based techniques utilizing and improving concepts from machine learning. Furthermore, its relation to rare event simulation techniques will be explained, revealing a broad equivalence of variational principles for long-time quantities in molecular dynamics. The article will mainly take a mathematical perspective and will leave the application to real-world molecular systems to the more than 1000 research articles already written on this subject
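
The transfer-operator idea sketched in this abstract can be illustrated on a hypothetical two-state toy system (all names and parameters below are illustrative, not any specific method from the article): a long trajectory is discretized into states, a row-stochastic transition matrix approximating the transfer operator is estimated at a chosen lag time, and its leading eigenvalues yield relaxation timescales far longer than the lag itself.

```python
import numpy as np

def transition_matrix(dtraj, n_states, lag):
    """Row-stochastic transition matrix estimated from a discrete trajectory;
    a finite-dimensional approximation of the transfer operator at this lag."""
    C = np.zeros((n_states, n_states))
    for t in range(len(dtraj) - lag):
        C[dtraj[t], dtraj[t + lag]] += 1.0
    return C / C.sum(axis=1, keepdims=True)

def implied_timescales(T, lag):
    """Relaxation timescales from the leading eigenvalues of the operator."""
    evals = np.sort(np.abs(np.linalg.eigvals(T)))[::-1]
    return -lag / np.log(evals[1:])   # eigenvalue 1 is the stationary process

# toy trajectory: two metastable states connected by rare hops
rng = np.random.default_rng(0)
state, dtraj = 0, []
for _ in range(100_000):
    if rng.random() < 0.001:          # rare transition event
        state = 1 - state
    dtraj.append(state)

T = transition_matrix(np.array(dtraj), n_states=2, lag=10)
print(implied_timescales(T, lag=10))  # one slow timescale, ~500 steps >> lag
```

The point of the example is the timescale separation: the estimated slow timescale (about the mean waiting time between hops) is recovered from a lag of only 10 steps.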

    Applications of Molecular Dynamics simulations for biomolecular systems and improvements to density-based clustering in the analysis

    Molecular Dynamics simulations provide a powerful tool to study biomolecular systems with atomistic detail. The key to better understand the function and behaviour of these molecules can often be found in their structural variability. Simulations can help to expose this information that is otherwise experimentally hard or impossible to attain. This work covers two application examples for which a sampling and a characterisation of the conformational ensemble could reveal the structural basis to answer a topical research question. For the fungal toxin phalloidin—a small bicyclic peptide—observed product ratios in different cyclisation reactions could be rationalised by assessing the conformational pre-organisation of precursor fragments. For the C-type lectin receptor langerin, conformational changes induced by different side-chain protonations could deliver an explanation of the pH-dependency in the protein’s calcium-binding. The investigations were accompanied by the continued development of a density-based clustering protocol into a respective software package, which is generally well applicable for the use case of extracting conformational states from Molecular Dynamics data
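
As a rough illustration of density-based clustering on conformational data, here is a minimal DBSCAN-style sketch (a generic scheme, not the specific protocol developed in this work; the toy 2-D feature space and all names are illustrative):

```python
import numpy as np

def dbscan(X, eps, min_pts):
    """Minimal density-based clustering: grow clusters from core points
    (points with at least min_pts neighbours within eps); -1 marks noise."""
    dist = np.linalg.norm(X[:, None] - X[None, :], axis=-1)
    neighbors = [np.flatnonzero(row <= eps) for row in dist]
    labels = np.full(len(X), -1)
    cluster = 0
    for i in range(len(X)):
        if labels[i] != -1 or len(neighbors[i]) < min_pts:
            continue                    # already assigned, or not a core point
        labels[i] = cluster             # seed a new cluster
        stack = list(neighbors[i])
        while stack:                    # expand through density-connected points
            j = stack.pop()
            if labels[j] == -1:
                labels[j] = cluster
                if len(neighbors[j]) >= min_pts:
                    stack.extend(neighbors[j])
        cluster += 1
    return labels

# two well-separated "conformational states" in a toy 2-D feature space
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 0.1, (30, 2)), rng.normal(5.0, 0.1, (30, 2))])
labels = dbscan(X, eps=0.5, min_pts=5)   # recovers the two states
```

Unlike fixed-count methods such as k-means, the number of clusters is not specified in advance, which suits the use case of extracting an unknown number of conformational states.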

    Coarse-grained modeling for molecular discovery: Applications to cardiolipin-selectivity

    The development of novel materials is pivotal for addressing global challenges such as achieving sustainability, technological progress, and advancements in medical technology. Traditionally, developing or designing new molecules was a resource-intensive endeavor, often reliant on serendipity. Given the vast space of chemically feasible drug-like molecules, estimated between 10^6 and 10^100 compounds, traditional in vitro techniques fall short. Consequently, in silico tools such as virtual screening and molecular modeling have gained increasing recognition. However, the computational cost and the limited precision of the utilized molecular models still limit computational molecular design. This thesis aimed to enhance the molecular design process by integrating multiscale modeling and free energy calculations. Employing a coarse-grained model allowed us to efficiently traverse a significant portion of chemical space and reduce the sampling time required by molecular dynamics simulations. The physics-informed nature of the applied Martini force field and its level of retained structural detail make the model a suitable starting point for the focused learning of molecular properties. We applied our proposed approach to a cardiolipin bilayer, posing a relevant and challenging problem and facilitating reasonable comparison to experimental measurements. We identified promising molecules with defined properties within the resolution limit of a coarse-grained representation. Furthermore, we were able to bridge the gap from in silico predictions to in vitro and in vivo experiments, supporting the validity of the theoretical concept. The findings underscore the potential of multiscale modeling and free-energy calculations in enhancing molecular discovery and design and offer a promising direction for future research
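
The free-energy side of such a workflow can be sketched in a toy setting: given samples of a coordinate drawn from a Boltzmann distribution, a free-energy profile follows from the sampled density via F(x) = -kT ln p(x). The harmonic toy potential and all numbers below are illustrative assumptions, not the cardiolipin system.

```python
import numpy as np

kT = 2.494                      # kJ/mol at ~300 K
rng = np.random.default_rng(0)

# toy "simulation" data: a coordinate sampled from a harmonic well,
# i.e. Boltzmann weights p(x) proportional to exp(-x^2 / 2)
samples = rng.normal(0.0, 1.0, size=100_000)

# free-energy profile from the sampled density: F(x) = -kT * ln p(x) + const
p, edges = np.histogram(samples, bins=60, range=(-3.0, 3.0), density=True)
centers = 0.5 * (edges[:-1] + edges[1:])
F = -kT * np.log(p)
F -= F.min()                    # conventional choice: minimum at zero
# F now recovers the quadratic well, F(x) ~ kT * x^2 / 2
```

In practice the sampled coordinate would be a collective variable from the coarse-grained simulations, and enhanced-sampling or perturbative estimators replace this direct histogram.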

    Accelerating and Privatizing Diffusion Models

    Diffusion models (DMs) have emerged as a powerful class of generative models. DMs offer both state-of-the-art synthesis quality and sample diversity in combination with a robust and scalable learning objective. DMs rely on a diffusion process that gradually perturbs the data towards a normal distribution, while the neural network learns to denoise. Formally, the problem reduces to learning the score function, i.e., the gradient of the log-density of the perturbed data. The reverse of the diffusion process can be approximated by a differential equation, defined by the learned score function, and can therefore be used for generation when starting from random noise. In this thesis, we give a thorough and beginner-friendly introduction to DMs and discuss their history starting from early work on score-based generative models. Furthermore, we discuss connections to other statistical models and lay out applications of DMs, with a focus on image generative modeling. We then present CLD: a new DM based on critically-damped Langevin dynamics. CLD can be interpreted as running a joint diffusion in an extended space, where the auxiliary variables can be considered "velocities" that are coupled to the data variables as in Hamiltonian dynamics. We derive a novel score matching objective for CLD-based DMs and introduce a fast solver for the reverse diffusion process which is inspired by methods from the statistical mechanics literature. The CLD framework provides new insights into DMs and generalizes many existing DMs which are based on overdamped Langevin dynamics. Next, we present GENIE, a novel higher-order numerical solver for DMs. Many existing higher-order solvers for DMs are built on finite difference schemes which break down in the large step size limit as approximations become too crude. GENIE, on the other hand, learns neural network-based models for higher-order derivatives whose precision does not depend on the step size.
The additional networks in GENIE are implemented as small output heads on top of the neural backbone of the original DM, keeping the computational overhead minimal. Unlike recent sampling distillation methods that fundamentally alter the generation process in DMs, GENIE still solves the true generative differential equation, and therefore naturally enables applications such as encoding and guided sampling. The fourth chapter presents differentially private diffusion models (DPDMs), DMs trained with strict differential privacy guarantees. While modern machine learning models rely on increasingly large training datasets, data is often limited in privacy-sensitive domains. Generative models trained on sensitive data with differential privacy guarantees can sidestep this challenge, providing access to synthetic data instead. DPDMs enforce privacy by using differentially private stochastic gradient descent for training. We thoroughly study the design space of DPDMs and propose noise multiplicity, a simple yet powerful modification of the DM training objective tailored to the differential privacy setting. We motivate and show numerically why DMs are better suited for differentially private generative modeling than one-shot generators such as generative adversarial networks or normalizing flows. Finally, we propose to distill the knowledge of large pre-trained DMs into smaller student DMs. Large-scale DMs have achieved unprecedented results across several domains; however, they generally require a large amount of GPU memory and are slow at inference time, making it difficult to deploy them in real-time or on resource-limited devices. In particular, we propose an approximate score matching objective that regresses the student model towards predictions of the teacher DM rather than the clean data as is done in standard DM training. We show that student models outperform the larger teacher model for a variety of compute budgets.
Additionally, the student models may also be deployed on GPUs with significantly less memory than was required for the original teacher model
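
The core mechanics described in the abstract (score function, reverse-time differential equation) can be sketched in one dimension, where the score of the perturbed density is known in closed form and stands in for the trained network. This is a generic variance-exploding toy, not CLD, GENIE, or any other method from the thesis; all parameters are illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)
m, s = 2.0, 0.5                  # toy 1-D "data" distribution N(m, s^2)
sigma_max, steps = 10.0, 500

def sigma(t):
    """Noise scale of a variance-exploding forward diffusion, t in [0, 1]."""
    return sigma_max * t

def score(x, t):
    """Exact score (gradient of log-density) of the perturbed data,
    p_t = N(m, s^2 + sigma(t)^2); a trained network would approximate this."""
    return -(x - m) / (s**2 + sigma(t)**2)

# integrate the probability-flow ODE backwards from t=1 to t~0 (Euler),
# starting from the wide Gaussian prior
x = ts_grid = None
x = rng.normal(0.0, sigma_max, size=5000)
ts_grid = np.linspace(1.0, 1e-3, steps)
for t0, t1 in zip(ts_grid[:-1], ts_grid[1:]):
    dsigma2 = sigma(t1) ** 2 - sigma(t0) ** 2   # negative: noise shrinks
    x = x - 0.5 * dsigma2 * score(x, t0)
# x is now approximately distributed as the data, N(m, s^2)
```

Generation is thus just numerical integration of an ordinary differential equation driven by the score, which is why better solvers (the subject of GENIE) directly translate into fewer sampling steps.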

    Artificial Intelligence for Science in Quantum, Atomistic, and Continuum Systems

    Advances in artificial intelligence (AI) are fueling a new paradigm of discoveries in natural sciences. Today, AI has started to advance natural sciences by improving, accelerating, and enabling our understanding of natural phenomena at a wide range of spatial and temporal scales, giving rise to a new area of research known as AI for science (AI4Science). Being an emerging research paradigm, AI4Science is unique in that it is an enormous and highly interdisciplinary area. Thus, a unified and technical treatment of this field is needed yet challenging. This work aims to provide a technically thorough account of a subarea of AI4Science; namely, AI for quantum, atomistic, and continuum systems. These areas aim at understanding the physical world from the subatomic (wavefunctions and electron density), atomic (molecules, proteins, materials, and interactions), to macro (fluids, climate, and subsurface) scales and form an important subarea of AI4Science. A unique advantage of focusing on these areas is that they largely share a common set of challenges, thereby allowing a unified and foundational treatment. A key common challenge is how to capture physics first principles, especially symmetries, in natural systems by deep learning methods. We provide an in-depth yet intuitive account of techniques to achieve equivariance to symmetry transformations. We also discuss other common technical challenges, including explainability, out-of-distribution generalization, knowledge transfer with foundation and large language models, and uncertainty quantification. To facilitate learning and education, we provide categorized lists of resources that we found to be useful. We strive to be thorough and unified and hope this initial effort may trigger more community interests and efforts to further advance AI4Science
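
The symmetry requirement mentioned above can be made concrete in its simplest form: a descriptor built from pairwise distances is exactly invariant under rotations and translations of a point set, the property that equivariant architectures generalize to richer feature types. A hypothetical toy check (all names are illustrative):

```python
import numpy as np

def pairwise_distances(X):
    """Descriptor that is exactly invariant to rotations and translations."""
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    return d[np.triu_indices(len(X), k=1)]

def random_orthogonal(rng):
    # QR decomposition of a Gaussian matrix yields a random orthogonal matrix
    Q, R = np.linalg.qr(rng.normal(size=(3, 3)))
    return Q * np.sign(np.diag(R))

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 3))              # toy "molecule": 5 points in 3-D
R = random_orthogonal(rng)
X_rot = X @ R.T + rng.normal(size=3)     # apply a rotation and a translation
assert np.allclose(pairwise_distances(X), pairwise_distances(X_rot))
```

Invariant descriptors like this discard directional information; equivariant networks instead transform their internal features consistently with the symmetry operation, which is the more general technique the survey covers.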

    In Silico Strategies for Prospective Drug Repositionings

    The discovery of new drugs is one of pharmaceutical research's most exciting and challenging tasks. Unfortunately, the conventional drug discovery procedure is time-consuming and seldom successful; furthermore, new drugs are needed to address our clinical challenges (e.g., new antibiotics, new anticancer drugs, new antivirals). Within this framework, drug repositioning—finding new pharmacodynamic properties for already approved drugs—becomes a worthy drug discovery strategy. Recent drug discovery techniques combine traditional tools with in silico strategies to identify previously unaccounted properties for drugs already in use. Indeed, big data exploration techniques capitalize on the ever-growing knowledge of drugs' structural and physicochemical properties, drug–target and drug–drug interactions, advances in human biochemistry, and the latest molecular and cellular biology discoveries. Following this new and exciting trend, this book is a collection of papers introducing innovative computational methods to identify potential candidates for drug repositioning. Thus, the papers in the Special Issue In Silico Strategies for Prospective Drug Repositionings introduce a wide array of in silico strategies such as complex network analysis, big data, machine learning, molecular docking, molecular dynamics simulation, and QSAR; these strategies target diverse diseases and medical conditions: COVID-19 and post-COVID-19 pulmonary fibrosis, non-small cell lung cancer, multiple sclerosis, toxoplasmosis, psychiatric disorders, or skin conditions

    A review of commercialisation mechanisms for carbon dioxide removal

    The deployment of carbon dioxide removal (CDR) needs to be scaled up to achieve net zero emission pledges. In this paper we survey the policy mechanisms currently in place globally to incentivise CDR, together with an estimate of what different mechanisms are paying per tonne of CDR, and how those costs are currently distributed. Incentive structures are grouped into three categories: market-based, public procurement, and fiscal mechanisms. We find the majority of mechanisms currently in operation are under-resourced and pay too little to enable a portfolio of CDR that could support achievement of net zero. The majority of mechanisms are concentrated in market-based and fiscal structures, specifically carbon markets and subsidies. While not primarily motivated by CDR, mechanisms tend to support established afforestation and soil carbon sequestration methods. Mechanisms for geological CDR remain largely underdeveloped relative to the requirements of modelled net zero scenarios. Commercialisation pathways for CDR require suitable policies and markets throughout a project's development cycle. Discussion and investment in CDR has tended to focus on technology development. Our findings suggest that an equal or greater emphasis on policy innovation may be required if future requirements for CDR are to be met. This study can further support research and policy on the identification of incentive gaps and realistic potential for CDR globally

    A Probabilistic Treatment To Point Cloud Matching And Motion Estimation

    Probabilistic and efficient motion estimation is paramount in many robotic applications, including state estimation and position tracking. Iterative closest point (ICP) is a popular algorithm that provides ego-motion estimates for mobile robots by matching point cloud pairs. Estimating motion efficiently using ICP is challenging due to the large size of point clouds. Further, sensor noise and environmental uncertainties result in uncertain motion and state estimates. Probabilistic inference is a principled approach to quantify uncertainty but is computationally expensive and thus challenging to use in complex real-time robotics tasks. In this thesis, we address these challenges by leveraging recent advances in optimization and probabilistic inference and present four core contributions. First is SGD-ICP, which employs stochastic gradient descent (SGD) to align two point clouds efficiently. The second is Bayesian-ICP, which combines SGD-ICP with stochastic gradient Langevin dynamics to obtain distributions over transformations efficiently. We also propose an adaptive motion model that employs Bayesian-ICP to produce environment-aware proposal distributions for state estimation. The third is Stein-ICP, a probabilistic ICP technique that exploits GPU parallelism for speed gains. Stein-ICP exploits the Stein variational gradient descent framework to provide non-parametric estimates of the transformation and can model complex multi-modal distributions. The fourth contribution is Stein particle filter, capable of filtering non-Gaussian, high-dimensional dynamical systems. This method can be seen as a deterministic flow of particles from an initial to the desired state. This transport of particles is embedded in a reproducing kernel Hilbert space where particles interact with each other through a repulsive force that brings diversity among the particles
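
The idea behind gradient-based point-cloud alignment can be sketched on a 2-D toy problem. For simplicity the sketch assumes known correspondences and plain full-batch gradient descent, whereas SGD-ICP uses stochastic mini-batches and ICP re-estimates nearest-neighbour matches each iteration; all parameters are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.normal(size=(200, 2))                        # source point cloud
theta_true, t_true = 0.3, np.array([1.0, -0.5])
R_true = np.array([[np.cos(theta_true), -np.sin(theta_true)],
                   [np.sin(theta_true),  np.cos(theta_true)]])
B = A @ R_true.T + t_true                            # target = rotated + shifted

# gradient descent on the mean squared alignment error
theta, t, lr = 0.0, np.zeros(2), 0.1
for _ in range(300):
    c, s = np.cos(theta), np.sin(theta)
    R = np.array([[c, -s], [s, c]])
    dR = np.array([[-s, -c], [c, -s]])               # derivative dR/dtheta
    residual = A @ R.T + t - B                       # per-point error, (N, 2)
    grad_theta = np.mean(np.sum(residual * (A @ dR.T), axis=1))
    grad_t = residual.mean(axis=0)
    theta -= lr * grad_theta
    t -= lr * grad_t
# theta and t converge to theta_true and t_true
```

Replacing the full-batch gradient with mini-batch estimates is what makes the approach scale to large clouds, and replacing the point estimate of (theta, t) with samples (e.g. via stochastic gradient Langevin dynamics) is what yields the distributions over transformations described above.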

    Integrated Chemical Processes in Liquid Multiphase Systems

    The essential principles of green chemistry are the use of renewable raw materials, highly efficient catalysts and green solvents linked with energy efficiency and process optimization in real-time. Experts from different fields show how to examine all levels from the molecular elementary steps up to the design and operation of an entire plant for developing novel and efficient production processes