    SILVR: Guided Diffusion for Molecule Generation

    Computationally generating novel, synthetically accessible compounds with high affinity and low toxicity is a great challenge in drug design. Machine-learning models that go beyond conventional pharmacophoric methods have shown promise in generating novel small-molecule compounds, but they require significant tuning for a specific protein target. Here, we introduce a method called selective iterative latent variable refinement (SILVR) for conditioning an existing diffusion-based equivariant generative model without retraining. The model allows the generation of new molecules that fit into the binding site of a protein based on fragment hits. We use the SARS-CoV-2 main protease fragments from Diamond X-Chem that form part of the COVID Moonshot project as a reference dataset for conditioning the molecule generation. The SILVR rate controls the extent of conditioning, and we show that moderate SILVR rates make it possible to generate new molecules of similar shape to the original fragments, meaning that the new molecules fit the binding site without knowledge of the protein. We can also merge up to 3 fragments into a new molecule without affecting the quality of molecules generated by the underlying generative model. Our method is generalizable to any protein target with known fragments and any diffusion-based model for molecule generation. Comment: paper, 20 paper, 11 figure
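    The abstract describes conditioning a pretrained diffusion model by pulling its latent atom coordinates toward fragment coordinates at a tunable rate. As a rough illustration only, the sketch below applies an ILVR-style blend at a single reverse-diffusion step: the reference is forward-noised to the current noise level and the selected atoms are moved a fraction `rate` toward it. The function name `silvr_like_step`, the noise parameters `alpha_t`/`sigma_t`, and the blending formula itself are assumptions for illustration, not the published SILVR update rule.

```python
import random
from typing import List, Sequence, Tuple

Coord = Tuple[float, float, float]


def silvr_like_step(
    z_pred: Sequence[Coord],
    reference: Sequence[Coord],
    mask: Sequence[bool],
    alpha_t: float,
    sigma_t: float,
    rate: float,
    rng: random.Random,
) -> List[Coord]:
    """Hypothetical ILVR-style refinement of one reverse-diffusion step.

    z_pred:    atom coordinates proposed by the denoiser at this step
    reference: fragment coordinates, in the same atom order as z_pred
    mask:      True for atoms that should be pulled toward the reference
    alpha_t, sigma_t: assumed signal scale and noise std at this step
    rate:      conditioning strength in [0, 1]; 0 leaves z_pred unchanged
    """
    out: List[Coord] = []
    for z, ref, selected in zip(z_pred, reference, mask):
        if not selected:
            out.append(tuple(z))
            continue
        # Assumption: noise the reference to the same level as z (q(z_t | x_ref)).
        ref_t = tuple(alpha_t * r + sigma_t * rng.gauss(0.0, 1.0) for r in ref)
        # Move the selected atom a fraction `rate` of the way toward ref_t.
        out.append(tuple(zi + rate * (ri - zi) for zi, ri in zip(z, ref_t)))
    return out


# Demo: a noise-free step (sigma_t = 0) with half-strength conditioning.
rng = random.Random(0)
z = [(0.0, 0.0, 0.0), (1.0, 1.0, 1.0)]
ref = [(2.0, 0.0, 0.0), (5.0, 5.0, 5.0)]
print(silvr_like_step(z, ref, [True, False], 1.0, 0.0, 0.5, rng))
```

    In this toy form, `rate` plays the role the abstract assigns to the SILVR rate: 0 gives unconditioned generation, and larger values force the selected atoms closer to the fragment geometry.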

    Estimating Equilibrium Expectations from Time-Correlated Simulation Data at Multiple Thermodynamic States

    Computing the equilibrium properties of complex systems, such as free energy differences, is often hampered by rare events in the dynamics. Enhanced sampling methods may be used to speed up sampling, for example by using high temperatures, as in parallel tempering, or by simulating with a biasing potential, as in umbrella sampling. The equilibrium properties of the thermodynamic state of interest (e.g., lowest temperature or unbiased potential) can be computed using reweighting estimators such as the weighted histogram analysis method or the multistate Bennett acceptance ratio (MBAR). The weighted histogram analysis method and MBAR produce unbiased estimates only if the simulation samples from the global equilibria at their respective thermodynamic states, a requirement that can be prohibitively expensive for some simulations, such as a large parallel tempering ensemble of an explicitly solvated biomolecule. Here, we introduce the transition-based reweighting analysis method (TRAM), a class of estimators that exploit ideas from Markov modeling and only require the simulation data to be in local equilibrium within subsets of the configuration space. We formulate the expanded TRAM (xTRAM) estimator, which is shown to be asymptotically unbiased and a generalization of MBAR. Using four exemplary systems of varying complexity, we demonstrate the improved convergence (ranging from a twofold improvement to several orders of magnitude) of xTRAM in comparison to a direct counting estimator and MBAR, with respect to the invested simulation effort. Lastly, we introduce a random-swapping simulation protocol that can be used with xTRAM, gaining orders-of-magnitude advantages over simulation protocols that require the constraint of sampling from a global equilibrium.
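    The abstract positions xTRAM as a generalization of MBAR. To make the baseline concrete, here is a minimal sketch of MBAR itself (not xTRAM), solving its self-consistent equations f_i = -ln sum_n exp(-u_i(x_n)) / sum_k N_k exp(f_k - u_k(x_n)) by direct iteration over pooled samples. The test system (three 1D harmonic states in reduced units, sampled exactly) and all function names are illustrative assumptions; production work would use a dedicated library with a faster solver.

```python
import math
import random
from typing import Callable, List, Sequence

# Reduced potential u_k(x) = beta * U_k(x) of thermodynamic state k.
Potential = Callable[[float], float]


def logsumexp(values: Sequence[float]) -> float:
    """Numerically stable log(sum(exp(v)))."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def mbar_free_energies(
    samples: Sequence[float],
    n_per_state: Sequence[int],
    potentials: Sequence[Potential],
    max_iter: int = 5000,
    tol: float = 1e-9,
) -> List[float]:
    """Dimensionless free energies f_k (gauge f_0 = 0) by MBAR direct iteration."""
    K, N = len(potentials), len(samples)
    # Reduced energy of every pooled sample evaluated in every state.
    u = [[pot(x) for x in samples] for pot in potentials]
    log_n = [math.log(nk) for nk in n_per_state]
    f = [0.0] * K
    for _ in range(max_iter):
        # log sum_k N_k exp(f_k - u_k(x_n)) for each sample n
        log_den = [logsumexp([log_n[k] + f[k] - u[k][n] for k in range(K)])
                   for n in range(N)]
        new_f = [-logsumexp([-u[i][n] - log_den[n] for n in range(N)])
                 for i in range(K)]
        f0 = new_f[0]
        new_f = [fi - f0 for fi in new_f]  # fix the arbitrary additive constant
        converged = max(abs(a - b) for a, b in zip(new_f, f)) < tol
        f = new_f
        if converged:
            break
    return f


# Demo: states u_k(x) = k x^2 / 2, exact result f_k - f_0 = ln(k / k_0) / 2.
rng = random.Random(42)
springs = [1.0, 2.0, 4.0]
n = 1000
samples = [rng.gauss(0.0, 1.0 / math.sqrt(k)) for k in springs for _ in range(n)]
potentials = [lambda x, k=k: 0.5 * k * x * x for k in springs]
f = mbar_free_energies(samples, [n] * len(springs), potentials)
exact = [0.5 * math.log(k / springs[0]) for k in springs]
print(f, exact)
```

    The samples here are independent draws from each state's global equilibrium, which is exactly the assumption the abstract says becomes expensive for correlated enhanced-sampling data and that TRAM-type estimators relax.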

    Self-organized emergence of folded protein-like network structures from geometric constraints

    The intricate three-dimensional geometries of protein tertiary structures underlie protein function and emerge through a folding process from one-dimensional chains of amino acids. The exact spatial sequence and configuration of amino acids, the biochemical environment, and the temporal sequence of distinct interactions yield a complex folding process that cannot yet be easily tracked for all proteins. To gain qualitative insights into the fundamental mechanisms behind the folding dynamics and generic features of the folded structure, we propose a simple model of structure formation that takes into account only fundamental geometric constraints and otherwise assumes randomly paired connections. We find that despite its simplicity, the model results in a network ensemble consistent with key overall features of the ensemble of Protein Residue Networks we obtained from more than 1000 biological protein geometries available through the Protein Data Bank. Specifically, the distribution of the number of interaction neighbors per unit (amino acid), the scaling of the structure's spatial extent with chain length, the eigenvalue spectrum, and the scaling of the smallest relaxation time with chain length are all consistent between model and real proteins. These results indicate that geometric constraints alone may already account for a number of generic features of protein tertiary structures.
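    Protein Residue Networks of the kind compared against the model are contact networks built from 3D coordinates: residues are nodes, and residue pairs within some distance are linked. A minimal sketch of one common construction and of the degree distribution (number of interaction neighbors per residue) mentioned above, assuming C-alpha coordinates and an 8 Å cutoff; the abstract does not state the authors' contact criterion, so both choices are placeholders.

```python
import math
from collections import Counter
from typing import Dict, List, Sequence, Tuple

Coord = Tuple[float, float, float]


def residue_network(coords: Sequence[Coord], cutoff: float = 8.0) -> List[List[int]]:
    """Adjacency lists: residues i != j are linked if their distance < cutoff.

    Assumption: coords are C-alpha positions in angstrom and the 8 A default
    cutoff is illustrative, not taken from the study.
    """
    n = len(coords)
    adj: List[List[int]] = [[] for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if math.dist(coords[i], coords[j]) < cutoff:
                adj[i].append(j)
                adj[j].append(i)
    return adj


def degree_distribution(adj: Sequence[Sequence[int]]) -> Dict[int, float]:
    """Fraction of residues having each number of interaction neighbors."""
    counts = Counter(len(neighbors) for neighbors in adj)
    return {k: c / len(adj) for k, c in sorted(counts.items())}


# Demo: a fully extended toy chain with 3.8 A spacing between residues.
chain = [(3.8 * i, 0.0, 0.0) for i in range(10)]
print(degree_distribution(residue_network(chain)))
```

    Applying the same two functions to model-generated coordinates and to PDB structures gives directly comparable degree distributions, which is one of the ensemble-level statistics the abstract reports.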

    Thermodynamics of trajectories of the one-dimensional Ising model

    We present a numerical study of the dynamics of the one-dimensional Ising model by applying the large-deviation method to describe ensembles of dynamical trajectories. In this approach, trajectories are classified according to a dynamical order parameter, and the structure of ensembles of trajectories can be understood from the properties of large-deviation functions, which play the role of dynamical free energies. We consider both Glauber and Kawasaki dynamics, as well as the presence of a magnetic field. For Glauber dynamics in the absence of a field, we confirm the analytic predictions of Jack and Sollich about the existence of critical dynamical, or space-time, phase transitions at critical values of the "counting" field s. In the presence of a magnetic field, the dynamical phase diagram also displays first-order transition surfaces. We discuss how these non-equilibrium transitions in the 1d Ising model relate to the equilibrium ones of the 2d Ising model. For Kawasaki dynamics we find a much simpler dynamical phase structure, with transitions reminiscent of those seen in kinetically constrained models. Comment: 23 pages, 10 figures
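    The large-deviation function θ(s) = lim (1/t) ln ⟨exp(-s K)⟩ is the largest eigenvalue of a "tilted" generator in which every transition rate is multiplied by exp(-s). A minimal sketch for a small periodic chain, assuming zero field, Glauber rates w_i = ½[1 − σ_i tanh(βJ(σ_{i−1} + σ_{i+1}))], the activity K (total number of spin flips) as the dynamical order parameter, and power iteration in place of a full eigensolver. The chain length and coupling are illustrative choices, not values from the study, and a 4-spin chain cannot show the phase transitions themselves.

```python
import math


def spin(config: int, k: int, n: int) -> int:
    """Spin k (+1 or -1) of a periodic chain stored as the bits of `config`."""
    return 1 if (config >> (k % n)) & 1 else -1


def glauber_rate(config: int, i: int, n: int, beta_j: float) -> float:
    """Assumed Glauber rate for flipping spin i (zero field); obeys detailed
    balance for H = -J * sum_i s_i s_{i+1}."""
    local = spin(config, i - 1, n) + spin(config, i + 1, n)
    return 0.5 * (1.0 - spin(config, i, n) * math.tanh(beta_j * local))


def theta(s: float, n: int = 4, beta_j: float = 0.5,
          max_iter: int = 100_000, tol: float = 1e-12) -> float:
    """Largest eigenvalue of the activity-tilted generator W_s.

    Off-diagonal (flip) rates are multiplied by exp(-s); the diagonal keeps
    the un-tilted escape rates.
    """
    n_states = 1 << n
    rates = [[glauber_rate(c, i, n, beta_j) for i in range(n)]
             for c in range(n_states)]
    escape = [sum(r) for r in rates]
    # W_s + shift * I is non-negative with a positive diagonal, so power
    # iteration converges to its Perron eigenvalue theta(s) + shift.
    shift = max(escape) + 1.0
    tilt = math.exp(-s)
    v = [1.0] * n_states
    lam = float("inf")
    for _ in range(max_iter):
        w = [(shift - escape[c]) * v[c] for c in range(n_states)]
        for c in range(n_states):
            for i in range(n):
                # Flip spin i: configuration c -> c ^ (1 << i), tilted rate.
                w[c ^ (1 << i)] += tilt * rates[c][i] * v[c]
        new_lam = max(w)  # v is normalised so that max(v) == 1
        v = [x / new_lam for x in w]
        converged = abs(new_lam - lam) < tol
        lam = new_lam
        if converged:
            break
    return lam - shift


print([round(theta(s), 4) for s in (-1.0, 0.0, 0.5, 1.0)])
```

    At s = 0 the tilted generator reduces to the ordinary Markov generator, so θ(0) = 0; for s > 0 (s < 0) the ensemble is biased toward less (more) active trajectories and θ becomes negative (positive).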

    Statistically optimal analysis of state-discretized trajectory data from multiple thermodynamic states

    We propose a discrete transition-based reweighting analysis method (dTRAM) for analyzing configuration-space-discretized simulation trajectories produced at different thermodynamic states (temperatures, Hamiltonians, etc.). dTRAM provides maximum-likelihood estimates of stationary quantities (probabilities, free energies, expectation values) at any thermodynamic state. In contrast to the weighted histogram analysis method (WHAM), dTRAM does not require data to be sampled from global equilibrium, and can thus produce superior estimates for enhanced sampling data such as parallel/simulated tempering, replica exchange, umbrella sampling, or metadynamics. In addition, dTRAM provides optimal estimates of Markov state models (MSMs) from the discretized state-space trajectories at all thermodynamic states. Under suitable conditions, these MSMs can be used to calculate kinetic quantities (e.g., rates, timescales). In the limit of a single thermodynamic state, dTRAM estimates a maximum-likelihood reversible MSM, while in the limit of uncorrelated sampling data, dTRAM is identical to WHAM. dTRAM is thus a generalization of both estimators.
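    To illustrate the single-state limit only (not the multi-state dTRAM equations), here is a sketch of the widely used self-consistent iteration for a maximum-likelihood reversible transition matrix: x_ij ← (c_ij + c_ji) / (c_i/x_i + c_j/x_j), then T_ij = x_ij/x_i. It assumes a count matrix `counts[i][j]` of observed i→j transitions at a fixed lag time in which every state has at least one outgoing count; the function name and the example counts are hypothetical.

```python
from typing import List, Sequence, Tuple


def reversible_mle(
    counts: Sequence[Sequence[float]], max_iter: int = 10_000, tol: float = 1e-10
) -> Tuple[List[List[float]], List[float]]:
    """Reversible maximum-likelihood transition matrix and its stationary vector.

    Iterates x_ij <- (c_ij + c_ji) / (c_i / x_i + c_j / x_j) with
    c_i = sum_j c_ij and x_i = sum_j x_ij. Assumes every c_i > 0.
    """
    n = len(counts)
    c_row = [float(sum(row)) for row in counts]
    sym = [[float(counts[i][j] + counts[j][i]) for j in range(n)] for i in range(n)]
    x = [row[:] for row in sym]  # start from the symmetrised counts
    for _ in range(max_iter):
        x_row = [sum(row) for row in x]
        new = [[sym[i][j] / (c_row[i] / x_row[i] + c_row[j] / x_row[j])
                if sym[i][j] > 0 else 0.0 for j in range(n)] for i in range(n)]
        delta = max(abs(new[i][j] - x[i][j]) for i in range(n) for j in range(n))
        x = new
        if delta < tol:
            break
    x_row = [sum(row) for row in x]
    total = sum(x_row)
    T = [[x[i][j] / x_row[i] for j in range(n)] for i in range(n)]
    pi = [xi / total for xi in x_row]  # stationary distribution of T
    return T, pi


# Demo: hypothetical transition counts between three discretized states.
counts = [[90, 10, 0], [5, 80, 15], [0, 20, 80]]
T, pi = reversible_mle(counts)
print(T, pi)
```

    Because x stays symmetric throughout, the result satisfies detailed balance, π_i T_ij = π_j T_ji, by construction; transitions that were never observed in either direction keep zero probability.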