SILVR: Guided Diffusion for Molecule Generation
Computationally generating novel synthetically accessible compounds with high
affinity and low toxicity is a great challenge in drug design. Machine-learning
models beyond conventional pharmacophoric methods have shown promise in
generating novel small molecule compounds, but require significant tuning for a
specific protein target. Here, we introduce a method called selective iterative
latent variable refinement (SILVR) for conditioning an existing diffusion-based
equivariant generative model without retraining. The model allows the
generation of new molecules that fit into a binding site of a protein based on
fragment hits. We use the SARS-CoV-2 Main protease fragments from Diamond
X-Chem that form part of the COVID Moonshot project as a reference dataset for
conditioning the molecule generation. The SILVR rate controls the extent of
conditioning and we show that moderate SILVR rates make it possible to generate
new molecules of similar shape to the original fragments, meaning that the new
molecules fit the binding site without knowledge of the protein. We can also
merge up to 3 fragments into a new molecule without affecting the quality of
molecules generated by the underlying generative model. Our method is
generalizable to any protein target with known fragments and any
diffusion-based model for molecule generation.
Comment: 20 pages, 11 figures
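The abstract describes SILVR only at a high level. What follows is a minimal sketch of the general idea behind iterative latent variable refinement. After each reverse diffusion step, the coordinates of selected atoms are pulled toward a reference (here, fragment atoms) that has been forward-noised to the same level, with a rate r controlling the strength of the pull. This is an assumption-laden illustration, not the authors' exact update rule. `denoise_step` is a hypothetical stand-in for one reverse step of a pretrained equivariant diffusion model, and the DDPM-style noising schedule `alpha_bars` is also assumed.

```python
import numpy as np

def noise_reference(ref, alpha_bar_t, rng):
    """Forward-diffuse reference coordinates to noise level alpha_bar_t (DDPM-style, assumed)."""
    eps = rng.standard_normal(ref.shape)
    return np.sqrt(alpha_bar_t) * ref + np.sqrt(1.0 - alpha_bar_t) * eps

def refine_step(x, ref_noised, mask, rate):
    """Pull masked atoms of x toward the noised reference by a fraction `rate`.

    x, ref_noised: (n_atoms, 3) coordinates; mask: (n_atoms,) 0/1 selection.
    rate = 0 leaves x unchanged; rate = 1 copies the reference for masked atoms.
    """
    m = np.asarray(mask, dtype=float)[:, None]
    return x + rate * m * (ref_noised - x)

def sample(denoise_step, ref, mask, rate, alpha_bars, seed=0):
    """Reverse diffusion loop with a refinement after every step.

    denoise_step(x, t): hypothetical placeholder for a pretrained model's reverse step.
    alpha_bars[t]: assumed signal fraction of the latent produced by step t
    (alpha_bars[0] = 1.0 means the final output is noise-free).
    """
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(ref.shape)
    for t in reversed(range(len(alpha_bars))):
        x = denoise_step(x, t)
        x = refine_step(x, noise_reference(ref, alpha_bars[t], rng), mask, rate)
    return x
```

With rate 0 the sampler runs unconditioned. With rate 1 and every atom selected, it simply reproduces the reference. The moderate rates the abstract highlights lie between these two extremes.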
Estimating Equilibrium Expectations from Time-Correlated Simulation Data at Multiple Thermodynamic States
Computing the equilibrium properties of complex systems, such as free energy
differences, is often hampered by rare events in the dynamics. Enhanced
sampling methods may be used in order to speed up sampling by, for example,
using high temperatures, as in parallel tempering, or simulating with a
biasing potential such as in the case of umbrella sampling. The equilibrium
properties of the thermodynamic state of interest (e.g., lowest temperature or
unbiased potential) can be computed using reweighting estimators such as the
weighted histogram analysis method (WHAM) or the multistate Bennett acceptance
ratio (MBAR). Although WHAM and MBAR produce unbiased estimates, they require
the simulation to sample from the global equilibrium at each respective
thermodynamic state, a requirement that can be prohibitively
expensive for some simulations such as a large parallel tempering ensemble of
an explicitly solvated biomolecule. Here, we introduce the transition-based
reweighting analysis method (TRAM), a class of estimators that exploit ideas
from Markov modeling and only require the simulation data to be in local
equilibrium within subsets of the configuration space. We formulate the
expanded TRAM (xTRAM) estimator that is shown to be asymptotically unbiased
and a generalization of MBAR. Using four exemplary systems of varying
complexity, we demonstrate the improved convergence (ranging from a twofold
improvement to several orders of magnitude) of xTRAM in comparison to a direct
counting estimator and MBAR, with respect to the invested simulation effort.
Lastly, we introduce a random-swapping simulation protocol that can be used
with xTRAM, gaining orders-of-magnitude advantages over simulation protocols
that require the constraint of sampling from a global equilibrium.
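The abstract presents xTRAM as a generalization of MBAR. As a baseline, here is a minimal sketch of MBAR itself (not xTRAM), solved with the plain self-consistent iteration f_i = -ln Σ_n exp(-u_i(x_n)) / Σ_k N_k exp(f_k - u_k(x_n)). The `u_kn`/`N_k` layout is an assumed convention: reduced potentials of all pooled samples evaluated at every state. Production codes typically use faster solvers.

```python
import numpy as np

def _logsumexp(a, axis):
    """Numerically stable log(sum(exp(a))) along one axis."""
    m = np.max(a, axis=axis, keepdims=True)
    return np.squeeze(m, axis=axis) + np.log(np.sum(np.exp(a - m), axis=axis))

def mbar_free_energies(u_kn, N_k, tol=1e-10, max_iter=10000):
    """Solve the MBAR self-consistent equations by fixed-point iteration.

    u_kn[k, n]: reduced potential of pooled sample n evaluated at state k.
    N_k[k]:     number of samples drawn at state k.
    Returns reduced free energies f_k, shifted so that f_0 = 0.
    """
    u_kn = np.asarray(u_kn, dtype=float)
    log_N = np.log(np.asarray(N_k, dtype=float))
    f = np.zeros(u_kn.shape[0])
    for _ in range(max_iter):
        # log of the mixture denominator sum_k N_k exp(f_k - u_k(x_n)), one per sample
        log_denom = _logsumexp(log_N[:, None] + f[:, None] - u_kn, axis=0)
        f_new = -_logsumexp(-u_kn - log_denom[None, :], axis=1)
        f_new -= f_new[0]  # free energies are defined only up to a constant
        if np.max(np.abs(f_new - f)) < tol:
            return f_new
        f = f_new
    return f
```

A quick sanity check: if state 1 differs from state 0 only by a constant shift c of the reduced potential, the estimate recovers f_1 - f_0 = c.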
Self-organized emergence of folded protein-like network structures from geometric constraints
The intricate three-dimensional geometries of protein tertiary structures
underlie protein function and emerge through a folding process from
one-dimensional chains of amino acids. The exact spatial sequence and
configuration of amino acids, the biochemical environment and the temporal
sequence of distinct interactions yield a complex folding process that cannot
yet be easily tracked for all proteins. To gain qualitative insights into the
fundamental mechanisms behind the folding dynamics and generic features of the
folded structure, we propose a simple model of structure formation that takes
into account only fundamental geometric constraints and otherwise assumes
randomly paired connections. We find that despite its simplicity, the model
results in a network ensemble consistent with key overall features of the
ensemble of Protein Residue Networks we obtained from more than 1000 biological
protein geometries available through the Protein Data Bank. Specifically,
the distribution of the number of interaction neighbors a unit (amino acid)
has, the scaling of the structure's spatial extent with chain length, the
eigenvalue spectrum and the scaling of the smallest relaxation time with chain
length are all consistent between model and real proteins. These results
indicate that geometric constraints alone may already account for a number of
generic features of protein tertiary structures.
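Several of the features compared above can be computed directly from residue coordinates. Below is a minimal sketch under stated assumptions: residues are linked when their coordinates (for example C-alpha atoms, in angstrom) lie within a distance cutoff, and 8 Å is a common but assumed choice. The graph Laplacian is one standard choice of network spectrum. The abstract does not say which cutoff or spectrum the authors used.

```python
import numpy as np

def contact_network(coords, cutoff=8.0):
    """Adjacency matrix: residues i != j are linked if within `cutoff` (assumed 8 angstrom)."""
    coords = np.asarray(coords, dtype=float)
    d = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    return (d <= cutoff) & ~np.eye(len(coords), dtype=bool)

def degree_distribution(adj):
    """Fraction of residues with k interaction neighbors, for k = 0..max degree."""
    degrees = adj.sum(axis=1)
    return np.bincount(degrees) / len(degrees)

def radius_of_gyration(coords):
    """Spatial extent of the structure: RMS distance of residues from their centroid."""
    coords = np.asarray(coords, dtype=float)
    centered = coords - coords.mean(axis=0)
    return np.sqrt((centered ** 2).sum(axis=1).mean())

def laplacian_spectrum(adj):
    """Eigenvalues (ascending) of the graph Laplacian L = D - A."""
    a = adj.astype(float)
    return np.linalg.eigvalsh(np.diag(a.sum(axis=1)) - a)
```

Take a straight five-residue chain with 3.8 Å spacing. Each residue links to neighbors up to two positions away, which gives degrees 2, 3, 4, 3, 2. Repeating `radius_of_gyration` over chains of different lengths gives the size-versus-length scaling.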
Thermodynamics of trajectories of the one-dimensional Ising model
We present a numerical study of the dynamics of the one-dimensional Ising
model by applying the large-deviation method to describe ensembles of dynamical
trajectories. In this approach trajectories are classified according to a
dynamical order parameter and the structure of ensembles of trajectories can be
understood from the properties of large-deviation functions, which play the
role of dynamical free-energies. We consider both Glauber and Kawasaki
dynamics, and also the presence of a magnetic field. For Glauber dynamics in
the absence of a field we confirm the analytic predictions of Jack and Sollich
about the existence of critical dynamical, or space-time, phase transitions at
critical values of the "counting" field s. In the presence of a magnetic
field the dynamical phase diagram also displays first-order transition
surfaces. We discuss how these non-equilibrium transitions in the 1D Ising
model relate to the equilibrium ones of the 2D Ising model. For Kawasaki
dynamics we find a much simpler dynamical phase structure, with transitions
reminiscent of those seen in kinetically constrained models.
Comment: 23 pages, 10 figures
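The large-deviation function that plays the role of a dynamical free energy can be computed exactly for very small systems. Here is a minimal sketch under several assumptions: a periodic chain of N spins, J = 1, no field, continuous-time Glauber rates (1/2)(1 - σ_i tanh(β(σ_{i-1} + σ_{i+1}))), and the activity K (number of spin flips) as the dynamical order parameter. Trajectories are weighted by e^{-sK}, and θ(s) is then the largest eigenvalue of the tilted generator. Exact diagonalization is only feasible for tiny N. It is not the numerical method the paper uses.

```python
import itertools
import numpy as np

def glauber_rates(N, beta):
    """Glauber flip rates for a periodic 1D Ising chain (J = 1, h = 0, assumed).

    Returns (configs, W) with W[a, b] the rate of the single spin flip a -> b.
    """
    configs = np.array(list(itertools.product([-1, 1], repeat=N)))
    index = {tuple(c): i for i, c in enumerate(configs)}
    W = np.zeros((len(configs), len(configs)))
    for a, c in enumerate(configs):
        for i in range(N):
            local = c[(i - 1) % N] + c[(i + 1) % N]
            flipped = c.copy()
            flipped[i] *= -1
            W[a, index[tuple(flipped)]] = 0.5 * (1.0 - c[i] * np.tanh(beta * local))
    return configs, W

def scgf(W, s):
    """theta(s): largest eigenvalue of W_s = e^{-s} W - diag(escape rates).

    Off-diagonal (flip) rates are tilted by e^{-s}, so s > 0 favors
    low-activity trajectories and s < 0 favors high-activity ones.
    """
    escape = W.sum(axis=1)
    Ws = np.exp(-s) * W - np.diag(escape)
    return np.max(np.linalg.eigvals(Ws).real)
```

At s = 0 the tilted operator is the ordinary generator, so θ(0) = 0. Tilting lowers the off-diagonal rates for s > 0 and raises them for s < 0, so θ is negative on one side of zero and positive on the other.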
Statistically optimal analysis of state-discretized trajectory data from multiple thermodynamic states
We propose a discrete transition-based reweighting analysis method (dTRAM)
for analyzing configuration-space-discretized simulation trajectories produced
at different thermodynamic states (temperatures, Hamiltonians, etc.). dTRAM
provides maximum-likelihood estimates of stationary quantities (probabilities,
free energies, expectation values) at any thermodynamic state. In contrast to
the weighted histogram analysis method (WHAM), dTRAM does not require data to
be sampled from global equilibrium, and can thus produce superior estimates for
enhanced sampling data such as parallel/simulated tempering, replica exchange,
umbrella sampling, or metadynamics. In addition, dTRAM provides optimal
estimates of Markov state models (MSMs) from the discretized state-space
trajectories at all thermodynamic states. Under suitable conditions, these MSMs
can be used to calculate kinetic quantities (e.g. rates, timescales). In the
limit of a single thermodynamic state, dTRAM estimates a maximum likelihood
reversible MSM, while in the limit of uncorrelated sampling data, dTRAM is
identical to WHAM. dTRAM is thus a generalization of both estimators.
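In the single-state limit, dTRAM reduces to a maximum-likelihood reversible MSM. Below is a minimal sketch of one standard way to compute such an estimate from a transition count matrix C: the self-consistent iteration x_ij ← (c_ij + c_ji) / (c_i/x_i + c_j/x_j), where c_i = Σ_j c_ij and x_i = Σ_j x_ij. It is not dTRAM itself. It also assumes every state has outgoing counts and the count matrix is connected.

```python
import numpy as np

def reversible_msm(C, tol=1e-12, max_iter=100000):
    """Maximum-likelihood reversible transition matrix from counts C.

    Returns (T, pi): row-stochastic T and its stationary distribution pi,
    which satisfy detailed balance pi_i T_ij = pi_j T_ji.
    """
    C = np.asarray(C, dtype=float)
    c_row = C.sum(axis=1)          # c_i: total outgoing counts per state
    C_sym = C + C.T
    X = C_sym / C_sym.sum()        # symmetric starting guess for x_ij
    for _ in range(max_iter):
        x_row = X.sum(axis=1)
        X_new = C_sym / (c_row[:, None] / x_row[:, None] + c_row[None, :] / x_row[None, :])
        X_new /= X_new.sum()       # the update is scale-invariant; fix sum(x_ij) = 1
        converged = np.max(np.abs(X_new - X)) < tol
        X = X_new
        if converged:
            break
    pi = X.sum(axis=1)
    return X / pi[:, None], pi
```

Any two-state stochastic matrix is reversible, so for two states the result must match the row-normalized counts. For counts [[8, 2], [3, 7]] this gives T = [[0.8, 0.2], [0.3, 0.7]] with π = (0.6, 0.4).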