From HIV protein sequences to viral fitness landscapes: a new paradigm for in silico vaccine design
Background: An inexpensive prophylactic vaccine offers the best hope to curb the HIV/AIDS epidemic gripping sub-Saharan Africa. Systematic means to guide the design of an effective immunogen for this, and other, infectious diseases are not available. What is required is a method to chart the peaks and valleys of viral fitness as a function of amino acid sequence. An efficacious vaccine would eject the virus from the high fitness peaks, and drive it into the valleys where its compromised fitness impairs its ability to replicate and inflict damage to the host.
Methods: Appealing to spin glass models in statistical physics, we present a novel approach to translate viral sequence databases into landscapes of viral fitness. These inferred models furnish a quantitative description of viral replicative capacity as a function of amino acid sequence. We illustrate this approach in the development of landscapes for the proteins of HIV-1 clade B Gag.
Results: In comparisons to experimental and clinical data, our inferred landscapes demonstrate excellent agreement with: 1) in vitro replicative fitness measurements, 2) clinically observed high-fitness circulating viral strains, 3) documented HLA associated CTL escape mutations, and 4) intra-host temporal adaptation pathways revealed by deep sequencing. These favorable comparisons support our landscapes as reflections of intrinsic viral fitness. We illustrate the value of such descriptions in the computational design of a CTL Gag immunogen.
Conclusion: We present a novel methodology to translate viral sequence data into quantitative landscapes of viral fitness. In an application to HIV-1 Gag, we illustrate excellent agreement of our model predictions with experimental and clinical data, and demonstrate a powerful new approach for HIV immunogen design. We anticipate that this approach may represent an unprecedented means to synthesize fitness landscapes for diverse pathogens, and may provide the basis for the design of improved prophylactic and therapeutic strategies.
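As an illustration of what a spin-glass-style landscape looks like in practice, the following is a minimal sketch assuming a Potts-type energy with per-site fields and pairwise couplings, where higher energy is read as lower fitness. The abstract does not specify the functional form, so the function name `potts_energy`, the parameters `h` and `J`, and all numerical values are hypothetical toy choices, not fitted quantities from the study.

```python
# Minimal sketch of a Potts-type sequence energy (an assumed form, not the
# authors' fitted model): E(z) = sum_i h[i][z_i] + sum_{i<j} J[(i,j)][z_i][z_j].
# Under the assumed convention, higher energy corresponds to lower fitness.

def potts_energy(seq, h, J):
    """Return the energy of an integer-encoded sequence.

    seq : list of residue states, one integer per site
    h   : h[i][a] is the field for state a at site i
    J   : dict mapping site pairs (i, j) to a matrix of couplings
    """
    energy = sum(h[i][a] for i, a in enumerate(seq))
    for (i, j), coupling in J.items():
        energy += coupling[seq[i]][seq[j]]
    return energy

# Toy parameters (hypothetical): 3 sites, 2 states (0 = wild type, 1 = mutant).
h = [[0.0, 1.0], [0.0, 1.0], [0.0, 1.0]]   # each mutation costs 1 energy unit
J = {(0, 1): [[0.0, 0.0], [0.0, -0.5]]}    # mutating sites 0 and 1 together is partly compensated

sequences = {
    "wild_type": [0, 0, 0],
    "single_mutant": [1, 0, 0],
    "double_mutant": [1, 1, 0],
}
energies = {name: potts_energy(s, h, J) for name, s in sequences.items()}
print(energies)
```

In this toy landscape the double mutant is penalized less than the sum of its two single mutations, which is the kind of pairwise coupling effect a field-only model could not represent.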
Statistically optimal continuous free energy surfaces from biased simulations and multistate reweighting
Free energies as a function of a selected set of collective variables are
commonly computed in molecular simulation and of significant value in
understanding and engineering molecular behavior. These free energy surfaces
are most commonly estimated using variants of histogramming techniques, but
such approaches obscure two important facets of these functions. First, the
empirical observations along the collective variable are defined by an ensemble
of discrete observations and the coarsening of these observations into
histogram bins incurs unnecessary loss of information. Second, the free energy
surface is itself almost always a continuous function, and its representation
by a histogram introduces inherent approximations due to the discretization. In
this study, we relate the discrete observations from biased
simulations to the inferred underlying continuous probability distribution over
the collective variables and derive histogram-free techniques for estimating
this free energy surface. We reformulate free energy surface estimation as
minimization of a Kullback-Leibler divergence between a continuous trial
function and the discrete empirical distribution and show that this is
equivalent to likelihood maximization of a trial function given a set of
sampled data. We then present a fully Bayesian treatment of this formalism,
which enables the incorporation of powerful Bayesian tools such as the
inclusion of regularizing priors, uncertainty quantification, and model
selection techniques. We demonstrate this new formalism in the analysis of
umbrella sampling simulations for the torsion of a valine sidechain in
the L99A mutant of T4 lysozyme with benzene bound in the cavity.
Comment: 24 pages, 5 figures
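The stated equivalence between Kullback-Leibler minimization and likelihood maximization can be sketched as follows. This is a simplified illustration for unbiased samples only; the symbols $\hat{p}$, $q_\theta$, $F_\theta$, and $x_n$ are notation assumed here, and the bias-dependent weighting needed for umbrella sampling data is not reproduced.

```latex
% Empirical distribution of N samples and a trial density built from a
% trial free energy surface F_theta (notation assumed for illustration)
\hat{p}(x) = \frac{1}{N} \sum_{n=1}^{N} \delta(x - x_n),
\qquad
q_\theta(x) = \frac{e^{-\beta F_\theta(x)}}{\int e^{-\beta F_\theta(x')} \, dx'}

% KL divergence from the empirical distribution to the trial density
D_{\mathrm{KL}}(\hat{p} \,\|\, q_\theta)
  = \int \hat{p}(x) \ln \frac{\hat{p}(x)}{q_\theta(x)} \, dx
  = \underbrace{\int \hat{p}(x) \ln \hat{p}(x) \, dx}_{\text{independent of } \theta}
    \; - \; \frac{1}{N} \sum_{n=1}^{N} \ln q_\theta(x_n)

% Hence minimizing the divergence maximizes the log-likelihood
\arg\min_\theta D_{\mathrm{KL}}(\hat{p} \,\|\, q_\theta)
  = \arg\max_\theta \sum_{n=1}^{N} \ln q_\theta(x_n)
```

The first term is formally divergent for a sum of delta functions, but it does not depend on $\theta$ and therefore does not affect the optimum.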
Nonlinear Machine Learning and Design of Reconfigurable Digital Colloids
Digital colloids, clusters of freely rotating “halo” particles tethered to the surface of a central particle, were recently proposed as ultra-high density memory elements for information storage. Rational design of these digital colloids for memory storage applications requires a quantitative understanding of the thermodynamic and kinetic stability of the configurational states within which information is stored. We apply nonlinear machine learning to Brownian dynamics simulations of these digital colloids to extract the low-dimensional intrinsic manifold governing digital colloid morphology, thermodynamics, and kinetics. By modulating the relative size ratio between halo particles and central particles, we investigate the size-dependent configurational stability and transition kinetics for the 2-state tetrahedral (N=4) and 30-state octahedral (N=6) digital colloids. We demonstrate the use of this framework to guide the rational design of a memory storage element to hold a block of text that trades off the competing design criteria of memory addressability and volatility.
Machine learning assembly landscapes from particle tracking data
Bottom-up self-assembly offers a powerful route for the fabrication of novel structural and functional materials. Rational engineering of self-assembling systems requires understanding of the accessible aggregation states and the structural assembly pathways. In this work, we apply nonlinear machine learning to experimental particle tracking data to infer low-dimensional assembly landscapes mapping the morphology, stability, and assembly pathways of accessible aggregates as a function of experimental conditions. To the best of our knowledge, this represents the first time that collective order parameters and assembly landscapes have been inferred directly from experimental data. We apply this technique to the nonequilibrium self-assembly of metallodielectric Janus colloids in an oscillating electric field, and quantify the impact of field strength, oscillation frequency, and salt concentration on the dominant assembly pathways and terminal aggregates. This combined computational and experimental framework furnishes new understanding of self-assembling systems, and quantitatively informs rational engineering of experimental conditions to drive assembly along desired aggregation pathways. © 2015 The Royal Society of Chemistry
DiAMoNDBack: Diffusion-denoising Autoregressive Model for Non-Deterministic Backmapping of Cα Protein Traces
Coarse-grained molecular models of proteins permit access to length and time
scales unattainable by all-atom models and the simulation of processes that
occur on long time scales such as aggregation and folding. The reduced
resolution realizes computational accelerations but an atomistic representation
can be vital for a complete understanding of mechanistic details. Backmapping
is the process of restoring all-atom resolution to coarse-grained molecular
models. In this work, we report DiAMoNDBack (Diffusion-denoising Autoregressive
Model for Non-Deterministic Backmapping) as an autoregressive denoising
diffusion probabilistic model to restore all-atom details to coarse-grained
protein representations retaining only Cα coordinates. The
autoregressive generation process proceeds from the protein N-terminus to
C-terminus in a residue-by-residue fashion conditioned on the Cα trace
and previously backmapped backbone and side chain atoms within the local
neighborhood. The local and autoregressive nature of our model makes it
transferable between proteins. The stochastic nature of the denoising diffusion
process means that the model generates a realistic ensemble of backbone and
side chain all-atom configurations consistent with the coarse-grained Cα
trace. We train DiAMoNDBack on over 65k structures from the Protein Data Bank (PDB)
and validate it in applications to a hold-out PDB test set,
intrinsically-disordered protein structures from the Protein Ensemble Database
(PED), molecular dynamics simulations of fast-folding mini-proteins from D. E.
Shaw Research, and coarse-grained simulation data. We achieve state-of-the-art
reconstruction performance in terms of correct bond formation, avoidance of
side chain clashes, and diversity of the generated side chain configurational
states. We make the DiAMoNDBack model publicly available as a free and open-source
Python package.