13 research outputs found

    DiAMoNDBack: Diffusion-denoising Autoregressive Model for Non-Deterministic Backmapping of Cα Protein Traces

    Coarse-grained molecular models of proteins permit access to length and time scales unattainable by all-atom models and the simulation of processes that occur on long time scales such as aggregation and folding. The reduced resolution realizes computational accelerations but an atomistic representation can be vital for a complete understanding of mechanistic details. Backmapping is the process of restoring all-atom resolution to coarse-grained molecular models. In this work, we report DiAMoNDBack (Diffusion-denoising Autoregressive Model for Non-Deterministic Backmapping) as an autoregressive denoising diffusion probability model to restore all-atom details to coarse-grained protein representations retaining only Cα coordinates. The autoregressive generation process proceeds from the protein N-terminus to C-terminus in a residue-by-residue fashion conditioned on the Cα trace and previously backmapped backbone and side chain atoms within the local neighborhood. The local and autoregressive nature of our model makes it transferable between proteins. The stochastic nature of the denoising diffusion process means that the model generates a realistic ensemble of backbone and side chain all-atom configurations consistent with the coarse-grained Cα trace. We train DiAMoNDBack over 65k+ structures from the Protein Data Bank (PDB) and validate it in applications to a hold-out PDB test set, intrinsically-disordered protein structures from the Protein Ensemble Database (PED), molecular dynamics simulations of fast-folding mini-proteins from DE Shaw Research, and coarse-grained simulation data. We achieve state-of-the-art reconstruction performance in terms of correct bond formation, avoidance of side chain clashes, and diversity of the generated side chain configurational states. We make the DiAMoNDBack model publicly available as a free and open source Python package

    Temporally Coherent Backmapping of Molecular Trajectories From Coarse-Grained to Atomistic Resolution

    Coarse-graining offers a means to extend the achievable time and length scales of molecular dynamics simulations beyond what is practically possible in the atomistic regime. Sampling molecular configurations of interest can be done efficiently using coarse-grained simulations, from which meaningful physicochemical information can be inferred if the corresponding all-atom configurations are reconstructed. However, this procedure of backmapping to reintroduce the lost atomistic detail into coarse-grain structures has proven a challenging task due to the many feasible atomistic configurations that can be associated with one coarse-grain structure. Existing backmapping methods are strictly frame-based, relying on either heuristics to replace coarse-grain particles with atomic fragments and subsequent relaxation or parametrized models to propose atomic coordinates separately and independently for each coarse-grain structure. These approaches neglect information from previous trajectory frames that is critical to ensuring temporal coherence of the backmapped trajectory, while also offering information potentially helpful to producing higher-fidelity atomic reconstructions. In this work, we present a deep learning-enabled data-driven approach for temporally coherent backmapping that explicitly incorporates information from preceding trajectory structures. Our method trains a conditional variational autoencoder to nondeterministically reconstruct atomistic detail conditioned on both the target coarse-grain configuration and the previously reconstructed atomistic configuration. We demonstrate our backmapping approach on two exemplar biomolecular systems: alanine dipeptide and the miniprotein chignolin. We show that our backmapped trajectories accurately recover the structural, thermodynamic, and kinetic properties of the atomistic trajectory data

    Orbital Mixer: Using Atomic Orbital Features for Basis Dependent Prediction of Molecular Wavefunctions

    Leveraging ab initio data at scale has enabled the development of machine learning models capable of extremely accurate and fast molecular property prediction. A central paradigm of many previous works focuses on generating predictions for only a fixed set of properties. Recent lines of research instead aim to explicitly learn the electronic structure via molecular wavefunctions from which other quantum chemical properties can directly be derived. While previous methods generate predictions as a function of only the atomic configuration, in this work we present an alternate approach that directly purposes basis dependent information to predict molecular electronic structure. The backbone of our model, Orbital Mixer, uses MLP Mixer layers within a simple, intuitive, and scalable architecture and achieves competitive Hamiltonian and molecular orbital energy and coefficient prediction accuracies compared to the state-of-the-art

    Discovery of Self-Assembling π-Conjugated Peptides by Active Learning-Directed Coarse-Grained Molecular Simulation

    Electronically-active organic molecules have demonstrated great promise as novel soft materials for energy harvesting and transport. Self-assembled nanoaggregates formed from π-conjugated oligopeptides composed of an aromatic core flanked by oligopeptide wings offer emergent optoelectronic properties within a water soluble and biocompatible substrate. Nanoaggregate properties can be controlled by tuning core chemistry and peptide composition, but the sequence-structure-function relations remain poorly characterized. In this work, we employ coarse-grained molecular dynamics simulations within an active learning protocol employing deep representational learning and Bayesian optimization to efficiently identify molecules capable of assembling pseudo-1D nanoaggregates with good stacking of the electronically-active π-cores. We consider the DXXX-OPV3-XXXD oligopeptide family, where D is an Asp residue and OPV3 is an oligophenylene vinylene oligomer (1,4-distyrylbenzene), to identify the top performing XXX tripeptides within all 20^3 = 8,000 possible sequences. By direct simulation of only 2.3% of this space, we identify molecules predicted to exhibit superior assembly relative to those reported in prior work. Spectral clustering of the top candidates reveals new design rules governing assembly. This work establishes new understanding of DXXX-OPV3-XXXD assembly, identifies promising new candidates for experimental testing, and presents a computational design platform that can be generically extended to other peptide-based and peptide-like systems
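
The size of the design space quoted in this abstract can be checked with a short enumeration. The sketch below is illustrative only: the alphabet string and the function names are assumptions introduced here, not taken from the paper, and the simulated count is simply 2.3% of the space rounded to the nearest integer.

```python
from itertools import product

# The 20 standard amino acids in one-letter code (assumed alphabet for XXX).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def tripeptide_space(alphabet=AMINO_ACIDS):
    """Enumerate every XXX tripeptide that can be built from the alphabet."""
    return ["".join(residues) for residues in product(alphabet, repeat=3)]


def simulated_count(space_size, fraction=0.023):
    """Approximate number of candidates covered by simulating a given fraction."""
    return round(space_size * fraction)


space = tripeptide_space()
n_simulated = simulated_count(len(space))
print(len(space), n_simulated)
```

Under these assumptions the enumeration yields 8,000 tripeptides, so simulating 2.3% of the space corresponds to roughly 184 candidates.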

    Force Clamp Measurements and Dynamic Modeling of Protein Hydrogels

    Protein hydrogels show great promise in their applications to developing smart biomaterials and drug delivery systems. A protein hydrogel is defined to be a highly cross-linked network of individual multi-domain proteins. Variable protein constructs and concentrations allow for protein hydrogels to exhibit highly malleable mechanical properties. Here we examine the force-specific response of protein hydrogels made from different protein concentrations, and provide a dynamic mathematical model for hydrogel extension. By mathematically modeling individual proteins' orientation and force-dependent domain unfolding within the hydrogel we may explain the macroscopically observed elastic and mechanical properties, such as their hysteresis and stress-relaxation response. Measurements done using a novel force-clamp instrument allow for the verification of this mathematical model. The development of this mathematical model, in association with force-clamp measurements, serves as a stepping stone for formulating the future of protein-based smart materials, such as those used in artificial skin and 3-D organ printing

    A Simulation of Protein Hydrogel Mechanics

    Protein hydrogels serve as a platform for artificial tissue culture and other smart biomaterials. A protein hydrogel is a network of cross-linked protein chains that shows a unique response due to the force-induced (un)folding of its constituent domains. Here, we investigate the effect of force on the mechano-chemistry of hydrogels composed of tandem modular proteins. We report a mathematical model that takes into account the folding and unfolding of protein domains as a function of force and predicts the change in length of protein hydrogels due to a change in force. Our results reproduce the experimentally measured hysteresis and stress-relaxation behavior, and explain how protein orientation and domain folding affect the elasticity of these hydrogels due to different forces. Furthermore, as we increase the number of proteins contained in our simulation we find a smooth transition from a probabilistic to a deterministic behavior. This model is the first step toward predicting and formulating new protein-based materials, such as those needed for artificial skin and 3-D organ printing
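
The transition from probabilistic to deterministic behavior with increasing system size can be illustrated with a toy calculation. The sketch below is not the authors' model: it assumes irreversible unfolding of independent domains at constant force with a Bell-type rate k(F) = k0 * exp(F * dx / kT), and every parameter value and function name is hypothetical. It only shows that the run-to-run spread of the unfolded fraction shrinks as the number of domains grows.

```python
import math
import random
import statistics


def unfolding_rate(force_pN, k0=0.01, dx_nm=0.25, kT_pN_nm=4.11):
    """Bell-type unfolding rate in 1/s; all parameter values are hypothetical."""
    return k0 * math.exp(force_pN * dx_nm / kT_pN_nm)


def unfolded_fraction(n_domains, force_pN, t_end, rng):
    """Fraction of independent domains unfolded by time t_end at constant force.

    Each domain unfolds irreversibly after an exponentially distributed
    waiting time (refolding is ignored in this simplified sketch).
    """
    rate = unfolding_rate(force_pN)
    unfolded = sum(1 for _ in range(n_domains) if rng.expovariate(rate) < t_end)
    return unfolded / n_domains


def mean_and_spread(n_domains, force_pN=100.0, t_end=0.2, trials=50, seed=0):
    """Mean and standard deviation of the unfolded fraction over repeated runs."""
    rng = random.Random(seed)
    fractions = [unfolded_fraction(n_domains, force_pN, t_end, rng)
                 for _ in range(trials)]
    return statistics.mean(fractions), statistics.pstdev(fractions)


for n in (10, 100, 1000):
    mean, spread = mean_and_spread(n)
    print(f"N={n:5d}  mean unfolded fraction={mean:.3f}  spread={spread:.3f}")
```

In this toy picture the mean unfolded fraction approaches the analytical value 1 - exp(-k(F) * t) while the spread falls roughly as 1/sqrt(N), which is one simple way a stochastic single-domain description can look deterministic at the scale of a whole network.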

    Investigating Protein Hydrogel Mechanics through Force-Clamp Measurement and Validation with Dynamic Modeling

    Proteins are the workhorses of our bodies, whose specific three-dimensional structure correlates to their function within the body. Protein hydrogels are a new type of material made from an interconnected network of these proteins, which naturally embrace a variety of biomedical applications, from scaffolding for artificial tissues to controlled drug delivery systems. When these hydrogels are exposed to external forces, the protein structure unravels and extends in a process called protein unfolding, affecting the unique mechanical properties characteristic of protein hydrogels. Here, we present an experimental technique to measure the force response of protein hydrogels in conjunction with a theoretical model which considers the protein folding phenomenon to describe their measured mechanical responses. Scaling the size of the simulated hydrogel reproduces the probabilistic to deterministic behavior characteristic of single-molecule unfolding, while varying the applied force expectedly leads to increases in the total extension and associated rate constants. Using a custom-made force clamp rheometer, we probe the unfolding and extension response of protein hydrogels and compare these results with our simulations. Ultimately, this technique and model could become valuable resources for helping to design and produce biomaterials with tunable elasticity. These biomaterials will find applications in mimicking tissues and organs within the body (such as the muscle contraction of the gut and heart), with the additional ability to controllably retain and release drugs from within their structure


    Hybrid Computational-Experimental Data-Driven Design of Self-Assembling Pi-Conjugated Peptides

    Biocompatible molecules with electronic functionality provide a promising substrate for biocompatible electronic devices and electronic interfacing with biological systems. Synthetic oligopeptides composed of an aromatic pi-core flanked by oligopeptide wings are a class of molecules that can self-assemble in aqueous environments into supramolecular nanoaggregates with emergent optical and electronic activity. We present an integrated computational-experimental pipeline employing all-atom molecular dynamics simulations and experimental UV-visible spectroscopy within an active learning workflow using deep representational learning and Bayesian optimization to design pi-conjugated peptides programmed to self-assemble into elongated pseudo-1D nanoaggregates with a high degree of H-type co-facial stacking of the pi-cores. We consider as our design space the 694,982 unique pi-conjugated peptides comprising a quaterthiophene pi-core flanked by symmetric oligopeptide wings up to five amino acids in length. After sampling only 1181 molecules (~0.17% of the design space) by computation and 28 (~0.004%) by experiment, we identify and experimentally validate a diversity of previously unknown high-performing molecules and extract interpretable design rules linking peptide sequence to emergent supramolecular structure and properties