336 research outputs found
Communication in biological macromolecules through the lens of molecular dynamics simulations
Molecular dynamics (MD) simulations of proteins enable the study of molecular processes that drive communication in biological systems. These simulations model dynamic interactions between amino-acid residues of a protein to sample structural ensembles important for protein function. In this thesis, I applied MD simulations to a variety of protein systems to understand ligand binding, conformational change, and allosteric regulation.
The heterotetrameric N-methyl-D-aspartate receptor (NMDAR) ligand-binding domains (LBDs) bind the agonists glutamate, glycine, and D-serine, which are responsible for neurotransmission. Multi-microsecond MD simulations and umbrella sampling calculations of the less-studied D-serine agonist unexpectedly revealed that D-serine competes with glutamate for binding to the glutamate-binding subunit. Electrophysiological measurements indicated that D-serine is inhibitory at high concentrations, supporting a competitive inhibition mechanism.
Functional antibodies (Fabs) allosterically regulate ion-channel activity by binding to the amino-terminal domains (ATDs) of the NMDAR. MD simulation and network modelling revealed interactions stabilizing both inhibitory and potentiating Fabs; the inhibitory Fab2 stabilizes the inactive state through both direct inter-lobe contacts and long-range interaction paths extending through subunits of the ATD dimer, while the potentiating Fab5 stabilizes the active state by promoting contacts between ATD heterodimers.
In Ser65-phosphorylated ubiquitin (pUb), conformational change is required for initiating the degradation of damaged mitochondria. Using enhanced sampling methods, I sampled the transition path between the two known pUb conformers, which revealed a novel intermediate.
The FtsQLBWI complex is responsible for peptidoglycan synthesis in bacterial cell division. By simulating the E. coli FtsQLBWI complex, superfission and dominant negative variants, and introducing the bound activator FtsN, I identified important changes at subunit interfaces that provide novel structural insights into regulatory mechanisms and are consistent with experimental results.
Missing from available network-based tools used to study allostery between protein binding sites are methods that capture directional information flow. I developed a Python package that constructs directional protein networks from pairwise transfer entropies and uses these networks to identify residues that transmit allosteric signals between binding sites.
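The idea of a directional network built from pairwise transfer entropies can be illustrated with a minimal sketch. This is not the package described in the abstract: the function names, the history length of one step, the plug-in (frequency-count) estimator, the threshold, and the synthetic "residue" signals are all assumptions made for illustration.

```python
from collections import Counter
from math import log2
import random


def transfer_entropy(source, target):
    """Plug-in estimate of TE(source -> target) in bits, history length 1.

    Both inputs are equal-length sequences of discrete states
    (for example, binned dihedral angles per frame).
    """
    if len(source) != len(target) or len(target) < 2:
        raise ValueError("need two equal-length series with at least 2 samples")
    n = len(target) - 1
    triples = Counter()    # (y_next, y_now, x_now)
    pairs_yx = Counter()   # (y_now, x_now)
    pairs_yy = Counter()   # (y_next, y_now)
    singles = Counter()    # y_now
    for t in range(n):
        y_next, y_now, x_now = target[t + 1], target[t], source[t]
        triples[(y_next, y_now, x_now)] += 1
        pairs_yx[(y_now, x_now)] += 1
        pairs_yy[(y_next, y_now)] += 1
        singles[y_now] += 1
    te = 0.0
    for (y_next, y_now, x_now), count in triples.items():
        p_joint = count / n
        p_given_both = count / pairs_yx[(y_now, x_now)]
        p_given_self = pairs_yy[(y_next, y_now)] / singles[y_now]
        te += p_joint * log2(p_given_both / p_given_self)
    return te


def directed_edges(series, threshold=0.1):
    """Return (src, dst, net_te) edges where net transfer entropy exceeds threshold."""
    names = sorted(series)
    edges = []
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            net = transfer_entropy(series[a], series[b]) - transfer_entropy(series[b], series[a])
            if net > threshold:
                edges.append((a, b, net))
            elif net < -threshold:
                edges.append((b, a, -net))
    return edges


# Synthetic example (hypothetical data): res_B copies res_A with a one-frame lag.
rng = random.Random(0)
driver = [rng.randint(0, 1) for _ in range(2000)]
follower = [0] + driver[:-1]
edges = directed_edges({"res_A": driver, "res_B": follower})
```

In this toy case the only edge points from the driving signal to the lagged one. Real trajectories would need a discretisation scheme, longer histories, and bias correction, none of which are specified in the abstract.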
This work highlights how computational approaches probing molecular-scale dynamics complemented by experiment allow us to creatively address questions in biophysics and medicine.
DEEP LEARNING METHODS FOR ANTIBODY STRUCTURE PREDICTION AND DESIGN
Antibodies are important immunological proteins, with the capacity to bind and neutralize a broad range of pathogens. The diversity of antibodies is conferred through genetic recombination and mutation, largely focused in a complementarity determining region composed of six loops. This natural diversity and binding capability has made antibodies an increasingly important therapeutic and diagnostic tool. However, despite their biological and medical significance, modeling and design of antibodies remains a challenge.
In the first half of this dissertation, I detail the development of a series of tools (DeepH3, DeepAb, and IgFold) to model increasingly complex portions of the antibody variable domain. These methods have progressively advanced the state-of-the-art in antibody modeling, first over traditional homology modeling approaches, then over highly accurate generalist methods for structure prediction. IgFold, the current-generation antibody structure prediction model, is capable of high-throughput antibody structure prediction with accuracy comparable to the best generalist methods, but in a fraction of the time. The speed and accuracy of IgFold should allow structure-based investigation on the scale of immune repertoires and accelerate the rational design of antibody therapeutics.
In the second half, I present work on generative language models for protein sequences. The first project describes ProGen2, a suite of language models trained at massive scale. I demonstrate that these models can be used to generate protein sequences resembling those produced by nature and to rank the relative fitness of protein sequences. The second project describes IgLM, a language model designed specifically for antibody design. IgLM can be used to create antibody libraries with favorable therapeutic properties or to generate full-length sequences with a specific species and chain type.
Taken together, my work has advanced our understanding of antibody structure through improved modeling, and shown how we might more effectively leverage natural antibody sequence data to achieve design of novel therapeutic molecules.
Two decades of Martini: Better beads, broader scope
The Martini model, a coarse-grained force field for molecular dynamics simulations, has been around for nearly two decades. Originally developed for lipid-based systems by the groups of Marrink and Tieleman, the Martini model has over the years been extended as a community effort to the current level of a general-purpose force field. Apart from the obvious benefit of a reduction in computational cost, the popularity of the model is largely due to the systematic yet intuitive building-block approach that underlies the model, as well as the open nature of the development and its continuous validation. The easy implementation in the widely used GROMACS software suite has also been instrumental. Since its conception in 2002, the Martini model underwent a gradual refinement of the bead interactions and a widening scope of applications. In this review, we look back at this development, culminating with the release of the Martini 3 version in 2021. The power of the model is illustrated with key examples of recent important findings in biological and material sciences enabled with Martini, as well as examples from areas where coarse-grained resolution is essential, namely high-throughput applications, systems with large complexity, and simulations approaching the scale of whole cells. This article is categorized under: Software > Molecular Modeling; Molecular and Statistical Mechanics > Molecular Dynamics and Monte-Carlo Methods; Structure and Mechanism > Computational Materials Science; Structure and Mechanism > Computational Biochemistry and Biophysics.
Homology modeling in the time of collective and artificial intelligence
Homology modeling is a method for building protein 3D structures from the protein primary sequence, using prior knowledge gained from structural similarities with other proteins. The homology modeling process proceeds in sequential steps: the sequence/structure alignment is optimized, then a backbone is built, and later side-chains are added. Once the low-homology loops are modeled, the whole 3D structure is optimized and validated. In the past three decades, a few collective and collaborative initiatives allowed for continuous progress in both homology and ab initio modeling. Critical Assessment of protein Structure Prediction (CASP) is a worldwide community experiment that has historically recorded the progress in this field. Folding@Home and Rosetta@Home are examples of crowd-sourcing initiatives where the community shares computational resources, whereas RosettaCommons is an example of an initiative where a community shares a codebase for the development of computational algorithms. Foldit is another initiative, in which participants compete with each other in a protein folding video game to predict 3D structure. In the past few years, deep machine learning on contact maps was introduced to the 3D structure prediction process, adding more information and increasing the accuracy of models significantly. In this review, we take the reader on a journey of exploration from the beginnings to the most recent turnabouts, which have revolutionized the field of homology modeling. Moreover, we discuss the new trends emerging in this rapidly growing field.
Advances in Molecular Simulation
Molecular simulations are commonly used in physics, chemistry, biology, material science, engineering, and even medicine. This book provides a wide range of molecular simulation methods and their applications in various fields. It reflects the power of molecular simulation as an effective research tool. We hope that the presented results can provide an impetus for further fruitful studies.
In silico substrate binding profiling for SARS-CoV-2 main protease (Mpro) using hexapeptide substrates
COVID-19, both as a disease resulting from SARS-CoV-2 infection and as a pandemic, has had a devastating effect on the world. Effective measures for controlling the spread of COVID-19 and treating the illness are limited. The homodimeric cysteine main protease (Mpro) is crucial to the life cycle of the virus, as it cleaves the large polyproteins 1a and 1ab into matured, functional non-structural proteins. The Mpro exhibits high degrees of conservation in sequence, structure and specificity across coronavirus species, making it an ideal drug target. The Mpro substrate-binding profiles remain unclear, despite the resolution of its recognition sequence and cleavage points (Leu-Gln↓(Ser/Ala/Gly)). In this study, a series of hexapeptide sequences containing the appropriate recognition sequence and cleavage points were generated and screened against the Mpro to study these binding profiles, and to provide a basis for efficiency-driven drug design. A multi-conformer hexapeptide substrate library comprising 81,000 optimised models of 810 unique sequences was generated using RDKit in Python. Terminal capping with ACE and NMe was effected using SMILES and SMARTS matching. Multiple hexapeptides were complexed with chain B of crystallographic Mpro (PDB ID: 6XHM), following the validation of chain B for this purpose using AutoDock Vina at high levels of exhaustiveness (480). The resulting Vina scores ranged between -8.7 and -7.0 kcal·mol⁻¹, and the reproducibility of best poses was validated through redocking. Ligand efficiency indices were calculated to identify substrate residues with high binding efficiency at their respective positions, revealing Val (P3), Ala (P1′), and Gly and Ala (P2′ and P3′) as the leading efficient binders. Binding efficiencies decreased with increasing molecular weight. Substrate recognition was assessed by mapping of binding subsites, and Mpro specificity was evaluated through the resolution of intermolecular interactions at the binding interface.
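The per-position efficiency ranking described above can be sketched as follows. The abstract does not state which ligand efficiency indices were used, so this sketch assumes the common heavy-atom definition (negative docking score divided by heavy-atom count). The sequences, scores, atom counts, and the P3..P3′ to index 0..5 mapping are hypothetical placeholders, not data from the study.

```python
from collections import defaultdict


def ligand_efficiency(score_kcal_mol, heavy_atoms):
    """Assumed index: -score / heavy-atom count (kcal/mol per heavy atom)."""
    if heavy_atoms <= 0:
        raise ValueError("heavy_atoms must be positive")
    return -score_kcal_mol / heavy_atoms


def mean_efficiency_by_residue(records, position):
    """Average ligand efficiency grouped by the residue at one sequence position.

    records: iterable of (one_letter_sequence, docking_score, heavy_atom_count).
    position: 0-based index into the hexapeptide (assumed P3=0 ... P3'=5).
    """
    grouped = defaultdict(list)
    for sequence, score, heavy_atoms in records:
        grouped[sequence[position]].append(ligand_efficiency(score, heavy_atoms))
    return {residue: sum(vals) / len(vals) for residue, vals in grouped.items()}


# Hypothetical docking results, for illustration only.
records = [
    ("KLQAGA", -8.0, 40),
    ("KLQSGA", -7.0, 40),
    ("KLQAAA", -8.4, 42),
]
by_p1_prime = mean_efficiency_by_residue(records, position=3)
ranked = sorted(by_p1_prime, key=by_p1_prime.get, reverse=True)
```

Normalising by size is what lets a smaller residue outrank a larger one with a similar raw score, which is in line with the reported drop in efficiency at higher molecular weight.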
Molecular dynamics simulations for 20 ns were performed to assess the stability and behaviour of 132 Mpro systems complexed with KLQ*** substrates. Principal component analysis (PCA) was performed to assess II protein motions and conformational changes during the simulations. A strategy was formulated to classify and evaluate relations in the Mpro PCA motions, revealing four main clades of similarity. Similarity within a clade (Group 2) and dissimilarity between clades were confirmed. Trajectory visualisation revealed complex stability, substrate unbinding and dimer dissociation for various Mpro systems. Thesis (MSc) -- Faculty of Science, Biochemistry and Microbiology, 202
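As a rough illustration of what PCA extracts from a trajectory, the sketch below finds the dominant mode of motion from a set of coordinate frames using power iteration on the covariance matrix. It is a toy in pure Python, assuming frames that are already aligned and flattened to short coordinate vectors; the abstract does not say which software or protocol was used, and all function names here are illustrative.

```python
from math import sqrt


def covariance_matrix(frames):
    """Sample covariance of coordinate vectors (one vector per frame)."""
    n, d = len(frames), len(frames[0])
    mean = [sum(f[k] for f in frames) / n for k in range(d)]
    centered = [[f[k] - mean[k] for k in range(d)] for f in frames]
    cov = [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)]
           for i in range(d)]
    return cov, mean


def first_principal_component(frames, iterations=200):
    """Return (variance, unit vector) of the largest-variance direction."""
    cov, _ = covariance_matrix(frames)
    d = len(cov)
    # Uneven start vector, to reduce the chance of starting orthogonal to the answer.
    v = [1.0 + 0.1 * k for k in range(d)]
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = sqrt(sum(x * x for x in w))
        if norm == 0.0:
            raise ValueError("frames show no variance")
        v = [x / norm for x in w]
    variance = sum(v[i] * sum(cov[i][j] * v[j] for j in range(d)) for i in range(d))
    return variance, v


def project(frames, component):
    """Project mean-centred frames onto a component (one value per frame)."""
    _, mean = covariance_matrix(frames)
    return [sum((f[k] - mean[k]) * component[k] for k in range(len(component)))
            for f in frames]


# Toy "trajectory": two coordinates moving together along the line y = 2x.
frames = [[float(t), 2.0 * t] for t in range(10)]
variance, pc1 = first_principal_component(frames)
scores = project(frames, pc1)
```

Comparing such per-system components (for example by their overlap) is one plausible way to group simulations by similarity of motion, though the classification strategy used in the thesis is not described in the abstract.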