Visualising and controlling the flow in biomolecular systems at and between multiple scales: from atoms to hydrodynamics at different locations in time and space
A novel framework for modelling biomolecular systems simultaneously at multiple scales in space and time is described. The atomistic molecular dynamics representation is smoothly connected with a statistical continuum hydrodynamics description. The system behaves correctly in the limits of pure molecular dynamics and pure hydrodynamics, as well as in the intermediate regimes where the atoms move partly as atomistic particles and at the same time follow the hydrodynamic flows. The relative contributions are controlled by a parameter defined as an arbitrary function of space and time, thus allowing an effective separation of the atomistic 'core' and the continuum 'environment'. To fill the scale gap between the atomistic and continuum representations, our special-purpose computer for molecular dynamics, MDGRAPE-4, as well as GPU-based computing, were used to develop the framework. These hardware developments also include interactive molecular dynamics simulations that allow the user to intervene in the modelling through force-feedback devices.
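To make the role of the space-dependent control parameter concrete, here is a minimal sketch. It assumes a parameter s that is 0 in the atomistic core and 1 in the continuum environment, with a smooth cosine ramp between two radii. The names `coupling_parameter` and `blended_velocity`, the cosine ramp, and the linear mixing of an atomistic velocity with the local hydrodynamic velocity are all hypothetical illustrations, not the framework's actual equations. A time dependence could be added by letting the radii vary with t.

```python
import numpy as np

def coupling_parameter(r, r_core, r_outer):
    """Hypothetical switching function s(r).

    s = 0 inside the atomistic core (r <= r_core), s = 1 in the continuum
    environment (r >= r_outer), with a smooth cosine ramp in between.
    """
    r = np.asarray(r, dtype=float)
    x = np.clip((r - r_core) / (r_outer - r_core), 0.0, 1.0)
    return 0.5 * (1.0 - np.cos(np.pi * x))

def blended_velocity(v_md, u_hydro, s):
    """Assumed linear mixing: (1 - s) * atomistic velocity + s * hydrodynamic velocity."""
    s = np.asarray(s, dtype=float)[..., None]  # broadcast over x, y, z components
    return (1.0 - s) * v_md + s * u_hydro
```

Because s is an ordinary function of position (and, if desired, time), the boundary between "core" and "environment" can be moved or reshaped without changing the rest of the update.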
FPGA-based range-limited molecular dynamics acceleration
Molecular Dynamics (MD) is a computer simulation technique that advances a system iteratively over small, discrete time steps. It has been widely used in materials science and computer-aided drug design for many years and serves as a crucial benchmark in high-performance computing (HPC). Numerous MD packages have been developed and effectively accelerated on GPUs. However, as the limits of Moore's Law are reached, the performance of an individual computing node has hit a bottleneck, while the performance of multiple nodes is primarily hindered by scalability issues, particularly for small datasets.
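The iterative time-stepping described above can be sketched with velocity Verlet, one common MD integrator. This is a generic illustration, not necessarily the integrator used in the thesis; `force` is any user-supplied function returning forces for given positions.

```python
import numpy as np

def velocity_verlet(x, v, force, mass, dt, n_steps):
    """Integrate Newton's equations with the velocity Verlet scheme."""
    f = force(x)
    for _ in range(n_steps):
        v = v + 0.5 * dt * f / mass   # first half-step velocity update
        x = x + dt * v                # full-step position update
        f = force(x)                  # forces at the new positions (the costly part)
        v = v + 0.5 * dt * f / mass   # second half-step velocity update
    return x, v
```

For example, a unit-mass harmonic oscillator (`force = lambda x: -x`) integrated with `dt = 0.01` keeps its total energy close to its initial value over many steps. In a real MD code, the `force(x)` call dominates the cost, which is why accelerators target it.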
In this thesis, the main focus is acceleration for small datasets. With the recent COVID-19 pandemic, drug discovery has gained significant attention, and MD has emerged as a crucial tool in this process. In drug discovery in particular, small simulations of approximately 50K particles are frequently employed. However, small simulations do not necessarily translate into faster results, because long-timescale simulations comprising billions of MD iterations or more are essential in this context.
In addition to dataset size, the problem of interest is further constrained. The evaluation of range-limited (RL) forces, often described as the most computationally demanding part of MD, not only accounts for 90% of the MD computation workload but also involves irregular mapping of 3-D data onto 2-D processor networks. In short, this thesis centers on accelerating RL MD specifically for small datasets.
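A range-limited force evaluation only considers particle pairs closer than a cutoff. The sketch below uses a cell list: the periodic cubic box is split into cells at least one cutoff wide, so each particle only needs to check its own cell and the 26 neighboring cells. The Lennard-Jones pair potential, reduced units, and the function names are assumptions for illustration; the thesis's FPGA design is not reproduced here.

```python
import itertools
import numpy as np

def lj_force_pair(rij, epsilon=1.0, sigma=1.0):
    """Lennard-Jones force on particle i due to j, where rij = r_i - r_j."""
    r2 = np.dot(rij, rij)
    sr6 = (sigma * sigma / r2) ** 3
    return 24.0 * epsilon * (2.0 * sr6 * sr6 - sr6) / r2 * rij

def range_limited_forces(pos, box, r_cut):
    """Cutoff Lennard-Jones forces via a cell list in a periodic cubic box."""
    n_cells = int(box // r_cut)            # each cell is at least r_cut wide
    if n_cells < 3:
        raise ValueError("this simple scheme needs >= 3 cells per side")
    cell_size = box / n_cells
    # Bin every particle into its cell.
    cells = {}
    for i, p in enumerate(pos):
        key = tuple(int(k) % n_cells for k in np.floor(p / cell_size))
        cells.setdefault(key, []).append(i)
    forces = np.zeros_like(pos)
    # Each particle interacts only with particles in its own and the 26 neighbor cells.
    for key, members in cells.items():
        for offset in itertools.product((-1, 0, 1), repeat=3):
            nb = tuple((k + o) % n_cells for k, o in zip(key, offset))
            for i in members:
                for j in cells.get(nb, []):
                    if i == j:
                        continue
                    rij = pos[i] - pos[j]
                    rij -= box * np.round(rij / box)   # minimum-image convention
                    if np.dot(rij, rij) < r_cut * r_cut:
                        forces[i] += lj_force_pair(rij)
    return forces
```

The cell decomposition is what turns an O(N²) all-pairs loop into roughly O(N) work, and it is also the source of the irregular 3-D data access patterns that hardware designs must map onto their processing elements.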
In order to address the single-node bottleneck and multi-node scaling challenges, the thesis is organized into two progressive stages of investigation. The first stage delves extensively into enhancing single-node efficiency by examining various factors such as workload mapping from 3-D to 2-D, data routing, and data locality. The second stage focuses on studying multi-node scalability, with a particular emphasis on strong scaling, bandwidth demands, and the synchronization mechanisms between nodes.
Our results show that our design on a Xilinx U280 FPGA achieves 51.72x and 4.17x speedups with respect to an Intel Xeon Gold 6226R CPU and a Quadro RTX 8000 GPU, respectively. Our research on strong scaling also demonstrates that 8 Xilinx U280 FPGAs connected to a switch achieve a 4.67x speedup compared to an Nvidia V100 GPU.
Characterizing Structure and Free Energy Landscape of Proteins by NMR-guided Metadynamics
In the last two decades, a series of experimental and theoretical advances has made it possible to obtain a detailed understanding of the molecular mechanisms underlying protein folding. With the increasing power of computer technology and improvements in force fields, atomistic simulations are also becoming increasingly important because they can generate highly detailed descriptions of protein motions. Anton, a supercomputer specifically designed to integrate Newton's equations of motion for proteins, has recently broken the millisecond time barrier. This achievement has allowed the direct calculation of repeated folding events for several fast-folding proteins and the characterization of the molecular mechanisms underlying protein dynamics and function. However, these exceptional resources are available to only a few research groups in the world, and, moreover, observing a few events of a specific process is usually not enough to provide a statistically significant picture of the phenomenon.
In parallel, it has also been realized that combining experimental measurements with computational methods expands the range of problems that can be addressed. For example, by incorporating structural information as restraints in molecular dynamics simulations, it is possible to obtain structural models of transiently populated states, as well as of native and non-native intermediates explored during the folding process. By applying this strategy to structural parameters measured by nuclear magnetic resonance (NMR) spectroscopy, one can determine atomic-level structures and characterize the dynamics of proteins. In these approaches, the experimental information is used to create an additional term in the force field that penalizes deviations from the measured values, thus restraining the sampling of conformational space to regions close to those observed experimentally.
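The restraint strategy described above adds a penalty term to the force field. A minimal sketch, assuming a simple harmonic penalty on the difference between back-calculated and measured observables, is shown below. The function name and the harmonic form are illustrative assumptions; the gradient with respect to atomic coordinates would additionally require the chain rule through whatever predictor back-calculates the observables, which is not shown.

```python
import numpy as np

def restraint_energy_and_gradient(calc, exp, k):
    """Harmonic penalty E = k * sum_i (calc_i - exp_i)^2 and its gradient dE/dcalc."""
    diff = np.asarray(calc, dtype=float) - np.asarray(exp, dtype=float)
    return k * np.sum(diff ** 2), 2.0 * k * diff
```

This term changes the energy function itself, which is exactly what the alternative strategy in the next paragraph avoids.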
In this thesis we propose an alternative strategy for exploiting experimental information in molecular dynamics simulations. In this approach, the measured parameters are not used as structural restraints but rather to build collective variables within metadynamics calculations. In metadynamics, conformational sampling is enhanced by constructing a time-dependent potential that discourages the exploration of already visited regions, defined in terms of specific functions of the atomic coordinates called collective variables. In this work we show that NMR chemical shifts can be used as collective variables to guide the sampling of conformational space in molecular dynamics simulations.
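The time-dependent metadynamics potential can be sketched as a sum of Gaussians deposited at the collective-variable values visited during the run. Here `s` is a generic scalar collective variable (for instance, a hypothetical chemical-shift-based score), and the class name and parameters are assumptions. At long times the standard metadynamics estimate of the free energy is F(s) ≈ -V(s) + const. The replica exchanges of bias-exchange metadynamics are not shown.

```python
import numpy as np

class Metadynamics1D:
    """Minimal 1-D metadynamics bias: a sum of Gaussians deposited along a CV."""

    def __init__(self, height, width):
        self.height = height      # Gaussian height w (energy units)
        self.width = width        # Gaussian width sigma (CV units)
        self.centers = []         # CV values at deposition times

    def deposit(self, s):
        """Add a Gaussian centered on the current CV value."""
        self.centers.append(float(s))

    def bias(self, s):
        """V(s) = w * sum_i exp(-(s - s_i)^2 / (2 sigma^2))."""
        c = np.asarray(self.centers)
        return self.height * np.sum(np.exp(-(s - c) ** 2 / (2 * self.width ** 2)))

    def bias_force(self, s):
        """-dV/ds: pushes the system away from already visited CV values."""
        c = np.asarray(self.centers)
        g = np.exp(-(s - c) ** 2 / (2 * self.width ** 2))
        return self.height * np.sum(g * (s - c) / self.width ** 2)
```

In a simulation, `deposit` is called every few hundred steps and `bias_force` is converted to atomic forces through the derivative of the collective variable with respect to the coordinates.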
Since the method discussed here enhances conformational sampling without modifying the force field through structural restraints, it allows a reliable estimate of the statistical weights corresponding to the force field used in the simulations. In the present implementation we used bias-exchange metadynamics, an enhanced sampling technique that allows the free energy to be reconstructed as a simultaneous function of several variables.
Using this approach, we computed the free energy landscapes of two different proteins by explicit-solvent molecular dynamics simulations. For a well-structured globular protein, the third immunoglobulin-binding domain of streptococcal protein G (GB3), our calculation predicts the native fold as the lowest free energy minimum and also identifies an on-pathway compact intermediate with non-native topological elements. In addition, we provide a detailed atomistic picture of the structure at the folding barrier, which shares only a fraction of its secondary structure elements with the native state.
The further application to the 40-residue form of amyloid beta allows another notable achievement: the quantitative description of the free energy landscape of an intrinsically disordered protein. Proteins of this kind lack a well-defined three-dimensional structure under native conditions and are therefore hard to investigate experimentally. We found that the free energy landscape of this peptide has approximately inverted features with respect to normal globular proteins: the global minimum consists of highly disordered structures, while higher free energy regions correspond to partially folded conformations. These structures are kinetically committed to the disordered state, but they are transiently explored even at room temperature.
These findings are particularly relevant because this protein is involved in Alzheimer's disease: it is prone to aggregate into oligomers, toxic to cells, that are formed by monomers interacting in an extended beta-strand organization. Our structural and energetic characterization allows us to define a library of possible metastable states involved in the aggregation process.
These results were obtained using relatively limited computational resources. For example, the total simulation time required to reconstruct the thermodynamics of GB3 is about three orders of magnitude less than the typical folding timescale of similar proteins, also simulated by Anton. We therefore anticipate that the technique introduced in this thesis will allow the determination of free energy landscapes for a wide range of proteins for which NMR chemical shifts are available. Finally, since chemical shifts are the only external information used to guide the folding of the proteins, our method can also be applied successfully to the challenging problem of NMR structure determination, as we demonstrated in a blind prediction test on the last CASD-NMR target.
Computational Modeling of Protein Kinases: Molecular Basis for Inhibition and Catalysis
Protein kinases catalyze protein phosphorylation reactions, i.e. the transfer of the γ-phosphoryl group of ATP to tyrosine, serine and threonine residues of protein substrates. This phosphorylation plays an important role in regulating various cellular processes. Deregulation of many kinases is directly linked to cancer development, and the protein kinase family is one of the most important targets in current cancer therapy regimens. This relevance to disease has stimulated intensive efforts in the biomedical research community to understand their catalytic mechanisms, discern their cellular functions, and discover inhibitors. Because they can simultaneously define structural and dynamic properties of complex systems, computational studies at the atomic level have been recognized as a powerful complement to experimental studies. In this work, we employed a suite of computational and molecular simulation methods to (1) explore the catalytic mechanism of a particular protein kinase, namely the epidermal growth factor receptor (EGFR); (2) study the interaction between EGFR and one of its inhibitors, erlotinib (Tarceva); (3) discern the effects of molecular alterations (somatic mutations) of EGFR on differential downstream signaling responses; and (4) model the interactions of a novel class of kinase inhibitors, built on a common ruthenium-based organometallic scaffold, with different protein kinases. Our simulations established important molecular rules that operate in the contexts of inhibitor binding, substrate recognition, catalytic landscapes, and signaling in the EGFR tyrosine kinase. Our results also provide insights into the mechanisms of inhibition and phosphorylation commonly employed by many kinases.
The Critical Assessment of Protein Dynamics using Molecular Dynamics (MD) Simulations and Nuclear Magnetic Resonance (NMR) Spectroscopy Experimentation
The biological functions of proteins often rely on structural changes and the rates at which these conformational changes occur. Studies show that regions of a protein known to be involved in enzyme catalysis or in contact with the substrate are identified by NMR spectroscopy as more flexible, as evidenced by order parameters measured for specific bond vectors. While NMR in general allows detailed characterization of the extent and time scales of these conformational fluctuations, it cannot easily produce the structures of sparsely populated intermediates, nor the explicit atomistic-level mechanisms needed for a full understanding of such processes. In practice, preparing a protein with the appropriate isotopic enrichment to study a specific set of bond vectors experimentally is also challenging. Often, it is necessary to measure the dynamics of neighboring bond vectors.
Detailed studies of the coupling interactions among specific residues and protein regions can be carried out with molecular dynamics (MD) simulations. However, MD simulations rely on the ergodic hypothesis to mimic experimental conditions, which requires long simulation times. Simulations are additionally limited by the availability of accurate and reliable molecular mechanics force fields, which continue to be improved to better match experimental data. Much can also be learned from chemical theory and simulation to improve the methods by which experimental data are processed and analyzed.
The overarching goals of this thesis are to improve upon the results generated by existing methods in NMR spin relaxation spectroscopy, either by (i) improving analytical techniques for raw NMR data or by (ii) supporting experimental results with atomistically detailed MD simulations. Most of this work is exemplified by the protein Escherichia coli ribonuclease HI (ecRNH).
Ribonuclease HI (RNase H) is a conserved endonuclease responsible for cleaving the RNA strand of DNA/RNA hybrids in many biological processes, including reverse transcription of the viral genome in retroviral reverse transcriptases and Okazaki fragment processing during DNA replication of the lagging strand. RNase H belongs to a broader superfamily of nucleotidyl-transferases with conserved structure and mechanism, including retroviral integrases, Holliday junction resolvases, and transposases. RNase H has historically been the subject of many investigations in folding, structure, and dynamics.
In support of the first aim, we discuss new methods of obtaining more precise experimental order parameters and time constants for the isoleucine, leucine, and valine (ILV) methyl groups. Deuterium relaxation rate constants are determined by the spectral density function for reorientation of the C-D bond vector at zero, single-quantum, and double-quantum ²H frequencies. We interpolate relaxation rates measured at the available NMR spectrometer frequencies in order to perform a joint single/double-quantum analysis. This yields approximately 10-15% more precise estimates of model-free parameters and provides a general strategy for interpolating and extrapolating data gathered on existing NMR spectrometers for the analysis of ²H spin relaxation data in biological macromolecules.
In support of the second aim, we calculate autocorrelation functions and generalized order parameters for the ILV methyl side chain groups from MD simulation trajectories to assess the orientational motions of the side chain bond vectors. We demonstrate that motions of the side chain bond vectors can be separated into (i) fluctuations within a given dihedral angle rotamer, (ii) jumps among different rotamers, and (iii) motions of the protein backbone itself, through the C-alpha carbon. We are able to match order parameters of the constitutive motions to conventionally calculated order parameters with R² = 0.9962, 0.9708, and 0.9905 for valine, leucine, and isoleucine residues, respectively. This provides a method for evaluating the relative contribution of each constitutive motion to the overall flexibility of a side chain. Some longer side chain residues, such as leucine and isoleucine, have correlated χ1 and χ2 dihedral angle rotations. Multiple contributors of motion are possible for intermediate and low order parameters, signifying more flexible residues.
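For readers unfamiliar with the quantities involved, the sketch below computes the generalized order parameter from a trajectory of bond vectors, S² = (3/2) Σ_ij ⟨u_i u_j⟩² - 1/2 with u the unit bond vector, together with the P2 orientational autocorrelation function C(t) = ⟨P2(u(0)·u(t))⟩. This is the standard textbook definition, not the thesis's decomposition into rotamer fluctuations, rotamer jumps, and backbone motion. It assumes overall tumbling has already been removed (e.g. by superposing each frame on a reference), in which case C(t) decays toward a plateau of approximately S².

```python
import numpy as np

def order_parameter(vectors):
    """Generalized order parameter S^2 from an (N, 3) array of bond vectors."""
    u = vectors / np.linalg.norm(vectors, axis=1, keepdims=True)
    m = np.einsum("ni,nj->ij", u, u) / len(u)   # <u_i u_j> averaged over frames
    return 1.5 * np.sum(m ** 2) - 0.5

def p2_autocorrelation(vectors, max_lag):
    """C(t) = <P2(u(0) . u(t))>, with P2(x) = (3x^2 - 1) / 2, for lags 0..max_lag-1."""
    u = vectors / np.linalg.norm(vectors, axis=1, keepdims=True)
    c = np.empty(max_lag)
    for lag in range(max_lag):
        dots = np.sum(u[: len(u) - lag] * u[lag:], axis=1)
        c[lag] = np.mean(1.5 * dots ** 2 - 0.5)
    return c
```

A rigidly fixed vector gives S² = 1, while an isotropically distributed vector gives S² close to 0.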
While developing protocols for MD simulations, we evaluate the effects of running 1-microsecond-long simulations and compare them with solution-state NMR spectroscopy. If the overall tumbling is removed from the simulation, analysis blocks of 5-10 times the tumbling time are optimal for eliminating contributions from slower dynamics that would not normally be measured by solution-state NMR spectroscopy. We also assess the quality of the TIP4P(-EW) water model relative to TIP3P: although TIP4P reproduces the isotropic tumbling time of ecRNH well, internal motions are affected equally by both water models because the motions are well segregated. Additionally, the TIP4P water model does not appear to replicate the axially symmetric shape of ecRNH (ecRNH is mostly spherical and only slightly axially symmetric).
The final work of this thesis returns to the first overarching aim: we develop a specialized method that uses probability distribution functions to model spectral density functions. Using the principle of maximum entropy, we derive the inverse Gaussian probability distribution function from general properties of spectral density functions at low and high frequencies for macromolecules in solution. The resulting model-free spectral density functions are finite at zero frequency and can be used to describe distributions of either overall or internal correlation times within the model-free ansatz. The approach is validated using ¹⁵N backbone relaxation data for the intrinsically disordered DNA-binding region of the bZip transcription factor domain of the Saccharomyces cerevisiae protein GCN4, in the absence of cognate DNA.
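As an illustration of how a distribution of correlation times yields a spectral density, the sketch below averages the Lorentzian τ/(1 + ω²τ²) over an inverse Gaussian density p(τ) with mean μ and shape λ, using the common J(ω) = (2/5) ∫ p(τ) τ / (1 + ω²τ²) dτ convention. The 2/5 prefactor, the parameter names, and the numerical trapezoid integration are assumptions for illustration; the thesis's closed-form expressions are not reproduced. Because the inverse Gaussian has a finite mean, J(0) = (2/5)μ is finite, matching the property stated above.

```python
import numpy as np

def inverse_gaussian_pdf(tau, mu, lam):
    """Inverse Gaussian density with mean mu and shape lam, for tau > 0."""
    return np.sqrt(lam / (2 * np.pi * tau ** 3)) * np.exp(
        -lam * (tau - mu) ** 2 / (2 * mu ** 2 * tau))

def spectral_density(omega, mu, lam, n=20000):
    """J(omega) = (2/5) * integral of p(tau) * tau / (1 + (omega tau)^2) dtau.

    Evaluated with the trapezoid rule on a log-spaced tau grid spanning
    ten decades around mu (an assumed, generous integration range).
    """
    tau = np.logspace(np.log10(mu) - 6, np.log10(mu) + 4, n)
    y = inverse_gaussian_pdf(tau, mu, lam) * tau / (1 + (omega * tau) ** 2)
    return 0.4 * np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(tau))
```

With μ = 5 ns and λ = 10 ns (times in seconds, ω in rad/s), `spectral_density(0.0, 5e-9, 1e-8)` is close to 0.4 × 5e-9, and J decreases as ω grows.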
Unveiling the Molecular Mechanisms Regulating the Activation of the ErbB Family Receptors at Atomic Resolution through Molecular Modeling and Simulations
The EGFR/ErbB/HER family of kinases contains four homologous receptor tyrosine kinases that are important regulatory elements in key signaling pathways. To elucidate the atomistic mechanisms of dimerization-dependent activation in the ErbB family, we have performed molecular dynamics simulations of the intracellular kinase domains of all four members of the ErbB family, namely the three with known kinase activity, EGFR, ErbB2 (HER2), and ErbB4 (HER4), as well as ErbB3 (HER3), an assumed pseudokinase, in different molecular contexts: monomer vs. dimer and wild type vs. mutant. Using bioinformatics and fluctuation analyses of the molecular dynamics trajectories, we relate sequence similarities to correspondences in specific bond-interaction networks and collective dynamical modes. We find that in the active conformation of the ErbB kinases (except ErbB3), key subdomain motions are coordinated through conserved hydrophilic interactions: activating bond networks consisting of hydrogen bonds and salt bridges. The inactive conformations also display conserved (albeit less extensive) bonding patterns that sequester key residues and disrupt the activating bond network. Both conformational states have distinct hydrophobic advantages through context-specific hydrophobic interactions. The inactive ErbB3 kinase domain also shows coordinated motions similar to those of the active conformations, in line with recent evidence that ErbB3 is a weakly active kinase, though the coordination seems to arise from hydrophobic rather than hydrophilic interactions. We show that the functional (activating) asymmetric kinase dimer interface forces a corresponding change in the hydrophobic and hydrophilic interactions that characterize the inactivating interaction network, resulting in allosteric motion of the αC-helix. Several of the clinically identified activating kinase mutations of EGFR act in a similar fashion to disrupt the inactivating interaction network.
Our molecular dynamics study reveals that the asymmetric dimer interface helps advance the ErbB family along the activation pathway through both hydrophilic and hydrophobic interactions. There is a fundamental difference in the sequence of events in EGFR activation compared with that described for the Src kinase Hck.
Development of Improved Torsional Potentials in Classical Force Field Models of Poly (Lactic Acid)
In this work, existing force field descriptions of poly(lactic acid), or PLA, were improved by modifying the torsional potential energy terms to model the bond rotational behavior of PLA more accurately. Extensive calculations were carried out using density functional theory (DFT) for small PLA molecules in vacuo, and also using DFT with a continuum model to approximate the electronic structure of PLA in its condensed phase. From these results, improved force field parameters were developed using a combination of the OPLS and CHARMM force fields. The new force field, PLAFF2, updates the PLAFF model previously developed in David Bruce's group and yields more realistic conformational distributions in simulations of bulk amorphous PLA. It is demonstrated that PLAFF2 retains the accuracy of the original PLAFF in simulating the crystalline α polymorph of PLA. PLAFF2 outperforms other publicly available force fields for PLA; hence, we recommend its use in future modeling studies of the material, whether in its crystalline or amorphous form.
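Torsional reparameterization of the kind described above typically fits a cosine series to a quantum-mechanical torsion scan. The sketch below uses the OPLS-style four-term Fourier series, V(φ) = ½[V1(1 + cos φ) + V2(1 - cos 2φ) + V3(1 + cos 3φ) + V4(1 - cos 4φ)], and a linear least-squares fit with an energy offset. The target is assumed to be the DFT scan energy minus the non-torsional force field energy; the exact functional forms and fitting protocol used for PLAFF2 are not specified in the abstract (CHARMM, for example, uses K[1 + cos(nφ - δ)] terms instead).

```python
import numpy as np

def opls_torsion(phi, v1, v2, v3, v4):
    """OPLS-style dihedral energy for angle phi (radians)."""
    return 0.5 * (v1 * (1 + np.cos(phi)) + v2 * (1 - np.cos(2 * phi))
                  + v3 * (1 + np.cos(3 * phi)) + v4 * (1 - np.cos(4 * phi)))

def fit_opls_torsion(phi, target):
    """Least-squares fit of V1..V4 plus a constant offset to a torsion scan."""
    a = np.column_stack([
        0.5 * (1 + np.cos(phi)), 0.5 * (1 - np.cos(2 * phi)),
        0.5 * (1 + np.cos(3 * phi)), 0.5 * (1 - np.cos(4 * phi)),
        np.ones_like(phi)])
    coeffs, *_ = np.linalg.lstsq(a, target, rcond=None)
    return coeffs[:4], coeffs[4]
```

Because V(φ) is linear in V1..V4, the fit reduces to ordinary least squares; the offset absorbs the arbitrary zero of the quantum-mechanical energies.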
All-Atom Modeling of Protein Folding and Aggregation
Theoretical investigations of biorelevant processes in life-science research require highly optimized simulation methods. Therefore, massively parallel Monte Carlo algorithms, namely MTM, were developed and successfully applied to reversible protein folding, allowing the thermodynamic characterization of proteins at the atomistic level. Furthermore, the formation process of transmembrane pores in the TatA system was elucidated and the structure of the complex was predicted.
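Monte Carlo methods of this kind build on the Metropolis acceptance rule. The sketch below is plain single-walker Metropolis sampling of a one-dimensional energy function, shown only to illustrate the basic accept/reject step; it is not the MTM algorithm named above, and its names and parameters are assumptions.

```python
import numpy as np

def metropolis(energy, x0, n_steps, step, kT, rng):
    """Plain Metropolis sampling of exp(-E(x)/kT) with uniform trial moves."""
    x, e = x0, energy(x0)
    samples = np.empty(n_steps)
    accepted = 0
    for k in range(n_steps):
        x_new = x + rng.uniform(-step, step)     # symmetric trial move
        e_new = energy(x_new)
        # Accept with probability min(1, exp(-(E_new - E_old) / kT)).
        if e_new <= e or rng.random() < np.exp(-(e_new - e) / kT):
            x, e = x_new, e_new
            accepted += 1
        samples[k] = x
    return samples, accepted / n_steps
```

For a harmonic energy E = x²/2 at kT = 1, the sampled variance approaches 1. Parallel variants typically run many such walkers, or many trial moves per step, concurrently.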