    A topological approach for protein classification

    Protein function and dynamics are closely related to sequence and structure. However, predicting protein function and dynamics from sequence and structure remains a fundamental challenge in molecular biology. Protein classification, which is typically done by measuring the similarity between proteins based on sequence or physical information, is a crucial step toward understanding protein function and dynamics. Persistent homology is a new branch of algebraic topology that has proved successful in topological data analysis across a variety of disciplines, including molecular biology. The present work explores the potential of persistent homology as an independent tool for protein classification. To this end, we propose a molecular topological fingerprint based support vector machine (MTF-SVM) classifier. Specifically, we construct machine learning feature vectors solely from protein topological fingerprints, which are topological invariants generated during the filtration process. To validate the MTF-SVM approach, we consider four types of problems. First, we study protein-drug binding using the M2 channel protein of influenza A virus, achieving 96% accuracy in discriminating drug-bound from unbound M2 channels. Second, we use MTF-SVM to classify hemoglobin molecules in their relaxed and taut forms and obtain about 80% accuracy. Third, we identify all-alpha, all-beta, and alpha-beta protein domains using 900 proteins, with an 85% success rate. Finally, we apply the technique to 55 classification tasks of protein superfamilies over 1357 samples and attain an average accuracy of 82%. The present study establishes computational topology as an independent and effective alternative for protein classification.
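To make the idea of a feature vector built from topological invariants more concrete, the following is a minimal sketch, not the MTF-SVM pipeline itself. It computes only the 0-dimensional bars (connected components) of a distance-based filtration over a point cloud, using a union-find structure, and then summarizes them. The function names `h0_barcode` and `h0_features` and the choice of summary statistics (number of bars, longest bar, mean bar length) are assumptions made for illustration.

```python
import math
from itertools import combinations


def h0_barcode(points):
    """Death values of the finite 0-dimensional bars of a distance filtration.

    Every point is born at filtration value 0. A connected component dies at
    the length of the edge that first merges it into another component.
    """
    n = len(points)
    parent = list(range(n))

    def find(i):
        # Union-find lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # All pairwise edges, processed from shortest to longest.
    edges = sorted(
        (math.dist(points[i], points[j]), i, j)
        for i, j in combinations(range(n), 2)
    )
    deaths = []
    for length, i, j in edges:
        root_i, root_j = find(i), find(j)
        if root_i != root_j:
            parent[root_i] = root_j
            deaths.append(length)
    # n - 1 finite bars; the last surviving component never dies.
    return deaths


def h0_features(points):
    """Hypothetical feature vector: [bar count, longest bar, mean bar length]."""
    deaths = h0_barcode(points)
    if not deaths:
        return [0.0, 0.0, 0.0]
    return [float(len(deaths)), max(deaths), sum(deaths) / len(deaths)]


if __name__ == "__main__":
    # Three collinear "atoms": two close together, one far away.
    atoms = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (5.0, 0.0, 0.0)]
    print(h0_features(atoms))
```

A vector of this kind, computed per structure, could then be passed to any support vector machine implementation. Higher-dimensional bars (rings and cavities) require a full persistent homology library and are outside the scope of this sketch.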

    Improving predicted protein loop structure ranking using a Pareto-optimality consensus method

    Background: Accurate protein loop structure models are important for understanding the functions of many proteins. Identifying the native or near-native models by distinguishing them from the misfolded ones is a critical step in protein loop structure prediction. Results: We have developed a Pareto Optimal Consensus (POC) method, a consensus model ranking approach that integrates multiple knowledge- or physics-based scoring functions. Identifying the best-quality models in a model set involves 1) identifying the models at the Pareto optimal front with respect to a set of scoring functions, and 2) ranking them based on their fuzzy dominance relationship to the rest of the models. We apply the POC method to a large number of decoy sets for loops 4 to 12 residues in length, using a functional space composed of several carefully selected scoring functions: Rosetta, DOPE, DDFIRE, OPLS-AA, and a triplet backbone dihedral potential developed in our lab. Our computational results show that the sets of Pareto-optimal decoys, which typically make up ~20% or less of the decoys in a set, cover the best or near-best decoys in more than 99% of the loop targets. Compared to the individual scoring function yielding the best selection accuracy in the decoy sets, the POC method yields 23%, 37%, and 64% fewer false positives in distinguishing the native conformation, identifying a near-native model (RMSD < 0.5 Å from the native) as top-ranked, and selecting at least one near-native model among the top 5 ranked models, respectively. The POC method is similarly effective on decoy sets from membrane protein loops.
Furthermore, the POC method outperforms other popular consensus strategies for model ranking, such as rank-by-number, rank-by-rank, rank-by-vote, and regression-based methods. Conclusions: By integrating multiple knowledge- and physics-based scoring functions based on Pareto optimality and fuzzy dominance, the POC method is effective in distinguishing the best loop models from the others within a loop model set.
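The first step of the procedure, identifying the models at the Pareto optimal front, can be sketched as follows. This is a generic illustration of Pareto dominance and not the POC code: it assumes that each decoy has one score per scoring function and that lower scores are better, and the names `dominates` and `pareto_front` are hypothetical. The second step, ranking by fuzzy dominance, is not specified in enough detail in the abstract to reproduce here.

```python
def dominates(a, b):
    """True if score vector a is at least as good as b on every scoring
    function and strictly better on at least one (lower is better)."""
    return all(x <= y for x, y in zip(a, b)) and any(
        x < y for x, y in zip(a, b)
    )


def pareto_front(scores):
    """Indices of decoys that no other decoy dominates."""
    return [
        i
        for i, candidate in enumerate(scores)
        if not any(
            dominates(other, candidate)
            for j, other in enumerate(scores)
            if j != i
        )
    ]


if __name__ == "__main__":
    # Each tuple holds one decoy's scores under two scoring functions.
    decoy_scores = [(1.0, 5.0), (2.0, 2.0), (3.0, 3.0), (5.0, 1.0)]
    print(pareto_front(decoy_scores))
```

In the example data, the third decoy is worse than the second under both scoring functions, so it falls off the front, while the remaining decoys each win on at least one function and are retained.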

    Computational Modeling of Protein Kinases: Molecular Basis for Inhibition and Catalysis

    Protein kinases catalyze protein phosphorylation reactions, i.e., the transfer of the γ-phosphoryl group of ATP to tyrosine, serine, and threonine residues of protein substrates. This phosphorylation plays an important role in regulating various cellular processes. Deregulation of many kinases is directly linked to cancer development, and the protein kinase family is one of the most important targets in current cancer therapy regimens. This relevance to disease has stimulated intensive efforts in the biomedical research community to understand their catalytic mechanisms, discern their cellular functions, and discover inhibitors. Because they can simultaneously define structural and dynamic properties of complex systems, computational studies at the atomic level have been recognized as a powerful complement to experimental studies. In this work, we employed a suite of computational and molecular simulation methods to (1) explore the catalytic mechanism of a particular protein kinase, namely epidermal growth factor receptor (EGFR); (2) study the interaction between EGFR and one of its inhibitors, namely erlotinib (Tarceva); (3) discern the effects of molecular alterations (somatic mutations) of EGFR on differential downstream signaling responses; and (4) model the interactions of a novel class of kinase inhibitors, which share a common ruthenium-based organometallic scaffold, with different protein kinases. Our simulations established some important molecular rules that operate in inhibitor binding, substrate recognition, catalytic landscapes, and signaling in the EGFR tyrosine kinase. Our results also offer insights into the mechanisms of inhibition and phosphorylation commonly employed by many kinases.

    Mapping biophysics through enhanced Monte Carlo techniques

    This thesis focuses on the study of molecular interactions at atomistic detail and is divided into one introductory chapter and four chapters addressing different problems and methodological approaches. All of them center on the development and improvement of computational Monte Carlo algorithms to study efficiently the behavior of these systems at a classical molecular mechanics level. The four biophysical problems studied in this thesis are: induced-fit docking of protein-ligand and DNA-ligand systems to understand the binding mechanism, protein stretching response, and generation/scoring of protein-protein docking poses. The thesis is organized as follows. The first chapter covers the state of the art in computational methods to study biophysical interactions, which is the starting point of this thesis. Our in-house PELE algorithm and the main standard methods, such as molecular dynamics, are explained in detail. Chapter two focuses on the main PELE modifications that add new features, such as a new force field, implicit solvent, and an anisotropic network specific to DNA simulation studies. We study, compare, and validate the conformations generated for six representative DNA fragments with the new PELE features, using molecular dynamics as a reference. Chapter three is devoted to applying the new methods implemented and tested in PELE to study protein-ligand and DNA-ligand interactions using four systems. First, we study porphyrin binding to the Gun4 protein by combining PELE and molecular dynamics simulations. In addition, we provide a docking pose that has been corroborated by a new crystal structure published during the revision process of the submitted study, which demonstrates the accuracy of our predictions.
In the second project, we use our improved version of PELE to generate the first structural model of an alpha glucose 1,6-bisphosphate substrate bound to human phosphomannomutase 2, demonstrating that this ligand can adopt two low-energy orientations. The third project is the study of DNA-ligand interactions for three cisplatin drugs, where we evaluate the binding free energy using Markov state models. We obtain excellent results compared with other free energy methods based on molecular dynamics. The last project is the study of the DNA intercalator daunomycin, where we simulate and study the binding process with PELE. Chapter four focuses on the computational study of force-extension profiles during protein unfolding. Following a procedure similar to the one applied in steered molecular dynamics, we added a dynamic harmonic constraint to our Monte Carlo approach to fix or pull selected atoms, forcing the protein to unfold in a defined direction. We implement this technique and compare it with steered molecular dynamics for the ubiquitin and azurin proteins. Moreover, we add this feature to a well-known algorithm, MCPRO, from William Jorgensen's group at Yale University, to evaluate the free energy associated with the unfolding of the deca-alanine system. Chapter five introduces a multiscale approach to study protein-protein docking. A coarse-grained model is combined with a Monte Carlo exploration, reducing the degrees of freedom, to generate thousands of protein-protein poses quickly. Poses produced by this procedure are refined and ranked through a protonation, hydrogen bond optimization, and minimization protocol at the all-atom representation to identify the best poses.
I present two test cases in which this procedure has been applied with good predictive accuracy: the tryptogalinin and ferredoxin/flavodoxin systems.
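The dynamic harmonic constraint described for chapter four can be illustrated with a toy model. The sketch below is not PELE or MCPRO code: it assumes a single particle in a one-dimensional double-well potential, pulled by a harmonic restraint whose anchor moves at constant speed, and sampled with plain Metropolis Monte Carlo. All names and parameter values (`steered_mc`, `double_well`, `k`, `beta`, `step`) are hypothetical choices made for illustration.

```python
import math
import random


def double_well(x):
    """Toy system energy with minima at x = -1 and x = +1."""
    return (x * x - 1.0) ** 2


def restraint_energy(x, anchor, k):
    """Harmonic restraint tying the particle to the moving anchor."""
    return 0.5 * k * (x - anchor) ** 2


def steered_mc(n_steps=2000, k=50.0, start=-1.0, end=1.0,
               beta=5.0, step=0.1, seed=0):
    """Metropolis Monte Carlo with a restraint anchor moved from start to end.

    Returns a list of (anchor, position, pulling_force) tuples, one per step.
    """
    rng = random.Random(seed)
    x = start
    trace = []
    for s in range(n_steps):
        # Move the restraint anchor linearly along the pulling coordinate.
        anchor = start + (end - start) * s / (n_steps - 1)
        trial = x + rng.uniform(-step, step)
        delta = (double_well(trial) + restraint_energy(trial, anchor, k)
                 - double_well(x) - restraint_energy(x, anchor, k))
        # Metropolis acceptance rule on the total (system + restraint) energy.
        if delta <= 0.0 or rng.random() < math.exp(-beta * delta):
            x = trial
        # Force exerted by the spring on the particle at this step.
        trace.append((anchor, x, k * (anchor - x)))
    return trace


if __name__ == "__main__":
    profile = steered_mc()
    anchor, position, force = profile[-1]
    print(f"final anchor={anchor:.3f} position={position:.3f} force={force:.3f}")
```

Plotting the recorded force against the anchor position gives a force-extension profile for the toy system. A protein simulation would replace the double well with a molecular mechanics energy function and apply the restraint to selected atoms.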

    Unveiling the Molecular Mechanisms Regulating the Activation of the ErbB Family Receptors at Atomic Resolution through Molecular Modeling and Simulations

    The EGFR/ErbB/HER family of kinases contains four homologous receptor tyrosine kinases that are important regulatory elements in key signaling pathways. To elucidate the atomistic mechanisms of dimerization-dependent activation in the ErbB family, we have performed molecular dynamics simulations of the intracellular kinase domains of the four members of the ErbB family, namely EGFR, ErbB2 (HER2), and ErbB4 (HER4), which have known kinase activity, as well as ErbB3 (HER3), an assumed pseudokinase, in different molecular contexts: monomer vs. dimer, wildtype vs. mutant. Using bioinformatics and fluctuation analyses of the molecular dynamics trajectories, we relate sequence similarities to correspondence of specific bond-interaction networks and collective dynamical modes. We find that in the active conformation of the ErbB kinases (except ErbB3), key subdomain motions are coordinated through conserved hydrophilic interactions: activating bond networks consisting of hydrogen bonds and salt bridges. The inactive conformations also demonstrate conserved bonding patterns (albeit less extensive) that sequester key residues and disrupt the activating bond network. Both conformational states have distinct hydrophobic advantages through context-specific hydrophobic interactions. The inactive ErbB3 kinase domain also shows coordinated motions similar to the active conformations, in line with recent evidence that ErbB3 is a weakly active kinase, though the coordination seems to arise from hydrophobic interactions rather than hydrophilic ones. We show that the functional (activating) asymmetric kinase dimer interface forces a corresponding change in the hydrophobic and hydrophilic interactions that characterize the inactivating interaction network, resulting in motion of the αC-helix through allostery. Several of the clinically identified activating kinase mutations of EGFR act in a similar fashion to disrupt the inactivating interaction network.
Our molecular dynamics study reveals that the asymmetric dimer interface helps the ErbB family progress through the activation pathway using both hydrophilic and hydrophobic interactions. There is a fundamental difference in the sequence of events in EGFR activation compared with that described for the Src kinase Hck.

    The FEATURE framework for protein function annotation: modeling new functions, improving performance, and extending to novel applications

    Structural genomics efforts contribute new protein structures that often lack significant sequence and fold similarity to known proteins. Traditional sequence- and structure-based methods may not be sufficient to annotate the molecular functions of these structures. Techniques that combine structural and functional modeling can be valuable for functional annotation. FEATURE is a flexible framework for modeling and recognition of functional sites in macromolecular structures. Here, we present an overview of the main components of the FEATURE framework and describe recent developments in its use. These include automating training-set selection to increase functional coverage, coupling FEATURE to methods that generate structural diversity, such as molecular dynamics simulations and loop modeling, to improve performance, and using FEATURE in large-scale modeling and structure determination efforts.

    RNA and protein 3D structure modeling: similarities and differences

    In analogy to proteins, the function of RNA depends on its structure and dynamics, which are encoded in the linear sequence. While there are numerous methods for computational prediction of protein 3D structure from sequence, there have been very few such methods for RNA. This review discusses template-based and template-free approaches for macromolecular structure prediction, with special emphasis on comparing the tried-and-tested methods for protein structure modeling with the very recently developed “protein-like” modeling methods for RNA. We highlight analogies between many successful methods for modeling these two types of biological macromolecules and argue that RNA 3D structure can be modeled using “protein-like” methodology. We also highlight the areas where the differences between RNA and proteins require the development of RNA-specific solutions.

    Computational Methods for Conformational Sampling of Biomolecules

