866 research outputs found

    Experimental and computational investigation of enzyme functional annotations uncovers misannotation in the EC 1.1.3.15 enzyme class

    Only a small fraction of genes deposited to databases have been experimentally characterised. The majority of proteins have their function assigned automatically, which can result in erroneous annotations. The reliability of current annotations in public databases is largely unknown; experimental attempts to validate the accuracy within individual enzyme classes are lacking. In this study we performed an overview of functional annotations to the BRENDA enzyme database. We first applied a high-throughput experimental platform to verify functional annotations to an enzyme class of S-2-hydroxyacid oxidases (EC 1.1.3.15). We chose 122 representative sequences of the class and screened them for their predicted function. Based on the experimental results, predicted domain architecture and similarity to previously characterised S-2-hydroxyacid oxidases, we inferred that at least 78% of sequences in the enzyme class are misannotated. We experimentally confirmed four alternative activities among the misannotated sequences and showed that misannotation in the enzyme class increased over time. Finally, we performed a computational analysis of annotations to all enzyme classes in the BRENDA database, and showed that nearly 18% of all sequences are annotated to an enzyme class while sharing no similarity or domain architecture to experimentally characterised representatives. We showed that even well-studied enzyme classes of industrial relevance are affected by the problem of functional misannotation.

    The evolution of protein kinase specificity

    All research conducted at EMBL-EBI under the supervision of Dr. Pedro Beltrao. Work on the PhD project was paused temporarily in the Spring of 2017 for me to undertake a 3-month internship at EMBO Press (in Heidelberg). Protein phosphorylation represents one of the most important post-translational modifications (PTMs) for cell signalling, and is catalysed by a group of enzymes called protein kinases. Through this activity they serve as key regulators of almost all cellular processes. This is achieved at any time by a network of different kinases that are transiently active. The fidelity of cell systems control therefore requires that each kinase targets only a restricted set of substrates. This specificity is achieved partly by contextual factors that separate kinases spatially and temporally, but also by sequence features that are encoded in the kinase domain itself. For this thesis I focus on elements of kinase specificity that are encoded in the active site of the enzyme. During these investigations I have tried to address three main questions: 1) How is specificity for residues surrounding the phosphorylation site determined in the kinase? 2) How did these specificities evolve? and 3) To what extent does kinase evolution correlate with the evolution of its substrates? First, I developed a sequence-based method for the automated detection of kinase specificity determining residues (SDRs). The putative determinants were then rationalised using available structural data, and in two specific cases were validated experimentally. I also used mutation data from The Cancer Genome Atlas (TCGA) to demonstrate that kinase SDRs are often targeted during cancer. Second, a global analysis of SDR evolution was performed for kinases following gene duplication and speciation, revealing that SDRs often diverge between paralogues but not between orthologues.
This global analysis is followed by a detailed case study of G-protein coupled receptor kinase (GRK) evolution using ancestral sequence reconstructions. Third, I inferred global substrate preferences in a taxonomically broad range of species using phosphoproteome data. I then related the evolution of substrate motif sequences to that of their cognate effector kinases where possible. The results strongly suggest that many of the motifs emerged in a universal eukaryotic ancestor. I finish by summarising the major findings of this doctoral research, which to my knowledge represents the most comprehensive analysis to date of protein kinase specificity and its evolution. BBSR

    Computational Approaches for Predicting Drug Targets

    This thesis reports the development of several computational approaches to predict human disease proteins and to assess their value as drug targets, using in-house domain functional families (CATH-FunFams). CATH-FunFams comprise evolutionarily related protein domains with high structural and functional similarity. External resources were used to identify proteins associated with disease and their genetic variations. These were then mapped to the CATH-FunFams together with information on drugs bound to any relatives within the FunFam. A number of novel approaches were then used to predict the proteins likely to be driving disease and to assess whether drugs could be repurposed within the FunFams for targeting these putative driver proteins. The first work chapter of this thesis reports the mapping of drugs to CATH-FunFams to identify druggable FunFams based on statistical overrepresentation of drug targets within the FunFam. 81 druggable CATH-FunFams were identified and the dispersion of their relatives on a human protein interaction network was analysed to assess their propensity to be associated with side effects. In the second work chapter, putative drug targets for bladder cancer were identified using a novel computational protocol that expands a set of known bladder cancer genes with genes highly expressed in bladder cancer and highly associated with known bladder cancer genes in a human protein interaction network. 35 new bladder cancer targets were identified in druggable FunFams, for some of which FDA-approved drugs could be repurposed from other protein domains in the FunFam. In the final work chapter, protein kinases and kinase inhibitors were analysed. These are an important class of human drug targets. A novel classification protocol was applied to give a comprehensive classification of the kinases, which was benchmarked and compared with other widely used kinase classifications.
Drug information from ChEMBL was mapped to the Kinase-FunFams, and analyses of protein network characteristics of the kinase relatives in each FunFam were used to identify those families likely to be associated with side effects.
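
The abstract above identifies druggable FunFams by "statistical overrepresentation of drug targets" but does not name the test used. As a minimal sketch, assuming a one-sided hypergeometric test (a common choice for enrichment questions, not confirmed as the thesis method), the calculation could look like the following. The function name and the example counts are hypothetical.

```python
from math import comb


def hypergeom_upper_tail(k, n, K, N):
    """Return P(X >= k) for a hypergeometric draw.

    N: total number of domains considered (assumed background set)
    K: number of those domains that are known drug targets
    n: number of domains in the family being tested
    k: number of drug targets observed in that family
    """
    upper = min(n, K)
    total = comb(N, n)
    # Sum the probability mass for every outcome at least as extreme as k.
    favourable = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return favourable / total


# Hypothetical counts: a family of 40 domains containing 12 drug targets,
# drawn from a background of 20000 domains with 600 drug targets overall.
p_value = hypergeom_upper_tail(12, 40, 600, 20000)
print(f"enrichment p-value: {p_value:.3g}")
```

A small p-value here would suggest that drug targets are concentrated in the family more than chance alone predicts; in practice a multiple-testing correction across all families would also be needed.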

    Computational Approaches to Drug Profiling and Drug-Protein Interactions

    Despite substantial increases in R&D spending within the pharmaceutical industry, de novo drug design has become a time-consuming endeavour. High attrition rates led to a long period of stagnation in drug approvals. Due to the extreme costs associated with introducing a drug to the market, locating and understanding the reasons for clinical failure is key to future productivity. As part of this PhD, three main contributions were made in this respect. First, the web platform LigNFam enables users to interactively explore similarity relationships between ‘drug-like’ molecules and the proteins they bind. Secondly, two deep-learning-based binding site comparison tools were developed, competing with the state of the art on benchmark datasets. The models have the ability to predict off-target interactions and potential candidates for target-based drug repurposing. Finally, the open-source ScaffoldGraph software was presented for the analysis of hierarchical scaffold relationships and has already been used in multiple projects, including integration into a virtual screening pipeline to increase the tractability of ultra-large screening experiments. Together, and with existing tools, the contributions made will aid in the understanding of drug-protein relationships, particularly in the fields of off-target prediction and drug repurposing, helping to design better drugs faster.

    Large-Scale Analysis of Protein-Ligand Binding Sites using the Binding MOAD Database.

    Current structure-based drug design (SBDD) methods require understanding of general trends of protein-ligand interactions. Informative descriptors of ligand-binding sites provide powerful heuristics to improve SBDD methods designed to infer function from protein structure. These descriptors must have a solid statistical foundation for assessing general trends in large sets of protein-ligand complexes. This dissertation focuses on mining the Binding MOAD database of highly curated protein-ligand complexes to determine frequently observed patterns of binding-site composition. An extension to Binding MOAD’s framework is developed to store structural details of binding sites and facilitate large-scale analysis. This thesis uses the framework to address three topics. It first describes a strategy for determining over-representation of amino acids within ligand-binding sites, comparing the trends of residue propensity for binding sites of biologically relevant ligands to those of spurious molecules with no known function. To determine the significance of these trends and to provide guidelines for residue-propensity studies, the effect of the data set size on the variation in propensity values is evaluated. Next, binding-site residue propensities are applied to improve the performance of a geometry-based, binding-site prediction algorithm. Propensity-based scores are found to perform comparably to the native score in successfully ranking correct predictions. For large proteins, propensity-based and consensus scores improve the scoring success. Finally, current protein-ligand scoring functions are evaluated using a new criterion: the ability to discern biologically relevant ligands from “opportunistic binders,” molecules present in crystal structures due to their high concentrations in the crystallization medium. Four different scoring functions are evaluated against a diverse benchmark set.
All are found to perform well for ranking biologically relevant sites over spurious ones, and all perform best when penalties for torsional strain of ligands are included. The final chapter describes a structural alignment method, termed HwRMSD, which can align proteins of very low sequence homology based on their structural similarity using a weighted structure superposition. The overall aims of the dissertation are to collect high-quality binding-site composition data within the largest available set of protein-ligand complexes and to evaluate the appropriate applications of this data to emerging methods for computational proteomics. Ph.D., Bioinformatics, University of Michigan, Horace H. Rackham School of Graduate Studies. http://deepblue.lib.umich.edu/bitstream/2027.42/91400/1/nickolay_1.pd
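
The Binding MOAD abstract above refers to residue propensity, the over-representation of amino acids within ligand-binding sites, without giving a formula. A minimal sketch follows, assuming the commonly used ratio of a residue's frequency in binding sites to its frequency in the whole data set; the dissertation may define or normalise it differently. The function name and the toy sequences are hypothetical.

```python
from collections import Counter


def residue_propensities(binding_site_residues, all_residues):
    """Return {residue: propensity} using an assumed frequency-ratio definition.

    propensity = (fraction of binding-site residues of this type)
                 / (fraction of all residues of this type)
    Values above 1 indicate over-representation in binding sites.
    """
    site = Counter(binding_site_residues)
    background = Counter(all_residues)
    n_site = sum(site.values())
    n_background = sum(background.values())
    if n_site == 0 or n_background == 0:
        raise ValueError("both residue collections must be non-empty")
    return {
        aa: (site[aa] / n_site) / (count / n_background)
        for aa, count in background.items()
    }


# Toy data: one-letter residue codes, not taken from Binding MOAD.
propensities = residue_propensities("WWH", "WWHAAAAAAA")
for aa, value in sorted(propensities.items()):
    print(aa, round(value, 2))
```

With small data sets such ratios fluctuate widely, which is consistent with the abstract's point that the effect of data set size on propensity values needs to be evaluated.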

    Characterizing early drug resistance-related events using geometric ensembles from HIV protease dynamics:

    The use of antiretrovirals (ARVs) has drastically improved the life quality and expectancy of HIV patients since their introduction in health care. Several million people worldwide are still afflicted by HIV, and ARV resistance is a constant concern for both healthcare practitioners and patients: while treatment options are finite, the virus constantly adapts via complex mutation patterns, with resistant strains selected under the pressure of drug treatment. The HIV protease is a crucial enzyme for viral maturation and has been a game-changing drug target since its first application. Due to similarities in protease inhibitor designs, drug cross-resistance is not uncommon across ARVs of the same class.

    Potential application of network descriptions for understanding conformational changes and protonation states of ABC transporters.

    The ABC (ATP Binding Cassette) transporter protein superfamily comprises a large number of ubiquitous and functionally versatile proteins conserved from archaea to humans. ABC transporters have a key role in many human diseases and also in the development of multidrug resistance in cancer and in parasites. Although dramatic progress has been achieved in ABC protein studies in the last decades, we are still far from a detailed understanding of their molecular functions. Several aspects of pharmacological ABC transporter targeting also remain unclear. Here we summarize the conformational and protonation changes of ABC transporters and the potential use of this information in pharmacological design. Network-related methods, which recently became useful tools to describe protein structure and dynamics, have not yet been applied to study allosteric coupling in ABC proteins. A detailed description of the strengths and limitations of these methods is given, and their potential use in describing ABC transporter dynamics is outlined. Finally, we highlight possible future aspects of pharmacological utilization of network methods and outline the future trends of this exciting field.

    The role of Dnmts and Tets in shaping the DNA methylation landscape of mouse embryonic stem cells

    DNA methylation is an important epigenetic mark, which is set and maintained by DNA methyltransferases (Dnmts) and removed via passive or active mechanisms involving Ten-eleven translocation (Tet) enzyme-mediated oxidation. Stable cell-type-specific methylation patterns can only be achieved if methylation and demethylation events are in balance. Yet, the genome-wide regulation of Dnmt and Tet activity is still not fully understood. The present studies use novel hairpin sequencing techniques coupled with oxidative bisulfite sequencing, which permits the simultaneous and strand-specific detection of 5-methylcytosine and 5-hydroxymethylcytosine. Application of hidden Markov models (HMMs) then facilitates the estimation of enzyme efficiencies for Dnmts and Tets. Furthermore, spatial modelling of hairpin bisulfite data allows the investigation of how Dnmts interpret pre-existing methylation patterns. Taken together, the results of the presented studies show that methylation and hydroxylation are antagonistic, but not mutually exclusive, events. In this context, the data show that Tet efficiency is highest at open and accessible chromatin. Furthermore, the absence of Tets leads to a considerable misregulation of Dnmts, resulting in an increase in both maintenance and de novo methylation efficiency. Lastly, the spatial analysis of methylation patterns reveals that the de novo methyltransferases Dnmt3a and 3b depend in their activity on pre-existing neighbouring CpG methylation.

    Latent Representation and Sampling in Network: Application in Text Mining and Biology.

    In classical machine learning, hand-designed features are used for learning a mapping from raw data. However, human involvement in feature design makes the process expensive. Representation learning aims to learn abstract features directly from data without direct human involvement. Raw data can be of various forms. A network is one form of data that encodes relational structure in many real-world domains. Therefore, learning abstract features for network units is an important task. In this dissertation, we propose models for incorporating temporal information given as a collection of networks from subsequent time-stamps. The primary objective of our models is to learn a better abstract feature representation of nodes and edges in an evolving network. We show that the temporal information in the abstract feature improves the performance of the link prediction task substantially. Besides applying our models to network data, we also employ them to incorporate extra-sentential information in the text domain for learning better representations of sentences. We build a context network of sentences to capture extra-sentential information. This information in the abstract feature representation of sentences improves various text-mining tasks substantially over a set of baseline methods. A problem with the abstract features that we learn is that they lack interpretability. In real-life applications on network data, for some tasks, it is crucial to learn interpretable features in the form of graphical structures. For this we need to mine important graphical structures along with their frequency statistics from the input dataset. However, exact algorithms for these tasks are computationally expensive, so scalable algorithms are urgently needed. To overcome this challenge, we provide efficient sampling algorithms for mining higher-order structures from network(s). We show that our sampling-based algorithms are scalable.
They are also superior to a set of baseline algorithms in terms of retrieving important graphical sub-structures and collecting their frequency statistics. Finally, we show that we can use these frequent subgraph statistics and structures as features in various real-life applications. We show one application in biology and another in security. In both cases, we show that the structures and their statistics significantly improve the performance of knowledge discovery tasks in these domains.