
    An automated approach to remote protein homology classification.

    The classification of protein structures into evolutionary superfamilies, for example in the CATH or SCOP domain structure databases, although performed with varying degrees of automation, has remained a largely subjective activity guided by expert knowledge. The huge expansion of the Protein Data Bank (PDB), partly due to the structural genomics initiatives, has posed significant challenges to maintaining the coverage of these structural classification resources. This is because the high degree of manual assessment currently involved has affected their ability to keep pace with high throughput structure determination. This thesis presents an evaluation of different methods used in remote homologue detection, which was performed to identify the most powerful approaches currently available. The design and implementation of new protocols suitable for remote homologue detection was informed by an analysis of the extent to which different homologous superfamilies in CATH evolve in sequence, structure and function and characterisation of the mechanisms by which this occurs. This analysis revealed that relatives in some highly populated CATH superfamilies have diverged considerably in their structures. In diverse relatives, significant variations are observed in the secondary structure embellishments decorating the common structural core for the superfamily. There are also differences in the packing angles between secondary structures. Information on the variability observed in CATH superfamilies is collated in an established web resource, the Dictionary of Homologous Superfamilies, which has been expanded and improved in a number of ways. A new structural comparison algorithm, CATHEDRAL, is described. This was developed to cope with the structural variation observed across CATH superfamilies and to improve the automatic recognition of domain boundaries in multidomain structures.
CATHEDRAL combines both secondary structure matching and accurate residue alignment in an iterative protocol for determining the location of previously observed folds in novel multi-domain structures. A rigorous benchmarking protocol is also described that assesses the performance of CATHEDRAL against other leading structural comparison methods. The optimisation and benchmarking of several other methods for detecting homology are subsequently presented. These include methods which exploit Hidden Markov Models (HMMs) to detect sequence similarity and methods that attempt to assess functional similarity. Finally, an automated machine learning approach to detecting homologous relationships between proteins is presented, which combines information on sequence, structure and functional similarity. This was able to identify over 85% of the homologous relationships in the CATH classification at a 5% error rate. This thesis was gratefully supported by the Biotechnology and Biological Sciences Research Council.

    Clustering protein sequences with a novel metric transformed from sequence similarity scores and sequence alignments with neural networks

    BACKGROUND: The sequencing of the human genome has enabled us to access a comprehensive list of genes (both experimental and predicted) for further analysis. While a majority of the approximately 30,000 known and predicted human coding genes are characterized and have been assigned at least one function, there remains a fair number of genes (about 12,000) for which no annotation has been made. The recent sequencing of other genomes has provided us with a huge amount of auxiliary sequence data which could help in the characterization of the human genes. Clustering these sequences into families is one of the first steps in performing comparative studies across several genomes. RESULTS: Here we report a novel clustering algorithm (CLUGEN) that has been used to cluster sequences of experimentally verified and predicted proteins from all sequenced genomes using a novel distance metric, which is a neural network score between a pair of protein sequences. This distance metric is based on the pairwise sequence similarity score and the similarity between their domain structures. The distance metric is the probability that a pair of protein sequences belong to the same InterPro family/domain, which facilitates the modelling of transitive homology closure to detect remote homologues. The hierarchical average clustering method is applied with the new distance metric. CONCLUSION: Benchmarking studies of our algorithm versus those reported in the literature show that our algorithm provides clustering results with lower false positive and false negative rates. The clustering algorithm is applied to cluster several eukaryotic genomes and several dozen prokaryotic genomes.
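
As a rough illustration of the clustering step described in this abstract, the sketch below applies average-linkage hierarchical clustering to pairwise same-family probabilities. It is not the CLUGEN implementation: the function name `cluster_by_probability`, the conversion distance = 1 - probability, the stopping threshold, and the toy probabilities are all assumptions made for illustration.

```python
from itertools import combinations


def cluster_by_probability(probs, n, threshold):
    """Average-linkage clustering of items 0..n-1.

    probs maps frozenset({i, j}) to a (hypothetical) probability that
    sequences i and j belong to the same family. Distance is assumed to
    be 1 - probability; merging stops once the closest pair of clusters
    is farther apart than `threshold`.
    """
    dist = {pair: 1.0 - p for pair, p in probs.items()}
    clusters = [[i] for i in range(n)]

    def average_distance(a, b):
        # Mean of all pairwise distances between members of a and b.
        total = sum(dist[frozenset((i, j))] for i in a for j in b)
        return total / (len(a) * len(b))

    while len(clusters) > 1:
        # Find the pair of clusters with the smallest average distance.
        x, y = min(
            combinations(range(len(clusters)), 2),
            key=lambda xy: average_distance(clusters[xy[0]], clusters[xy[1]]),
        )
        if average_distance(clusters[x], clusters[y]) > threshold:
            break
        merged = clusters[x] + clusters[y]
        clusters = [c for k, c in enumerate(clusters) if k not in (x, y)]
        clusters.append(merged)
    return sorted(sorted(c) for c in clusters)


# Toy input (made up): sequences 0 and 1 look related, as do 2 and 3.
toy_probs = {frozenset(pair): 0.1 for pair in combinations(range(4), 2)}
toy_probs[frozenset((0, 1))] = 0.9
toy_probs[frozenset((2, 3))] = 0.8
toy_clusters = cluster_by_probability(toy_probs, 4, threshold=0.5)
```

With these toy values the two high-probability pairs each form a cluster, and the two clusters stay separate because their average distance exceeds the threshold. In practice a library routine for agglomerative clustering would likely be preferable to this quadratic-per-step loop.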

    Multiscale Modeling of RNA Structures Using NMR Chemical Shifts

    Structure determination is an important step in understanding the mechanisms of functional non-coding ribonucleic acids (ncRNAs). Experimental observables in solution-state nuclear magnetic resonance (NMR) spectroscopy provide valuable information about the structural and dynamic properties of RNAs. In particular, NMR-derived chemical shifts are considered structural "fingerprints" of RNA conformational state(s). In my thesis, I have developed computational tools to model RNA structures (mainly secondary structures) using structural information extracted from NMR chemical shifts. Inspired by methods that incorporate chemical-mapping data into RNA secondary structure prediction, I have developed a framework, CS-Fold, for using assigned chemical shift data to conditionally guide secondary structure folding algorithms. First, I developed neural network classifiers, CS2BPS (Chemical Shift to Base Pairing Status), that take assigned chemical shifts as input and output the predicted base pairing status of individual residues in an RNA. Then I used the base pairing status predictions as folding restraints to guide RNA secondary structure prediction. Extensive testing indicates that from assigned NMR chemical shifts, we could accurately predict the secondary structures of RNAs and map distinct conformational states of a single RNA. Another way to utilize experimental data like NMR chemical shifts in structure modeling is probabilistic modeling, that is, using experimental data to recover native-like structure from a structural ensemble that contains a set of low energy structure models. I first developed a model, SS2CS (Secondary Structure to Chemical Shift), that takes secondary structure as input and predicts chemical shifts with high accuracies. Using Bayesian/maximum entropy (BME), I was able to reweight secondary structure models based on the agreement between the measured and reweighted ensemble-averaged chemical shifts. 
Results indicate that BME could identify the native or near-native structure from a set of low energy structure models as well as recover some of the non-canonical interactions in tertiary structures. We could also probe the conformational landscape by studying the weight pattern assigned by BME. Finally, I explored RNA structural annotation using assigned NMR chemical shifts. Using multitask learning, eleven structural properties were annotated by classifying individual residues in terms of each structural property. The results indicate that our method, CS-Annotate, could predict the structural properties with reasonable accuracy. We believe that CS-Annotate could be used for assessing the quality of a structure model by comparing the structure-derived structural properties with the CS-Annotate-derived structural properties. One major limitation of the tools developed is that they require assigned chemical shifts, and assigning chemical shifts typically assumes a secondary structure model. However, with the recent advances in singly labeled RNA synthesis, chemical shifts could be assigned without any assumption about the secondary structure. We envision that, using the chemical shifts derived from singly labeled NMR experiments, CS-Fold could be used for modeling the secondary structure of RNA. We also believe that unassigned chemical shifts could be used for selecting structure models. Native-like structures could be recovered by comparing optimally assigned chemical shifts with computed chemical shifts (generated by SS2CS). Overall, the results presented in this thesis indicate that we could extract crucial structural information about the residues in an RNA based on its NMR chemical shifts. Moreover, with tools like CS-Fold, SS2CS, and CS-Annotate, we could accurately predict the secondary structure, model the conformational landscape, and study the structural properties of an RNA.
    PhD, Chemistry, University of Michigan, Horace H. Rackham School of Graduate Studies. http://deepblue.lib.umich.edu/bitstream/2027.42/163247/1/kexin_1.pd

    Systematische Analyse der Sequenz-Struktur-Funktions Zusammenhänge bei Thiamindiphosphat-abhängigen Enzymen

    Thiamine diphosphate (ThDP)-dependent enzymes form a vast and diverse protein family, both in sequence space and in their functional potential. Of particular interest are the enantioselective C-C bond forming and cleavage reactions catalyzed by those enzymes. In these reactions, different ThDP-dependent enzymes provide distinct enantio- and chemoselectivities with often narrow substrate and product ranges. This specificity, which is beneficial for the enantiopure synthesis of fine chemicals like 2-hydroxy ketones, limits the scope of accessible products. Investigations of crystal structures of different ThDP-dependent decarboxylases revealed that steric properties in the active sites of those enzymes control the enantio- and chemoselectivity (S-pocket and donor-acceptor concept). Subsequent application of those concepts by modulation of the steric properties of enzymes' active sites enabled rational engineering of biocatalysts with desired, but often only moderate, non-physiological enantioselectivities. The major objective of this thesis was to systematically analyze the sequences and structures of this enzyme family and to elucidate the relationships between sequence, structure and function. Detailed understanding of those relationships is pivotal for rational engineering and therefore necessary for the design of biocatalysts with desired selectivities. Compared to the enormous size of this enzyme family, only a small number of representatives have been experimentally characterized. Even fewer ThDP-dependent enzymes were modified by mutations in order to analyze the effects of distinct amino acid residues, and still fewer were structurally determined. Since the systematic analysis of the sequence-structure-function relationships requires information on the structure and function of a major fraction of family members, methods were developed and applied to increase the amount of available structure and function information.
By making use of homology modeling, putative atom coordinates were predicted for enzymes lacking experimentally determined structure information. In addition, the development of a new database system that combines sequence, structure and function information enabled the acquisition of accurate and comparable biochemical data unambiguously linked to the biocatalysts' amino acid sequences. Deducing the functional roles of certain residues requires comparable biochemical data on the one hand and methods to compare residues from different enzymes on the other hand. Introduction of standard numbering schemes for ThDP-dependent enzymes facilitated fast and accurate comparison of structurally equivalent positions without the need for structure information. The findings derived from those analyses accelerated the engineering of enzymes with desired enantio- and chemoselectivities and inter alia enabled the enzymatic, direct asymmetric synthesis of (S)-benzoins with excellent enantiomeric excesses (ee).
    The family of thiamine diphosphate (ThDP)-dependent enzymes is diverse in both sequence and function. This family is of particular interest because of its ability to catalyze C-C bond forming and cleavage reactions. For use in biocatalysis and the synthesis of fine chemicals (such as alpha-hydroxy ketones), these enzymes are also distinguished by their defined substrate ranges and by their enantioselectivity in numerous reactions. However, these specificities restrict the range of enzymatically accessible products. Comparative studies of available protein structures of different ThDP-dependent enzymes revealed differences in the shape of the substrate binding pockets of the different representatives.
The 'S-pocket' and 'donor/acceptor' concepts derived from these observations attribute the differing enantio- and substrate preferences to these steric differences and to the resulting different spatial arrangements of the two substrates in ligation reactions. On this basis, decarboxylases with altered selectivities could be generated by adapting the shape of the active-site pockets, although often with only moderate stereoselectivities in the catalysis of non-natural reactions. The success of rational design of biocatalysts with desired properties depends on detailed knowledge of the sequence-structure-function relationships of the respective protein family. The aim of this doctoral thesis was the systematic analysis of these relationships in ThDP-dependent enzymes. A systematic analysis of sequence-structure-function relationships implicitly requires sequence, structure and function information for a large proportion of the enzymes belonging to the family. In previous work, only a few representatives were experimentally characterized relative to the enormous size of this protein family. Of these, again only a few were used for further investigations of the influence of particular amino acid positions on catalytic activity or selectivity. An experimental determination of the protein structure, which is of particular importance for the rational design of biocatalysts, was carried out for an even smaller fraction of the ThDP-dependent enzymes. To address the existing lack of information on the structure and function of enzymes, protein structures were predicted by homology modeling in this work, and methods for recording and evaluating functional data were developed.
A novel database system for recording reliable and comparable data on the function and sequence of enzymes provided the basis for a systematic analysis of these relationships. In addition to the availability of function information unambiguously linked to the sequence of the corresponding enzyme, the systematic analysis of the possible functional roles of individual amino acid positions requires a method for comparing amino acids from different enzymes. Such a method is provided by the standard numbering scheme presented here. Applying this method allows the fast and accurate identification of structurally equivalent positions in different enzymes without depending on structure information for the proteins being analyzed. The insights gained from these analyses were used to generate biocatalysts with desired enantio- and chemoselectivities and, for the first time, to enable the enzymatic, direct asymmetric synthesis of (S)-benzoins.

    Structure-function studies of MICAL, the unusual multidomain flavoenzyme involved in actin cytoskeleton dynamics

    MICAL (from Molecule Interacting with CasL) denotes a family of multidomain proteins conserved from insects to humans, which are increasingly attracting attention for their participation in the control of actin cytoskeleton dynamics and, therefore, in several related key processes in health and disease. MICAL is unique among actin binding proteins because it catalyzes an NADPH-dependent F-actin depolymerizing reaction. This unprecedented reaction is associated with its N-terminal FAD-containing domain, which is structurally related to p-hydroxybenzoate hydroxylase, the prototype of aromatic monooxygenases, but catalyzes a strong NADPH oxidase activity in the free state. This review focuses on the known structural and functional properties of MICAL forms in order to provide an overview of the arguments supporting the current hypotheses on the possible mechanism of action of MICAL in the free and F-actin bound state, on the modulating effect that the CH, LIM, and C-terminal domains following the catalytic flavoprotein domain have on the MICAL activities, and on the modulating effect of small molecules and proteins interacting with MICAL.

    A versatile Scaffold: The binding specificities of the Par3 PDZ domains mediate multiple interactions with polarity proteins

    The asymmetric distribution of RNA, lipids and proteins is the basis of cell polarity. Polarized cells are vital for the organization of multicellular organisms. Malfunctions in the processes generating cell polarity are linked with cancer and developmental defects. For cell polarization, the PAR complex, consisting of atypical protein kinase C, Par3 and Par6, is essential. Par3 is the central scaffold of the PAR complex. Par3 comprises an N-terminal oligomerization domain, three Postsynaptic density protein-95, Disk large, Zonula occludens 1 (PDZ) domains, a kinase binding domain and an unstructured C-terminus. Its PDZ domains are the major protein-protein interaction domains. However, a detailed analysis of their specificities towards PDZ binding motifs (PBMs) occurring in Par3 interaction partners in the context of cell polarity is missing. Here, I present the structural basis of the interaction of Par3 with Par6. I identified a PBM in Par6 that is essential for Par3 interaction and interacts with the PDZ1 and PDZ3, but not the PDZ2, domains in vitro. Together with my coauthors, I showed that the Par6 PBM interacts with Par3 via a canonical PDZ:PBM interaction and functions together with the Par6 PDZ domain in Par6 localization in vivo. In addition, I investigated the specificities of the individual Par3 PDZ domains for cell polarity proteins. My analysis revealed a unique binding profile for the dmPar3 PDZ1 and PDZ2 domains, while the binding profile of the dmPar3 PDZ3 domain is very promiscuous and overlaps with the specificities of the other two Par3 PDZ domains. These overlapping specificities enable Par3 to mediate multivalent interactions and thereby to form large protein networks with many different cell polarity proteins. In a third project, I discovered a hitherto unknown short motif N-terminal of the third PDZ domain of dmPar3, denoted the FID-motif.
I was able to show that the FID-motif folds back onto the dmPar3 PDZ3 domain in close vicinity of the PBM binding groove, thereby reducing the affinities of the PDZ3 domain towards various PBMs in polarity proteins. These reductions in affinity prevent a subset of the previously identified PDZ3 ligands from interacting with the PDZ3 domain. Hence, the FID-motif seems to fine-tune the recruitment of PBM-carrying polarity proteins via the dmPar3 PDZ3 domain. The detailed analyses presented in this thesis provide important insights into the individual roles of the Par3 PDZ domains in the assembly of polarity protein complexes. I present new clues regarding functional redundancies within the Par3 PDZ module and provide further evidence for Par3 acting as a central scaffold of polarity protein networks. Therefore, the function of the Par3 protein during establishment, maintenance and disruption of cell polarity during development and the related process of cancer metastasis can be understood in greater detail.

    Investigating the role of protein-protein and protein-DNA interactions in the function of Isl1

    LIM-homeodomain (LIM-HD) transcription factors act as key developmental regulators through their ability both to bind DNA through homeodomain-DNA interactions and to form larger complexes through protein-protein interactions. Many interactions that have been characterised are formed using their LIM domains, but likely also involve other regions, which have not yet been described for many LIM-HD proteins. The LIM-HD protein Isl1 has been implicated in the development of many tissues. However, relatively little detail is known about how Isl1 functions in these systems and the pathways in which it acts. The first part of this thesis aimed to identify and characterise novel binding partners for Isl1. An earlier project isolated ~180 potential binding partners through use of yeast two-hybrid mating screens; throughout this thesis further methodology was developed to identify additional proteins in a medium throughput manner. Validation protocols were then applied to determine which interactors were likely to represent biologically relevant interaction partners for Isl1. The second part of this thesis focussed on the mechanisms by which Isl1 and Lhx3 direct cell fate determination in the developing central nervous system. These proteins, along with Ldb1, interact via LIM:LID interactions to form cell-specific transcriptional complexes that target genes different from those targeted by either LIM-HD protein alone. It was not known whether the homeodomains target these different sites solely because of the LIM:LID interactions or whether the homeodomains themselves bind cooperatively to DNA. The DNA-binding behaviour of various iterations of the Lhx3/Isl1/Ldb1 complex is described, and structural characterisation of the Isl1/Lhx3 DNA-binding unit has been pursued. These data provide new insights into the mechanisms by which Isl1 and Lhx3 work together in regulating gene expression.

    Detection and analysis of LIM domain-mediated interactions between transcription factors

    LIM-homeodomain (LIM-HD) proteins are a class of transcription factors involved in tissue specification and cell determination during development and are important in adult gene regulation. Six families of LIM-HD proteins, with two close paralogues in each family, are commonly found in tetrapods. They bind DNA via HDs, whereas their interactions with other proteins are mediated mainly by a pair of closely spaced LIM-domains (LIMs) in each protein. These proteins take part in various transcriptional complexes with Ldb1 and other cofactors that contain LIM-interaction domains (LIDs). In this thesis, protein-protein interactions of LIM-HD proteins were analysed in order to better understand the molecular mechanisms of transcriptional complex formation. Based on previous research that showed LIM-LID mediated interactions between Lhx3 and Isl1, yeast two-hybrid mating arrays were used to investigate how widespread protein-protein interactions are amongst the 12 mammalian LIM-HD proteins. Due to high levels of background growth in experiments with full-length proteins in pGBT9 vectors, the mating arrays focused on LIM-domain mediated interactions with full-length LIM-HDs or known LIDs. The arrays revealed a relatively strong interaction between Lhx3 (or Lhx4) and Isl1 (or Isl2), and detected weaker interactions between Lmx1a or Lmx1b and the LIM-binding domain of Isl1. The contribution of separate LIM-domains to the overall interaction with Ldb1 for each of the proteins was analysed by the same method. In most cases one of the LIM domains in each protein was able to independently interact with the LID domain of Ldb1 by yeast two-hybrid analysis indicating a dominant binder: LIM1 in Isl1 and Isl2, or LIM2 in other proteins. The exceptions were paralogues Lhx1 and Lhx5, for which no separate domain showed interaction with Ldb1LID by this approach. 
All tandem LIM-domain constructs showed a much stronger interaction with Ldb1LID than any isolated LIM domain supporting the idea that both domains are required for high affinity binding to Ldb1. Bimolecular Fluorescence Complementation experiments in yeast were designed and conducted as an alternative approach to test interactions between full-length LIM-HD proteins in the hope that a non-transcription based assay would lead to no or less background signal compared to yeast two-hybrid analysis. A plasmid system was developed based on existing yeast two-hybrid vectors using split green fluorescent proteins in place of domains from the GAL4 transcription factor. The assay was able to detect interactions between different LIMs and their partners but unfortunately interactions between full-length proteins were still difficult to detect due to low fluorescence, self-complementation in the controls and localization effects. LIM domains from LIM-HD proteins cannot be used in standard bimolecular binding assays because they tend to be insoluble and/or aggregate in the absence of a binding partner. Stable, soluble intramolecular ‘tethered complexes’ can be generated in which LIMs are tethered to Ldb1LID via a flexible linker. Introduction of a specific protease site into the tether allows the formation of intermolecular cut complexes, which have previously been used in homologous competition ELISA experiments. In this thesis attempts were made to develop more robust biophysical binding assays that could be used to assess the binding affinities of different LIMs for Ldb1LID. Several different labelling approaches were used to generate proteins with fluorescent tags for use in fluorescence anisotropy assays. In one of these approaches expressed protein ligation was applied to generate proteins with an N-terminal fluorescein. 
Although this labelling strategy was of low efficiency for LIMs-Ldb1LID tethered constructs, some preliminary fluorescence anisotropy experiments were carried out, which indicated that this could be a useful strategy providing a more efficient labelling strategy can be found. GFP-tagged tethered complexes were easier to generate, but could not be used in anisotropy experiments because of the intrinsically high anisotropy of GFP proteins. However, preliminary experiments indicated that these proteins can be used in clear native gel shift competition assays to compare binding affinities of different tandem LIM domains to Ldb1LID. Data presented in this thesis provide valuable insight into protein-protein interactions of LIM-HD transcription factors and the advantages, as well as disadvantages, of applied experimental approaches. The results and their implications are discussed, raising questions that can be resolved in future studies
