65 research outputs found

    Improving hox protein classification across the major model organisms

    The family of Hox-proteins has been a major focus of research for over 30 years. Hox-proteins are crucial to the correct development of bilateral organisms; however, some uncertainty remains as to which Hox-proteins are functionally equivalent across different species. Initial classification of Hox-proteins was based on phylogenetic analysis of the 60 amino acid homeodomain. This approach was successful in classifying Hox-proteins with differing homeodomains, but the relationships of Hox-proteins with nearly identical homeodomains, yet distinct biological functions, could not be resolved. Consequently, these 'problematic' proteins were classified into one large unresolved group. Other classifications used the relative location of the Hox-protein coding genes on the chromosome (synteny) to further resolve this group. Although widely used, this synteny-based classification is inconsistent with experimental evidence from functional equivalence studies. These inconsistencies led us to re-examine the Hox-protein family and derive a new classification for it using all Hox-protein sequences available in the GenBank non-redundant protein database (NCBI-nr). We compare the use of the homeodomain, the homeodomain with conserved flanking regions (the YPWM and linker region), and full-length Hox-protein sequences as a basis for classification of Hox-proteins. In contrast to previous attempts, our approach is able to resolve the relationships for the 'problematic' as well as ABD-B-like Hox-proteins. We highlight differences from previous classifications and clarify the relationships of Hox-proteins across the five major model organisms, Caenorhabditis elegans, Drosophila melanogaster, Branchiostoma floridae, Mus musculus and Danio rerio. Comparative and functional analyses of Hox-proteins, two fields crucial to understanding the development of bilateral organisms, have been hampered by difficulties in predicting functionally equivalent Hox-proteins across species. 
Our classification scheme offers a higher-resolution classification that is in accordance with phylogenetic as well as experimental data and thereby provides a novel basis for experiments, such as comparative and functional analyses of Hox-proteins. Funding for this work has been provided by the Australian Research Council, Center for Excellence Grant (CEO348212)

    Bioinformatic analysis of the CLE signaling peptide family

    Background. Plants encode a large number of leucine-rich repeat receptor-like kinases (LRR-RLKs). Legumes encode several LRR-RLKs linked to the process of root nodule formation, the ligands of which are unknown. To identify ligands for these receptors, we used a combination of profile hidden Markov models and position-specific iterative BLAST, allowing us to detect new members of the CLV3/ESR (CLE) protein family from publicly available sequence databases. Results. We identified 114 new members of the CLE protein family from various plant species, as well as five protein sequences containing multiple CLE domains. We were able to cluster the CLE domain proteins into 13 distinct groups based on their pairwise similarities in the primary CLE motif. In addition, we identified secondary motifs that coincide with our sequence clusters. The groupings based on the CLE motifs correlate with known biological functions of CLE signaling peptides and are analogous to groupings based on phylogenetic analysis and ectopic overexpression studies. We tested the biological function of two of the predicted CLE signaling peptides in the legume Medicago truncatula. These peptides inhibit the activity of the root apical and lateral root meristems in a manner consistent with our functional predictions based on other CLE signaling peptides clustering in the same groups. Conclusion. Our analysis identifies and classifies a large number of novel potential CLE signaling peptides. The additional motifs we found could lead to future discovery of recognition sites for processing peptidases as well as predictions for receptor binding specificity
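
The grouping of motifs by pairwise similarity described in this abstract can be illustrated with a minimal sketch. The abstract does not state which similarity measure or clustering procedure was used, so this example substitutes plain fraction-of-identical-positions and single-linkage clustering; the motif strings, the names `identity`, `cluster` and `TOY_MOTIFS`, and the 0.8 threshold are all invented for illustration.

```python
from __future__ import annotations

from itertools import combinations


def identity(a: str, b: str) -> float:
    """Fraction of identical positions between two equal-length motifs."""
    if len(a) != len(b):
        raise ValueError("motifs must have equal length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def cluster(motifs: dict[str, str], threshold: float) -> list[set[str]]:
    """Single-linkage clustering: link every pair whose identity reaches
    the threshold, then return the connected components."""
    parent = {name: name for name in motifs}

    def find(x: str) -> str:
        # Follow parent pointers to the component root (with path halving).
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in combinations(motifs, 2):
        if identity(motifs[a], motifs[b]) >= threshold:
            parent[find(a)] = find(b)

    groups: dict[str, set[str]] = {}
    for name in motifs:
        groups.setdefault(find(name), set()).add(name)
    # Sort for a stable, readable output order.
    return sorted(groups.values(), key=lambda g: sorted(g))


# Invented 12-residue toy motifs; these are NOT sequences from the study.
TOY_MOTIFS = {
    "toy1": "RLVPSGPNPLHN",
    "toy2": "RLVPSGPDPLHN",
    "toy3": "HEVPSGPNPISN",
    "toy4": "HEVPTGPNPISN",
    "toy5": "AAAAWWWWCCCC",
}

clusters = cluster(TOY_MOTIFS, threshold=0.8)
for group in clusters:
    print(sorted(group))
```

Lowering the threshold merges groups (here, 0.6 joins the first four toy motifs), which is why any such grouping is best read together with independent evidence such as the functional tests the abstract mentions.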

    LATERAL BRANCHING OXIDOREDUCTASE acts in the final stages of strigolactone biosynthesis in Arabidopsis

    Strigolactones are a group of plant compounds of diverse but related chemical structures. They have similar bioactivity across a broad range of plant species, act to optimize plant growth and development, and promote soil microbe interactions. Carlactone, a common precursor to strigolactones, is produced by conserved enzymes found in a number of diverse species. Versions of the MORE AXILLARY GROWTH1 (MAX1) cytochrome P450 from rice and Arabidopsis thaliana make specific subsets of strigolactones from carlactone. However, the diversity of natural strigolactones suggests that additional enzymes are involved and remain to be discovered. Here, we use an innovative method that has revealed a missing enzyme involved in strigolactone metabolism. By using a transcriptomics approach involving a range of treatments that modify strigolactone biosynthesis gene expression coupled with reverse genetics, we identified LATERAL BRANCHING OXIDOREDUCTASE (LBO), a gene encoding an oxidoreductase-like enzyme of the 2-oxoglutarate and Fe(II)-dependent dioxygenase superfamily. Arabidopsis lbo mutants exhibited increased shoot branching, but the lbo mutation did not enhance the max mutant phenotype. Grafting indicated that LBO is required for a graft-transmissible signal that, in turn, requires a product of MAX1. Mutant lbo backgrounds showed reduced responses to carlactone, the substrate of MAX1, and methyl carlactonoate (MeCLA), a product downstream of MAX1. Furthermore, lbo mutants contained increased amounts of these compounds, and the LBO protein specifically converts MeCLA to an unidentified strigolactone-like compound. Thus, LBO function may be important in the later steps of strigolactone biosynthesis to inhibit shoot branching in Arabidopsis and other seed plants

    Biologisch relevante Klassifizierung von Proteinsequenzen - ein bioinformatischer Ansatz

    Life without proteins is hardly imaginable. Proteins are essential to most structural components and metabolic processes within cells, and replication of genetic material would not be possible without them. The genome of each organism contains information about all proteins that organism is capable of synthesizing. As proteins are such a central component of life, it is essential to gain a greater understanding of the various proteins and their interaction partners before organisms can be understood at molecular resolution. Experimental characterization of all proteins in all organisms is infeasible due to time and financial constraints. However, it is frequently possible to glean knowledge about a large number of proteins in each new genome by transferring information from close sequence relatives which have been characterized. The idea is that proteins similar at the sequence level will most likely also have retained a similar structure and function. Some of the experimentally determined characteristics of one protein can therefore be transferred to all related proteins, depending on the degree of relatedness. Protein classification deals with determining the degree to which proteins are related and which functional and structural characteristics are conserved. In this work I describe the basics of protein classification: sequence similarity searches, sequence alignment and phylogenetic inference. Various methods are described and their relative advantages and disadvantages are discussed. In addition, the most frequent protein classification problems and ways to circumvent them are presented. PhyloGenie and CLANS represent two different approaches to protein classification. PhyloGenie focuses on the analysis of the set of all trees derived from the proteome of an organism: the phylome. 
To compare the performance of PhyloGenie to alternative methods, we repeated the analysis of two datasets, searching for: 1) the amount of lateral gene transfer between Thermoplasma and Sulfolobus (Ruepp et al. 2000) and 2) genes supporting the hypothesis of an actinopterygian-specific genome duplication (Taylor et al. 2003). Our analysis of the Thermoplasma acidophilum dataset pointed to large numbers of genes having been transferred between Thermoplasma and distantly related archaebacteria of the genus Sulfolobus. Comparison with other methods of detecting lateral gene transfer showed PhyloGenie to provide the best sensitivity-to-specificity ratio of the tested methods. Using PhyloGenie in a comparative genomics analysis of the incomplete Danio rerio genome, we were able to double the number of orthologous genes supporting the actinopterygian-specific genome duplication hypothesis. In contrast to PhyloGenie, which works in a mostly organism-specific manner, CLANS is used to analyze protein families. A protein family is the set of sequences descended from an ancestral protein, some of which may have changed greatly over time. Larger families may contain orthologous and paralogous subgroups and encompass many thousands of sequences, rendering phylogenetic approaches computationally prohibitive and difficult to analyze. CLANS relies on graphical representation of all pairwise sequence similarities. This permits analysis of much larger datasets and is less sensitive to many of the problems traditional phylogenetic methods face. Application of CLANS to the group of AAA-ATPases enabled us to describe this family in an objective manner for the first time. Previous analyses differed in the number and types of sequences used, so that enumeration and classification of all AAA-ATPases in the NCBI non-redundant protein database was a primary goal. 
The results generated were biologically plausible, and surprising insights, such as the apparent homology of N-domains of distantly related AAA-ATPases, could be corroborated by additional tests. Due to its ability to rapidly analyze large numbers of unaligned sequences, CLANS became the basis for a number of further analyses. Published examples include a description of the TAA43 protein (Santos et al. 2004), the Wipi-1-alpha beta-propeller (Proikas-Cezanne et al. 2004) as well as a correction of the structure of the AbrB transcription factor (Coles et al. in press). Life would be unimaginable without proteins. Most structural components of life consist of proteins, most metabolic reactions are facilitated by proteins, and even replication of the genetic material would not take place without them. The genome contains, in encoded form, information about all proteins an organism can produce. Understanding organisms at the molecular level requires a detailed understanding of the various metabolic and regulatory proteins and their interaction partners. However, experimental characterization of all proteins in all organisms is not feasible in terms of either time or cost. To nevertheless characterize the majority of an organism's proteins, one exploits the fact that related proteins usually have similar structure and function. Characteristics that have been determined can thus be transferred to related proteins. Protein classification is concerned with determining the degree of relatedness as well as the functional and structural commonalities of different proteins. In this work I briefly cover the basics of protein classification: sequence similarity searches, sequence alignment and phylogenetic tree construction. 
The methods, along with their advantages and disadvantages, are briefly described, and approaches to solving the most frequent errors and problems are presented. The work presented describes two different approaches to protein classification, PhyloGenie and CLANS. PhyloGenie deals with the generation and analysis of phylomes, the set of all gene trees for the proteome of a given organism. To assess how well PhyloGenie performs relative to alternative methods, we re-examined two datasets: a) the amount of lateral gene transfer between Thermoplasma and Sulfolobus (Ruepp et al. 2000) and b) the search for genes supporting the actinopterygian-specific genome duplication (Taylor et al. 2003). Our analysis of the Thermoplasma acidophilum phylome points to repeated exchanges of larger regions of genetic material with distantly related archaebacteria of the genus Sulfolobus. A comparison with other approaches to detecting lateral gene transfer shows that PhyloGenie achieves the most favorable ratio of sensitivity to specificity of all methods examined. A comparative genomic analysis of the incomplete Danio rerio genome shows a further application of phylome-based analysis methods. By applying PhyloGenie to the question of the actinopterygian-specific genome duplication, the number of groups of orthologous genes supporting this theory could be doubled. In contrast to PhyloGenie, which works in an organism-specific manner, CLANS addresses the analysis of entire protein families. A protein family comprises all copies descended from an ancestral protein, some of which may have changed considerably over time. Larger families can contain paralogous and orthologous subgroups and often comprise several thousand proteins, making phylogenetic tree analyses enormously time-consuming and hard to survey. 
The CLANS approach is based on graphical representation of all pairwise sequence similarities. This allows analysis of considerably larger amounts of data and is insensitive to many problems of traditional tree construction. Applying CLANS to the group of AAA-ATPases made an objective description of this family possible for the first time. Existing classifications of this family differ, in some cases considerably, in the number of sequences included, so a main aspect of this work is the enumeration of all AAA-ATPases in the non-redundant NCBI protein database and the description of the relationships among the individual AAA subfamilies. The results of the AAA analysis are biologically plausible, and surprising predictions, for example the homology of some N-domains of distantly related AAA-ATPases, were verified by additional investigations. The ability to examine large numbers of unaligned sequences with CLANS has made it the basis for many further analyses. Published examples include the analysis of the TAA43 protein (Santos et al. 2004), a description of the Wipi-1-alpha beta-propeller protein (Proikas-Cezanne et al. 2004) and a correction of the structure of the AbrB transcription factor (Coles et al. in press)

    Biologically meaningful classification of protein sequences - a bioinformatic approach


    Analyzing microarray data using CLANS

    Summary: Analysis of microarray experiments is complicated by the huge amount of data involved. Searching for groups of co-expressed genes is akin to searching for protein families in a database as, in both cases, small subsets of genes with similar features are to be found within vast quantities of data. CLANS was originally developed to find protein families in large sets of amino acid sequences where the amount of data involved made phylogenetic approaches overly cumbersome. We present a number of improvements that greatly extend the previous version of CLANS and show its application to microarray data as well as its ability to incorporate additional information to facilitate interactive analysis

    Phylogenetic analysis of the triterpene cyclase protein family in prokaryotes and eukaryotes suggests bidirectional lateral gene transfer

    Functional constraints on modifications in triterpene cyclase amino acid sequences make them good candidates for evolutionary studies on the phylogenetic relatedness of these enzymes in prokaryotes as well as in eukaryotes. In this study, we used a set of identified triterpene cyclases, a group of mainly bacterial squalene cyclases and a group of predominantly eukaryotic oxidosqualene cyclases, as seed sequences to identify 5288 putative triterpene cyclase homologues in publicly available databases. The Cluster Analysis of Sequences (CLANS) software was used to detect groups of sequences with increased pairwise sequence similarity. The sequences fall into two main clusters, one bacterial and one eukaryotic. The conserved, informative regions of a multiple sequence alignment of the family were used to construct a neighbour-joining phylogenetic tree using the AsaturA software and a maximum likelihood phylogenetic tree using the PhyML software. Both analyses showed that most of the triterpene cyclase sequences grouped in accordance with the accepted taxonomic relationships of the organisms the sequences originated from, supporting the idea of vertical transfer of cyclase genes from parent to offspring as the main evolutionary driving force in this protein family. However, a small group of sequences from three bacterial species (Stigmatella, Gemmata and Methylococcus) grouped with an otherwise purely eukaryotic cluster of oxidosqualene cyclases, while a small group of sequences from seven fungal species and a sequence from the fern Adiantum grouped consistently with a cluster of otherwise purely bacterial squalene cyclases. This suggests that lateral gene transfer may have taken place, entailing a transfer of oxidosqualene cyclases from eukaryotes to bacteria and a transfer of squalene cyclase from bacteria to an ancestor of the group of Pezizomycotina fungi
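
The reasoning in this abstract, that a few sequences sitting in a cluster dominated by another domain of life are candidates for lateral gene transfer, can be sketched as a simple screening heuristic. This is not the authors' procedure (they relied on CLANS clustering and phylogenetic trees); the function name `flag_minority_members`, the `max_fraction` cutoff and all sequence identifiers below are hypothetical.

```python
from collections import Counter


def flag_minority_members(clusters, domain_of, max_fraction=0.25):
    """Return (sequence, domain, cluster, majority_domain) tuples for members
    whose domain differs from the cluster majority and is rare in that cluster.

    Flagged sequences are only candidates; a tree-based check is still needed.
    """
    flagged = []
    for cluster_id, members in clusters.items():
        counts = Counter(domain_of[m] for m in members)
        majority, _ = counts.most_common(1)[0]
        for member in members:
            domain = domain_of[member]
            rare = counts[domain] / len(members) <= max_fraction
            if domain != majority and rare:
                flagged.append((member, domain, cluster_id, majority))
    return flagged


# Hypothetical toy data: two clusters, each with one out-of-place sequence.
toy_clusters = {
    "cluster_A": ["euk1", "euk2", "euk3", "euk4", "bac_x"],
    "cluster_B": ["bac1", "bac2", "bac3", "bac4", "euk_y"],
}
toy_domains = {
    name: ("bacteria" if name.startswith("bac") else "eukaryote")
    for members in toy_clusters.values()
    for name in members
}

candidates = flag_minority_members(toy_clusters, toy_domains)
for row in candidates:
    print(row)
```

A screen like this only narrows the search; contamination, misannotation or uneven sampling can produce the same pattern, which is why consistent placement in independently built trees matters.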