368 research outputs found

    BBH-LS: An algorithm for computing positional homologs using sequence and gene context similarity

    doi:10.1186/1752-0509-6-S1-S22, BMC Systems Biology 6 (Suppl. 1)

    Conserved gene cluster discovery and applications in comparative genomics

    Ph.D., Doctor of Philosophy

    Positional orthology: putting genomic evolutionary relationships into context

    Orthology is a powerful refinement of homology that allows us to describe more precisely the evolution of genomes and understand the function of the genes they contain. However, because orthology is not concerned with genomic position, it is limited in its ability to describe genes that are likely to have equivalent roles in different genomes. Because of this limitation, the concept of ‘positional orthology’ has emerged, which describes the relation between orthologous genes that retain their ancestral genomic positions. In this review, we formally define this concept, for which we introduce the shorter term ‘toporthology’, with respect to the evolutionary events experienced by a gene’s ancestors. Through a discussion of recent studies on the role of genomic context in gene evolution, we show that the distinction between orthology and toporthology is biologically significant. We then review a number of orthology prediction methods that take genomic context into account and thus that may be used to infer the important relation of toporthology

    The Extinction Dynamics of Bacterial Pseudogenes

    Pseudogenes are usually considered to be completely neutral sequences whose evolution is shaped by random mutations and chance events. It is possible, however, for disrupted genes to generate products that are deleterious due either to the energetic costs of their transcription and translation or to the formation of toxic proteins. We found that after their initial formation, the youngest pseudogenes in Salmonella genomes have a very high likelihood of being removed by deletional processes and are eliminated too rapidly to be governed by a strictly neutral model of stochastic loss. Those few highly degraded pseudogenes that have persisted in Salmonella genomes correspond to genes with low expression levels and low connectivity in gene networks, such that their inactivation and any initial deleterious effects associated with their inactivation are buffered. Although pseudogenes have long been considered the paradigm of neutral evolution, the distribution of pseudogenes among Salmonella strains indicates that removal of many of these apparently functionless regions is attributable to positive selection

    Phylogenetic diversification of glycogen synthase kinase 3/SHAGGY-like kinase genes in plants

    BACKGROUND: The glycogen synthase kinase 3 (GSK3)/SHAGGY-like kinases (GSKs) are non-receptor serine/threonine protein kinases that are involved in a variety of biological processes. In contrast to the two members of the GSK3 family in mammals, plants appear to have a much larger set of divergent GSK genes. Plant GSKs are encoded by a multigene family; analysis of the Arabidopsis genome revealed the existence of 10 GSK genes that fall into four major groups. Here we characterized the structure of Arabidopsis and rice GSK genes and conducted the first broad phylogenetic analysis of the plant GSK gene family, covering a taxonomically diverse array of algal and land plant sequences. RESULTS: We found that the structure of GSK genes is generally conserved in Arabidopsis and rice, although we documented examples of exon expansion and intron loss. Our phylogenetic analyses of 139 sequences revealed four major clades of GSK genes that correspond to the four subgroups initially recognized in Arabidopsis. ESTs from basal angiosperms were represented in all four major clades; GSK homologs from the basal angiosperm Persea americana (avocado) appeared in all four clades. Gymnosperm sequences occurred in clades I, III, and IV, and a sequence of the red alga Porphyra was sister to all green plant sequences. CONCLUSION: Our results indicate that (1) the plant-specific GSK gene lineage was established early in the history of green plants, (2) plant GSKs began to diversify prior to the origin of extant seed plants, (3) three of the four major clades of GSKs present in Arabidopsis and rice were established early in the evolutionary history of extant seed plants, and (4) diversification into four major clades (as initially reported in Arabidopsis) occurred either just prior to the origin of the angiosperms or very early in angiosperm history

    Investigating Reciprocal Control of Adherence and Motility through the Lens of PapX, a Non-structural Fimbrial Repressor of Flagellar Synthesis.

    Most uncomplicated urinary tract infections (UTIs) are caused by uropathogenic Escherichia coli (UPEC). Both motility and adherence are integral to UTI pathogenesis, yet they represent opposing forces. Therefore it is logical to reciprocally regulate these functions. PapX, a non-structural protein encoded by the pheV- but not pheU-associated pap operon encoding the P fimbria adherence factor of E. coli CFT073, represses flagella-mediated motility and belongs to a highly conserved family of winged-helix transcription factors. Thus, when P fimbriae are synthesized for adherence, synthesis of flagella is repressed. The mechanism of this repression, however, is not understood. papX is found preferentially in more virulent UPEC isolates, being significantly more prevalent in pyelonephritis strains (53% of isolates) than in asymptomatic bacteriuria (32%) or fecal/commensal (12.5%) strains. To examine PapX structure-function, we generated papX linker-insertion and site-directed mutants, which identified two key residues for PapX function (Lys54 and Arg127) within domains predicted by modeling with I-TASSER software to be important for dimerization and DNA binding, respectively. SELEX in conjunction with high-throughput sequencing was utilized for the first time to determine the unique binding site for the bacterial transcription factor PapX in E. coli CFT073. It was necessary to write and implement novel software for the analysis of the results from this technique. The software, TFAST, is freely available (Appendix C) and has near-perfect agreement (k = 0.89) to a gold standard in peak-finding software, MACS. Analysis of TFAST indicates that it correctly stratifies data to generate meaningful results, and successfully identified a 29 bp binding site within the flhDC promoter (TTACGGTGAGTTATTTTAACTGTGCGCAA), centered 410 bp upstream of the flhD translational start site. 
PapX bound the flhD promoter in gel shift experiments, which was reversible with the 29 bp sequence, indicating that PapX binds directly to this site to repress transcription of flagellar genes. Microarray, qPCR and promoter fusions indicate that PapX is not transcriptionally regulated itself. Co-precipitation studies indicate that PapX likely requires at least one cofactor for its repressive activity, and OmpA was identified as a promising candidate.
    PhD, Microbiology and Immunology, University of Michigan, Horace H. Rackham School of Graduate Studies. http://deepblue.lib.umich.edu/bitstream/2027.42/111505/1/djreiss_1.pd

    Collective Dynamics Differentiates Functional Divergence in Protein Evolution

    Protein evolution is most commonly studied by analyzing related protein sequences and generating ancestral sequences through Bayesian and Maximum Likelihood methods, and/or by resurrecting ancestral proteins in the lab and performing ligand binding studies to determine function. Structural and dynamic evolution have largely been left out of molecular evolution studies. Here we incorporate both structure and dynamics to elucidate the molecular principles behind the divergence in the evolutionary path of the steroid receptor proteins. We determine the likely structure of three evolutionarily diverged ancestral steroid receptor proteins using the Zipping and Assembly Method with FRODA (ZAMF). Our predictions are within ∼2.7 Å all-atom RMSD of the respective crystal structures of the ancestral steroid receptors. Beyond static structure prediction, a particular feature of ZAMF is that it generates protein dynamics information. We investigate the differences in conformational dynamics of diverged proteins by obtaining the most collective motion through essential dynamics. Strikingly, our analysis shows that evolutionarily diverged proteins of the same family do not share the same dynamic subspace, while those sharing the same function are simultaneously clustered together and distant from those that have functionally diverged. Dynamic analysis also enables those mutations that most affect dynamics to be identified. It correctly predicts all mutations (functional and permissive) necessary to evolve new function and ∼60% of permissive mutations necessary to recover ancestral function.

    Comparative genomics of early animal evolution

    The explosion of genomics permits investigations into the origin and early evolution of the Metazoa at the molecular level. In this thesis, I am particularly interested in investigating the molecular foundation of the animal senses (i.e. how animals perceive their world). To understand the directionality of evolutionary innovation a well-developed phylogenetic framework is necessary. On one hand, the combination of molecular and morphological data sets has revolutionized our views of metazoan relationships over the past decades, but on the other hand, a number of nodes on the metazoan tree remain uncertain. Uncertainty is particularly high with reference to the taxa generally named “early branching metazoans”. Unfortunately, understanding the relationships among these taxa is key to understanding the evolution of sensory perception (Nielsen 2008). In this thesis I will investigate both animal phylogenetics (to attempt to resolve the phylogeny among the early branching Metazoa) and the evolution of the metazoan sensory receptors. The G-protein coupled receptor (GPCR) superfamily is the main family of metazoan surface receptors. In this thesis, after an initial introduction (Chapter 1), I address and substantially clarify the relationship among the early branching animals (Chapter 2) using novel genomic data and publicly available expressed sequence tags (ESTs). I then move forward (Chapter 3) to use network-based methods to study the early evolution of the GPCR superfamily in Eukaryotes and animals. Finally (Chapter 4), I focus on the study of a specific subset of GPCRs (the α-group, Rhodopsin-like receptors). This GPCR group is particularly interesting as it includes the best studied and, arguably, one of the most interesting among the GPCR families: the Opsin family. Opsins are key proteins used in the process of light detection, and the origin and early evolution of this family are still substantially unknown. 
Chapter 4 addresses both these problems. The thesis is then concluded by a general discussion (Chapter 5) and a future directions (Chapter 6) section. Overall, this thesis provides new insights into the origin and early evolution of the Metazoa and their senses

    Improving Comparative Genomic Studies: Definitions and Algorithms for Syntenic Blocks

    Comparative genomics aims to understand the structure of genomes and the function of various genomic fragments by transferring knowledge gained from well-studied genomes to the new object of study. Rapid and inexpensive high-throughput sequencing is making available more and more complete genome sequences. Despite this significant scientific advance, we still lack good models for the evolution of the genomic architecture; analyzing these genomes therefore presents formidable challenges. Early approaches used pairwise comparisons, but today researchers are attempting to leverage the larger potential of multiway comparisons. Current approaches are based on the identification of so-called syntenic blocks: blocks of sequence that exhibit conserved features across the genomes under study. Syntenic blocks are in many ways analogous to genes: in many cases, the markers used to construct them are genes. Like genes, they can exist in multiple copies, in which case we could define analogs of orthology and paralogy. However, whereas genes are studied at the sequence level, syntenic blocks are too large for that level of detail; it is their structure and function as a unit that makes them valuable for genome-level comparative studies. Syntenic blocks are required for complex computations to scale to the billions of nucleotides present in many genomes; they enable comparisons across broad ranges of genomes because they filter out much of the individual variability; they highlight candidate regions for in-depth studies; and they facilitate whole-genome comparisons through visualization tools. The identification of such blocks is the first step in comparative studies, yet its effect on final results has not been well studied, nor has any formalization of syntenic blocks been proposed. Tools for the identification of syntenic blocks yield quite different results, thereby preventing a systematic assessment of the next steps in an analysis. 
Current tools do not include measurable quality objectives and thus cannot be benchmarked against themselves. Comparisons among tools have also been neglected; what few results are given use superficial measures unrelated to quality or consistency. In this thesis we address two major challenges, and present: (i) a theoretical model as well as an experimental basis for comparing syntenic blocks and thus also for improving the design of tools for the identification of syntenic blocks; (ii) a prototype model that serves as a basis for implementing effective synteny mining tools. We offer an overview of the milestones in the literature on the development of concepts and tools related to synteny; we illustrate the application of the model and the measures by applying them to syntenic blocks produced by different contemporary tools on publicly available data sets. We have taken the first step towards a formal approach to the construction of syntenic blocks by developing a simple quality criterion based on sound evolutionary principles. Our experiments demonstrate widely divergent results among these tools, throwing into question the robustness of the basic approach in comparative genomics. Our findings highlight the need for a well-founded, systematic approach to the decomposition of genomes into syntenic blocks and motivate the second part of the work: starting from the proposed model, we extend the concept with data-dependent features and constraints, in order to test the concept on cases of interest.