
    Evolution at the Subgene Level: Domain Rearrangements in the Drosophila Phylogeny

    Supplementary sections 1–13, tables S1–S10, and figures S1–S9 are available at Molecular Biology and Evolution online (http://www.mbe.oxfordjournals.org/).
    Although the possibility of gene evolution by domain rearrangements has long been appreciated, current methods for reconstructing and systematically analyzing gene family evolution are limited to events such as duplication, loss, and, sometimes, horizontal transfer. However, within the Drosophila clade, we find that domain rearrangements occur in 35.9% of gene families; thus, any comprehensive study of gene evolution in these species will need to account for such events. Here, we present a new computational model and algorithm for reconstructing gene evolution at the domain level. We develop a method for detecting homologous domains between genes and present a phylogenetic algorithm for reconstructing maximum parsimony evolutionary histories that include domain generation, duplication, loss, merge (fusion), and split (fission) events. Using this method, we find that genes involved in fusion and fission are enriched in signaling and development, suggesting that domain rearrangements and reuse may be crucial in these processes. We also find that fusion is more abundant than fission, and that fusion and fission events occur predominantly alongside duplication, with 92.5% and 34.3% of fusion and fission events, respectively, retaining ancestral architectures in the duplicated copies. We provide a catalog of ∼9,000 genes that undergo domain rearrangement across nine sequenced species, along with possible mechanisms for their formation. These results dramatically expand our understanding of evolution at the subgene level and offer several insights into how new genes and functions arise between species.
    Funding: National Science Foundation (U.S.) (Graduate Research Fellowship); National Science Foundation (U.S.) (CAREER award NSF 0644282).
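    The abstract describes a maximum parsimony reconstruction over domain generation, duplication, loss, merge, and split events. That model is far richer than what is shown here; purely as an illustration of the parsimony principle, the sketch below applies Fitch's small-parsimony algorithm to a single binary character (a domain present, "1", or absent, "0") on a fixed rooted binary tree. The function name `fitch`, the nested-tuple tree encoding, and the leaf names and states in the example are hypothetical, and this is not the authors' algorithm.

```python
from typing import Dict, Set, Tuple, Union

# A tree is either a leaf name (str) or a pair of subtrees (rooted, binary).
Tree = Union[str, Tuple["Tree", "Tree"]]


def fitch(tree: Tree, states: Dict[str, str]) -> Tuple[Set[str], int]:
    """Fitch small-parsimony pass (hypothetical helper).

    Returns the candidate state set at this node and the minimum number
    of state changes (domain gains or losses) needed in its subtree.
    """
    if isinstance(tree, str):
        # Leaf: its observed state is the only candidate, at zero cost.
        return {states[tree]}, 0
    left, right = tree
    left_set, left_cost = fitch(left, states)
    right_set, right_cost = fitch(right, states)
    common = left_set & right_set
    if common:
        # Children can agree without an extra change.
        return common, left_cost + right_cost
    # No shared state: one extra change is needed on one of the two branches.
    return left_set | right_set, left_cost + right_cost + 1


# Hypothetical example: domain present in A, B, C and absent in D.
print(fitch((("A", "B"), ("C", "D")), {"A": "1", "B": "1", "C": "1", "D": "0"}))
# prints ({'1'}, 1): the root most parsimoniously had the domain, with one loss.
```

    Fitch's algorithm only counts changes for one independent character; handling fusion and fission requires events that link several domains at once, which is why the paper needed a dedicated model.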

    Surprising complexity of the ancestral apoptosis network

    A comparative genomics approach revealed that the genes for several components of the apoptosis network with single copies in vertebrates have multiple paralogs in cnidarian-bilaterian ancestors, suggesting a complex evolutionary history for this network.

    Metabolic Evolution of a Deep-Branching Hyperthermophilic Chemoautotrophic Bacterium

    Aquifex aeolicus is a deep-branching hyperthermophilic chemoautotrophic bacterium restricted to hydrothermal vents and hot springs. These characteristics make it an excellent model system for studying the early evolution of metabolism. Here we present the whole-genome metabolic network of this organism and examine in detail the driving forces that have shaped it. We make extensive use of phylometabolic analysis, a method we recently introduced that generates trees of metabolic phenotypes by integrating phylogenetic and metabolic constraints. We reconstruct the evolution of a range of metabolic sub-systems, including the reductive citric acid (rTCA) cycle, as well as the biosynthesis and functional roles of several amino acids and cofactors. We show that A. aeolicus uses the reconstructed ancestral pathways within many of these sub-systems, and highlight how the evolutionary interconnections between sub-systems facilitated several key innovations. Our analyses further highlight three general classes of driving forces in metabolic evolution. The first is the duplication and divergence of genes for enzymes as these progress from lower to higher substrate specificity, improving the kinetics of certain sub-systems. The second is the kinetic optimization of established pathways through fusion of enzymes, or their organization into larger complexes. The third is the minimization of the ATP unit cost to synthesize biomass, improving thermodynamic efficiency. Quantifying the distribution of these classes of innovations across metabolic sub-systems and across the tree of life will allow us to assess how a tradeoff between maximizing growth rate and growth efficiency has shaped the long-term metabolic evolution of the biosphere.
    Comment: 25 pages, 5 figures, 5 tables, 2 supplementary files

    Genome Trees from Conservation Profiles

    The concept of the genome tree depends on the potential evolutionary significance of clustering species according to similarities in the gene content of their genomes. In this respect, genome trees have often been identified with species trees. With the rapid expansion of genome sequence data, it becomes increasingly important to develop accurate methods for capturing global trends in the phylogenetic signals that mutually link the various genomes. We therefore derive here the methodological concept of genome trees based on protein conservation profiles in multiple species. The basic idea in this derivation is that multi-component "presence-absence" protein conservation profiles permit tracking of the common evolutionary histories of genes across multiple genomes. We show that a significant reduction in informational redundancy is achieved by considering only the subset of distinct conservation profiles. Beyond these basic ideas, we point out various pitfalls and limitations associated with data handling, paving the way for further improvements. As an illustration of the methods, we analyze a genome tree based on the above principles, along with a series of other trees derived from the same data and based on pairwise comparisons (ancestral duplication-conservation and shared orthologs). In all trees we observe a sharp discrimination between the three primary domains of life: Bacteria, Archaea, and Eukarya. The new genome tree, based on conservation profiles, displays a significant correspondence with classically recognized taxonomic groupings, along with a series of departures from such conventional clusterings.
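    A minimal sketch of the profile idea, assuming each protein family is summarized as a 0/1 vector over the genomes: repeated profiles are collapsed to the set of distinct profiles, and each pair of genomes is scored by the fraction of distinct profiles in which they disagree. The normalized Hamming distance and the helper names `distinct_profiles` and `genome_distances` are assumptions for illustration; the abstract does not specify the distance used, and the resulting matrix would still need a tree-building step (for example, neighbor joining) to yield a genome tree.

```python
from itertools import combinations
from typing import Dict, List, Tuple

# One row per protein family; column g is 1 if the family is conserved in genome g.
Profile = Tuple[int, ...]


def distinct_profiles(profiles: List[Profile]) -> List[Profile]:
    """Remove redundancy: keep each distinct presence-absence pattern once."""
    return sorted(set(profiles))


def genome_distances(
    genomes: List[str], profiles: List[Profile]
) -> Dict[Tuple[str, str], float]:
    """Hypothetical distance: fraction of distinct profiles where two genomes differ."""
    rows = distinct_profiles(profiles)
    dist: Dict[Tuple[str, str], float] = {}
    for i, j in combinations(range(len(genomes)), 2):
        mismatches = sum(row[i] != row[j] for row in rows)
        dist[(genomes[i], genomes[j])] = mismatches / len(rows)
    return dist


# Hypothetical data: five families over three genomes, only three distinct profiles.
families = [(1, 1, 1), (1, 1, 0), (1, 1, 0), (0, 0, 1), (1, 1, 1)]
d = genome_distances(["g1", "g2", "g3"], families)
print(d[("g1", "g2")])  # 0.0: g1 and g2 share every distinct profile
print(d[("g1", "g3")])  # 0.666...: they disagree on 2 of 3 distinct profiles
```

    Collapsing identical rows means a pattern shared by many families counts once, which is one simple way to read the "reduction in informational redundancy" mentioned above.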

    Ancestral sequence reconstruction as an accessible tool for the engineering of biocatalyst stability

    Synthetic biology is the engineering of life to imbue non-natural functionality. As such, synthetic biology has considerable commercial potential, where synthetic metabolic pathways are utilised to convert low-value substrates into high-value products. High-temperature biocatalysis offers several system-level benefits to synthetic biology, including increased dilution of substrate, increased reaction rates and decreased contamination risk. However, the tools currently available for engineering thermostable proteins are expensive, unreliable, or poorly understood, which makes their adoption into synthetic biology workflows fraught. This thesis focuses on the development of an accessible tool for engineering protein thermostability, based on the evolutionary biology tool ancestral sequence reconstruction (ASR). ASR allows researchers to walk back in time along the branches of a phylogeny and predict the most likely ancestral sequence of a protein family. It also has simple input requirements, and its output proteins are often observed to be thermostable, making ASR tractable for protein engineering. Chapter 2 explores the applicability of multiple ASR methods to the engineering of a carboxylic acid reductase (CAR) biocatalyst. Although the family emerged only 500 million years ago, its ancestors showed considerable improvements in thermostability over their modern counterparts. We proceed to thoroughly characterise the ancestral enzymes for their inclusion in the CAR biocatalytic toolbox. Chapter 3 explores why ASR-derived proteins may be thermostable despite a mesophilic history. We build an in silico toolbox for tracking models of protein stability over simulated evolutionary time at the sequence, protein and population levels. We provide considerable evidence that the sequence alignments of simulated protein families that evolved at marginal stability are saturated with stabilising residues. ASR therefore derives sequences from a dataset biased toward stabilisation. Importantly, although ASR is accessible, it still has a steep learning curve because it requires phylogenetic expertise. In Chapter 4, we use the evolutionary model produced in Chapter 3 to develop a highly simplified and accessible ASR protocol. This protocol was then applied to engineer CAR enzymes that displayed dramatic increases in thermostability compared with both modern CARs and the thermostable AncCARs presented in Chapter 2.
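    To make "predict the most likely ancestral state" concrete, here is a minimal sketch of marginal maximum-likelihood reconstruction at the root for one alignment column, using Felsenstein's pruning algorithm under a K-state Jukes-Cantor-style model with uniform equilibrium frequencies. The model, the list-of-(subtree, branch length) tree encoding, the helper names, and the example data are all assumptions for illustration; this is not the simplified protocol developed in the thesis, and practical ASR packages (for example PAML or IQ-TREE) use empirical amino-acid substitution models, estimated trees and branch lengths, and gap handling.

```python
import math
from typing import Dict, List, Tuple, Union

# A tree is a leaf name (str) or a list of (subtree, branch_length) pairs.
Tree = Union[str, List[Tuple["Tree", float]]]


def transition_prob(i: int, j: int, t: float, k: int) -> float:
    """K-state Jukes-Cantor-style P(state j after branch length t | state i).

    t is in expected substitutions per site; assumes k >= 2.
    """
    decay = math.exp(-k * t / (k - 1))
    return 1 / k + (k - 1) / k * decay if i == j else 1 / k - decay / k


def partial_likelihoods(tree: Tree, leaf_states: Dict[str, str], alphabet: str) -> List[float]:
    """Felsenstein pruning: L[s] = P(observed leaves below node | node in state s)."""
    k = len(alphabet)
    if isinstance(tree, str):
        # Observed leaf: likelihood 1 for its state, 0 otherwise.
        return [1.0 if a == leaf_states[tree] else 0.0 for a in alphabet]
    result = [1.0] * k
    for child, length in tree:
        child_l = partial_likelihoods(child, leaf_states, alphabet)
        for s in range(k):
            # Sum over the unknown child state c.
            result[s] *= sum(transition_prob(s, c, length, k) * child_l[c] for c in range(k))
    return result


def root_posterior(tree: Tree, leaf_states: Dict[str, str], alphabet: str) -> Dict[str, float]:
    """Marginal posterior of each root state (uniform prior cancels in the ratio)."""
    lik = partial_likelihoods(tree, leaf_states, alphabet)
    total = sum(lik)
    return {a: value / total for a, value in zip(alphabet, lik)}


# Hypothetical column on a three-leaf tree (nucleotides for brevity; proteins
# would use the 20 amino-acid letters as the alphabet).
post = root_posterior([("t1", 0.1), ("t2", 0.1), ("t3", 0.3)],
                      {"t1": "A", "t2": "A", "t3": "G"}, "ACGT")
print(max(post, key=post.get))  # A
```

    Repeating this per column and taking the most probable state at each site gives a "most likely" ancestral sequence; because it favours states common across the alignment, it also illustrates how a bias in the input alignment, such as the stabilising residues discussed above, carries through to the reconstruction.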

    Characterisation of Enzyme Evolution through Ancestral Enzyme Reconstruction

    Through ancestral sequence reconstruction (ASR) techniques, ancient enzymes can be recreated and biochemically tested, giving insight into the enzymes' evolutionary history. A previous study by Hobbs et al. (2012) showed that some ancestral 3-isopropylmalate dehydrogenase (IPMDH) enzymes of the Bacillus lineage are more catalytically efficient and kinetically stable than their extant counterparts. This raises the question of why ancestral Bacillus IPMDH enzymes have been superseded by catalytically slower and less kinetically stable counterparts. The homology between IPMDH and the dehydrogenases of tartrate, malate and isocitrate makes IPMDH an interesting model enzyme for the evolution of substrate specificity. Here, the reconstruction of a 2.7-billion-year-old enzyme was attempted, extending the reconstruction of IPMDH back to the last common ancestor of the Firmicutes. This reconstruction tested the limits of ASR techniques in terms of time and sequence divergence, especially for such a structurally complex enzyme. However, upon expression and purification, the enzyme was found to form an inactive, soluble aggregate. This suggests that current ASR techniques are too simplistic to reconstruct the complexity and divergence of IPMDH as far back as the last common ancestor of the Firmicutes. Enzyme evolution was also investigated using ancestors from the genus Bacillus, comparing the substrate promiscuity of ancestral enzymes with that of a contemporary counterpart. It was concluded that the ancestral IPMDH enzymes tested do not show additional substrate promiscuity compared with contemporary counterparts. The fitness of organisms carrying the IPMDH ancestors was assessed to establish what effects the high turnover rates and kinetic stability of some ancestral IPMDH enzymes had on cells when the enzymes functioned within the normal leucine biosynthesis pathway. In vivo, the fastest and most kinetically stable ancestral IPMDH resulted in slower growth rates. This detrimental effect in vivo clarifies why this enzyme has been lost over evolutionary time. The X-ray crystal structure of the most recent IPMDH ancestor was also determined at 2.6 Å resolution. The structure of this ancestral IPMDH was found to be similar to other IPMDH structures, including the previously solved IPMDH from the last common ancestor of the genus Bacillus.

    Informational Gene Phylogenies Do Not Support a Fourth Domain of Life for Nucleocytoplasmic Large DNA Viruses

    Mimivirus is a nucleocytoplasmic large DNA virus (NCLDV) with a genome size (1.2 Mb) and coding capacity (~1,000 genes) comparable to those of some cellular organisms. Unlike other viruses, Mimivirus and its NCLDV relatives encode homologs of broadly conserved informational genes found in Bacteria, Archaea, and Eukaryotes, raising the possibility that they could be placed on the tree of life. A recent phylogenetic analysis of these genes showed the NCLDVs emerging as a monophyletic group branching between Eukaryotes and Archaea. These trees were interpreted as evidence for an independent "fourth domain" of life that may have contributed DNA processing genes to the ancestral eukaryote. However, the analysis of ancient evolutionary events is challenging, and tree reconstruction is susceptible to bias resulting from non-phylogenetic signals in the data. These include compositional heterogeneity and homoplasy, which can lead to the spurious grouping of compositionally similar or fast-evolving sequences. Here, we show that these informational gene alignments contain both significant compositional heterogeneity and homoplasy, which were not adequately modelled in the original analysis. When we use more realistic evolutionary models that better fit the data, the resulting trees are unable to reject a simple null hypothesis in which these informational genes, like many other NCLDV genes, were acquired by horizontal transfer from eukaryotic hosts. Our results suggest that a fourth domain is not required to explain the available sequence data.
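    One common, simple screen for compositional heterogeneity (not necessarily the test used by the authors) is a Pearson chi-square test of whether residue frequencies are homogeneous across the taxa in an alignment. The sketch below, assuming an alignment given as a taxon-to-sequence mapping, computes the statistic and its degrees of freedom; the helper name `composition_chi2` and the gap characters are hypothetical. The test treats sites as independent and ignores shared ancestry, so it is a rough heuristic; a p-value can be read from the chi-square distribution, for example with `scipy.stats.chi2.sf(stat, dof)`.

```python
from collections import Counter
from typing import Dict, Tuple


def composition_chi2(alignment: Dict[str, str], gap_chars: str = "-?X") -> Tuple[float, int]:
    """Pearson chi-square statistic for homogeneity of residue composition across taxa."""
    # Residue counts per taxon, ignoring gaps and ambiguous characters;
    # taxa with no residues at all are dropped to avoid zero expected counts.
    counts = {taxon: Counter(ch for ch in seq if ch not in gap_chars)
              for taxon, seq in alignment.items()}
    counts = {taxon: c for taxon, c in counts.items() if c}
    residues = sorted(set().union(*counts.values()))
    taxon_totals = {t: sum(c.values()) for t, c in counts.items()}
    residue_totals = {r: sum(c[r] for c in counts.values()) for r in residues}
    grand = sum(taxon_totals.values())
    stat = 0.0
    for t, c in counts.items():
        for r in residues:
            # Expected count if every taxon shared the pooled composition.
            expected = taxon_totals[t] * residue_totals[r] / grand
            stat += (c[r] - expected) ** 2 / expected
    dof = (len(counts) - 1) * (len(residues) - 1)
    return stat, dof


# Hypothetical two-taxon alignments: identical vs. completely different composition.
print(composition_chi2({"t1": "AAGG", "t2": "AGAG"}))  # (0.0, 1)
print(composition_chi2({"t1": "AAAA", "t2": "GGGG"}))  # (8.0, 1)
```

    A large statistic relative to its degrees of freedom flags taxa whose composition departs from the pooled average, which is the kind of signal that can pull compositionally similar sequences together in a tree.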