2,328 research outputs found
The inference of gene trees with species trees
Molecular phylogeny has focused mainly on improving models for the
reconstruction of gene trees based on sequence alignments. Yet, most
phylogeneticists seek to reveal the history of species. Although the histories
of genes and species are tightly linked, they are seldom identical, because
genes duplicate, are lost or horizontally transferred, and because alleles can
co-exist in populations for periods that may span several speciation events.
Building models describing the relationship between gene and species trees can
thus improve the reconstruction of gene trees when a species tree is known, and
vice-versa. Several approaches have been proposed to solve the problem in one
direction or the other, but in general neither gene trees nor species trees are
known. Only a few studies have attempted to jointly infer gene trees and
species trees. In this article we review the various models that have been used
to describe the relationship between gene trees and species trees. These models
account for gene duplication and loss, transfer or incomplete lineage sorting.
Some of them consider several types of events together, but none exists
currently that considers the full repertoire of processes that generate gene
trees along the species tree. Simulations as well as empirical studies on
genomic data show that combining gene tree-species tree models with models of
sequence evolution improves gene tree reconstruction. In turn, these better
gene trees provide a better basis for studying genome evolution or
reconstructing ancestral chromosomes and ancestral gene sequences. We predict
that gene tree-species tree methods that can deal with genomic data sets will
be instrumental to advancing our understanding of genomic evolution.Comment: Review article in relation to the "Mathematical and Computational
Evolutionary Biology" conference, Montpellier, 201
The Emergence and Early Evolution of Biological Carbon-Fixation
The fixation of into living matter sustains all life on Earth, and embeds the biosphere within geochemistry. The six known chemical pathways used by extant organisms for this function are recognized to have overlaps, but their evolution is incompletely understood. Here we reconstruct the complete early evolutionary history of biological carbon-fixation, relating all modern pathways to a single ancestral form. We find that innovations in carbon-fixation were the foundation for most major early divergences in the tree of life. These findings are based on a novel method that fully integrates metabolic and phylogenetic constraints. Comparing gene-profiles across the metabolic cores of deep-branching organisms and requiring that they are capable of synthesizing all their biomass components leads to the surprising conclusion that the most common form for deep-branching autotrophic carbon-fixation combines two disconnected sub-networks, each supplying carbon to distinct biomass components. One of these is a linear folate-based pathway of reduction previously only recognized as a fixation route in the complete Wood-Ljungdahl pathway, but which more generally may exclude the final step of synthesizing acetyl-CoA. Using metabolic constraints we then reconstruct a “phylometabolic” tree with a high degree of parsimony that traces the evolution of complete carbon-fixation pathways, and has a clear structure down to the root. This tree requires few instances of lateral gene transfer or convergence, and instead suggests a simple evolutionary dynamic in which all divergences have primary environmental causes. Energy optimization and oxygen toxicity are the two strongest forces of selection. The root of this tree combines the reductive citric acid cycle and the Wood-Ljungdahl pathway into a single connected network. This linked network lacks the selective optimization of modern fixation pathways but its redundancy leads to a more robust topology, making it more plausible than any modern pathway as a primitive universal ancestral form
The compositional and evolutionary logic of metabolism
Metabolism displays striking and robust regularities in the forms of
modularity and hierarchy, whose composition may be compactly described. This
renders metabolic architecture comprehensible as a system, and suggests the
order in which layers of that system emerged. Metabolism also serves as the
foundation in other hierarchies, at least up to cellular integration including
bioenergetics and molecular replication, and trophic ecology. The
recapitulation of patterns first seen in metabolism, in these higher levels,
suggests metabolism as a source of causation or constraint on many forms of
organization in the biosphere.
We identify as modules widely reused subsets of chemicals, reactions, or
functions, each with a conserved internal structure. At the small molecule
substrate level, module boundaries are generally associated with the most
complex reaction mechanisms and the most conserved enzymes. Cofactors form a
structurally and functionally distinctive control layer over the small-molecule
substrate. Complex cofactors are often used at module boundaries of the
substrate level, while simpler ones participate in widely used reactions.
Cofactor functions thus act as "keys" that incorporate classes of organic
reactions within biochemistry.
The same modules that organize the compositional diversity of metabolism are
argued to have governed long-term evolution. Early evolution of core
metabolism, especially carbon-fixation, appears to have required few
innovations among a small number of conserved modules, to produce adaptations
to simple biogeochemical changes of environment. We demonstrate these features
of metabolism at several levels of hierarchy, beginning with the small-molecule
substrate and network architecture, continuing with cofactors and key conserved
reactions, and culminating in the aggregation of multiple diverse physical and
biochemical processes in cells.Comment: 56 pages, 28 figure
Eukaryotic large nucleo-cytoplasmic DNA viruses: Clusters of orthologous genes and reconstruction of viral genome evolution
<p>Abstract</p> <p>Background</p> <p>The Nucleo-Cytoplasmic Large DNA Viruses (NCLDV) comprise an apparently monophyletic class of viruses that infect a broad variety of eukaryotic hosts. Recent progress in isolation of new viruses and genome sequencing resulted in a substantial expansion of the NCLDV diversity, resulting in additional opportunities for comparative genomic analysis, and a demand for a comprehensive classification of viral genes.</p> <p>Results</p> <p>A comprehensive comparison of the protein sequences encoded in the genomes of 45 NCLDV belonging to 6 families was performed in order to delineate cluster of orthologous viral genes. Using previously developed computational methods for orthology identification, 1445 Nucleo-Cytoplasmic Virus Orthologous Groups (NCVOGs) were identified of which 177 are represented in more than one NCLDV family. The NCVOGs were manually curated and annotated and can be used as a computational platform for functional annotation and evolutionary analysis of new NCLDV genomes. A maximum-likelihood reconstruction of the NCLDV evolution yielded a set of 47 conserved genes that were probably present in the genome of the common ancestor of this class of eukaryotic viruses. This reconstructed ancestral gene set is robust to the parameters of the reconstruction procedure and so is likely to accurately reflect the gene core of the ancestral NCLDV, indicating that this virus encoded a complex machinery of replication, expression and morphogenesis that made it relatively independent from host cell functions.</p> <p>Conclusions</p> <p>The NCVOGs are a flexible and expandable platform for genome analysis and functional annotation of newly characterized NCLDV. Evolutionary reconstructions employing NCVOGs point to complex ancestral viruses.</p
Strategies for Reliable Exploitation of Evolutionary Concepts in High Throughput Biology
The recent availability of the complete genome sequences of a large number of model organisms, together with the immense amount of data being produced by the new high-throughput technologies, means that we can now begin comparative analyses to understand the mechanisms involved in the evolution of the genome and their consequences in the study of biological systems. Phylogenetic approaches provide a unique conceptual framework for performing comparative analyses of all this data, for propagating information between different systems and for predicting or inferring new knowledge. As a result, phylogeny-based inference systems are now playing an increasingly important role in most areas of high throughput genomics, including studies of promoters (phylogenetic footprinting), interactomes (based on the presence and degree of conservation of interacting proteins), and in comparisons of transcriptomes or proteomes (phylogenetic proximity and co-regulation/co-expression). Here we review the recent developments aimed at making automatic, reliable phylogeny-based inference feasible in large-scale projects. We also discuss how evolutionary concepts and phylogeny-based inference strategies are now being exploited in order to understand the evolution and function of biological systems. Such advances will be fundamental for the success of the emerging disciplines of systems biology and synthetic biology, and will have wide-reaching effects in applied fields such as biotechnology, medicine and pharmacology
- …