Structural Analysis of Polarizing Indels Argues the Root of the Tree of Life is Near the Chloroflexi
Determining which branches of the tree of life have derived features narrows down the possible location of the root. Currently, the indel polarizations of Lake _et al_.^1-5^ and the polarizing transitions of Cavalier-Smith^6^ place the root of the tree at contradictory positions. We have analyzed the sequence-based indel arguments using protein structure wherever possible. Structure strongly supports some of the polarizations, but for other indels it argues for a different conclusion. We conclude that there is no contradiction between Lake _et al_. and Cavalier-Smith; the root of the tree of life must be near the Chloroflexi.

Structural analysis of polarizing indels: an emerging consensus on the root of the tree of life
Background The root of the tree of life has been a holy grail ever since Darwin first used the tree as a metaphor for evolution. New methods seek to narrow down the location of the root by excluding it from branches of the tree of life. This is done by finding traits that must be derived and excluding the root from the taxa those traits cover. However, the two most comprehensive attempts at this strategy, by Cavalier-Smith and by Lake et al., have excluded each other's rootings. Results The indel polarizations of Lake et al. rely on high-quality alignments between paralogs that diverged before the last universal common ancestor (LUCA). Therefore, sequence alignment artifacts may skew their conclusions. We have reviewed their data using protein structure information where available. Several of the conclusions are quite different when viewed in the light of structure, which is conserved over longer evolutionary time scales than sequence. We argue that no polarization excludes the root from all Gram-negatives, and that polarizations robustly exclude the root from the Archaea. Conclusion We conclude that there is no contradiction between the polarization datasets. The combination of these datasets excludes the root from every possible position except near the Chloroflexi.
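The exclusion strategy described in this abstract can be sketched as a small program. This is a toy model under stated assumptions, not the authors' procedure: the unrooted tree, the leaf labels `A` to `E`, and the helper names `leaves_beyond` and `candidate_root_edges` are hypothetical. Each polarization is reduced to a clade that carries a derived state, and any edge lying strictly inside such a clade is removed as a possible root position. The stem edge of the clade is kept.

```python
from collections import defaultdict

# Hypothetical unrooted tree ((A,B),C,(D,E)); x, y, z are internal nodes.
EDGES = [("A", "x"), ("B", "x"), ("x", "y"), ("C", "y"),
         ("y", "z"), ("D", "z"), ("E", "z")]


def leaves_beyond(adj, frm, to):
    """Leaves reachable from `to` without crossing back over the edge (frm, to)."""
    stack, seen, found = [to], {frm, to}, set()
    while stack:
        node = stack.pop()
        if len(adj[node]) == 1:  # degree-1 nodes are leaves
            found.add(node)
        for nxt in adj[node]:
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return frozenset(found)


def candidate_root_edges(edges, derived_clades):
    """Edges that no derived clade rules out as a root position.

    An edge lies strictly inside clade C when one side of the split it
    induces is a proper subset of C. Rooting there would imply the clade's
    state was present at the root, contradicting its polarization as derived.
    """
    adj = defaultdict(list)
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    allowed = []
    for u, v in edges:
        sides = (leaves_beyond(adj, u, v), leaves_beyond(adj, v, u))
        excluded = any(side < clade for side in sides for clade in derived_clades)
        if not excluded:
            allowed.append((u, v))
    return allowed


clades = [frozenset("AB"), frozenset("DE")]
print(candidate_root_edges(EDGES, clades))
# [('x', 'y'), ('C', 'y'), ('y', 'z')]
print(candidate_root_edges(EDGES, clades + [frozenset("CDE")]))
# [('x', 'y')]
```

With `{A, B}` and `{D, E}` marked as derived, three central edges remain. Adding `{C, D, E}` leaves only the edge between the two halves. This mirrors, in miniature, how combining independent polarization datasets can confine the root to a single region.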
Save the tree of life or get lost in the woods
Abstract Background The wealth of prokaryotic genomic data available has revealed that the histories of many genes are inconsistent, leading some to question the value of the tree of life hypothesis. It has been argued that a tree-like representation requires suppressing too much information, and that a more pluralistic approach is necessary for understanding prokaryotic evolution. We argue that trees may still be a useful representation for evolutionary histories in light of new data. Results Genomic data alone can be highly misleading when trying to resolve the tree of life. We present evidence from protein abundance data sets that genomic conservation greatly underestimates functional conservation. Function follows a more tree-like structure than genetic material does, even in the presence of horizontal transfer. We argue that the tree of cells must be incorporated into any new synthesis in order to place horizontal transfers into their proper selective context. We also discuss the role that data sources other than primary sequence can play in resolving the tree of cells. Conclusions The tree of life is alive, but not well. Construction of the tree of cells has been viewed as the end goal of the study of evolution, whereas in reality we need to consider it more of a starting point. We propose a duality in which we must consider variation of genetic material in terms of networks and selection of cellular function in terms of trees. Otherwise one gets lost in the woods of neutral evolution. Reviewers This article was reviewed by Dr. Eric Bapteste, Dr. Arcady Mushegian, and Dr. Celine Brochier.
The origin of a derived superkingdom: how a gram-positive bacterium crossed the desert to become an archaeon
Abstract Background The tree of life is usually rooted between archaea and bacteria. We have previously presented three arguments that support placing the root of the tree of life in bacteria. The data have been dismissed because those who support the canonical rooting between the prokaryotic superkingdoms cannot imagine how the vast divide between them could be crossed. Results We review the evidence that archaea are derived, as well as their biggest differences from bacteria. Using novel data, we argue that the gap between the superkingdoms is not insurmountable. We consider whether archaea are holophyletic or paraphyletic, a question essential to understanding their origin. Finally, we review several hypotheses on the origins of archaea and, where possible, evaluate each hypothesis using bioinformatics tools. As a result, we argue for a firmicute ancestry for archaea over proposals for an actinobacterial ancestry. Conclusion We believe a synthesis of the hypotheses of Lake, Gupta, and Cavalier-Smith is possible, in which a combination of antibiotic warfare and viral endosymbiosis in the bacilli led to dramatic changes in a bacterium that resulted in the birth of archaea and eukaryotes. Reviewers This article was reviewed by Patrick Forterre, Eugene Koonin, and Gáspár Jékel
Designer diatom episomes delivered by bacterial conjugation
Eukaryotic microalgae hold great promise for the bioproduction of fuels and higher-value chemicals. However, compared with model genetic organisms such as Escherichia coli and Saccharomyces cerevisiae, characterization of the complex biology and biochemistry of algae, as well as strain improvement, has been hampered by inefficient genetic tools. To date, many algal species are transformable only via particle bombardment, and the introduced DNA is integrated randomly into the nuclear genome. Here we describe the first nuclear episomal vector for diatoms and a plasmid delivery method via conjugation from Escherichia coli to the diatoms Phaeodactylum tricornutum and Thalassiosira pseudonana. We identify a yeast-derived sequence that enables stable episome replication in these diatoms even in the absence of antibiotic selection. We also show that episomes are maintained as closed circles at a copy number equivalent to that of native chromosomes. This highly efficient genetic system facilitates high-throughput functional characterization of algal genes and accelerates molecular phytoplankton research.
Evolutionary genomics of a cold-adapted diatom: Fragilariopsis cylindrus
The Southern Ocean houses a diverse and productive community of organisms^1,2^. Unicellular eukaryotic diatoms are the main primary producers in this environment. Here, photosynthesis is limited by low concentrations of dissolved iron and by large seasonal fluctuations in light, temperature and the extent of sea ice^3-7^. How diatoms have adapted to this extreme environment is largely unknown. Here we present insights into the genome evolution of a cold-adapted diatom from the Southern Ocean, Fragilariopsis cylindrus^8,9^, based on a comparison with temperate diatoms. We find that approximately 24.7 per cent of the diploid F. cylindrus genome (15.1 megabases of the total genome size of 61.1 megabases) consists of genetic loci with highly divergent alleles. These divergent alleles were differentially expressed across environmental conditions, including darkness, low iron, freezing, elevated temperature and increased CO2. Alleles with the largest ratio of non-synonymous to synonymous nucleotide substitutions also show the most pronounced condition-dependent expression, suggesting a correlation between diversifying selection and allelic differentiation. Divergent alleles may be involved in adaptation to environmental fluctuations in the Southern Ocean.
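The two genome figures quoted in this abstract are mutually consistent. A one-line check, using only the numbers given there, recovers the stated percentage.

```python
# Figures as quoted in the abstract (megabases).
divergent_mb = 15.1
total_mb = 61.1

fraction = divergent_mb / total_mb
print(f"{fraction:.1%} of the genome lies in highly divergent loci")
# 24.7% of the genome lies in highly divergent loci
```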
Structure and Age Jointly Influence Rates of Protein Evolution
The factors that determine a protein's rate of evolution are actively debated. Especially unclear is the relative role of intrinsic factors of present-day proteins versus historical factors such as protein age. Here we study the interplay of structural properties and evolutionary age as determinants of protein evolutionary rate. We use a large set of one-to-one orthologs between human and mouse proteins with mapped PDB structures. We report that previously observed structural correlations also hold within each age group. These include relationships between solvent accessibility, designability, and evolutionary rates. However, age also plays a crucial role: it modulates the relationship between solvent accessibility and rate. Additionally, younger proteins, despite being less designable, tend to evolve faster than older proteins. We show that previously reported relationships between age and rate cannot be explained by structural biases among age groups. Finally, we introduce a knowledge-based potential function to study protein stability through large-scale computation. We find that older proteins are more stable in their native structures, and more robust to mutations, than younger ones. Our results underscore that several determinants, both intrinsic and historical, can interact to determine rates of protein evolution.
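The abstract does not specify the form of its knowledge-based potential. The following is therefore only a generic sketch of the idea, assuming an inverse-Boltzmann contact potential over a two-letter hydrophobic/polar (`H`/`P`) alphabet. The contact counts, the alphabet, and the function names are hypothetical. The sketch works in three steps:

* Observed contact frequencies are compared with a random-mixing reference state to give pseudo-energies.
* These pseudo-energies are summed over a structure's contacts.
* Averaging the energy change over all single substitutions gives one crude measure of robustness to mutation.

```python
import math
from collections import Counter
from itertools import combinations_with_replacement


def contact_potential(observed_contacts, kT=1.0, pseudocount=1.0):
    """Inverse-Boltzmann pair potential E(a, b) = -kT * ln(P_obs / P_ref).

    observed_contacts: (residue_a, residue_b) pairs seen in contact across
    a set of structures. The reference state assumes contacting residues
    pair at random in proportion to their overall frequency.
    """
    pair_counts = Counter(tuple(sorted(p)) for p in observed_contacts)
    residue_counts = Counter(r for p in observed_contacts for r in p)
    n_res = sum(residue_counts.values())
    pairs = list(combinations_with_replacement(sorted(residue_counts), 2))
    n_pairs = sum(pair_counts.values()) + pseudocount * len(pairs)

    energy = {}
    for a, b in pairs:
        p_obs = (pair_counts[(a, b)] + pseudocount) / n_pairs
        f_a, f_b = residue_counts[a] / n_res, residue_counts[b] / n_res
        p_ref = f_a * f_b if a == b else 2 * f_a * f_b  # unordered pairs
        energy[(a, b)] = -kT * math.log(p_obs / p_ref)
    return energy


def structure_energy(sequence, contacts, energy):
    """Sum pair energies over a contact map given as (i, j) residue indices."""
    return sum(energy[tuple(sorted((sequence[i], sequence[j])))] for i, j in contacts)


def mean_mutation_penalty(sequence, contacts, energy, alphabet):
    """Average energy change over all single-residue substitutions."""
    base = structure_energy(sequence, contacts, energy)
    deltas = [
        structure_energy(sequence[:i] + new + sequence[i + 1:], contacts, energy) - base
        for i, old in enumerate(sequence)
        for new in alphabet
        if new != old
    ]
    return sum(deltas) / len(deltas)


# Hypothetical contact counts from an imagined structure set.
observed = [("H", "H")] * 6 + [("H", "P")] * 2 + [("P", "P")] * 2
E = contact_potential(observed)
native = "HHPH"
contacts = [(0, 1), (0, 3), (1, 3), (2, 3)]
print(E)
print(structure_energy(native, contacts, E))
print(mean_mutation_penalty(native, contacts, E, "HP"))
```

With these toy counts, H-H contacts come out favorable (negative) and H-P contacts unfavorable. The mean penalty is positive, so substitutions destabilize this toy structure on average. A realistic potential would use all 20 amino acids, distance-dependent bins, and a large structure set.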
The Emergence and Early Evolution of Biological Carbon-Fixation
The fixation of CO2 into living matter sustains all life on Earth and embeds the biosphere within geochemistry. The six known chemical pathways used by extant organisms for this function are recognized to overlap, but their evolution is incompletely understood. Here we reconstruct the complete early evolutionary history of biological carbon fixation, relating all modern pathways to a single ancestral form. We find that innovations in carbon fixation were the foundation for most major early divergences in the tree of life. These findings are based on a novel method that fully integrates metabolic and phylogenetic constraints. We compared gene profiles across the metabolic cores of deep-branching organisms and required that they be capable of synthesizing all their biomass components. This leads to the surprising conclusion that the most common form of deep-branching autotrophic carbon fixation combines two disconnected sub-networks, each supplying carbon to distinct biomass components. One of these is a linear folate-based pathway of CO2 reduction. It was previously recognized as a fixation route only in the complete Wood-Ljungdahl pathway, but more generally it may exclude the final step of synthesizing acetyl-CoA. Using metabolic constraints, we then reconstruct a “phylometabolic” tree with a high degree of parsimony that traces the evolution of complete carbon-fixation pathways and has a clear structure down to the root. This tree requires few instances of lateral gene transfer or convergence. Instead, it suggests a simple evolutionary dynamic in which all divergences have primary environmental causes. Energy optimization and oxygen toxicity are the two strongest forces of selection. The root of this tree combines the reductive citric acid cycle and the Wood-Ljungdahl pathway into a single connected network. This linked network lacks the selective optimization of modern fixation pathways, but its redundancy leads to a more robust topology, making it more plausible than any modern pathway as a primitive universal ancestral form.
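The biomass-feasibility requirement described in this abstract can be illustrated with network expansion. This is a standard way of computing everything a reaction set can produce from given seed compounds. It is a swapped-in generic technique, not the authors' phylometabolic method. The compound names and reactions below are hypothetical placeholders rather than real pathway steps. The toy network mirrors the reported architecture: two disconnected sub-networks each supply carbon to a different biomass component, so removing either branch makes the biomass set unreachable.

```python
def network_expansion(seeds, reactions):
    """Return every compound reachable from `seeds`.

    reactions: list of (substrates, products) pairs of sets. A reaction
    fires once all its substrates are available; repeat to a fixed point.
    """
    available = set(seeds)
    changed = True
    while changed:
        changed = False
        for substrates, products in reactions:
            if substrates <= available and not products <= available:
                available |= products
                changed = True
    return available


def can_make_biomass(seeds, reactions, biomass):
    """True if every biomass component is reachable from the seeds."""
    return set(biomass) <= network_expansion(seeds, reactions)


# Hypothetical toy network: two disconnected carbon-supplying branches.
REACTIONS = [
    ({"CO2", "H2"}, {"C1_unit"}),          # branch 1 (placeholder)
    ({"C1_unit"}, {"component_A"}),
    ({"CO2", "acceptor"}, {"C2_unit"}),    # branch 2 (placeholder)
    ({"C2_unit"}, {"component_B"}),
]
SEEDS = {"CO2", "H2", "acceptor"}
BIOMASS = {"component_A", "component_B"}

print(can_make_biomass(SEEDS, REACTIONS, BIOMASS))       # True
print(can_make_biomass(SEEDS, REACTIONS[:2], BIOMASS))   # False: branch 2 removed
```

Iterating to a fixed point means reaction order does not affect the result, only how many passes are needed.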
Finishing the euchromatic sequence of the human genome
The sequence of the human genome encodes the genetic instructions for human physiology, as well as rich information about human evolution. In 2001, the International Human Genome Sequencing Consortium reported a draft sequence of the euchromatic portion of the human genome. Since then, the international collaboration has worked to convert this draft into a genome sequence with high accuracy and nearly complete coverage. Here, we report the result of this finishing process. The current genome sequence (Build 35) contains 2.85 billion nucleotides interrupted by only 341 gaps. It covers ∼99% of the euchromatic genome and is accurate to an error rate of ∼1 event per 100,000 bases. Many of the remaining euchromatic gaps are associated with segmental duplications and will require focused work with new methods. The near-complete sequence, the first for a vertebrate, greatly improves the precision of biological analyses of the human genome, including studies of gene number, birth and death. Notably, the human genome seems to encode only 20,000-25,000 protein-coding genes. The genome sequence reported here should serve as a firm foundation for biomedical research in the decades ahead.
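To put the finishing statistics in scale, the quoted figures can be combined directly. This uses only the rounded numbers in the abstract. It treats the error rate as uniform across the sequence, which is a simplifying assumption.

```python
# Rounded Build 35 figures as quoted in the abstract.
total_bases = 2_850_000_000
gaps = 341
bases_per_error = 100_000  # ~1 error event per 100,000 bases (assumed uniform)

expected_errors = total_bases // bases_per_error
bases_per_gap = total_bases / gaps

print(f"~{expected_errors:,} residual error events")
print(f"~{bases_per_gap / 1e6:.2f} Mb of sequence per gap")
```

That is roughly 28,500 residual errors across the whole sequence. On average, there are also more than 8 Mb of sequence for every remaining gap.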
