Ancestral paralogs and pseudoparalogs and their role in the emergence of the eukaryotic cell
Gene duplication is a crucial mechanism of evolutionary innovation. A substantial fraction of eukaryotic genomes consists of paralogous gene families. We assess the extent of ancestral paralogy, which dates back to the last common ancestor of all eukaryotes, and examine the origins of the ancestral paralogs and their potential roles in the emergence of the eukaryotic cell complexity. A parsimonious reconstruction of ancestral gene repertoires shows that 4137 orthologous gene sets in the last eukaryotic common ancestor (LECA) map back to 2150 orthologous sets in the hypothetical first eukaryotic common ancestor (FECA) [paralogy quotient (PQ) of 1.92]. Analogous reconstructions show significantly lower levels of paralogy in prokaryotes, 1.19 for archaea and 1.25 for bacteria. The only functional class of eukaryotic proteins with a significant excess of paralogous clusters over the mean includes molecular chaperones and proteins with related functions. Almost all genes in this category underwent multiple duplications during early eukaryotic evolution. In structural terms, the most prominent sets of paralogs are superstructure-forming proteins with repetitive domains, such as WD-40 and TPR. In addition to the true ancestral paralogs which evolved via duplication at the onset of eukaryotic evolution, numerous pseudoparalogs were detected, i.e. homologous genes that apparently were acquired by early eukaryotes via different routes, including horizontal gene transfer (HGT) from diverse bacteria. The results of this study demonstrate a major increase in the level of gene paralogy as a hallmark of the early evolution of eukaryotes
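The paralogy quotient quoted above can be checked with simple arithmetic. The sketch below assumes PQ is the ratio of orthologous sets in the descendant ancestor (LECA) to the sets they map back to in the earlier ancestor (FECA), which is consistent with the figures in the abstract; the function name is hypothetical.

```python
def paralogy_quotient(descendant_sets: int, ancestral_sets: int) -> float:
    """Ratio of orthologous sets in the later ancestor to those in the earlier one.

    Assumed definition, inferred from the numbers quoted in the abstract.
    """
    if ancestral_sets <= 0:
        raise ValueError("ancestral_sets must be positive")
    return descendant_sets / ancestral_sets


# 4137 LECA sets mapping back to 2150 FECA sets
pq_eukaryotes = paralogy_quotient(4137, 2150)
print(round(pq_eukaryotes, 2))  # 1.92
```

Under this reading, a PQ near 1 (as reported for archaea and bacteria) means few ancestral duplications, while a PQ near 2 means that, on average, each ancestral set gave rise to about two.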
Quod erat demonstrandum? The mystery of experimental validation of apparently erroneous computational analyses of protein sequences
BACKGROUND: Computational predictions are critical for directing the experimental study of protein functions. Therefore it is paradoxical when an apparently erroneous computational prediction seems to be supported by experiment. RESULTS: We analyzed six cases where application of novel or conventional computational methods for protein sequence and structure analysis led to non-trivial predictions that were subsequently supported by direct experiments. We show that, on all six occasions, the original prediction was unjustified, and in at least three cases, an alternative, well-supported computational prediction, incompatible with the original one, could be derived. The most unusual cases involved the identification of an archaeal cysteinyl-tRNA synthetase, a dihydropteroate synthase and a thymidylate synthase, for which experimental verifications of apparently erroneous computational predictions were reported. Using sequence-profile analysis, multiple alignment and secondary-structure prediction, we have identified the unique archaeal 'cysteinyl-tRNA synthetase' as a homolog of extracellular polygalactosaminidases, and the 'dihydropteroate synthase' as a member of the beta-lactamase-like superfamily of metal-dependent hydrolases. CONCLUSIONS: In each of the analyzed cases, the original computational predictions could be refuted and, in some instances, alternative strongly supported predictions were obtained. The nature of the experimental evidence that appears to support these predictions remains an open question. Some of these experiments might signify discovery of extremely unusual forms of the respective enzymes, whereas the results of others could be due to artifacts
Small CRISPR RNAs guide antiviral defense in prokaryotes
Prokaryotes acquire virus resistance by integrating short fragments of viral nucleic acid into clusters of regularly interspaced short palindromic repeats (CRISPRs). Here we show how virus-derived sequences contained in CRISPRs are used by CRISPR-associated (Cas) proteins from the host to mediate an antiviral response that counteracts infection. After transcription of the CRISPR, a complex of Cas proteins termed Cascade cleaves a CRISPR RNA precursor in each repeat and retains the cleavage products containing the virus-derived sequence. Assisted by the helicase Cas3, these mature CRISPR RNAs then serve as small guide RNAs that enable Cascade to interfere with virus proliferation. Our results demonstrate that the formation of mature guide RNAs by the CRISPR RNA endonuclease subunit of Cascade is a mechanistic requirement for antiviral defense
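The processing step described above (one cut in each repeat, with the spacer-containing products retained) can be illustrated with a toy string model. This is only a sketch under stated assumptions: the sequences, the `cut_offset` parameter, and the function name are hypothetical, and the model assumes spacers never contain the repeat sequence.

```python
def process_precursor(precursor: str, repeat: str, cut_offset: int) -> list[str]:
    """Cut a toy CRISPR RNA precursor once inside every repeat.

    Returns the fragments lying between consecutive cut sites; each one holds
    a spacer flanked by the remaining pieces of its neighbouring repeats.
    """
    if not 0 <= cut_offset <= len(repeat):
        raise ValueError("cut_offset must fall inside the repeat")
    cut_sites = []
    start = precursor.find(repeat)
    while start != -1:
        cut_sites.append(start + cut_offset)
        # Continue searching after the repeat just found
        start = precursor.find(repeat, start + len(repeat))
    return [precursor[a:b] for a, b in zip(cut_sites, cut_sites[1:])]


# Hypothetical repeat and two hypothetical spacers
repeat = "GGCC"
precursor = repeat + "AAAA" + repeat + "UUUU" + repeat
print(process_precursor(precursor, repeat, cut_offset=2))
# ['CCAAAAGG', 'CCUUUUGG']
```

Each retained fragment carries one spacer plus repeat-derived flanks, mirroring the abstract's statement that the cleavage products containing the virus-derived sequence are kept.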
Optimal data partitioning, multispecies coalescent and Bayesian concordance analyses resolve early divergences of the grape family (Vitaceae)
Evolutionary rate heterogeneity and rapid radiations are common phenomena in organismal evolution and represent major challenges for reconstructing deep-level phylogenies. Here we detected substantial conflicts in and among data sets as well as uncertainty concerning relationships among lineages of Vitaceae from individual gene trees, supernetworks and tree certainty values. Congruent deep-level relationships of Vitaceae were retrieved by comprehensive comparisons of results from optimal partitioning analyses, multispecies coalescent approaches and the Bayesian concordance method. We found that partitioning schemes selected by PartitionFinder were preferred over those by gene or by codon position, and the unpartitioned model usually performed the worst. For a data set with conflicting signals, however, the unpartitioned model outperformed models that included more partitions, demonstrating some limitations to the effectiveness of concatenation for these data. For a transcriptome data set, fast coalescent methods (STAR and MP-EST) and a Bayesian concordance approach yielded congruent topologies with trees from the concatenated analyses and previous studies. Our results highlight that well-resolved gene trees are critical for the effectiveness of coalescent-based methods. Future efforts to improve the accuracy of phylogenomic analyses should emphasize the development of new methods that can accommodate multiple biological processes and tolerate missing data while remaining computationally tractable.
(C) The Willi Hennig Society 2017. Funding: National Natural Science Foundation of China [NNSF 31500179, 31590822, 31270268]; National Basic Research Program of China [2014CB954101]; National Science Foundation [DEB0743474]; Smithsonian Scholarly Studies Grant Program and the Endowment Grant Program; CAS/SAFEA International Partnership Program for Creative Research Teams; Laboratory of Analytical Biology of the National Museum of Natural History, Smithsonian Institution; Science and Technology Basic Work [2013FY112100]
A probabilistic model for gene content evolution with duplication, loss, and horizontal transfer
We introduce a Markov model for the evolution of a gene family along a
phylogeny. The model includes parameters for the rates of horizontal gene
transfer, gene duplication, and gene loss, in addition to branch lengths in the
phylogeny. The likelihood for the changes in the size of a gene family across
different organisms can be calculated in O(N+hM^2) time and O(N+M^2) space,
where N is the number of organisms, h is the height of the phylogeny, and M
is the sum of family sizes. We apply the model to the evolution of gene content
in Proteobacteria using the gene families in the COG (Clusters of Orthologous
Groups) database
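To make the kind of process described above concrete, here is a minimal forward simulation of gene family size along a single branch. It is a sketch under stated assumptions, not the paper's likelihood algorithm: it assumes duplication and loss act per gene copy and transfer arrives at a constant rate independent of family size, and all names and rate values are hypothetical.

```python
import random


def simulate_family_size(n0, branch_length, dup_rate, loss_rate, transfer_rate, rng):
    """Gillespie-style simulation of family size along one branch.

    Assumed rates: duplication and loss are per gene copy; horizontal
    transfer adds one copy at a constant rate regardless of family size.
    """
    n, t = n0, 0.0
    while True:
        total = n * (dup_rate + loss_rate) + transfer_rate
        if total == 0:
            return n  # no event can occur any more
        t += rng.expovariate(total)  # waiting time to the next event
        if t > branch_length:
            return n
        u = rng.random() * total
        if u < n * dup_rate:
            n += 1  # duplication
        elif u < n * (dup_rate + loss_rate):
            n -= 1  # loss
        else:
            n += 1  # gain by horizontal transfer


rng = random.Random(42)
sizes = [simulate_family_size(2, 1.0, 0.3, 0.5, 0.1, rng) for _ in range(1000)]
print(sum(sizes) / len(sizes))  # mean family size at the end of the branch
```

Repeating such simulations over many branches gives an empirical feel for how the three rates shape family-size distributions; the model in the abstract instead computes the likelihood of observed sizes directly.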
Reconciliation Revisited: Handling Multiple Optima when Reconciling with Duplication, Transfer, and Loss
Phylogenetic tree reconciliation is a powerful approach for inferring evolutionary events like gene duplication, horizontal gene transfer, and gene loss, which are fundamental to our understanding of molecular evolution. While duplication–loss (DL) reconciliation leads to a unique maximum-parsimony solution, duplication-transfer-loss (DTL) reconciliation yields a multitude of optimal solutions, making it difficult to infer the true evolutionary history of the gene family. This problem is further exacerbated by the fact that different event cost assignments yield different sets of optimal reconciliations. Here, we present an effective, efficient, and scalable method for dealing with these fundamental problems in DTL reconciliation. Our approach works by sampling the space of optimal reconciliations uniformly at random and aggregating the results. We show that even gene trees with only a few dozen genes often have millions of optimal reconciliations and present an algorithm to efficiently sample the space of optimal reconciliations uniformly at random in O(mn^2) time per sample, where m and n denote the number of genes and species, respectively. We use these samples to understand how different optimal reconciliations vary in their node mappings and event assignments and to investigate the impact of varying event costs. We apply our method to a biological dataset of approximately 4700 gene trees from 100 taxa and observe that 93% of event assignments and 73% of mappings remain consistent across different multiple optima. Our analysis represents the first systematic investigation of the space of optimal DTL reconciliations and has many important implications for the study of gene family evolution.
Funding: National Science Foundation (U.S.) (CAREER Award 0644282); National Institutes of Health (U.S.) (Grant RC2 HG005639); National Science Foundation (U.S.). Assembling the Tree of Life (Program) (Grant 0936234)
Panspermia, Past and Present: Astrophysical and Biophysical Conditions for the Dissemination of Life in Space
Astronomically, there are viable mechanisms for distributing organic material
throughout the Milky Way. Biologically, the destructive effects of ultraviolet
light and cosmic rays mean that the majority of organisms arrive broken and
dead on a new world. The likelihood of conventional forms of panspermia must
therefore be considered low. However, the information content of damaged
biological molecules might serve to seed new life (necropanspermia).
Comment: Accepted for publication in Space Science Review
Evolution of regulatory signatures in primate cortical neurons at cell-type resolution
The human cerebral cortex contains many cell types that likely underwent independent functional changes during evolution. However, cell-type-specific regulatory landscapes in the cortex remain largely unexplored. Here we report epigenomic and transcriptomic analyses of the two main cortical neuronal subtypes, glutamatergic projection neurons and GABAergic interneurons, in human, chimpanzee, and rhesus macaque. Using genome-wide profiling of the H3K27ac histone modification, we identify neuron-subtype-specific regulatory elements that previously went undetected in bulk brain tissue samples. Human-specific regulatory changes are uncovered in multiple genes, including those associated with language, autism spectrum disorder, and drug addiction. We observe preferential evolutionary divergence in neuron subtype-specific regulatory elements and show that a substantial fraction of pan-neuronal regulatory elements undergoes subtype-specific evolutionary changes. This study sheds light on the interplay between regulatory evolution and cell-type-dependent gene-expression programs, and provides a resource for further exploration of human brain evolution and function