187 research outputs found

    Evolutionary Mechanisms of Long-Term Genome Diversification Associated With Niche Partitioning in Marine Picocyanobacteria.

    Marine picocyanobacteria of the genera Prochlorococcus and Synechococcus are the most abundant photosynthetic organisms on Earth, an ecological success thought to be linked to the differential partitioning of distinct ecotypes into specific ecological niches. However, the underlying processes that governed the diversification of these microorganisms and the appearance of niche-related phenotypic traits are just starting to be elucidated. Here, by comparing 81 genomes, including 34 new Synechococcus, we explored the evolutionary processes that shaped the genomic diversity of picocyanobacteria. Time-calibration of a core-protein tree showed that gene gain/loss occurred at an unexpectedly low rate between the different lineages, with for instance 5.6 genes gained per million years (My) for the major Synechococcus lineage (sub-cluster 5.1), among which only 0.71/My have been fixed in the long term. Gene content comparisons revealed a number of candidates involved in nutrient adaptation, a large proportion of which are located in genomic islands shared between either closely or more distantly related strains, as identified using an original network construction approach. Interestingly, strains representative of the different ecotypes co-occurring in phosphorus-depleted waters (Synechococcus clades III, WPC1, and sub-cluster 5.3) were shown to display different adaptation strategies to this limitation. In contrast, we found few genes potentially involved in adaptation to temperature when comparing cold and warm thermotypes. Indeed, comparison of core protein sequences highlighted variants specific to cold thermotypes, notably involved in carotenoid biosynthesis and the oxidative stress response, revealing that long-term adaptation to thermal niches relies on amino acid substitutions rather than on gene content variation. 
Altogether, this study not only deciphers the respective roles of gene gains/losses and sequence variation but also uncovers numerous gene candidates likely involved in niche partitioning of two key members of the marine phytoplankton.

    Law of Genome Evolution Direction : Coding Information Quantity Grows

    The problem of the directionality of genome evolution is studied. Based on an analysis of the C-value paradox and the evolution of genome size, we propose that the function-coding information quantity of a genome always grows in the course of evolution through sequence duplication, expansion of code, and gene transfer from outside. The function-coding information quantity of a genome consists of two parts: p-coding information quantity, which encodes functional proteins, and n-coding information quantity, which encodes functional elements other than amino acid sequences. Evidence for this evolutionary law of function-coding information quantity is listed. The need for function is the motive force for the expansion of coding information quantity, and this expansion is the way a species achieves functional innovation and extension. Thus, the increase in the coding information quantity of a genome is a measure of newly acquired function, and it determines the directionality of genome evolution.

    Accuracy and quality assessment of 454 GS-FLX Titanium pyrosequencing

    Background: The rapid evolution of 454 GS-FLX sequencing technology has not been accompanied by a reassessment of the quality and accuracy of the sequences obtained. Current strategies for decision-making and error-correction are based on an initial analysis by Huse et al. in 2007 for the older GS20 system, based on experimental sequences. We analyze here the quality of 454 sequencing data and identify factors playing a role in sequencing error, through the use of an extensive dataset for Roche control DNA fragments. Results: We obtained a mean error rate for 454 sequences of 1.07%. More importantly, the error rate is not randomly distributed; it occasionally rose to more than 50% in certain positions, and its distribution was linked to several experimental variables. The main factors related to error are the presence of homopolymers, position in the sequence, size of the sequence and spatial localization in PT plates for insertion and deletion errors. These factors can be described by considering seven variables. No single variable can account for the error rate distribution, but most of the variation is explained by the combination of all seven variables. Conclusions: The pattern identified here calls for the use of internal controls and error-correcting base callers, to correct for errors, when available (e.g. when sequencing amplicons). For shotgun libraries, the use of both sequencing primers and deep coverage, combined with the use of random sequencing primer sites, should partly compensate for even high error rates, although it may prove more difficult than previously thought to distinguish between low-frequency alleles and errors.

    Elusive Origins of the Extra Genes in Aspergillus oryzae

    The genome sequence of Aspergillus oryzae revealed unexpectedly that this species has approximately 20% more genes than its congeneric species A. nidulans and A. fumigatus. Where did these extra genes come from? Here, we evaluate several possible causes of the elevated gene number. Many gene families are expanded in A. oryzae relative to A. nidulans and A. fumigatus, but we find no evidence of ancient whole-genome duplication or other segmental duplications, either in A. oryzae or in the common ancestor of the genus Aspergillus. We show that the presence of divergent pairs of paralogs is a feature peculiar to A. oryzae and is not shared with A. nidulans or A. fumigatus. In phylogenetic trees that include paralog pairs from A. oryzae, we frequently find that one of the genes in a pair from A. oryzae has the expected orthologous relationship with A. nidulans, A. fumigatus and other species in the subphylum Eurotiomycetes, whereas the other A. oryzae gene falls outside this clade but still within the Ascomycota. We identified 456 such gene pairs in A. oryzae. Further phylogenetic analysis did not, however, indicate a single consistent evolutionary origin for the divergent members of these pairs. Approximately one-third of them showed phylogenies that are suggestive of horizontal gene transfer (HGT) from Sordariomycete species, and these genes are closer together in the A. oryzae genome than expected by chance, but no unique Sordariomycete donor species was identifiable. The postulated HGTs from Sordariomycetes still leave the majority of extra A. oryzae genes unaccounted for. One possible explanation for our observations is that A. oryzae might have been the recipient of many separate HGT events from diverse donors.

    Multichromosomal median and halving problems under different genomic distances

    Background: Genome median and genome halving are combinatorial optimization problems that aim at reconstructing ancestral genomes as well as the evolutionary events leading from the ancestor to extant species. Exploring complexity issues is a first step towards devising efficient algorithms. The complexity of the median problem for unichromosomal genomes (permutations) has been settled for both the breakpoint distance and the reversal distance. Although the multichromosomal case has often been assumed to be a simple generalization of the unichromosomal case, it is also a relaxation, so that complexity in this context does not follow from existing results and is open for all distances. Results: We settle here the complexity of several genome median and halving problems, including a surprising polynomial result for the breakpoint median and guided halving problems in genomes with circular and linear chromosomes, showing that the multichromosomal problem is actually easier than the unichromosomal problem. Still other variants of these problems are NP-complete, including the DCJ double distance problem, previously mentioned as an open question. We list the remaining open problems. Conclusion: This theoretical study clears up a wide swathe of the algorithmic study of genome rearrangements with multiple multichromosomal genomes.

    The fitness cost of mis-splicing is the main determinant of alternative splicing patterns

    Background: Most eukaryotic genes are subject to alternative splicing (AS), which may contribute to the production of protein variants or to the regulation of gene expression via nonsense-mediated messenger RNA (mRNA) decay (NMD). However, a fraction of splice variants might correspond to spurious transcripts and the question of the relative proportion of splicing errors to functional splice variants remains highly debated. Results: We propose a test to quantify the fraction of AS events corresponding to errors. This test is based on the fact that the fitness cost of splicing errors increases with the number of introns in a gene and with expression level. We analyzed the transcriptome of the intron-rich eukaryote Paramecium tetraurelia. We show that in both normal and in NMD-deficient cells, AS rates strongly decrease with increasing expression level and with increasing number of introns. This relationship is observed for AS events that are detectable by NMD as well as for those that are not, which invalidates the hypothesis of a link with the regulation of gene expression. Our results show that in genes with a median expression level, 92–98% of observed splice variants correspond to errors. We observed the same patterns in human transcriptomes and we further show that AS rates correlate with the fitness cost of splicing errors. Conclusions: These observations indicate that genes under weaker selective pressure accumulate more maladaptive substitutions and are more prone to splicing errors. Thus, to a large extent, patterns of gene expression variants simply reflect the balance between selection, mutation, and drift.

    Analysis of the P. lividus sea urchin genome highlights contrasting trends of genomic and regulatory evolution in deuterostomes

    Sea urchins are emblematic models in developmental biology and display several characteristics that set them apart from other deuterostomes. To uncover the genomic cues that may underlie these specificities, we generated a chromosome-scale genome assembly for the sea urchin Paracentrotus lividus and extensive gene expression and epigenetic profiles of its embryonic development. We found that, unlike vertebrates, sea urchins retained ancestral chromosomal linkages but underwent very fast intrachromosomal gene order mixing. We identified a burst of gene duplication in the echinoid lineage and showed that some of these expanded genes have been recruited in novel structures (water vascular system, Aristotle's lantern, and skeletogenic micromere lineage). Finally, we identified gene-regulatory modules conserved between sea urchins and chordates. Our results suggest that gene-regulatory networks controlling development can be conserved despite extensive gene order rearrangement.

    Viral to metazoan marine plankton nucleotide sequences from the Tara Oceans expedition

    A unique collection of oceanic samples was gathered by the Tara Oceans expeditions (2009-2013), targeting plankton organisms ranging from viruses to metazoans, and providing rich environmental context measurements. Thanks to recent advances in the field of genomics, extensive sequencing has been performed for a deep genomic analysis of this huge collection of samples. A strategy based on different approaches, such as metabarcoding, metagenomics, single-cell genomics and metatranscriptomics, has been chosen for analysis of size-fractionated plankton communities. Here, we provide detailed procedures applied for genomic data generation, from nucleic acid extraction to sequence production, and we describe registries of genomics datasets available at the European Nucleotide Archive (ENA, www.ebi.ac.uk/ena). The association of these metadata with the experimental procedures applied for their generation will help the scientific community to access these data and facilitate their analysis. This paper complements other efforts to provide a full description of experiments and open science resources generated from the Tara Oceans project, further extending their value for the study of the world's planktonic ecosystems.

    Distinct Gene Number-Genome Size Relationships for Eukaryotes and Non-Eukaryotes: Gene Content Estimation for Dinoflagellate Genomes

    The ability to predict gene content is highly desirable for characterization of not-yet sequenced genomes like those of dinoflagellates. Using data from completely sequenced and annotated genomes from phylogenetically diverse lineages, we investigated the relationship between gene content and genome size using regression analyses. Distinct relationships between log10-transformed protein-coding gene number (Y′) versus log10-transformed genome size (X′, genome size in kbp) were found for eukaryotes and non-eukaryotes. Eukaryotes best fit a logarithmic model, Y′ = ln(-46.200 + 22.678X′), whereas non-eukaryotes a linear model, Y′ = 0.045 + 0.977X′, both with high significance (p < 0.001, R² > 0.91). Total gene number shows similar trends in both groups to their respective protein coding regressions. The distinct correlations reflect lower and decreasing gene-coding percentages as genome size increases in eukaryotes (82%–1%) compared to higher and relatively stable percentages in prokaryotes and viruses (97%–47%). The eukaryotic regression models project that the smallest dinoflagellate genome (3×10⁶ kbp) contains 38,188 protein-coding (40,086 total) genes and the largest (245×10⁶ kbp) 87,688 protein-coding (92,013 total) genes, corresponding to 1.8% and 0.05% gene-coding percentages. These estimates do not likely represent extraordinarily high functional diversity of the encoded proteome but rather highly redundant genomes as evidenced by high gene copy numbers documented for various dinoflagellate species.
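
The two regression models quoted in the abstract above can be evaluated with a short sketch. This is only an illustration built from the rounded coefficients printed in the abstract; the function names are hypothetical, and the results need not reproduce the gene counts reported there, which presumably come from unrounded fits.

```python
import math


def eukaryote_log10_genes(log10_size_kbp: float) -> float:
    """Logarithmic model from the abstract: Y' = ln(-46.200 + 22.678 X')."""
    arg = -46.200 + 22.678 * log10_size_kbp
    if arg <= 0:
        # The natural log is undefined here, so the model does not apply
        # to genomes this small.
        raise ValueError("genome size is outside the domain of the model")
    return math.log(arg)


def non_eukaryote_log10_genes(log10_size_kbp: float) -> float:
    """Linear model from the abstract: Y' = 0.045 + 0.977 X'."""
    return 0.045 + 0.977 * log10_size_kbp


def predicted_gene_count(genome_size_kbp: float, eukaryote: bool) -> float:
    """Back-transform the predicted log10 gene number to a gene count."""
    x = math.log10(genome_size_kbp)  # X' = log10(genome size in kbp)
    if eukaryote:
        y = eukaryote_log10_genes(x)
    else:
        y = non_eukaryote_log10_genes(x)
    return 10 ** y  # Y' is log10 of the protein-coding gene number


if __name__ == "__main__":
    # Genome sizes in kbp; the two values are the dinoflagellate extremes
    # named in the abstract.
    for size in (3e6, 245e6):
        print(size, round(predicted_gene_count(size, eukaryote=True)))
```

Because the eukaryotic model takes a natural logarithm of a linear term, it is only defined when -46.200 + 22.678X′ is positive, which is why the sketch raises an error for very small genome sizes instead of returning a value.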