    Characteristics of oligonucleotide frequencies across genomes: Conservation versus variation, strand symmetry, and evolutionary implications

    One of the objectives of evolutionary genomics is to reveal the genetic information contained in the primordial genome (called the primary genetic information in this paper, with the primordial genome defined here as the most primitive nucleic acid genome for life on Earth) by searching for primitive traits or relics that remain in modern genomes. The shorter a sequence is, the less likely it is to be modified during genome evolution. For that reason, some characteristics of very short nucleotide sequences have a considerable chance of persisting through billions of years of evolution. Consequently, certain genomic features of mononucleotides, dinucleotides, and higher-order oligonucleotides may be conserved across various genomes; some, if not all, of these features would be relics of the primary genetic information. Based on this assumption, we analyzed the pattern of frequencies of mononucleotides, dinucleotides, and higher-order oligonucleotides in the whole-genome sequences of 458 species (including archaea, bacteria, and eukaryotes). We also studied the phenomenon of strand symmetry in these genomes. The results show that the frequencies of some dinucleotides and higher-order oligonucleotides are indeed conserved across genomes, and that strand symmetry is a ubiquitous and explicit phenomenon that may contribute to frequency conservation. We propose a new hypothesis for the origin of strand symmetry and frequency conservation as well as for the constitution of early genomes. We conclude that strand symmetry and the pattern of frequency conservation would be original features of the primary genetic information.
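
    The two quantities this abstract relies on, oligonucleotide (k-mer) frequencies and strand symmetry, can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the function names, the decision to skip windows containing ambiguous bases, and the asymmetry score (half the summed absolute difference between the frequency of each k-mer and that of its reverse complement on the same strand) are assumptions chosen for illustration.

```python
from collections import Counter

# Translation table for complementing unambiguous DNA bases.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def kmer_frequencies(seq, k):
    """Relative frequency of each overlapping k-mer on the given strand.

    Windows containing anything other than A, C, G, T are skipped
    (an illustrative choice, not taken from the paper).
    """
    seq = seq.upper()
    valid = set("ACGT")
    counts = Counter()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if set(kmer) <= valid:
            counts[kmer] += 1
    total = sum(counts.values())
    return {kmer: n / total for kmer, n in counts.items()} if total else {}


def strand_asymmetry(seq, k):
    """Score in [0, 1]: 0 means every k-mer is exactly as frequent as its
    reverse complement on the same strand; 1 means no overlap at all."""
    freqs = kmer_frequencies(seq, k)
    kmers = set(freqs) | {reverse_complement(w) for w in freqs}
    diff = sum(
        abs(freqs.get(w, 0.0) - freqs.get(reverse_complement(w), 0.0))
        for w in kmers
    )
    return 0.5 * diff
```

    Applied to a whole-genome sequence with k = 1, 2, 3 and so on, a score near zero would correspond to the strand symmetry described above, and comparing the frequency dictionaries of different genomes is one simple way to look for conserved k-mer frequencies.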

    A Primer on Metagenomics

    Metagenomics is a discipline that enables the genomic study of uncultured microorganisms. Faster, cheaper sequencing technologies and the ability to sequence uncultured microbes sampled directly from their habitats are expanding and transforming our view of the microbial world. Distilling meaningful information from the millions of new genomic sequences presents a serious challenge to bioinformaticians. In cultured microbes, the genomic data come from a single clone, making sequence assembly and annotation tractable. In metagenomics, the data come from heterogeneous microbial communities, sometimes containing more than 10,000 species, with the sequence data being noisy and partial. From sampling, to assembly, to gene calling and function prediction, bioinformatics faces new demands in interpreting voluminous, noisy, and often partial sequence data. Although metagenomics is a relative newcomer to science, the past few years have seen an explosion in computational methods applied to metagenomic-based research. It is therefore not within the scope of this article to provide an exhaustive review. Rather, we provide here a concise yet comprehensive introduction to the current computational requirements presented by metagenomics, and review the recent progress made. We also note whether there is software that implements any of the methods presented here, and briefly review its utility. Nevertheless, it would be useful if readers of this article would avail themselves of the comment section provided by this journal, and relate their own experiences. Finally, the last section of this article provides a few representative studies illustrating different facets of recent scientific discoveries made using metagenomics.

    Genotype–phenotype correlations within the Geodermatophilaceae

    The integration of genomic information into microbial systematics along with physiological and chemotaxonomic parameters provides for a reliable classification of prokaryotes. In silico analysis of chemotaxonomic traits is now being introduced to replace characteristics traditionally determined in the laboratory with the dual goal of both increasing the speed of the description of taxa and the accuracy and consistency of taxonomic reports. Genomics has already successfully been applied in the taxonomic rearrangement of Geodermatophilaceae (Actinomycetota) but in the light of new genomic data the taxonomy of the family needs to be revisited. In conjunction with the taxonomic characterisation of four strains phylogenetically located within the family, we conducted a phylogenetic analysis of the whole proteomes of the sequenced type strains and established genotype–phenotype correlations for traits related to chemotaxonomy, cell morphology and metabolism. Results indicated that the four isolates under study represent four novel species within the genus Blastococcus. Additionally, the genera Blastococcus, Geodermatophilus and Modestobacter were shown to be paraphyletic. Consequently, the new genera Trujillonella, Pleomorpha and Goekera were proposed within the Geodermatophilaceae and Blastococcus endophyticus was reclassified as Trujillonella endophytica comb. nov., Geodermatophilus daqingensis as Pleomorpha daqingensis comb. nov. and Modestobacter deserti as Goekera deserti comb. nov. Accordingly, we also proposed emended descriptions of Blastococcus aggregatus, Blastococcus jejuensis, Blastococcus saxobsidens and Blastococcus xanthilyniticus. In silico chemotaxonomic results were overall consistent with wet-lab results. Even though in silico discriminatory levels varied depending on the respective chemotaxonomic trait, this approach is promising for effectively replacing and/or complementing chemotaxonomic analyses at taxonomic ranks above the species level. 
    Finally, interesting but previously overlooked insights regarding morphology and ecology were revealed by the presence of a repertoire of genes related to flagellum synthesis, chemotaxis, spore production and pilus assembly in all representatives of the family. A rich carbon metabolism including four different CO2 fixation pathways and a battery of enzymes able to degrade complex carbohydrates were also identified in Blastococcus genomes.

    Transcriptome Complexities Across Eukaryotes

    Genomic complexity is a growing field of study in evolution, with case studies for comparative evolutionary analyses in model and emerging non-model systems. Understanding complexity and the functional components of the genome is an untapped wealth of knowledge ripe for exploration. With the "remarkable lack of correspondence" between genome size and complexity, there needs to be a way to quantify complexity across organisms. In this study we use a set of complexity metrics, computed with TranD, that allow changes in complexity to be evaluated. We ascertain whether complexity is increasing or decreasing across transcriptomes and at what structural level complexity varies. We define three metrics, TpG, EpT, and EpG, that quantify the complexity of the transcriptome and encapsulate the dynamics of alternative splicing. Here we compare complexity metrics across 1) whole genome annotations, 2) a filtered subset of orthologs, and 3) novel genes to elucidate the impacts of ortholog and novel genes in transcriptome analysis. We also derive a metric from Hong et al., 2006, Effective Exon Number (EEN), to compare the distribution of exon sizes within transcripts against random expectations of uniform exon placement. EEN accounts for differences in exon size, which is important because novel genes are biased towards low-complexity genes with few exons and few alternative transcripts, which produces differences in complexity between ortholog and whole-transcriptome analyses. With our metric analyses, we are able to assess changes in complexity across diverse lineages with greater precision and accuracy than previous cross-species comparisons under ortholog conditioning. These analyses represent a step forward toward whole transcriptome analysis in the emerging field of non-model evolutionary genomics, with key insights for evolutionary inference of complexity changes on deep timescales across the tree of life.
    We suggest a means to quantify biases generated in ortholog calling and correct complexity analysis for lineage-specific effects. With these metrics, we directly assay the quantitative properties of newly formed lineage-specific genes as they lower complexity in transcriptomes. Comment: 33 pages main text; 6 main figures; 25 pages of supplement; 1 supplementary table; 24 Supp Figures; 58 pages total
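
    The abstract names TpG, EpT, and EpG without spelling them out. The Python sketch below assumes they stand for transcripts per gene, exons per transcript, and unique exons per gene, and reports the mean of each over a toy annotation. The input layout (gene ID to transcript ID to a list of exon coordinate pairs) and the function name are hypothetical and do not reflect TranD's actual interface or output.

```python
from statistics import mean


def complexity_metrics(annotation):
    """Mean TpG, EpT, and EpG for a non-empty annotation.

    `annotation` maps gene ID -> {transcript ID -> [(start, end), ...]}.
    This layout is a hypothetical stand-in for a parsed GTF/GFF file.
    """
    tpg, ept, epg = [], [], []
    for transcripts in annotation.values():
        # Assumed TpG: number of annotated transcripts for this gene.
        tpg.append(len(transcripts))
        unique_exons = set()
        for exons in transcripts.values():
            # Assumed EpT: number of exons in this transcript.
            ept.append(len(exons))
            unique_exons.update(exons)
        # Assumed EpG: distinct exons (by coordinates) across the gene.
        epg.append(len(unique_exons))
    return {"TpG": mean(tpg), "EpT": mean(ept), "EpG": mean(epg)}


# Toy data: geneA has two transcripts sharing their first exon;
# geneB is a single-exon, single-transcript gene.
toy_annotation = {
    "geneA": {
        "A.t1": [(1, 100), (200, 300)],
        "A.t2": [(1, 100), (400, 500)],
    },
    "geneB": {
        "B.t1": [(10, 50)],
    },
}

metrics = complexity_metrics(toy_annotation)
```

    Under these assumed definitions, a set of genes with few exons and few alternative transcripts, like geneB here, pulls all three means down, which is consistent with the abstract's remark that novel genes lower transcriptome complexity.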

    LTR Retrotransposons Contribute to Genomic Gigantism in Plethodontid Salamanders

    Among vertebrates, most of the largest genomes are found within the salamanders, a clade of amphibians that includes 613 species. Salamander genome sizes range from ∼14 to ∼120 Gb. Because genome size is correlated with nucleus and cell sizes, as well as other traits, morphological evolution in salamanders has been profoundly affected by genomic gigantism. However, the molecular mechanisms driving genomic expansion in this clade remain largely unknown. Here, we present the first comparative analysis of transposable element (TE) content in salamanders. Using high-throughput sequencing, we generated genomic shotgun data for six species from the Plethodontidae, the largest family of salamanders. We then developed a pipeline to mine TE sequences from shotgun data in taxa with limited genomic resources, such as salamanders. Our summaries of overall TE abundance and diversity for each species demonstrate that TEs make up a substantial portion of salamander genomes, and that all of the major known types of TEs are represented in salamanders. The most abundant TE superfamilies found in the genomes of our six focal species are similar, despite substantial variation in genome size. However, our results demonstrate a major difference between salamanders and other vertebrates: salamander genomes contain much larger amounts of long terminal repeat (LTR) retrotransposons, primarily Ty3/gypsy elements. Thus, the extreme increase in genome size that occurred in salamanders was likely accompanied by a shift in TE landscape. These results suggest that increased proliferation of LTR retrotransposons was a major molecular mechanism contributing to genomic expansion in salamanders.

    Bridging the gap between omics and earth system science to better understand how environmental change impacts marine microbes

    The advent of genomic-, transcriptomic- and proteomic-based approaches has revolutionized our ability to describe marine microbial communities, including biogeography, metabolic potential and diversity, mechanisms of adaptation, and phylogeny and evolutionary history. New interdisciplinary approaches are needed to move from this descriptive level to improved quantitative, process-level understanding of the roles of marine microbes in biogeochemical cycles and of the impact of environmental change on the marine microbial ecosystem. Linking studies at levels from the genome to the organism, to ecological strategies and organism and ecosystem response, requires new modelling approaches. Key to this will be a fundamental shift in modelling scale that represents micro-organisms from the level of their macromolecular components. This will enable contact with omics data sets and allow acclimation and adaptive response at the phenotype level (i.e. traits) to be simulated as a combination of fitness maximization and evolutionary constraints. This way forward will build on ecological approaches that identify key organism traits and systems biology approaches that integrate traditional physiological measurements with new insights from omics. It will rely on developing an improved understanding of ecophysiology to understand quantitatively environmental controls on microbial growth strategies. It will also incorporate results from experimental evolution studies in the representation of adaptation. The resulting ecosystem-level models can then evaluate our level of understanding of controls on ecosystem structure and function, highlight major gaps in understanding and help prioritize areas for future research programs. Ultimately, this grand synthesis should improve predictive capability of the ecosystem response to multiple environmental drivers.