Characteristics of oligonucleotide frequencies across genomes: Conservation versus variation, strand symmetry, and evolutionary implications
One of the objectives of evolutionary genomics is to reveal the genetic information contained in the primordial genome (called the primary genetic information in this paper, with the primordial genome defined here as the most primitive nucleic acid genome of Earth’s life) by searching for primitive traits or relics that remain in modern genomes. The shorter a sequence is, the less likely it is to be modified during genome evolution, so some characteristics of very short nucleotide sequences have a considerable chance of persisting through billions of years of evolution. Consequently, certain genomic features of mononucleotides, dinucleotides, and higher-order oligonucleotides may be conserved across various genomes; some, if not all, of these features would be relics of the primary genetic information. Based on this assumption, we analyzed the frequency patterns of mononucleotides, dinucleotides, and higher-order oligonucleotides in the whole-genome sequences of 458 species (including archaea, bacteria, and eukaryotes). We also studied the phenomenon of strand symmetry in these genomes. The results show that the frequencies of some dinucleotides and higher-order oligonucleotides are indeed conserved across genomes, and that strand symmetry is a ubiquitous and explicit phenomenon that may contribute to frequency conservation. We propose a new hypothesis for the origin of strand symmetry and frequency conservation as well as for the constitution of early genomes. We conclude that strand symmetry and the pattern of frequency conservation are likely original features of the primary genetic information
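To make the two quantities in this abstract concrete, the following is a minimal Python sketch of how oligonucleotide (k-mer) frequencies and a simple strand-symmetry score could be computed for one sequence. It is an illustration only, not the authors' pipeline: the function names and the asymmetry score (summed absolute difference between the frequency of each k-mer and that of its reverse complement) are assumptions chosen for clarity.

```python
from collections import Counter

# Translation table mapping each base to its complement.
COMPLEMENT = str.maketrans("ACGT", "TGCA")
BASES = set("ACGT")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string (A, C, G, T only)."""
    return seq.translate(COMPLEMENT)[::-1]


def kmer_frequencies(seq, k):
    """Relative frequencies of overlapping k-mers on a single strand.

    Windows containing anything other than A, C, G, T (e.g. N) are skipped.
    """
    seq = seq.upper()
    counts = Counter()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if set(kmer) <= BASES:
            counts[kmer] += 1
    total = sum(counts.values())
    return {kmer: n / total for kmer, n in counts.items()} if total else {}


def strand_asymmetry(seq, k):
    """Hypothetical asymmetry score in [0, 1]; 0 means perfect strand symmetry.

    Sums |f(w) - f(rc(w))| over unordered pairs of a k-mer w and its
    reverse complement rc(w), using single-strand frequencies.
    """
    freqs = kmer_frequencies(seq, k)
    kmers = set(freqs) | {reverse_complement(m) for m in freqs}
    diff = sum(
        abs(freqs.get(m, 0.0) - freqs.get(reverse_complement(m), 0.0))
        for m in kmers
    )
    return diff / 2  # each pair was visited twice
```

On a whole genome, a score near zero for small k would correspond to the strand symmetry described above; comparing the frequency dictionaries across species would be one way to look for conserved oligonucleotide frequencies.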
A Primer on Metagenomics
Metagenomics is a discipline that enables the genomic study of uncultured microorganisms. Faster, cheaper sequencing technologies and the ability to sequence uncultured microbes sampled directly from their habitats are expanding and transforming our view of the microbial world. Distilling meaningful information from the millions of new genomic sequences presents a serious challenge to bioinformaticians. In cultured microbes, the genomic data come from a single clone, making sequence assembly and annotation tractable. In metagenomics, the data come from heterogeneous microbial communities, sometimes containing more than 10,000 species, with the sequence data being noisy and partial. From sampling, to assembly, to gene calling and function prediction, bioinformatics faces new demands in interpreting voluminous, noisy, and often partial sequence data. Although metagenomics is a relative newcomer to science, the past few years have seen an explosion in computational methods applied to metagenomic-based research. It is therefore not within the scope of this article to provide an exhaustive review. Rather, we provide here a concise yet comprehensive introduction to the current computational requirements presented by metagenomics, and review the recent progress made. We also note whether there is software that implements any of the methods presented here, and briefly review its utility. Nevertheless, it would be useful if readers of this article would avail themselves of the comment section provided by this journal, and relate their own experiences. Finally, the last section of this article provides a few representative studies illustrating different facets of recent scientific discoveries made using metagenomics
Comparative genomics and phylogenomic investigation of the class Geoglossomycetes provide insights into ecological specialization and the systematics of Pezizomycotina
Despite their global presence and ubiquity, members of the class Geoglossomycetes (Pezizomycotina, Ascomycota) are understudied systematically and ecologically. These fungi have long been presumed saprobic due to their occurrence in or near leaf litter and soils. Additionally, they lack an apparent association with other organisms, reinforcing this perception. However, observations of sporocarps near ericaceous shrubs have given rise to an alternative hypothesis that members of Geoglossomycetes may form ericoid mycorrhizae or ectomycorrhizae. This claim, however, has yet to be confirmed via microscopy or amplicon-based studies examining root communities. As a result, our current understanding of their ecology is based on cursory observations. This study presents a comparative analysis of genomic signatures related to ecological niche to investigate the hypothesis of an ericoid mycorrhizal or ectomycorrhizal ecology in the class. We compared the carbohydrate-active enzyme (CAZyme) and secondary metabolite contents of six newly sequenced Geoglossomycetes genomes with those of fungi representing specific ecologies across Pezizomycotina. Our analysis reveals CAZyme and secondary metabolite content patterns consistent with ectomycorrhizal (EcM) members of Pezizomycotina. Specifically, we found a reduction in CAZyme-encoding genes and secondary metabolite clusters that suggests a mutualistic ecology. Our work includes the broadest taxon sampling for a phylogenomic study of Pezizomycotina to date. It represents the first functional genomic and genome-scale phylogenetic study of the class Geoglossomycetes and improves the foundational knowledge of the ecology and evolution of these understudied fungi
Genotype–phenotype correlations within the Geodermatophilaceae
The integration of genomic information into microbial systematics along with physiological and chemotaxonomic parameters provides for a reliable classification of prokaryotes. In silico analysis of chemotaxonomic traits is now being introduced to replace characteristics traditionally determined in the laboratory with the dual goal of both increasing the speed of the description of taxa and the accuracy and consistency of taxonomic reports. Genomics has already successfully been applied in the taxonomic rearrangement of Geodermatophilaceae (Actinomycetota) but in the light of new genomic data the taxonomy of the family needs to be revisited. In conjunction with the taxonomic characterisation of four strains phylogenetically located within the family, we conducted a phylogenetic analysis of the whole proteomes of the sequenced type strains and established genotype–phenotype correlations for traits related to chemotaxonomy, cell morphology and metabolism. Results indicated that the four isolates under study represent four novel species within the genus Blastococcus. Additionally, the genera Blastococcus, Geodermatophilus and Modestobacter were shown to be paraphyletic. Consequently, the new genera Trujillonella, Pleomorpha and Goekera were proposed within the Geodermatophilaceae and Blastococcus endophyticus was reclassified as Trujillonella endophytica comb. nov., Geodermatophilus daqingensis as Pleomorpha daqingensis comb. nov. and Modestobacter deserti as Goekera deserti comb. nov. Accordingly, we also proposed emended descriptions of Blastococcus aggregatus, Blastococcus jejuensis, Blastococcus saxobsidens and Blastococcus xanthilyniticus. In silico chemotaxonomic results were overall consistent with wet-lab results. Even though in silico discriminatory levels varied depending on the respective chemotaxonomic trait, this approach is promising for effectively replacing and/or complementing chemotaxonomic analyses at taxonomic ranks above the species level. 
Finally, interesting but previously overlooked insights regarding morphology and ecology were revealed by the presence of a repertoire of genes related to flagellum synthesis, chemotaxis, spore production and pilus assembly in all representatives of the family. A rich carbon metabolism including four different CO2 fixation pathways and a battery of enzymes able to degrade complex carbohydrates were also identified in Blastococcus genomes
Transcriptome Complexities Across Eukaryotes
Genomic complexity is a growing field of evolution, with case studies for
comparative evolutionary analyses in model and emerging non-model systems.
Understanding complexity and the functional components of the genome is an
untapped wealth of knowledge ripe for exploration. With the "remarkable lack of
correspondence" between genome size and complexity, there needs to be a way to
quantify complexity across organisms. In this study we use a set of complexity
metrics that allow for evaluation of changes in complexity using TranD. We
ascertain if complexity is increasing or decreasing across transcriptomes and
at what structural level, as complexity is varied. We define three metrics --
TpG, EpT, and EpG in this study to quantify the complexity of the transcriptome
that encapsulate the dynamics of alternative splicing. Here we compare
complexity metrics across 1) whole genome annotations, 2) a filtered subset of
orthologs, and 3) novel genes to elucidate the impacts of ortholog and novel
genes in transcriptome analysis. We also derive a metric from Hong et al.,
2006, Effective Exon Number (EEN), to compare the distribution of exon sizes
within transcripts against random expectations of uniform exon placement. EEN
accounts for differences in exon size, which is important because novel genes
differences in complexity for orthologs and whole transcriptome analyses are
biased towards low complexity genes with few exons and few alternative
transcripts. With our metric analyses, we are able to implement changes in
complexity across diverse lineages with greater precision and accuracy than
previous cross-species comparisons under ortholog conditioning. These analyses
represent a step forward toward whole transcriptome analysis in the emerging
field of non-model evolutionary genomics, with key insights for evolutionary
inference of complexity changes on deep timescales across the tree of life. We
suggest a means to quantify biases generated in ortholog calling and correct
complexity analysis for lineage-specific effects. With these metrics, we
directly assay the quantitative properties of newly formed lineage-specific
genes as they lower complexity in transcriptomes.
Comment: 33 pages main text; 6 main figures; 25 pages of supplement; 1
supplementary table; 24 Supp Figures; 58 pages tota
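The three metrics named in this abstract can be illustrated with a short Python sketch. The abstract does not spell out the abbreviations, so the code assumes that TpG means transcripts per gene, EpT means exons per transcript, and EpG means unique exons per gene. The nested-dictionary annotation format and the function name are hypothetical, and this is not the TranD implementation.

```python
def complexity_metrics(annotation):
    """Compute assumed TpG, EpT and EpG averages for a toy annotation.

    `annotation` maps gene id -> {transcript id -> list of (start, end) exons}.
    Assumed definitions (not confirmed by the abstract):
      TpG = transcripts per gene
      EpT = exons per transcript
      EpG = unique exons per gene
    """
    n_genes = len(annotation)
    n_transcripts = 0
    n_transcript_exons = 0
    n_unique_exons = 0

    for transcripts in annotation.values():
        n_transcripts += len(transcripts)
        unique = set()
        for exons in transcripts.values():
            n_transcript_exons += len(exons)
            unique.update(exons)  # exons shared between transcripts count once
        n_unique_exons += len(unique)

    if n_genes == 0 or n_transcripts == 0:
        raise ValueError("annotation must contain at least one transcript")

    return {
        "TpG": n_transcripts / n_genes,
        "EpT": n_transcript_exons / n_transcripts,
        "EpG": n_unique_exons / n_genes,
    }


# Toy example: one gene with two alternative transcripts, one single-exon gene.
toy = {
    "geneA": {
        "A.t1": [(1, 100), (200, 300)],
        "A.t2": [(1, 100), (400, 500)],
    },
    "geneB": {"B.t1": [(10, 50)]},
}
metrics = complexity_metrics(toy)
```

Under these assumed definitions, a set of genes dominated by single-exon, single-transcript entries pulls all three averages toward 1, which matches the abstract's description of novel genes lowering transcriptome complexity.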
LTR Retrotransposons Contribute to Genomic Gigantism in Plethodontid Salamanders
Among vertebrates, most of the largest genomes are found within the salamanders, a clade of amphibians that includes 613 species. Salamander genome sizes range from ∼14 to ∼120 Gb. Because genome size is correlated with nucleus and cell sizes, as well as other traits, morphological evolution in salamanders has been profoundly affected by genomic gigantism. However, the molecular mechanisms driving genomic expansion in this clade remain largely unknown. Here, we present the first comparative analysis of transposable element (TE) content in salamanders. Using high-throughput sequencing, we generated genomic shotgun data for six species from the Plethodontidae, the largest family of salamanders. We then developed a pipeline to mine TE sequences from shotgun data in taxa with limited genomic resources, such as salamanders. Our summaries of overall TE abundance and diversity for each species demonstrate that TEs make up a substantial portion of salamander genomes, and that all of the major known types of TEs are represented in salamanders. The most abundant TE superfamilies found in the genomes of our six focal species are similar, despite substantial variation in genome size. However, our results demonstrate a major difference between salamanders and other vertebrates: salamander genomes contain much larger amounts of long terminal repeat (LTR) retrotransposons, primarily Ty3/gypsy elements. Thus, the extreme increase in genome size that occurred in salamanders was likely accompanied by a shift in TE landscape. These results suggest that increased proliferation of LTR retrotransposons was a major molecular mechanism contributing to genomic expansion in salamanders
Bridging the gap between omics and earth system science to better understand how environmental change impacts marine microbes
The advent of genomic-, transcriptomic- and proteomic-based approaches has revolutionized our ability to describe marine microbial communities, including biogeography, metabolic potential and diversity, mechanisms of adaptation, and phylogeny and evolutionary history. New interdisciplinary approaches are needed to move from this descriptive level to improved quantitative, process-level understanding of the roles of marine microbes in biogeochemical cycles and of the impact of environmental change on the marine microbial ecosystem. Linking studies at levels from the genome to the organism, to ecological strategies and organism and ecosystem response, requires new modelling approaches. Key to this will be a fundamental shift in modelling scale that represents micro-organisms from the level of their macromolecular components. This will enable contact with omics data sets and allow acclimation and adaptive response at the phenotype level (i.e. traits) to be simulated as a combination of fitness maximization and evolutionary constraints. This way forward will build on ecological approaches that identify key organism traits and systems biology approaches that integrate traditional physiological measurements with new insights from omics. It will rely on developing an improved understanding of ecophysiology to understand quantitatively environmental controls on microbial growth strategies. It will also incorporate results from experimental evolution studies in the representation of adaptation. The resulting ecosystem-level models can then evaluate our level of understanding of controls on ecosystem structure and function, highlight major gaps in understanding and help prioritize areas for future research programs. Ultimately, this grand synthesis should improve predictive capability of the ecosystem response to multiple environmental drivers