84 research outputs found

    Graph mining for next generation sequencing: leveraging the assembly graph for biological insights

    Total genome length estimation for NG50. This spreadsheet file contains the calculation of the average genome lengths for the complete reference sequences available through the NCBI RefSeq database for the most abundant genera in the Crohn’s and healthy data sets. The estimated total genome length is also calculated in this file. (XLSX, 21 kB)
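    NG50 differs from N50 only in its denominator. It uses an estimated genome length instead of the assembly's own total length, which is why the spreadsheet above estimates genome lengths. The minimal Python sketch below follows the standard definition. The function names are hypothetical. The way the per-genus averages are combined into one total is not specified here, so `genome_length` is simply passed in as an assumed input.

```python
from typing import Iterable, Optional


def mean_length(lengths: Iterable[int]) -> float:
    """Average genome length (bp) over a set of complete reference sequences."""
    values = list(lengths)
    if not values:
        raise ValueError("need at least one reference length")
    return sum(values) / len(values)


def ng50(contig_lengths: Iterable[int], genome_length: float) -> Optional[int]:
    """Smallest contig length L such that contigs of length >= L cover at
    least half of the *estimated genome length* (not the assembly length).

    `genome_length` is an assumed, externally estimated value.
    Returns None if the whole assembly covers less than 50% of it.
    """
    covered = 0
    for length in sorted(contig_lengths, reverse=True):
        covered += length
        # Integer-safe test for covered >= genome_length / 2
        if 2 * covered >= genome_length:
            return length
    return None
```

    Passing the assembly's own total length as `genome_length` yields the ordinary N50. NG50 keeps the yardstick fixed across assemblies of different total size.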

    metaGEM: reconstruction of genome scale metabolic models directly from metagenomes

    Metagenomic analyses of microbial communities have revealed a large degree of interspecies and intraspecies genetic diversity through the reconstruction of metagenome-assembled genomes (MAGs). Yet metabolic modeling efforts mainly rely on reference genomes as the starting point for the reconstruction and simulation of genome-scale metabolic models (GEMs), neglecting the immense intra- and inter-species diversity present in microbial communities. Here, we present metaGEM (https://github.com/franciscozorrilla/metaGEM), an end-to-end pipeline enabling metabolic modeling of multi-species communities directly from metagenomes. The pipeline automates all steps, from the extraction of context-specific prokaryotic GEMs from MAGs to community-level flux balance analysis (FBA) simulations. To demonstrate the capabilities of metaGEM, we analyzed 483 samples spanning lab culture, human gut, plant-associated, soil, and ocean metagenomes, reconstructing over 14,000 GEMs. We show that GEMs reconstructed from metagenomes have fully represented metabolism, comparable to that of isolate genomes. We demonstrate that metagenomic GEMs capture intraspecies metabolic diversity and identify potential differences in the progression of type 2 diabetes at the level of gut bacterial metabolic exchanges. Overall, metaGEM enables FBA-ready metabolic model reconstruction directly from metagenomes, provides a resource of metabolic models, and showcases community-level modeling of microbiomes associated with disease conditions, allowing the generation of mechanistic hypotheses.
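    The flux balance analysis that these GEMs are prepared for is conventionally posed as a linear program over a stoichiometric matrix. The standard single-model form is shown below for orientation. The symbols are introduced here for illustration and are not taken from the metaGEM paper. $S$ is the $m \times n$ stoichiometric matrix ($m$ metabolites, $n$ reactions), $v$ is the flux vector, and $c$ holds the objective weights (often selecting a biomass reaction). Community-level FBA adds coupling constraints between models that are not shown here.

```latex
\begin{aligned}
\max_{v \in \mathbb{R}^{n}} \quad & c^{\top} v \\
\text{subject to} \quad & S\,v = 0 \qquad \text{(steady state for every metabolite)} \\
& v_j^{\min} \le v_j \le v_j^{\max}, \qquad j = 1, \dots, n
\end{aligned}
```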

    Measuring the invisible – The sequences causal of genome size differences in eyebrights (Euphrasia) revealed by k-mers

    Genome size variation within plant taxa is due to presence/absence variation, which may affect low-copy sequences or genomic repeats of various frequency classes. However, identifying the sequences underpinning genome size variation is challenging because genome assemblies commonly contain collapsed representations of repetitive sequences and because genome skimming studies by design miss low-copy number sequences. Here, we take a novel approach based on k-mers, short sub-sequences of equal length k, generated from whole-genome sequencing data of diploid eyebrights (Euphrasia), a group of plants that have considerable genome size variation within a ploidy level. We compare k-mer inventories within and between closely related species, and quantify the contribution of different copy number classes to genome size differences. We further match high-copy number k-mers to specific repeat types as retrieved from the RepeatExplorer2 pipeline. We find genome size differences of up to 230 Mbp, equivalent to more than 20% genome size variation. The largest contributions to these differences come from rDNA sequences, a 145-nt genomic satellite and a repeat associated with an Angela transposable element. We also find size differences in the low-copy number class (copy number ≤ 10×) of up to 27 Mbp, possibly indicating differences in gene space between our samples. We demonstrate that it is possible to pinpoint the sequences causing genome size variation within species without the use of a reference genome. Such sequences can serve as targets for future cytogenetic studies. We also show that studies of genome size variation should go beyond repeats if they aim to characterise the full range of genomic variants. To allow future work with other taxonomic groups, we share our k-mer analysis pipeline, which is straightforward to run, relying largely on standard GNU command line tools.
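    The k-mer approach can be illustrated with a small Python sketch. It is not the authors' pipeline, which relies on GNU command-line tools. The sketch makes three assumptions:

    - The per-copy k-mer coverage `coverage` is already known, for example from the main peak of a k-mer histogram.
    - Canonical k-mers are counted, so both strands collapse together.
    - Each counted k-mer instance divided by `coverage` stands for roughly one base pair of genome.

    Under these assumptions, a copy-number class contributes the sum of its counts divided by `coverage`. The 10× low-copy cut-off mirrors the class boundary quoted above. All function names are hypothetical.

```python
from collections import Counter
from typing import Iterable, Mapping, Tuple

VALID = set("ACGT")
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def canonical(kmer: str) -> str:
    """Lexicographically smaller of a k-mer and its reverse complement."""
    return min(kmer, kmer.translate(COMPLEMENT)[::-1])


def count_kmers(reads: Iterable[str], k: int) -> Counter:
    """Count canonical k-mers, skipping windows with non-ACGT bases (e.g. N)."""
    counts: Counter = Counter()
    for read in reads:
        read = read.upper()
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            if set(kmer) <= VALID:
                counts[canonical(kmer)] += 1
    return counts


def size_by_copy_class(counts: Mapping[str, int], coverage: float,
                       low_copy_max: float = 10.0) -> Tuple[float, float]:
    """Split the k-mer-based genome size estimate (bp) into a low-copy
    (copy number <= low_copy_max) and a high-copy contribution.

    `coverage` (per-copy k-mer depth) is an assumed input.
    """
    low = high = 0.0
    for count in counts.values():
        # Estimated copy number of this k-mer; each copy occupies about one
        # base position, so this is also its contribution in bp.
        copies = count / coverage
        if copies <= low_copy_max:
            low += copies
        else:
            high += copies
    return low, high
```

    Comparing two samples then reduces to subtracting their per-class totals. Real data sets would use a dedicated k-mer counter rather than a Python `Counter`, but the accounting is the same.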

    Metagenomics reveals global-scale contrasts in nitrogen cycling and cyanobacterial light-harvesting mechanisms in glacier cryoconite

    BACKGROUND: Cryoconite granules are mineral–microbial aggregates found on glacier surfaces worldwide and are hotspots of biogeochemical reactions in glacier ecosystems. However, despite their importance within glacier ecosystems, the geographical diversity of taxonomic assemblages and metabolic potential of cryoconite communities around the globe remain unclear. In particular, the genomic content of cryoconite communities on Asia’s high mountain glaciers, which represent a substantial portion of Earth’s ice masses, has rarely been reported. Therefore, in this study, to elucidate the taxonomic and ecological diversities of cryoconite bacterial consortia on a global scale, we conducted shotgun metagenomic sequencing of cryoconite acquired from a range of geographical areas comprising Polar (Arctic and Antarctic) and Asian alpine regions. RESULTS: Our metagenomic data indicate that the compositions of both bacterial taxa and functional genes are particularly distinctive for Asian cryoconite. Read abundances of denitrification genes were significantly higher in Asian cryoconite than in Polar cryoconite, implying that denitrification is more enhanced in Asian glaciers. The taxonomic composition of Cyanobacteria, the key primary producers in cryoconite communities, also differs between the Polar and Asian samples. Analyses of the metagenome-assembled genomes and fluorescence emission spectra reveal that Asian cryoconite is dominated by multiple cyanobacterial lineages possessing phycoerythrin, a green light-harvesting component for photosynthesis. In contrast, Polar cryoconite is dominated by a single cyanobacterial species, Phormidesmis priestleyi, that does not possess phycoerythrin. These findings suggest that the assemblage of cryoconite bacterial communities responds to regional- or glacier-specific physicochemical conditions, such as the availability of nutrients (e.g., nitrate and dissolved organic carbon) and light (i.e., incident shortwave radiation). CONCLUSIONS: Our genome-resolved metagenomics provides the first characterization of the taxonomic and metabolic diversities of cryoconite from contrasting geographical areas. This characterization is highlighted by the distinct light-harvesting approaches of Cyanobacteria and the different nitrogen utilization between Polar and Asian cryoconite, and it implies the existence of environmental controls on the assemblage of cryoconite communities. These findings deepen our understanding, on a global scale, of the biodiversity and biogeochemical cycles of glacier ecosystems, which are susceptible to ongoing climate change and glacier decline. SUPPLEMENTARY INFORMATION: The online version contains supplementary material available at 10.1186/s40168-022-01238-7

    Novel workflow for metagenomics and transcriptomics analysis of A.D. systems

    Anaerobic digestion (A.D.) systems, when used in biogas reactors, are an advanced and ecological way to produce energy while treating waste. Most of the reactor's microbial community remains unknown to this day, because most of the bacteria cannot be cultured individually. Metagenomics and transcriptomics aim to discover these bacteria and to understand the interactions within the community. High-throughput sequencing (HTS) technologies open new possibilities in terms of read length and accuracy. Oxford Nanopore sequencers produce long reads with slightly lower accuracy than other platforms, whereas Illumina sequencers achieve higher accuracy at the cost of read length. The two sequencing methods complement each other: hybrid assembly uses both long and short reads to create longer and more accurate contigs that can then be analysed further. Here we present a metagenomics pipeline, MUFFIN, based on the hybrid assembly of short and long reads. Assembly is followed by multiple differential binning methods and refinement to produce high-quality bins and their annotations. The pipeline is written in Nextflow to achieve high reproducibility and fast, straightforward use. It also produces the taxonomic classification of the bins, as well as the quantification and annotation of transcripts from RNA-seq data. The pipeline was tested on one biogas reactor as an example, to assess the capacity of MUFFIN to process the data and output the files needed to analyse the microbial community and its function. A parsing script was developed to analyse and summarise the annotation files. The script outputs a quantification file of the annotated transcripts, an HTML file summarising the pathways across the bins and transcripts, and an HTML file for each bin summarising its annotation.
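    The summary step of the parsing script turns per-bin annotations into an HTML overview of pathways across bins, and it could look roughly like the Python sketch below. The input format (a stream of `(bin, pathway)` pairs) and all function names are hypothetical. MUFFIN's actual annotation files and script are not described here.

```python
from collections import defaultdict
from html import escape
from typing import Dict, Iterable, List, Tuple


def pathway_matrix(records: Iterable[Tuple[str, str]]
                   ) -> Tuple[Dict[str, Dict[str, int]], List[str]]:
    """Count annotated genes per (pathway, bin) from hypothetical (bin, pathway) pairs."""
    matrix: Dict[str, Dict[str, int]] = defaultdict(lambda: defaultdict(int))
    bins = set()
    for bin_id, pathway in records:
        matrix[pathway][bin_id] += 1
        bins.add(bin_id)
    # Convert nested defaultdicts to plain dicts for a stable, comparable result
    return {p: dict(c) for p, c in matrix.items()}, sorted(bins)


def to_html(matrix: Dict[str, Dict[str, int]], bins: List[str]) -> str:
    """Render a pathway-by-bin count table; missing pairs are shown as 0."""
    header = "".join(f"<th>{escape(b)}</th>" for b in bins)
    rows = []
    for pathway in sorted(matrix):
        cells = "".join(f"<td>{matrix[pathway].get(b, 0)}</td>" for b in bins)
        rows.append(f"<tr><th>{escape(pathway)}</th>{cells}</tr>")
    return f"<table><tr><th>Pathway</th>{header}</tr>{''.join(rows)}</table>"
```

    Keeping the counting separate from the rendering lets the same matrix feed both the cross-bin overview and one page per bin.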

    The sterlet sturgeon genome sequence and the mechanisms of segmental rediploidization.

    Sturgeons seem to be frozen in time. The archaic characteristics of this ancient fish lineage place it in a key phylogenetic position at the base of the ~30,000 modern teleost fish species. Moreover, sturgeons are notoriously polyploid, providing unique opportunities to investigate the evolution of polyploid genomes. We assembled a high-quality chromosome-level reference genome for the sterlet, Acipenser ruthenus. Our analysis revealed a very low protein evolution rate that is at least as slow as in other deep branches of the vertebrate tree, such as that of the coelacanth. We uncovered a whole-genome duplication that occurred in the Jurassic, early in the evolution of the entire sturgeon lineage. Following this polyploidization, the rediploidization of the genome included the loss of whole chromosomes in a segmental deduplication process. While known adaptive processes helped conserve a high degree of structural and functional tetraploidy over more than 180 million years, the reduction of redundancy of the polyploid genome seems to have been remarkably random.

    A comparison of proteomic, genomic, and osteological methods of archaeological sex estimation

    Sex estimation of skeletons is fundamental to many archaeological studies. Currently, three approaches are available to estimate sex (osteology, genomics, and proteomics), but little is known about the relative reliability of these methods in applied settings. We present matching osteological, shotgun-genomic, and proteomic data to estimate the sex of 55 individuals, each with an independent radiocarbon date between 2,440 and 100 cal BP, from two ancestral Ohlone sites in Central California. Sex estimation was possible in 100% of this burial sample using proteomics, in 91% using genomics, and in 51% using osteology. Agreement between the methods was high; however, conflicts did occur. Genomic sex estimates were 100% consistent with proteomic and osteological estimates when samples yielded more than 100,000 total DNA reads. However, more than half of the samples had read counts below this threshold, producing high rates of conflict with the osteological and proteomic data: nine of twenty conditional DNA sex estimates conflicted with proteomics. While the DNA signal decreased by an order of magnitude in the older burial samples, there was no decrease in proteomic signal. We conclude that proteomics provides an important complement to osteological and shotgun-genomic sex estimation.
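    The read-depth caveat can be expressed as a simple rule, shown in the hypothetical Python sketch below. A genomic sex call is flagged as conditional unless the sample has more than 100,000 reads, the threshold reported above. Agreement between two methods is the fraction of individuals whose calls match, counted only among individuals with a call from both methods. The function names and the dictionary input format are illustrative assumptions, not the study's code.

```python
from typing import Mapping, Optional

# Total DNA reads; above this, genomic calls were fully consistent in the study
READ_THRESHOLD = 100_000


def genomic_call_status(total_reads: int) -> str:
    """Label a genomic sex estimate by sequencing depth."""
    return "confident" if total_reads > READ_THRESHOLD else "conditional"


def agreement(calls_a: Mapping[str, Optional[str]],
              calls_b: Mapping[str, Optional[str]]) -> Optional[float]:
    """Fraction of individuals with a call from both methods whose calls match.

    A None entry means that method could not estimate sex for that individual;
    such individuals are skipped. Returns None if no individual has two calls.
    """
    shared = [ind for ind, call in calls_a.items()
              if call is not None and calls_b.get(ind) is not None]
    if not shared:
        return None
    matches = sum(calls_a[ind] == calls_b[ind] for ind in shared)
    return matches / len(shared)
```

    Applied to the conditional subset above, nine conflicts out of twenty comparisons give 9/20 = 0.45 disagreement, that is, an agreement of 0.55 with proteomics.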