30 research outputs found

    Large-scale 16S gene assembly using metagenomics shotgun sequences.

    Motivation: Combining a 16S rRNA (16S) gene database with metagenomic shotgun sequences promises unbiased identification of known and novel microbes. Results: To achieve this, we herein report reference-based ribosome assembly (RAMBL), a computational pipeline that integrates taxonomic tree search and Dirichlet process clustering to reconstruct full-length 16S gene sequences from metagenomic sequencing data with high accuracy. By benchmarking against synthetic and real shotgun sequences, we demonstrated that full-length 16S gene assemblies of RAMBL were a good proxy for known and putative microbes, including Candidate Phyla Radiation. We found that 30-40% of bacterial genera in the terrestrial and intestinal biomes have no closely related genome sequences. We also observed that RAMBL was able to generate a more accurate determination of environmental microbial diversity and yield better disease classification, suggesting that full-length 16S gene assemblies are a powerful alternative to marker gene sets and 16S short reads. RAMBL is the first to provide access to full-length 16S gene sequences in near-terabase-scale metagenomic shotgun sequences, which markedly improves metagenomic data analysis and interpretation. Availability and implementation: RAMBL is available at https://github.com/homopolymer/RAMBL for academic use. Contact: [email protected]. Supplementary information: Supplementary data are available at Bioinformatics online.

    Novel canine high-quality metagenome-assembled genomes, prophages and host-associated plasmids provided by long-read metagenomics together with Hi-C proximity ligation

    The human gut microbiome has been extensively studied, yet the canine gut microbiome is still largely unknown. The availability of high-quality genomes is essential in the fields of veterinary medicine and nutrition to unravel the biological role of key microbial members in the canine gut environment. Our aim was to evaluate nanopore long-read metagenomics and Hi-C (high-throughput chromosome conformation capture) proximity ligation to provide high-quality metagenome-assembled genomes (HQ MAGs) of the canine gut environment. By combining nanopore long-read metagenomics and Hi-C proximity ligation, we retrieved 27 HQ MAGs and 7 medium-quality MAGs of a faecal sample of a healthy dog. Canine MAGs (CanMAGs) improved genome contiguity of representatives from the animal and human MAG catalogues - short-read MAGs from public datasets - for the species they represented: they were more contiguous with complete ribosomal operons and at least 18 canonical tRNAs. Both canine-specific bacterial species and gut generalists inhabit the dog's gastrointestinal environment. Most of them belonged to , followed by and . We also assembled one and one MAG. CanMAGs harboured antimicrobial-resistance genes (ARGs) and prophages and were linked to plasmids. ARGs conferring resistance to tetracycline were most predominant within CanMAGs, followed by lincosamide and macrolide ones. At the functional level, carbohydrate transport and metabolism was the most variable within the CanMAGs, and mobilome function was abundant in some MAGs. Specifically, we assigned the mobilome functions and the associated mobile genetic elements to the bacterial host. The CanMAGs harboured 50 bacteriophages, providing novel bacterial-host information for eight viral clusters, and Hi-C proximity ligation data linked the six potential plasmids to their bacterial host. 
    Long-read metagenomics and Hi-C proximity ligation are likely to become a comprehensive approach to HQ MAG discovery and assignment of extra-chromosomal elements to their bacterial host. This will provide essential information for studying the canine gut microbiome in veterinary medicine and animal nutrition.

    Reconstruction of full-length 16S rRNA sequences for taxonomic assignment in metagenomics

    Advances in the sequencing of uncultured environmental samples raise a growing need for accurate taxonomic assignment. Accurate identification of organisms present within a community is essential to understanding even the most elementary ecosystems. However, current high-throughput sequencing technologies generate short reads that only partially cover full-length marker genes, which poses difficult bioinformatic challenges for taxonomic identification at high resolution. We designed MATAM, a software tool dedicated to the fast and accurate targeted assembly of short reads sequenced from a genomic marker of interest. The method implements a stepwise process based on the construction and analysis of a read overlap graph. It is applied to the assembly of 16S rRNA markers and is validated on simulated, synthetic and genuine metagenomes. We show that MATAM outperforms other available methods in terms of low error rates and recovered genome fractions and is suitable for providing improved assemblies for precise taxonomic assignments.

    Illuminating the dynamic rare biosphere of the Greenland Ice Sheet's Dark Zone

    Greenland's Dark Zone is the largest contiguous region of bare terrestrial ice in the Northern Hemisphere and microbial processes play an important role in driving its darkening and thereby amplifying melt and runoff from the ice sheet. However, the dynamics of these microbiota have not been fully identified. Here we present joint 16S rRNA gene and 16S rRNA (cDNA) comparison of input (snow), storage (cryoconite), and output (supraglacial stream water) habitats across the Dark Zone over the melt season. We reveal that all three Dark Zone communities have a preponderance of rare taxa exhibiting high protein synthesis potential (PSP). Furthermore, taxa with high PSP represent highly connected ‘bottlenecks’ within community structure, consistent with their roles as metabolic hubs. Finally, low abundance-high PSP taxa affiliated with Methylobacterium within snow and stream water suggest a novel role for Methylobacterium in the carbon cycle of Greenlandic snowpacks, and importantly, the export of potentially active methylotrophs to the bed of the Greenland Ice Sheet. By comparing the dynamics of bulk and potentially active microbiota in the Dark Zone of the Greenland Ice Sheet we provide novel insights into the mechanisms and impacts of the microbial colonization of this critical region of our melting planet.

    Towards complete representation of bacterial contents in metagenomic samples

    Background: In the metagenome assembly of a microbiome community, we may think abundant species would be easier to assemble due to their deeper coverage. However, this conjecture is rarely tested. We often do not know how many abundant species we are missing and do not have an approach to recover these species. Results: Here we proposed k-mer based and 16S rRNA based methods to measure the completeness of metagenome assembly. We showed that even with PacBio High-Fidelity (HiFi) reads, abundant species are often not assembled, as high strain diversity may lead to fragmented contigs. We developed a novel algorithm to recover abundant metagenome-assembled genomes (MAGs) by identifying circular assembly subgraphs. Our algorithm is reference-free and complementary to standard metagenome binning. Evaluated on 14 real datasets, it rescued many abundant species that would be missing with existing methods. Conclusions: Our work stresses the importance of metagenome completeness, which has often been overlooked. Our algorithm generates more circular MAGs and moves a step closer to the complete representation of microbiome communities.
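    The abstract above does not spell out how its k-mer based completeness measure is computed. As a rough illustration only, the idea might be sketched as the fraction of well-supported read k-mers that also occur in the assembly. The function names, the `min_count` threshold and the use of canonical k-mers are all assumptions made for this sketch, not details taken from the paper.

```python
from collections import Counter

# Translation table for reverse-complementing DNA (assumes plain A/C/G/T input).
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def canonical_kmers(seq, k):
    """Yield the canonical form (min of k-mer and its reverse complement) of each k-mer."""
    seq = seq.upper()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        rc = kmer.translate(_COMPLEMENT)[::-1]
        yield min(kmer, rc)


def kmer_completeness(reads, contigs, k=21, min_count=2):
    """Hypothetical completeness score: share of 'solid' read k-mers found in the contigs.

    A k-mer is treated as solid if it occurs at least `min_count` times in the
    reads, an assumed proxy for belonging to an abundant genome.
    """
    read_counts = Counter()
    for read in reads:
        read_counts.update(canonical_kmers(read, k))
    solid = {kmer for kmer, count in read_counts.items() if count >= min_count}
    if not solid:
        return 0.0
    assembled = set()
    for contig in contigs:
        assembled.update(canonical_kmers(contig, k))
    return len(solid & assembled) / len(solid)


if __name__ == "__main__":
    toy_reads = ["AAACCC", "AAACCC", "GGGTTA"]
    print(kmer_completeness(toy_reads, ["AAAC"], k=3))
```

    A score well below 1 on real data would hint that abundant sequence is missing from the assembly. A practical tool would also need to handle sequencing errors and ambiguous bases, which this sketch ignores.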

    Support of extended context-free grammars in Generalised LL parsing algorithm

    Gorokhov Artem Vladimirovich. Support of extended context-free grammars in Generalised LL parsing algorithm. Associate professor Semyon Grigorev. Mathematics & mechanics, software engineering department. Parsing plays an important role in static program analysis: during this step a structural representation of code is created, upon which further analysis is performed. Parser generator tools, when provided with a syntax specification, automate parser development. Language documentation often acts as such a specification. Documentation usually takes the form of an ambiguous grammar in Extended Backus-Naur Form (EBNF), which most parser generators fail to process without transformation. Automatic grammar transformation generally leads to a decrease in parsing performance. Some approaches support EBNF grammars natively, but they all fail to handle ambiguous grammars. On the other hand, the Generalised LL (GLL) parsing algorithm admits arbitrary context-free grammars and achieves good performance, but cannot handle EBNF grammars. The main contribution of this paper is a modification of the GLL algorithm that can process grammars in a form closely related to EBNF (Extended Context-Free Grammar). We also show that the modification improves parsing performance compared with the grammar transformation-based approach. Sources cited: 32. Gorokhov, A. V. Support of extended context-free grammars in Generalised LL parsing algorithm: Graduation thesis: Defended 09.06.2017 / Gorokhov Artem Vladimirovich. – St. Petersburg, 2017. – 37 pp. – Bibliography: pp. 31–34.

    Unlinked rRNA genes are widespread among bacteria and archaea

    Ribosomes are essential to cellular life and the genes for their RNA components are the most conserved and transcribed genes in Bacteria and Archaea. Ribosomal RNA genes are typically organized into a single operon, an arrangement thought to facilitate gene regulation. In reality, some Bacteria and Archaea do not share this canonical rRNA arrangement - their 16S and 23S rRNA genes are separated across the genome and referred to as "unlinked". This rearrangement has previously been treated as an anomaly or a byproduct of genome degradation in intracellular bacteria. Here, we leverage complete genome and long-read metagenomic data to show that unlinked 16S and 23S rRNA genes are more common than previously thought. Unlinked rRNA genes occur in many phyla, most significantly within Deinococcus-Thermus, Chloroflexi, and Planctomycetes, and occur in differential frequencies across natural environments. We found that up to 41% of rRNA genes in soil were unlinked, in contrast to the human gut, where all sequenced rRNA genes were linked. The frequency of unlinked rRNA genes may reflect meaningful life history traits, as they tend to be associated with a mix of slow-growing free-living species and intracellular species. We speculate that unlinked rRNA genes may confer selective advantages in some environments, though the specific nature of these advantages remains undetermined and worthy of further investigation. More generally, the prevalence of unlinked rRNA genes in poorly-studied taxa serves as a reminder that paradigms derived from model organisms do not necessarily extend to the broader diversity of Bacteria and Archaea.

    Genomic evidence for sulfur intermediates as new biogeochemical hubs in a model aquatic microbial ecosystem

    Background: The sulfur cycle encompasses a series of complex aerobic and anaerobic transformations of S-containing molecules and plays a fundamental role in cellular and ecosystem-level processes, influencing biological carbon transfers and other biogeochemical cycles. Despite their importance, the microbial communities and metabolic pathways involved in these transformations remain poorly understood, especially for inorganic sulfur compounds of intermediate oxidation states (thiosulfate, tetrathionate, sulfite, polysulfides). Isolated and highly stratified, the extreme geochemical and environmental features of meromictic ice-capped Lake A, in the Canadian High Arctic, provided an ideal model ecosystem to resolve the distribution and metabolism of aquatic sulfur cycling microorganisms along redox and salinity gradients. Results: Applying complementary molecular approaches, we identified sharply contrasting microbial communities and metabolic potentials among the markedly distinct water layers of Lake A, with similarities to diverse fresh, brackish and saline water microbiomes. Sulfur cycling genes were abundant at all depths and covaried with bacterial abundance. Genes for oxidative processes occurred in samples from the oxic freshwater layers, reductive reactions in the anoxic and sulfidic bottom waters and genes for both transformations at the chemocline. Up to 154 different genomic bins with potential for sulfur transformation were recovered, revealing a panoply of taxonomically diverse microorganisms with complex metabolic pathways for biogeochemical sulfur reactions. Genes for the utilization of sulfur cycle intermediates were widespread throughout the water column, co-occurring with sulfate reduction or sulfide oxidation pathways. 
    The genomic bin composition suggested that in addition to chemical oxidation, these intermediate sulfur compounds were likely produced by the predominant sulfur chemo- and photo-oxidisers at the chemocline and by diverse microbial degraders of organic sulfur molecules. Conclusions: The Lake A microbial ecosystem provided an ideal opportunity to identify new features of the biogeochemical sulfur cycle. Our detailed metagenomic analyses across the broad physico-chemical gradients of this permanently stratified lake extend the known diversity of microorganisms involved in sulfur transformations over a wide range of environmental conditions. The results indicate that sulfur cycle intermediates and organic sulfur molecules are major sources of electron donors and acceptors for aquatic and sedimentary microbial communities in association with the classical sulfur cycle.