Large-scale 16S gene assembly using metagenomics shotgun sequences.
Motivation: Combining a 16S rRNA (16S) gene database with metagenomic shotgun sequences promises unbiased identification of known and novel microbes.
Results: To achieve this, we report reference-based ribosome assembly (RAMBL), a computational pipeline that integrates taxonomic tree search and Dirichlet process clustering to reconstruct full-length 16S gene sequences from metagenomic sequencing data with high accuracy. By benchmarking against synthetic and real shotgun sequences, we demonstrated that the full-length 16S gene assemblies of RAMBL were a good proxy for known and putative microbes, including Candidate Phyla Radiation. We found that 30-40% of bacterial genera in the terrestrial and intestinal biomes have no closely related genome sequences. We also observed that RAMBL determined environmental microbial diversity more accurately and yielded better disease classification, suggesting that full-length 16S gene assemblies are a powerful alternative to marker gene sets and 16S short reads. RAMBL provides, for the first time, access to full-length 16S gene sequences in near-terabase-scale metagenomic shotgun sequences, which markedly improves metagenomic data analysis and interpretation.
Availability and implementation: RAMBL is available at https://github.com/homopolymer/RAMBL for academic use.
Supplementary information: Supplementary data are available at Bioinformatics online
Novel canine high-quality metagenome-assembled genomes, prophages and host-associated plasmids provided by long-read metagenomics together with Hi-C proximity ligation
The human gut microbiome has been extensively studied, yet the canine gut microbiome is still largely unknown. The availability of high-quality genomes is essential in the fields of veterinary medicine and nutrition to unravel the biological role of key microbial members in the canine gut environment. Our aim was to evaluate nanopore long-read metagenomics and Hi-C (high-throughput chromosome conformation capture) proximity ligation to provide high-quality metagenome-assembled genomes (HQ MAGs) of the canine gut environment. By combining nanopore long-read metagenomics and Hi-C proximity ligation, we retrieved 27 HQ MAGs and 7 medium-quality MAGs of a faecal sample of a healthy dog. Canine MAGs (CanMAGs) improved genome contiguity of representatives from the animal and human MAG catalogues - short-read MAGs from public datasets - for the species they represented: they were more contiguous with complete ribosomal operons and at least 18 canonical tRNAs. Both canine-specific bacterial species and gut generalists inhabit the dog's gastrointestinal environment. Most of them belonged to , followed by and . We also assembled one and one MAG. CanMAGs harboured antimicrobial-resistance genes (ARGs) and prophages and were linked to plasmids. ARGs conferring resistance to tetracycline were most predominant within CanMAGs, followed by lincosamide and macrolide ones. At the functional level, carbohydrate transport and metabolism was the most variable within the CanMAGs, and mobilome function was abundant in some MAGs. Specifically, we assigned the mobilome functions and the associated mobile genetic elements to the bacterial host. The CanMAGs harboured 50 bacteriophages, providing novel bacterial-host information for eight viral clusters, and Hi-C proximity ligation data linked the six potential plasmids to their bacterial host. 
Long-read metagenomics and Hi-C proximity ligation are likely to become a comprehensive approach to HQ MAG discovery and assignment of extra-chromosomal elements to their bacterial host. This will provide essential information for studying the canine gut microbiome in veterinary medicine and animal nutrition
Reconstruction of full-length 16S rRNA sequences for taxonomic assignment in metagenomics
Advances in the sequencing of uncultured environmental samples raise a growing need for accurate taxonomic assignment. Accurate identification of the organisms present within a community is essential to understanding even the most elementary ecosystems. However, current high-throughput sequencing technologies generate short reads that only partially cover full-length marker genes, which poses difficult bioinformatic challenges for taxonomic identification at high resolution. We designed MATAM, a software tool dedicated to the fast and accurate targeted assembly of short reads sequenced from a genomic marker of interest. The method implements a stepwise process based on the construction and analysis of a read overlap graph. It is applied to the assembly of 16S rRNA markers and is validated on simulated, synthetic and genuine metagenomes. We show that MATAM outperforms other available methods in terms of low error rates and recovered genome fractions, and that it is suitable for providing improved assemblies for precise taxonomic assignments
Illuminating the dynamic rare biosphere of the Greenland Ice Sheet's Dark Zone
Greenland's Dark Zone is the largest contiguous region of bare terrestrial ice in the Northern Hemisphere and microbial processes play an important role in driving its darkening and thereby amplifying melt and runoff from the ice sheet. However, the dynamics of these microbiota have not been fully identified. Here we present joint 16S rRNA gene and 16S rRNA (cDNA) comparison of input (snow), storage (cryoconite), and output (supraglacial stream water) habitats across the Dark Zone over the melt season. We reveal that all three Dark Zone communities have a preponderance of rare taxa exhibiting high protein synthesis potential (PSP). Furthermore, taxa with high PSP represent highly connected “bottlenecks” within community structure, consistent with their roles as metabolic hubs. Finally, low abundance-high PSP taxa affiliated with Methylobacterium within snow and stream water suggest a novel role for Methylobacterium in the carbon cycle of Greenlandic snowpacks, and importantly, the export of potentially active methylotrophs to the bed of the Greenland Ice Sheet. By comparing the dynamics of bulk and potentially active microbiota in the Dark Zone of the Greenland Ice Sheet we provide novel insights into the mechanisms and impacts of the microbial colonization of this critical region of our melting planet
Towards complete representation of bacterial contents in metagenomic samples
Background: In the metagenome assembly of a microbiome community, we might
expect abundant species to be easier to assemble because of their deeper
coverage. However, this conjecture is rarely tested. We often do not know how
many abundant species we are missing and do not have an approach to recover
these species.
Results: Here we propose k-mer-based and 16S rRNA-based methods to measure
the completeness of a metagenome assembly. We show that even with PacBio
High-Fidelity (HiFi) reads, abundant species are often not assembled, because
high strain diversity may lead to fragmented contigs. We developed a novel
algorithm to recover abundant metagenome-assembled genomes (MAGs) by
identifying circular assembly subgraphs. Our algorithm is reference-free and
complementary to standard metagenome binning. Evaluated on 14 real datasets,
it rescued many abundant species that would be missed by existing methods.
Conclusions: Our work stresses the importance of metagenome completeness,
which has often been overlooked. Our algorithm generates more circular MAGs
and moves a step closer to the complete representation of microbiome
communities
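As a rough illustration of what a k-mer-based completeness measure can look like, the sketch below computes the fraction of "solid" read k-mers (those seen at least `min_count` times) that also occur in the assembled contigs. This is not the authors' method: the function names, the default `k`, the `min_count` threshold, and the decision to ignore reverse complements and sequencing errors are all simplifying assumptions made for the example.

```python
from collections import Counter


def kmers(seq, k):
    """Yield every overlapping k-mer of seq (upper-cased, forward strand only)."""
    seq = seq.upper()
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


def kmer_completeness(reads, contigs, k=21, min_count=2):
    """Fraction of solid read k-mers that are present in the contigs.

    A k-mer is "solid" when it occurs at least min_count times across all
    reads (a hypothetical stand-in for "belongs to an abundant genome").
    Returns 0.0 when the reads contain no solid k-mers.
    """
    read_counts = Counter(km for read in reads for km in kmers(read, k))
    solid = {km for km, count in read_counts.items() if count >= min_count}
    if not solid:
        return 0.0
    assembled = {km for contig in contigs for km in kmers(contig, k)}
    return len(solid & assembled) / len(solid)
```

A value well below 1 would suggest that sequence present at high coverage in the reads is missing from the assembly. A practical version would at least use canonical k-mers and a dedicated k-mer counter.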
Support of extended context-free grammars in the Generalised LL parsing algorithm
Gorokhov Artem Vladimirovich. Support of extended context-free grammars in the Generalised LL parsing algorithm. Associate professor Semyon Grigorev. Mathematics and mechanics, software engineering department. Parsing plays an important role in static program analysis: this step creates a structural representation of the code on which further analysis is performed. Parser generator tools automate parser development from a syntax specification. Language documentation often serves as that specification, and it usually takes the form of an ambiguous grammar in Extended Backus-Naur Form (EBNF), which most parser generators cannot process without transformation. Automatic grammar transformation generally decreases parsing performance. Some approaches support EBNF grammars natively, but they cannot handle ambiguous grammars. The Generalised LL (GLL) parsing algorithm, on the other hand, admits arbitrary context-free grammars and achieves good performance, but cannot handle EBNF grammars. The main contribution of this work is a modification of the GLL algorithm that can process grammars in a form closely related to EBNF: extended context-free grammars. We also show that the modification improves parsing performance compared with the grammar transformation-based approach. Sources cited: 32. Gorokhov, A. V. Support of extended context-free grammars in the Generalised LL parsing algorithm: graduation thesis: defended 09.06.2017 / Gorokhov Artem Vladimirovich. St. Petersburg, 2017. 37 pp. Bibliography: pp. 31-34
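To make the "automatic grammar transformation" mentioned in the abstract concrete, the sketch below desugars EBNF repetition into plain BNF by introducing a fresh right-recursive nonterminal for each repeated group. It illustrates the transformation-based baseline that the thesis compares against, not the thesis's modified GLL algorithm. The grammar encoding (a dict of alternatives, with `("rep", [...])` marking a repeated group) and the `_repN` naming scheme are hypothetical choices made for this example, and it assumes no user nonterminal is already named `_repN`.

```python
def ebnf_to_bnf(grammar):
    """Rewrite every ("rep", body) item as a fresh nonterminal.

    grammar maps a nonterminal to a list of alternatives; an alternative is
    a list whose items are symbol names (str) or ("rep", [items]) tuples
    meaning "zero or more repetitions of items".
    """
    bnf = {}
    counter = 0

    def lower(alternative):
        nonlocal counter
        lowered = []
        for item in alternative:
            if isinstance(item, tuple) and item[0] == "rep":
                counter += 1
                fresh = f"_rep{counter}"
                body = lower(item[1])  # handles nested repetitions
                # fresh ::= body fresh | (empty)
                bnf[fresh] = [body + [fresh], []]
                lowered.append(fresh)
            else:
                lowered.append(item)
        return lowered

    for head, alternatives in grammar.items():
        bnf[head] = [lower(alt) for alt in alternatives]
    return bnf
```

For instance, the rule `S ::= a { b }` becomes `S ::= a _rep1` together with `_rep1 ::= b _rep1 | (empty)`. The extra nonterminals and derivation steps are one plausible reason why parsing the transformed grammar is slower than handling the extended form directly.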
Unlinked rRNA genes are widespread among bacteria and archaea
Ribosomes are essential to cellular life and the genes for their RNA components are the most conserved and transcribed genes in Bacteria and Archaea. Ribosomal RNA genes are typically organized into a single operon, an arrangement thought to facilitate gene regulation. In reality, some Bacteria and Archaea do not share this canonical rRNA arrangement - their 16S and 23S rRNA genes are separated across the genome and referred to as "unlinked". This rearrangement has previously been treated as an anomaly or a byproduct of genome degradation in intracellular bacteria. Here, we leverage complete genome and long-read metagenomic data to show that unlinked 16S and 23S rRNA genes are more common than previously thought. Unlinked rRNA genes occur in many phyla, most significantly within Deinococcus-Thermus, Chloroflexi, and Planctomycetes, and occur at different frequencies across natural environments. We found that up to 41% of rRNA genes in soil were unlinked, in contrast to the human gut, where all sequenced rRNA genes were linked. The frequency of unlinked rRNA genes may reflect meaningful life history traits, as they tend to be associated with a mix of slow-growing free-living species and intracellular species. We speculate that unlinked rRNA genes may confer selective advantages in some environments, though the specific nature of these advantages remains undetermined and worthy of further investigation. More generally, the prevalence of unlinked rRNA genes in poorly-studied taxa serves as a reminder that paradigms derived from model organisms do not necessarily extend to the broader diversity of Bacteria and Archaea
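One simple way to operationalise "linked" versus "unlinked" is to measure the distance between each 16S gene and the nearest 23S gene on the same contig, as in the sketch below. The abstract does not state a distance cutoff, so the `max_gap` default of 1500 bp is an illustrative assumption, as are the function names and the tuple layout of the gene annotations. The sketch also ignores strand and circular-genome wraparound.

```python
def interval_gap(a_start, a_end, b_start, b_end):
    """Bases separating two intervals; 0 if they touch or overlap."""
    return max(0, max(a_start, b_start) - min(a_end, b_end))


def classify_16s(genes, max_gap=1500):
    """Label each 16S gene by its distance to the nearest 23S gene.

    genes is a list of (contig, kind, start, end) tuples with kind equal to
    "16S" or "23S". max_gap is an assumed cutoff in base pairs. Returns a
    list of (contig, start, label) tuples, one per 16S gene.
    """
    results = []
    for contig, kind, start, end in genes:
        if kind != "16S":
            continue
        gaps = [
            interval_gap(start, end, s, e)
            for c, k, s, e in genes
            if c == contig and k == "23S"
        ]
        if not gaps:
            label = "no 23S on contig"
        elif min(gaps) <= max_gap:
            label = "linked"
        else:
            label = "unlinked"
        results.append((contig, start, label))
    return results
```

The "no 23S on contig" label matters for fragmented assemblies, where a missing partner gene says nothing about linkage. This is one reason complete genomes and long reads are better suited to this kind of question than short-read contigs.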
Genomic evidence for sulfur intermediates as new biogeochemical hubs in a model aquatic microbial ecosystem
Background: The sulfur cycle encompasses a series of complex aerobic and anaerobic transformations of S-containing
molecules and plays a fundamental role in cellular and ecosystem-level processes, influencing biological carbon transfers and
other biogeochemical cycles. Despite their importance, the microbial communities and metabolic pathways involved in
these transformations remain poorly understood, especially for inorganic sulfur compounds of intermediate oxidation states
(thiosulfate, tetrathionate, sulfite, polysulfides). Isolated and highly stratified, the extreme geochemical and environmental
features of meromictic ice-capped Lake A, in the Canadian High Arctic, provided an ideal model ecosystem to resolve the
distribution and metabolism of aquatic sulfur cycling microorganisms along redox and salinity gradients.
Results: Applying complementary molecular approaches, we identified sharply contrasting microbial communities and
metabolic potentials among the markedly distinct water layers of Lake A, with similarities to diverse fresh, brackish and saline
water microbiomes. Sulfur cycling genes were abundant at all depths and covaried with bacterial abundance. Genes for
oxidative processes occurred in samples from the oxic freshwater layers, reductive reactions in the anoxic and sulfidic
bottom waters and genes for both transformations at the chemocline. Up to 154 different genomic bins with potential for
sulfur transformation were recovered, revealing a panoply of taxonomically diverse microorganisms with complex metabolic
pathways for biogeochemical sulfur reactions. Genes for the utilization of sulfur cycle intermediates were widespread
throughout the water column, co-occurring with sulfate reduction or sulfide oxidation pathways. The genomic bin
composition suggested that in addition to chemical oxidation, these intermediate sulfur compounds were likely produced
by the predominant sulfur chemo- and photo-oxidisers at the chemocline and by diverse microbial degraders of organic
sulfur molecules.
Conclusions: The Lake A microbial ecosystem provided an ideal opportunity to identify new features of the biogeochemical
sulfur cycle. Our detailed metagenomic analyses across the broad physico-chemical gradients of this permanently stratified
lake extend the known diversity of microorganisms involved in sulfur transformations over a wide range of environmental
conditions. The results indicate that sulfur cycle intermediates and organic sulfur molecules are major sources of electron
donors and acceptors for aquatic and sedimentary microbial communities in association with the classical sulfur cycle