2,206 research outputs found

    How clear is our current view on microbial dark matter? (Re-)assessing public MAG & SAG datasets with MDMcleaner

    As of today, the majority of environmental microorganisms remain uncultured and are therefore referred to as ‘microbial dark matter’ (MDM). Hence, genomic insights into these organisms are limited to cultivation-independent approaches such as single-cell genomics and metagenomics. However, without access to cultured representatives for verifying taxon assignments, MDM genomes may lead to misleading conclusions based on misclassified or contaminant contigs, thereby obscuring our view of the uncultured microbial majority. Moreover, gradual database contamination by past genome submissions can propagate errors that affect present as well as future comparative genome analyses. Consequently, strict contamination detection and filtering need to be applied, especially in the case of uncultured MDM genomes. Current genome reporting standards, however, emphasize completeness over purity, and the de facto gold standard genome assessment tool, CheckM, discriminates against uncultured taxa and fragmented genomes. To tackle these issues, we present a novel contig classification, screening, and filtering workflow and a corresponding open-source Python implementation called MDMcleaner, which was tested and compared to other tools on mock and real datasets. MDMcleaner revealed substantial contamination overlooked by current screening approaches and sensitively detected misattributed contigs in both novel genomes and the underlying reference databases, thereby greatly improving our view of ‘microbial dark matter’.

    The complex hybrid origins of the root knot nematodes revealed through comparative genomics

    Meloidogyne root knot nematodes (RKN) can infect most of the world's agricultural crop species and are among the most important of all plant pathogens. As yet, however, we have little understanding of their origins or the genomic basis of their extreme polyphagy. The most damaging pathogens reproduce by mitotic parthenogenesis and are suggested to have originated by interspecific hybridizations between unknown parental taxa. We sequenced the genome of the diploid meiotic parthenogen Meloidogyne floridensis and used a comparative genomic approach to test the hypothesis that it was involved in the hybrid origin of the tropical mitotic parthenogen M. incognita. Phylogenomic analysis of gene families from M. floridensis, M. incognita and an outgroup species, M. hapla, was used to trace the evolutionary history of these species' genomes, demonstrating that M. floridensis was one of the parental species in the hybrid origins of M. incognita. Analysis of the M. floridensis genome revealed many gene loci present in divergent copies, as they are in M. incognita, indicating that it too had a hybrid origin. The triploid M. incognita is shown to be a complex double-hybrid between M. floridensis and a third, unidentified parent. The agriculturally important RKN have very complex origins involving the mixing of several parental genomes by hybridization, and their extreme polyphagy and agricultural success may be related to this hybridization, which produces transgressive variation on which natural selection acts. Studying RKN variation via individual marker loci may fail due to the species' convoluted origins, and multi-species population genomics is essential to understand the hybrid diversity and adaptive variation of this important species complex. This comparative genomic analysis provides a compelling example of the importance and complexity of hybridization in generating animal species diversity more generally.

    Ecological and Genomic Attributes of Novel Bacterial Taxa That Thrive in Subsurface Soil Horizons.

    While most bacterial and archaeal taxa living in surface soils remain undescribed, this problem is exacerbated in deeper soils, owing to the unique oligotrophic conditions found in the subsurface. Additionally, previous studies of soil microbiomes have focused almost exclusively on surface soils, even though the microbes living in deeper soils also play critical roles in a wide range of biogeochemical processes. We examined soils collected from 20 distinct profiles across the United States to characterize the bacterial and archaeal communities that live in subsurface soils and to determine whether there are consistent changes in soil microbial communities with depth across a wide range of soil and environmental conditions. We found that bacterial and archaeal diversity generally decreased with depth, as did the degree of similarity of microbial communities to those found in surface horizons. We observed five phyla that consistently increased in relative abundance with depth across our soil profiles: Chloroflexi, Nitrospirae, Euryarchaeota, and candidate phyla GAL15 and Dormibacteraeota (formerly AD3). Leveraging the unusually high abundance of Dormibacteraeota at depth, we assembled genomes representative of this candidate phylum and identified traits that are likely to be beneficial in low-nutrient environments, including the synthesis and storage of carbohydrates, the potential to use carbon monoxide (CO) as a supplemental energy source, and the ability to form spores. Together these attributes likely allow members of the candidate phylum Dormibacteraeota to flourish in deeper soils and provide insight into the survival and growth strategies employed by the microbes that thrive in oligotrophic soil environments.
    IMPORTANCE: Soil profiles are rarely homogeneous. Resource availability and microbial abundances typically decrease with soil depth, but microbes found in deeper horizons are still important components of terrestrial ecosystems. By studying 20 soil profiles across the United States, we documented consistent changes in soil bacterial and archaeal communities with depth. Deeper soils harbored communities distinct from those of the more commonly studied surface horizons. Most notably, we found that the candidate phylum Dormibacteraeota (formerly AD3) was often dominant in subsurface soils, and we used genomes from uncultivated members of this group to identify why these taxa are able to thrive in such resource-limited environments. Simply digging deeper into soil can reveal a surprising number of novel microbes with unique adaptations to oligotrophic subsurface conditions.

    A Draft of the Genome of the Gulf Coast tick, Amblyomma maculatum

    The Gulf Coast tick, Amblyomma maculatum, inhabits the Southeastern states of the USA bordering the Gulf of Mexico, Mexico, and other Central and South American countries. More recently, its U.S. range has extended west to Arizona and northeast to New York state and Connecticut. It is a vector of Rickettsia parkeri and Hepatozoon americanum. This tick species has become a model to study tick/Rickettsia interactions. To increase our knowledge of the basic biology of A. maculatum, we report here a draft genome of this tick and an extensive functional classification of its proteome. The DNA from a single male tick was used as a genomic source, and a 10X Genomics protocol yielded 28,460 scaffolds of 10 Kb or longer, totaling 1.98 Gb. The N50 scaffold size was 19,849 Kb. The BRAKER pipeline was used to find the protein-coding gene boundaries on the assembled A. maculatum genome, discovering 237,921 CDS. After trimming and classifying the transposable elements, bacterial contaminants, and truncated genes, a set of 25,702 were annotated and classified as the core gene products. A BUSCO analysis revealed 83.4% complete BUSCOs. A hyperlinked spreadsheet is provided, allowing browsing of the individual gene products and their matches to several databases.
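
    For readers unfamiliar with the N50 statistic quoted in the abstract above, the following is a minimal Python sketch of how it is conventionally computed: the length of the scaffold at which the running total of scaffold lengths, taken from longest to shortest, first reaches half of the assembly size. The function name `n50` and the toy lengths are illustrative assumptions and are not taken from the A. maculatum assembly or its pipeline.

```python
def n50(lengths):
    """Return the N50 of a collection of scaffold lengths.

    Hypothetical helper for illustration only: it walks the scaffolds
    from longest to shortest and returns the length at which the
    cumulative sum first covers at least half of the total.
    """
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        # Compare 2 * running with total to avoid fractional halves.
        if running * 2 >= total:
            return length
    raise ValueError("no scaffold lengths given")


# Toy example (made-up lengths, arbitrary units).
toy_lengths = [2, 3, 4, 5, 6, 10]
print(n50(toy_lengths))
```

    With these toy lengths the total is 30, and the two longest scaffolds (10 and 6) together already cover half of it, so the sketch reports 6.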

    Fast Identification and Removal of Sequence Contamination from Genomic and Metagenomic Datasets

    High-throughput sequencing technologies have strongly impacted microbiology, providing a rapid and cost-effective way of generating draft genomes and exploring microbial diversity. However, sequences obtained from impure nucleic acid preparations may contain DNA from sources other than the sample. Such sequence contamination is a serious concern for the quality of the data used in downstream analysis, causing misassembly of sequence contigs and erroneous conclusions. Therefore, the removal of sequence contaminants is a necessary step for all sequencing projects. We developed DeconSeq, a robust framework for the rapid, automated identification and removal of sequence contamination in longer-read datasets (150 bp mean read length). DeconSeq is publicly available as standalone and web-based versions. The results can be exported for subsequent analysis, and the databases used for the web-based version are automatically updated on a regular basis. DeconSeq categorizes possible contamination sequences, eliminates redundant hits with higher similarity to non-contaminant genomes, and provides graphical visualizations of the alignment results and classifications. Using DeconSeq, we conducted an analysis of possible human DNA contamination in 202 previously published microbial and viral metagenomes and found possible contamination in 145 (72%) metagenomes with as high as 64% contaminating sequences. This new framework allows scientists to automatically detect and efficiently remove unwanted sequence contamination from their datasets while eliminating critical limitations of current methods. DeconSeq's web interface is simple and user-friendly. The standalone version allows offline analysis and integration into existing data processing pipelines. DeconSeq's results reveal whether the sequencing experiment has succeeded, whether the correct sample was sequenced, and whether the sample contains any sequence contamination from DNA preparation or host. In addition, the analysis of 202 metagenomes demonstrated significant contamination of the non-human associated metagenomes, suggesting that this method is appropriate for screening all metagenomes. DeconSeq is available at http://deconseq.sourceforge.net/

    EasyCluster: a fast and efficient gene-oriented clustering tool for large-scale transcriptome data

    Background: ESTs and full-length cDNAs represent an invaluable source of evidence for inferring reliable gene structures and discovering potential alternative splicing events. In newly sequenced genomes, these tasks may not be practicable owing to the lack of appropriate training sets. However, when expression data are available, they can be used to build EST clusters related to specific genomic transcribed loci. Common strategies recently employed to this end are based on sequence similarity between transcripts and can lead, in specific conditions, to inconsistent and erroneous clustering. In order to improve the cluster building and facilitate all downstream annotation analyses, we developed a simple genome-based methodology to generate gene-oriented clusters of ESTs when a genomic sequence and a pool of related expressed sequences are provided. Our procedure has been implemented in the software EasyCluster and takes into account the spliced nature of ESTs after an ad hoc genomic mapping.
    Methods: EasyCluster uses the well-known GMAP program in order to perform a very quick EST-to-genome mapping in addition to the detection of reliable splice sites. Given a genomic sequence and a pool of ESTs/FL-cDNAs, EasyCluster starts by building genomic and EST local databases and runs GMAP. Subsequently, it parses the results, creating an initial collection of pseudo-clusters by grouping ESTs according to the overlap of their genomic coordinates on the same strand. In the final step, EasyCluster refines the clustering by again running GMAP on each pseudo-cluster and groups together ESTs sharing at least one splice site.
    Results: The higher accuracy of EasyCluster with respect to other clustering tools has been verified by means of a manually curated benchmark of human EST clusters. Additional datasets including the Unigene cluster Hs.122986 and ESTs related to the human HOXA gene family have also been used to demonstrate the better clustering capability of EasyCluster over current genome-based web service tools such as ASmodeler and BIPASS. EasyCluster has also been used to provide a first compilation of gene-oriented clusters in the Ricinus communis oilseed plant, for which no Unigene clusters are yet available, as well as an evaluation of the alternative splicing in this plant species.
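
    The two-stage grouping described in the EasyCluster abstract above (pseudo-clusters from overlapping same-strand coordinates, then refinement by shared splice sites) can be sketched in Python roughly as follows. This is not EasyCluster's code: it does not run GMAP, the tuple layout and the function names `pseudo_clusters` and `refine_by_splice_site` are hypothetical, and two choices are assumptions of this sketch rather than statements from the abstract: sharing is treated transitively, and ESTs without any splice site stay as singletons.

```python
from collections import defaultdict

# Each alignment is a hypothetical tuple:
# (est_id, strand, start, end, splice_sites), where splice_sites is a
# set of genomic coordinates. Real input would come from a mapper.


def pseudo_clusters(alignments):
    """Group ESTs whose genomic spans overlap on the same strand."""
    by_strand = defaultdict(list)
    for aln in alignments:
        by_strand[aln[1]].append(aln)

    clusters = []
    for strand_alns in by_strand.values():
        strand_alns.sort(key=lambda a: a[2])  # sort by start coordinate
        current, current_end = [], None
        for aln in strand_alns:
            # A start beyond the furthest end seen so far closes the cluster.
            if current and aln[2] > current_end:
                clusters.append(current)
                current, current_end = [], None
            current.append(aln)
            current_end = aln[3] if current_end is None else max(current_end, aln[3])
        if current:
            clusters.append(current)
    return clusters


def refine_by_splice_site(cluster):
    """Split a pseudo-cluster into groups linked by a shared splice site.

    Assumption of this sketch: linkage is transitive (union-find).
    """
    parent = {aln[0]: aln[0] for aln in cluster}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    first_seen = {}  # splice site -> first EST observed with it
    for est_id, _, _, _, sites in cluster:
        for site in sites:
            if site in first_seen:
                parent[find(est_id)] = find(first_seen[site])
            else:
                first_seen[site] = est_id

    groups = defaultdict(set)
    for aln in cluster:
        groups[find(aln[0])].add(aln[0])
    return sorted(sorted(g) for g in groups.values())


# Toy data (made up): e1 and e2 share splice sites; e3 overlaps but shares none.
toy = [
    ("e1", "+", 100, 500, {200, 300}),
    ("e2", "+", 150, 600, {200, 300, 450}),
    ("e3", "+", 420, 700, {650}),
    ("e4", "-", 100, 500, {200, 300}),
    ("e5", "+", 1000, 1200, set()),
]
for pc in pseudo_clusters(toy):
    print(refine_by_splice_site(pc))
```

    In the toy data, e1, e2 and e3 overlap on the plus strand and so form one pseudo-cluster, but the refinement separates e3 because it shares no splice site with the other two. This separation is what a coordinate-overlap criterion alone would miss.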

    Ecology, not host phylogeny, shapes the oral microbiome in closely related species

    Host-associated microbiomes are essential for a multitude of biological processes. Placed at the contact zone between external and internal environments, the little-studied oral microbiome has important roles in host physiology and health. Here, we investigate the roles of host evolutionary relationships and ecology in shaping the oral microbiome in three closely related gorilla subspecies (mountain, Grauer's, and western lowland gorillas) using shotgun metagenomics of 46 museum-preserved dental calculus samples. We find that the oral microbiomes of mountain gorillas are functionally and taxonomically distinct from the other two subspecies, despite close evolutionary relationships and geographic proximity with Grauer's gorillas. Grauer's gorillas show intermediate bacterial taxonomic, functional, and dietary profiles. Altitudinal differences in gorilla subspecies ranges appear to explain these patterns, suggesting a close connection between dental calculus microbiomes and the environment, likely mediated through diet. This is further supported by the presence of gorilla subspecies-specific phyllosphere/rhizosphere taxa in the oral microbiome. Mountain gorillas show a high abundance of nitrate-reducing oral taxa, which may promote adaptation to a high-altitude lifestyle by modulating blood pressure. Our results suggest that ecology, rather than evolutionary relationships and geographic distribution, shapes the oral microbiome in these closely related species.