3,801 research outputs found

    Measuring Data Completeness for Microbial Genomics Database

    Poor-quality data, such as data with missing values (or records), cause negative consequences in many application domains. An important aspect of data quality is completeness. One problem in data completeness is that of missing individuals in data sets. Within a data set, the individuals are the real-world entities whose information is recorded. So far, however, completeness studies have included little discussion of how missing individuals are assessed. In this paper, we propose the notion of population-based completeness (PBC), which deals with the missing individuals problem, with the aim of investigating what is required to measure PBC and identifying what is needed to support PBC measurements in practice. This paper explores the need for PBC in microbial genomics, using real sample data sets retrieved from a microbial database called Comprehensive Microbial Resources (CMR).
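One plausible reading of PBC is the fraction of a reference population's individuals that actually appear in the data set being assessed. The sketch below illustrates that reading only. The function name, the ratio definition, and the genome identifiers are assumptions made for illustration and are not taken from the paper or from the CMR database.

```python
def population_based_completeness(dataset_ids, reference_population_ids):
    """Return (score, missing) for a data set against a reference population.

    score   -- fraction of reference individuals present in the data set
    missing -- set of reference individuals absent from the data set
    Assumption: individuals are matched by exact identifier equality.
    """
    reference = set(reference_population_ids)
    if not reference:
        raise ValueError("reference population must not be empty")
    # Only individuals that belong to the reference population count.
    present = set(dataset_ids) & reference
    missing = reference - present
    return len(present) / len(reference), missing


# Hypothetical identifiers, used purely for illustration.
reference_population = ["g1", "g2", "g3", "g4", "g5"]
dataset = ["g1", "g2", "g4", "x9"]  # "x9" is outside the reference population

score, missing = population_based_completeness(dataset, reference_population)
print(f"PBC = {score:.2f}; missing individuals: {sorted(missing)}")
```

Under this reading, the hard part in practice is not the ratio itself but obtaining a trustworthy reference population to compare against, which is the kind of requirement the abstract says the paper investigates.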

    Hydrocarbon seepage in the deep seabed links subsurface and seafloor biospheres

    © The Author(s), 2020. This article is distributed under the terms of the Creative Commons Attribution License. The definitive version was published in Chakraborty, A., Ruff, S. E., Dong, X., Ellefson, E. D., Li, C., Brooks, J. M., McBee, J., Bernard, B. B., & Hubert, C. R. J. Hydrocarbon seepage in the deep seabed links subsurface and seafloor biospheres. Proceedings of the National Academy of Sciences of the United States of America, 117(20), (2020): 11029-11037, doi: 10.1073/pnas.2002289117. Marine cold seeps transmit fluids between the subseafloor and seafloor biospheres through upward migration of hydrocarbons that originate in deep sediment layers. It remains unclear how geofluids influence the composition of the seabed microbiome and if they transport deep subsurface life up to the surface. Here we analyzed 172 marine surficial sediments from the deep-water Eastern Gulf of Mexico to assess whether hydrocarbon fluid migration is a mechanism for upward microbial dispersal. While 132 of these sediments contained migrated liquid hydrocarbons, evidence of continuous advective transport of thermogenic alkane gases was observed in 11 sediments. Gas seeps harbored distinct microbial communities featuring bacteria and archaea that are well-known inhabitants of deep biosphere sediments. Specifically, 25 distinct sequence variants within the uncultivated bacterial phyla Atribacteria and Aminicenantes and the archaeal order Thermoprofundales occurred in significantly greater relative sequence abundance, along with well-known seep-colonizing members of the bacterial genus Sulfurovum, in the gas-positive sediments. Metabolic predictions guided by metagenome-assembled genomes suggested these organisms are anaerobic heterotrophs capable of nonrespiratory breakdown of organic matter, likely enabling them to inhabit energy-limited deep subseafloor ecosystems. These results point to petroleum geofluids as a vector for the advection-assisted upward dispersal of deep biosphere microbes from subsurface to surface environments, shaping the microbiome of cold seep sediments and providing a general mechanism for the maintenance of microbial diversity in the deep sea. We wish to thank Jody Sandel as well as the crew of R/V GeoExplorer for collection of piston cores, onboard core processing, sample preservation, and shipment. Cynthia Kwan and Oliver Horanszky are thanked for assistance with amplicon library preparation. We also wish to thank Jayne Rattray, Daniel Gittins, and Marc Strous for valuable discussions and suggestions, and Rhonda Clark for research support. Collaborations with Andy Mort from the Geological Survey of Canada, and Richard Hatton from Geoscience Wales are also gratefully acknowledged. This work was financially supported by a Mitacs Elevate Postdoctoral Fellowship awarded to A.C.; an Alberta Innovates-Technology Futures/Eyes High Postdoctoral Fellowship to S.E.R.; and a Natural Sciences and Engineering Research Council Strategic Project Grant, a Genome Canada Genomics Applications Partnership Program grant, a Canada Foundation for Innovation grant (CFI-JELF 33752) for instrumentation, and Campus Alberta Innovates Program Chair funding to C.R.J.H.

    Single-Cell-Genomics-Facilitated Read Binning of Candidate Phylum EM19 Genomes from Geothermal Spring Metagenomes

    The vast majority of microbial life remains uncatalogued due to the inability to cultivate these organisms in the laboratory. This “microbial dark matter” represents a substantial portion of the tree of life and of the populations that contribute to chemical cycling in many ecosystems. In this work, we leveraged an existing single-cell genomic data set representing the candidate bacterial phylum “Calescamantes” (EM19) to calibrate machine learning algorithms and define metagenomic bins directly from pyrosequencing reads derived from Great Boiling Spring in the U.S. Great Basin. Compared to other assembly-based methods, taxonomic binning with a read-based machine learning approach yielded final assemblies with the highest predicted genome completeness of any method tested. Read-first binning subsequently was used to extract Calescamantes bins from all metagenomes with abundant Calescamantes populations, including metagenomes from Octopus Spring and Bison Pool in Yellowstone National Park and Gongxiaoshe Spring in Yunnan Province, China. Metabolic reconstruction suggests that Calescamantes are heterotrophic, facultative anaerobes, which can utilize oxidized nitrogen sources as terminal electron acceptors for respiration in the absence of oxygen and use proteins as their primary carbon source. Despite their phylogenetic divergence, the geographically separate Calescamantes populations were highly similar in their predicted metabolic capabilities and core gene content, respiring O2 or oxidized nitrogen species for energy conservation in distant but chemically similar hot springs. This work was supported by NASA exobiology grant EXO-NNX11AR78G, U.S. National Science Foundation grant OISE 0968421, and U.S. Department of Energy grant DE-EE-0000716. B.P.H. acknowledges generous support from Greg Fullmer through the UNLV Foundation, and W.S. acknowledges Northern Illinois University for funding. B.P.H. and S.K.M. acknowledge support from an Amazon Web Services Education Research Grant award. The work conducted by the U.S. Department of Energy Joint Genome Institute, a DOE Office of Science User Facility, is supported by the Office of Science of the U.S. Department of Energy under contract no. DE-AC02-05CH11231. This article is made openly accessible in part by an award from the Northern Illinois University Libraries’ Open Access Publishing Fund.

    Large-scale sequencing of SARS-CoV-2 genomes from one region allows detailed epidemiology and enables local outbreak management.

    The COVID-19 pandemic has spread rapidly throughout the world. In the UK, the initial peak was in April 2020; in the county of Norfolk (UK) and surrounding areas, which has a stable, low-density population, over 3200 cases were reported between March and August 2020. As part of the activities of the national COVID-19 Genomics Consortium (COG-UK) we undertook whole genome sequencing of the SARS-CoV-2 genomes present in positive clinical samples from the Norfolk region. These samples were collected by four major hospitals, multiple minor hospitals, care facilities and community organizations within Norfolk and surrounding areas. We combined clinical metadata with the sequencing data from regional SARS-CoV-2 genomes to understand the origins, genetic variation, transmission and expansion (spread) of the virus within the region and provide context nationally. Data were fed back into the national effort for pandemic management, whilst simultaneously being used to assist local outbreak analyses. Overall, 1565 positive samples (172 per 100 000 population) from 1376 cases were evaluated; for 140 cases, between two and six samples were available, providing longitudinal data. This represented 42.6% of all positive samples identified by hospital testing in the region and encompassed those with clinical need, and health and care workers and their families. In total, 1035 cases had genome sequences of sufficient quality to provide phylogenetic lineages. These genomes belonged to 26 distinct global lineages, indicating that there were multiple separate introductions into the region. Furthermore, 100 genetically distinct UK lineages were detected, demonstrating local evolution, at a rate of ~2 SNPs per month, and multiple co-occurring lineages as the pandemic progressed. Our analysis: identified a discrete sublineage associated with six care facilities; found no evidence of reinfection in longitudinal samples; ruled out a nosocomial outbreak; identified 16 lineages in key workers which were not in patients, indicating infection control measures were effective; found that the D614G spike protein mutation, which is linked to increased transmissibility, dominates the samples; and rapidly confirmed relatedness of cases in an outbreak at a food processing facility. The large-scale genome sequencing of SARS-CoV-2-positive samples has provided valuable additional data for public health epidemiology in the Norfolk region, and will continue to help identify and untangle hidden transmission chains as the pandemic evolves. The sequencing costs were funded by the COVID-19 Genomics UK (COG-UK) Consortium, which is supported by funding from the Medical Research Council (MRC) part of UK Research and Innovation (UKRI), the National Institute of Health Research (NIHR) and Genome Research Limited, operating as the Wellcome Sanger Institute.

    ZOMBIES IN BACTERIAL GENOMES: IDENTIFICATION AND ANALYSIS OF PREVIOUSLY VIRULENT PHAGE

    Bacteriophage (or ‘phage’) are viruses that infect and reproduce within their bacterial hosts. They have a major global impact on bacterial evolution and ecology, and might influence the pathogenicity of their host bacterium by providing virulence factors. Phage can be described as either “virulent” or “temperate”; the distinguishing feature between the two is their method of replication. This study sought to identify phage sequences within bacterial host genomes and determine the life cycle of the phage, exploring whether there is a connection between defective phage and previously virulent phage. It would normally be expected that any phage sequences identified within a bacterial host would have a temperate life cycle, since only temperate phage enter the lysogenic cycle and insert their DNA into the host as a ‘prophage,’ while virulent phage replicate via the lytic cycle, in which phage DNA replicates separately from that of the host and infected cells are lysed. Defective phage (‘zombies’ in bacterial genomes) are dormant phage that have become inactive through mutational decay or some other process. It is possible that some of these defective phage are in fact previously virulent phage that have become accidentally inserted within the host genome. This study detected phage within bacterial genomes using the prophage identification tools PHAge Search Tool (PHAST) and Prophage Finder. Identified sequences were categorized as ‘intact,’ ‘questionable,’ or ‘incomplete’; questionable and incomplete phage were classified as defective. The lifestyles of the uncovered phage sequences were then determined using PHACTS; six phage were identified as possibly virulent. The life cycles of the phage were further analyzed by assessing the genomic signature distances (GSD) and codon adaptation indexes (CAI) for each phage. Three phage were shown to have a GSD consistent with a virulent life cycle, and the CAI values of four phage corresponded with those of virulent phage. Although previous studies have indicated that some virulent phage may have a temperate lineage, identifying prophage as previously virulent is a novel finding. This has implications for our understanding of phage life cycles and the infection process, as it challenges the idea that only temperate phage insert their DNA into the host genome.

    scMAR-Seq: a novel workflow for targeted single-cell genomics of microorganisms using radioactive labeling

    Current methods for the identification of specific microorganisms based on in situ metabolism are often hampered by insufficient sensitivity and habitat complexity. Here, we present a novel approach for identifying and sequencing single microbial cells metabolizing a specific organic compound with high sensitivity and without prior knowledge of the microbial community. The workflow consists of labeling individual cells with a [14C] substrate based on their metabolic activity, followed by encapsulating cells in alginate with nuclear emulsion by using microfluidics. We here adapted the concept of microautoradiography to visually distinguish between encapsulated labeled and non-labeled cells, which are then sorted via flow cytometry for single-cell genomics. As a proof of concept, we labeled, separated, lysed, and sequenced single cells of the benzene degrader Pseudomonas veronii from mock microbial communities. The cells of P. veronii were isolated with 100% specificity. Single-cell microautoradiography and genome sequencing is an innovative method for elucidating microbial identity, activity, and function in diverse habitats, contributing to the discovery of novel taxa and genes with potential for biotechnological applications such as bioremediation.

    Adaptive genomic structural variation in the grape powdery mildew pathogen, Erysiphe necator.

    Background: Powdery mildew, caused by the obligate biotrophic fungus Erysiphe necator, is an economically important disease of grapevines worldwide. Large quantities of fungicides are used for its control, accelerating the incidence of fungicide resistance. Copy number variations (CNVs) are unbalanced changes in the structure of the genome that have been associated with complex traits. In addition to providing the first description of the large and highly repetitive genome of E. necator, this study describes the impact of genomic structural variation on fungicide resistance in Erysiphe necator. Results: A shotgun approach was applied to sequence and assemble the genome of five E. necator isolates, and RNA-seq and comparative genomics were used to predict and annotate protein-coding genes. Our results show that the E. necator genome is exceptionally large and repetitive and suggest that transposable elements are responsible for genome expansion. Frequent structural variations were found between isolates and included copy number variation in EnCYP51, the target of the commonly used sterol demethylase inhibitor (DMI) fungicides. A panel of 89 additional E. necator isolates collected from diverse vineyard sites was screened for copy number variation in the EnCYP51 gene and for presence/absence of a point mutation (Y136F) known to result in higher fungicide tolerance. We show that an increase in EnCYP51 copy number is significantly more likely to be detected in isolates collected from fungicide-treated vineyards. Increased EnCYP51 copy numbers were detected with the Y136F allele, suggesting that an increase in copy number becomes advantageous only after the fungicide-tolerant allele is acquired. We also show that EnCYP51 copy number influences expression in a gene-dose dependent manner and correlates with fungal growth in the presence of a DMI fungicide. Conclusions: Taken together, our results show that CNV can be adaptive in the development of resistance to fungicides by providing increasing quantitative protection in a gene-dosage dependent manner. The results of this work not only demonstrate the effectiveness of using genomics to dissect complex traits in organisms with very limited molecular information, but also may have broader implications for understanding genomic dynamics in response to strong selective pressure in other pathogens with similar genome architectures.