49 research outputs found

    Recovering complete and draft population genomes from metagenome datasets

    Get PDF
    Assembly of metagenomic sequence data into microbial genomes is of fundamental value to improving our understanding of microbial ecology and metabolism by elucidating the functional potential of hard-to-culture microorganisms. Here, we provide a synthesis of available methods to bin metagenomic contigs into species-level groups and highlight how genetic diversity, sequencing depth, and coverage influence binning success. Despite the computational cost on application to deeply sequenced complex metagenomes (e.g., soil), covarying patterns of contig coverage across multiple datasets significantly improves the binning process. We also discuss and compare current genome validation methods and reveal how these methods tackle the problem of chimeric genome bins i.e., sequences from multiple species. Finally, we explore how population genome assembly can be used to uncover biogeographic trends and to characterize the effect of in situ functional constraints on the genome-wide evolution

    Evaluating the Fidelity of De Novo Short Read Metagenomic Assembly Using Simulated Data

    Get PDF
    A frequent step in metagenomic data analysis comprises the assembly of the sequenced reads. Many assembly tools have been published in the last years targeting data coming from next-generation sequencing (NGS) technologies but these assemblers have not been designed for or tested in multi-genome scenarios that characterize metagenomic studies. Here we provide a critical assessment of current de novo short reads assembly tools in multi-genome scenarios using complex simulated metagenomic data. With this approach we tested the fidelity of different assemblers in metagenomic studies demonstrating that even under the simplest compositions the number of chimeric contigs involving different species is noticeable. We further showed that the assembly process reduces the accuracy of the functional classification of the metagenomic data and that these errors can be overcome raising the coverage of the studied metagenome. The results presented here highlight the particular difficulties that de novo genome assemblers face in multi-genome scenarios demonstrating that these difficulties, that often compromise the functional classification of the analyzed data, can be overcome with a high sequencing effort

    Comparative genomics of the bacterial genus Listeria: Genome evolution is characterized by limited gene acquisition and limited gene loss

    Get PDF
    <p>Abstract</p> <p>Background</p> <p>The bacterial genus <it>Listeria </it>contains pathogenic and non-pathogenic species, including the pathogens <it>L. monocytogenes </it>and <it>L. ivanovii</it>, both of which carry homologous virulence gene clusters such as the <it>prfA </it>cluster and clusters of internalin genes. Initial evidence for multiple deletions of the <it>prfA </it>cluster during the evolution of <it>Listeria </it>indicates that this genus provides an interesting model for studying the evolution of virulence and also presents practical challenges with regard to definition of pathogenic strains.</p> <p>Results</p> <p>To better understand genome evolution and evolution of virulence characteristics in <it>Listeria</it>, we used a next generation sequencing approach to generate draft genomes for seven strains representing <it>Listeria </it>species or clades for which genome sequences were not available. Comparative analyses of these draft genomes and six publicly available genomes, which together represent the main <it>Listeria </it>species, showed evidence for (i) a pangenome with 2,032 core and 2,918 accessory genes identified to date, (ii) a critical role of gene loss events in transition of <it>Listeria </it>species from facultative pathogen to saprotroph, even though a consistent pattern of gene loss seemed to be absent, and a number of isolates representing non-pathogenic species still carried some virulence associated genes, and (iii) divergence of modern pathogenic and non-pathogenic <it>Listeria </it>species and strains, most likely circa 47 million years ago, from a pathogenic common ancestor that contained key virulence genes.</p> <p>Conclusions</p> <p>Genome evolution in <it>Listeria </it>involved limited gene loss and acquisition as supported by (i) a relatively high coverage of the predicted pan-genome by the observed pan-genome, (ii) conserved genome size (between 2.8 and 3.2 Mb), and (iii) a highly syntenic genome. Limited gene loss in <it>Listeria </it>did include loss of virulence associated genes, likely associated with multiple transitions to a saprotrophic lifestyle. The genus <it>Listeria </it>thus provides an example of a group of bacteria that appears to evolve through a loss of virulence rather than acquisition of virulence characteristics. While <it>Listeria </it>includes a number of species-like clades, many of these putative species include clades or strains with atypical virulence associated characteristics. This information will allow for the development of genetic and genomic criteria for pathogenic strains, including development of assays that specifically detect pathogenic <it>Listeria </it>strains.</p

    Multiple Data Analyses and Statistical Approaches for Analyzing Data from Metagenomic Studies and Clinical Trials

    Get PDF
    Metagenomics, also known as environmental genomics, is the study of the genomic content of a sample of organisms (microbes) obtained from a common habitat. Metagenomics and other β€œomics” disciplines have captured the attention of researchers for several decades. The effect of microbes in our body is a relevant concern for health studies. There are plenty of studies using metagenomics which examine microorganisms that inhabit niches in the human body, sometimes causing disease, and are often correlated with multiple treatment conditions. No matter from which environment it comes, the analyses are often aimed at determining either the presence or absence of specific species of interest in a given metagenome or comparing the biological diversity and the functional activity of a wider range of microorganisms within their communities. The importance increases for comparison within different environments such as multiple patients with different conditions, multiple drugs, and multiple time points of same treatment or same patient. Thus, no matter how many hypotheses we have, we need a good understanding of genomics, bioinformatics, and statistics to work together to analyze and interpret these datasets in a meaningful way. This chapter provides an overview of different data analyses and statistical approaches (with example scenarios) to analyze metagenomics samples from different medical projects or clinical trials

    The Complete Genome Sequence of Fibrobacter succinogenes S85 Reveals a Cellulolytic and Metabolic Specialist

    Get PDF
    Fibrobacter succinogenes is an important member of the rumen microbial community that converts plant biomass into nutrients usable by its host. This bacterium, which is also one of only two cultivated species in its phylum, is an efficient and prolific degrader of cellulose. Specifically, it has a particularly high activity against crystalline cellulose that requires close physical contact with this substrate. However, unlike other known cellulolytic microbes, it does not degrade cellulose using a cellulosome or by producing high extracellular titers of cellulase enzymes. To better understand the biology of F. succinogenes, we sequenced the genome of the type strain S85 to completion. A total of 3,085 open reading frames were predicted from its 3.84 Mbp genome. Analysis of sequences predicted to encode for carbohydrate-degrading enzymes revealed an unusually high number of genes that were classified into 49 different families of glycoside hydrolases, carbohydrate binding modules (CBMs), carbohydrate esterases, and polysaccharide lyases. Of the 31 identified cellulases, none contain CBMs in families 1, 2, and 3, typically associated with crystalline cellulose degradation. Polysaccharide hydrolysis and utilization assays showed that F. succinogenes was able to hydrolyze a number of polysaccharides, but could only utilize the hydrolytic products of cellulose. This suggests that F. succinogenes uses its array of hemicellulose-degrading enzymes to remove hemicelluloses to gain access to cellulose. This is reflected in its genome, as F. succinogenes lacks many of the genes necessary to transport and metabolize the hydrolytic products of non-cellulose polysaccharides. The F. succinogenes genome reveals a bacterium that specializes in cellulose as its sole energy source, and provides insight into a novel strategy for cellulose degradation

    The Complete Genome Sequence of Thermoproteus tenax: A Physiologically Versatile Member of the Crenarchaeota

    Get PDF
    Here, we report on the complete genome sequence of the hyperthermophilic Crenarchaeum Thermoproteus tenax (strain Kra 1, DSM 2078(T)) a type strain of the crenarchaeotal order Thermoproteales. Its circular 1.84-megabase genome harbors no extrachromosomal elements and 2,051 open reading frames are identified, covering 90.6% of the complete sequence, which represents a high coding density. Derived from the gene content, T. tenax is a representative member of the Crenarchaeota. The organism is strictly anaerobic and sulfur-dependent with optimal growth at 86 degrees C and pH 5.6. One particular feature is the great metabolic versatility, which is not accompanied by a distinct increase of genome size or information density as compared to other Crenarchaeota. T. tenax is able to grow chemolithoautotrophically (CO2/H-2) as well as chemoorganoheterotrophically in presence of various organic substrates. All pathways for synthesizing the 20 proteinogenic amino acids are present. In addition, two presumably complete gene sets for NADH:quinone oxidoreductase (complex I) were identified in the genome and there is evidence that either NADH or reduced ferredoxin might serve as electron donor. Beside the typical archaeal A(0)A(1)-ATP synthase, a membrane-bound pyrophosphatase is found, which might contribute to energy conservation. Surprisingly, all genes required for dissimilatory sulfate reduction are present, which is confirmed by growth experiments. Mentionable is furthermore, the presence of two proteins (ParA family ATPase, actin-like protein) that might be involved in cell division in Thermoproteales, where the ESCRT system is absent, and of genes involved in genetic competence (DprA, ComF) that is so far unique within Archaea

    Genome Characterization of the Oleaginous Fungus Mortierella alpina

    Get PDF
    Mortierella alpina is an oleaginous fungus which can produce lipids accounting for up to 50% of its dry weight in the form of triacylglycerols. It is used commercially for the production of arachidonic acid. Using a combination of high throughput sequencing and lipid profiling, we have assembled the M. alpina genome, mapped its lipogenesis pathway and determined its major lipid species. The 38.38 Mb M. alpina genome shows a high degree of gene duplications. Approximately 50% of its 12,796 gene models, and 60% of genes in the predicted lipogenesis pathway, belong to multigene families. Notably, M. alpina has 18 lipase genes, of which 11 contain the class 2 lipase domain and may share a similar function. M. alpina's fatty acid synthase is a single polypeptide containing all of the catalytic domains required for fatty acid synthesis from acetyl-CoA and malonyl-CoA, whereas in many fungi this enzyme is comprised of two polypeptides. Major lipids were profiled to confirm the products predicted in the lipogenesis pathway. M. alpina produces a complex mixture of glycerolipids, glycerophospholipids and sphingolipids. In contrast, only two major sterol lipids, desmosterol and 24(28)-methylene-cholesterol, were detected. Phylogenetic analysis based on genes involved in lipid metabolism suggests that oleaginous fungi may have acquired their lipogenic capacity during evolution after the divergence of Ascomycota, Basidiomycota, Chytridiomycota and Mucoromycota. Our study provides the first draft genome and comprehensive lipid profile for M. alpina, and lays the foundation for possible genetic engineering of M. alpina to produce higher levels and diverse contents of dietary lipids

    InterPro in 2019: improving coverage, classification and access to protein sequence annotations

    Get PDF
    The InterPro database (http://www.ebi.ac.uk/interpro/) classifies protein sequences into families and predicts the presence of functionally important domains and sites. Here, we report recent developments with InterPro (version 70.0) and its associated software, including an 18% growth in the size of the database in terms on new InterPro entries, updates to content, the inclusion of an additional entry type, refined modelling of discontinuous domains, and the development of a new programmatic interface and website. These developments extend and enrich the information provided by InterPro, and provide greater flexibility in terms of data access. We also show that InterPro's sequence coverage has kept pace with the growth of UniProtKB, and discuss how our evaluation of residue coverage may help guide future curation activities

    CRISPR - A widespread system that provides acquired resistance against phages in bacteria and archaea

    No full text
    Arrays of clustered, regularly interspaced short palindromic repeats (CRISPRs) are widespread in the genomes of many bacteria and almost all archaea. These arrays are composed of direct repeats that are separated by similarly sized non-repetitive spacers. CRISPR arrays, together with a group of associated proteins, confer resistance to phages, possibly by an RNA-interference-like mechanism. This Progress discusses the structure and function of this newly recognized antiviral mechanism
    corecore