A probabilistic model to recover individual genomes from metagenomes
Dröge J, Schönhuth A, McHardy AC. A probabilistic model to recover individual genomes from metagenomes. PeerJ Computer Science. 2017;3: e117. Shotgun metagenomics of microbial communities reveal information about strains of relevance for applications in medicine, biotechnology and ecology. Recovering their genomes is a crucial but very challenging step due to the complexity of the underlying biological system and technical factors. Microbial communities are heterogeneous, with oftentimes hundreds of present genomes deriving from different species or strains, all at varying abundances and with different degrees of similarity to each other and reference data. We present a versatile probabilistic model for genome recovery and analysis, which aggregates three types of information that are commonly used for genome recovery from metagenomes. As potential applications we showcase metagenome contig classification, genome sample enrichment and genome bin comparisons. The open source implementation MGLEX is available via the Python Package Index and on GitHub and can be embedded into metagenome analysis workflows and programs.
Inferring functional modules of protein families with probabilistic topic models
Background: Genome and metagenome studies have identified thousands of protein families whose functions are poorly understood and for which techniques for functional characterization provide only partial information. For such proteins, the genome context can give further information about their functional context. Results: We describe a Bayesian method, based on a probabilistic topic model, which directly identifies functional modules of protein families. The method explores the co-occurrence patterns of protein families across a collection of sequence samples to infer a probabilistic model of arbitrarily-sized functional modules. Conclusions: We show that our method identifies protein modules - some of which correspond to well-known biological processes - that are tightly interconnected with known functional interactions and are different from the interactions identified by pairwise co-occurrence. The modules are not specific to any given organism and may combine different realizations of a protein complex or pathway within different taxa.
Assessing taxonomic metagenome profilers with OPAL
Meyer F, Bremges A, Belmann P, Janssen S, McHardy AC, Koslicki D. Assessing taxonomic metagenome profilers with OPAL. Genome biology. 2019;20(1): 51. The explosive growth in taxonomic metagenome profiling methods over the past years has created a need for systematic comparisons using relevant performance criteria. The Open-community Profiling Assessment tooL (OPAL) implements commonly used performance metrics, including those of the first challenge of the initiative for the Critical Assessment of Metagenome Interpretation (CAMI), together with convenient visualizations. In addition, we perform in-depth performance comparisons with seven profilers on datasets of CAMI and the Human Microbiome Project. OPAL is freely available at https://github.com/CAMI-challenge/OPAL
GISMO—gene identification using a support vector machine for ORF classification
We present the novel prokaryotic gene finder GISMO, which combines searches for protein family domains with composition-based classification based on a support vector machine. GISMO is highly accurate; exhibiting high sensitivity and specificity in gene identification. We found that it performs well for complete prokaryotic chromosomes, irrespective of their GC content, and also for plasmids as short as 10 kb, short genes and for genes with atypical sequence composition. Using GISMO, we found several thousand new predictions for the published genomes that are supported by extrinsic evidence, which strongly suggest that these are very likely biologically active genes. The source code for GISMO is freely available under the GPL license
Bioboxes: standardised containers for interchangeable bioinformatics software
Belmann P, Dröge J, Bremges A, McHardy AC, Sczyrba A, Barton MD. Bioboxes: standardised containers for interchangeable bioinformatics software. GigaScience. 2015;4(1): 47. Software is now both central and essential to modern biology, yet lack of availability, difficult installations, and complex user interfaces make software hard to obtain and use. Containerisation, as exemplified by the Docker platform, has the potential to solve the problems associated with sharing software. We propose bioboxes: containers with standardised interfaces to make bioinformatics software interchangeable
The PhyloPythiaS Web Server for Taxonomic Assignment of Metagenome Sequences
Metagenome sequencing is becoming common and there is an increasing need for easily accessible tools for data analysis. An essential step is the taxonomic classification of sequence fragments. We describe a web server for the taxonomic assignment of metagenome sequences with PhyloPythiaS. PhyloPythiaS is a fast and accurate sequence composition-based classifier that utilizes the hierarchical relationships between clades. Taxonomic assignments with the web server can be made with a generic model, or with sample-specific models that users can specify and create. Several interactive visualization modes and multiple download formats allow quick and convenient analysis and downstream processing of taxonomic assignments. Here, we demonstrate usage of our web server by taxonomic assignment of metagenome samples from an acidophilic biofilm community of an acid mine and of a microbial community from cow rumen
Microbiota and Host Nutrition across Plant and Animal Kingdoms
Plants and animals each have evolved specialized organs dedicated to nutrient acquisition, and these harbor specific bacterial communities that extend the host's metabolic repertoire. Similar forces driving microbial community establishment in the gut and plant roots include diet/soil-type, host genotype, and immune system as well as microbe-microbe interactions. Here we show that there is no overlap of abundant bacterial taxa between the microbiotas of the mammalian gut and plant roots, whereas taxa overlap does exist between fish gut and plant root communities. A comparison of root and gut microbiota composition in multiple host species belonging to the same evolutionary lineage reveals host phylogenetic signals in both eukaryotic kingdoms. The reasons underlying striking differences in microbiota composition in independently evolved, yet functionally related, organs in plants and animals remain unclear but might include differences in start inoculum and niche-specific factors such as oxygen levels, temperature, pH, and organic carbon availability
High-Throughput miRNA and mRNA Sequencing of Paired Colorectal Normal, Tumor and Metastasis Tissues and Bioinformatic Modeling of miRNA-1 Therapeutic Applications
MiRNAs are discussed as diagnostic and therapeutic molecules. However, effective miRNA drug treatments with miRNAs are, so far, hampered by the complexity of the miRNA networks. To identify potential miRNA drugs in colorectal cancer, we profiled miRNA and mRNA expression in matching normal, tumor and metastasis tissues of eight patients by Illumina sequencing. We validated six miRNAs in a large tissue screen containing 16 additional tumor entities and identified miRNA-1, miRNA-129, miRNA-497 and miRNA-215 as constantly de-regulated within the majority of cancers. Of these, we investigated miRNA-1 as representative in a systems-biology simulation of cellular cancer models implemented in PyBioS and assessed the effects of depletion as well as overexpression in terms of miRNA-1 as a potential treatment option. In this system, miRNA-1 treatment reverted the disease phenotype with different effectiveness among the patients. Scoring the gene expression changes obtained through mRNA-Seq from the same patients we show that the combination of deep sequencing and systems biological modeling can help to identify patient-specific responses to miRNA treatments. We present this data as guideline for future pre-clinical assessments of new and personalized therapeutic options