29 research outputs found
Computational methods for tracking the evolution of complex bacterial communities
The focus of my PhD thesis was to study evolutionary aspects of host-associated microbial communities. To better understand these effects, I developed computational methods to search for protein families that are under selection within metagenomes (Publication I) and applied them to various environments (see the articles “Structure and function of the bacterial root microbiota in wild and domesticated barley” and “Survival trade-offs in plant roots during colonization by closely related beneficial and pathogenic fungi”). One consistent finding of these studies was the high selective pressure acting on gene families associated with the bacterial defense system (the so-called CRISPR-Cas system) and on families annotated as being related to bacteriophages. To study this CRISPR-phage relationship more closely, I systematically analysed CRISPR cassettes and CRISPR-related genes in samples from the Human Microbiome Project (HMP) (Publication II). This resulted in one of the most comprehensive CRISPR collections to date. Further, we found novel sequence characteristics in the CRISPR loci and described differences in the composition of CRISPR-associated genes between body habitats, as well as a potential relationship between the CRISPR defence system and the restriction-modification system of bacteria. I also performed a similar but less extensive search on metagenomic samples from infants: “Genomic variation and strain-specific functional adaptation in the human gut microbiome during early life”. Next, I turned my focus to microbiome evolution in a gnotobiotic mouse model, since this provided the opportunity to study bacterial evolution at an intermediate scale of complexity.
I contributed to the development of the mouse model described in the manuscript “Genome-guided design of a defined mouse microbiota that confers colonization resistance against Salmonella enterica serovar Typhimurium”, to a study comparing the stability of the community between animal facilities (“Reproducible colonization of germ-free mice with the Oligo-Mouse-Microbiota in different animal facilities”), and to a study focusing on the interaction network of this community. However, my main work with the OMM12 model has been to study community effects and evolution during repeated rounds of AB exposure (unpublished Publication III).
In vitro interaction network of a synthetic gut bacterial community
A key challenge in microbiome research is to predict the functionality of microbial communities based on community membership and (meta)-genomic data. As central microbiota functions are determined by bacterial community networks, it is important to gain insight into the principles that govern bacteria-bacteria interactions. Here, we focused on the growth and metabolic interactions of the Oligo-Mouse-Microbiota (OMM12) synthetic bacterial community, which is increasingly used as a model system in gut microbiome research. Using a bottom-up approach, we uncovered the directionality of strain-strain interactions in mono- and pairwise co-culture experiments as well as in community batch culture. Metabolic network reconstruction in combination with metabolomics analysis of bacterial culture supernatants provided insights into the metabolic potential and activity of the individual community members. We thereby show that the OMM12 interaction network is shaped by both exploitative and interference competition in vitro in nutrient-rich culture media, and demonstrate how community structure can be shifted by changing the nutritional environment. In particular, Enterococcus faecalis KB1 was identified as an important driver of community composition, affecting the abundance of several other consortium members in vitro. This study thus gives fundamental insight into the key drivers and mechanistic basis of the OMM12 interaction network in vitro, which serves as a knowledge base for future mechanistic in vivo studies.
EDEN: evolutionary dynamics within environments.
Metagenomics revolutionized the field of microbial ecology, giving access to Gb-sized datasets of microbial communities under natural conditions. This enables fine-grained analyses of the functions of community members, studies of their association with phenotypes and environments, as well as of their microevolution and adaptation to changing environmental conditions. However, phylogenetic methods for studying adaptation and evolutionary dynamics are not able to cope with big data. EDEN is the first software for the rapid detection of protein families and regions under positive selection, as well as their associated biological processes, from meta- and pangenome data. It provides an interactive result visualization for detailed comparative analyses.
Availability and implementation:
EDEN is available as a Docker installation under the GPL 3.0 license, allowing its use on common operating systems, at http://www.github.com/hzi-bifo/eden
A self-supervised deep learning method for data-efficient training in genomics
Deep learning in bioinformatics is often limited to problems where extensive amounts of labeled data are available for supervised classification. By exploiting unlabeled data, self-supervised learning techniques can improve the performance of machine learning models in the presence of limited labeled data. Although many self-supervised learning methods have been suggested before, they have failed to exploit the unique characteristics of genomic data. Therefore, we introduce Self-GenomeNet, a self-supervised learning technique that is custom-tailored for genomic data. Self-GenomeNet leverages reverse-complement sequences and effectively learns short- and long-term dependencies by predicting targets of different lengths. Self-GenomeNet performs better than other self-supervised methods in data-scarce genomic tasks and outperforms standard supervised training with ~10 times fewer labeled training data. Furthermore, the learned representations generalize well to new datasets and tasks. These findings suggest that Self-GenomeNet is well suited for large-scale, unlabeled genomic datasets and could substantially improve the performance of genomic models.
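As a small illustration of the reverse-complement idea mentioned in the abstract, the sketch below pairs each DNA sequence with its reverse complement. It is not the Self-GenomeNet implementation; the helper names (`reverse_complement`, `augment_with_reverse_complement`) and the example sequences are assumptions made purely for illustration.

```python
# Illustrative sketch only; not taken from Self-GenomeNet.
# Translation table mapping each base to its complement (N stays N).
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def augment_with_reverse_complement(seqs):
    """Pair each sequence with its reverse complement (hypothetical helper)."""
    return [(s, reverse_complement(s)) for s in seqs]


# Made-up example sequences.
pairs = augment_with_reverse_complement(["ATGCGT", "GGATCC"])
for forward, reverse in pairs:
    print(forward, reverse)
```

Both strands of a double-stranded genome carry the same information, so a sequence and its reverse complement can be treated as two views of the same sample.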
Transcriptome-wide analysis uncovers the targets of the RNA-binding protein MSI2 and effects of MSI2's RNA-binding activity on IL-6 signaling
The RNA-binding protein Musashi 2 (MSI2) has emerged as an important regulator in cancer initiation, progression, and drug resistance. Translocations and deregulation of the MSI2 gene are diagnostic of certain cancers, including chronic myeloid leukemia (CML) with translocation t(7;17), acute myeloid leukemia (AML) with translocation t(10;17), and some cases of B-precursor acute lymphoblastic leukemia (pB-ALL). To better understand the function of MSI2 in leukemia, the mRNA targets that are bound and regulated by MSI2 and their MSI2-binding motifs need to be identified. To this end, using photoactivatable ribonucleoside cross-linking and immunoprecipitation (PAR-CLIP) and the Multiple EM for Motif Elicitation (MEME) analysis tool, we identified MSI2’s mRNA targets and the consensus RNA-recognition element (RRE) motif recognized by MSI2 (UUAG). Of note, MSI2 knockdown altered the expression of several genes with roles in eukaryotic initiation factor 2 (eIF2), hepatocyte growth factor (HGF), and epidermal growth factor (EGF) signaling pathways. We also show that MSI2 regulates classic interleukin-6 (IL-6) signaling by promoting the degradation of the mRNA of IL-6 signal transducer (IL6ST or GP130), which, in turn, affected the phosphorylation statuses of signal transducer and activator of transcription 3 (STAT3) and the mitogen-activated protein kinase ERK. In summary, we have identified multiple MSI2-regulated mRNAs and provided evidence that MSI2 controls IL6ST activity that controls oncogenic signaling networks. Our findings may help inform strategies for unraveling the role of MSI2 in leukemia to pave the way for the development of targeted therapies.
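To make the UUAG motif concrete, the following minimal sketch scans an RNA string for occurrences of the motif. It is unrelated to the PAR-CLIP or MEME analyses in the study; the function name `find_motif_positions` and the example sequence are hypothetical.

```python
import re


def find_motif_positions(rna: str, motif: str = "UUAG"):
    """Return 0-based start positions of all motif matches, including overlaps."""
    # A zero-width lookahead lets the regex report overlapping matches as well.
    pattern = re.compile(f"(?={re.escape(motif)})")
    return [m.start() for m in pattern.finditer(rna.upper())]


# Made-up sequence, used only to demonstrate the scan.
example_rna = "AGUUAGCCUUAGUUAGA"
positions = find_motif_positions(example_rna)
print(positions)
```

A real target analysis would weigh such hits against cross-linking evidence; a plain string scan only shows where the motif could occur.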
Evaluating assembly and variant calling software for strain-resolved analysis of large DNA viruses
Infection with human cytomegalovirus (HCMV) can cause severe complications in immunocompromised individuals and congenitally infected children. Characterizing heterogeneous viral populations and their evolution by high-throughput sequencing of clinical specimens requires the accurate assembly of individual strains or sequence variants and suitable variant calling methods. However, the performance of most methods has not been assessed for populations composed of low-divergence viral strains with large genomes, such as HCMV. In an extensive benchmarking study, we evaluated 15 assemblers and 6 variant callers on 10 lab-generated benchmark data sets created with two different library preparation protocols, to identify best practices and challenges for analyzing such data. Most assemblers, especially metaSPAdes and IVA, performed well across a range of metrics in recovering abundant strains. However, only one, Savage, recovered low-abundance strains, and only in a highly fragmented manner. Two variant callers, LoFreq and VarScan2, excelled across all strain abundances. Both shared a large fraction of false positive variant calls, which were strongly enriched in T to G changes in a ‘G.G’ context. The magnitude of this context-dependent systematic error is linked to the experimental protocol. We provide all benchmarking data, results and the entire benchmarking workflow named QuasiModo, Quasispecies Metric determination on omics, under the GNU General Public License v3.0 (https://github.com/hzi-bifo/Quasimodo), to enable full reproducibility and further benchmarking on these and other data.
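The context-dependent error described above can be illustrated with a short check for T to G calls in a ‘G.G’ context. This is a sketch under assumptions, not part of the QuasiModo workflow: it interprets ‘G.G’ as the variant position being flanked by G on both sides in the reference, and the function name, reference sequence, and variant calls are all made up.

```python
def is_t_to_g_in_gg_context(reference: str, pos: int, ref: str, alt: str) -> bool:
    """True if a T>G substitution at 0-based `pos` is flanked by G on both sides.

    Assumption: 'G.G' means reference[pos - 1] == 'G' and reference[pos + 1] == 'G'.
    """
    if ref.upper() != "T" or alt.upper() != "G":
        return False
    # A flanking base on each side is required.
    if pos <= 0 or pos >= len(reference) - 1:
        return False
    if reference[pos].upper() != "T":
        return False
    return reference[pos - 1].upper() == "G" and reference[pos + 1].upper() == "G"


# Made-up reference and variant calls: (0-based position, ref allele, alt allele).
reference = "ACGTGAATCGTGC"
calls = [(3, "T", "G"), (7, "T", "G"), (10, "T", "G"), (1, "C", "A")]

flagged = [c for c in calls if is_t_to_g_in_gg_context(reference, *c)]
print(flagged)
```

Calls flagged this way are candidates for closer inspection rather than automatic removal, since a genuine variant can occur in the same context.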
Distinct Pattern of Microgliosis in the Olfactory Bulb of Neurodegenerative Proteinopathies
The olfactory bulb (OB) shows early neuropathological hallmarks in numerous neurodegenerative diseases, for example, in Alzheimer’s disease (AD) and Parkinson’s disease (PD). The glomerular and granular cell layers of the OB are characterized by preserved cellular plasticity in the adult brain. In turn, alterations of this cellular plasticity are related to neuroinflammation such as microglia activation, implicated in the pathogenesis of AD and PD, as well as frontotemporal lobe degeneration (FTLD). To determine microglia proliferation and activation, we analyzed ionized calcium binding adaptor molecule 1 (Iba1) expressing microglia in the glomerular and granular cell layers, and the olfactory tract of the OB from patients with AD, PD dementia/dementia with Lewy bodies (PDD/DLB), and FTLD compared to age-matched controls. The number of Iba1 and CD68 positive microglia associated with enlarged amoeboid microglia was increased particularly in AD and, to a lesser extent, in FTLD and PDD/DLB, while the proportion of proliferating microglia was not altered. In addition, cells expressing the immature neuronal marker polysialylated neural cell adhesion molecule (PSA-NCAM) were increased in the glomerular layer of PDD/DLB and FTLD cases only. These findings provide novel and detailed insights into differential levels of microglia activation in the OB in neurodegenerative diseases.
Optimized model architectures for deep learning on genomic data
The success of deep learning in various applications depends on task-specific architecture design choices, including the types, hyperparameters, and number of layers. In computational biology, there is no consensus on the optimal architecture design, and decisions are often made using insights from more well-established fields such as computer vision. These may not consider the domain-specific characteristics of genome sequences, potentially limiting performance. Here, we present GenomeNet-Architect, a neural architecture design framework that automatically optimizes deep learning models for genome sequence data. It optimizes the overall layout of the architecture, with a search space specifically designed for genomics. Additionally, it optimizes hyperparameters of individual layers and the model training procedure. On a viral classification task, GenomeNet-Architect reduced the read-level misclassification rate by 19%, with 67% faster inference and 83% fewer parameters, and achieved similar contig-level accuracy with ~100 times fewer parameters compared to the best-performing deep learning baselines.
Reproducible Colonization of Germ-Free Mice With the Oligo-Mouse-Microbiota in Different Animal Facilities.
The Oligo-Mouse-Microbiota (OMM12) is a recently developed synthetic bacterial community for functional microbiome research in mouse models (Brugiroux et al., 2016). To date, the OMM12 model has been established in several germ-free mouse facilities worldwide and is employed to address a growing variety of research questions related to infection biology, mucosal immunology, microbial ecology and host-microbiome metabolic cross-talk. The OMM12 consists of 12 sequenced and publicly available strains isolated from mice, representing five bacterial phyla that are naturally abundant in the murine gastrointestinal tract (Lagkouvardos et al., 2016). Under germ-free conditions, the OMM12 colonizes mice stably over multiple generations. Here, we investigated whether stably colonized OMM12 mouse lines could be reproducibly established in different animal facilities. Germ-free C57BL/6J mice were inoculated with a frozen mixture of the OMM12 strains. Within 2 weeks after application, the OMM12 community reached the same stable composition in all facilities, as determined by fecal microbiome analysis. We show that a second application of the OMM12 strains after 72 h leads to a more stable community composition than a single application. The availability of such protocols for reliable de novo generation of gnotobiotic rodents will certainly contribute to increasing experimental reproducibility in biomedical research.