
    Decoding Microbial Genomes: Novel User-Friendly Tools Applied to Fermented Foods

    Over the past two decades, the decline in the cost of DNA sequencing per base has significantly outpaced Moore's law. Many organizations and research groups have exploited this trend to generate large amounts of genomic data, making it possible to tackle new research questions. This growth also brings challenges, including the need for faster algorithms, more efficient ways to visualize and explore data, more automated data processing, and systematic data management. For more than a century, Agroscope has collected lactic acid bacteria (LAB) extracted from the Swiss dairy environment. Today, the collection comprises more than 10’000 strains, and the genomes of about 15% of the strains have been sequenced so far. The overarching goal of this thesis is to find new ways of exploiting this genetic potential to design new fermented food products, including ones with potential additional health benefits, and to understand the underlying mechanisms. One compound with potential health benefits is indole. Previous experiments have shown that indole compounds modulate the gut immune system via the aryl hydrocarbon receptor (AhR). Our objective was to create a yoghurt enriched in indole metabolites through fermentation, and then to examine whether maternal consumption of this yoghurt would enhance gut immune system maturation in germ-free mice. To reduce the number of strains to test, I developed comparative genomics tools to pre-select strains from the strain collection. This led to the successful development of a yoghurt with significantly increased AhR activation activity. In germ-free mice, we could show the expected effect. Based on these comparative genomics tools, I developed the software OpenGenomeBrowser to enable biologists, who know their organisms of interest in great detail, to efficiently explore the genomic data by themselves, without bioinformatics skills or the need for a bioinformatician as intermediary. 
The foundation of OpenGenomeBrowser is a simple system for transparent data management of microbial genomes which makes the automation of common bioinformatics workflows possible. In addition, I built a user-friendly website based on modern web technologies to facilitate common bioinformatics workflows. Because of OpenGenomeBrowser's solid foundation, it is the first software of its kind that can be self-hosted and is dataset-independent, making it potentially useful for many similar genome datasets. During the project, we measured thousands of metabolites in yoghurts made using different strains. However, we experienced that no existing tools could adequately connect such a high-dimensional phenotypic dataset to the genomic information, i.e., presence-absence of orthogenes. Finding high-confidence causative links between these datasets is challenging because of the properties of microbial genomes. For instance, clonal reproduction leads to genome-wide linkage disequilibrium, which prohibits the use of techniques developed for human genome-wide association studies (hGWAS). To this end, I developed Scoary2, a complete rewrite and extension of the original microbial GWAS (mGWAS) software Scoary. The key improvements include an implementation of the core algorithm that is orders of magnitude faster and an interactive web-app that enables efficient data exploration of the output, which is crucial given the size of the dataset. With this software, we discovered two previously uncharacterized genes involved in the carnitine metabolism
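
To make the idea of linking gene presence-absence to a trait concrete, the following is a minimal sketch of a naive association test: a two-sided Fisher's exact test on a 2x2 table of gene presence versus a binary trait. It is not Scoary2's implementation. In particular, it ignores the population structure caused by clonal reproduction that the abstract identifies as the central difficulty. The function names and the toy strain labels are hypothetical.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of seeing x in the top-left cell.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum the probabilities of all tables at least as extreme as the observed one.
    total = sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-9))
    return min(1.0, total)


def association(presence, trait):
    """presence, trait: dicts mapping strain -> bool. Returns the naive p-value."""
    strains = presence.keys() & trait.keys()
    a = sum(presence[s] and trait[s] for s in strains)          # gene present, trait present
    b = sum(presence[s] and not trait[s] for s in strains)      # gene present, trait absent
    c = sum(not presence[s] and trait[s] for s in strains)      # gene absent, trait present
    d = sum(not presence[s] and not trait[s] for s in strains)  # gene absent, trait absent
    return fisher_exact_two_sided(a, b, c, d)


# Hypothetical toy data: six strains, gene and trait perfectly co-occurring.
presence = {"s1": True, "s2": True, "s3": True, "s4": False, "s5": False, "s6": False}
trait = dict(presence)
print(association(presence, trait))
```

With closely related strains, such a test overstates confidence because the observations are not independent, which is why dedicated mGWAS tools add further steps.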

    Focus: A Graph Approach for Data-Mining and Domain-Specific Assembly of Next Generation Sequencing Data

    Next Generation Sequencing (NGS) has emerged as a key technology leading to revolutionary breakthroughs in numerous biomedical research areas. These technologies produce millions to billions of short DNA reads that represent a small fraction of the original target DNA sequence. These short reads contain little information individually but are produced at a high coverage of the original sequence such that many reads overlap. Overlap relationships allow for the reads to be linearly ordered and merged by computational programs called assemblers into long stretches of contiguous sequence called contigs that can be used for research applications. Although the assembly of the reads produced by NGS remains a difficult task, it is the process of extracting useful knowledge from these relatively short sequences that has become one of the most exciting and challenging problems in Bioinformatics. The assembly of short reads is an aggregative process where critical information is lost as reads are merged into contigs. In addition, the assembly process is treated as a black box, with generic assembler tools that do not adapt to input data set characteristics. Finally, as NGS data throughput continues to increase, there is an increasing need for smart parallel assembler implementations. In this dissertation, a new assembly approach called Focus is proposed. Unlike previous assemblers, Focus relies on a novel hybrid graph constructed from multiple graphs at different levels of granularity to represent the assembly problem, facilitating information capture and dynamic adjustment to input data set characteristics. This work is composed of four specific aims: 1) The implementation of a robust assembly and analysis tool built on the hybrid graph platform 2) The development and application of graph mining to extract biologically relevant features in NGS data sets 3) The integration of domain specific knowledge to improve the assembly and analysis process. 
4) The construction of smart parallel computing approaches, including the application of energy-aware computing for NGS assembly and knowledge integration to improve algorithm performance. In conclusion, this dissertation presents a complete parallel assembler called Focus that is capable of extracting biologically relevant features directly from its hybrid assembly graph
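
The way overlap relationships let reads be ordered and merged into contigs can be illustrated with a toy greedy merger. This is only a sketch of the general overlap idea under simplifying assumptions (error-free reads, a single strand, no reads contained inside other reads). It is not the hybrid graph approach used by Focus, and the function names and example reads are hypothetical.

```python
def overlap(a, b, min_len=3):
    """Length of the longest suffix of a that equals a prefix of b (at least min_len), else 0."""
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_assemble(reads, min_len=3):
    """Repeatedly merge the pair of sequences with the longest suffix-prefix overlap."""
    contigs = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    while True:
        best_k, best_i, best_j = 0, None, None
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best_k:
                        best_k, best_i, best_j = k, i, j
        if best_k == 0:
            return contigs  # no remaining overlaps: these are the final contigs
        merged = contigs[best_i] + contigs[best_j][best_k:]
        contigs = [c for idx, c in enumerate(contigs) if idx not in (best_i, best_j)]
        contigs.append(merged)


# Hypothetical error-free reads that tile a 12-base sequence.
reads = ["ATGGCG", "GCGTAC", "TACGGA"]
print(greedy_assemble(reads))
```

Note how the merged contig no longer records which reads supported which positions. This is the loss of information during merging that the abstract describes.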

    Persistence of birth mode-dependent effects on gut microbiome composition, immune system stimulation and antimicrobial resistance during the first year of life

    Caesarean section delivery (CSD) disrupts mother-to-neonate transmission of specific microbial strains and functional repertoires as well as linked immune system priming. Here we investigate whether differences in microbiome composition and impacts on host physiology persist at 1 year of age. We perform high-resolution, quantitative metagenomic analyses of the gut microbiomes of infants born by vaginal delivery (VD) or by CSD, from immediately after birth through to 1 year of life. Several microbial populations show distinct enrichments in CSD-born infants at 1 year of age including strains of Bacteroides caccae, Bifidobacterium bifidum and Ruminococcus gnavus, whereas others are present at higher levels in the VD group including Faecalibacterium prausnitzii, Bifidobacterium breve and Bifidobacterium kashiwanohense. The stimulation of healthy donor-derived primary human immune cells with LPS isolated from neonatal stool samples results in higher levels of tumour necrosis factor alpha (TNF-α) in the case of CSD extracts over time, compared to extracts from VD infants for which no such changes were observed during the first year of life. Functional analyses of the VD metagenomes at 1 year of age demonstrate a significant increase in the biosynthesis of the natural antibiotics carbapenem and phenazine. Concurrently, we find antimicrobial resistance (AMR) genes against several classes of antibiotics in both VD and CSD. The abundance of AMR genes against synthetic (including semi-synthetic) agents such as phenicol, pleuromutilin and diaminopyrimidine is increased in CSD children at day 5 after birth. In addition, we find that mobile genetic elements, including phages, encode AMR genes such as glycopeptide, diaminopyrimidine and multidrug resistance genes. Our results demonstrate persistent effects at 1 year of life resulting from birth mode-dependent differences in earliest gut microbiome colonisation.

    Metagenomics analysis of disease-related human gut microbiota

    The human gut microbiota have been linked with various pathological disorders. Yet our understanding of the underlying mechanisms is still limited by the inconsistent results of different publications and by the inherent complexity of the system. Separate studies and incomparable data sets have missed the forest for the trees, which encouraged us to carry out a meta-analysis of the human gut microbiome across different kinds of diseases and to ask what kinds of human gut microbial community are healthy. 1. This dissertation uncovers the consistent principle behind disease-related dysbiosis by conducting a pan-microbiome analysis, which annotated and analyzed the microbiome contigs and genes identified from raw reads of whole genome sequencing (WGS) data of the human gut. A consistent pattern shift was discovered in the mutually dependent microbial community, which revealed that the microbial members in disease are more competitive and less cooperative than in health, remarkably driven by a 20-fold increase in competitive pairs between potential pathogens and a 10-fold decrease in cooperative pairs between non-pathogens. Additionally, taking all the microbiota in the same community as a ‘super organism’, our mathematical model of the gene-gene interaction network revealed the significance of cell motility, though it was not a dominant functional category. This part of the work answered the question of how the ecological niches of the gut modulate human health in a systematic manner. 2. This dissertation discovered that some inflammation- and cancer-related genera increase in individuals of advanced age while some beneficial genera are lost, and demonstrated the existence of an aging progression of the human gut microbiota, by applying an unsupervised machine learning algorithm to recapitulate the underlying aging progression of the microbial community from hosts in different age groups. 
The aging process captures many facets of biological variation of the human body, which leads to functional decline and increased incidence of infection in the gut of elderly people. Unlike disease, the aging transformation is a continuous process. We obtained raw 16S rRNA sequencing data of subjects ranging from newborns to centenarians from a previous study, and summarized the data into a relative abundance matrix of genera across all samples. Without using the age information of the samples, we applied multivariate unsupervised analysis, which revealed the existence of a continuous aging progression of human gut microbiota along with the host aging process. The identified genera associated with this aging process are meaningful for designing probiotics to maintain the gut microbiota in a state resembling that of a young age, which hopefully will have a positive impact on human health, especially for individuals in advanced age groups. 3. This dissertation develops a machine learning model, LightCUD, for disease discrimination based on the human gut microbiome, which was designed for discriminating UC and CD from non-IBD colitis. Using a set of WGS data from 349 human gut microbiota samples with two types of IBD and healthy controls, we assembled and aligned WGS short reads to obtain feature profiles. Owing to well-designed feature selection and a comparison of machine learning algorithms, LightCUD outperforms other pilot studies. LightCUD was implemented in Python and packaged free for installation with customized databases. With WGS data or 16S rDNA sequencing data of gut microbiota samples as the input, LightCUD can discriminate IBD from healthy controls with high accuracy and further identify the specific type of IBD. The executable program LightCUD is released as open source at the webpage http://cqb.pku.edu.cn/zhulab/lightcud/. 4. 
This dissertation constructed a comprehensive database, named DREEM, of DiseaseRElatEd Marker genes in the human gut microbiome, built from large-scale WGS data released in GenBank and EMBL. Short reads totalling 18.63T from 1,729 samples were processed with a unified procedure involving state-of-the-art bioinformatics tools and well-designed statistical analysis, covering six types of pathological conditions, i.e., T2D, Crohn’s disease, ulcerative colitis, liver cirrhosis, symptomatic atherosclerosis and obesity. Furthermore, the database annotates the disease-related marker genes functionally and taxonomically. DREEM contains 1,953,046 disease-related marker genes and 5,100 core genes. The database is accessible at http://cqb.pku.edu.cn/ZhuLab/DREEM. In summary, this dissertation conducted a pan-microbiome analysis integrating multiple diseases, revealed the aging progression of human gut microbiota, released the tool LightCUD for discriminating diseases based on the human gut microbiome, and constructed a disease-related marker gene database for the human gut microbiota.
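
The relative abundance matrix of genera mentioned in part 2 of this abstract can be sketched as a per-sample normalization of read counts. This is a minimal illustration under the assumption that raw per-genus counts are already available. The sample names and counts are hypothetical, and the dissertation's actual processing pipeline is not described in the abstract.

```python
def relative_abundance(counts):
    """Convert per-sample genus read counts into fractions that sum to 1 per sample.

    counts: dict mapping sample -> dict mapping genus -> read count.
    """
    result = {}
    for sample, genus_counts in counts.items():
        total = sum(genus_counts.values())
        # Guard against empty samples to avoid division by zero.
        result[sample] = {
            genus: (n / total if total else 0.0) for genus, n in genus_counts.items()
        }
    return result


# Hypothetical counts for two samples.
counts = {
    "infant_01": {"Bifidobacterium": 30, "Bacteroides": 10},
    "elder_01": {"Bifidobacterium": 5, "Bacteroides": 15},
}
matrix = relative_abundance(counts)
print(matrix["infant_01"])
```

Normalizing per sample makes samples with different sequencing depths comparable before any unsupervised analysis is applied.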

    Artificial intelligence and inflammatory bowel disease: practicalities and future prospects

    Artificial intelligence (AI) is an emerging technology predicted to have significant applications in healthcare. This review highlights AI applications that impact the patient journey in inflammatory bowel disease (IBD), from genomics to endoscopic applications in disease classification, stratification and self-monitoring to risk stratification for personalised management. We discuss the practical AI applications currently in use while giving a balanced view of concerns and pitfalls and look to the future with the potential of where AI can provide significant value to the care of the patient with IBD

    Exploring a multiple causative bacterial aetiology for paediatric Inflammatory Bowel Disease: Finding a needle in a stack of needles and the point of sifting through the faecies

    The inflammatory bowel diseases (IBD), which include Crohn’s disease (CD) and ulcerative colitis (UC), are characterised by chronic relapsing inflammation of the gastrointestinal tract. The accepted disease aetiology is that the homeostatic relationship between intestinal bacteria and intestinal immunity breaks down, resulting in chronic relapsing inflammation that can irreversibly destroy the intestinal mucosa. Contributing factors for disease are thought to be host genetic factors, environmental factors and the host gut microbiome. However, disease heterogeneity and the complex characteristics of disease have made it difficult to precisely define disease causation. The aim of this thesis was to investigate characteristics of the gut microbiome at diagnosis of paediatric IBD in an attempt to describe features that may explain disease causation. Treatment-naïve children with gastrointestinal symptoms undergoing investigation by colonoscopy were recruited. Prior to colonoscopy, faecal samples were collected. During colonoscopy, mucosal washings and biopsies were collected at multiple sites along the large intestine. Participants were characterised as having IBD or a Functional Gastrointestinal Disorder (FGID) using standard guidelines. Faecal samples were also collected from healthy children (HC) with no gastrointestinal symptoms. Microbial composition was investigated by 16S ribosomal RNA (16S rRNA) analysis and whole genome sequencing (WGS). Initial findings were that multiple sampling (mucosal washings, biopsies and faeces) was more informative than a single sample, and this approach was used in subsequent investigations. Initial comparisons of CD and UC showed greater variability in the gut community structure in CD. Both UC and CD differed from FGID and HC, although UC microbial profiles more closely resembled FGID. 
Bacteria that accounted for most of the difference between inflamed and non-inflamed sites were Bacteroides, Akkermansia, Faecalibacterium, Escherichia, Odoribacter and Parabacteroides. Overall, the microbial gene functions and pathways between disease and non-disease groups, and between sample types, were similar. Combined analysis of 16S rRNA and WGS indicated that some overall changes may be associated with inflammation; however, individual patients appear to have unique microbial characteristics associated with disease. Therefore, patients with similar disease phenotypes may have different microbial drivers of disease. The outcomes of this thesis suggest that a personalised approach to investigating and treating disease may be warranted.

    Towards more complete metagenomic analyses through circularized genomes and conjugative elements

    Advancements in sequencing technologies have revolutionized biological sciences and led to the emergence of a number of fields of research. One such field of research is metagenomics, which is the study of the genomic content of complex communities of bacteria. The goal of this thesis was to contribute computational methodology that can maximize the data generated in these studies and to apply these protocols to human and environmental metagenomic samples. Standard metagenomic analyses include a step for binning of assembled contigs, which has previously been shown to exclude mobile genetic elements, and I demonstrated that this phenomenon extends to all conjugative elements, which are a subset of mobile genetic elements. I proposed two separate methodologies that could detect contigs that are potential conjugative elements: a curated set of profile hidden Markov models that are very efficient to run, or annotation using the full UniRef90 database, a slower but more sensitive method. I then applied this framework to a large population-based cohort and to a study examining the association of the maternal human gut microbiota and the development of spina bifida. Broadly, the composition and abundances of conjugative elements were discriminatory between the age and geographic cohorts. In the spina bifida cohort, there was an enrichment of Campylobacter hominis and a conjugative element belonging to Campylobacter hominis, which was excluded from the metagenomic bins. Next, I characterized a novel species belonging to the recently discovered manganese-oxidizing genus Manganitrophus growing on oil refinery carbon filters. I successfully circularized the genomes of three strains and obtained quality assemblies for the remaining two samples. Furthermore, I identified a previously uncharacterized conjugative plasmid belonging to the species using my framework developed in chapter 2. 
Finally, I developed an assembly pipeline to perform a secondary assembly on binned assemblies using long reads. The secondary assemblies yielded a number of additional circularized sequences that would be useful as scaffolds in future metatranscriptomic, variation analysis, and community dynamic studies. The methodologies and applications in this thesis provide a framework for more complete metagenomic analyses going forward that will aid in our understanding of microbial ecology

    Validation and development of sequence-based tools to analyse the human gut virome

    The gut microbiome is a complex community of microorganisms that interacts closely with the human host and is believed to play an important role in the maintenance of human health. The viral component of this community is referred to as the human gut virome and is dominated by bacteriophage. Bacteriophage are central to microbial ecosystems by facilitating nutrient turnover, horizontal gene transfer and driving bacterial diversity. In this way the gut virome is believed to closely interact with the human host by shaping the composition and function of the gut microbiome. However, the gut virome also represents one of the biggest gaps in our understanding of the microbiome as it is dominated by unknown bacteriophage targeting unknown bacterial hosts and with uncharacterised downstream functions. These challenges mean that virome research relies heavily on sequence-based approaches and metagenomics to identify compositional patterns and targets for future characterisation. A typical virome study involves physical and chemical separation of individual virions from the cellular components of the microbiome and the contents of the faecal, luminal or mucosal sample from which it came. A viral metagenome is then generated by extracting virome DNA and/or RNA for sequencing on a given platform. These sequencing reads are then quality filtered and assembled to reconstruct the viral genomes in the original sample. The abundance of these assemblies is then estimated by aligning the sequencing reads and performing statistical analysis. However, each step in a virome analysis pipeline has the potential to distort the final viral community and given the unknown nature of the virome, this distortion is difficult to identify and characterise. As a result, conclusions are often drawn from virome studies without fully appreciating the impact of the analysis methods on the findings. 
This thesis examines the major steps in sequence-based virome analysis pipelines, highlighting how choices made at each step of an analysis protocol can impact the final conclusions drawn from a study. In doing so, we have changed our perspective of the human gut virome and challenged previous assumptions. Chapter One discusses the current understanding of the virome field, giving particular attention to how the analysis methods and challenges affect our view of the virome. In Chapter Two, we focus on the assembly step of virome analysis pipelines. This step is of particular importance to virome studies, as an assembler’s ability to recover viral sequences can ultimately determine the amount of sequence information used in that study. We compared all short-read assembly programs used in virome studies to date, across mock communities, simulated and real datasets. We found that not all assemblers are equal, and choice of assembler can drastically affect the conclusions that can be drawn from a virome study. These findings call the comparability of different virome studies into question and suggest that previous virome studies would benefit from reanalysis using improved assembly methods and re-examination of the conclusions drawn. As discussed, the human gut virome is dominated by “viral dark matter”: those sequences which do not share homology with sequences in reference databases. However, the majority of what is currently known about the virome in human health and disease is based on the minor fraction of viral sequences collated in these databases. This presents a serious gap in our understanding and was the primary focus of Chapter Three. We reanalysed a keystone inflammatory bowel disease (IBD) dataset, which had formed the foundation of much of what we knew about the virome in IBD. We developed a new approach to analysing the virome beyond the identifiable minority and by doing so, changed our understanding of the virome in IBD significantly. 
In the final chapter, we directed our attention to possibly the most important aspect of a sequence-based study, the sequencing approach itself. This step bridges the gap between the biological information in a virome and the digital information that is analysed. As with all steps in a virome analysis pipeline, this has serious implications for the final conclusions of the study. We described the use of long-read sequencing in the human gut virome and the benefits and challenges which are associated with this technology. We also found that the ability of amplified short-read sequencing libraries to represent the gut virome was limited, but that alternative library preparation methods and long-read sequencing platforms may be able to address these limitations. These findings imply that much of what we know about the human gut virome may be linked to sequencing performance, rather than the biology of the community itself. These three major aspects of virome analysis pipelines highlight the importance of considering the impact of the analysis approach when interpreting the results of virome data and complex biological systems in general.

    Gut Microbiota and Metabolic Disorders

    Obesity and its co-morbidities, such as metabolic syndrome (MetS), non-alcoholic fatty liver disease (NAFLD) and type 2 diabetes, have increased at epidemic rates over the last few decades. So far the mechanisms of many metabolic diseases are not known in detail, and there are currently not enough effective means to prevent and treat them. Several recent studies have shown that unbalanced gut microbiota composition (GMC) and activity influence fat accumulation in the body. Further, it seems that the GMC of obese individuals differs from that of lean individuals. The aim of this study was to investigate whether there are differences between the GMC of metabolically impaired overweight/obese (MetS group), metabolically healthy overweight/obese and normal-weight individuals. In addition, the mechanisms by which gut bacteria and their specific structures, such as flagellin (FLG), which stimulates Toll-like receptor 5 (TLR5), affect metabolism were investigated both in vivo and in vitro in human adipocytes and hepatocytes. The results of this study show that the abundance of certain gram-positive bacteria belonging to the Clostridial cluster XIV was higher in the MetS group subjects compared to their metabolically healthy overweight/obese and lean counterparts. Metabolically impaired subjects also tended to have a greater abundance of potentially inflammatory Enterobacteria in their gut and thus seemed to have aberrant GMC. In addition, it was found that subjects with a high hepatic fat content (HHFC group) had less Faecalibacterium prausnitzii in their gut than individuals with low hepatic fat content. Further gene expression analysis revealed that the HHFC group also had increased inflammation cascades in their adipose tissue. 
Additionally, metabolically impaired individuals displayed an increased expression of FLG-recognizing TLR5 in adipose tissue, and the TLR5 expression levels were positively associated with both liver fat content and insulin resistance in humans. These changes in the adipose tissue may further contribute to the impaired metabolism observed, such as insulin resistance and dyslipidemia. In vitro studies showed that the FLG-induced TLR5 activation in adipocytes enhanced the hepatic fat accumulation by decreasing insulin signaling and mitochondrial functions and increasing triglyceride synthesis due to increased glycerol secretion from adipocytes. In conclusion, the findings of this study suggest that novel prevention and personalized treatment strategies based on GM modulation may be successfully developed for obesity and metabolic disorders in the future.

    Enteric disorders at weaning: age, amoxicillin administration and Enterotoxigenic Escherichia coli infection affecting the gut microbiota of piglets.

    To investigate aspects related to weaning diarrhoea, two studies have been performed. The aim of the first study was to evaluate the impact of weaning age on gut microbiota in piglets at different weaning ages. 48 piglets were divided into four groups weaned at 14, 21, 28 and 42 days old (late weaning). In each group, faecal bacteria composition was assessed by sequencing the 16S rRNA gene on the weaning day, 7 days post-weaning and at 60 days of age. Our results showed that late weaning increases the gut microbiota diversity including a higher abundance of Faecalibacterium prausnitzii. The pre-weaning gut microbiota composition conferred by a late weaning at 42 days of age could enhance gut health in piglets. The aim of the second study was to evaluate the effects of the host-genotype and different routes of amoxicillin administration on the presence of diarrhoea and the microbiota composition, during a natural infection by multi-resistant ETEC strains in weaned piglets. 71 piglets were divided into three groups: two groups differing by amoxicillin administration routes – parenteral or oral, and a control group without antibiotics. Our results confirmed the MUC4 and FUT1 as host genetic markers for the susceptibility to ETEC infections. Moreover, amoxicillin treatment may produce adverse outcomes on pig health in course of multi-resistant ETEC infection and this effect is stronger when the antibiotic is orally administered than parenterally. Both studies highlighted the importance of alternative control measures related to farm management in controlling weaning diarrhoea. With a need to limit the use of antibiotics, selection of resistant genotypes, next-generation probiotics supplementation in feed, and correct procedures of weaning age, should be considered in farm management practices in order to preserve a balanced and stable gut microbiota and consequently reduce occurrence of diarrhoea at weaning