
    Identifying Gut Microbiota Associated With Colorectal Cancer Using a Zero-Inflated Lognormal Model

    Colorectal cancer (CRC) is the third most common cancer worldwide. Its incidence is still increasing, and the mortality rate is high. New therapeutic and prognostic strategies are urgently needed. It has become increasingly recognized that the gut microbiota composition differs significantly between healthy people and CRC patients. Thus, identifying the differences between the gut microbiota of healthy people and CRC patients is fundamental to understanding these microbes' functional roles in the development of CRC. We studied the microbial community structure of a CRC metagenomic dataset of 156 patients and healthy controls, and analyzed the diversity, differentially abundant bacteria, and co-occurrence networks. We applied a modified zero-inflated lognormal (ZIL) model for estimating the relative abundance. We found that the abundance of the genera Anaerostipes, Bilophila, Catenibacterium, Coprococcus, Desulfovibrio, Flavonifractor, Porphyromonas, Pseudoflavonifractor, and Weissella was significantly different between the healthy and CRC groups. We also found that bacteria such as Streptococcus, Parvimonas, Collinsella, and Citrobacter were uniquely co-occurring within the CRC patients. In addition, we found that the microbial diversity of healthy controls is significantly higher than that of the CRC patients, which indicated a significant negative correlation between gut microbiota diversity and the stage of CRC. Collectively, our results strengthened the view that individual microbes as well as the overall structure of gut microbiota were co-evolving with CRC.
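To make the zero-inflated lognormal idea concrete, here is a minimal sketch of a plain ZIL fit for a single taxon. It is not the modified model used in the paper, whose details the abstract does not give. It assumes each abundance is zero with probability `p_zero` and otherwise lognormal with parameters `mu` and `sigma`. The function name `fit_zil` and the returned keys are hypothetical.

```python
import math

def fit_zil(values):
    """Fit a plain zero-inflated lognormal to one taxon's abundances.

    Assumed model (an illustration, not the paper's modified ZIL):
    value = 0 with probability p_zero, else lognormal(mu, sigma).
    """
    n = len(values)
    nonzero_logs = [math.log(v) for v in values if v > 0]
    if n == 0 or not nonzero_logs:
        raise ValueError("need at least one positive observation")

    # Maximum-likelihood estimates: zero fraction, then mean and
    # (population) variance of the log of the positive values.
    p_zero = 1 - len(nonzero_logs) / n
    mu = sum(nonzero_logs) / len(nonzero_logs)
    var = sum((x - mu) ** 2 for x in nonzero_logs) / len(nonzero_logs)

    # Expected abundance under the mixture: (1 - p_zero) * E[lognormal].
    mean = (1 - p_zero) * math.exp(mu + var / 2)
    return {"p_zero": p_zero, "mu": mu, "sigma": math.sqrt(var), "mean": mean}
```

Fitting the model separately in the healthy and CRC groups and comparing the two `mean` estimates is one simple way such a model could feed a differential-abundance comparison.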

    A comparison of statistical methods for the analysis of metagenome data

    Thesis (Master's), Seoul National University Graduate School, College of Natural Sciences, Interdisciplinary Program in Bioinformatics, February 2018. Taesung Park (박태성).
    A comparison study of statistical methods for the analysis of metagenome data. Chanyoung Lee, Interdisciplinary Program in Bioinformatics, The Graduate School, Seoul National University.
    With the advent of next-generation sequencing (NGS) technology, sequencing microorganisms from varied samples facilitates association analysis between features and environments. Several statistical methods have been proposed for analyzing metagenome data, such as Metastats, metagenomeSeq, ZIBSeq, ANCOM, edgeR, and DESeq2. Each method makes its own distributional and model assumptions. While there have been some comparative studies of these methods, the comparisons are rather limited and the results vary depending on how the simulation datasets are generated. In this study, we systematically investigate the properties of these statistical methods for finding differentially abundant features (DAF). In addition, a centered log-ratio transformation combined with a permutation logistic regression model (CLR Perm) was applied to metagenome data. We compare the methods' performance using simulation data generated from the Human Microbiome Project (HMP). We first assessed the type I error rate of each method over different levels of sparsity. CLR Perm, metagenomeSeq, and ANCOM yielded well-preserved type I error rates regardless of sparsity. In the power comparison study, CLR Perm showed the highest power among the methods preserving type I error. Furthermore, we applied the methods to real data on colorectal cancer (CRC) to compare our results with existing taxonomic markers of CRC.
    In conclusion, we recommend using a combination of CLR Perm and metagenomeSeq for the analysis of metagenome data, because the lists of significant taxa discovered by CLR Perm and metagenomeSeq differ.
    Contents: 1 Introduction; 2 Material and Methods (2.1 Simulation materials (HMP); 2.2 Colorectal cancer data; 2.3 Existing methods; 2.4 Permutation logistic regression with centered log-ratio transformation (CLR Perm)); 3 Simulation (3.1 Simulation model; 3.2 Power and type I error rate); 4 Results (4.1 Simulation results; 4.2 Colorectal cancer data results); 5 Discussion; Bibliography
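The two ingredients of CLR Perm can be sketched as follows: a centered log-ratio transform of each sample, then a label-permutation test on one taxon. This is a simplified stand-in, not the thesis's method. The test statistic here is the absolute difference in group means rather than a logistic regression coefficient. The pseudocount of 0.5 and the names `clr` and `permutation_pvalue` are also assumptions made for illustration.

```python
import math
import random

def clr(counts, pseudocount=0.5):
    """Centered log-ratio transform of one sample's taxon counts.

    The pseudocount (an assumed value) avoids log(0) for absent taxa.
    """
    logs = [math.log(c + pseudocount) for c in counts]
    mean_log = sum(logs) / len(logs)
    return [x - mean_log for x in logs]

def permutation_pvalue(values, labels, n_perm=2000, seed=0):
    """Two-sided permutation p-value for a group difference in one taxon.

    values: CLR-transformed abundances of one taxon, one per sample.
    labels: 0/1 group labels; both groups must be present.
    Statistic: absolute difference in group means (a swapped-in
    statistic, simpler than permutation logistic regression).
    """
    def stat(lbls):
        a = [v for v, g in zip(values, lbls) if g == 1]
        b = [v for v, g in zip(values, lbls) if g == 0]
        return abs(sum(a) / len(a) - sum(b) / len(b))

    rng = random.Random(seed)
    observed = stat(labels)
    shuffled = list(labels)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # shuffling keeps the group sizes fixed
        if stat(shuffled) >= observed - 1e-12:
            hits += 1
    # Add-one correction so the estimated p-value is never exactly zero.
    return (hits + 1) / (n_perm + 1)
```

In use, each sample's count vector would be passed through `clr`, and the values of a given taxon across samples would then be tested against the case/control labels.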

    The community ecology perspective of omics data

    The measurement of uncharacterized pools of biological molecules through techniques such as metabarcoding, metagenomics, metatranscriptomics, metabolomics, and metaproteomics produces large, multivariate datasets. Analyses of these datasets have successfully been borrowed from community ecology to characterize the molecular diversity of samples (α-diversity) and to assess how these profiles change in response to experimental treatments or across gradients (β-diversity). However, sample preparation and data collection methods generate biases and noise which confound molecular diversity estimates and require special attention. Here, we examine how technical biases and noise that are introduced into multivariate molecular data affect the estimation of the components of diversity (i.e., total number of different molecular species, or entities; total number of molecules; and the abundance distribution of molecular entities). We then explore under which conditions these biases affect the measurement of α- and β-diversity and highlight how novel methods commonly used in community ecology can be adopted to improve the interpretation and integration of multivariate molecular data.
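As a small illustration of the two diversity notions named above, the sketch below computes one common α-diversity measure (the Shannon index) and one common β-diversity measure (Bray-Curtis dissimilarity). The abstract does not say which indices the authors use, so these choices and the function names are assumptions.

```python
import math

def shannon(counts):
    """Shannon index (natural log) of one sample: within-sample diversity."""
    total = sum(counts)
    if total <= 0:
        raise ValueError("sample has no counts")
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)

def bray_curtis(a, b):
    """Bray-Curtis dissimilarity between two samples over the same entities.

    0 means identical abundance profiles; 1 means no shared entities.
    """
    if len(a) != len(b):
        raise ValueError("samples must cover the same entities")
    denom = sum(a) + sum(b)
    if denom <= 0:
        raise ValueError("both samples are empty")
    return sum(abs(x - y) for x, y in zip(a, b)) / denom
```

Both measures depend on the total number of molecules counted per sample, which is one reason the technical biases discussed in the abstract can distort them.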

    The supragingival biofilm in early childhood caries: Clinical and laboratory protocols and bioinformatics pipelines supporting metagenomics, metatranscriptomics, and metabolomics studies of the oral microbiome

    Early childhood caries (ECC) is a biofilm-mediated disease. Social, environmental, and behavioral determinants as well as innate susceptibility are major influences on its incidence; however, from a pathogenetic standpoint, the disease is defined and driven by oral dysbiosis. In other words, the disease occurs when the natural equilibrium between the host and its oral microbiome shifts toward states that promote demineralization at the biofilm-tooth surface interface. Thus, a comprehensive understanding of dental caries as a disease requires the characterization of both the composition and the function or metabolic activity of the supragingival biofilm according to well-defined clinical statuses. However, taxonomic and functional information on the supragingival biofilm is rarely available in clinical cohorts, and its collection presents unique challenges among very young children. This paper presents a protocol and pipelines for conducting supragingival biofilm microbiome studies among children in the primary dentition, designed in the context of a large-scale population-based genetic epidemiologic study of ECC. The protocol is being developed for the collection of two supragingival biofilm samples from the maxillary primary dentition, enabling downstream taxonomic (e.g., metagenomics) and functional (e.g., transcriptomics and metabolomics) analyses. The protocol is being implemented in the assembly of a pediatric precision medicine cohort comprising over 6000 participants to date, contributing social, environmental, behavioral, clinical, and biological data informing ECC and other oral health outcomes.

    Monitoring the spread of antibiotic resistance in wastewater

    BACKGROUND: Antibiotic-resistant bacterial infections are causing a growing amount of morbidity and mortality. Effective control and prevention rely on good data on the current burden of antibiotic resistance (ABR). Traditional ABR surveillance from phenotypic, passive, hospital-based testing may not adequately represent the resistome of the general population. Wastewater metagenomics has been proposed as a new type of surveillance to overcome this limitation. It generates rich, quantitative information on the bacterial species and resistance genes of a whole community. Large wastewater metagenomic datasets are now available to monitor and explore drivers of ABR in the community. However, questions remain about how to collect, analyse, and interpret these novel datasets. In this thesis, I aimed to 1) address key unknowns in wastewater data, including sources of resistance, environmental resistance dynamics, and which statistical models describe the distribution of the data well, and 2) investigate global and local patterns in wastewater resistance and identify potential community and hospital drivers. METHODS: I used a systematic review to find evidence in the literature for dissemination of ABR from hospitals to wastewater. I next developed a compartmental transmission model to investigate environmental resistance dynamics and their impact on human ABR levels. I implemented a multi-response statistical model to correlate hospital-based surveillance (EARS-Net) data with resistance gene abundance in sewage samples from around the world analysed with metagenomics by the Global Sewage Surveillance Project. Finally, I used a paired sampling design and multiple statistical methods to compare the resistome of sewage from hospitals, communities, and wastewater treatment plants (WWTPs) in Scotland. I also investigated the links between ABR in humans and antibiotic consumption in the modelling and data analysis chapters.
RESULTS: I found increasing evidence in primary research that resistant bacteria and resistance genes can be disseminated from hospital patients to wastewater and into natural water sources. Modelling the dynamics of ABR in an environmental reservoir indicated that the environment can theoretically influence human ABR levels as much as or more than an animal reservoir, and mitigate intervention impacts. Combining EARS-Net and sewage metagenomic data indicated that some types of ABR are positively correlated in sewage and hospitals (such as aminoglycosides), but many are not (such as vancomycin and aminopenicillins). The paired sampling study demonstrated that hospital and community sewage resistomes are distinct, and WWTPs mostly reflect community sewage resistomes. I found mixed evidence for an impact of antimicrobial consumption on human ABR levels. Overall, the impact of antibiotic consumption at the population level appears to be small in these datasets. CONCLUSIONS: Wastewater metagenomics is a valuable way of monitoring ABR in the community. It can indicate the composition of the reservoir of ABR in the general population and what drives it. However, hospital rather than mixed municipal effluent may need to be collected to monitor clinical resistance patterns. To make the most of this new source of data, more flexible modelling frameworks are needed that account for factors specific to wastewater metagenomics, such as high dimensionality and overdispersion. Comparing resistance patterns in hospitals to community sewage implied that patients and/or the hospital environment may present unique and strong selection pressures for resistance. Finally, we also show that differential antibiotic consumption alone cannot explain the observed patterns in resistance abundance at the national or international level.