Identifying Gut Microbiota Associated With Colorectal Cancer Using a Zero-Inflated Lognormal Model
Colorectal cancer (CRC) is the third most common cancer worldwide. Its incidence is still increasing, and the mortality rate is high. New therapeutic and prognostic strategies are urgently needed. It has become increasingly recognized that the gut microbiota composition differs significantly between healthy people and CRC patients. Thus, identifying the differences between the gut microbiota of healthy people and CRC patients is fundamental to understanding these microbes' functional roles in the development of CRC. We studied the microbial community structure of a CRC metagenomic dataset of 156 patients and healthy controls, and analyzed the diversity, differentially abundant bacteria, and co-occurrence networks. We applied a modified zero-inflated lognormal (ZIL) model to estimate relative abundance. We found that the abundance of the genera Anaerostipes, Bilophila, Catenibacterium, Coprococcus, Desulfovibrio, Flavonifractor, Porphyromonas, Pseudoflavonifractor, and Weissella differed significantly between the healthy and CRC groups. We also found that bacteria such as Streptococcus, Parvimonas, Collinsella, and Citrobacter co-occurred uniquely within the CRC patients. In addition, we found that the microbial diversity of healthy controls was significantly higher than that of the CRC patients, which indicated a significant negative correlation between gut microbiota diversity and the stage of CRC. Collectively, our results strengthen the view that individual microbes as well as the overall structure of the gut microbiota co-evolve with CRC.
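As a rough illustration of the zero-inflated lognormal idea named in this abstract, the sketch below fits a plain, unmodified ZIL to one taxon's abundances: the zero fraction is estimated directly and a lognormal is fitted to the non-zero values. The abstract does not describe the authors' modification, so the helper name `fit_zil` and the estimators used here are assumptions for illustration, not the paper's method.

```python
import math
from statistics import fmean, pvariance

def fit_zil(abundances):
    """Fit a plain zero-inflated lognormal to one taxon's abundances.

    Hypothetical helper (not the paper's modified model).
    Returns (p_zero, mu, sigma2, mean), where mu and sigma2 are the mean
    and variance of log(x) over the non-zero values.
    """
    nonzero = [x for x in abundances if x > 0]
    p_zero = 1 - len(nonzero) / len(abundances)
    if not nonzero:
        # Taxon never observed: the lognormal part is undefined.
        return 1.0, float("nan"), float("nan"), 0.0
    logs = [math.log(x) for x in nonzero]
    mu = fmean(logs)
    sigma2 = pvariance(logs, mu)  # maximum-likelihood variance of log(x)
    # Mean of the mixture: P(non-zero) times the lognormal mean.
    mean = (1 - p_zero) * math.exp(mu + sigma2 / 2)
    return p_zero, mu, sigma2, mean
```

Calling `fit_zil` once per genus and per group gives group-wise abundance estimates that could then be compared; how that comparison is tested for significance is outside this sketch.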
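Comparison of statistical methodologies for metagenome data analysis
Thesis (Master's) -- Seoul National University Graduate School: College of Natural Sciences, Interdisciplinary Program in Bioinformatics, 2018. 2. 박태성.
A comparison study of statistical methods for the analysis of metagenome data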
Chanyoung Lee
Interdisciplinary Program in Bioinformatics
The Graduate School
Seoul National University
With the advent of next-generation sequencing (NGS) technology, sequencing microorganisms from varied samples facilitates association analysis between features and environment. Several statistical methods have been proposed for analyzing metagenome data, such as Metastats, metagenomeSeq, ZIBSeq, ANCOM, edgeR, and DESeq2. Each method makes its own distributional and model assumptions. While there have been some comparative studies of these methods, the comparisons are rather limited and the results have varied depending on how the simulation datasets were generated. In this study, we systematically investigate the properties of these statistical methods for finding differentially abundant features (DAF). In addition, a centered log-ratio transformation combined with a permutation logistic regression model (CLR Perm) was applied to metagenome data. We compare the performance of the methods using simulation data generated from the Human Microbiome Project (HMP). We first assessed the type I error rate of each method over different levels of sparsity. CLR Perm, metagenomeSeq, and ANCOM yielded well-preserved type I error rates regardless of sparsity. In the power comparison study, CLR Perm showed the highest power among the methods preserving type I error. Furthermore, we applied the methods to real data on colorectal cancer (CRC) to compare our results with existing taxonomic markers of CRC. In conclusion, we recommend using a combination of CLR Perm and metagenomeSeq for the analysis of metagenome data, because the lists of significant taxa discovered by CLR Perm and metagenomeSeq differ.
1 Introduction
2 Material and Methods
2.1 Simulation materials (HMP)
2.2 Colorectal cancer data
2.3 Existing methods
2.4 Permutation logistic regression with centered log-ratio transformation (CLR Perm)
3 Simulation
3.1 Simulation model
3.2 Power and type I error rate
4 Results
4.1 Simulation results
4.2 Colorectal cancer data results
5 Discussion
Bibliography
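The centered log-ratio step and the permutation idea behind CLR Perm can be sketched as follows. This is a simplified stand-in under stated assumptions: the thesis uses a permutation logistic regression, whereas this sketch permutes group labels and uses the absolute difference in group means as the test statistic. The function names, the pseudocount of 0.5, and the default number of permutations are all illustrative choices, not taken from the thesis.

```python
import math
import random

def clr(counts, pseudocount=0.5):
    """Centered log-ratio transform of one sample's taxon counts.

    The pseudocount (an assumed value) avoids log(0) for absent taxa.
    """
    logs = [math.log(c + pseudocount) for c in counts]
    center = sum(logs) / len(logs)  # log of the geometric mean
    return [v - center for v in logs]

def permutation_pvalue(values, labels, n_perm=2000, seed=0):
    """Two-sided permutation p-value for a difference in group means.

    values: CLR values of one taxon across samples.
    labels: 0/1 group labels (both groups must be present).
    Stand-in statistic; the thesis fits a logistic regression instead.
    """
    def mean_diff(lab):
        g1 = [v for v, l in zip(values, lab) if l == 1]
        g0 = [v for v, l in zip(values, lab) if l == 0]
        return abs(sum(g1) / len(g1) - sum(g0) / len(g0))

    rng = random.Random(seed)
    observed = mean_diff(labels)
    shuffled = list(labels)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # keeps group sizes, breaks any association
        if mean_diff(shuffled) >= observed - 1e-12:
            hits += 1
    # Add-one correction keeps the estimate away from an exact zero.
    return (hits + 1) / (n_perm + 1)
```

In use, `clr` would be applied to each sample's count vector, and `permutation_pvalue` to one taxon's CLR values across all samples, followed by a multiple-testing correction across taxa.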
The community ecology perspective of omics data
The measurement of uncharacterized pools of biological molecules through techniques such as metabarcoding, metagenomics, metatranscriptomics, metabolomics, and metaproteomics produces large, multivariate datasets. Analytical approaches for these datasets have been successfully borrowed from community ecology to characterize the molecular diversity of samples (α-diversity) and to assess how these profiles change in response to experimental treatments or across gradients (β-diversity). However, sample preparation and data collection methods generate biases and noise which confound molecular diversity estimates and require special attention. Here, we examine how technical biases and noise that are introduced into multivariate molecular data affect the estimation of the components of diversity (i.e., total number of different molecular species, or entities; total number of molecules; and the abundance distribution of molecular entities). We then explore under which conditions these biases affect the measurement of α- and β-diversity and highlight how novel methods commonly used in community ecology can be adopted to improve the interpretation and integration of multivariate molecular data.
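To make the diversity components in this abstract concrete, here is a minimal sketch of three standard community-ecology measures: richness (number of distinct entities), the Shannon index (which also reflects the abundance distribution), and Bray-Curtis dissimilarity as one common β-diversity measure. The abstract does not say which indices the authors use, so this choice of indices and the function names are assumptions.

```python
import math

def richness(counts):
    """Number of entities observed at least once in a sample."""
    return sum(1 for c in counts if c > 0)

def shannon(counts):
    """Shannon index H = -sum(p_i * ln p_i) over observed entities."""
    total = sum(counts)
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)

def bray_curtis(a, b):
    """Bray-Curtis dissimilarity between two samples on the same entities.

    0 means identical profiles, 1 means no shared entities.
    """
    num = sum(abs(x - y) for x, y in zip(a, b))
    den = sum(a) + sum(b)
    return num / den
```

Because all three depend on raw counts, they are sensitive to the technical effects the abstract discusses: the total number of molecules measured (`sum(counts)`) changes richness and Bray-Curtis values even when the underlying community is the same.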
The Computational Diet: A Review of Computational Methods Across Diet, Microbiome, and Health.
Food and human health are inextricably linked. As such, revolutionary impacts on health have been derived from advances in the production and distribution of food relating to food safety and fortification with micronutrients. During the past two decades, it has become apparent that the human microbiome has the potential to modulate health, including in ways that may be related to diet and the composition of specific foods. Despite the excitement and potential surrounding this area, fully understanding the complexity of the gut microbiome, the chemical composition of food, and their interplay in situ remains a daunting task. However, recent advances in high-throughput sequencing, metabolomics profiling, compositional analysis of food, and the emergence of electronic health records provide new sources of data that can contribute to addressing this challenge. Computational science will play an essential role in this effort as it will provide the foundation to integrate these data layers and derive insights capable of revealing and understanding the complex interactions between diet, gut microbiome, and health. Here, we review the current knowledge on diet-health-gut microbiota, relevant data sources, bioinformatics tools, machine learning capabilities, as well as the intellectual property and legislative regulatory landscape. We provide guidance on employing machine learning and data analytics, identify gaps in current methods, and describe new scenarios to be unlocked in the next few years in the context of current knowledge.
The supragingival biofilm in early childhood caries: Clinical and laboratory protocols and bioinformatics pipelines supporting metagenomics, metatranscriptomics, and metabolomics studies of the oral microbiome
Early childhood caries (ECC) is a biofilm-mediated disease. Social, environmental, and behavioral determinants as well as innate susceptibility are major influences on its incidence; however, from a pathogenetic standpoint, the disease is defined and driven by oral dysbiosis. In other words, the disease occurs when the natural equilibrium between the host and its oral microbiome shifts toward states that promote demineralization at the biofilm-tooth surface interface. Thus, a comprehensive understanding of dental caries as a disease requires the characterization of both the composition and the function or metabolic activity of the supragingival biofilm according to well-defined clinical statuses. However, taxonomic and functional information on the supragingival biofilm is rarely available in clinical cohorts, and its collection presents unique challenges among very young children. This paper presents a protocol and pipelines for conducting supragingival biofilm microbiome studies among children in the primary dentition, designed in the context of a large-scale population-based genetic epidemiologic study of ECC. The protocol is being developed for the collection of two supragingival biofilm samples from the maxillary primary dentition, enabling downstream taxonomic (e.g., metagenomics) and functional (e.g., transcriptomics and metabolomics) analyses. The protocol is being implemented in the assembly of a pediatric precision medicine cohort comprising over 6000 participants to date, contributing social, environmental, behavioral, clinical, and biological data informing ECC and other oral health outcomes.
Monitoring the spread of antibiotic resistance in wastewater
BACKGROUND: Antibiotic-resistant bacterial infections are causing a growing amount of morbidity and mortality. Effective control and prevention rely on good data on the current burden of antibiotic resistance (ABR). Traditional ABR surveillance from phenotypic, passive, hospital-based testing may not adequately represent the resistome of the general population. Wastewater metagenomics has been proposed as a new type of surveillance to overcome this limitation. It generates rich, quantitative information on the bacterial species and resistance genes of a whole community. Large wastewater metagenomic datasets are now available to monitor and explore drivers of ABR in the community. However, questions remain about how to collect, analyse, and interpret these novel datasets. In this thesis, I aimed to 1) address key unknowns in wastewater data, including sources of resistance, environmental resistance dynamics, and what statistical models describe the distribution of the data well, and 2) investigate global and local patterns in wastewater resistance and identify potential community and hospital drivers.
METHODS: I used a systematic review to find evidence in the literature for dissemination of ABR from hospitals to wastewater. I next developed a compartmental transmission model to investigate environmental resistance dynamics and their impact on human ABR levels. I implemented a multi-response statistical model to correlate hospital-based surveillance (EARS-Net) data with resistance gene abundance in sewage samples from around the world analysed with metagenomics by the Global Sewage Surveillance Project. Finally, I used a paired sampling design and multiple statistical methods to compare the resistome of sewage from hospitals, communities, and wastewater treatment plants (WWTPs) in Scotland. I also investigated the links between ABR in humans and antibiotic consumption in the modelling and data analysis chapters.
RESULTS: I found increasing evidence in primary research that resistant bacteria and resistance genes can be disseminated from hospital patients to wastewater and into natural water sources. Modelling the dynamics of ABR in an environmental reservoir indicated that the environment can theoretically influence human ABR levels as much as or more than an animal reservoir, and mitigate intervention impacts. Combining EARS-Net and sewage metagenomic data indicated that some types of ABR are positively correlated in sewage and hospitals (such as aminoglycosides), but many are not (such as vancomycin and aminopenicillins). The paired sampling study demonstrated that hospital and community sewage resistomes are distinct, and WWTPs mostly reflect community sewage resistomes. I found mixed evidence for an impact of antimicrobial consumption on human ABR levels. Overall, the impact of antibiotic consumption at the population level appears to be small in these datasets.
CONCLUSIONS: Wastewater metagenomics is a valuable way of monitoring ABR in the community. It can indicate the composition of the reservoir of ABR in the general population and what drives it. However, hospital rather than mixed municipal effluent may need to be collected to monitor clinical resistance patterns. To make the most of this new source of data, more flexible modelling frameworks are needed that account for factors specific to wastewater metagenomics, such as high dimensionality and overdispersion. Comparing resistance patterns in hospitals to community sewage implied that patients and/or the hospital environment may present unique and strong selection pressures for resistance. Finally, I also show that differential antibiotic consumption alone cannot explain the observed patterns in resistance abundance on the national or international level.
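The overdispersion mentioned in the conclusions can be checked with a simple variance-to-mean ratio on the counts of one resistance gene across samples. This is a generic diagnostic, not a method taken from the thesis, and the function name `dispersion_index` is an illustrative assumption. For Poisson-distributed counts the ratio is close to 1, and values well above 1 suggest that a model allowing extra variance is needed.

```python
from statistics import fmean, variance

def dispersion_index(counts):
    """Sample variance divided by the mean of a list of counts.

    Roughly 1 for Poisson-like data; much larger values indicate
    overdispersion. Needs at least two observations and a non-zero mean.
    """
    m = fmean(counts)
    if m == 0:
        raise ValueError("mean is zero; dispersion index is undefined")
    return variance(counts) / m
```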