95 research outputs found
Computational Metagenomics: Network, Classification and Assembly
Owing to the rapid advance of DNA sequencing technologies over the past 10 years, large amounts of short DNA reads can be obtained quickly and cheaply. For example, a single Illumina HiSeq machine can produce several terabytes of data within a week. Metagenomics is a new scientific field that involves the analysis of genomic DNA sequences obtained directly from the environment, enabling studies of novel microbial systems. Metagenomics was made possible by high-throughput sequencing technologies, and analyzing the resulting data requires sophisticated computational methods and data mining. In clinical settings, a fundamental goal of metagenomics is to help diagnose and cure disease. One major bottleneck so far is how to analyze the huge, noisy data sets quickly and precisely. My PhD research focuses on developing algorithms and tools to tackle these challenging and interesting computational problems.
From the functional perspective, a metagenomic sample can be represented as a weighted metabolic network, in which the nodes are molecules, the edges are enzymes encoded by genes, and the weights can be considered as the number of organisms providing the functions. One goal of functional comparison between metagenomic samples is to find differentially abundant metabolic subnetworks between two groups under comparison. We have developed a statistical network analysis tool, MetaPath, which uses a greedy search algorithm to find the maximum-weight subnetwork and a nonparametric permutation test to measure its statistical significance. Unlike previous approaches, MetaPath explicitly searches for significant subnetworks in the global network, enabling us to detect signatures at a finer level. In addition, we developed statistical methods that take into account the topology of the network when testing the significance of the subnetworks.
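The combination of a greedy subnetwork search with a permutation test can be sketched as follows. This is a minimal illustration, not MetaPath's implementation: the edge statistic (difference in group means), the stopping rule, the toy abundance data, and all function names are assumptions made for the example.

```python
import random
from statistics import mean

def edge_scores(abund, labels):
    """Score each edge by the difference in mean abundance between group 0 and group 1.
    (Assumed stand-in statistic; the abstract does not specify the one MetaPath uses.)"""
    scores = {}
    for edge, values in abund.items():
        group0 = [v for v, g in zip(values, labels) if g == 0]
        group1 = [v for v, g in zip(values, labels) if g == 1]
        scores[edge] = mean(group0) - mean(group1)
    return scores

def greedy_subnetwork(scores):
    """Grow a connected edge set from the best-scoring seed edge, repeatedly adding
    the best adjacent edge for as long as that edge has a positive score."""
    seed = max(scores, key=scores.get)
    chosen, nodes, total = {seed}, set(seed), scores[seed]
    while True:
        frontier = [e for e in scores
                    if e not in chosen and (e[0] in nodes or e[1] in nodes)]
        if not frontier:
            break
        best = max(frontier, key=scores.get)
        if scores[best] <= 0:
            break
        chosen.add(best)
        nodes.update(best)
        total += scores[best]
    return chosen, total

def permutation_p_value(abund, labels, n_perm=200, seed=0):
    """Shuffle the group labels, rerun the whole search each time, and count how
    often the permuted subnetwork weight reaches the observed one."""
    rng = random.Random(seed)
    observed = greedy_subnetwork(edge_scores(abund, labels))[1]
    shuffled = list(labels)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if greedy_subnetwork(edge_scores(abund, shuffled))[1] >= observed:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)

# Toy network: edges are (molecule, molecule) pairs, values are per-sample abundances.
abund = {
    ("glucose", "g6p"): [9, 8, 9, 2, 1, 2],
    ("g6p", "f6p"): [7, 8, 7, 1, 2, 1],
    ("f6p", "pyruvate"): [3, 3, 3, 3, 3, 3],
    ("acetate", "acetylcoa"): [1, 2, 1, 2, 1, 2],
}
labels = [0, 0, 0, 1, 1, 1]
subnetwork, weight = greedy_subnetwork(edge_scores(abund, labels))
observed, p_value = permutation_p_value(abund, labels)
```

Because the search is rerun on every permutation over the same graph, the null distribution reflects both the network topology and the selection effect of searching for the best subnetwork.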
Another computational problem involves classifying anonymous DNA sequences obtained from metagenomic samples. There are several challenges here: (1) The classification labels follow a hierarchical tree structure, in which the leaves are most specific and the internal nodes are more general. How can we classify novel sequences that do not belong to leaf categories (species) but do belong to internal groups (e.g., phylum)? (2) For each classification, how can we compute a confidence score so that users can trade off sensitivity against specificity? (3) How can we analyze billions of data items quickly? We have developed a novel hierarchical classifier (MetaPhyler) for the classification of anonymous DNA reads. Through simulation, MetaPhyler models the distribution of pairwise similarities within different hierarchical groups using nonparametric density estimation. The confidence score is computed as a likelihood ratio. For a query DNA sequence of arbitrary length, its similarity can be calculated through linear approximation. Through benchmark comparison, we have shown that MetaPhyler is significantly faster and more accurate than previous tools.
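A likelihood-ratio confidence score built on nonparametric density estimates might look like the sketch below. It is an illustration under stated assumptions rather than MetaPhyler's code: the Gaussian kernel, the bandwidth, the threshold, the toy similarity values, and the names `kde`, `likelihood_ratio` and `classify` are all hypothetical.

```python
import math

def kde(samples, bandwidth):
    """Return a Gaussian kernel density estimate built from the given samples."""
    norm = len(samples) * bandwidth * math.sqrt(2 * math.pi)
    def density(x):
        return sum(math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in samples) / norm
    return density

def likelihood_ratio(score, within, outside, bandwidth=2.0):
    """Density of the score among pairs inside the group divided by its density
    among pairs outside the group; a tiny floor avoids division by zero."""
    f_in = kde(within, bandwidth)(score)
    f_out = kde(outside, bandwidth)(score)
    return f_in / max(f_out, 1e-300)

def classify(score, levels, threshold=1.0):
    """Walk the hierarchy from the most specific level to the most general one and
    return the first level whose likelihood ratio reaches the threshold."""
    for name, within, outside in levels:
        if likelihood_ratio(score, within, outside) >= threshold:
            return name
    return "unclassified"

# Hypothetical simulated percent similarities for each taxonomic level.
levels = [
    ("genus", [95, 96, 97, 98], [70, 75, 80, 85]),
    ("phylum", [70, 75, 80, 85, 95, 96], [40, 45, 50, 55]),
]
calls = [classify(s, levels) for s in (97, 78, 20)]
```

Raising `threshold` makes the classifier more conservative (higher specificity, lower sensitivity), which is the trade-off the confidence score is meant to expose.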
DNA sequencing machines can only produce very short strings (e.g., 100bp) relative to the size of a genome (e.g., a typical bacterial genome is 5Mbp). One of the most challenging computational tasks is the assembly of millions of short reads into longer contigs, which are used as the basis of subsequent computational analyses. In this project, we have developed a comparative metagenomic assembler (MetaCompass), which uses previously sequenced genomes and produces long contigs through read mapping (alignment) and assembly. Given the availability of thousands of existing bacterial genomes, for a particular sample, MetaCompass first chooses the best subset as references based on the taxonomic composition. Then, the reads are aligned against these genomes using MUMmer-map or Bowtie2. Afterwards, we use a greedy algorithm for the minimum set-cover problem to build long contigs, and the consensus sequences are computed by majority rule. We also propose an iterative approach to improve the performance. Finally, MetaCompass has been successfully evaluated and tested on over 20 terabytes of metagenomic data sets generated by the Human Microbiome Project.
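The two generic building blocks named above, greedy set cover and majority-rule consensus, can be illustrated with a short sketch. How MetaCompass applies them internally is not described here, so the example simply assumes that aligned reads are chosen to cover reference positions; the read names, positions, and pileup columns are invented for illustration.

```python
from collections import Counter

def greedy_set_cover(universe, candidates):
    """Greedy approximation to minimum set cover: repeatedly pick the candidate
    that covers the most still-uncovered elements."""
    uncovered = set(universe)
    picked = []
    while uncovered:
        best = max(candidates, key=lambda name: len(candidates[name] & uncovered))
        gain = candidates[best] & uncovered
        if not gain:
            break  # the remaining elements cannot be covered by any candidate
        picked.append(best)
        uncovered -= gain
    return picked, uncovered

def majority_consensus(columns):
    """Call the most frequent base in each alignment column; 'N' where no read aligns."""
    return "".join(Counter(col).most_common(1)[0][0] if col else "N" for col in columns)

# Hypothetical reads, each described by the set of reference positions it covers.
reads = {
    "r1": set(range(0, 5)),
    "r2": set(range(3, 8)),
    "r3": set(range(5, 10)),
    "r4": set(range(4, 6)),
}
picked, uncovered = greedy_set_cover(range(10), reads)

# Hypothetical pileup: the bases observed at five consecutive reference positions.
consensus = majority_consensus(["AAA", "CCT", "GG", "T", ""])
```

The greedy rule does not guarantee a minimum cover, but it is the standard approximation for this NP-hard problem and runs quickly on large inputs.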
In addition, to facilitate the identification and characterization of antibiotic resistance genes, we have created the Antibiotic Resistance Genes Database (ARDB), which provides a centralized compendium of information on antibiotic resistance. Furthermore, we have applied our tools to the analysis of a novel oral microbiome data set and have discovered interesting functional mechanisms and ecological changes underlying the transition from health to periodontal disease in the human mouth at a systems level.
Biology in the information age : computational methods to understand and engineer the central dogma
The rise of NGS, big data, and ‘-omics’ has ushered biology into a new age, with the power to fundamentally change how research is approached. Rather than relying on a single hypothesis, we can now incorporate data-driven methods that yield new biological insights, explain emergent biological phenomena, and/or derive novel functionality. This thesis highlights the changing role of computation, both in learning more about biological systems and in leveraging data-intensive computational techniques to create new proteins and enzymes.
The ability of computational approaches to drive biological understanding is presented in three studies. First, the laboratory evolution of DNA polymerases, the workhorses of replication, towards novel functionality is explored. For the three polymerases created, modeling and large-scale approaches are used to demonstrate the additional capability of each new enzyme. Next, two independent studies of the genomic adaptations needed for E. coli cells to adopt a 21st amino acid (selenocysteine and nitrotyrosine) are presented. Next-generation sequencing is used to better understand how cells accommodate the increased fitness burden imposed by an orthogonal translation system. Lastly, community-wide changes in the oral microbiome are studied in the progression towards periodontitis, with implications for potential therapeutic targets.
The capstone of this thesis leverages big data techniques to engineer novel proteins, the chief functional units within cells. Protein structural data are used to train a convolutional neural network that associates amino acids with their neighboring chemical microenvironments at state-of-the-art accuracy. This algorithm enables identification of gain-of-function mutations, and subsequent experiments confirm substantive improvements in stability-associated phenotypes in vivo across three diverse proteins. This work is the first demonstration of using deep learning to empirically improve protein function and opens a new avenue for protein engineering. Cellular and Molecular Biolog
Chapter 12: Human Microbiome Analysis
Humans are essentially sterile during gestation, but during and after birth, every body surface, including the skin, mouth, and gut, becomes host to an enormous variety of microbes, bacterial, archaeal, fungal, and viral. Under normal circumstances, these microbes help us to digest our food and to maintain our immune systems, but dysfunction of the human microbiota has been linked to conditions ranging from inflammatory bowel disease to antibiotic-resistant infections. Modern high-throughput sequencing and bioinformatic tools provide a powerful means of understanding the contribution of the human microbiome to health and its potential as a target for therapeutic interventions. This chapter will first discuss the historical origins of microbiome studies and methods for determining the ecological diversity of a microbial community. Next, it will introduce shotgun sequencing technologies such as metagenomics and metatranscriptomics, the computational challenges and methods associated with these data, and how they enable microbiome analysis. Finally, it will conclude with examples of the functional genomics of the human microbiome and its influences upon health and disease
Profiling the Oral Microbiome and Plasma Biochemistry of Obese Hyperglycemic Subjects in Qatar
The present study was designed to compare demographic characteristics, plasma biochemistry, and the oral microbiome in obese (n = 37) and lean control (n = 36) subjects enrolled at Qatar Biobank, Qatar. Plasma hormones, enzymes, and lipid profiles were analyzed at the Hamad Medical Corporation Diagnostic Laboratory. Saliva microbiome characterization was carried out by 16S rRNA amplicon sequencing on the Illumina MiSeq platform. Obese subjects had higher testosterone and sex hormone-binding globulin (SHBG) concentrations compared to the control group. A negative association between BMI and testosterone (p < 0.001, r = -0.64) and SHBG (p < 0.001, r = -0.34) was observed. Irrespective of the study groups, the oral microbiome was predominantly occupied by , , and species. A generalized linear model revealed that the Firmicutes/Bacteroidetes ratio (2.25 ± 1.83 vs. 1.76 ± 0.58; corrected p-value = 0.04) was higher, and phylum Fusobacteria concentration (4.5 ± 3.0 vs. 6.2 ± 4.3; corrected p-value = 0.05) was lower, in the obese group compared with the control group. However, no differences in microbiome diversity were observed between the two groups as evaluated by alpha (Kruskal-Wallis p ≥ 0.78) and beta (PERMANOVA p = 0.37) diversity indexes. Certain bacterial phyla (Acidobacteria, Bacteroidetes, Fusobacteria, Proteobacteria, Spirochaetes, and Firmicutes/Bacteroidetes) were positively associated (p = 0.05, r ≤ +0.5) with estradiol, fast food consumption, creatinine, breastfeeding during infancy, triglycerides, and thyroid-stimulating hormone concentrations. In conclusion, no differences in oral microbiome diversity were observed between the studied groups. However, the Firmicutes/Bacteroidetes ratio, a recognized obesogenic microbiome trait, was higher in the obese subjects. Further studies are warranted to confirm these findings in a larger cohort. Qatar National Research Fun
Synergies of systems biology and synthetic biology in human microbiome studies
A number of studies have shown that the microbial communities of the human body are integral to the maintenance of human health. Advances in next-generation sequencing have enabled rapid and large-scale quantification of the composition of microbial communities in health and disease. Microorganisms mediate diverse host responses, including metabolic pathways and immune responses. Using a systems biology approach to further understand the underlying alterations of the microbiota in physiological and pathological states can help reveal potential novel therapeutic and diagnostic interventions within the field of synthetic biology. Tools such as biosensors, memory arrays and engineered bacteria can rewire the microbiome environment. In this article, we review the computational tools used to study microbiome communities and the current limitations of these methods. We evaluate how genome-scale metabolic models can advance our understanding of microbe-microbe and microbe-host interactions. Moreover, we present how synergies between these systems biology approaches and synthetic biology can be harnessed in human microbiome studies to improve future therapeutics and diagnostics, and we highlight important knowledge gaps for future research in these rapidly evolving fields.
Homemade blenderized tube feeding improves gut microbiome communities in children with enteral nutrition
Enteral nutrition for children is supplied through nasogastric or gastrostomy tubes. Diet not only influences nutritional intake but also interacts with the composition and function of the gut microbiota. Homemade blenderized tube feeding has been administered to children receiving enteral nutrition, in addition to ready-made tube feeding. The purpose of this study was to evaluate the oral/gut microbial communities in children receiving enteral nutrition with or without homemade blenderized tube feeding. Among a total of 30 children, 6 receiving mainly ready-made tube feeding (RTF) and 5 receiving mainly homemade blenderized tube feeding (HBTF) were analyzed in this study. Oral and gut microbiota community profiles were evaluated through 16S rRNA sequencing of saliva and fecal samples. The α-diversity measures (number of observed features, Shannon index, and Chao1) were significantly increased in the HBTF group in the gut microbiome but not in the oral microbiome. In addition, the relative abundances of the phylum Proteobacteria, class Gammaproteobacteria, and genus Escherichia-Shigella were significantly lower, whereas that of the genus Ruminococcus was significantly higher, in the gut of children with HBTF, indicating that HBTF altered the gut microbial composition and reduced health risks. Metagenome prediction showed enrichment of carbon fixation pathways in prokaryotes in the oral and gut microbiomes of children receiving HBTF. In addition, more complex network structures were observed in the oral cavity and gut in the HBTF group than in the RTF group. In conclusion, HBTF not only provides satisfaction and enjoyment during meals with the family but also shifts the gut microbial composition toward a healthy state.
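For readers unfamiliar with the α-diversity measures mentioned above, the following sketch computes the number of observed features, the Shannon index, and Chao1 from a vector of per-taxon read counts. It is not the study's pipeline: the natural-log base, the bias-corrected form of Chao1, and the example counts are assumptions chosen for illustration.

```python
import math

def observed_features(counts):
    """Number of taxa with at least one read."""
    return sum(1 for c in counts if c > 0)

def shannon_index(counts):
    """Shannon entropy H = -sum(p_i * ln p_i) over taxa with nonzero counts."""
    total = sum(counts)
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)

def chao1(counts):
    """Bias-corrected Chao1 richness: S_obs + F1 * (F1 - 1) / (2 * (F2 + 1)),
    where F1 and F2 are the numbers of singleton and doubleton taxa."""
    singletons = sum(1 for c in counts if c == 1)
    doubletons = sum(1 for c in counts if c == 2)
    return observed_features(counts) + singletons * (singletons - 1) / (2 * (doubletons + 1))

# Hypothetical read counts per taxon for one sample.
sample = [5, 1, 1, 2, 0]
richness = observed_features(sample)
estimated_richness = chao1(sample)
evenness_aware_diversity = shannon_index(sample)
```

Observed features counts only what was seen, Chao1 adds an estimate of unseen rare taxa, and the Shannon index also rewards evenness, which is why studies often report the three together.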
Interactions between species introduce spurious associations in microbiome studies
Microbiota contribute to many dimensions of host phenotype, including
disease. To link specific microbes to specific phenotypes, microbiome-wide
association studies compare microbial abundances between two groups of samples.
Abundance differences, however, reflect not only direct associations with the
phenotype, but also indirect effects due to microbial interactions. We found
that microbial interactions could easily generate a large number of spurious
associations that provide no mechanistic insight. Using techniques from
statistical physics, we developed a method to remove indirect associations and
applied it to the largest dataset on pediatric inflammatory bowel disease. Our
method corrected the inflation of p-values in standard association tests and
showed that only a small subset of associations is directly linked to the
disease. Direct associations had a much higher accuracy in separating cases
from controls and pointed to immunomodulation, butyrate production, and the
brain-gut axis as important factors in inflammatory bowel disease.
Comment: 4 main text figures, 15 supplementary figures (i.e., appendix) and 6
supplementary tables. Overall 49 pages including reference
The role of the oral microbiome in the immunobullous diseases pemphigus vulgaris and mucous membrane pemphigoid and oral lichen planus
Saliva is formed from contributions of salivary glands and serum exudates, principally from gingival margins or damaged mucosa, combined with components derived from the environment, including a community of microorganisms, the microbiome. I postulate that changes in microbial diversity and population structure play key roles in the modulation of host-microbial interactions, which influence both the hypersensitive autoimmune responses and the inflammation seen in these inflammatory mucocutaneous disorders. For my research, a total of 186 participants were recruited: 48 mucous membrane pemphigoid (MMP), 48 pemphigus vulgaris (PV), 50 oral lichen planus (OLP) patients, and 40 healthy controls. Unstimulated whole saliva, subgingival plaque, serum, and plasma samples were collected from all 186 participants. In addition, metadata were collected on the following covariates: age, gender, ethnicity, type of diet, disease history and therapeutic intervention in the preceding six months. Oral disease severity scores (ODSS) were assessed, and periodontal status was examined using a periodontal six pocket chart. To characterise microbiome profiles, genomic DNA from saliva and subgingival plaque was sequenced using NGS shotgun metagenomics. Inflammatory cytokines and proteases were investigated in saliva and serum using the Human Magnetic Luminex Screening Assay (R&D Systems). Selected cytokines were analysed by enzyme-linked immunosorbent assay (ELISA) (R&D Systems) to determine host inflammatory responses in saliva and serum samples. Additionally, saliva and plasma samples were analysed for metabolites by nuclear magnetic resonance (NMR). Significant increases in periodontal score (PISA) were identified in all three disease groups compared with the healthy control group, with a significant positive correlation between oral disease severity (ODSS) and PISA in the OLP and PV groups.
All three disease groups had significantly higher levels of inflammatory Th2/Th17 cytokines (IL-6, IL-13 and IL-17 in saliva samples), as well as higher levels of MMP-3 matrixins in saliva. In addition, there were positive correlations between ODSS and salivary IL-6, IL-13 and MMP-3 in the OLP group, with salivary and serum levels of IL-6 and MMP-3 in the MMP group, and a significant association with salivary IL-6, IL-1β and MMP-3 in the PV group. Metabolomic data showed that saliva is a better biofluid than plasma for correlating the metabolomic profile with oral disease severity. Salivary ethanol was correlated with disease severity in the OLP group, whereas in the PV group there was a strong correlation of ODSS with choline. Finally, a unique microbial community was found in each disease group. In the MMP group, ODSS was significantly correlated with L. hofstadii, C. sputigena, N. meningitidis, N. cinerea and P. saccharolytica. In PV, a positive correlation was found with F. nucleatum, G. morbillorum, E. corrodens, G. elegans, H. sapiens and T. vincentii. In OLP, the disease tended to worsen when there was reduced abundance of X. cellulosilytica, Actinomyces ICM 47, S. parasanguinis, S. salivarius, L. mirabilis and O. sinus. Lower microbial diversity was correlated with ODSS in saliva and plaque of the OLP group. In conclusion, this study provides strong evidence of the complex interplay between the oral microbiome, immunological factors, and metabolites in the context of immunobullous diseases and OLP. The findings highlight the integral role of oral bacteria in disease progression, the significance of immune dysregulation, and the potential impact of specific microbial species and metabolic pathways. These insights pave the way for further research and clinical applications, offering the promise of personalized approaches to the diagnosis and management of OLP, MMP and PV.
Future investigations should focus on discovering the mechanistic details underlying these associations and validating the identified biomarkers in larger patient cohorts, ultimately contributing to a deeper understanding of the pathogenesis of these conditions
Prevotella diversity, niches and interactions with the human host
The genus Prevotella includes more than 50 characterized species that occur in varied natural habitats, although most Prevotella spp. are associated with humans. In the human microbiome, Prevotella spp. are highly abundant in various body sites, where they are key players in the balance between health and disease. Host factors related to diet, lifestyle and geography are fundamental in affecting the diversity and prevalence of Prevotella species and strains in the human microbiome. These factors, along with the ecological relationship of Prevotella with other members of the microbiome, likely determine the extent of the contribution of Prevotella to human metabolism and health. Here we review the diversity, prevalence and potential connection of Prevotella spp. in the human host, highlighting how genomic methods and analysis have improved and should further help in framing their ecological role. We also provide suggestions for future research to improve understanding of the possible functions of Prevotella spp. and the effects of the Western lifestyle and diet on the host-Prevotella symbiotic relationship in the context of maintaining human health