368 research outputs found

    Metagenomic And Transcriptomic Insights Into The Skin Microbiome And Host Response

    Culture-independent, high-throughput sequencing techniques have enabled us to characterize the complex communities of resident microorganisms living in and on our skin. These microbes contribute to cutaneous health by providing colonization resistance and regulating immunity; however, disruption of the skin microbiota has been linked to diseases such as acne and atopic dermatitis. While the microbiome is a promising therapeutic target, we do not fully understand the mechanisms underlying the microbial contributions to disease pathogenesis. It is therefore crucial to develop comprehensive knowledge of both the structure of the skin microbiome and the host functions it stimulates. Here, we present a critical analysis of the composition and metabolic potential of healthy human skin microbiota and how these communities modulate the gene expression of their hosts. In the first section, we optimize methodologies for skin microbiome studies and utilize the best approaches to characterize the bacteria, viruses, and fungi colonizing healthy skin. We applied amplification and sequencing of two hypervariable regions of the 16S rRNA gene, in addition to whole metagenomic shotgun sequencing, to cutaneous swab samples from a healthy cohort. We demonstrate that shotgun sequencing yields the most accurate profiles of the skin microbiome, but that sequencing of the V1-V3 region of the 16S rRNA gene is a suitable, cost-effective alternative. We also reveal significant taxonomic, functional, and temporal diversity of skin microbial communities and highlight potential multi-kingdom interactions between skin phages and their bacterial hosts. The second section focuses on the response of the cutaneous transcriptome to colonization by resident microflora. We used RNA sequencing to compare gene expression in skin collected from sterile, germ-free mice and from mice conventionally raised in the presence of microbiota. 
We find that the skin microbiome primes the cutaneous immune system through increases in both the frequency of innate immune cell populations and the expression of immune response genes. We also reveal that the skin microbiome transcriptionally regulates epidermal development and differentiation, suggesting a novel role for microorganisms in skin barrier structure and function. Together, the work presented in this thesis highlights the complex dynamics of skin microbial communities and their impact on the host.

    Computational Metagenomics: Network, Classification and Assembly

    Owing to rapid advances in DNA sequencing technologies over the past 10 years, large amounts of short DNA reads can be obtained quickly and cheaply. For example, a single Illumina HiSeq machine can produce several terabytes of data within a week. Metagenomics is a new scientific field that involves the analysis of genomic DNA sequences obtained directly from the environment, enabling studies of novel microbial systems. Metagenomics was made possible by high-throughput sequencing technologies. The analysis of the resulting data requires sophisticated computational analyses and data mining. In clinical settings, a fundamental goal of metagenomics is to help people diagnose and cure disease. One major bottleneck so far is how to analyze the huge, noisy data sets quickly and precisely. My PhD research focuses on developing algorithms and tools to tackle these challenging and interesting computational problems. From the functional perspective, a metagenomic sample can be represented as a weighted metabolic network, in which the nodes are molecules, the edges are enzymes encoded by genes, and the weights can be considered as the number of organisms providing the functions. One goal of functional comparison between metagenomic samples is to find differentially abundant metabolic subnetworks between the two groups under comparison. We have developed a statistical network analysis tool, MetaPath, which uses a greedy search algorithm to find a maximum-weight subnetwork and a nonparametric permutation test to measure its statistical significance. Unlike previous approaches, MetaPath explicitly searches for significant subnetworks in the global network, enabling us to detect signatures at a finer level. In addition, we developed statistical methods that take into account the topology of the network when testing the significance of the subnetworks. Another computational problem involves classifying anonymous DNA sequences obtained from metagenomic samples. 
There are several challenges here: (1) The classification labels follow a hierarchical tree structure, in which the leaves are most specific and the internal nodes are more general. How can we classify novel sequences that do not belong to leaf categories (species) but belong to internal groups (e.g., phylum)? (2) For each classification, how can we compute a confidence score so that users can trade off sensitivity against specificity? (3) How can we analyze billions of data items quickly? We have developed a novel hierarchical classifier (MetaPhyler) for the classification of anonymous DNA reads. Through simulation, MetaPhyler models the distribution of pairwise similarities within different hierarchical groups with nonparametric density estimation. The confidence score is computed as a likelihood ratio. For a query DNA sequence of arbitrary length, its similarity can be calculated through linear approximation. Through benchmark comparison, we have shown that MetaPhyler is significantly faster and more accurate than previous tools. DNA sequencing machines can only produce very short strings (e.g., 100bp) relative to the size of a genome (e.g., a typical bacterial genome is 5Mbp). One of the most challenging computational tasks is the assembly of millions of short reads into longer contigs, which are used as the basis of subsequent computational analyses. In this project, we have developed a comparative metagenomic assembler (MetaCompass), which utilizes genomes that have already been sequenced and produces long contigs through read mapping (alignment) and assembly. Given the availability of thousands of existing bacterial genomes, for a particular sample, MetaCompass first chooses the best subset as references based on the taxonomic composition. Then, the reads are aligned against these genomes using MUMmer-map or Bowtie2. 
Afterwards, we use a greedy algorithm for the minimum set-cover problem to build long contigs, and the consensus sequences are computed by majority rule. We also propose an iterative approach to improve the performance. Finally, MetaCompass has been successfully evaluated and tested on over 20 terabytes of metagenomic data sets generated from the Human Microbiome Project. In addition, to facilitate the identification and characterization of antibiotic resistance genes, we have created the Antibiotic Resistance Genes Database (ARDB), which provides a centralized compendium of information on antibiotic resistance. Furthermore, we have applied our tools to the analysis of a novel oral microbiome data set, and have discovered interesting functional mechanisms and ecological changes underlying the transition from health to periodontal disease in the human mouth at a system level.
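
The greedy set-cover and majority-rule steps mentioned in the abstract above can be illustrated with a small sketch. This is not MetaCompass code. The function names, the representation of each aligned read as a set of covered reference positions, and the toy data are all assumptions made for illustration.

```python
from collections import Counter


def greedy_set_cover(universe, candidates):
    """Greedy approximation to minimum set cover.

    universe:   iterable of elements to cover (here: reference positions).
    candidates: dict mapping a name (here: a hypothetical read ID) to the
                set of elements it covers.
    Returns the list of chosen names in the order they were picked.
    """
    uncovered = set(universe)
    chosen = []
    while uncovered:
        # Pick the candidate that covers the most still-uncovered elements.
        best = max(candidates, key=lambda name: len(candidates[name] & uncovered))
        gained = candidates[best] & uncovered
        if not gained:
            break  # the remaining elements cannot be covered by any candidate
        chosen.append(best)
        uncovered -= gained
    return chosen


def majority_consensus(columns):
    """Return the most frequent base in each alignment column."""
    return "".join(Counter(col).most_common(1)[0][0] for col in columns)


# Toy data (illustrative only): four reads covering a 6-position reference.
reads = {
    "r1": {0, 1, 2, 3},
    "r2": {2, 3, 4},
    "r3": {4, 5},
    "r4": {3},
}
selected = greedy_set_cover(range(6), reads)
consensus = majority_consensus(["AAC", "GGG", "TTA"])
```

On the toy data, the greedy step first picks `r1` (four new positions) and then `r3` (two new positions). The greedy rule is the standard approximation for set cover and does not guarantee a minimum cover.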

    From data to science: a multi-Omics analysis of the pathobiome

    Humans represent a complex ecosystem colonized not only by our own cells but also by trillions of other microbes such as bacteria, archaea, fungi, and viruses. This microbiome is gaining increasing interest due to its involvement in human health and disease. While we live in symbiosis with most of these travellers, dysbiosis can lead to the growth of pathogens. Pathobionts are commensal microbes and harmless in healthy individuals until specific circumstances occur. There is increasing interest in studying this pathobiome due to the rise in infections with high mortality rates and stagnant treatment options. Due to the complexity of possible interactions between the host and microbes, studies on microbial interactions are conducted at varying scales. In this thesis, we start by studying interactions in small, well-controlled model systems in vitro and then at the community level in vivo. The key technology used to identify, quantify, and characterize microbes and study host-microbe interactions throughout my studies is whole-genome and transcriptome sequencing. While an extensive body of work has focused on understanding the virulence factors of common pathogens, such as Aspergillus and Candida species, very little work had been done on understanding the interplay of those pathogens with the host’s symbionts or other pathogens when I started my Ph.D. In my Ph.D. project, I used next-generation sequencing, advanced statistical approaches, and machine learning to significantly expand our knowledge of the life of pathogens from an ecological point of view.

    Exploring functional annotation through genomic and metagenomic data mining

    Functional profiling of genomes and metagenomes, as well as data mining for novel proteins, all rely on computational methods for functional annotation of protein sequences. Standard methods assign protein function based on detected homology to reference sequences, but often leave behind a significant fraction of hypothetical sequences ("dark matter") that cannot be annotated. To maximize our ability to extract new biological insights from newly sequenced genomes, it is critical to understand the advantages and limitations of homology-based annotation, and explore alternative methods for inferring function. In this thesis, I performed a comprehensive exploration of computational protein annotation, with a focus on bacterial genomes and metagenomes. First, I applied homology-based methods to functionally annotate and analyze original datasets including newly sequenced Streptomyces strains, a wastewater metagenome, and microbial communities involved in vertebrate decomposition. These studies identified genes and functions of interest including cellulases, antibiotic resistance genes, and virulence factors. I then explored the limits of homology-based annotation by measuring annotation coverage, the fraction of annotated proteins in a proteome, across ~27,000 organisms in the microbial tree of life. This study demonstrated a wide range in annotation coverage across bacteria, from 2-86%. In addition, it revealed multiple factors including taxonomy, genome size, and research bias, as heavy influences on the degree to which proteomes could be annotated. To gain biological insights into hypothetical proteins of unknown function, I analyzed 4,049 domains of unknown function (DUFs) from Pfam. Using phylogenomic, taxonomic and metagenomic information, I detected statistical associations between domains and biological traits. 
Association-based methods uncovered environment, lineage, and/or pathogen associations in just under half of all DUFs and highlighted new families such as DUF4765 as intriguing virulence factor candidates. Finally, I constructed a database of "ORFan" metagenomic sequences that cannot be annotated using standard approaches, and inferred functions for tens of thousands of these sequences using profile-profile comparison approaches. Motif analysis and genomic context validated these predictions, enabling the discovery of hundreds of novel candidate metalloproteases. Protein "dark matter", which includes a large pool of unannotated coding sequences, is an incredible resource for finding new proteins and functions of interest, and this thesis includes suggestions on how to prioritize these sequences for future study. A combination of homology-based and alternative annotation methods will be most effective for broad functional profiling of genomes and metagenomes, and can push the boundaries for functional interpretation of sequence data.
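
The abstract above defines annotation coverage as the fraction of annotated proteins in a proteome. A minimal sketch of that calculation follows. The function name, the input format, and the rule that a missing label or the label "hypothetical protein" counts as unannotated are assumptions, not details taken from the thesis.

```python
def annotation_coverage(annotations):
    """Fraction of proteins in a proteome that carry a functional annotation.

    annotations: dict mapping a protein ID to its annotation string, or to
                 None when no annotation was assigned.
    Assumption: an empty label or "hypothetical protein" counts as unannotated.
    """
    if not annotations:
        return 0.0
    annotated = sum(
        1
        for label in annotations.values()
        if label and label.strip().lower() != "hypothetical protein"
    )
    return annotated / len(annotations)


# Toy proteome (illustrative only): two of the four proteins are annotated.
proteome = {
    "p1": "cellulase",
    "p2": "hypothetical protein",
    "p3": None,
    "p4": "metalloprotease",
}
coverage = annotation_coverage(proteome)
```

For the toy proteome the coverage is 0.5. A real pipeline would need its own rule for what counts as an informative annotation.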

    From Hydra to Humans: Insights into molecular mechanisms of aging and longevity

    Human aging is characterized by progressive functional decline that coincides with both increased morbidity and mortality. Aging affects every human being, and only a few individuals achieve longevity, a very special phenotype marked by extraordinarily healthy aging. This thesis consists of three chapters; each one is devoted to a separate project that contributes to the growing body of knowledge about aging and longevity. The work required the compilation, management and analysis of diverse big data sets and the application of cutting-edge statistical and computational methods. Chapter 1 - A functional genomics study was conducted in the potentially immortal freshwater polyp Hydra using body part-specific microarray and RNA sequencing data. The results revealed gene expression patterns that allow boundary maintenance during Hydra’s continuous cell proliferation and tissue self-renewal. Furthermore, this study provided evidence for de-acetylation as a key mechanism underlying compartmentalization. Surprisingly, FoxO, which is known to substantially drive developmental processes and stem cell renewal in Hydra, did not seem to be affected by the acetylation status. Chapter 2 - Long-lived individuals (LLI, >95 years of age) epitomize the healthy aging phenotype and are thought to carry beneficial genetic variants that predispose to human longevity. Despite extensive research efforts, only a few of these genetic factors in LLI have been identified so far. In contrast to previous investigations, which mainly focused on intronic variants, a genome-wide exome-based case-control study was performed. DNA samples of more than 1,200 German LLI, including 599 centenarians (≥100 years), and about 6,900 younger controls were used for single-variant and gene-based association analyses that yielded two new candidate longevity genes, fructosamine 3 kinase related protein (FN3KRP) and phosphoglycolate phosphatase (PGP). 
FN3KRP functions in the deglycation of proteins to restore their function, while PGP, by controlling glycerol-3-phosphate levels, affects both glucose and fat metabolism. Given the biological functions of the genes, their longevity associations appear very plausible. Chapter 3 - In recent years, the intestinal microbiome (GM) has increasingly gained attention in aging and longevity research. A 16S rRNA microbiome study was conducted using 1301 stool samples from healthy individuals (age range: 19 - 104 years) that were drawn from three cohorts. The aim was to investigate potential associations among GM composition, host genetics and environmental factors during aging. The GM composition changed with age, showing an increase in opportunistic pathogens that may generate an inflammatory environment in the gut. Age explained only ~1% of the inter-individual variation, whereas anthropometric measures, genetic background and dietary patterns together explained 20%. Strikingly, clear GM population stratification in terms of four enterotype-like clusters was observed, and these clusters were predominantly associated with dietary patterns. Correcting for these clusters was shown to increase the comparability of findings from the different cohorts. In addition, the LLI showed a specific gut microbial pattern, which is in line with previously published reports. The present work shows that thorough bioinformatics expertise helps to address the complexity of the two phenotypes of aging and longevity. One highlight of the thesis is the discovery of two new candidate longevity loci that, in view of the limited output of previous study approaches, enlarge the existing database.