6 research outputs found

    Human-microbiota interactions in health and disease: bioinformatics analyses of gut microbiome datasets

    EngD Thesis. The human gut harbours a vast diversity of microbial cells, collectively known as the gut microbiota, that are crucial for human health and dysfunctional in many of the most prevalent chronic diseases. Until recently, culture-dependent methods limited our ability to study the microbiota in depth, including the collective genomes of the microbiota, the microbiome. Advances in culture-independent metagenomic sequencing technologies have since provided new insights into the microbiome and led to a rapid expansion of data-rich resources for microbiome research. These high-throughput sequencing methods and large datasets provide new opportunities for research with an emphasis on bioinformatics analyses and a novel field for drug discovery through data mining. In this thesis I explore a range of metagenomics analyses to extract insights from metagenomics data and inform drug discovery in the microbiota. Firstly, I survey the existing technologies and data sources available for data mining therapeutic targets. Then I analyse 16S metagenomics data combined with metabolite data from mice to investigate the treatment model of a proposed antibiotic treatment targeting the microbiota. Then I investigate the occurrence frequency and diversity of proteases in metagenomics data in order to inform understanding of host-microbiota-diet interactions through protein- and peptide-associated glycan degradation by the gut microbiota. Finally, I develop a system to facilitate the process of integrating metagenomics data for gene annotations. One of the main challenges in leveraging the scale of data availability in microbiome research is managing the data resources from microbiome studies. Through a series of analytical studies I used metagenomics data to identify community trends, to demonstrate therapeutic interventions and to do a wide-scale screen for proteases that are central to human-microbiota interactions. 
These studies articulated the requirement for a computational framework to integrate and access metagenomics data in a reproducible way using a scalable data store. The thesis concludes by explaining how data integration in microbiome research is needed to provide the insights into metagenomics data that are required for drug discovery.

    Integrating distributed post-genomic data to infer the molecular basis of bacterial phenotypes

    The aim of the project described in this thesis is to understand and predict the characteristics and behaviour of a family of bacteria through an analysis of genome-wide data from a variety of sources. The focus of the research is a family of bacteria, Bacillus, whose members show a diverse range of phenotypes, from the non-pathogenic B. subtilis to B. anthracis, the causative agent of anthrax. Specifically, the focus was on the genomic-scale identification and characterisation of secreted proteins from Bacillus species. Firstly, the application of Grid-based computational approaches to problems in genomic analysis and annotation was investigated, applying myGrid technology to a biological problem not previously addressed using this approach. e-Science workflows and a service-oriented approach were developed and applied to predict and characterise secreted proteins, and the results automatically integrated into a custom relational database. An associated Web portal was also developed to facilitate expert curation, results browsing and querying over the database. Workflow technology was also used to classify the putative secreted proteins into families and to study the relationships between and within these families. The design of the workflows, the architecture and the reasoning behind the approach used to build this system, called BaSPP, are discussed. Analysis of the putative Bacillus secretomes revealed clear distinctions between proteins present in the pathogens and those in the non-pathogens. The properties of the protein families present in all Bacillus secretomes, as well as those specific either to the pathogens or to the non-pathogens, were investigated. Many of the protein families contained members of unknown function. In the second part of the project, these families were investigated in more depth, using additional data integration strategies not previously applied to these organisms. 
The secretomes were modelled at the system level, in the broader context of interactomes. A system called SubtilNet was therefore developed, using B. subtilis as the reference organism. As part of SubtilNet, a toolkit and architecture were developed and implemented for building and analysing probabilistic functional integrated networks (PFINs). The PFINs built for each Bacillus species using this system were subsequently used to delve further into the interactions specific to the secreted proteins by extracting and exploring the cross-species PFINs of these proteins. The cross-species PFINs for the protein families specific to the pathogens and non-pathogens were explored, with particular emphasis on the core PrsA-like protein family, which acted as a use case to show how the PFINs can be used to shed light on protein function. The addition of orthologous links between species was demonstrated to facilitate network clustering and analysis, enabling putative annotations to be applied to proteins previously of unknown function. EThOS - Electronic Theses Online Service. North East Regional e-Science Centre : European Commission (LSHC-CT-2004-503468) : EPSRC : Non-Linear Dynamics. GB, United Kingdom

    A Grid-based System for Microbial Genome Comparison and Analysis


    A grid-based system for microbial genome comparison and analysis

    Genome comparison and analysis can reveal the structures and functions of genome sequences of different species. As more genomes are sequenced, genomic data sources are rapidly increasing such that their analysis is beyond the processing capabilities of most research institutes. The Grid is a powerful solution to support large-scale genomic data processing and genome analysis. This paper presents the Microbase project that is developing a Grid-based system for genome comparison and analysis, and discusses the first implementation of the system (called MicrobaseLite). MicrobaseLite uses a scalable computing environment to support computationally intensive microbial genome comparison and analysis, employing state-of-the-art technologies of Web Services, notification, comparative genomics and parallel computing. Microbase will support not only system-defined genome comparison and analysis but also user-defined, remotely conceived genome analysis. © 2005 IEEE

    A Grid-based System for Microbial Genome Comparison and Analysis

    Genome comparison and analysis can reveal the structures and functions of genome sequences of different species. As more genomes are sequenced, genomic data sources are rapidly increasing such that their analysis is beyond the processing capabilities of most research institutes. The Grid is a powerful solution to support large-scale genomic data processing and genome analysis. This paper presents the Microbase project that is developing a Grid-based system for genome comparison and analysis, and discusses the first implementation of the system (called MicrobaseLite). MicrobaseLite uses a scalable computing environment to support computationally intensive microbial genome comparison and analysis, employing state-of-the-art technologies of Web Services, notification, comparative genomics and parallel computing. Microbase will support not only system-defined genome comparison and analysis but also user-defined, remotely conceived genome analysis.

    A Grid-based System for Microbial Genome Comparison and Analysis (Newcastle upon Tyne, NE1 7RU, UK)

    Genome comparison and analysis can reveal the structures and functions of genome sequences of different species. As more genomes are sequenced, genomic data sources are increasing in size and availability such that their analysis is beyond the processing capabilities of most research institutes. The Grid is a powerful solution to support large-scale genomic data processing and genome analysis. This paper presents the Microbase project that is developing a Grid-based system for genome comparison and analysis, and discusses the first implementation of the system (called MicrobaseLite). MicrobaseLite uses a scalable computing environment to support computationally intensive microbial genome comparison and analysis, employing state-of-the-art technologies of Web Services, notification, comparative genomics and parallel computing. Microbase will support not only system-defined genome comparison and analysis but also user-defined, remotely conceived genome analysis.