7 research outputs found

    Bayesian mixture models for metagenomic community profiling

    Metagenomics can be defined as the study of DNA sequences from environmental or community samples. This is a rapidly progressing field, and application ideas that seemed outlandish a few years ago are now routine and familiar. Metagenomics’ scope is broad and includes the analysis of a diverse set of samples, such as environmental or clinical samples. Human tissues are in essence metagenomic samples due to the presence of microorganisms, such as bacteria, viruses and fungi, in both healthy and diseased individuals. Deep sequencing of clinical samples is now an established tool for pathogen detection, with direct medical applications. The large amount of data generated produces an opportunity to detect species even at very low levels, provided that computational tools can effectively profile the relevant metagenomic communities. Data interpretation is complicated by the fact that short sequencing reads can match multiple organisms and by the lack of completeness of existing databases, particularly for viruses. The research presented in this thesis focuses on using Bayesian mixture model techniques to produce taxonomic profiles for metagenomic data. A novel Bayesian mixture model framework for resolving complex metagenomic mixtures, called metaMix, is introduced. The use of parallel Markov chain Monte Carlo (MCMC) for the exploration of the species space enables the identification of the set of species most likely to contribute to the mixture. The improved accuracy of metaMix compared to relevant methods is demonstrated, particularly for profiling complex communities consisting of several related species. metaMix was designed specifically for the analysis of deep transcriptome sequencing datasets, with a focus on viral pathogen detection. However, the principles are generally applicable to all types of metagenomic mixtures. metaMix is implemented as a user-friendly R package, freely available on CRAN: http://cran.r-project.org/web/packages/metaMix
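
The abstract above notes that short reads can match several organisms and that a mixture model is used to resolve them. As a rough illustration of that idea only, the sketch below estimates species proportions with a plain expectation-maximisation (EM) loop. This is not metaMix's algorithm (the abstract describes a Bayesian framework explored with parallel MCMC), and the function name `em_mixture_weights` and the toy likelihood matrix are hypothetical.

```python
def em_mixture_weights(lik, n_iter=200, tol=1e-12):
    """Estimate mixture weights from per-read likelihoods.

    lik[i][k] is an assumed likelihood of read i under species k.
    Every read must have a positive likelihood for at least one species.
    Returns (weights, responsibilities).
    """
    n_species = len(lik[0])
    weights = [1.0 / n_species] * n_species  # start from a uniform mixture
    resp = []
    for _ in range(n_iter):
        # E-step: share each read among species in proportion to
        # (current weight) x (likelihood of the read under that species).
        resp = []
        for row in lik:
            weighted = [p * w for p, w in zip(row, weights)]
            total = sum(weighted)
            resp.append([x / total for x in weighted])
        # M-step: the new weight of a species is its average share of reads.
        new_weights = [
            sum(r[k] for r in resp) / len(lik) for k in range(n_species)
        ]
        converged = max(abs(a - b) for a, b in zip(new_weights, weights)) < tol
        weights = new_weights
        if converged:
            break
    return weights, resp


# Toy data: two reads unique to species 0, one ambiguous read, one unique to species 1.
toy_lik = [[1.0, 0.0], [1.0, 0.0], [0.5, 0.5], [0.0, 1.0]]
weights, resp = em_mixture_weights(toy_lik)
```

On this toy input the ambiguous read is shared in proportion to the current weights, so the estimates settle near 2/3 and 1/3. A full Bayesian treatment would additionally compare which sets of species should be in the model at all, which this sketch does not attempt.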

    Comparative genomics for studying the proteomes of mucosal microorganisms

    A tremendous number of microorganisms are known to interact with their animal hosts. The outcomes of the interactions between microbes and their animal hosts range from modulating the maintenance of homeostasis to the establishment of processes leading to pathogenesis. Of the numerous species known to inhabit humans, the great majority live on mucosal surfaces, which are highly defended. Despite their importance in human health, little is known about the molecular and cellular basis of most host-microbe interactions across the tremendous diversity of mucosal-adapted microorganisms. The ever-increasing availability of genome sequence data allows systematic comparative genomics studies to identify proteins with potentially important molecular functions at the host-microbe interface. In this study, a genome-wide analysis was performed on 3,021,490 protein sequences derived from 867 complete microbial genome sequences across the three domains of cellular life. The ability of microbes to thrive successfully in a mucosal environment was examined in relation to functional genomics data from a range of publicly available databases. Particular emphasis was placed on the extracytoplasmic proteins of microorganisms that thrive on human mucosal surfaces. These proteins form the interface between the complex host-microbe and microbe-microbe interactions. The large amounts of data involved, combined with the numerous analytical techniques that need to be performed, make the study intractable with conventional bioinformatics. The lack of habitat annotations for microorganisms further compounds the problem of identifying the microbial extracytoplasmic proteins playing important roles in the mucosal environments. In order to address these problems, a distributed high-throughput computational workflow was developed, and a system for mining biomedical literature was trained to automatically identify microorganisms’ habitats.
The workflow integrated existing bioinformatics tools to identify and characterise protein-targeting signals, cell surface-anchoring features, protein domains and protein families. This study successfully demonstrated a large-scale comparative genomics approach utilising a system called Microbase to harness Grid and Cloud computing technologies. A number of conserved protein domains and families that are significantly associated with a specific set of mucosa-inhabiting microorganisms were identified. These conserved protein regions, whose functions were either characterised or unknown, were quite narrow in their coverage of taxa distribution, with only a few protein domains more widely distributed, suggesting that mucosal microorganisms evolved different solutions in their strategies and mechanisms for their survival in the host mucosal environments. Metabolic and biological processes common to many mucosal microorganisms included carbohydrate and amino acid metabolism, signal transduction, adhesion to host tissues or contents in mucosal environments (e.g. food remnants, mucins), and resistance to host defence mechanisms. Invasive or virulence factors were also identified in pathogenic strains. Several extracytoplasmic protein families were shared among prominent bacterial members of gut microbiota and microbial eukaryotes known to thrive in the same environment, suggesting that the ability of microbes to adapt to particular niches can be influenced by lateral gene transfer. A large number of conserved regions or protein families that potentially play important roles in the mucosa-microbe interactions were revealed by this study. Several of these candidates were proteins of unknown function. The identified candidates were subjected to more detailed computational analysis, providing hypotheses for their function that will be tested experimentally in order to contribute to our understanding of the complex host-microbe interactions.
Among the candidates of unknown function, a novel M60-like domain was identified. The domain was deposited in the Pfam database with accession number PF13402. The M60-like domain is shared amongst a broad range of mucosal microorganisms as well as their vertebrate hosts. Bioinformatics analyses of the M60-like domain suggested a potential catalytic function of the conserved motif as a gluzincin metalloprotease. Targeting signals were detected across microbial M60-like-containing proteins. A mucosa-related carbohydrate-binding module (CBM), CBM32, was also identified on several proteins containing M60-like domains encoded by known mucosal commensals and pathogens. The co-occurrence of the CBMs and the M60-like domain, as well as the annotated potential peptidase function, unveiled a new functional context for the CBM, which is typically connected with carbohydrate-processing enzymes but not proteases. The CBM domains linked with members of different protease families are likely to enable these proteases to bind to specific glycoproteins from host animals, further highlighting the importance of proteases and CBMs (CBM32 and CBM5_12) in host-microbe interactions. EThOS - Electronic Theses Online Service. Medical School, Newcastle University. GB. United Kingdo

    Investigating the genetic basis of preservative resistance in an industrial Pseudomonas aeruginosa strain

    Pseudomonas aeruginosa is a common industrial contaminant associated with costly recalls of home and personal care (HPC) products. Preservation systems are used to prevent bacterial contamination and protect consumers, but little is known about the mechanisms of preservative resistance in P. aeruginosa. The aim of this research was to map genetic and metabolic pathways associated with preservative resistance and bacterial growth in HPC products. The genome of the industrial strain P. aeruginosa RW109 was sequenced, functionally annotated, and compared to other strains of the species. This revealed the first complete genome of a P. aeruginosa isolate from the HPC industry. Comparative analysis with 102 P. aeruginosa strains from various sources showed industrial strains’ genomes to be significantly larger than those of clinical and environmental strains; RW109’s genome was the largest of the species (7.8 Mbp) and included two plasmids. Identification of differentially expressed genes by RNA-Seq (more informative than mini-Tn5-luxCDABE mutagenesis) revealed complex genetic networks utilised by RW109 when exposed to benzisothiazolone (BIT), phenoxyethanol (POE) and a laundry detergent formulation. Differential expression of five sets of genes was consistently observed in response to these industry-relevant conditions: the MexPQ-OpmE efflux pump, sialic acid transporter and isoprenoid biosynthesis (gnyRDBHAL) genes were frequently upregulated, whereas phnBA and pqsEDCBA genes encoding PQS production and quorum-sensing, respectively, were consistently down-regulated. Genome-scale metabolic network reconstruction of RW109, the first with a P. aeruginosa industrial strain, along with integration of transcriptomic data, predicted essential pathways for RW109’s preservative resistance (e.g. cell membrane phospholipid biosynthesis as a key pathway for POE resistance).
This study highlights the utility of integrating genomic, transcriptomic and metabolic modelling approaches to uncover the basis of industrial bacterial resistance to preservatives and product formulations. The ability to predict the metabolic basis of P. aeruginosa preservative resistance will inform the development of targeted industrial preservation systems, enhancing product safety and minimising future resistance development.

    Early-rearing Environment and Mate Choice in Chinook Salmon (Oncorhynchus tshawytscha) Aquaculture: Effects on the Immune System

    Canada is the fourth largest producer of farmed salmon in the world, with Atlantic salmon being the major species cultivated. Paradoxically, British Columbia (BC), which borders the Pacific Ocean, is the major producing province; Atlantic salmon was introduced there in the mid-1980s. Escaped salmon may constitute a threat to natural populations of Pacific salmon as they compete for the same resources, such as food and spawning territory. A potential solution for the aquaculture industry would be to further develop the aquaculture of native species in the region. The work presented here used semi-natural spawning channels to evaluate the effects of breeding strategies and early-rearing environments on the immune performance of Chinook salmon. Breeding strategy was tested by comparing artificial hatchery practices with semi-natural propagation in spawning channels. Early-rearing environmental assessment contrasted indoor plastic hatchery tanks with outdoor gravelled-bottom spawning channels. A disease challenge involving over 1400 fish showed interaction effects between breeding strategy and rearing environment. The disease susceptibility of artificially mated fish was influenced by the rearing environment. The contrary occurred in the offspring of self-breeding brood stock in the spawning channels, as no differences were observed in their susceptibility to the disease regardless of rearing environment. Monitoring of anti-Vibrio anguillarum antibodies during the disease challenge and a follow-up of the survivors in sea net pens further confirmed the interaction between breeding strategy and rearing environment. Pre- and post-infection artificially propagated fish showed differential gene expression when analyzed with a 695-gene cDNA microarray for Chinook salmon.
Genotyping of major histocompatibility (MH) class II β1 alleles showed a tendency towards higher heterozygosity in survivors, as expected, as well as a general tendency towards higher heterozygosity in semi-naturally propagated fish. The latter is likely a direct consequence of MH-linked mate choice, which was recently described in Chinook salmon (Neff et al., 2008). To further characterize the mating system of Chinook salmon in the spawning channels, brood stock were genotyped at 12 microsatellite loci. Females and males were found to mate randomly with regard to genetic pairwise relatedness, but they tended to mate with fish of similar condition, as revealed by their pairwise differences in Fulton’s condition factor. This work demonstrated that genotype-by-environment interactions can modify the disease resistance of Chinook salmon. More importantly, these effects were seen after just one round of semi-natural spawning of domesticated hatchery fish, suggesting that further studies on spawning channels may highlight other hidden benefits. Therefore, breeding strategy and early-rearing environment should be considered when propagating cultured stocks. The use of more natural propagation methods such as spawning channels could improve the immune performance of Chinook salmon and help to expand the aquaculture of this native species in BC.
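
For readers unfamiliar with the condition measure mentioned in the abstract above, the sketch below computes Fulton’s condition factor in its standard form, K = 100 × W / L³, and the absolute difference between two fish. The units (weight in grams, length in centimetres), the function names and the example values are assumptions for illustration; the abstract does not state which length measurement or units the study used.

```python
def fulton_k(weight_g, length_cm):
    """Fulton's condition factor K = 100 * W / L**3.

    Assumes weight in grams and length in centimetres (an assumption,
    not taken from the abstract).
    """
    if weight_g <= 0 or length_cm <= 0:
        raise ValueError("weight and length must be positive")
    return 100.0 * weight_g / length_cm ** 3


def pairwise_condition_difference(fish_a, fish_b):
    """Absolute difference in K between two (weight_g, length_cm) tuples."""
    return abs(fulton_k(*fish_a) - fulton_k(*fish_b))


# Hypothetical example pair, not data from the study.
female = (1000.0, 50.0)
male = (1250.0, 50.0)
difference = pairwise_condition_difference(female, male)
```

A smaller difference means the two fish are in more similar condition, which is the sense in which the abstract reports that mates tended to resemble each other.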