
    A Bayesian Rule Generation Framework for 'Omic' Biomedical Data Analysis

    High-dimensional biomedical 'omic' datasets are accumulating rapidly from studies aimed at early detection and better management of human disease. These datasets pose tremendous analytical challenges. They contain a large number of variables, representing measurements of biochemical molecules such as proteins and mRNA in bodily fluids or tissues, yet they come from a relatively small cohort of samples. Machine learning methods have been applied to model these datasets, including rule learning methods, which have been successful in generating models that scientists can easily interpret. Rule learning methods have typically relied on a frequentist measure of certainty within IF-THEN (propositional) rules. In this dissertation, a Bayesian Rule Generation Framework (BRGF) is developed and tested that can produce rules with probabilities, thereby enabling a mathematically rigorous representation of uncertainty in rule models. The BRGF includes a novel Bayesian Discretization method combined with one or more search strategies for building constrained Bayesian Networks from data and converting them into probabilistic rules. Both global and local structures are built using different Bayesian Network generation algorithms, and the rule models generated from these networks are tested on public and private 'omic' datasets. We show that using a specific type of structure (Bayesian decision graphs) in tandem with a specific type of search method (parallel greedy) achieves statistically significantly higher overall performance than current state-of-the-art rule learning methods. Not only does the BRGF boost average performance on 'omic' biomedical data to a statistically significant degree, but it also provides the ability to incorporate prior information in a mathematically rigorous fashion for modeling purposes.
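The abstract contrasts frequentist rule certainty with rule probabilities that can incorporate prior information. The following is a minimal sketch of that contrast. It assumes a single rule whose antecedent matches `n` training samples, `k` of them in the predicted class, and a symmetric Dirichlet prior with pseudo-count `alpha` per class. The function names, the example rule, and its counts are hypothetical illustrations, not the BRGF's actual scoring.

```python
from fractions import Fraction


def frequentist_confidence(k: int, n: int) -> Fraction:
    """Classical rule confidence: fraction of matched samples in the predicted class."""
    if n == 0:
        raise ValueError("rule matches no samples")
    return Fraction(k, n)


def bayesian_rule_probability(k: int, n: int, num_classes: int = 2,
                              alpha: Fraction = Fraction(1)) -> Fraction:
    """Posterior mean of P(class | antecedent) under a symmetric Dirichlet(alpha) prior.

    With k of n matched samples in the predicted class and num_classes
    possible classes, the posterior mean is (k + alpha) / (n + num_classes * alpha).
    """
    return (k + alpha) / (n + num_classes * alpha)


# Hypothetical rule: IF protein_A = high AND mRNA_B = low THEN class = disease,
# matching 4 samples, all of which are diseased.
print(frequentist_confidence(4, 4))     # 1
print(bayesian_rule_probability(4, 4))  # 5/6
```

With only four matching samples, the frequentist confidence of 1 overstates certainty. The prior pulls the estimate toward 1/2, and a larger `alpha` encodes a stronger prior belief.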

    Analysis of Human Gut Metagenomes for the Prediction of Host Traits with Tree Ensemble Machine Learning Models

    The human gut microbiota is made up of a myriad of microorganisms, including not only bacteria but also archaea. Archaea are present at lower abundances, are technically more challenging to quantify, and are under-represented in databases, so they are often overlooked when describing the human gut microbiome. Nonetheless, the main archaeon in terms of prevalence and abundance is Methanobrevibacter smithii, family Methanobacteriaceae. It has been associated with various host phenotypes such as slow transit or dietary habits. Remarkably, the evidence on the association between M. smithii and body mass index (BMI) is contradictory: depending on the population studied, it is enriched in either lean or obese individuals. Reasonable hypotheses based on the metabolism of the archaeon support these conflicting findings. For instance, its slow replication time supports its association with slow transit. M. smithii and all members of the Methanobacteriaceae family are methanogens: their metabolism relies on the reduction of simple carbon molecules to methane. In the human gut, methanogenesis starts from bacterial fermentation products. In particular, H2 and CO2 are the primary substrates of M. smithii. Formate can also be used, but with a lower energy yield. By taking up fermentation products, M. smithii can boost specific fermentation pathways, consequently affecting the production of short-chain fatty acids (SCFA). These byproducts of bacterial fermentation are absorbed by the host, where they mediate host energy and inflammatory metabolism. Accordingly, the overall effect of M. smithii may depend on the fermentation potential of the gut microbiome, which is itself defined by the microbiome composition. Hence, M. smithii may influence its host by consuming fermentation products. Because we know so little about the interactions between M. smithii and fermenting bacteria, gaining knowledge of their diversity and specificity, and of the underlying mechanisms, would improve our understanding of methanogens’ role in the human gut.
This work aims to provide insights into the associations between M. smithii and gut bacteria. Because methanogens are fastidious to culture, I performed a meta-analysis of human gut metagenomes using machine learning models. To decipher the variable interactions captured by the models, I developed a tool for interpreting tree ensemble models. This new method allowed me to infer biologically relevant associations between the methanogen and components of the human gut environment. In particular, I found a clear association between M. smithii and an uncultured family of the Christensenellales order. I also found an association with members of the Oscillospirales order that are predicted to have a slow replication time and to be associated with slow transit. Furthermore, predictions from the model revealed a gradient in the relative abundances of a core group of taxa associated with the colonization of human guts by Methanobacteriaceae. This gradient generally followed microbiome composition types, i.e., enterotypes, that were previously correlated with human population traits. This suggests that associations between methanogens and phenotypes known to be associated with certain enterotypes may be spurious. BMI, for example, is correlated with the ETB enterotype. I then further explored the association between M. smithii and members of the Christensenellales order. For this, I compared co-cultures of M. smithii with Christensenella minuta, a human gut isolate of the Christensenellaceae family, and with Bacteroides thetaiotaomicron, a common H2 producer from the human gut. The results demonstrated syntrophy via H2 transfer between Christensenellaceae and the methanogen, accompanied by a switch in SCFA production. Altogether, my findings complement current knowledge on interactions between the human gut methanogen M. smithii and fermenting bacteria. They support the hypothesis that M. smithii preferentially interacts with specific H2 producers in the human gut, e.g., members of the Christensenellales order, as well as with a core group of bacteria favoring its colonization of the gut environment. Syntrophy may underlie the identified associations, with potential effects on bacterial fermentation. In addition, my method for interpreting machine learning models applies to all sorts of problems studied with tree ensemble models. Thus, its potential to help understand complex systems is not limited to the microbiome field, and it will hopefully prove useful to other researchers in the future.
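The abstract does not describe how the interpretation tool works. As a rough, generic illustration of how variable interactions can be read off a tree ensemble, the sketch below counts how often pairs of features are split on along the same root-to-leaf path across all trees. Each leaf adds one count for every feature pair on its path. The nested-tuple tree encoding and the use of taxon names as features are hypothetical, and this path co-occurrence heuristic is a stand-in, not the author's method.

```python
from collections import Counter
from itertools import combinations
from typing import Tuple, Union

# Hypothetical encoding: an internal node is (feature_name, left_subtree, right_subtree);
# a leaf is any non-tuple value (e.g. a predicted relative abundance).
Tree = Union[Tuple[str, "Tree", "Tree"], float]


def path_pair_counts(trees) -> Counter:
    """Count unordered feature pairs that co-occur on root-to-leaf paths."""
    counts: Counter = Counter()

    def walk(node, path: Tuple[str, ...]) -> None:
        if not isinstance(node, tuple):  # leaf: record every feature pair on this path
            for a, b in combinations(sorted(set(path)), 2):
                counts[(a, b)] += 1
            return
        feature, left, right = node
        walk(left, path + (feature,))
        walk(right, path + (feature,))

    for tree in trees:
        walk(tree, ())
    return counts


# Two toy trees whose split features are illustrative taxon names.
forest = [
    ("Christensenellales", ("Oscillospirales", 0.1, 0.7), 0.9),
    ("Oscillospirales", 0.2, ("Christensenellales", 0.4, 0.8)),
]
print(path_pair_counts(forest).most_common(1))
# [(('Christensenellales', 'Oscillospirales'), 4)]
```

Real implementations often weight such counts by split gain or sample coverage, which this sketch omits for brevity.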

    Analytical Techniques for the Improvement of Mass Spectrometry Protein Profiling

    Bioinformatics is rapidly advancing through the "post-genomic" era that followed the sequencing of the human genome. Several subdivisions of bioinformatics have developed in preparation for studying the inner workings of genes, proteins and even smaller biological elements. The subdivision of proteomics, which concerns the structure and function of proteins, has been aided by mass spectrometry as a data source. Biofluid or tissue samples are rapidly assayed for their protein composition. The resulting mass spectra are analyzed using machine learning techniques to discover reliable patterns that discriminate samples from two populations, for example, healthy versus diseased, or treatment responders versus non-responders. However, this data source is imperfect and faces several challenges. These include unwanted variability arising from the data collection process, the difficulty of obtaining a robust discriminative model that generalizes well to future data, and the need to validate a predictive pattern statistically and biologically. This thesis presents several techniques that attempt to deal intelligently with the problems facing each stage of the analytical process. First, an automatic preprocessing method selection system is demonstrated. This system learns from data and selects the combination of preprocessing methods that is most appropriate for the task at hand, which reduces the noise affecting potential predictive patterns. Our results suggest that this method can help adapt to data from different technologies, improving downstream predictive performance. Next, the issues of feature selection and predictive modeling are revisited with respect to the unique challenges posed by proteomic profile data. Approaches to model selection through kernel learning are also investigated. Key insights are obtained for designing the feature selection and predictive modeling portion of the analytical framework. Finally, methods for interpreting the results of predictive modeling are demonstrated.
These methods give the user assurance of several desirable properties. They validate the strength of a predictive model, confirm a reproducible signal across multiple data-generation sessions, and establish the generalizability of predictive models to future data. A method for labeling profile features with biological identities is also presented, which aids in the interpretation of the data. Overall, these novel techniques give the protein profiling community additional tools to improve the predictive capability of the technology.
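The automatic preprocessing selection step can be pictured as a search over combinations of candidate methods, scored by downstream performance. The following is a minimal sketch. It assumes that each preprocessing stage (for example, baseline correction, normalization, or smoothing) offers a few named options and that a user-supplied `evaluate` function returns a validation score for a whole pipeline. The stage names, options, and toy scorer are hypothetical. Exhaustive search stands in here for the learned selector the thesis describes.

```python
from itertools import product
from typing import Callable, Dict, List, Sequence, Tuple


def select_preprocessing(
    stages: Dict[str, Sequence[str]],
    evaluate: Callable[[Dict[str, str]], float],
) -> Tuple[Dict[str, str], float]:
    """Return the combination of per-stage options with the highest score.

    stages maps a stage name to its candidate methods; evaluate scores one
    full pipeline (for example, cross-validated classification accuracy).
    """
    names: List[str] = list(stages)
    best_pipeline: Dict[str, str] = {}
    best_score = float("-inf")
    for choice in product(*(stages[name] for name in names)):
        pipeline = dict(zip(names, choice))
        score = evaluate(pipeline)
        if score > best_score:  # strict ">" keeps the first pipeline among ties
            best_pipeline, best_score = pipeline, score
    return best_pipeline, best_score


# Hypothetical stages and a toy scorer standing in for cross-validated accuracy.
stages = {
    "baseline": ["none", "rolling_min"],
    "normalize": ["none", "total_ion_current"],
    "smooth": ["none", "moving_average"],
}
toy_gain = {"rolling_min": 0.05, "total_ion_current": 0.10, "moving_average": 0.02}
best, score = select_preprocessing(
    stages, lambda p: 0.7 + sum(toy_gain.get(m, 0.0) for m in p.values())
)
print(best, round(score, 2))
```

The number of combinations grows multiplicatively with the number of stages and options, so a system that learns from data which combination to pick becomes attractive once exhaustive scoring is too expensive.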

    Greedy feature selection for glycan chromatography data with the generalized Dirichlet distribution

    Background: Glycoproteins are involved in a diverse range of biochemical and biological processes. Changes in protein glycosylation are believed to occur in many diseases, particularly during cancer initiation and progression. The identification of biomarkers for human disease states is becoming increasingly important, as early detection is key to improving survival and recovery rates. To this end, the serum glycome has been proposed as a potential source of biomarkers for different types of cancers. High-throughput hydrophilic interaction liquid chromatography (HILIC) technology for glycan analysis allows detailed quantification of the glycan content in human serum. However, the experimental data from this analysis are compositional by nature. Compositional data are subject to a constant-sum constraint, which restricts the sample space to a simplex. Statistical analysis of glycan chromatography datasets should therefore account for their unusual mathematical properties. As the volume of glycan HILIC data being produced increases, there is a considerable need for a framework to support appropriate statistical analysis. Proposed here is a methodology for feature selection in compositional data. The principal objective is to provide a template for the analysis of glycan chromatography data that may be used to identify potential glycan biomarkers. Results: A greedy search algorithm, based on the generalized Dirichlet distribution, is carried out over the feature space to search for the set of "grouping variables" that best discriminate between known group structures in the data, modelling the compositional variables using beta distributions. The algorithm is applied to two glycan chromatography datasets. Statistical classification methods are used to test the ability of the selected features to differentiate between known groups in the data. Two well-known methods are used for comparison: correlation-based feature selection (CFS) and recursive partitioning (rpart).
CFS is a feature selection method, while recursive partitioning is a tree-learning algorithm that has been used for feature selection in the past. Conclusions: The proposed feature selection method performs well on both glycan chromatography datasets. It is computationally slower, but it results in a lower misclassification rate and a higher sensitivity rate than both correlation-based feature selection and the classification tree method.
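A greedy search of this kind adds one variable at a time and keeps whichever addition most improves a discrimination criterion. The following is a minimal sketch of that forward-selection loop. It first closes each sample to sum to 1, so that it lies on the simplex. In place of the paper's generalized Dirichlet/beta model criterion, it plainly substitutes a leave-one-out nearest-centroid accuracy. The data, helper names, and stopping rule are hypothetical.

```python
from typing import List, Sequence


def close(row: Sequence[float]) -> List[float]:
    """Closure: rescale a composition so its parts sum to 1 (a point on the simplex)."""
    total = sum(row)
    return [x / total for x in row]


def loo_centroid_accuracy(X, y, features) -> float:
    """Leave-one-out nearest-centroid accuracy using only the chosen features.

    Substitute criterion for illustration; each class needs at least two samples.
    """
    correct = 0
    for i in range(len(X)):
        centroids = {}
        for label in set(y):
            rows = [X[j] for j in range(len(X)) if j != i and y[j] == label]
            centroids[label] = [sum(r[f] for r in rows) / len(rows) for f in features]
        dist = {
            label: sum((X[i][f] - c[k]) ** 2 for k, f in enumerate(features))
            for label, c in centroids.items()
        }
        correct += min(dist, key=dist.get) == y[i]
    return correct / len(X)


def greedy_forward_selection(X, y, max_features: int) -> List[int]:
    """Add the feature that most improves the score; stop when nothing improves it."""
    selected: List[int] = []
    best = float("-inf")
    while len(selected) < max_features:
        candidates = [f for f in range(len(X[0])) if f not in selected]
        if not candidates:
            break
        # Ties in score go to the higher feature index.
        score, f = max((loo_centroid_accuracy(X, y, selected + [c]), c) for c in candidates)
        if score <= best:
            break
        selected.append(f)
        best = score
    return selected


raw = [
    [12, 4, 2, 2], [12, 2, 4, 2], [12, 2, 2, 4],  # hypothetical group A peak areas
    [4, 4, 6, 6], [6, 6, 4, 4], [4, 6, 6, 4],     # hypothetical group B peak areas
]
X = [close(r) for r in raw]
y = ["A", "A", "A", "B", "B", "B"]
print(greedy_forward_selection(X, y, max_features=2))  # [0]
```

Swapping in a likelihood-based criterion, such as a model fitted with beta marginals, only changes the scoring function. The add-one-feature-then-rescore loop stays the same, and it accounts for why greedy search is slower than single-pass filters such as CFS.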

    Role of adipose tissue in the pathogenesis and treatment of metabolic syndrome

    © Springer International Publishing Switzerland 2014. Adipocytes are highly specialized cells that play a major role in energy homeostasis in vertebrate organisms. Excess adipocyte size or number is a hallmark of obesity, which is currently a global epidemic. Obesity is not only the primary disease of fat cells but also a major risk factor for the development of Type 2 diabetes, cardiovascular disease, hypertension, and metabolic syndrome (MetS). Today, adipocytes and adipose tissue are no longer considered passive participants in metabolic pathways. In addition to storing lipid, adipocytes are highly insulin-sensitive cells with important endocrine functions. Altering any one of these functions of fat cells can result in a metabolic disease state, and dysregulation of adipose tissue can contribute profoundly to MetS. For example, adiponectin is a fat-specific hormone with cardioprotective and anti-diabetic properties. Inhibition of adiponectin expression and secretion is associated with several risk factors for MetS. For this reason, and for several others documented in this chapter, we propose that adipose tissue be considered a viable target for a variety of treatment approaches to combat MetS.

    Identifying Relevant Evidence for Systematic Reviews and Review Updates

    Systematic reviews identify, assess and synthesise the evidence available to answer complex research questions. They are essential in healthcare, where the volume of evidence in scientific research publications is vast and cannot feasibly be identified or analysed by individual clinicians or decision makers. However, creating a systematic review is time-consuming and expensive. The pace of scientific publication in medicine and related fields also means that evidence bases are continually changing and review conclusions can quickly become out of date. Developing methods to support the creation and updating of reviews is therefore essential to reduce the workload required and thereby keep reviews up to date. This research aims to support systematic reviews, and thus improve healthcare, through natural language processing and information retrieval techniques. More specifically, this thesis aims to support the identification of relevant evidence for systematic reviews and review updates so as to reduce the workload required from researchers. It proposes methods to improve the ranking of studies for systematic reviews. In addition, it describes a dataset of systematic review updates in the field of medicine, created from 25 Cochrane reviews. Moreover, it develops an algorithm that automatically refines the Boolean query to improve the identification of relevant studies for review updates. The research demonstrates that automating the identification of relevant evidence can reduce the workload of conducting and updating systematic reviews.
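The workload argument can be made concrete with a ranked screening list. If reviewers screen studies in ranked order, the effort needed to reach a target recall is the rank position at which that fraction of the relevant studies has been found. The following is a minimal sketch, assuming binary relevance labels listed in ranked order. The function name, the example ranking, and the recall targets are illustrative choices, not the thesis's evaluation protocol.

```python
import math
from typing import Sequence


def screening_effort(ranked_relevance: Sequence[bool], target_recall: float = 1.0) -> int:
    """Number of ranked studies to screen before reaching target_recall.

    ranked_relevance[i] is True if the i-th ranked study is relevant.
    """
    total_relevant = sum(ranked_relevance)
    if total_relevant == 0:
        raise ValueError("no relevant studies in the ranking")
    needed = math.ceil(target_recall * total_relevant)
    found = 0
    for position, relevant in enumerate(ranked_relevance, start=1):
        found += relevant
        if found >= needed:
            return position
    return len(ranked_relevance)  # only reached if target_recall > 1


# Hypothetical ranking of 8 candidate studies, 3 of which are relevant.
ranking = [True, False, True, False, False, True, False, False]
print(screening_effort(ranking, target_recall=1.0))  # 6
print(screening_effort(ranking, target_recall=0.5))  # 3
```

A better ranking moves relevant studies earlier, so fewer studies need screening. Here 6 of 8 must be screened to find all three relevant studies, whereas an ordering that placed a relevant study last would require screening all 8.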

    Ultrasensitive detection of Toxocara canis excretory-secretory antigens by a nanobody electrochemical magnetosensor assay

    Human Toxocariasis (HT) is a zoonotic disease caused by the migration of the larval stage of the roundworm Toxocara canis in the human host. Despite being the most cosmopolitan helminthiasis worldwide, its diagnosis remains elusive. Currently, the only strategy to diagnose HT is the detection of specific IgG immunoglobulins against the Toxocara Excretory-Secretory Antigens (TES), combined with clinical and epidemiological criteria. Cross-reactivity with other parasites and the inability to distinguish between past and active infections are the main limitations of this approach. Here, we present a novel, sensitive and specific strategy to detect and quantify TES, aiming to identify active cases of HT. High specificity is achieved by using nanobodies (Nbs), which are recombinant single variable domain antibodies obtained from camelids. Owing to their small molecular size (15 kDa), nanobodies can recognize hidden epitopes that are not accessible to conventional antibodies. High sensitivity is attained through the design of an electrochemical magnetosensor with an amperometric readout, in which all components of the assay are mixed in a single step. With this strategy, a 10-fold higher sensitivity than a conventional sandwich ELISA was achieved. The assay reached limits of detection of 2 pg/ml in PBST20 0.05% and 15 pg/ml in serum, each spiked with TES. These limits of detection are sufficient to detect clinically relevant toxocaral infections. Furthermore, our nanobodies showed no cross-reactivity with antigens from Ascaris lumbricoides or Ascaris suum. To our knowledge, this is the most sensitive method to detect and quantify TES so far, and it has great potential to significantly improve the diagnosis of HT. Moreover, the characteristics of our electrochemical assay are promising for the development of point-of-care diagnostic systems using nanobodies as a versatile and innovative alternative to antibodies.
The next step will be to validate the assay in clinical and epidemiological contexts.