
    Machine learning approach to reconstructing signalling pathways and interaction networks in biology

    In this doctoral thesis, I present my research into applying machine learning techniques for reconstructing species interaction networks in ecology, reconstructing molecular signalling pathways and gene regulatory networks in systems biology, and inferring parameters in ordinary differential equation (ODE) models of signalling pathways. Together, the methods I have developed for these applications demonstrate the usefulness of machine learning for reconstructing networks and inferring network parameters from data. The thesis consists of three parts. The first part is a detailed comparison of applying static Bayesian networks, relevance vector machines, and linear regression with L1 regularisation (LASSO) to the problem of reconstructing species interaction networks from species absence/presence data in ecology (Faisal et al., 2010). I describe how I generated data from a stochastic population model to test the different methods and how the simulation study led us to introduce spatial autocorrelation as an important covariate. I also show how we used the results of the simulation study to apply the methods to presence/absence data of bird species from the European Bird Atlas. The second part of the thesis describes a time-varying, non-homogeneous dynamic Bayesian network model for reconstructing signalling pathways and gene regulatory networks, based on Lèbre et al. (2010). I show how my work has extended this model to incorporate different types of hierarchical Bayesian information sharing priors and different coupling strategies among nodes in the network. The introduction of these priors reduces the inference uncertainty by putting a penalty on the number of structure changes among network segments separated by inferred changepoints (Dondelinger et al., 2010; Husmeier et al., 2010; Dondelinger et al., 2012b).
Using both synthetic and real data, I demonstrate that information sharing priors lead to better reconstruction accuracy for the underlying gene regulatory networks, and I compare the different priors and coupling strategies. I show the results of applying the model to gene expression datasets from Drosophila melanogaster and Arabidopsis thaliana, as well as to a synthetic biology gene expression dataset from Saccharomyces cerevisiae. In each case, the underlying network is time-varying: for Drosophila melanogaster, as a consequence of measuring gene expression during different developmental stages; for Arabidopsis thaliana, as a consequence of measuring the expression of circadian clock genes under different conditions; and for the synthetic biology dataset, as a consequence of changing the growth environment. I show that, in addition to inferring sensible network structures, the model successfully predicts the locations of changepoints. The third and final part of this thesis is concerned with parameter inference in ODE models of biological systems. This problem is of interest to systems biology researchers, as kinetic reaction parameters often cannot be measured at all, or can be estimated only imprecisely from experimental data. Because the ODE system must be solved numerically after each parameter adaptation, this is a computationally challenging problem. Gradient matching techniques circumvent this cost by directly fitting the derivatives given by the ODE to the slope of an interpolant. I present an inference procedure for a model based on nonparametric Bayesian statistics with Gaussian processes, following Calderhead et al. (2008). I show that the new inference procedure improves on the original formulation in Calderhead et al. (2008), and I present the results of applying it to ODE models of predator-prey interactions, a circadian clock gene, a signal transduction pathway, and the JAK/STAT pathway.
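
The gradient matching idea described above can be illustrated with a minimal sketch. It assumes a hypothetical Lotka-Volterra predator-prey system (dx/dt = a x - b x y, dy/dt = -c y + d x y) with made-up parameters, noise-free synthetic data, and a Gaussian process with a fixed squared-exponential kernel whose posterior mean serves as the interpolant. It is not the inference procedure of the thesis or of Calderhead et al. (2008), which place the kernel hyperparameters and ODE parameters in a full Bayesian model. Here the ODE right-hand side happens to be linear in the parameters, so matching the interpolant's slope reduces to ordinary least squares and no ODE solve is needed inside the fit. The helper names, lengthscale, and trimming margin are illustrative choices.

```python
import numpy as np

# Hypothetical test system: Lotka-Volterra predator-prey ODEs
#   dx/dt = a*x - b*x*y,   dy/dt = -c*y + d*x*y
def lotka_volterra(state, theta):
    x, y = state
    a, b, c, d = theta
    return np.array([a * x - b * x * y, -c * y + d * x * y])

def simulate(theta, x0, t_end, n_obs, substeps=50):
    """Noise-free synthetic observations from a fine-step RK4 solver."""
    dt = t_end / ((n_obs - 1) * substeps)
    state = np.array(x0, dtype=float)
    samples = [state.copy()]
    for step in range(1, (n_obs - 1) * substeps + 1):
        k1 = lotka_volterra(state, theta)
        k2 = lotka_volterra(state + 0.5 * dt * k1, theta)
        k3 = lotka_volterra(state + 0.5 * dt * k2, theta)
        k4 = lotka_volterra(state + dt * k3, theta)
        state = state + dt / 6.0 * (k1 + 2 * k2 + 2 * k3 + k4)
        if step % substeps == 0:
            samples.append(state.copy())
    return np.linspace(0.0, t_end, n_obs), np.array(samples)

def gp_interpolant(t_obs, y_obs, t_eval, lengthscale=0.5, noise=1e-6):
    """GP posterior mean and its time derivative (squared-exponential kernel)."""
    diff_obs = t_obs[:, None] - t_obs[None, :]
    K = np.exp(-0.5 * diff_obs**2 / lengthscale**2) + noise * np.eye(len(t_obs))
    offset = y_obs.mean()  # constant prior mean
    alpha = np.linalg.solve(K, y_obs - offset)
    diff_eval = t_eval[:, None] - t_obs[None, :]
    Ks = np.exp(-0.5 * diff_eval**2 / lengthscale**2)
    mean = offset + Ks @ alpha
    slope = (-diff_eval / lengthscale**2 * Ks) @ alpha  # analytic d/dt of the kernel
    return mean, slope

def gradient_matching(t_obs, data, margin=1.0):
    """Estimate theta by matching ODE right-hand sides to interpolant slopes."""
    # Skip points near the ends, where GP slopes are least reliable.
    keep = (t_obs >= t_obs[0] + margin) & (t_obs <= t_obs[-1] - margin)
    t_eval = t_obs[keep]
    x, dx = gp_interpolant(t_obs, data[:, 0], t_eval)
    y, dy = gp_interpolant(t_obs, data[:, 1], t_eval)
    # The Lotka-Volterra RHS is linear in theta, so matching is least squares.
    ab, *_ = np.linalg.lstsq(np.column_stack([x, -x * y]), dx, rcond=None)
    cd, *_ = np.linalg.lstsq(np.column_stack([-y, x * y]), dy, rcond=None)
    return np.concatenate([ab, cd])

theta_true = (1.0, 0.5, 1.0, 0.5)
t_obs, data = simulate(theta_true, x0=(1.0, 1.0), t_end=10.0, n_obs=101)
theta_hat = gradient_matching(t_obs, data)
print(np.round(theta_hat, 3))
```

With noisy data, the noise term and lengthscale would have to be inferred rather than fixed, which is where the Bayesian treatment matters.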

    Bayesian analytical approaches for metabolomics: a novel method for molecular structure-informed metabolite interaction modeling, a novel diagnostic model for differentiating myocardial infarction type, and approaches for compound identification given mass spectrometry data.

    Metabolomics, the study of small molecules in biological systems, has enjoyed great success in enabling researchers to examine disease-associated metabolic dysregulation and has been used to discover biomarkers of disease and phenotypic states. Despite recent technological advances in the analytical platforms used in metabolomics and the proliferation of tools for analyzing metabolomics data, significant challenges in metabolomics data analysis remain. In this dissertation, we present three of these challenges and a Bayesian methodological solution for each. In the first part, we develop a new methodology to serve as a basis for making higher-order inferences in metabolomics, which we define as testing hypotheses that are more complex than single-metabolite hypothesis tests. This methodology uses informative priors, generated by analyzing molecular structure similarity, to estimate metabolite interactomes (or probabilistic models) that are comprehensive and specific to the organism, sample medium, and condition, and that can serve as reference models for studying perturbations in metabolic systems. After discussing the development of our methodology, we evaluate its performance in simulation studies and use it to estimate a plasma metabolite interactome for stable heart disease. This interactome may serve as a reference model for evaluating systems-level changes that occur with acute disease events such as myocardial infarction (MI) or unstable angina. In the second part of this work, we address the challenge of developing diagnostic classification models that use metabolite abundances without overfitting relatively small sample sizes, especially given the high dimensionality of metabolite data acquired on platforms such as liquid chromatography-mass spectrometry.
We use a Bayesian methodology to estimate a multinomial logistic regression classifier for detecting acute myocardial infarction and discriminating its subtype, using metabolite abundance data quantified from blood plasma. As heart disease is the leading cause of global mortality, a non-invasive, blood-based diagnostic test that could differentiate between MI types at the time of the event would have great utility. In the final part of this dissertation, we review Bayesian approaches for compound identification, which remains a challenging problem, in metabolomics experiments that use liquid chromatography-mass spectrometry.
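
A minimal sketch of a multinomial logistic regression classifier with independent Gaussian priors on the weights. It computes only the posterior mode (MAP estimate) by gradient descent, whereas a full Bayesian analysis, as described in the dissertation, would characterize the whole posterior (for example by MCMC). The prior acts as shrinkage, which is one way such models resist overfitting small, high-dimensional samples. The two-feature, three-class data are synthetic stand-ins for metabolite abundances, and all function names and settings (prior variance, learning rate, iteration count) are hypothetical.

```python
import numpy as np

def softmax(Z):
    Z = Z - Z.max(axis=1, keepdims=True)  # subtract row max for numerical stability
    E = np.exp(Z)
    return E / E.sum(axis=1, keepdims=True)

def fit_map_multinomial(X, y, n_classes, prior_var=1.0, lr=0.1, n_iter=2000):
    """MAP weights for multinomial logistic regression with N(0, prior_var) priors.

    The intercept row gets a flat prior; all other weights are shrunk toward zero.
    """
    n, p = X.shape
    Xb = np.column_stack([np.ones(n), X])
    W = np.zeros((p + 1, n_classes))
    Y = np.eye(n_classes)[y]  # one-hot labels
    for _ in range(n_iter):
        P = softmax(Xb @ W)
        penalty = W / prior_var  # gradient of the negative log prior
        penalty[0] = 0.0         # leave intercepts unpenalized
        grad = (Xb.T @ (P - Y) + penalty) / n  # negative log posterior gradient, scaled by 1/n
        W -= lr * grad
    return W

def predict(W, X):
    Xb = np.column_stack([np.ones(len(X)), X])
    return softmax(Xb @ W).argmax(axis=1)

# Hypothetical data: 3 classes, 2 features, 100 samples per class
rng = np.random.default_rng(1)
centers = np.array([[-2.0, 0.0], [2.0, 0.0], [0.0, 3.0]])
y = np.repeat(np.arange(3), 100)
X = centers[y] + rng.standard_normal((300, 2))
W = fit_map_multinomial(X, y, n_classes=3)
accuracy = (predict(W, X) == y).mean()
print(f"training accuracy: {accuracy:.2f}")
```

Smaller values of `prior_var` shrink the weights harder, trading training fit for stability when features outnumber samples.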

    Applications of Granger causality to biological data

    In computational biology, one often faces the problem of deriving the causal relationships among different elements, such as genes, proteins, metabolites and neurons, from multi-dimensional temporal data. The literature offers several well-established reverse-engineering approaches for exploring causal relationships in a dynamic network, such as ordinary differential equations (ODE), Bayesian networks, information theory and Granger causality. When these four approaches are applied to the same problem, a key issue is choosing which one to use on the data, particularly when they give rise to contradictory results. In this thesis, I provide an answer through a systematic and computationally intensive comparison of two common approaches: dynamic Bayesian network inference and Granger causality. The comparison was carried out on both synthesized and experimental data. It is concluded that dynamic Bayesian network inference performs better than the Granger causality approach when the time series are short; otherwise, the Granger causality approach is better. Since the Granger causality approach is able to detect weak interactions when the time series are long enough, I then focused on applying it to real experimental data, in both the time and frequency domains and in local and global networks. For a small gene network, Granger causality outperformed all three of the other approaches mentioned above. A global protein network of 812 proteins was reconstructed using a novel approach. The results fitted well with known experimental findings and predicted many experimentally testable results. In addition to interactions in the time domain, interactions in the frequency domain were also recovered. Beyond gene and protein data, the Granger causality approach was also applied to local field potential (LFP) data.
Here we have combined multiarray electrophysiological recordings of local field potentials in the right inferior temporal (rIT), left IT (lIT) and right anterior cingulate (rAC) cortices of sheep with Granger causality to investigate how anaesthesia alters processing during the resting state and during exposure to pictures of faces. Results from both the time- and frequency-domain analyses show that loss of consciousness during anaesthesia is associated with a reduction or disruption of feedforward, open-loop cortico-cortical connections and a corresponding increase in shorter-distance, closed-loop ones.
EThOS (Electronic Theses Online Service); University of Warwick, Dept. of Computer Science, United Kingdom
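
A minimal sketch of the bivariate, time-domain Granger causality measure in Geweke's log-ratio form: x is said to Granger-cause y if adding past values of x to an autoregressive model of y reduces y's residual variance. The simulated system, the lag order p = 2, and the function names are assumptions for illustration. The thesis's own pipeline (model order selection, significance testing, frequency-domain and network-level analyses) is not reproduced here.

```python
import numpy as np

def lagged_design(series_list, p):
    """Intercept plus lags 1..p of every series, for targets at t = p..T-1."""
    T = len(series_list[0])
    columns = [np.ones(T - p)]
    for s in series_list:
        for k in range(1, p + 1):
            columns.append(s[p - k:T - k])  # value of s at time t - k
    return np.column_stack(columns)

def residual_variance(target, predictors, p):
    """Residual variance of an OLS autoregression of target on lagged predictors."""
    X = lagged_design(predictors, p)
    y = target[p:]
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return np.var(y - X @ beta)

def granger_index(source, target, p=2):
    """Log ratio of restricted (own lags) to full (own + source lags) residual variance."""
    restricted = residual_variance(target, [target], p)
    full = residual_variance(target, [target, source], p)
    return np.log(restricted / full)

# Hypothetical coupled system: x drives y with a one-step delay, not vice versa
rng = np.random.default_rng(0)
T = 2000
x = np.zeros(T)
y = np.zeros(T)
for t in range(1, T):
    x[t] = 0.5 * x[t - 1] + rng.standard_normal()
    y[t] = 0.3 * y[t - 1] + 0.8 * x[t - 1] + rng.standard_normal()

F_xy = granger_index(x, y)
F_yx = granger_index(y, x)
print(f"x -> y: {F_xy:.3f}, y -> x: {F_yx:.3f}")
```

With this coupling, the x-to-y index should be clearly positive while the y-to-x index stays close to zero; with short series the estimates become noisy, which is consistent with the comparison reported above.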

    Population Physiology, Demography, and Genetics of Side-Blotched Lizards (Uta stansburiana) Residing in Urban and Natural Environments

    Wildlife populations across the globe are poised to lose their natural habitat to urbanization, yet there is limited information on how different species handle living in cities. Animals in urban environments are often susceptible to novel stressors, which can threaten their individual health and population viability. The physiological characteristics of animals, such as those related to metabolic hormones, oxidative stress, and immunity, are expected to be important for survival in this context. If so, animals persisting in urban areas may demonstrate physiological differences from their natural counterparts, perhaps due to evolutionary change. These potential outcomes have been documented in birds and mammals, but other taxonomic groups such as reptiles have been studied far less. For this dissertation, lizards were sampled in urban and natural areas for six years to (i) compare annual population survival, (ii) identify physiological traits important for survival, (iii) map the genetic basis of these traits, and (iv) test whether and how the physiological traits are evolving in urban environments. Lizard survival was lower in urban environments and related to differences in immunity. Each physiological trait had a low-to-moderate heritable basis linked to few genetic loci with measurable effects. Population-level genetic comparisons revealed that lizards in urban areas are differentiated from those residing in natural areas, though shared genetic variation was present among populations, along with comparable levels of genetic diversity. Differential selective pressures on the traits and their associated genetic loci were not detected, but indicators of genetic drift were evident across the landscape. Altogether, these findings shed light on the interconnectedness of population demography, physiology, and genetics for reptiles residing in urban environments.

    CLADAG 2021 BOOK OF ABSTRACTS AND SHORT PAPERS

    The book collects the short papers presented at the 13th Scientific Meeting of the Classification and Data Analysis Group (CLADAG) of the Italian Statistical Society (SIS). The meeting was organized by the Department of Statistics, Computer Science and Applications of the University of Florence, under the auspices of the Italian Statistical Society and the International Federation of Classification Societies (IFCS). CLADAG is a member of the IFCS, a federation of national, regional, and linguistically based classification societies. It is a non-profit, non-political scientific organization whose aims are to further classification research.

    Untangling hotel industry’s inefficiency: An SFA approach applied to a renowned Portuguese hotel chain

    The present paper explores the technical efficiency of four hotels of the Teixeira Duarte Group, a renowned Portuguese hotel chain. An efficiency ranking of these four hotel units, all located in Portugal, is established using Stochastic Frontier Analysis (SFA). This methodology makes it possible to discriminate between measurement error and systematic inefficiency in the estimation process, enabling investigation of the main causes of inefficiency. Several suggestions for improving efficiency are made for each hotel studied.
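
To illustrate how a stochastic frontier separates noise from inefficiency, the sketch below fits a normal/half-normal production frontier ln y = b0 + b1 ln x + v - u, where v is symmetric measurement error and u >= 0 is inefficiency. As a simplifying assumption, it uses the method-of-moments (corrected OLS) estimator, which infers the inefficiency scale sigma_u from the negative skewness of the OLS residuals, rather than the maximum likelihood estimation more commonly used in SFA; the paper's actual specification is not given here. Unit-level efficiency is then estimated with the Jondrow-Lovell-Materov-Schmidt (JLMS) conditional mean E[u | eps]. The data are simulated and all names are hypothetical.

```python
import math
import numpy as np

SQRT_2_OVER_PI = math.sqrt(2.0 / math.pi)

def normal_cdf(z):
    # erfc form avoids cancellation for large negative z
    return np.array([0.5 * math.erfc(-v / math.sqrt(2.0)) for v in np.atleast_1d(z)])

def normal_pdf(z):
    return np.exp(-0.5 * z**2) / math.sqrt(2.0 * math.pi)

def mols_frontier(X, log_output):
    """Method-of-moments (corrected OLS) normal/half-normal production frontier."""
    n = len(log_output)
    Xb = np.column_stack([np.ones(n), X])
    beta, *_ = np.linalg.lstsq(Xb, log_output, rcond=None)
    e = log_output - Xb @ beta  # OLS residuals (mean zero)
    m2, m3 = np.mean(e**2), np.mean(e**3)
    if m3 >= 0:
        raise ValueError("residuals are not negatively skewed; no inefficiency detected")
    # Third central moment of v - u equals -sigma_u^3 * sqrt(2/pi) * (4/pi - 1)
    sigma_u = (-m3 / (SQRT_2_OVER_PI * (4.0 / math.pi - 1.0))) ** (1.0 / 3.0)
    sigma_v2 = m2 - (1.0 - 2.0 / math.pi) * sigma_u**2
    if sigma_v2 <= 0:
        raise ValueError("implied noise variance is not positive")
    # Shift the intercept up by E[u] so the frontier envelops the data
    beta[0] += sigma_u * SQRT_2_OVER_PI
    eps = e - sigma_u * SQRT_2_OVER_PI  # composed error v - u
    # JLMS: u | eps is normal truncated at zero with these parameters
    sigma2 = sigma_u**2 + sigma_v2
    mu_star = -eps * sigma_u**2 / sigma2
    sigma_star = sigma_u * math.sqrt(sigma_v2 / sigma2)
    z = mu_star / sigma_star
    u_hat = mu_star + sigma_star * normal_pdf(z) / normal_cdf(z)
    return beta, sigma_u, math.sqrt(sigma_v2), np.exp(-u_hat)  # TE in (0, 1]

# Hypothetical Cobb-Douglas data in logs
rng = np.random.default_rng(2)
n = 5000
log_input = rng.uniform(0.0, 2.0, n)
u = np.abs(rng.normal(0.0, 0.4, n))  # half-normal inefficiency, sigma_u = 0.4
v = rng.normal(0.0, 0.2, n)          # symmetric noise, sigma_v = 0.2
log_output = 1.0 + 0.6 * log_input + v - u
beta, sigma_u, sigma_v, te = mols_frontier(log_input[:, None], log_output)
print(np.round(beta, 2), round(sigma_u, 2), round(sigma_v, 2), round(te.mean(), 2))
```

Ranking units by `te` gives an efficiency ranking of the kind described above; with only a handful of units, as in a four-hotel study, the moment estimates would be far noisier, and panel data or maximum likelihood would normally be preferred.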