61 research outputs found

    Bayesian clustering of curves and the search of the partition space

    This thesis is concerned with the study of a Bayesian clustering algorithm, proposed by Heard et al. (2006), used successfully for microarray experiments over time. It focuses not only on the development of new ways of setting hyperparameters so that inferences both reflect the scientific needs and contribute to the inferential stability of the search, but also on the design of new fast algorithms for the search over the partition space. First we use the explicit forms of the associated Bayes factors to demonstrate that such methods can be unstable under common settings of the associated hyperparameters. We then prove that the regions of instability can be removed by setting the hyperparameters in an unconventional way. Moreover, we demonstrate that MAP (maximum a posteriori) search is satisfied when a utility function is defined according to the scientific interest of the clusters. We then focus on the search over the partition space. In model-based clustering a comprehensive search for the highest scoring partition is usually impossible, due to the huge number of partitions of even a moderately sized dataset. We propose two methods for the partition search. One method encodes the clustering as a weighted MAX-SAT problem, while the other views clusterings as elements of the lattice of partitions. Finally, this thesis includes the full analysis of two microarray experiments for identifying circadian genes.
    EThOS - Electronic Theses Online Service; University of Warwick, Dept of Statistics (UoW); Engineering and Physical Sciences Research Council (Great Britain) (EPSRC); GB, United Kingdom
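
    The abstract's remark about the huge number of partitions can be made concrete with a small illustration. The number of ways to partition n items is the Bell number B(n). The sketch below is not taken from the thesis; the function name bell_numbers is hypothetical, and it uses the standard Bell-triangle recurrence only to show how quickly the partition space grows.

```python
def bell_numbers(n_max):
    """Return [B(0), ..., B(n_max)], the Bell numbers, via the Bell triangle.

    B(n) counts the set partitions of n labelled items, i.e. the number of
    candidate clusterings an exhaustive search would have to score.
    """
    bells = [1]   # B(0) = 1: the empty set has exactly one partition
    row = [1]     # first row of the Bell triangle
    for _ in range(n_max):
        # Each new row starts with the last entry of the previous row;
        # every further entry adds the entry above-left in the previous row.
        new_row = [row[-1]]
        for value in row:
            new_row.append(new_row[-1] + value)
        row = new_row
        bells.append(row[0])
    return bells


if __name__ == "__main__":
    counts = bell_numbers(20)
    for n in (5, 10, 20):
        print(f"{n} items -> {counts[n]} partitions")
```

    Even 20 items already give more than 5 x 10^13 partitions, which is why heuristic or structured searches such as those proposed in the thesis are needed in place of enumeration.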

    First CLADAG data mining prize : data mining for longitudinal data with different marketing campaigns

    The CLAssification and Data Analysis Group (CLADAG) of the Italian Statistical Society recently organised a competition, the 'Young Researcher Data Mining Prize' sponsored by the SAS Institute. This paper was the winning entry and in it we detail our approach to the problem proposed and our results. The main methods used are linear regression, mixture models, Bayesian autoregressive and Bayesian dynamic models

    Beyond Conjugacy for Chain Event Graph Model Selection

    Chain event graphs are a family of probabilistic graphical models that generalise Bayesian networks and have been successfully applied to a wide range of domains. Unlike Bayesian networks, these models can encode context-specific conditional independencies as well as asymmetric developments within the evolution of a process. More recently, new model classes belonging to the chain event graph family have been developed for modelling time-to-event data to study the temporal dynamics of a process. However, existing model selection algorithms for chain event graphs and their variants rely on all parameters having conjugate priors. This is unrealistic for many real-world applications. In this paper, we propose a mixture modelling approach to model selection in chain event graphs that does not rely on conjugacy. Moreover, we show that this methodology is more amenable to being robustly scaled than the existing model selection algorithms used for this family. We demonstrate our techniques on simulated datasets.

    Orchestrated transcription of biological processes in the marine picoeukaryote Ostreococcus exposed to light/dark cycles

    Background: Picoeukaryotes represent an important, yet poorly characterized component of marine phytoplankton. The recent genome availability for two species of Ostreococcus and Micromonas has led to the emergence of picophytoplankton comparative genomics. Sequencing has revealed many unexpected features about genome structure and led to several hypotheses on Ostreococcus biology and physiology. Despite the accumulation of genomic data, little is known about gene expression in eukaryotic picophytoplankton. Results: We have conducted a genome-wide analysis of gene expression in Ostreococcus tauri cells exposed to light/dark cycles (L/D). A Bayesian Fourier Clustering method was implemented to cluster rhythmic genes according to their expression waveform. In a single L/D condition nearly all expressed genes displayed rhythmic patterns of expression. Clusters of genes were associated with the main biological processes such as transcription in the nucleus and the organelles, photosynthesis, DNA replication and mitosis. Conclusions: Light/dark time-dependent transcription of the genes involved in the main steps leading to protein synthesis (basic transcription machinery, ribosome biogenesis, translation and amino acid synthesis) was observed, to an unprecedented extent in eukaryotes, suggesting a major input of transcriptional regulation in Ostreococcus. We propose that the diurnal co-regulation of genes involved in photoprotection, defence against oxidative stress and DNA repair might be an efficient mechanism that protects cells against photo-damage, thereby contributing to the ability of O. tauri to grow under a wide range of light intensities.

    Bayesian Graphs of Intelligent Causation

    Probabilistic graphical Bayesian models of causation have continued to impact on strategic analyses designed to help evaluate the efficacy of different interventions on systems. However, the standard causal algebras upon which these inferences are based typically assume that the intervened population does not react intelligently to frustrate an intervention. In an adversarial setting this is rarely an appropriate assumption. In this paper, we extend an established Bayesian methodology called Adversarial Risk Analysis to apply it to settings that can legitimately be designated as causal in this graphical sense. To embed this technology we first need to generalize the concept of a causal graph. We then proceed to demonstrate how the predictable intelligent reactions of adversaries to circumvent an intervention when they hear about it can be systematically modelled within such graphical frameworks, importing these recent developments from Bayesian game theory. The new methodologies and supporting protocols are illustrated through applications associated with an adversary attempting to infiltrate a friendly state.

    Incidence of falls among adults with cerebral palsy: a cohort study using primary care data

    Peer Reviewed
    https://deepblue.lib.umich.edu/bitstream/2027.42/154463/1/dmcn14444_am.pdf
    https://deepblue.lib.umich.edu/bitstream/2027.42/154463/2/dmcn14444-sup-0001-AppendixS1.pdf
    https://deepblue.lib.umich.edu/bitstream/2027.42/154463/3/dmcn14444.pd

    Modelling collinear and spatially correlated data

    In this work we present a statistical approach to distinguish and interpret the complex relationship between several predictors and a response variable at the small area level, in the presence of i) high correlation between the predictors and ii) spatial correlation for the response. Covariates which are highly correlated create collinearity problems when used in a standard multiple regression model. Many methods have been proposed in the literature to address this issue. A very common approach is to create an index which aggregates all the highly correlated variables of interest. For example, it is well known that there is a relationship between social deprivation measured through the Index of Multiple Deprivation (IMD) and air pollution; this index is then used as a confounder in assessing the effect of air pollution on health outcomes (e.g. respiratory hospital admissions or mortality). However, it would be more informative to look specifically at each domain of the IMD and at its relationship with air pollution to better understand its role as a confounder in the epidemiological analyses. In this paper we illustrate how the complex relationships between the domains of IMD and air pollution can be deconstructed and analysed using profile regression, a Bayesian non-parametric model for clustering responses and covariates simultaneously. Moreover, we include an intrinsic spatial conditional autoregressive (ICAR) term to account for the spatial correlation of the response variable.

    Circadian clock components control daily growth activities by modulating cytokinin levels and cell division-associated gene expression in Populus trees

    Trees are carbon dioxide sinks and major producers of terrestrial biomass with distinct seasonal growth patterns. Circadian clocks enable the coordination of physiological and biochemical temporal activities, optimally regulating multiple traits including growth. To dissect the clock's role in growth, we analysed Populus tremula x P. tremuloides trees with impaired clock function due to down-regulation of central clock components. late elongated hypocotyl (lhy-10) trees, in which expression of LHY1 and LHY2 is reduced by RNAi, have a short free-running period and show disrupted temporal regulation of gene expression and reduced growth, producing 30-40% less biomass than wild-type trees. Genes important in growth regulation were expressed with an earlier phase in lhy-10, and CYCLIN D3 expression was misaligned and arrhythmic. Levels of cytokinins were lower in lhy-10 trees, which also showed a change in the time of peak expression of genes associated with cell division and growth. However, auxin levels were not altered in lhy-10 trees, and the size of the lignification zone in the stem showed a relative increase. The reduced growth rate and anatomical features of lhy-10 trees were mainly caused by misregulation of cell division, which may have resulted from impaired clock function