
Bayesian clustering of curves and the search of the partition space

Abstract

This thesis studies a Bayesian clustering algorithm, proposed by Heard et al. (2006), that has been used successfully for time-course microarray experiments. It focuses both on developing new ways of setting the hyperparameters, so that inferences reflect the scientific needs and contribute to the inferential stability of the search, and on designing new fast algorithms for searching the partition space. First, we use the explicit forms of the associated Bayes factors to demonstrate that such methods can be unstable under common settings of their hyperparameters. We then prove that these regions of instability can be removed by setting the hyperparameters in an unconventional way. Moreover, we demonstrate that MAP (maximum a posteriori) search is justified when a utility function is defined according to the scientific interest of the clusters. We then turn to the search over the partition space. In model-based clustering, an exhaustive search for the highest-scoring partition is usually impossible because even a moderately sized dataset has a huge number of partitions. We propose two methods for the partition search: one encodes the clustering as a weighted MAX-SAT problem, while the other views clusterings as elements of the lattice of partitions. Finally, this thesis includes a full analysis of two microarray experiments aimed at identifying circadian genes.
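To make the size of the partition space concrete: the number of ways to partition n items is the Bell number B_n, which grows faster than exponentially in n. The minimal Python sketch below is an illustration only, under the assumption that a clustering is represented as a tuple of blocks; all function names are hypothetical. It computes B_n with the Bell triangle, checks it against brute-force enumeration for small n, and lists the upper covers of a partition in the lattice of partitions (the partitions one step coarser, obtained by merging exactly two blocks). It does not reproduce the scoring, Bayes factors or search procedures of the thesis or of Heard et al. (2006).

```python
from itertools import combinations
from typing import List, Tuple

# Hypothetical representation: a partition is a tuple of blocks,
# each block a tuple of item indices.
Partition = Tuple[Tuple[int, ...], ...]


def bell_number(n: int) -> int:
    """Number of set partitions of n items, computed with the Bell triangle."""
    if n < 0:
        raise ValueError("n must be non-negative")
    row = [1]  # row 0 of the Bell triangle
    for _ in range(n):
        # Each new row starts with the last entry of the previous row;
        # every further entry adds the entry above-left to its left neighbour.
        new_row = [row[-1]]
        for value in row:
            new_row.append(new_row[-1] + value)
        row = new_row
    return row[0]  # the first entry of row n is B_n


def all_partitions(items: List[int]) -> List[Partition]:
    """Enumerate every set partition of `items` (feasible only for tiny n)."""
    if not items:
        return [()]
    first, rest = items[0], items[1:]
    result: List[Partition] = []
    for smaller in all_partitions(rest):
        # Either put `first` in a block of its own ...
        result.append(((first,),) + smaller)
        # ... or add it to one of the existing blocks.
        for i, block in enumerate(smaller):
            result.append(smaller[:i] + ((first,) + block,) + smaller[i + 1:])
    return result


def upper_covers(partition: Partition) -> List[Partition]:
    """Partitions one step coarser in the partition lattice: merge two blocks."""
    covers: List[Partition] = []
    for i, j in combinations(range(len(partition)), 2):
        merged = tuple(sorted(partition[i] + partition[j]))
        others = tuple(b for k, b in enumerate(partition) if k not in (i, j))
        covers.append(others + (merged,))
    return covers


if __name__ == "__main__":
    for n in (5, 10, 20, 50):
        print(n, bell_number(n))
    # Sanity check against brute-force enumeration for small n.
    assert len(all_partitions(list(range(6)))) == bell_number(6)
    singletons = tuple((i,) for i in range(4))
    print(len(upper_covers(singletons)))  # 6 possible merges: C(4, 2)
```

Already B_10 = 115975 and B_20 = 51724158235372, while a microarray experiment typically involves thousands of genes, so exhaustive enumeration is out of the question. A partition with k blocks has k(k-1)/2 upper covers, which is why searches that move through the lattice by merging blocks examine far fewer candidates per step than the full space contains.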
