10 research outputs found

    Clustering of Pain Dynamics in Sickle Cell Disease from Sparse, Uneven Samples

    Irregularly sampled time series data are common in a variety of fields. Many typical methods for drawing insight from data fail in this case. Here we attempt to generalize methods for clustering trajectories to irregularly and sparsely sampled data. We first construct synthetic data sets, then propose and assess four methods of data alignment to allow for application of spectral clustering. We also repeat the same process for real data drawn from medical records of patients with sickle cell disease -- patients whose subjective experiences of pain were tracked for several months via a mobile app. We find that different methods for aligning irregularly sampled sparse data sets can lead to different optimal numbers of clusters, even for synthetic data with known properties. For the case of sickle cell disease, we find that three clusters is a reasonable choice, and these appear to correspond to (1) a low pain group with occasionally acute pain, (2) a group which experiences moderate mean pain that fluctuates often from low to high, and (3) a group that experiences persistent high levels of pain. Our results may help physicians and patients better understand and manage patients' pain levels over time, and we expect that the methods we develop will apply to a wide range of other data sources in medicine and beyond.

    Extracting binary signals from microarray time-course data

    This article presents a new method for analyzing microarray time courses by identifying genes that undergo abrupt transitions in expression level, and the time at which the transitions occur. The algorithm matches the sequence of expression levels for each gene against temporal patterns having one or two transitions between two expression levels. The algorithm reports a P-value for the matching pattern of each gene, and a global false discovery rate can also be computed. After matching, genes can be sorted by the direction and time of transitions. Genes can be partitioned into sets based on the direction and time of change for further analysis, such as comparison with Gene Ontology annotations or binding site motifs. The method is evaluated on simulated and actual time-course data. On microarray data for budding yeast, it is shown that the groups of genes that change in similar ways and at similar times have significant and relevant Gene Ontology annotations.
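
The pattern-matching idea in this abstract can be illustrated with a minimal sketch. It assumes a least-squares fit of a single two-level step to one gene's expression profile. The function name `best_single_transition` and the sum-of-squared-errors criterion are assumptions made for illustration, not the article's actual matching statistic. The P-value and false discovery rate steps are not reproduced because the abstract does not specify how they are computed.

```python
from statistics import mean


def best_single_transition(levels):
    """Fit a one-step, two-level pattern to a profile by least squares.

    Hypothetical helper for illustration only. Returns (index, direction, sse),
    where index is the position of the first time point after the transition.
    """
    if len(levels) < 2:
        raise ValueError("need at least two time points")
    best = None
    for k in range(1, len(levels)):
        before, after = levels[:k], levels[k:]
        low, high = mean(before), mean(after)
        # Squared error of approximating each segment by its own mean level.
        sse = sum((v - low) ** 2 for v in before) + sum((v - high) ** 2 for v in after)
        if best is None or sse < best[2]:
            # A flat profile (high == low) is reported as "down" by convention.
            direction = "up" if high > low else "down"
            best = (k, direction, sse)
    return best


# Made-up profile: low for three time points, then high.
profile = [0.1, 0.0, 0.2, 1.9, 2.1, 2.0]
index, direction, sse = best_single_transition(profile)
```

Sorting genes by the returned `(direction, index)` pair would group them by direction and time of change, in the spirit of the partitioning the abstract describes.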

    Microarray time-series data clustering via gene expression profile alignment

    Clustering gene expression data given as time series is a challenging problem that imposes its own particular constraints: exchanging two or more time points is not possible, as it would deliver quite different results and lead to erroneous biological conclusions. In this thesis, clustering methods introducing the concept of multiple alignment of natural cubic spline representations of gene expression profiles are presented. The multiple alignment is achieved by minimizing the sum of integrated squared errors over a time interval, defined on a set of profiles. The proposed approach, combined with flat clustering algorithms like k-means and EM, is shown to cluster microarray time-series profiles efficiently and to reduce the computational time significantly. The effectiveness of the approaches is evaluated on six data sets. Experiments have also been carried out to determine the number of clusters and the accuracies of the proposed approaches.
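
As a worked illustration of the alignment criterion, consider the pairwise case under the assumption that alignment means shifting one profile vertically by a constant $a$. The abstract does not state the form of the alignment, so the vertical shift, the symbols $x$, $y$, $a$, and the interval $[0, T]$ are assumptions introduced here. Minimizing the integrated squared error then has a closed form:

```latex
\begin{aligned}
d(a) &= \int_0^T \bigl(x(t) - y(t) - a\bigr)^2 \, dt, \\
\frac{\mathrm{d}}{\mathrm{d}a}\, d(a) &= -2 \int_0^T \bigl(x(t) - y(t) - a\bigr) \, dt = 0
\quad\Longrightarrow\quad
a^{*} = \frac{1}{T} \int_0^T \bigl(x(t) - y(t)\bigr) \, dt .
\end{aligned}
```

Under this assumption the optimal shift is the mean difference between the two profiles over the interval. When $x$ and $y$ are cubic splines, that integral can be evaluated exactly piece by piece.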