25 research outputs found

    Circadian Clock Genes Contribute to the Regulation of Hair Follicle Cycling

    Hair follicles undergo recurrent cycling of controlled growth (anagen), regression (catagen), and relative quiescence (telogen) with a defined periodicity. Taking a genomics approach to study gene expression during synchronized mouse hair follicle cycling, we discovered that, in addition to circadian fluctuation, CLOCK-regulated genes are also modulated in phase with the hair growth cycle. During telogen and early anagen, circadian clock genes are prominently expressed in the secondary hair germ, which contains precursor cells for the growing follicle. Analysis of Clock and Bmal1 mutant mice reveals a delay in anagen progression, and the secondary hair germ cells show decreased levels of phosphorylated Rb and lack mitotic cells, suggesting that circadian clock genes regulate anagen progression via their effect on the cell cycle. Consistent with a block at the G1 phase of the cell cycle, we show a significant upregulation of p21 in Bmal1 mutant skin. While circadian clock mechanisms have been implicated in a variety of diurnal biological processes, our findings indicate that circadian clock genes may be utilized to modulate the progression of non-diurnal cyclic processes.

    Pattern discovery in sequences under a Markov assumption


    Sequential pattern discovery under a Markov assumption

    In this paper we investigate the general problem of discovering recurrent patterns that are embedded in categorical sequences. An important real-world problem of this nature is motif discovery in DNA sequences. There are a number of fundamental aspects of this data mining problem that can make discovery “easy” or “hard”; we characterize the difficulty of learning in this context using an analysis based on the Bayes error rate under a Markov assumption. The Bayes error framework demonstrates why certain patterns are much harder to discover than others. It also explains the role of different parameters such as pattern length and pattern frequency in sequential discovery. We demonstrate how the Bayes error can be used to calibrate existing discovery algorithms, providing a lower bound on achievable performance. We discuss a number of fundamental issues that characterize sequential pattern discovery in this context, present a variety of empirical results to complement and verify the theoretical analysis, and apply our methodology to real-world motif-discovery problems in computational biology.
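The Bayes-error lower bound sketched in the abstract can be illustrated with a toy computation. This is a simplification, not the paper's exact formulation: the background here is uniform and i.i.d. rather than Markov (a first-order Markov background would replace `bg_prob` with a product of transition probabilities), and the alphabet, motif model, and prior are invented for illustration.

```python
# Toy Bayes error for deciding whether a length-3 window was generated
# by a motif model or by background. The PWM, uniform i.i.d. background,
# and prior are illustrative assumptions, not values from the paper.
from itertools import product

ALPHABET = "ACGT"

# Position weight matrix for a length-3 motif (one row per position).
PWM = [
    {"A": 0.7, "C": 0.1, "G": 0.1, "T": 0.1},
    {"A": 0.1, "C": 0.7, "G": 0.1, "T": 0.1},
    {"A": 0.1, "C": 0.1, "G": 0.7, "T": 0.1},
]

def motif_prob(w):
    p = 1.0
    for pos, ch in enumerate(w):
        p *= PWM[pos][ch]
    return p

def bg_prob(w):
    # Uniform i.i.d. background; the paper's Markov background would
    # multiply transition probabilities here instead.
    return (1.0 / len(ALPHABET)) ** len(w)

def bayes_error(prior_motif=0.5):
    # Bayes error = sum over all windows of the smaller joint probability:
    # even the optimal classifier must err with at least this mass.
    err = 0.0
    for w in product(ALPHABET, repeat=3):
        err += min(prior_motif * motif_prob(w),
                   (1 - prior_motif) * bg_prob(w))
    return err

print(round(bayes_error(), 4))  # → 0.1861
```

Weak motifs (PWM rows closer to the background) drive this number toward 0.5, which is one way to see why some patterns are intrinsically hard to discover.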

    Towards Scalable Support Vector Machines using Squashing

    Support vector machines (SVMs) provide classification models with strong theoretical foundations as well as excellent empirical performance on a variety of applications. One of the major drawbacks of SVMs is the necessity to solve a large-scale quadratic programming problem. This paper combines likelihood-based squashing with a probabilistic formulation of SVMs, enabling fast training on squashed data sets. We reduce the problem of training the SVMs on the weighted "squashed" data to a quadratic programming problem and show that it can be solved using Platt's sequential minimal optimization (SMO) algorithm. We compare performance of the SMO algorithm on the squashed and the full data, as well as on simple random and boosted samples of the data. Experiments on a number of datasets show that squashing allows one to speed up training, decrease memory requirements, and obtain parameter estimates close to those of the full data. More importantly, squashing produces close to optimal classification…
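The core idea of squashing, compressing many training points into a few weighted pseudo-points and then fitting a weighted model, can be sketched in one dimension. This is illustrative only: the binning rule stands in for the paper's likelihood-based construction, and a weighted logistic fit stands in for the probabilistic SVM; all names and parameters are assumptions.

```python
# Sketch of squashing: compress a 10,000-point 1-D labelled sample into
# a few weighted pseudo-points, then fit a weighted logistic model.
# The rounding-based compression and the logistic surrogate are
# illustrative simplifications of the paper's method.
import math
import random
from collections import Counter

random.seed(0)
# Two 1-D classes: y=0 centred at -1, y=1 centred at +1.
data = [(random.gauss(-1, 0.5), 0) for _ in range(5000)] + \
       [(random.gauss(+1, 0.5), 1) for _ in range(5000)]

# "Squash" by rounding x to one decimal: each (bin, label) pair becomes
# a single pseudo-point whose weight is the number of points it absorbs.
counts = Counter((round(x, 1), y) for x, y in data)
squashed = [(x, y, w) for (x, y), w in counts.items()]

def fit_weighted_logistic(points, steps=2000, lr=0.05):
    # Weighted gradient ascent on the logistic log-likelihood;
    # each pseudo-point contributes in proportion to its weight.
    a, b = 0.0, 0.0
    n = sum(w for _, _, w in points)
    for _ in range(steps):
        ga = gb = 0.0
        for x, y, w in points:
            p = 1.0 / (1.0 + math.exp(-(a * x + b)))
            ga += w * (y - p) * x
            gb += w * (y - p)
        a += lr * ga / n
        b += lr * gb / n
    return a, b

a, b = fit_weighted_logistic(squashed)
print(len(data), "->", len(squashed))  # many points -> few pseudo-points
```

Training touches only the pseudo-points, so cost and memory scale with the number of bins rather than the sample size, while the fitted slope stays positive and the boundary stays between the two classes.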

    Probabilistic Models for Joint Clustering and Time-Warping of Multidimensional Curves

    In this paper we present a family of models and learning algorithms that can simultaneously align and cluster sets of multidimensional curves measured on a discrete time grid. Our approach is based on a generative mixture model that allows both local nonlinear time warping and global linear shifts of the observed curves in both time and measurement spaces relative to the mean curves within the clusters. The resulting model can be viewed as a form of Bayesian network with a special temporal structure. The Expectation-Maximization (EM) algorithm is used to simultaneously recover both the curve models for each cluster, and the most likely alignments and cluster membership for each curve. We evaluate the methodology on two real-world data sets, and show that the Bayesian network models provide systematic improvements in predictive power over more conventional clustering approaches.
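The joint cluster-and-align loop can be sketched with a toy hard-assignment variant: each curve is matched to the cluster mean under its best discrete time shift, then the means are re-estimated from the aligned members. The paper's EM uses soft posterior probabilities and a richer warping model; the templates, noise level, and shift range below are invented for illustration.

```python
# Hard-assignment sketch of joint clustering and alignment of curves on
# a discrete time grid. Simplification of the paper's EM: assignments
# and shifts are hard, and warping is restricted to global shifts.
import random

random.seed(1)

T, SHIFTS = 12, (-2, -1, 0, 1, 2)
# Two underlying templates, padded so shifted windows stay in range.
base = [[t * 0.5 for t in range(T + 4)],             # rising line
        [(t - 8) ** 2 * 0.1 for t in range(T + 4)]]  # U-shaped curve

def sample_curve(k):
    # Observe a randomly shifted, noisy window of template k.
    s = random.choice(SHIFTS)
    return [base[k][t + 2 + s] + random.gauss(0, 0.1) for t in range(T)]

curves = [sample_curve(0) for _ in range(20)] + \
         [sample_curve(1) for _ in range(20)]

def sse(c, mean, s):
    # Mean squared error between curve and mean under shift s,
    # computed on the overlapping region only.
    idx = [t for t in range(T) if 0 <= t + s < T]
    return sum((c[t] - mean[t + s]) ** 2 for t in idx) / len(idx)

means = [list(curves[0]), list(curves[-1])]  # crude initialisation
assign = []
for _ in range(5):
    # E-like step: best (cluster, shift) for each curve.
    assign = []
    for c in curves:
        _, k, s = min((sse(c, m, s), k, s)
                      for k, m in enumerate(means) for s in SHIFTS)
        assign.append((k, s))
    # M-like step: re-estimate each mean from its aligned members.
    for k in range(2):
        members = [(c, s) for c, (kk, s) in zip(curves, assign) if kk == k]
        if not members:
            continue
        new_mean = []
        for t in range(T):
            vals = [c[t - s] for c, s in members if 0 <= t - s < T]
            new_mean.append(sum(vals) / len(vals) if vals else means[k][t])
        means[k] = new_mean

labels = [k for k, _ in assign]
```

Because shifts are searched jointly with cluster membership, two identically shaped curves that are merely offset in time end up in the same cluster, which a shift-blind distance would split.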

    Gene Expression Clustering with Functional Mixture Models

    We propose a functional mixture model for simultaneous clustering and alignment of sets of curves measured on a discrete time grid. The model is specifically tailored to gene expression time course data. Each functional cluster center is a nonlinear combination of solutions of a simple linear differential equation that describes the change of individual mRNA levels when the synthesis and decay rates are constant. The mixture of continuous time parametric functional forms allows one to (a) account for the heterogeneity in the observed profiles, (b) align the profiles in time by estimating real-valued time shifts, (c) capture the synthesis and decay of mRNA in the course of an experiment, and (d) regularize noisy profiles by enforcing smoothness in the mean curves. We derive an EM algorithm for estimating the parameters of the model, and apply the proposed approach to the set of cycling genes in yeast. The experiments show consistent improvement in predictive power and within-cluster variance compared to regular Gaussian mixtures.
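The "simple linear differential equation" for an mRNA level m with constant synthesis rate s and decay rate d is dm/dt = s - d·m, whose solution relaxes exponentially toward the steady state s/d. A small sketch of the closed form, with a finite-difference check that it satisfies the ODE; the parameter values are illustrative, not estimates from the paper.

```python
# Closed-form solution of dm/dt = s - d*m with m(0) = m0, the building
# block the abstract describes for its cluster centers. The values of
# s, d, and m0 below are illustrative assumptions.
import math

def mrna_level(t, s=2.0, d=0.5, m0=0.0):
    # m(t) = s/d + (m0 - s/d) * exp(-d t): rises from m0 toward the
    # steady state s/d where synthesis and decay balance.
    return s / d + (m0 - s / d) * math.exp(-d * t)

# Steady state: m(t) -> s/d = 4.0 for large t.
print(round(mrna_level(50.0), 3))  # → 4.0

# Check dm/dt ≈ s - d*m at t = 1 by central differences.
h = 1e-5
lhs = (mrna_level(1 + h) - mrna_level(1 - h)) / (2 * h)
rhs = 2.0 - 0.5 * mrna_level(1)
print(abs(lhs - rhs) < 1e-6)  # → True
```

Combining a few such solutions with different rates gives smooth, biologically motivated mean curves, which is the regularization the abstract contrasts with unconstrained Gaussian mixture means.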

    Translation-Invariant Mixture Models for Curve Clustering

    In this paper we present a family of algorithms that can simultaneously align and cluster sets of multidimensional curves defined on a discrete time grid. Our approach assumes that the data are being generated from a finite mixture of curve models. Each mixture component uses (a) a mean curve based on a flexible non-parametric representation, (b) additive measurement noise, (c) randomly selected discrete-valued shifts of each curve with respect to the independent variable (i.e., typically along the time axis), and (d) random real-valued offsets of each curve with respect to the observed variable. We show that the Expectation-Maximization (EM) algorithm can be used to simultaneously recover both the curve models for each cluster, and the most likely shifts, offsets, and cluster memberships for each curve. We demonstrate how Bayesian estimation methods can improve the results for small sample sizes by enforcing smoothness in the cluster mean curves. We evaluate the methodology on two real-world data sets, time-course gene expression data and storm trajectory data. Experimental results show that models that incorporate curve alignment systematically provide improvements in predictive power on test data sets. The proposed approach provides a non-parametric, computationally efficient, and robust methodology for clustering broad classes of curve data.
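For a single curve and a single cluster mean, the alignment step this abstract describes reduces to a search over the discrete time shifts, where for each candidate shift the real-valued measurement offset has a closed form (the mean residual on the overlap). A minimal sketch; the curve values and shift range are invented for illustration.

```python
# Best discrete time shift and real-valued measurement offset for one
# curve against one cluster mean, as in the model's alignment step.
# The data and the shift range are illustrative assumptions.
mean_curve = [0.0, 1.0, 4.0, 9.0, 4.0, 1.0, 0.0, 0.0]

# An observed curve: the mean shifted right by 1 and offset by +2.
obs = [2.0, 2.0, 3.0, 6.0, 11.0, 6.0, 3.0, 2.0]

def best_alignment(obs, mean, shifts=range(-2, 3)):
    best = None
    for s in shifts:
        # Overlapping region between obs and the mean shifted by s.
        idx = [t for t in range(len(obs)) if 0 <= t - s < len(mean)]
        resid = [obs[t] - mean[t - s] for t in idx]
        offset = sum(resid) / len(resid)   # closed-form optimal offset
        err = sum((r - offset) ** 2 for r in resid)
        if best is None or err < best[0]:
            best = (err, s, offset)
    return best

err, shift, offset = best_alignment(obs, mean_curve)
print(shift, offset)  # → 1 2.0
```

In the full EM this search runs per curve and per cluster inside the E-step, and the recovered shifts and offsets feed the re-estimation of the non-parametric mean curves.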
