102 research outputs found

    Penalized Clustering of Large Scale Functional Data with Multiple Covariates

    In this article, we propose a penalized clustering method for large scale data with multiple covariates through a functional data approach. In the proposed method, responses and covariates are linked together through nonparametric multivariate functions (fixed effects), which have great flexibility in modeling a variety of function features, such as jump points, branching, and periodicity. Functional ANOVA is employed to further decompose multivariate functions in a reproducing kernel Hilbert space and provide associated notions of main effect and interaction. Parsimonious random effects are used to capture various correlation structures. The mixed-effect models are nested under a general mixture model, in which the heterogeneity of functional data is characterized. We propose a penalized Henderson's likelihood approach for model-fitting and design a rejection-controlled EM algorithm for the estimation. Our method selects smoothing parameters through generalized cross-validation. Furthermore, Bayesian confidence intervals are used to measure the clustering uncertainty. Simulation studies and real-data examples are presented to investigate the empirical performance of the proposed method. Open-source code is available in the R package MFDA.

    The Expression of irx7 in the Inner Nuclear Layer of Zebrafish Retina Is Essential for a Proper Retinal Development and Lamination.

    Irx7, a member of the zebrafish iroquois transcription factor (TF) family, has been shown to control brain patterning. During retinal development, irx7's expression was found to appear exclusively in the inner nuclear layer (INL) as soon as the prospective INL cells withdraw from the cell cycle and during retinal lamination. In Irx7-deficient retinas, the formation of a proper retinal lamination was disrupted and the differentiation of INL cell types, including amacrine, horizontal, bipolar and Müller cells, was compromised. Despite irx7's exclusive expression in the INL, photoreceptor differentiation was also compromised in Irx7-deficient retinas. Compared with other retinal cell types, ganglion cells differentiated relatively well in these retinas, except for their dendritic projections into the inner plexiform layer (IPL). In fact, the neuronal projections of amacrine and bipolar cells into the IPL were also diminished. These findings indicate that the retinal lamination issue in the Irx7-deficient retinas is likely caused by the attenuation of neurite outgrowth. Since the expression of known TFs that can specify particular retinal cell types was also altered in Irx7-deficient retinas, the irx7 gene network is possibly a novel regulatory circuit for retinal development and lamination.

    A data-driven clustering method for time course gene expression data

    Gene expression over time is, biologically, a continuous process and can thus be represented by a continuous function, i.e. a curve. Individual genes often share similar expression patterns (functional forms). However, the shape of each function, the number of such functions, and the genes that share similar functional forms are typically unknown. Here we introduce an approach that allows direct discovery of related patterns of gene expression and their underlying functions (curves) from data without a priori specification of either cluster number or functional form. Smoothing spline clustering (SSC) models natural properties of gene expression over time, taking into account natural differences in gene expression within a cluster of similarly expressed genes, the effects of experimental measurement error, and missing data. Furthermore, SSC provides a visual summary of each cluster's gene expression function and goodness-of-fit by way of a ‘mean curve’ construct and its associated confidence bands. We apply this method to gene expression data over the life-cycle of Drosophila melanogaster and Caenorhabditis elegans to discover 17 and 16 unique patterns of gene expression in each species, respectively. New and previously described expression patterns in both species are discovered, the majority of which are biologically meaningful and exhibit statistically significant gene function enrichment. Software and source code implementing the algorithm, SSClust, are freely available.

    Smoothing Spline ANOVA Models and their Applications in Complex and Massive Datasets

    Complex and massive datasets can now be easily collected using newly developed data acquisition technology. Although smoothing spline ANOVA models have proven useful in a variety of fields, such datasets pose challenges for applying the models. In this chapter, we present a selected review of smoothing spline ANOVA models and highlight some challenges and opportunities in massive datasets. We review two approaches that significantly reduce the computational cost of fitting the model. One real case study is used to illustrate the performance of the reviewed methods.