Penalized Clustering of Large Scale Functional Data with Multiple Covariates
In this article, we propose a penalized clustering method for large scale
data with multiple covariates through a functional data approach. In the
proposed method, responses and covariates are linked together through
nonparametric multivariate functions (fixed effects), which have great
flexibility in modeling a variety of function features, such as jump points,
branching, and periodicity. Functional ANOVA is employed to further decompose
multivariate functions in a reproducing kernel Hilbert space and provide
associated notions of main effect and interaction. Parsimonious random effects
are used to capture various correlation structures. The mixed-effect models are
nested under a general mixture model, in which the heterogeneity of functional
data is characterized. We propose a penalized Henderson's likelihood approach
for model-fitting and design a rejection-controlled EM algorithm for the
estimation. Our method selects smoothing parameters through generalized
cross-validation. Furthermore, the Bayesian confidence intervals are used to
measure the clustering uncertainty. Simulation studies and real-data examples
are presented to investigate the empirical performance of the proposed method.
Open-source code is available in the R package MFDA.
Statistical Assessment of the Global Regulatory Role of Histone Acetylation in Saccharomyces cerevisiae
BACKGROUND: Histone acetylation plays important but incompletely understood roles in gene regulation. A comprehensive understanding of the regulatory role of histone acetylation is difficult because many different histone acetylation patterns exist and their effects are confounded by other factors, such as the transcription factor binding sequence motif information and nucleosome occupancy.
RESULTS: We analyzed recent genomewide histone acetylation data using a few complementary statistical models and tested the validity of a cumulative model in approximating the global regulatory effect of histone acetylation. Confounding effects due to transcription factor binding sequence information were estimated by using two independent motif-based algorithms followed by a variable selection method. We found that the sequence information has a significant role in regulating transcription, and we also found a clear additional histone acetylation effect. Our model fits well with observed genome-wide data. Strikingly, including more complicated combinatorial effects does not improve the model's performance. Through a statistical analysis of conditional independence, we found that H4 acetylation may not have significant direct impact on global gene expression.
CONCLUSION: Decoding the combinatorial complexity of histone modification requires not only new data but also new methods to analyze the data. Our statistical analysis confirms that histone acetylation has a significant effect on gene transcription rates in addition to that attributable to upstream sequence motifs. Our analysis also suggests that a cumulative effect model for global histone acetylation is justified, although a more complex histone code may be important at specific gene loci. We also found that the regulatory roles among different histone acetylation sites have important differences.
Bayesian Functional Data Clustering for Temporal Microarray Data
We propose a Bayesian procedure to cluster temporal gene expression microarray profiles,
based on a mixed-effect smoothing-spline model, and design a Gibbs sampler to sample from
the desired posterior distribution. Our method can determine the cluster number automatically
based on the Bayesian information criterion, and handle missing data easily. When applied
to a microarray dataset on the budding yeast, our clustering algorithm provides biologically
meaningful gene clusters according to a functional enrichment analysis.
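As a rough illustration of choosing the cluster number by the Bayesian information criterion, the sketch below scores candidate models with BIC = -2 log L + p log n and keeps the lowest score. The function names and the log-likelihood values are hypothetical placeholders; they are not output of the Gibbs sampler described above, and the abstract does not state how the likelihood or the parameter count is computed in the actual method.

```python
import math

def bic(log_likelihood, n_params, n_obs):
    """Bayesian information criterion; lower values indicate a better trade-off."""
    return -2.0 * log_likelihood + n_params * math.log(n_obs)

def select_cluster_number(candidates, n_obs):
    """Return the cluster count with the smallest BIC, plus all scores.

    `candidates` maps a cluster count to (log_likelihood, n_params).
    """
    scores = {k: bic(ll, p, n_obs) for k, (ll, p) in candidates.items()}
    return min(scores, key=scores.get), scores

# Hypothetical fits: made-up numbers for illustration only.
fits = {2: (-1250.0, 10), 3: (-1180.0, 15), 4: (-1175.0, 20)}
best_k, scores = select_cluster_number(fits, n_obs=500)
```

With these made-up numbers, moving from three to four clusters improves the log-likelihood only slightly, so the penalty term dominates and the three-cluster model is preferred.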
A Spatio-Temporal Graph Convolutional Network for Gesture Recognition from High-Density Electromyography
Accurate hand gesture prediction is crucial for effective control of
upper-limb prostheses. Owing to the high flexibility and multiple degrees of
freedom of the human hand, there has been growing interest in
integrating deep networks with high-density surface electromyography (HD-sEMG)
grids to enhance gesture recognition capabilities. However, many existing
methods fall short of fully exploiting the specific spatial topology and temporal
dependencies present in HD-sEMG data. Additionally, these studies are often
limited to a small number of gestures and lack generality. Hence, this study introduces a
novel gesture recognition method, named STGCN-GR, which leverages
spatio-temporal graph convolution networks for HD-sEMG-based human-machine
interfaces. Firstly, we construct muscle networks based on functional
connectivity between channels, creating a graph representation of HD-sEMG
recordings. Subsequently, a temporal convolution module is applied to capture
the temporal dependences in the HD-sEMG series and a spatial graph convolution
module is employed to effectively learn the intrinsic spatial topology
information among distinct HD-sEMG channels. We evaluate our proposed model on
a public HD-sEMG dataset comprising a substantial number of gestures (i.e.,
65). Our results demonstrate the remarkable capability of the STGCN-GR method,
achieving an impressive accuracy of 91.07% in predicting gestures, which
surpasses state-of-the-art deep learning methods applied to the same dataset.
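The graph construction and spatial graph convolution steps described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the STGCN-GR implementation: the abstract does not name the functional connectivity measure, so thresholded Pearson correlation is assumed here, and the threshold value, function names, and toy signals are all hypothetical. The temporal convolution module is omitted.

```python
import math

def pearson(x, y):
    """Pearson correlation between two equal-length signals."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if norm_x == 0.0 or norm_y == 0.0:
        return 0.0
    return cov / (norm_x * norm_y)

def build_adjacency(signals, threshold=0.5):
    """Link channels whose absolute correlation is at least `threshold` (assumed rule)."""
    n_ch = len(signals)
    adj = [[0.0] * n_ch for _ in range(n_ch)]
    for i in range(n_ch):
        adj[i][i] = 1.0  # self-loop so each channel keeps its own features
        for j in range(i + 1, n_ch):
            if abs(pearson(signals[i], signals[j])) >= threshold:
                adj[i][j] = adj[j][i] = 1.0
    return adj

def normalize_adjacency(adj):
    """Symmetric normalization D^(-1/2) A D^(-1/2)."""
    deg = [sum(row) for row in adj]
    n = len(adj)
    return [[adj[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)]
            for i in range(n)]

def matmul(a, b):
    """Plain nested-list matrix product."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def graph_conv(adj_norm, features, weight):
    """One spatial graph convolution layer: ReLU(A_norm X W)."""
    mixed = matmul(matmul(adj_norm, features), weight)
    return [[max(0.0, v) for v in row] for row in mixed]

# Toy data: three channels, four time samples each (illustrative only).
signals = [[0.0, 1.0, 2.0, 3.0],
           [0.0, 2.0, 4.0, 6.0],
           [1.0, -1.0, 1.0, -1.0]]
adj = build_adjacency(signals)
features = [[1.0], [3.0], [5.0]]  # one feature per channel
weight = [[1.0]]                  # identity weight for readability
out = graph_conv(normalize_adjacency(adj), features, weight)
```

In this toy case the first two channels are strongly correlated and become neighbours, so the layer averages their features, while the third channel stays isolated and keeps its own value.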
The Expression of irx7 in the Inner Nuclear Layer of Zebrafish Retina Is Essential for a Proper Retinal Development and Lamination.
Irx7, a member of the zebrafish iroquois transcription factor (TF) family, has been shown to control brain patterning. During retinal development, irx7's expression was found to appear exclusively in the inner nuclear layer (INL) as soon as the prospective INL cells withdraw from the cell cycle and during retinal lamination. In Irx7-deficient retinas, the formation of a proper retinal lamination was disrupted and the differentiation of INL cell types, including amacrine, horizontal, bipolar and Muller cells, was compromised. Despite irx7's exclusive expression in the INL, photoreceptor differentiation was also compromised in Irx7-deficient retinas. Compared with other retinal cell types, ganglion cells differentiated relatively well in these retinas, except for their dendritic projections into the inner plexiform layer (IPL). In fact, the neuronal projections of amacrine and bipolar cells into the IPL were also diminished. These findings indicate that the retinal lamination issue in the Irx7-deficient retinas is likely caused by the attenuation of neurite outgrowth. Since the expression of known TFs that can specify particular retinal cell types was also altered in Irx7-deficient retinas, the irx7 gene network is possibly a novel regulatory circuit for retinal development and lamination.
A data-driven clustering method for time course gene expression data
Gene expression over time is, biologically, a continuous process and can thus be represented by a continuous function, i.e. a curve. Individual genes often share similar expression patterns (functional forms). However, the shape of each function, the number of such functions, and the genes that share similar functional forms are typically unknown. Here we introduce an approach that allows direct discovery of related patterns of gene expression and their underlying functions (curves) from data without a priori specification of either cluster number or functional form. Smoothing spline clustering (SSC) models natural properties of gene expression over time, taking into account natural differences in gene expression within a cluster of similarly expressed genes, the effects of experimental measurement error, and missing data. Furthermore, SSC provides a visual summary of each cluster's gene expression function and goodness-of-fit by way of a ‘mean curve’ construct and its associated confidence bands. We apply this method to gene expression data over the life-cycle of Drosophila melanogaster and Caenorhabditis elegans to discover 17 and 16 unique patterns of gene expression in each species, respectively. New and previously described expression patterns in both species are discovered, the majority of which are biologically meaningful and exhibit statistically significant gene function enrichment. Software and source code implementing the algorithm, SSClust, is freely available.