Model Selection for Gaussian Mixture Models
This paper is concerned with an important issue in finite mixture modelling,
the selection of the number of mixing components. We propose a new penalized
likelihood method for model selection of finite multivariate Gaussian mixture
models. The proposed method is shown to be statistically consistent in
determining the number of components. A modified EM algorithm is developed
to simultaneously select the number of components and to estimate the mixing
weights, i.e. the mixing probabilities, and unknown parameters of Gaussian
distributions. Simulations and a real data analysis are presented to illustrate
the performance of the proposed method.
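The abstract does not give the form of the proposed penalty or of the modified EM algorithm, so neither is reproduced here. As a rough illustration of penalized-likelihood selection of the number of components in general, the sketch below swaps in a standard substitute: plain EM on one-dimensional data, scored with the Bayesian information criterion (BIC). The function names, the quantile-based initialization, and the variance floor are all assumptions made for this example.

```python
import math
import random


def normal_pdf(x, mean, var):
    """Density of a univariate Gaussian with the given mean and variance."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def fit_gmm_1d(data, k, n_iter=100):
    """Fit a k-component 1-D Gaussian mixture with plain EM; return the log-likelihood."""
    n = len(data)
    ordered = sorted(data)
    overall_mean = sum(data) / n
    overall_var = sum((x - overall_mean) ** 2 for x in data) / n
    # Assumed variance floor: keeps a component from collapsing onto one point.
    min_var = max(1e-2 * overall_var, 1e-12)
    # Deterministic start: means at evenly spaced quantiles, shared variance.
    means = [ordered[int((j + 0.5) * n / k)] for j in range(k)]
    variances = [max(overall_var, min_var)] * k
    weights = [1.0 / k] * k

    for _ in range(n_iter):
        # E-step: posterior probability of each component for each point.
        resp = []
        for x in data:
            dens = [w * normal_pdf(x, m, v)
                    for w, m, v in zip(weights, means, variances)]
            total = max(sum(dens), 1e-300)
            resp.append([d / total for d in dens])
        # M-step: re-estimate weights, means, and variances.
        for j in range(k):
            nj = max(sum(r[j] for r in resp), 1e-12)
            weights[j] = nj / n
            means[j] = sum(r[j] * x for r, x in zip(resp, data)) / nj
            var_j = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, data)) / nj
            variances[j] = max(var_j, min_var)

    def mixture_density(x):
        return sum(w * normal_pdf(x, m, v)
                   for w, m, v in zip(weights, means, variances))

    return sum(math.log(max(mixture_density(x), 1e-300)) for x in data)


def bic(log_lik, k, n):
    """BIC for a 1-D mixture: k means, k variances, k - 1 free weights."""
    n_params = 3 * k - 1
    return -2.0 * log_lik + n_params * math.log(n)


def select_components(data, k_max=5):
    """Return the k in 1..k_max with the lowest BIC, plus all scores."""
    scores = {k: bic(fit_gmm_1d(data, k), k, len(data))
              for k in range(1, k_max + 1)}
    best_k = min(scores, key=scores.get)
    return best_k, scores


# Synthetic data from two well-separated Gaussians (illustrative only).
rng = random.Random(0)
sample = ([rng.gauss(0.0, 1.0) for _ in range(150)]
          + [rng.gauss(8.0, 1.0) for _ in range(150)])
chosen_k, all_scores = select_components(sample, k_max=4)
print("selected number of components:", chosen_k)
```

Unlike the method summarized above, this sketch fits each candidate model separately and compares scores afterwards; the abstract describes selecting the number of components and estimating the parameters simultaneously within a single algorithm.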
Generalized gene co-expression analysis via subspace clustering using low-rank representation
BACKGROUND:
Gene Co-expression Network Analysis (GCNA) helps identify gene modules with potential biological functions and has become a popular method in bioinformatics and biomedical research. However, most current GCNA algorithms use correlation to build gene co-expression networks and identify modules with highly correlated genes. There is a need to look beyond correlation and identify gene modules using other similarity measures for finding novel biologically meaningful modules.
RESULTS:
We propose a new generalized gene co-expression analysis algorithm via subspace clustering that can identify biologically meaningful gene co-expression modules with genes that are not all highly correlated. We use low-rank representation to construct gene co-expression networks and local maximal quasi-clique merger to identify gene co-expression modules. We applied our method to three large microarray datasets and a single-cell RNA sequencing dataset. We demonstrate that our method can identify gene modules with different biological functions than current GCNA methods and find gene modules with prognostic value.
CONCLUSIONS:
The presented method takes advantage of subspace clustering to generate gene co-expression networks rather than using correlation as the similarity measure between genes. Our generalized GCNA method can provide new insights from gene expression datasets and serve as a complement to current GCNA algorithms.
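For context, the correlation-based baseline that the abstract contrasts itself with can be sketched in a few lines. This is not the low-rank representation method, whose details the abstract does not give; it only illustrates the conventional approach of linking genes whose expression profiles are highly correlated. The gene names, the toy expression values, and the 0.8 threshold are hypothetical.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation of two equal-length expression profiles."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if norm_x == 0.0 or norm_y == 0.0:
        return 0.0  # a constant profile has no defined correlation; treat as 0
    return cov / (norm_x * norm_y)


def correlation_edges(expression, threshold=0.8):
    """Return gene pairs whose absolute correlation meets the threshold."""
    edges = []
    for gene_a, gene_b in combinations(sorted(expression), 2):
        if abs(pearson(expression[gene_a], expression[gene_b])) >= threshold:
            edges.append((gene_a, gene_b))
    return edges


# Hypothetical toy data: one expression profile (four samples) per gene.
toy_expression = {
    "geneA": [1.0, 2.0, 3.0, 4.0],
    "geneB": [2.0, 4.0, 6.0, 8.0],
    "geneC": [1.0, -1.0, 1.0, -1.0],
}
print(correlation_edges(toy_expression))
```

A network built this way can only group genes that are pairwise highly correlated, which is the limitation the generalized method above is intended to address.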