Search CORE


Mixtures of Spatial Spline Regressions

Authors: McLachlan Geoffrey J., Nguyen Hien D., Wood Ian A.
Publication date: 13/06/2013

We present an extension of the functional data analysis framework for univariate functions to the analysis of surfaces: functions of two variables. The spatial spline regression (SSR) approach we develop can be used to model surfaces that are sampled over a rectangular domain. Furthermore, combining SSR with linear mixed effects models (LMM) allows for the analysis of populations of surfaces, and combining the joint SSR-LMM method with finite mixture models allows for the analysis of populations of surfaces with sub-family structures. Through the mixtures of spatial spline regressions (MSSR) approach, we present methodologies for clustering surfaces into sub-families and for performing surface-based discriminant analysis. The effectiveness of our methodologies, as well as the modeling capabilities of the SSR model, are assessed through an application to handwritten character recognition.

arXiv.org e-Print Archive

CiteSeerX
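As a rough illustration of the first step described in the abstract above (fitting one smooth surface to values sampled over a rectangular domain), here is a minimal penalized least-squares sketch. It is a simplified stand-in, not the authors' SSR estimator: it assumes a tensor-product polynomial basis and a ridge penalty on the coefficients in place of whatever basis and penalty SSR actually uses, and the names tensor_basis, fit_surface and predict_surface are hypothetical.

```python
import numpy as np

# Hypothetical stand-in for an SSR-style basis: tensor-product monomials
# x**i * y**j for 0 <= i, j <= degree (not the paper's spline basis).
def tensor_basis(x, y, degree=4):
    cols = [x**i * y**j for i in range(degree + 1) for j in range(degree + 1)]
    return np.column_stack(cols)

def fit_surface(x, y, z, degree=4, lam=1e-6):
    """Penalized least squares: minimize ||z - B c||^2 + lam * ||c||^2 (ridge penalty is an assumption)."""
    B = tensor_basis(x, y, degree)
    A = B.T @ B + lam * np.eye(B.shape[1])
    return np.linalg.solve(A, B.T @ z)

def predict_surface(coef, x, y, degree=4):
    return tensor_basis(x, y, degree) @ coef

# One noisy surface sampled on a 20 x 20 grid over the rectangle [0, 1] x [0, 1].
gx, gy = np.meshgrid(np.linspace(0, 1, 20), np.linspace(0, 1, 20))
x, y = gx.ravel(), gy.ravel()
rng = np.random.default_rng(0)
z_true = np.sin(np.pi * x) * np.cos(np.pi * y)
z = z_true + rng.normal(scale=0.05, size=x.size)

coef = fit_surface(x, y, z)
rmse = np.sqrt(np.mean((predict_surface(coef, x, y) - z_true) ** 2))
print(f"RMSE against the noise-free surface: {rmse:.4f}")
```

Each fitted surface is summarized by its coefficient vector; according to the abstract, MSSR goes further by embedding such fits in linear mixed effects and finite mixture models so that whole populations of surfaces, and their sub-families, can be modeled.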

Flexible parametric bootstrap for testing homogeneity against clustering and assessing the number of clusters

Authors: Hennig Christian, Lin Chien-Ju
Publication date: 09/02/2015

There are two notoriously hard problems in cluster analysis: estimating the number of clusters, and checking whether the population to be clustered is not actually homogeneous. Given a dataset, a clustering method and a cluster validation index, this paper proposes to set up null models that capture structural features of the data that cannot be interpreted as indicating clustering. Artificial datasets are sampled from the null model with parameters estimated from the original dataset. This can be used for testing the null hypothesis of a homogeneous population against a clustering alternative. It can also be used to calibrate the validation index for estimating the number of clusters, by taking into account the expected distribution of the index under the null model for any given number of clusters. The approach is illustrated by three examples, involving different clustering techniques (partitioning around medoids, hierarchical methods, a Gaussian mixture model), validation indexes (average silhouette width, prediction strength and BIC), and issues such as mixed-type data and temporal and spatial autocorrelation.

arXiv.org e-Print Archive

Springer - Publisher Connector
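The bootstrap idea in the abstract above can be sketched for its simplest case. This is a hypothetical illustration, not the authors' procedure: the null model is assumed to be a single multivariate Gaussian fitted to the data (the paper's null models can also capture features such as autocorrelation), the clustering method is plain k-means, the validation index is the share of total variation explained by the k-means partition, and all function names are made up for the example.

```python
import numpy as np

def kmeans_wss(X, k, rng, n_init=5, n_iter=100):
    """Lloyd's k-means with random restarts; returns the smallest within-cluster sum of squares."""
    best = np.inf
    for _ in range(n_init):
        centers = X[rng.choice(len(X), size=k, replace=False)]
        for _ in range(n_iter):
            labels = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2).argmin(axis=1)
            new_centers = np.array([
                X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
                for j in range(k)
            ])
            if np.allclose(new_centers, centers):
                break
            centers = new_centers
        wss = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2).min(axis=1).sum()
        best = min(best, wss)
    return best

def clustering_index(X, k, rng):
    """Share of total variation explained by a k-cluster partition (higher = more clustered)."""
    total = ((X - X.mean(axis=0)) ** 2).sum()
    return 1.0 - kmeans_wss(X, k, rng) / total

def bootstrap_homogeneity_test(X, k=2, n_boot=99, seed=1):
    """Parametric bootstrap p-value for 'homogeneous (one Gaussian)' against 'k clusters'."""
    rng = np.random.default_rng(seed)
    # Null-model parameters are estimated from the original dataset.
    mean, cov = X.mean(axis=0), np.cov(X, rowvar=False)
    observed = clustering_index(X, k, rng)
    null_stats = [
        clustering_index(rng.multivariate_normal(mean, cov, size=len(X)), k, rng)
        for _ in range(n_boot)
    ]
    exceed = sum(s >= observed for s in null_stats)
    return observed, (1 + exceed) / (1 + n_boot)

# Two well-separated groups, so homogeneity should be rejected.
data_rng = np.random.default_rng(42)
X = np.vstack([data_rng.normal(0.0, 1.0, (50, 2)),
               data_rng.normal([10.0, 0.0], 1.0, (50, 2))])
observed, p_value = bootstrap_homogeneity_test(X)
print(f"index = {observed:.3f}, bootstrap p-value = {p_value:.3f}")
```

The same null distribution could, in principle, be used to calibrate the index for each candidate number of clusters, which is the second use the abstract describes.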

Variational approximation for mixtures of linear mixed models

Authors: Armagan A., Attias H., Booth J.G., Corduneanu A., David J. Nott, Dempster A.P., Meng X.L., Papaspiliopoulos O., Sahu S.K., Scharl T., Siew Li Tan, Verbeek J.J., Wang B., Waterhouse S., Winn J., Wu B., Yeung K.Y.
Publication venue: Informa UK Limited
Publication date: 29/08/2012

Mixtures of linear mixed models (MLMMs) are useful for clustering grouped data and can be estimated by likelihood maximization through the EM algorithm. The conventional approach to determining a suitable number of components is to compare different mixture models using penalized log-likelihood criteria such as BIC. We propose fitting MLMMs with variational methods, which can perform parameter estimation and model selection simultaneously. A variational approximation is described in which the variational lower bound and parameter updates are in closed form, allowing fast evaluation. A new variational greedy algorithm is developed for model selection and learning of the mixture components. This approach initializes the algorithm automatically and returns a plausible number of mixture components. In cases of weak identifiability of certain model parameters, we use hierarchical centering to reparametrize the model and show empirically that variational algorithms gain efficiency from this reparametrization, much as MCMC algorithms do. Relatedly, we prove that the approximate rate of convergence of variational algorithms by Gaussian approximation equals that of the corresponding Gibbs sampler, which suggests that reparametrizations can improve convergence in variational algorithms as well.
Comment: 36 pages, 5 figures, 2 tables, submitted to JCG

arXiv.org e-Print Archive

Crossref

FigShare
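The conventional model-selection route that the abstract above contrasts with its variational approach (fit mixtures with different numbers of components by EM, then compare BIC) can be sketched on a much simpler model. Assumptions: univariate Gaussian mixtures stand in for MLMMs, BIC is written as -2 log L + p log n so that smaller is better, and the helper names are hypothetical. This does not implement the paper's variational greedy algorithm.

```python
import numpy as np

def fit_gmm_1d(x, K, n_iter=300, seed=0):
    """EM for a K-component univariate Gaussian mixture; returns the final log-likelihood."""
    rng = np.random.default_rng(seed)
    w = np.full(K, 1.0 / K)
    mu = rng.choice(x, size=K, replace=False)
    sd = np.full(K, x.std())

    def log_joint(w, mu, sd):
        # log(w_k) + log N(x_i | mu_k, sd_k^2), shape (n, K)
        return (np.log(w) - np.log(sd) - 0.5 * np.log(2 * np.pi)
                - 0.5 * ((x[:, None] - mu) / sd) ** 2)

    def log_norm(lj):
        # Row-wise log-sum-exp for numerical stability.
        m = lj.max(axis=1, keepdims=True)
        return m + np.log(np.exp(lj - m).sum(axis=1, keepdims=True))

    for _ in range(n_iter):
        # E-step: responsibilities r[i, k].
        lj = log_joint(w, mu, sd)
        r = np.exp(lj - log_norm(lj))
        # M-step: a tiny constant keeps empty components from producing 0/0.
        nk = r.sum(axis=0) + 1e-10
        w = nk / nk.sum()
        mu = (r * x[:, None]).sum(axis=0) / nk
        sd = np.sqrt((r * (x[:, None] - mu) ** 2).sum(axis=0) / nk)
        sd = np.maximum(sd, 1e-2 * x.std())  # floor to avoid collapsing onto one point

    return float(log_norm(log_joint(w, mu, sd)).sum())

def bic(loglik, K, n):
    """BIC = -2 log L + p log n with p = 3K - 1 (K means, K sds, K - 1 free weights)."""
    return -2.0 * loglik + (3 * K - 1) * np.log(n)

rng = np.random.default_rng(0)
x = np.concatenate([rng.normal(0.0, 1.0, 150), rng.normal(8.0, 1.0, 150)])
scores = {K: bic(fit_gmm_1d(x, K), K, len(x)) for K in range(1, 5)}
best_K = min(scores, key=scores.get)
print(scores, "-> selected K =", best_K)
```

With two well-separated groups, K = 2 would typically attain the smallest BIC. The drawback the abstract points to is that every candidate K must be fitted separately, whereas the proposed variational approach aims to settle the number of components during fitting.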

Pancancer analysis of DNA methylation-driven genes using MethylMix

Authors: Gevaert Olivier, Plevritis Sylvia K, Tibshirani Robert
Publication venue: eScholarship, University of California
Publication date: 01/01/2015

Aberrant DNA methylation is an important mechanism that contributes to oncogenesis. Yet few algorithms exist that exploit the large volume of available DNA methylation data to identify hypo- and hypermethylated genes in cancer. We developed a novel computational algorithm called MethylMix to identify differentially methylated genes that are also predictive of transcription. We apply MethylMix to 12 individual cancer sites and additionally combine all cancer sites in a pancancer analysis. We discover pancancer hypo- and hypermethylated genes and identify novel methylation-driven subgroups with clinical implications. MethylMix analysis on combined cancer sites reveals 10 pancancer clusters reflecting new similarities across malignantly transformed tissues.

Crossref

Springer - Publisher Connector

PubMed Central

eScholarship - University of California
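The MethylMix abstract's criterion (genes that are differentially methylated and also predictive of transcription) can be illustrated with a deliberately simplified filter. This is not MethylMix's algorithm: differential methylation is reduced to a difference in mean beta value between tumor and normal samples, "predictive of transcription" is reduced to a negative Pearson correlation between methylation and expression across tumor samples, and both thresholds and the function name are hypothetical.

```python
import numpy as np

def methylation_driven_genes(meth_tumor, expr_tumor, meth_normal,
                             corr_threshold=-0.3, min_delta=0.1):
    """Indices of genes passing both hypothetical filters.

    meth_tumor, expr_tumor: arrays of shape (genes, tumor samples);
    meth_normal: shape (genes, normal samples). Methylation values are betas in [0, 1].
    """
    selected = []
    for g in range(meth_tumor.shape[0]):
        # Transcriptional filter: methylation should be anti-correlated with expression.
        r = np.corrcoef(meth_tumor[g], expr_tumor[g])[0, 1]
        # Differential filter: mean methylation should differ from normal tissue.
        delta = abs(meth_tumor[g].mean() - meth_normal[g].mean())
        if r <= corr_threshold and delta >= min_delta:
            selected.append(g)
    return selected

rng = np.random.default_rng(7)
n_samples = 50
meth_t = rng.uniform(0.5, 0.9, size=(3, n_samples))
meth_n = rng.uniform(0.1, 0.3, size=(3, n_samples))
meth_n[2] = meth_t[2]  # gene 2: identical methylation in tumor and normal
noise = rng.normal(0.0, 0.05, size=(3, n_samples))
expr_t = np.vstack([
    -2.0 * meth_t[0] + noise[0],  # gene 0: hypermethylated and silenced
    2.0 * meth_t[1] + noise[1],   # gene 1: positively correlated, fails the filter
    -2.0 * meth_t[2] + noise[2],  # gene 2: anti-correlated but not differential
])
print(methylation_driven_genes(meth_t, expr_t, meth_n))  # prints [0]
```

Requiring both conditions is what separates gene 0 from gene 2 here: a strong methylation-expression relationship alone is not enough if methylation does not differ from normal tissue.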