Diffusion map for clustering fMRI spatial maps extracted by independent component analysis

Author: Alluri Vinoo
Brattico Elvira
Cong Fengyu
Nandi Asoke K.
Ristaniemi Tapani
Sipola Tuomo
Toiviainen Petri
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 27/09/2013

Functional magnetic resonance imaging (fMRI) produces data about activity inside the brain, from which spatial maps can be extracted by independent component analysis (ICA). In datasets, there are n spatial maps that contain p voxels. The number of voxels is very high compared to the number of analyzed spatial maps. Clustering of the spatial maps is usually based on correlation matrices. This usually works well, although such a similarity matrix inherently can explain only a certain amount of the total variance contained in the high-dimensional data where n is relatively small but p is large. For high-dimensional space, it is reasonable to perform dimensionality reduction before clustering. In this research, we used the recently developed diffusion map for dimensionality reduction in conjunction with spectral clustering. This research revealed that the diffusion map based clustering worked as well as the more traditional methods, and produced more compact clusters when needed. Comment: 6 pages. 8 figures. Copyright (c) 2013 IEEE. Published at 2013 IEEE International Workshop on Machine Learning for Signal Processin
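The dimensionality-reduction step described in this abstract can be sketched as follows. This is a minimal diffusion map in NumPy and not the authors' implementation: the function name `diffusion_map`, the Gaussian kernel, and the median-distance bandwidth heuristic are all assumptions made for illustration.

```python
import numpy as np

def diffusion_map(X, n_components=2, epsilon=None, t=1):
    """Embed the n rows of X (n samples x p features) into diffusion coordinates."""
    n = X.shape[0]
    # Pairwise squared Euclidean distances between the n samples.
    sq = np.sum(X ** 2, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * (X @ X.T), 0.0)
    if epsilon is None:
        # Assumed heuristic: median off-diagonal squared distance as bandwidth.
        epsilon = np.median(d2[~np.eye(n, dtype=bool)])
    K = np.exp(-d2 / epsilon)              # Gaussian affinity matrix
    deg = K.sum(axis=1)                    # row sums (degrees)
    # Symmetric matrix D^{-1/2} K D^{-1/2}; it has the same eigenvalues
    # as the Markov transition matrix P = D^{-1} K.
    A = K / np.sqrt(np.outer(deg, deg))
    vals, vecs = np.linalg.eigh(A)
    order = np.argsort(vals)[::-1]         # sort eigenpairs, largest first
    vals, vecs = vals[order], vecs[:, order]
    # Right eigenvectors of P are D^{-1/2} v; the first one is constant, so skip it.
    psi = vecs / np.sqrt(deg)[:, None]
    return psi[:, 1:n_components + 1] * vals[1:n_components + 1] ** t
```

In the setting of the abstract, `X` would hold the n spatial maps as rows with p voxels as columns, and the low-dimensional coordinates returned here would then be passed to a clustering step.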

arXiv.org e-Print Archive

Crossref

The Hidden Convexity of Spectral Clustering

Author: Belkin Mikhail
Rademacher Luis
Voss James
Publication date: 02/03/2016

In recent years, spectral clustering has become a standard method for data analysis used in a broad range of applications. In this paper we propose a new class of algorithms for multiway spectral clustering based on optimization of a certain "contrast function" over the unit sphere. These algorithms, partly inspired by certain Independent Component Analysis techniques, are simple, easy to implement and efficient. Geometrically, the proposed algorithms can be interpreted as hidden basis recovery by means of function optimization. We give a complete characterization of the contrast functions admissible for provable basis recovery. We show how these conditions can be interpreted as a "hidden convexity" of our optimization problem on the sphere; interestingly, we use efficient convex maximization rather than the more common convex minimization. We also show encouraging experimental results on real and simulated data. Comment: 22 page

arXiv.org e-Print Archive

Association for the Advancement of Artificial Intelligence: AAAI Publications

Clustering student skill set profiles in a unit hypercube using mixtures of multivariate betas

Author: Dean Nema
Nugent Rebecca
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 22/08/2013

This paper presents a finite mixture of multivariate betas as a new model-based clustering method tailored to applications where the feature space is constrained to the unit hypercube. The mixture component densities are taken to be conditionally independent, univariate unimodal beta densities (from the subclass of reparameterized beta densities given by Bagnato and Punzo 2013). The EM algorithm used to fit this mixture is discussed in detail, and results from both this beta mixture model and the more standard Gaussian model-based clustering are presented for simulated skill mastery data from a common cognitive diagnosis model and for real data from the Assistment System online mathematics tutor (Feng et al. 2009). The multivariate beta mixture appears to outperform the standard Gaussian model-based clustering approach, as would be expected on the constrained space. Fewer components are selected (by BIC-ICL) in the beta mixture than in the Gaussian mixture, and the resulting clusters seem more reasonable and interpretable.

This article is in technical report form; the final publication is available at http://www.springerlink.com/openurl.asp?genre=article &id=doi:10.1007/s11634-013-0149-z

Enlighten

Clustering by non-negative matrix factorization with independent principal component initialization

Author: Gong Liyun
Nandi Asoke K.
Publication date: 09/09/2013

Non-negative matrix factorization (NMF) is a dimensionality reduction and clustering method that has been applied in many areas, such as bioinformatics and face image classification. Building on traditional NMF, researchers have recently put forward several new initialization algorithms to improve its performance. In this paper, we explore the clustering performance of the NMF algorithm, with emphasis on the initialization problem. We propose an initialization method based on independent principal component analysis (IPCA) for NMF. The experiments were carried out on four real datasets, and the results showed that the IPCA-based initialization of NMF gives better clustering of the datasets than both random and PCA-based initializations
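To illustrate why initialization matters for NMF, here is a minimal sketch with a pluggable starting point. It uses the standard multiplicative updates for the Frobenius objective. It does not implement the IPCA-based initialization proposed in the paper: the two initializers shown (`random_init` and `svd_abs_init`, which takes absolute values of truncated SVD factors as a crude stand-in for a PCA-based start) are hypothetical helpers, as are all function names.

```python
import numpy as np

def nmf(V, W0, H0, n_iter=200, eps=1e-9):
    """Multiplicative-update NMF (V ~ W @ H) starting from the given W0, H0."""
    W, H = W0.copy(), H0.copy()
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + eps)   # update H with W fixed
        W *= (V @ H.T) / (W @ H @ H.T + eps)   # update W with H fixed
    return W, H

def random_init(V, k, seed=0):
    """Random strictly positive starting factors."""
    rng = np.random.default_rng(seed)
    return rng.random((V.shape[0], k)) + 1e-3, rng.random((k, V.shape[1])) + 1e-3

def svd_abs_init(V, k):
    """Assumed simplification: absolute values of the rank-k SVD factors."""
    U, s, Vt = np.linalg.svd(V, full_matrices=False)
    W = np.abs(U[:, :k]) * np.sqrt(s[:k])
    H = np.sqrt(s[:k])[:, None] * np.abs(Vt[:k])
    return W + 1e-6, H + 1e-6

def cluster_labels(W):
    """Assign each row of V to the component with the largest weight in W."""
    return W.argmax(axis=1)
```

A comparison along the lines of the abstract would run `nmf` once from each initializer on the same non-negative data matrix and compare the resulting `cluster_labels` against known classes.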

University of Lincoln Institutional Repository

Dimensionality Reduction for k-Means Clustering and Low Rank Approximation

Author: Cohen Michael B.
Elder Sam
Musco Cameron
Musco Christopher
Persu Madalina
Publication date: 02/04/2015

We show how to approximate a data matrix $\mathbf{A}$ with a much smaller sketch $\mathbf{\tilde A}$ that can be used to solve a general class of constrained $k$-rank approximation problems to within $(1+\epsilon)$ error. Importantly, this class of problems includes $k$-means clustering and unconstrained low rank approximation (i.e. principal component analysis). By reducing data points to just $O(k)$ dimensions, our methods generically accelerate any exact, approximate, or heuristic algorithm for these ubiquitous problems. For $k$-means dimensionality reduction, we provide $(1+\epsilon)$ relative error results for many common sketching techniques, including random row projection, column selection, and approximate SVD. For approximate principal component analysis, we give a simple alternative to known algorithms that has applications in the streaming setting. Additionally, we extend recent work on column-based matrix reconstruction, giving column subsets that not only 'cover' a good subspace for $\mathbf{A}$, but can be used directly to compute this subspace. Finally, for $k$-means clustering, we show how to achieve a $(9+\epsilon)$ approximation by Johnson-Lindenstrauss projecting data points to just $O(\log k/\epsilon^2)$ dimensions. This gives the first result that leverages the specific structure of $k$-means to achieve dimension independent of input size and sublinear in $k$
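The Johnson-Lindenstrauss projection mentioned in this abstract can be sketched as a dense Gaussian random projection. This is a generic illustration and not the paper's construction: the target dimension `m` is left as a free parameter because the abstract gives it only up to constants, and the function names `jl_project` and `kmeans_cost` are hypothetical.

```python
import numpy as np

def jl_project(A, m, seed=0):
    """Project the rows of A (n points x d features) down to m dimensions."""
    rng = np.random.default_rng(seed)
    d = A.shape[1]
    # Gaussian matrix scaled so squared norms are preserved in expectation.
    Pi = rng.standard_normal((d, m)) / np.sqrt(m)
    return A @ Pi

def kmeans_cost(A, labels):
    """Sum of squared distances from each point to its cluster centroid."""
    cost = 0.0
    for c in np.unique(labels):
        pts = A[labels == c]
        cost += np.sum((pts - pts.mean(axis=0)) ** 2)
    return cost
```

For a fixed partition `labels`, comparing `kmeans_cost(jl_project(A, m), labels)` with `kmeans_cost(A, labels)` gives a quick empirical check of how well the sketch preserves the clustering objective.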

arXiv.org e-Print Archive

CiteSeerX

Least Dependent Component Analysis Based on Mutual Information

Author: A. Cichocki
A. Hyvärinen
A. K. Jain
A. Ziehe
Alexander Kraskov
E. Ott
F. R. Bach
H. Kantz
Harald Stögbauer
J. Chen
J.-F. Cardoso
L. F. Kozachenko
O. Vasicek
P. Grassberger
Peter Grassberger
R. L. Somorjai
S. Amari
S. E. Stein
Sergey A. Astakhov
T. M. Cover
Publication venue: 'American Physical Society (APS)'
Publication date: 01/01/2004

We propose to use precise estimators of mutual information (MI) to find least dependent components in a linearly mixed signal. On the one hand this seems to lead to better blind source separation than with any other presently available algorithm. On the other hand it has the advantage, compared to other implementations of 'independent' component analysis (ICA) some of which are based on crude approximations for MI, that the numerical values of the MI can be used for: (i) estimating residual dependencies between the output components; (ii) estimating the reliability of the output, by comparing the pairwise MIs with those of re-mixed components; (iii) clustering the output according to the residual interdependencies. For the MI estimator we use a recently proposed k-nearest neighbor based algorithm. For time sequences we combine this with delay embedding, in order to take into account non-trivial time correlations. After several tests with artificial data, we apply the resulting MILCA (Mutual Information based Least dependent Component Analysis) algorithm to a real-world dataset, the ECG of a pregnant woman. The software implementation of the MILCA algorithm is freely available at http://www.fz-juelich.de/nic/cs/software. Comment: 18 pages, 20 figures, Phys. Rev. E (in press

arXiv.org e-Print Archive

CiteSeerX

Crossref

Juelich Shared Electronic Resources

CERN Document Server