Diffusion map for clustering fMRI spatial maps extracted by independent component analysis
Functional magnetic resonance imaging (fMRI) produces data about activity
inside the brain, from which spatial maps can be extracted by independent
component analysis (ICA). A typical dataset contains n spatial maps, each with p
voxels, and the number of voxels is very high compared to the number of analyzed
spatial maps. Clustering of the spatial maps is usually based on correlation
matrices. This works well in practice, although such a similarity matrix inherently
can explain only a certain amount of the total variance contained in the
high-dimensional data, where n is relatively small but p is large. In such a
high-dimensional space, it is reasonable to perform dimensionality reduction
before clustering. In this research, we used the recently developed diffusion
map for dimensionality reduction in conjunction with spectral clustering. This
research revealed that the diffusion map based clustering worked as well as the
more traditional methods and produced more compact clusters when needed. Comment: 6 pages, 8 figures. Copyright (c) 2013 IEEE. Published at 2013 IEEE
International Workshop on Machine Learning for Signal Processing
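The diffusion-map step can be made concrete with a minimal pure-Python sketch of the two-cluster case. It rests on a few assumptions: a Gaussian kernel with a hand-picked bandwidth `epsilon`, a diffusion time `t`, and spectral clustering reduced to splitting on the sign of the first non-trivial diffusion coordinate. The function name and parameters are hypothetical. The paper embeds fMRI spatial maps into several diffusion coordinates before spectral clustering, and this sketch does not reproduce that pipeline.

```python
import math


def diffusion_map_split(points, epsilon, t=1, iters=500):
    """Two-way split on the sign of the first non-trivial diffusion coordinate.

    K is a Gaussian kernel and P = D^-1 K is the diffusion (Markov) matrix.
    Its eigenvectors are obtained from the symmetric, positive semidefinite
    S = D^-1/2 K D^-1/2, which has the same eigenvalues as P.
    """
    n = len(points)
    # Gaussian affinities with bandwidth epsilon (illustrative parameter).
    K = [[math.exp(-sum((a - b) ** 2 for a, b in zip(p, q)) / epsilon)
          for q in points] for p in points]
    d = [sum(row) for row in K]
    S = [[K[i][j] / math.sqrt(d[i] * d[j]) for j in range(n)]
         for i in range(n)]

    # The trivial eigenvector of S (eigenvalue 1) is proportional to sqrt(d).
    v1 = [math.sqrt(di) for di in d]
    n1 = math.sqrt(sum(c * c for c in v1))
    v1 = [c / n1 for c in v1]

    def deflate_and_normalise(v):
        # Remove the v1 component, then scale to unit length.
        dot = sum(a * b for a, b in zip(v, v1))
        w = [a - dot * b for a, b in zip(v, v1)]
        norm = math.sqrt(sum(c * c for c in w))
        return [c / norm for c in w], norm

    # Power iteration on the deflated S from a deterministic start vector.
    v, _ = deflate_and_normalise([float(i + 1) for i in range(n)])
    lam = 0.0
    for _ in range(iters):
        Sv = [sum(S[i][j] * v[j] for j in range(n)) for i in range(n)]
        v, lam = deflate_and_normalise(Sv)

    # Right eigenvector of P, scaled by lam**t, is the diffusion coordinate.
    coords = [lam ** t * v[i] / math.sqrt(d[i]) for i in range(n)]
    labels = [0 if c < 0 else 1 for c in coords]
    return lam, coords, labels
```

Working with S rather than P keeps every eigenvalue real and non-negative. Plain power iteration therefore converges to the first non-trivial eigenvector once the known trivial one is projected out.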
The Hidden Convexity of Spectral Clustering
In recent years, spectral clustering has become a standard method for data
analysis used in a broad range of applications. In this paper we propose a new
class of algorithms for multiway spectral clustering based on optimization of a
certain "contrast function" over the unit sphere. These algorithms, partly
inspired by certain Independent Component Analysis techniques, are simple, easy
to implement and efficient.
Geometrically, the proposed algorithms can be interpreted as hidden basis
recovery by means of function optimization. We give a complete characterization
of the contrast functions admissible for provable basis recovery. We show how
these conditions can be interpreted as a "hidden convexity" of our optimization
problem on the sphere; interestingly, we use efficient convex maximization
rather than the more common convex minimization. We also show encouraging
experimental results on real and simulated data. Comment: 22 pages
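As a rough illustration of "hidden basis recovery by means of function optimization", the sketch below maximizes a convex contrast over the unit sphere. Each step jumps to the normalized gradient. Several pieces are assumptions made for illustration: the contrast g(t) = t^4, the function name, and the toy data (points lying along two hidden orthogonal directions, as an idealized spectral embedding of two clusters would). None of this is the paper's algorithm or its admissibility conditions.

```python
import math


def gradient_iteration(points, u0, iters=50):
    """Maximise F(u) = sum_i <u, x_i>^4 over the unit sphere.

    g(t) = t^4 is convex (an assumed example contrast). Each step replaces u
    by the normalised gradient grad F(u) = 4 * sum_i <u, x_i>^3 x_i.
    """
    dim = len(u0)
    norm = math.sqrt(sum(c * c for c in u0))
    u = [c / norm for c in u0]
    for _ in range(iters):
        grad = [0.0] * dim
        for x in points:
            proj = sum(a * b for a, b in zip(u, x))
            for k in range(dim):
                grad[k] += 4.0 * proj ** 3 * x[k]
        norm = math.sqrt(sum(g * g for g in grad))
        u = [g / norm for g in grad]
    return u
```

For points spread along a rotated orthonormal basis, each step cubes the coordinates of u in the hidden basis. This amplifies the dominant coordinate, so the iterate settles on the hidden direction closest to the starting point.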
Clustering student skill set profiles in a unit hypercube using mixtures of multivariate betas
This paper presents a finite mixture of multivariate betas as a new model-based clustering method tailored to applications where the feature space is constrained to the unit hypercube. The mixture component densities are taken to be conditionally independent, univariate unimodal beta densities (from the subclass of reparameterized beta densities given by Bagnato and Punzo 2013). The EM algorithm used to fit this mixture is discussed in detail. Results from both this beta mixture model and the more standard Gaussian model-based clustering are presented for simulated skill mastery data from a common cognitive diagnosis model and for real data from the Assistment System online mathematics tutor (Feng et al. 2009). The multivariate beta mixture appears to outperform the standard Gaussian model-based clustering approach, as would be expected on the constrained space. Fewer components are selected (by BIC-ICL) in the beta mixture than in the Gaussian mixture, and the resulting clusters seem more reasonable and interpretable.
This article is in technical report form; the final publication is available at http://www.springerlink.com/openurl.asp?genre=article&id=doi:10.1007/s11634-013-0149-z
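The EM fit described above can be sketched as follows. The sketch assumes the plain (a, b) beta parameterization rather than the reparameterized unimodal family of Bagnato and Punzo, and it uses a weighted method-of-moments M-step in place of a full maximum-likelihood update. Names and initial values are hypothetical, and model selection by BIC-ICL is not shown.

```python
import math


def beta_logpdf(x, a, b):
    # Log-density of Beta(a, b) at 0 < x < 1.
    return ((a - 1.0) * math.log(x) + (b - 1.0) * math.log(1.0 - x)
            + math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b))


def fit_beta_mixture(data, init_params, iters=100):
    """Mixture of conditionally independent betas on the unit hypercube.

    data: list of points in (0, 1)^D.
    init_params: per component, a list of D (a, b) pairs.
    The E-step is exact. The M-step uses weighted method-of-moments estimates,
    a simplification of a full maximum-likelihood M-step.
    """
    params = [list(p) for p in init_params]
    n_comp, dim, n = len(params), len(data[0]), len(data)
    weights = [1.0 / n_comp] * n_comp
    resp = []
    for _ in range(iters):
        # E-step: posterior component probabilities (log-sum-exp for stability).
        resp = []
        for x in data:
            logs = [math.log(weights[k])
                    + sum(beta_logpdf(x[j], *params[k][j]) for j in range(dim))
                    for k in range(n_comp)]
            top = max(logs)
            expd = [math.exp(v - top) for v in logs]
            total = sum(expd)
            resp.append([e / total for e in expd])
        # M-step: mixing weights and per-coordinate beta parameters.
        for k in range(n_comp):
            nk = max(sum(r[k] for r in resp), 1e-12)
            weights[k] = nk / n
            for j in range(dim):
                mean = sum(r[k] * x[j] for r, x in zip(resp, data)) / nk
                var = sum(r[k] * (x[j] - mean) ** 2
                          for r, x in zip(resp, data)) / nk
                # Keep the variance in the range where a beta fit exists.
                var = min(max(var, 1e-8), 0.99 * mean * (1.0 - mean))
                common = mean * (1.0 - mean) / var - 1.0
                params[k][j] = (mean * common, (1.0 - mean) * common)
    return weights, params, resp
```

The product over coordinates inside the E-step is where the conditional-independence assumption enters. Nothing here enforces unimodality.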
Clustering by non-negative matrix factorization with independent principal component initialization
Non-negative matrix factorization (NMF) is a dimensionality reduction and clustering method that has been applied in many areas, such as bioinformatics and face image classification. Building on the traditional NMF, researchers have recently proposed several new initialization algorithms to improve its performance. In this paper, we explore the clustering performance of the NMF algorithm, with emphasis on the initialization problem. We propose an initialization method for NMF based on independent principal component analysis (IPCA). Experiments on four real datasets showed that the IPCA-based initialization of NMF yields better clustering than both random and PCA-based initializations.
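Here is a minimal sketch of NMF-based clustering. It assumes the standard Lee-Seung multiplicative updates for the Frobenius loss and assigns each sample (a column of V) to the component with the largest coefficient in H. The IPCA-based initialization itself is not reproduced. Instead, the initial factors are taken as inputs, and the example passes a hand-picked positive pair. All function names are illustrative.

```python
def matmul(A, B):
    # Plain nested-list matrix product.
    return [[sum(A[i][t] * B[t][j] for t in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]


def transpose(A):
    return [list(col) for col in zip(*A)]


def frob_error(V, W, H):
    # Squared Frobenius norm ||V - WH||^2.
    WH = matmul(W, H)
    return sum((v - w) ** 2 for rv, rw in zip(V, WH) for v, w in zip(rv, rw))


def nmf_cluster(V, W0, H0, iters=200, eps=1e-12):
    """Lee-Seung multiplicative updates for min ||V - WH||_F^2 with W, H >= 0.

    W0 and H0 are the initial factors; an initialization scheme such as the
    IPCA-based one would supply them. eps guards against division by zero.
    """
    W = [row[:] for row in W0]
    H = [row[:] for row in H0]
    for _ in range(iters):
        Wt = transpose(W)
        num, den = matmul(Wt, V), matmul(matmul(Wt, W), H)
        H = [[h * a / (b + eps) for h, a, b in zip(rh, ra, rb)]
             for rh, ra, rb in zip(H, num, den)]
        Ht = transpose(H)
        num, den = matmul(V, Ht), matmul(W, matmul(H, Ht))
        W = [[w * a / (b + eps) for w, a, b in zip(rw, ra, rb)]
             for rw, ra, rb in zip(W, num, den)]
    # Assign each column of V (one sample) to its largest coefficient in H.
    labels = [max(range(len(H)), key=lambda r: H[r][j])
              for j in range(len(H[0]))]
    return W, H, labels
```

Because the updates only multiply by non-negative ratios, non-negativity is preserved automatically. The starting point matters, since the updates converge only to a local optimum, and this is the motivation for studying initialization.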
Dimensionality Reduction for k-Means Clustering and Low Rank Approximation
We show how to approximate a data matrix $\mathbf{A}$ with a much smaller
sketch $\tilde{\mathbf{A}}$ that can be used to solve a general class of
constrained k-rank approximation problems to within $(1+\epsilon)$ error.
Importantly, this class of problems includes $k$-means clustering and
unconstrained low rank approximation (i.e. principal component analysis). By
reducing data points to just $O(k)$ dimensions, our methods generically
accelerate any exact, approximate, or heuristic algorithm for these ubiquitous
problems.
For $k$-means dimensionality reduction, we provide $(1+\epsilon)$ relative
error results for many common sketching techniques, including random row
projection, column selection, and approximate SVD. For approximate principal
component analysis, we give a simple alternative to known algorithms that has
applications in the streaming setting. Additionally, we extend recent work on
column-based matrix reconstruction, giving column subsets that not only 'cover'
a good subspace for $\mathbf{A}$, but can be used directly to compute this
subspace.
Finally, for $k$-means clustering, we show how to achieve a $(9+\epsilon)$
approximation by Johnson-Lindenstrauss projecting data points to just
$O(\log k/\epsilon^2)$ dimensions. This gives the first result that leverages the
specific structure of $k$-means to achieve dimension independent of input size
and sublinear in $k$.
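The project-then-cluster recipe can be sketched in pure Python. First multiply the data by a scaled random Gaussian (Johnson-Lindenstrauss) matrix, then run any k-means solver on the reduced points. In this sketch, Lloyd's algorithm with farthest-point initialization stands in for "any exact, approximate, or heuristic algorithm". The target dimension is a hand-picked constant rather than the bound proved in the paper, and the function names are hypothetical.

```python
import math
import random


def jl_project(points, target_dim, seed=0):
    """Random Gaussian projection to target_dim dimensions, scaled by 1/sqrt(target_dim)."""
    rng = random.Random(seed)
    p = len(points[0])
    G = [[rng.gauss(0.0, 1.0) for _ in range(p)] for _ in range(target_dim)]
    scale = 1.0 / math.sqrt(target_dim)
    return [[scale * sum(g * x for g, x in zip(row, pt)) for row in G]
            for pt in points]


def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, iters=100):
    """Lloyd's algorithm with deterministic farthest-point initialisation."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points,
                           key=lambda q: min(sq_dist(q, c) for c in centers)))
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: nearest center for every point.
        labels = [min(range(k), key=lambda c: sq_dist(q, centers[c]))
                  for q in points]
        # Update step: each center moves to the mean of its members.
        new_centers = []
        for c in range(k):
            members = [q for q, lab in zip(points, labels) if lab == c]
            if members:
                new_centers.append([sum(col) / len(members)
                                    for col in zip(*members)])
            else:
                new_centers.append(centers[c])
        if new_centers == centers:
            break
        centers = new_centers
    return labels, centers
```

The projection is independent of the clustering step, so any solver can be swapped in after `jl_project` without changing the reduction.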
Least Dependent Component Analysis Based on Mutual Information
We propose to use precise estimators of mutual information (MI) to find least
dependent components in a linearly mixed signal. On the one hand, this seems to
lead to better blind source separation than any other presently available
algorithm. On the other hand, it has the advantage, compared to other
implementations of 'independent' component analysis (ICA), some of which are
based on crude approximations for MI, that the numerical values of the MI can
be used for:
(i) estimating residual dependencies between the output components;
(ii) estimating the reliability of the output, by comparing the pairwise MIs
with those of re-mixed components;
(iii) clustering the output according to the residual interdependencies.
For the MI estimator we use a recently proposed k-nearest neighbor based
algorithm. For time sequences we combine this with delay embedding, in order to
take into account non-trivial time correlations. After several tests with
artificial data, we apply the resulting MILCA (Mutual Information based Least
dependent Component Analysis) algorithm to a real-world dataset, the ECG of a
pregnant woman.
The software implementation of the MILCA algorithm is freely available at
http://www.fz-juelich.de/nic/cs/software. Comment: 18 pages, 20 figures, Phys. Rev. E (in press)
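The abstract only says the MI estimator is "a recently proposed k-nearest neighbor based algorithm". The sketch below assumes this is the Kraskov-Stögbauer-Grassberger (KSG) estimator. It implements the first KSG variant by brute force for two scalar signals, without the delay embedding used for time sequences. It is not the MILCA software, and the names are illustrative.

```python
import math
import random


def digamma_int(n):
    # psi(n) = -gamma + sum_{m=1}^{n-1} 1/m for positive integers n.
    return -0.5772156649015329 + sum(1.0 / m for m in range(1, n))


def ksg_mutual_information(x, y, k=3):
    """KSG estimator (first variant) of I(X;Y) in nats for scalar samples.

    Uses the max-norm in the joint space: eps is the distance from point i to
    its k-th nearest neighbour, and n_x, n_y count marginal neighbours that
    are strictly closer than eps.
    """
    n = len(x)
    total = 0.0
    for i in range(n):
        dists = sorted(max(abs(x[i] - x[j]), abs(y[i] - y[j]))
                       for j in range(n) if j != i)
        eps = dists[k - 1]
        nx = sum(1 for j in range(n) if j != i and abs(x[i] - x[j]) < eps)
        ny = sum(1 for j in range(n) if j != i and abs(y[i] - y[j]) < eps)
        total += digamma_int(nx + 1) + digamma_int(ny + 1)
    # I = psi(k) + psi(N) - <psi(n_x + 1) + psi(n_y + 1)>
    return digamma_int(k) + digamma_int(n) - total / n
```

A nearly deterministic relation such as y = x plus small noise should give a large estimate, while independent samples should give an estimate close to zero. The O(N^2) neighbor search is only suitable for small samples.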