
    Kernel Spectral Clustering and applications

    In this chapter we review the main literature related to kernel spectral clustering (KSC), an approach to clustering cast within a kernel-based optimization setting. KSC represents a least-squares support vector machine (LS-SVM) based formulation of spectral clustering described by a weighted kernel PCA objective. Just as in the classifier case, the binary clustering model is expressed by a hyperplane in a high-dimensional space induced by a kernel. In addition, multi-way clustering can be obtained by combining a set of binary decision functions via an Error Correcting Output Codes (ECOC) encoding scheme. Because of its model-based nature, the KSC method encompasses three main steps: training, validation, and testing. In the validation stage, model selection is performed to obtain tuning parameters such as the number of clusters present in the data. This is a major advantage compared to classical spectral clustering, where the determination of the clustering parameters is unclear and relies on heuristics. Once a KSC model is trained on a small subset of the entire data, it is able to generalize well to unseen test points. Beyond the basic formulation, sparse KSC algorithms based on the Incomplete Cholesky Decomposition (ICD) and $L_0$, $L_1$, $L_0 + L_1$, and Group Lasso regularization are reviewed. In that respect, we show how it is possible to handle large-scale data. Also, two possible ways to perform hierarchical clustering and a soft clustering method are presented. Finally, real-world applications such as image segmentation, power load time-series clustering, document clustering, and big data learning are considered.

    Comment: chapter contribution to the book "Unsupervised Learning Algorithms"
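    As a rough illustration of the train/validate/test pipeline sketched in this abstract, the following Python snippet trains a simplified KSC-style model on a small subset and scores unseen points through the kernel. It is a minimal sketch, not the paper's exact LS-SVM dual formulation: the helper names ksc_train/ksc_predict, the RBF kernel choice, and the k-means codebook standing in for the ECOC decoding step are all illustrative assumptions.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics.pairwise import rbf_kernel

def ksc_train(X_train, n_clusters, gamma=1.0):
    """Train a simplified KSC-style model on a small training subset.

    Weighted-kernel-PCA view: the score variables come from the leading
    non-trivial eigenvectors of D^{-1} K, with K the kernel matrix and
    D its degree matrix.
    """
    K = rbf_kernel(X_train, gamma=gamma)
    M = K / K.sum(axis=1, keepdims=True)            # D^{-1} K
    eigvals, eigvecs = np.linalg.eig(M)
    order = np.argsort(-eigvals.real)
    alpha = eigvecs[:, order[1:n_clusters]].real    # n_clusters - 1 directions
    scores_train = K @ alpha                        # training score variables
    # combine the binary score variables into cluster labels (ECOC-like step,
    # implemented here with a simple k-means codebook)
    codebook = KMeans(n_clusters=n_clusters, n_init=10).fit(scores_train)
    return alpha, codebook

def ksc_predict(X_train, alpha, codebook, X_test, gamma=1.0):
    """Out-of-sample extension: score unseen points through the kernel."""
    K_test = rbf_kernel(X_test, X_train, gamma=gamma)
    return codebook.predict(K_test @ alpha)
```

    Because only the training subset enters the kernel matrix, prediction on test points costs one kernel evaluation per training point, which is what makes the out-of-sample extension cheap.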

    Learning with Algebraic Invariances, and the Invariant Kernel Trick

    When solving data analysis problems, it is important to integrate prior knowledge and/or structural invariances. This paper contributes a novel framework for incorporating algebraic invariance structure into kernels. In particular, we show that algebraic properties such as sign symmetries in the data, phase independence, and scaling can be incorporated easily by essentially performing the kernel trick twice. We demonstrate the usefulness of our theory in simulations on selected applications such as sign-invariant spectral clustering and underdetermined ICA.
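    A minimal sketch of the sign-symmetry case mentioned in the abstract: a base RBF kernel is averaged over the sign group so that x and -x become indistinguishable. The helper name sign_invariant_kernel and the toy data are illustrative assumptions; the paper's general "kernel trick twice" construction is broader than this group average.

```python
import numpy as np
from sklearn.cluster import SpectralClustering
from sklearn.metrics.pairwise import rbf_kernel

def sign_invariant_kernel(X, Y=None, gamma=1.0):
    """Sign-invariant kernel built from an RBF base kernel k:
    k_inv(x, y) = (k(x, y) + k(-x, y)) / 2, so that k_inv(-x, y) = k_inv(x, y).
    For the RBF, k(-x, -y) = k(x, y), hence this two-term average equals the
    full average over the sign group and remains a valid (PSD) kernel.
    """
    Y = X if Y is None else Y
    return 0.5 * (rbf_kernel(X, Y, gamma=gamma) + rbf_kernel(-X, Y, gamma=gamma))

# Sign-invariant spectral clustering: x and -x receive identical affinities,
# so they always land in the same cluster despite random sign flips.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 0.3, (50, 2)) + [2.0, 2.0],
               rng.normal(0.0, 0.3, (50, 2)) + [2.0, -2.0]])
X[::2] *= -1                     # flip signs of half the points
labels = SpectralClustering(n_clusters=2, affinity="precomputed",
                            random_state=0).fit_predict(
    sign_invariant_kernel(X, gamma=1.0))
```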

    Spectral analysis of the Gram matrix of mixture models

    This text is devoted to the asymptotic study of some spectral properties of the Gram matrix $W^{\sf T} W$ built upon a collection $w_1, \ldots, w_n \in \mathbb{R}^p$ of random vectors (the columns of $W$), as both the number $n$ of observations and the dimension $p$ of the observations tend to infinity and are of similar order of magnitude. The random vectors $w_1, \ldots, w_n$ are independent observations, each of them belonging to one of $k$ classes $\mathcal{C}_1, \ldots, \mathcal{C}_k$. The observations of each class $\mathcal{C}_a$ ($1 \le a \le k$) are characterized by their distribution $\mathcal{N}(0, p^{-1} C_a)$, where $C_1, \ldots, C_k$ are some nonnegative definite $p \times p$ matrices. The cardinality $n_a$ of class $\mathcal{C}_a$ and the dimension $p$ of the observations are such that $\frac{n_a}{n}$ ($1 \le a \le k$) and $\frac{p}{n}$ stay bounded away from $0$ and $+\infty$. We provide deterministic equivalents to the empirical spectral distribution of $W^{\sf T} W$ and to the matrix entries of its resolvent (as well as of the resolvent of $W W^{\sf T}$). These deterministic equivalents are defined through the solutions of a fixed-point system. Besides, we prove that $W^{\sf T} W$ asymptotically has no eigenvalues outside the bulk of its spectrum, which is defined through these deterministic equivalents. These results are directly used in our companion paper "Kernel spectral clustering of large dimensional data", which is devoted to the analysis of the spectral clustering algorithm in large dimensions. They also find applications in various other fields such as wireless communications, where functionals of the aforementioned resolvents allow one to assess the communication performance across multi-user multi-antenna channels.

    Comment: 25 pages, 1 figure. To appear in ESAIM Probab. Stat.
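    A small numpy simulation of the model under study may help fix ideas: it draws two classes from $\mathcal{N}(0, p^{-1} C_a)$ with diagonal covariances (a simplifying assumption chosen for brevity) and inspects the empirical spectrum of the Gram matrix. The deterministic equivalents themselves, which require solving the paper's fixed-point system, are not computed here.

```python
import numpy as np

rng = np.random.default_rng(42)
p, n1, n2 = 400, 300, 300                # p and n of the same order
n = n1 + n2

# diagonal class covariances C_1, C_2; observations w_i ~ N(0, p^{-1} C_a)
c1 = np.ones(p)
c2 = 1.0 + 0.5 * np.sin(2 * np.pi * np.arange(p) / p)
W1 = rng.standard_normal((p, n1)) * np.sqrt(c1 / p)[:, None]
W2 = rng.standard_normal((p, n2)) * np.sqrt(c2 / p)[:, None]
W = np.hstack([W1, W2])                  # p x n data matrix

eigs = np.linalg.eigvalsh(W.T @ W)       # empirical spectrum of the Gram matrix
print(f"spectrum support: [{eigs.min():.3f}, {eigs.max():.3f}]")
# the paper's result: as n, p grow, the histogram of eigs approaches a
# deterministic equivalent and no eigenvalue strays outside the bulk
```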

    Spectral Clustering of Mixed-Type Data

    Cluster analysis seeks to assign objects with similar characteristics into groups called clusters, so that objects within a group are similar to each other and dissimilar to objects in other groups. Spectral clustering has been shown to perform well in different scenarios on continuous data: it can detect both convex and non-convex clusters, as well as overlapping clusters. However, the restriction to continuous data can be limiting in real applications, where data are often of mixed type, i.e., contain both continuous and categorical features. This paper looks at extending spectral clustering to mixed-type data. The new method replaces the Euclidean-based similarity distance used in conventional spectral clustering with different dissimilarity measures for continuous and categorical variables. A global dissimilarity measure is then computed using a weighted sum, and a Gaussian kernel is used to convert the dissimilarity matrix into a similarity matrix. The new method includes an automatic tuning of the variable weight and kernel parameter. The performance of spectral clustering in different scenarios is compared with that of two state-of-the-art mixed-type data clustering methods, k-prototypes and KAMILA, using several simulated and real data sets.
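    The pipeline described in the abstract can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions: the helper name mixed_spectral is illustrative, categorical features are assumed integer-encoded, and the weight w and bandwidth sigma are fixed here, whereas the paper tunes both automatically.

```python
import numpy as np
from scipy.spatial.distance import cdist
from sklearn.cluster import SpectralClustering

def mixed_spectral(X_num, X_cat, n_clusters, w=0.5, sigma=1.0):
    """Spectral clustering of mixed-type data (minimal sketch).

    Continuous part: Euclidean distance on standardized features.
    Categorical part: simple matching (Hamming) on integer-encoded features.
    The two dissimilarities are combined by a weighted sum and passed
    through a Gaussian kernel to obtain the similarity matrix.
    """
    Z = (X_num - X_num.mean(axis=0)) / X_num.std(axis=0)
    d_num = cdist(Z, Z)                              # continuous dissimilarity
    d_cat = cdist(X_cat, X_cat, metric="hamming")    # categorical dissimilarity
    d = w * d_num / d_num.max() + (1.0 - w) * d_cat  # global weighted sum
    S = np.exp(-d**2 / (2.0 * sigma**2))             # Gaussian kernel
    return SpectralClustering(n_clusters=n_clusters,
                              affinity="precomputed").fit_predict(S)
```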