4,207 research outputs found
Spectral Dimensionality Reduction
In this paper, we study and put under a common framework a number of non-linear dimensionality reduction methods, such as Locally Linear Embedding, Isomap, Laplacian Eigenmaps and kernel PCA, which are based on performing an eigen-decomposition (hence the name 'spectral'). That framework also includes classical methods such as PCA and metric multidimensional scaling (MDS). It also includes the data transformation step used in spectral clustering. We show that in all of these cases the learning algorithm estimates the principal eigenfunctions of an operator that depends on the unknown data density and on a kernel that is not necessarily positive semi-definite. This helps to generalize some of these algorithms so as to predict an embedding for out-of-sample examples without having to retrain the model. It also makes it more transparent what these algorithm are minimizing on the empirical data and gives a corresponding notion of generalization error. Dans cet article, nous étudions et développons un cadre unifié pour un certain nombre de méthodes non linéaires de réduction de dimensionalité, telles que LLE, Isomap, LE (Laplacian Eigenmap) et ACP à noyaux, qui font de la décomposition en valeurs propres (d'où le nom "spectral"). Ce cadre inclut également des méthodes classiques telles que l'ACP et l'échelonnage multidimensionnel métrique (MDS). Il inclut aussi l'étape de transformation de données utilisée dans l'agrégation spectrale. Nous montrons que, dans tous les cas, l'algorithme d'apprentissage estime les fonctions propres principales d'un opérateur qui dépend de la densité inconnue de données et d'un noyau qui n'est pas nécessairement positif semi-défini. Ce cadre aide à généraliser certains modèles pour prédire les coordonnées des exemples hors-échantillons sans avoir à réentraîner le modèle. Il aide également à rendre plus transparent ce que ces algorithmes minimisent sur les données empiriques et donne une notion correspondante d'erreur de généralisation.non-parametric models, non-linear dimensionality reduction, kernel models, modèles non paramétriques, réduction de dimensionalité non linéaire, modèles à noyau
Benchmarking in cluster analysis: A white paper
To achieve scientific progress in terms of building a cumulative body of
knowledge, careful attention to benchmarking is of the utmost importance. This
means that proposals of new methods of data pre-processing, new data-analytic
techniques, and new methods of output post-processing, should be extensively
and carefully compared with existing alternatives, and that existing methods
should be subjected to neutral comparison studies. To date, benchmarking and
recommendations for benchmarking have been frequently seen in the context of
supervised learning. Unfortunately, there has been a dearth of guidelines for
benchmarking in an unsupervised setting, with the area of clustering as an
important subdomain. To address this problem, discussion is given to the
theoretical conceptual underpinnings of benchmarking in the field of cluster
analysis by means of simulated as well as empirical data. Subsequently, the
practicalities of how to address benchmarking questions in clustering are dealt
with, and foundational recommendations are made
Fuzzy clustering with volume prototypes and adaptive cluster merging
Two extensions to the objective function-based fuzzy
clustering are proposed. First, the (point) prototypes are extended to hypervolumes, whose size can be fixed or can be determined automatically from the data being clustered. It is shown that clustering with hypervolume prototypes can be formulated as the minimization of an objective function. Second, a heuristic cluster merging step is introduced where the similarity among the clusters
is assessed during optimization. Starting with an overestimation of the number of clusters in the data, similar clusters are merged in order to obtain a suitable partitioning. An adaptive threshold for merging is proposed. The extensions proposed are applied to
Gustafson–Kessel and fuzzy c-means algorithms, and the resulting extended algorithm is given. The properties of the new algorithm are illustrated by various examples
Adaptive Resonance Theory
SyNAPSE program of the Defense Advanced Projects Research Agency (Hewlett-Packard Company, subcontract under DARPA prime contract HR0011-09-3-0001, and HRL Laboratories LLC, subcontract #801881-BS under DARPA prime contract HR0011-09-C-0001); CELEST, an NSF Science of Learning Center (SBE-0354378
- …